-\begin{verbatim}
-Communications
-==============
-
-There are two receiving channels per host: control for information
-messages, and data for load transfers.
-
-Process model
-=============
-
-Each process is made of 3 threads: a receiver thread, a computing
-thread, and a load-balancer thread.
-
-* Receiver thread
- ---------------
-
- Loop
- | wait for a message to come, either on data channel, or on ctrl channel
- | push received message in a buffer of received messages
- | -> ctrl messages on the one side
- | -> data messages on the other side
- +-
-
- The loop terminates when a "finalize" message is received on each
- channel.
-
-* Computing thread
- ----------------
-
- Loop
- | if we received some real load, get it (data messages)
- | if there is some real load to send, send it
- | if we own some load, simulate some computing on it
- | sleep a bit if we are looping too fast
- +-
- send CLOSE on data for all neighbors
- wait for CLOSE on data from all neighbors
-
- The loop terminates when process::still_running() returns false.
- (read the source for full details...)
-
-* Load-balancing thread
- ---------------------
-
- Loop
- | call load-balancing algorithm
- | send ctrl messages
- | sleep (min_lb_iter_duration)
- | receive ctrl messages
- +-
- send CLOSE on ctrl for all neighbors
- wait for CLOSE on ctrl from all neighbors
+In the simulation model, the processors exchange two kinds of
+messages. First, there are \emph{control messages}, which only carry
+information exchanged between the processors, such as the
+current load, or the virtual load transfers if this option is
+selected. These messages are small, and their size is
+constant. Then, there are \emph{data messages}, which carry the real
+load transferred between the processors. The size of a data message
+is a function of the amount of load that it carries, and it can be
+large. In order to receive the messages, each processor has
+two receiving channels, one for each kind of message. Finally,
+messages are sent and received using the non-blocking
+primitives of SimGrid\footnote{Namely \texttt{MSG\_task\_isend()}
+ and \texttt{MSG\_task\_irecv()}.}.
+
+During the simulation, each processor concurrently runs three threads:
+a \emph{receiving thread}, a \emph{computing thread}, and a
+\emph{load-balancing thread}, each of which is briefly described below.
+
+\paragraph{Receiving thread} The receiving thread is in charge of
+waiting for incoming messages, either on the control channel or on the
+data channel. Its behavior is sketched in Algorithm~\ref{algo.recv}.
+When a message is received, it is pushed into a buffer of
+received messages, to be consumed later by one of the other threads.
+There are two such buffers, one for the control messages and one for
+the data messages. The buffers are implemented as lock-free FIFOs
+\cite{sutter.2008.writing} to avoid contention between the threads.
+
+\begin{algorithm}
+ \caption{Receiving thread}
+ \label{algo.recv}
+ \KwData{
+ \begin{algodata}
+ \VAR{ctrl\_chan}, \VAR{data\_chan}
+ & communication channels (control and data) \\
+ \VAR{ctrl\_fifo}, \VAR{data\_fifo}
+ & buffers of received messages (control and data) \\
+ \end{algodata}}
+ \While{true}{%
+ wait for a message to be available on either \VAR{ctrl\_chan},
+ or \VAR{data\_chan}\;
+ \If{a message is available on \VAR{ctrl\_chan}}{%
+ get the message from \VAR{ctrl\_chan}, and push it into \VAR{ctrl\_fifo}\;
+ }
+ \If{a message is available on \VAR{data\_chan}}{%
+ get the message from \VAR{data\_chan}, and push it into \VAR{data\_fifo}\;
+ }
+ }
+\end{algorithm}
+
+\paragraph{Computing thread} The computing thread is in charge of
+managing the real load. As shown in Algorithm~\ref{algo.comp}, it
+iteratively runs the following operations:
+\begin{itemize}
+\item if some load was received from the neighbors, get it;
+\item if there is some load to send to the neighbors, send it;
+\item run some computation, whose duration is a function of the current
+ load of the processor.
+\end{itemize}
+In practice, after the computation, the computing thread waits for a
+small amount of time if the loop is iterating too fast (for
+example, when the current load is near zero).
+
+\begin{algorithm}
+ \caption{Computing thread}
+ \label{algo.comp}
+ \KwData{
+ \begin{algodata}
+ \VAR{data\_fifo} & buffer of received data messages \\
+ \VAR{real\_load} & current load \\
+ \end{algodata}}
+ \While{true}{%
+ \If{\VAR{data\_fifo} is empty and $\VAR{real\_load} = 0$}{%
+ wait until a message is pushed into \VAR{data\_fifo}\;
+ }
+ \While{\VAR{data\_fifo} is not empty}{%
+ pop a message from \VAR{data\_fifo}\;
+ get the load embedded in the message, and add it to \VAR{real\_load}\;
+ }
+ \ForEach{neighbor $n$}{%
+ \If{there is some amount of load $a$ to send to $n$}{%
+ send $a$ units of load to $n$, and subtract it from \VAR{real\_load}\;
+ }
+ }
+ \If{$\VAR{real\_load} > 0$}{%
+ simulate some computation, whose duration is a function of \VAR{real\_load}\;
+ ensure that the main loop does not iterate too fast\;
+ }
+ }
+\end{algorithm}
+
+\paragraph{Load-balancing thread} The load-balancing thread is in
+charge of running the load-balancing algorithm and exchanging the
+control messages. It iteratively runs the following operations:
+\begin{itemize}
+\item get the control messages that were received from the neighbors;
+\item run the load-balancing algorithm;
+\item send control messages to the neighbors, to inform them of the
+ processor's current load, and possibly of virtual load transfers;
+\item wait for a minimum (configurable) amount of time, to avoid
+ iterating too fast.
+\end{itemize}
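+
+These operations can be sketched in the same style as the two
+previous algorithms. The listing below is only an illustration: the
+step order follows the list above, the name
+\VAR{min\_lb\_iter\_duration} is assumed here for the configurable
+minimum duration, and the details of the load-balancing algorithm
+itself are left out.
+
+% Sketch only: the variable names and the granularity of the steps
+% are assumptions made for illustration.
+\begin{algorithm}
+ \caption{Load-balancing thread (sketch)}
+ \KwData{
+ \begin{algodata}
+ \VAR{ctrl\_fifo} & buffer of received control messages \\
+ \VAR{min\_lb\_iter\_duration}
+ & minimum duration of an iteration (assumed name) \\
+ \end{algodata}}
+ \While{true}{%
+ \While{\VAR{ctrl\_fifo} is not empty}{%
+ pop a message from \VAR{ctrl\_fifo}, and record the information it carries\;
+ }
+ run the load-balancing algorithm\;
+ \ForEach{neighbor $n$}{%
+ send a control message to $n$, with the current load and,
+ possibly, the virtual load transferred to $n$\;
+ }
+ wait for at least \VAR{min\_lb\_iter\_duration}, to avoid iterating too fast\;
+ }
+\end{algorithm}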