+In the simulation model, the processors exchange two kinds of messages.
+First, there are \emph{control messages}, which only carry
+information that is exchanged between the processors, such as the
+current load, or the virtual load transfers if this option is
+selected. These messages are rather small, and their size is
+constant. Then, there are \emph{data messages} that carry the real
+load transferred between the processors. The size of a data message
+is a function of the amount of load that it carries, and it can be
+quite large. In order to receive the messages, each processor has
+two receiving channels, one for each kind of message. Finally,
+messages are sent and received using the non-blocking
+primitives of SimGrid\footnote{Namely, \texttt{MSG\_task\_isend()}
+ and \texttt{MSG\_task\_irecv()}.}.
+
+During the simulation, each processor concurrently runs three threads:
+a \emph{receiving thread}, a \emph{computing thread}, and a
+\emph{load-balancing thread}, which are briefly described below.
+
+For the sake of simplicity, a few details were deliberately omitted from
+these descriptions. For an exhaustive presentation, we refer the reader to the
+actual source code that was used for the experiments%
+\footnote{As mentioned before, our simulator relies on the SimGrid
+ framework~\cite{casanova+legrand+quinson.2008.simgrid}. For the
+ experiments, we used a pre-release of SimGrid 3.7 (Git commit
+ 67d62fca5bdee96f590c942b50021cdde5ce0c07, available from
+ \url{https://gforge.inria.fr/scm/?group_id=12}).}, which is
+available at
+\url{http://info.iut-bm.univ-fcomte.fr/staff/giersch/software/loba.tar.gz}.
+
+\subsubsection{Receiving thread}
+
+The receiving thread is in charge of waiting for messages to arrive, either on the
+control channel, or on the data channel. Its behavior is sketched in
+Algorithm~\ref{algo.recv}. When a message is received, it is pushed into a buffer
+of received messages, to be later consumed by one of the other threads. There
+are two such buffers, one for the control messages, and one for the data
+messages. The buffers are implemented with a lock-free FIFO
+\cite{sutter.2008.writing} to avoid contention between the threads.
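+
+As an illustration only, Algorithm~\ref{algo.fifo.sketch} sketches one
+classical way to obtain such a buffer when a single thread pushes and a single
+thread pops: a fixed-size circular array. The names \VAR{buf}, \VAR{head},
+\VAR{tail}, and the capacity $N$ are hypothetical, and this is not necessarily
+the design used in the simulator, whose FIFO follows~\cite{sutter.2008.writing}.
+
+\begin{algorithm}
+ \caption{Sketch of a single-producer, single-consumer FIFO (illustrative assumption)}
+ \label{algo.fifo.sketch}
+ \KwData{
+ \begin{algodata}
+ \VAR{buf} & array of $N$ message slots (hypothetical) \\
+ \VAR{head} & index of the next slot to pop, written only by the consumer \\
+ \VAR{tail} & index of the next slot to fill, written only by the producer \\
+ \end{algodata}}
+ \emph{push($m$), run by the producer only:}\;
+ \eIf{$(\VAR{tail} + 1) \bmod N = \VAR{head}$}{%
+ report that the buffer is full\;
+ }{%
+ store $m$ in $\VAR{buf}[\VAR{tail}]$\;
+ atomically set \VAR{tail} to $(\VAR{tail} + 1) \bmod N$\;
+ }
+ \emph{pop(), run by the consumer only:}\;
+ \eIf{$\VAR{head} = \VAR{tail}$}{%
+ report that the buffer is empty\;
+ }{%
+ read the message stored in $\VAR{buf}[\VAR{head}]$\;
+ atomically set \VAR{head} to $(\VAR{head} + 1) \bmod N$\;
+ }
+\end{algorithm}
+
+In this sketch, each index is written by exactly one thread, so no lock is
+needed, provided that each index update is an atomic store ordered after the
+corresponding slot access. One slot is always left unused in order to
+distinguish a full buffer from an empty one, so at most $N - 1$ messages are
+buffered at a time.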
+
+\begin{algorithm}
+ \caption{Receiving thread}
+ \label{algo.recv}
+ \KwData{
+ \begin{algodata}
+ \VAR{ctrl\_chan}, \VAR{data\_chan}
+ & communication channels (control and data) \\
+ \VAR{ctrl\_fifo}, \VAR{data\_fifo}
+ & buffers of received messages (control and data) \\
+ \end{algodata}}
+ \While{true}{%
+ wait for a message to be available on either \VAR{ctrl\_chan},
+ or \VAR{data\_chan}\;
+ \If{a message is available on \VAR{ctrl\_chan}}{%
+ get the message from \VAR{ctrl\_chan}, and push it into \VAR{ctrl\_fifo}\;
+ }
+ \If{a message is available on \VAR{data\_chan}}{%
+ get the message from \VAR{data\_chan}, and push it into \VAR{data\_fifo}\;
+ }
+ }
+\end{algorithm}
+
+\subsubsection{Computing thread}
+
+The computing thread is in charge of the real load management. As shown in
+Algorithm~\ref{algo.comp}, it iteratively runs the following operations:
+\begin{itemize}
+\item if some load was received from the neighbors, get it;
+\item if there is some load to send to the neighbors, send it;
+\item run some computation, whose duration is a function of the current
+ load of the processor.
+\end{itemize}
+In practice, after the computation, the computing thread waits for a
+small amount of time if the iterations are looping too fast (for
+example, when the current load is near zero).
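+
+As a hedged illustration of this waiting step, assume a hypothetical minimum
+iteration duration $\tau_{\min}$ (this symbol is introduced here for the
+example only), and let $\tau_{\mathrm{iter}}$ denote the time already spent in
+the current iteration. The waiting time could then be chosen as
+\[
+ \tau_{\mathrm{wait}} = \max\left(0,\; \tau_{\min} - \tau_{\mathrm{iter}}\right),
+\]
+so that the wait vanishes as soon as the iteration itself lasts at least
+$\tau_{\min}$, and is close to $\tau_{\min}$ when the current load is near zero.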
+
+\begin{algorithm}
+ \caption{Computing thread}
+ \label{algo.comp}
+ \KwData{
+ \begin{algodata}
+ \VAR{data\_fifo} & buffer of received data messages \\
+ \VAR{real\_load} & current load \\
+ \end{algodata}}
+ \While{true}{%
+ \If{\VAR{data\_fifo} is empty and $\VAR{real\_load} = 0$}{%
+ wait until a message is pushed into \VAR{data\_fifo}\;
+ }
+ \While{\VAR{data\_fifo} is not empty}{%
+ pop a message from \VAR{data\_fifo}\;
+ get the load embedded in the message, and add it to \VAR{real\_load}\;
+ }
+ \ForEach{neighbor $n$}{%
+ \If{there is some amount of load $a$ to send to $n$}{%
+ send $a$ units of load to $n$, and subtract it from \VAR{real\_load}\;
+ }
+ }
+ \If{$\VAR{real\_load} > 0$}{%
+ simulate some computation, whose duration is a function of \VAR{real\_load}\;
+ ensure that the main loop does not iterate too fast\;
+ }
+ }
+\end{algorithm}
+
+\subsubsection{Load-balancing thread}
+
+The load-balancing thread is in charge of running the load-balancing algorithm,
+and of exchanging the control messages. As shown in Algorithm~\ref{algo.lb}, it
+iteratively runs the following operations:
+\begin{itemize}
+\item get the control messages that were received from the neighbors;
+\item run the load-balancing algorithm;
+\item send control messages to the neighbors, to inform them of the
+ processor's current load, and possibly of virtual load transfers;
+\item wait for a minimum (configurable) amount of time, to avoid
+ iterating too fast.
+\end{itemize}