quelques coquilles...

[loba-papers.git] / supercomp11 / supercomp11.tex
diff --git a/supercomp11/supercomp11.tex b/supercomp11/supercomp11.tex

index 2a6c04bf1c78ba12a233a0cdd2a4c11d3525663d..81ee704bb811b9edf219d5f5ea9a92611f54705d 100644 (file)
--- a/supercomp11/supercomp11.tex
+++ b/supercomp11/supercomp11.tex
@@ -75,9 +75,11 @@ algorithm which is definitively a reference  for many works. In their work, they
  proved that under classical  hypotheses of asynchronous iterative algorithms and
  a  special  constraint   avoiding  \emph{ping-pong}  effect,  an  asynchronous
  iterative algorithm  converge to  the uniform load  distribution. This  work has
-been extended by many authors. For example,
-DASUD~\cite{cortes+ripoll+cedo+al.2002.asynchronous} propose a version working
-with integer load. {\bf Rajouter des choses ici}.
+been extended by many authors. For example, Cortés et al., with
+DASUD~\cite{cortes+ripoll+cedo+al.2002.asynchronous}, propose a
+version working with integer load.  This work was later generalized by
+the same authors in \cite{cedo+cortes+ripoll+al.2007.convergence}.
+{\bf Rajouter des choses ici}.
  
  Although  the Bertsekas  and Tsitsiklis'  algorithm describes  the  condition to
  ensure the convergence,  there is no indication or  strategy to really implement
@@ -183,12 +185,13 @@ condition or with a weaker condition.
  \section{Best effort strategy}
  \label{Best-effort}
  
-We will describe here a new load-balancing strategy that we called
-\emph{best effort}.  The general idea behind this strategy is, for a
-processor, to send some load to the most of its neighbors, doing its
+In this section we  describe  a new load-balancing strategy that we call
+\emph{best effort}.  The general idea behind this strategy is that each
+processor, that detects it has more load than some of its neighbors, 
+sends some load to the most of its less loaded neighbors, doing its
  best to reach the equilibrium between those neighbors and himself.
  
-More precisely, when a processors $i$ is in its load-balancing phase,
+More precisely, when a processor $i$ is in its load-balancing phase,
  he proceeds as following.
  \begin{enumerate}
  \item First, the neighbors are sorted in non-decreasing order of their
@@ -270,69 +273,69 @@ C'est l'algorithme~2 dans~\cite{bahi+giersch+makhoul.2008.scalable}.
  
  In order to test and validate our approaches, we wrote a simulator
  using the SimGrid
-framework~\cite{casanova+legrand+quinson.2008.simgrid}.  The process
-model is detailed in the next section (\ref{Sim model}), then the
-results of the simulations are presented in section~\ref{Results}.
+framework~\cite{casanova+legrand+quinson.2008.simgrid}.  This
+simulator, which consists of about 2,700 lines of C++, allows to run
+the different load-balancing strategies under various parameters, such
+as the initial distribution of load, the interconnection topology, the
+characteristics of the running platform, etc.  Then several metrics
+are issued that permit to compare the strategies.
+
+The simulation model is detailed in the next section (\ref{Sim
+  model}), then the results of the simulations are presented in
+section~\ref{Results}.
  
  \subsection{Simulation model}
  \label{Sim model}
  
-\begin{verbatim}
-Communications
-==============
-
-There are two receiving channels per host: control for information
-messages, and data for load transfers.
-
-Process model
-=============
-
-Each process is made of 3 threads: a receiver thread, a computing
-thread, and a load-balancer thread.
-
-* Receiver thread
-  ---------------
-
-    Loop
-    | wait for a message to come, either on data channel, or on ctrl channel
-    | push received message in a buffer of received messages
-    | -> ctrl messages on the one side
-    | -> data messages on the other side
-    +-
-
-   The loop terminates when a "finalize" message is received on each
-   channel.
-
-* Computing thread
-  ----------------
-
-    Loop
-    | if we received some real load, get it (data messages)
-    | if there is some real load to send, send it
-    | if we own some load, simulate some computing on it
-    | sleep a bit if we are looping too fast
-    +-
-    send CLOSE on data for all neighbors
-    wait for CLOSE on data from all neighbors
-
-  The loop terminates when process::still_running() returns false.
-  (read the source for full details...)
-
-* Load-balancing thread
-  ---------------------
-
-    Loop
-    | call load-balancing algorithm
-    | send ctrl messages
-    | sleep (min_lb_iter_duration)
-    | receive ctrl messages
-    +-
-    send CLOSE on ctrl for all neighbors
-    wait for CLOSE on ctrl from all neighbors
+In the simulation model the processors exchange messages which are of
+two kinds.  First, there are \emph{control messages} which only carry
+information that is exchanged between the processors, such as the
+current load, or the virtual load transfers if this option is
+selected.  These messages are rather small, and their size is
+constant.  Then, there are \emph{data messages} that carry the real
+load transferred between the processors.  The size of a data message
+is a function of the amount of load that it carries, and it can be
+pretty large.  In order to receive the messages, each processor has
+two receiving channels, one for each kind of messages.  Finally, when
+a message is sent or received, this is done by using the non-blocking
+primitives of SimGrid\footnote{That are \texttt{MSG\_task\_isend()},
+  and \texttt{MSG\_task\_irecv()}.}.
+
+During the simulation, each processor concurrently runs three threads:
+a \emph{receiving thread}, a \emph{computing thread}, and a
+\emph{load-balancing thread}, which we will briefly describe now.
+
+\paragraph{Receiving thread} The receiving thread is in charge of
+waiting for messages to come, either on the control channel, or on the
+data channel.  When a message is received, it is pushed in a buffer of
+received message, to be later consumed by one of the other threads.
+There are two such buffers, one for the control messages, and one for
+the data messages.  The buffers are implemented with a lock-free FIFO
+\cite{sutter.2008.writing} to avoid contention between the threads.
+
+\paragraph{Computing thread} The computing thread is in charge of the
+real load management.  It iteratively runs the following operations:
+\begin{itemize}
+\item if some load was received from the neighbors, get it;
+\item if there is some load to send to the neighbors, send it;
+\item run some computation, whose duration is function of the current
+  load of the processor.
+\end{itemize}
+Practically, after the computation, the computing thread waits for a
+small amount of time if the iterations are looping too fast (for
+example, when the current load is zero).
  
-  The loop terminates when process::still_running() returns false.
-  (read the source for full details...)
-\end{verbatim}
+\paragraph{Load-balancing thread} The load-balancing thread is in
+charge of running the load-balancing algorithm, and exchange the
+control messages.  It iteratively runs the following operations:
+\begin{itemize}
+\item get the control messages that were received from the neighbors;
+\item run the load-balancing algorithm;
+\item send control messages to the neighbors, to inform them of the
+  processor's current load, and possibly of virtual load transfers;
+\item wait a minimum (configurable) amount of time, to avoid to
+  iterate too fast.
+\end{itemize}
  
  \subsection{Validation of our approaches}
  \label{Results}
@@ -381,4 +384,4 @@ Taille : 10 100 très gros
  
  % LocalWords:  Raphaël Couturier Arnaud Giersch Abderrahmane Sider Franche ij
  % LocalWords:  Bertsekas Tsitsiklis SimGrid DASUD Comté Béjaïa asynchronism ji
-% LocalWords:  ik
+% LocalWords:  ik isend irecv