\usepackage{amsmath}
\usepackage{courier}
\usepackage{graphicx}
+\usepackage[ruled,lined]{algorithm2e}
\newcommand{\abs}[1]{\lvert#1\rvert} % \abs{x} -> |x|
+\newenvironment{algodata}{%
+ \begin{tabular}[t]{@{}l@{:~}l@{}}}{%
+ \end{tabular}}
+
+\newcommand{\VAR}[1]{\textit{#1}}
+
\begin{document}
\title{Best effort strategy and virtual load
for asynchronous iterative load balancing}
\author{Raphaël Couturier \and
- Arnaud Giersch \and
- Abderrahmane Sider
+ Arnaud Giersch
}
\institute{R. Couturier \and A. Giersch \at
\email{%
raphael.couturier@univ-fcomte.fr,
arnaud.giersch@univ-fcomte.fr}
- \and
- A. Sider \at
- University of Béjaïa, Béjaïa, Algeria \\
- \email{ar.sider@univ-bejaia.dz}
}
\maketitle
balancing algorithm is implemented most of the time can dissociate messages
concerning load transfers and messages concerning load information. In order to
improve the convergence of a load balancing algorithm, we propose a simple
-heuristic called \emph{virtual load} which allows a node that receives an load
+heuristic called \emph{virtual load} which allows a node that receives a load
information message to integrate the load that it will receive later in its
load (virtually) and consequently sends a (real) part of its load to some of its
neighbors. In order to validate our approaches, we have defined a simulator
proved that under classical hypotheses of asynchronous iterative algorithms and
a special constraint avoiding \emph{ping-pong} effect, an asynchronous
iterative algorithm converges to the uniform load distribution. This work has
-been extended by many authors. For example,
-DASUD~\cite{cortes+ripoll+cedo+al.2002.asynchronous} propose a version working
-with integer load. {\bf Rajouter des choses ici}.
+been extended by many authors. For example, Cortés et al., with
+DASUD~\cite{cortes+ripoll+cedo+al.2002.asynchronous}, propose a
+version working with integer load. This work was later generalized by
+the same authors in \cite{cedo+cortes+ripoll+al.2007.convergence}.
+{\bf Add more material here}.
Although the Bertsekas and Tsitsiklis' algorithm describes the condition to
ensure the convergence, there is no indication or strategy to really implement
\section{Best effort strategy}
\label{Best-effort}
-We will describe here a new load-balancing strategy that we called
-\emph{best effort}. The general idea behind this strategy is, for a
-processor, to send some load to the most of its neighbors, doing its
+In this section we describe a new load-balancing strategy that we call
+\emph{best effort}. The general idea behind this strategy is that each
+processor that detects it has more load than some of its neighbors
+sends some load to most of its less loaded neighbors, doing its
best to reach the equilibrium between those neighbors and itself.
-More precisely, when a processors $i$ is in its load-balancing phase,
+More precisely, when a processor $i$ is in its load-balancing phase,
it proceeds as follows.
\begin{enumerate}
\item First, the neighbors are sorted in non-decreasing order of their
\section{Other strategies}
\label{Other}
-\textbf{Question} faut-il décrire les stratégies makhoul et simple ?
+% \textbf{Question} faut-il décrire les stratégies makhoul et simple ?
-\paragraph{simple} Tentative de respecter simplement les conditions de Bertsekas.
-Parmi les voisins moins chargés que soi, on sélectionne :
-\begin{itemize}
-\item un des moins chargés (vmin) ;
-\item un des plus chargés (vmax),
-\end{itemize}
-puis on équilibre avec vmin en s'assurant que notre charge reste
-toujours supérieure à celle de vmin et à celle de vmax.
+% \paragraph{simple} Tentative de respecter simplement les conditions de Bertsekas.
+% Parmi les voisins moins chargés que soi, on sélectionne :
+% \begin{itemize}
+% \item un des moins chargés (vmin) ;
+% \item un des plus chargés (vmax),
+% \end{itemize}
+% puis on équilibre avec vmin en s'assurant que notre charge reste
+% toujours supérieure à celle de vmin et à celle de vmax.
-On envoie donc (avec "self" pour soi-même) :
-\[
- \min\left(\frac{load(self) - load(vmin)}{2}, load(self) - load(vmax)\right)
-\]
+% On envoie donc (avec "self" pour soi-même) :
+% \[
+% \min\left(\frac{load(self) - load(vmin)}{2}, load(self) - load(vmax)\right)
+% \]
\paragraph{makhoul} Sorts the neighbors from the least loaded to the most loaded,
then computes the load differences between itself and each of the
\section{Virtual load}
\label{Virtual load}
+In this section, we present the concept of \texttt{virtual load}. In order to
+use this concept, load transfers must be carried out using two different kinds
+of messages: load information messages and load balancing messages. More
+precisely, a node wanting to send a part of its load to one of its neighbors
+can first send a load information message containing the amount of load it will
+send, and then send the load balancing message containing the data to be
+transferred. Load information messages are very short, so they are received
+very quickly. In contrast, load balancing messages are often bigger and thus
+require more time to be transferred.
+
+The concept of \texttt{virtual load} allows a node that receives a load
+information message to integrate the load that it will receive later in its load
+(virtually) and consequently send a (real) part of its load to some of its
+neighbors. In fact, a node that receives a load information message knows that
+it will later receive the corresponding load balancing message containing the
+announced data. So if this node detects it is too loaded compared to some
+of its neighbors and if it has enough load (real load), then it can send more
+load to some of its neighbors without waiting for the reception of the load
+balancing message.
+
+This way, we can expect a faster convergence, since nodes learn sooner about
+the load they will receive and can take it into account.
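+
+As an illustration, consider the following hypothetical situation (the
+numbers, and the rule of halving the load difference, are assumptions
+made only for this example; they are not necessarily the rule applied
+by the strategies described above). A node $i$ holds a real load of
+$10$ and has just received a load information message announcing $6$
+more units, while its neighbor $j$ holds a load of $4$. Node $i$ then
+compares
+\[
+  \underbrace{10 + 6}_{\text{virtual load of } i} = 16
+  \quad\text{with}\quad 4,
+  \qquad\text{and may send}\qquad
+  \frac{16 - 4}{2} = 6 \le 10
+\]
+units right away, since this amount does not exceed its real load.
+Once the announced data message arrives, both nodes hold $10$ units.
+Without virtual load, $i$ would send only $(10 - 4)/2 = 3$ units, and a
+further balancing round would be needed after the $6$ announced units
+arrive.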
+
+\textbf{Question} Do we give the algorithm with virtual load?
+
\section{Simulations}
\label{Simulations}
In order to test and validate our approaches, we wrote a simulator
using the SimGrid
-framework~\cite{casanova+legrand+quinson.2008.simgrid}. The process
-model is detailed in the next section (\ref{Sim model}), then the
-results of the simulations are presented in section~\ref{Results}.
+framework~\cite{casanova+legrand+quinson.2008.simgrid}. This
+simulator, which consists of about 2,700 lines of C++, makes it possible
+to run the different load-balancing strategies under various parameters,
+such as the initial distribution of load, the interconnection topology,
+the characteristics of the running platform, etc. Several metrics are
+then reported, which make it possible to compare the strategies.
+
+The simulation model is detailed in the next section (\ref{Sim
+ model}), and the experimental contexts are described in
+section~\ref{Contexts}. Then the results of the simulations are
+presented in section~\ref{Results}.
\subsection{Simulation model}
\label{Sim model}
-\begin{verbatim}
-Communications
-==============
-
-There are two receiving channels per host: control for information
-messages, and data for load transfers.
-
-Process model
-=============
-
-Each process is made of 3 threads: a receiver thread, a computing
-thread, and a load-balancer thread.
-
-* Receiver thread
- ---------------
-
- Loop
- | wait for a message to come, either on data channel, or on ctrl channel
- | push received message in a buffer of received messages
- | -> ctrl messages on the one side
- | -> data messages on the other side
- +-
-
- The loop terminates when a "finalize" message is received on each
- channel.
-
-* Computing thread
- ----------------
-
- Loop
- | if we received some real load, get it (data messages)
- | if there is some real load to send, send it
- | if we own some load, simulate some computing on it
- | sleep a bit if we are looping too fast
- +-
- send CLOSE on data for all neighbors
- wait for CLOSE on data from all neighbors
-
- The loop terminates when process::still_running() returns false.
- (read the source for full details...)
-
-* Load-balancing thread
- ---------------------
-
- Loop
- | call load-balancing algorithm
- | send ctrl messages
- | sleep (min_lb_iter_duration)
- | receive ctrl messages
- +-
- send CLOSE on ctrl for all neighbors
- wait for CLOSE on ctrl from all neighbors
+In the simulation model the processors exchange messages which are of
+two kinds. First, there are \emph{control messages} which only carry
+information that is exchanged between the processors, such as the
+current load, or the virtual load transfers if this option is
+selected. These messages are rather small, and their size is
+constant. Then, there are \emph{data messages} that carry the real
+load transferred between the processors. The size of a data message
+is a function of the amount of load that it carries, and it can be
+quite large. In order to receive the messages, each processor has
+two receiving channels, one for each kind of message. Finally,
+messages are sent and received using the non-blocking primitives of
+SimGrid\footnote{Namely \texttt{MSG\_task\_isend()} and
+ \texttt{MSG\_task\_irecv()}.}.
+
+During the simulation, each processor concurrently runs three threads:
+a \emph{receiving thread}, a \emph{computing thread}, and a
+\emph{load-balancing thread}, which we will briefly describe now.
+
+\paragraph{Receiving thread} The receiving thread is in charge of
+waiting for messages to come, either on the control channel, or on the
+data channel. Its behavior is sketched by Algorithm~\ref{algo.recv}.
+When a message is received, it is pushed into a buffer of
+received messages, to be later consumed by one of the other threads.
+There are two such buffers, one for the control messages, and one for
+the data messages. The buffers are implemented with a lock-free FIFO
+\cite{sutter.2008.writing} to avoid contention between the threads.
+
+\begin{algorithm}
+ \caption{Receiving thread}
+ \label{algo.recv}
+ \KwData{
+ \begin{algodata}
+ \VAR{ctrl\_chan}, \VAR{data\_chan}
+ & communication channels (control and data) \\
+ \VAR{ctrl\_fifo}, \VAR{data\_fifo}
+ & buffers of received messages (control and data) \\
+ \end{algodata}}
+ \While{true}{%
+ wait for a message to be available on either \VAR{ctrl\_chan},
+ or \VAR{data\_chan}\;
+ \If{a message is available on \VAR{ctrl\_chan}}{%
+ get the message from \VAR{ctrl\_chan}, and push it into \VAR{ctrl\_fifo}\;
+ }
+ \If{a message is available on \VAR{data\_chan}}{%
+ get the message from \VAR{data\_chan}, and push it into \VAR{data\_fifo}\;
+ }
+ }
+\end{algorithm}
+
+\paragraph{Computing thread} The computing thread is in charge of the
+real load management. As shown in Algorithm~\ref{algo.comp}, it
+iteratively runs the following operations:
+\begin{itemize}
+\item if some load was received from the neighbors, get it;
+\item if there is some load to send to the neighbors, send it;
+\item run some computation, whose duration is a function of the current
+ load of the processor.
+\end{itemize}
+In practice, after the computation, the computing thread waits for a
+small amount of time if the iterations are looping too fast (for
+example, when the current load is near zero).
+
+\begin{algorithm}
+ \caption{Computing thread}
+ \label{algo.comp}
+ \KwData{
+ \begin{algodata}
+ \VAR{data\_fifo} & buffer of received data messages \\
+ \VAR{real\_load} & current load \\
+ \end{algodata}}
+ \While{true}{%
+ \If{\VAR{data\_fifo} is empty and $\VAR{real\_load} = 0$}{%
+ wait until a message is pushed into \VAR{data\_fifo}\;
+ }
+ \While{\VAR{data\_fifo} is not empty}{%
+ pop a message from \VAR{data\_fifo}\;
+ get the load embedded in the message, and add it to \VAR{real\_load}\;
+ }
+ \ForEach{neighbor $n$}{%
+ \If{there is some amount of load $a$ to send to $n$}{%
+ send $a$ units of load to $n$, and subtract it from \VAR{real\_load}\;
+ }
+ }
+ \If{$\VAR{real\_load} > 0$}{%
+ simulate some computation, whose duration is a function of \VAR{real\_load}\;
+ ensure that the main loop does not iterate too fast\;
+ }
+ }
+\end{algorithm}
+
+\paragraph{Load-balancing thread} The load-balancing thread is in
+charge of running the load-balancing algorithm, and of exchanging the
+control messages. As shown in Algorithm~\ref{algo.lb}, it iteratively
+runs the following operations:
+\begin{itemize}
+\item get the control messages that were received from the neighbors;
+\item run the load-balancing algorithm;
+\item send control messages to the neighbors, to inform them of the
+ processor's current load, and possibly of virtual load transfers;
+\item wait a minimum (configurable) amount of time, to avoid
+ iterating too fast.
+\end{itemize}
- The loop terminates when process::still_running() returns false.
- (read the source for full details...)
-\end{verbatim}
+\begin{algorithm}
+ \caption{Load-balancing thread}
+ \label{algo.lb}
+ \While{true}{%
+ \While{\VAR{ctrl\_fifo} is not empty}{%
+ pop a message from \VAR{ctrl\_fifo}\;
+ identify the sender of the message,
+ and update the current knowledge of its load\;
+ }
+ run the load-balancing algorithm to make the decision about load transfers\;
+ \ForEach{neighbor $n$}{%
+ send a control message to $n$\;
+ }
+ ensure that the main loop does not iterate too fast\;
+ }
+\end{algorithm}
+
+\paragraph{}
+For the sake of simplicity, a few details were deliberately omitted from
+these descriptions. For an exhaustive presentation, we refer to the
+actual code that was used for the experiments, which is
+available at \textbf{FIXME URL}.
+
+\textbf{FIXME: add details about virtual load management?}
+
+\subsection{Experimental contexts}
+\label{Contexts}
+
+\paragraph{Configurations}
+\begin{description}
+\item[\textbf{platforms}] homogeneous (cluster); heterogeneous (subset
+ of Grid5000)
+\item[\textbf{platform size}] platforms with 16, 64, 256, and 1024 nodes
+\item[\textbf{topologies}] line; torus; hypercube
+\item[\textbf{initial load distribution}] initially on a single node;
+ initially on all nodes
+\item[\textbf{comp/comm ratio}] $10/1$, $1/1$, $1/10$
+\end{description}
+
+\paragraph{Algorithms}
+\begin{description}
+\item[\textbf{strategies}] makhoul; besteffort with $k\in \{1,2,4\}$
+\item[\textbf{variants}] with, and without virtual load (bookkeeping)
+\item[\textbf{domain}] real load, and integer load
+\end{description}
+
+\paragraph{Metrics}
+
+\begin{description}
+\item[\textbf{average idle time}]
+\item[\textbf{average convergence date}]
+\item[\textbf{maximum convergence date}]
+\item[\textbf{data transfer amount}] relative to the total data amount
+\end{description}
\subsection{Validation of our approaches}
\label{Results}
% LocalWords: Raphaël Couturier Arnaud Giersch Abderrahmane Sider Franche ij
% LocalWords: Bertsekas Tsitsiklis SimGrid DASUD Comté Béjaïa asynchronism ji
-% LocalWords: ik
+% LocalWords: ik isend irecv Cortés et al chan ctrl fifo