\documentclass[smallextended]{svjour3}
+\usepackage[utf8]{inputenc}
+\usepackage[T1]{fontenc}
+\usepackage{mathptmx}
+\usepackage{courier}
\usepackage{graphicx}
\begin{document}
\begin{abstract}
Most of the time, asynchronous load balancing algorithms have extensively been
-studied in a theoretical point of view. The Bertsekas and Tsitsiklis' algorithm
+studied in a theoretical point of view. The Bertsekas and Tsitsiklis'
+algorithm~\cite[section~7.4]{bertsekas+tsitsiklis.1997.parallel}
is certainly the most well known algorithm for which the convergence proof is
given. From a practical point of view, when a node wants to balance a part of
its load to some of its neighbors, the strategy is not described. In this
concerning load transfers and message concerning load information. In order to
increase the converge of a load balancing algorithm, we propose a simple
heuristic called \texttt{virtual load} which allows a node that receives an load
-information message to integrate the load that it will receive latter in its
+information message to integrate the load that it will receive later in its
load (virtually) and consequently sends a (real) part of its load to some of its
neighbors. In order to validate our approaches, we have defined a simulator
based on SimGrid which allowed us to conduct many experiments.
proved that under classical hypotheses of asynchronous iterative algorithms and
a special constraint avoiding \texttt{ping-pong} effect, an asynchronous
iterative algorithm converge to the uniform load distribution. This work has
-been extended by many authors. For example, DASUD proposes a version working with
-integer load. {\bf Rajouter des choses ici}.
+been extended by many authors. For example,
+DASUD~\cite{cortes+ripoll+cedo+al.2002.asynchronous} propose a version working
+with integer load. {\bf Rajouter des choses ici}.
Although the Bertsekas and Tsitsiklis' algorithm describes the condition to
ensure the convergence, there is no indication or strategy to really implement
\section{Bertsekas and Tsitsiklis' asynchronous load balancing algorithm}
\label{BT algo}
-Comment on the problem in the convergence condition.
+In order prove the convergence of asynchronous iterative load balancing
+Bertesekas and Tsitsiklis proposed a model
+in~\cite{bertsekas+tsitsiklis.1997.parallel}. Here we recall some notations.
+Consider that $N={1,...,n}$ processors are connected through a network.
+Communication links are represented by a connected undirected graph $G=(N,V)$
+where $V$ is the set of links connecting differents processors. In this work, we
+consider that processors are homogeneous for sake of simplicity. It is quite
+easy to tackle the heterogeneous case~\cite{ElsMonPre02}. Load of processor $i$
+at time $t$ is represented by $x_i(t)\geq 0$. Let $V(i)$ be the set of
+neighbors of processor $i$. Each processor $i$ has an estimate of the load of
+each of its neighbors $j \in V(i)$ represented by $x_j^i(t)$. According to
+asynchronism and communication delays, this estimate may be outdated. We also
+consider that the load is described by a continuous variable.
+
+When a processor send a part of its load to one or some of its neighbors, the
+transfer takes time to be completed. Let $s_{ij}(t)$ be the amount of load that
+processor $i$ has transfered to processor $j$ at time $t$ and let $r_{ij}(t)$ be the
+amount of load received by processor $j$ from processor $i$ at time $t$. Then
+the amount of load of processor $i$ at time $t+1$ is given by:
+\begin{equation}
+x_i(t+1)=x_i(t)-\sum_{j\in V(i)} s_{ij}(t) + \sum_{j\in V(i)} r_{ji}(t)
+\label{eq:ping-pong}
+\end{equation}
+
+
+Some conditions are required to ensure the convergence. One of them can be
+called the \texttt{ping-pong} condition which specifies that:
+\begin{equation}
+x_i(t)-\sum _{k\in V(i)} s_{ik}(t) \geq x_j^i(t)+s_{ij}(t)
+\end{equation}
+for any processor $i$ and any $j \in V(i)$ such that $x_i(t)>x_j^i(t)$. This
+condition aims at avoiding a processor to send a part of its load and beeing
+less loaded after that.
+
+Nevertheless, we think that this condition may lead to deadlocks in some
+cases. For example, if we consider only three processors and that processor $1$
+is linked to processor $2$ which is also linked to processor $3$ (i.e. a simple
+chain wich 3 processors). Now consider we have the following values at time $t$:
+\begin{eqnarray*}
+x_1(t)=10 \\
+x_2(t)=100 \\
+x_3(t)=99.99\\
+ x_3^2(t)=99.99\\
+\end{eqnarray*}
+In this case, processor $2$ can either sends load to processor $1$ or processor
+$3$. If it sends load to processor $1$ it will not satisfy condition
+(\ref{eq:ping-pong}) because after the sending it will be less loaded that
+$x_3^2(t)$. So we consider that the \texttt{ping-pong} condition is probably to
+strong. Currently, we did not try to make another convergence proof without this
+condition or with a weaker condition.
+
\section{Best effort strategy}
\label{Best-effort}
+\textbf{À traduire} Ordonne les voisins du moins chargé au plus chargé.
+Trouve ensuite, en les prenant dans ce ordre, le nombre maximal de
+voisins tels que tous ont une charge inférieure à la moyenne des
+charges des voisins sélectionnés, et de soi-même.
+
+Les transferts de charge sont ensuite fait en visant cette moyenne pour
+tous les voisins sélectionnés. On envoie une quantité de charge égale
+à (moyenne - charge\_du\_voisin).
+
+~\\\textbf{Question} faut-il décrire les stratégies makhoul et simple ?
+
+\paragraph{simple} Tentative de respecter simplement les conditions de Bertsekas.
+Parmi les voisins moins chargés que soi, on sélectionne :
+\begin{itemize}
+\item un des moins chargés (vmin) ;
+\item un des plus chargés (vmax),
+\end{itemize}
+puis on équilibre avec vmin en s'assurant que notre charge reste
+toujours supérieure à celle de vmin et à celle de vmax.
+
+On envoie donc (avec "self" pour soi-même) :
+\[
+ \min\left(\frac{load(self) - load(vmin)}{2}, load(self) - load(vmax)\right)
+\]
+\paragraph{makhoul} Ordonne les voisins du moins chargé au plus chargé
+puis calcule les différences de charge entre soi-même et chacun des
+voisins.
+
+Ensuite, pour chaque voisin, dans l'ordre, et tant qu'on reste plus
+chargé que le voisin en question, on lui envoie 1/(N+1) de la
+différence calculée au départ, avec N le nombre de voisins.
+
+C'est l'algorithme~2 dans~\cite{bahi+giersch+makhoul.2008.scalable}.
\section{Virtual load}
\label{Virtual load}
\section{Simulations}
\label{Simulations}
+In order to test and validate our approaches, we wrote a simulator
+using the SimGrid
+framework~\cite{casanova+legrand+quinson.2008.simgrid}. The process
+model is detailed in the next section (\ref{Sim model}), then the
+results of the simulations are presented in section~\ref{Results}.
+
\subsection{Simulation model}
+\label{Sim model}
+
+\begin{verbatim}
+Communications
+==============
+
+There are two receiving channels per host: control for information
+messages, and data for load transfers.
+
+Process model
+=============
+
+Each process is made of 3 threads: a receiver thread, a computing
+thread, and a load-balancer thread.
+
+* Receiver thread
+ ---------------
+
+ Loop
+ | wait for a message to come, either on data channel, or on ctrl channel
+ | push received message in a buffer of received messages
+ | -> ctrl messages on the one side
+ | -> data messages on the other side
+ +-
+
+ The loop terminates when a "finalize" message is received on each
+ channel.
+
+* Computing thread
+ ----------------
+
+ Loop
+ | if we received some real load, get it (data messages)
+ | if there is some real load to send, send it
+ | if we own some load, simulate some computing on it
+ | sleep a bit if we are looping too fast
+ +-
+ send CLOSE on data for all neighbors
+ wait for CLOSE on data from all neighbors
+
+ The loop terminates when process::still_running() returns false.
+ (read the source for full details...)
+
+* Load-balancing thread
+ ---------------------
+
+ Loop
+ | call load-balancing algorithm
+ | send ctrl messages
+ | sleep (min_lb_iter_duration)
+ | receive ctrl messages
+ +-
+ send CLOSE on ctrl for all neighbors
+ wait for CLOSE on ctrl from all neighbors
+
+ The loop terminates when process::still_running() returns false.
+ (read the source for full details...)
+\end{verbatim}
\subsection{Validation of our approaches}
+\label{Results}
+
+
+On veut montrer quoi ? :
+
+1) best plus rapide que les autres (simple, makhoul)
+2) avantage virtual load
+
+Est ce qu'on peut trouver des contre exemple?
+Topologies variées
+
+
+Simulation avec temps définies assez long et on mesure la qualité avec : volume de calcul effectué, volume de données échangées
+Mais aussi simulation avec temps court qui montre que seul best converge
+Expés avec ratio calcul/comm rapide et lent
+
+Quelques expés avec charge initiale aléatoire plutot que sur le premier proc
+
+Cadre processeurs homogènes
+
+Topologies statiques
+
+On ne tient pas compte de la vitesse des liens donc on la considère homogène
+
+Prendre un réseau hétérogène et rendre processeur homogène
+
+Taille : 10 100 très gros
+
\section{Conclusion and perspectives}
+\bibliographystyle{spmpsci}
+\bibliography{biblio}
\end{document}
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: t
+%%% ispell-local-dictionary: "american"
+%%% End:
+
+% LocalWords: Raphaël Couturier Arnaud Giersch Abderrahmane Sider
+% LocalWords: Bertsekas Tsitsiklis SimGrid DASUD