\documentclass[smallextended]{svjour3}
+\usepackage[utf8]{inputenc}
+\usepackage[T1]{fontenc}
+\usepackage{mathptmx}
+\usepackage{courier}
\usepackage{graphicx}
\begin{document}
\begin{abstract}
Most of the time, asynchronous load balancing algorithms have extensively been
-studied in a theoretical point of view. The Bertsekas' algorithm is certainly
-the most well known algorithm for which the convergence proof is given. From a
-practical point of view, when a node wants to balance a part of its load to some
-of its neighbors, the strategy is not described. In this paper, we propose a
-strategy called \texttt{best effort} which tries to balance the load of a node
-to all its less loaded neighbors while ensuring that all the nodes concerned by
-the load balancing phase have the same amount of load. Moreover, asynchronous
-iterative algorithms in which an asynchronous load balancing algorithm is
-implemented most of the time can dissociate messages concerning load transfers
-and message concerning load information. In order to increase the converge of a
-load balancing algorithm, we propose a simple heuristic called \texttt{virtual
- load} which allows a node that receives an load information message to
-integrate the load that it will receive latter in its load (virtually) and
-consequently sends a (real) part of its load to some of its neighbors. In order
-to validate our approaches, we have defined a simulator based on SimGrid which
-allowed us to conduct many experiments.
+studied in a theoretical point of view. The Bertsekas and Tsitsiklis'
+algorithm~\cite[section~7.4]{bertsekas+tsitsiklis.1997.parallel}
+is certainly the most well known algorithm for which the convergence proof is
+given. From a practical point of view, when a node wants to balance a part of
+its load to some of its neighbors, the strategy is not described. In this
+paper, we propose a strategy called \texttt{best effort} which tries to balance
+the load of a node to all its less loaded neighbors while ensuring that all the
+nodes concerned by the load balancing phase have the same amount of load.
+Moreover, asynchronous iterative algorithms in which an asynchronous load
+balancing algorithm is implemented most of the time can dissociate messages
+concerning load transfers and message concerning load information. In order to
+increase the converge of a load balancing algorithm, we propose a simple
+heuristic called \texttt{virtual load} which allows a node that receives an load
+information message to integrate the load that it will receive later in its
+load (virtually) and consequently sends a (real) part of its load to some of its
+neighbors. In order to validate our approaches, we have defined a simulator
+based on SimGrid which allowed us to conduct many experiments.
\end{abstract}
+\section{Introduction}
+
+Load balancing algorithms are extensively used in parallel and distributed
+applications in order to reduce the execution times. They can be applied in
+different scientific fields from high performance computation to micro sensor
+networks. They are iterative by nature. In literature many kinds of load
+balancing algorithms have been studied. They can be classified according
+different criteria: centralized or decentralized, in static or dynamic
+environment, with homogeneous or heterogeneous load, using synchronous or
+asynchronous iterations, with a static topology or a dynamic one which evolves
+during time. In this work, we focus on asynchronous load balancing algorithms
+where computer nodes are considered homogeneous and with homogeneous load with
+no external load. In this context, Bertsekas and Tsitsiklis have proposed an
+algorithm which is definitively a reference for many works. In their work, they
+proved that under classical hypotheses of asynchronous iterative algorithms and
+a special constraint avoiding \texttt{ping-pong} effect, an asynchronous
+iterative algorithm converge to the uniform load distribution. This work has
+been extended by many authors. For example,
+DASUD~\cite{cortes+ripoll+cedo+al.2002.asynchronous} propose a version working
+with integer load. {\bf Rajouter des choses ici}.
+
+Although the Bertsekas and Tsitsiklis' algorithm describes the condition to
+ensure the convergence, there is no indication or strategy to really implement
+the load distribution. In other word, a node can send a part of its load to one
+or many of its neighbors while all the convergence conditions are
+followed. Consequently, we propose a new strategy called \texttt{best effort}
+that tries to balance the load of a node to all its less loaded neighbors while
+ensuring that all the nodes concerned by the load balancing phase have the same
+amount of load. Moreover, when real asynchronous applications are considered,
+using asynchronous load balancing algorithms can reduce the execution
+times. Most of the times, it is simpler to distinguish load information messages
+from data migration messages. Formers ones allows a node to inform its
+neighbors of its current load. These messages are very small, they can be sent
+quite often. For example, if an computing iteration takes a significant times
+(ranging from seconds to minutes), it is possible to send a new load information
+message at each neighbor at each iteration. Latter messages contains data that
+migrates from one node to another one. Depending on the application, it may have
+sense or not that nodes try to balance a part of their load at each computing
+iteration. But the time to transfer a load message from a node to another one is
+often much nore longer that to time to transfer a load information message. So,
+when a node receives the information that later it will receive a data message,
+it can take this information into account and it can consider that its new load
+is larger. Consequently, it can send a part of it real load to some of its
+neighbors if required. We call this trick the \texttt{virtual load} mecanism.
+
+
+
+So, in this work, we propose a new strategy for improving the distribution of
+the load and a simple but efficient trick that also improves the load
+balacing. Moreover, we have conducted many simulations with simgrid in order to
+validate our improvements are really efficient. Our simulations consider that in
+order to send a message, a latency delays the sending and according to the
+network performance and the message size, the time of the reception of the
+message also varies.
+
+In the following of this paper, Section~\ref{BT algo} describes the Bertsekas
+and Tsitsiklis' asynchronous load balancing algorithm. Moreover, we present a
+possible problem in the convergence conditions. Section~\ref{Best-effort}
+presents the best effort strategy which provides an efficient way to reduce the
+execution times. In Section~\ref{Virtual load}, the virtual load mecanism is
+proposed. Simulations allowed to show that both our approaches are valid using a
+quite realistic model detailed in Section~\ref{Simulations}. Finally we give a
+conclusion and some perspectives to this work.
+
+
+
+
+\section{Bertsekas and Tsitsiklis' asynchronous load balancing algorithm}
+\label{BT algo}
+
+In order prove the convergence of asynchronous iterative load balancing
+Bertesekas and Tsitsiklis proposed a model
+in~\cite{bertsekas+tsitsiklis.1997.parallel}. Here we recall some notations.
+Consider that $N={1,...,n}$ processors are connected through a network.
+Communication links are represented by a connected undirected graph $G=(N,V)$
+where $V$ is the set of links connecting differents processors. In this work, we
+consider that processors are homogeneous for sake of simplicity. It is quite
+easy to tackle the heterogeneous case~\cite{ElsMonPre02}. Load of processor $i$
+at time $t$ is represented by $x_i(t)\geq 0$. Let $V(i)$ be the set of
+neighbors of processor $i$. Each processor $i$ has an estimate of the load of
+each of its neighbors $j \in V(i)$ represented by $x_j^i(t)$. According to
+asynchronism and communication delays, this estimate may be outdated. We also
+consider that the load is described by a continuous variable.
+
+When a processor send a part of its load to one or some of its neighbors, the
+transfer takes time to be completed. Let $s_{ij}(t)$ be the amount of load that
+processor $i$ has transfered to processor $j$ at time $t$ and let $r_{ij}(t)$ be the
+amount of load received by processor $j$ from processor $i$ at time $t$. Then
+the amount of load of processor $i$ at time $t+1$ is given by:
+\begin{equation}
+x_i(t+1)=x_i(t)-\sum_{j\in V(i)} s_{ij}(t) + \sum_{j\in V(i)} r_{ji}(t)
+\label{eq:ping-pong}
+\end{equation}
+
+
+Some conditions are required to ensure the convergence. One of them can be
+called the \texttt{ping-pong} condition which specifies that:
+\begin{equation}
+x_i(t)-\sum _{k\in V(i)} s_{ik}(t) \geq x_j^i(t)+s_{ij}(t)
+\end{equation}
+for any processor $i$ and any $j \in V(i)$ such that $x_i(t)>x_j^i(t)$. This
+condition aims at avoiding a processor to send a part of its load and beeing
+less loaded after that.
+
+Nevertheless, we think that this condition may lead to deadlocks in some
+cases. For example, if we consider only three processors and that processor $1$
+is linked to processor $2$ which is also linked to processor $3$ (i.e. a simple
+chain wich 3 processors). Now consider we have the following values at time $t$:
+\begin{eqnarray*}
+x_1(t)=10 \\
+x_2(t)=100 \\
+x_3(t)=99.99\\
+ x_3^2(t)=99.99\\
+\end{eqnarray*}
+In this case, processor $2$ can either sends load to processor $1$ or processor
+$3$. If it sends load to processor $1$ it will not satisfy condition
+(\ref{eq:ping-pong}) because after the sending it will be less loaded that
+$x_3^2(t)$. So we consider that the \texttt{ping-pong} condition is probably to
+strong. Currently, we did not try to make another convergence proof without this
+condition or with a weaker condition.
+\section{Best effort strategy}
+\label{Best-effort}
-qsdqsd
+\section{Virtual load}
+\label{Virtual load}
+\section{Simulations}
+\label{Simulations}
+
+In order to test and validate our approaches, we wrote a simulator
+using the SimGrid
+framework~\cite{casanova+legrand+quinson.2008.simgrid}. The process
+model is detailed in the next section (\ref{Sim model}), then the
+results of the simulations are presented in section~\ref{Results}.
+
+\subsection{Simulation model}
+\label{Sim model}
+
+\subsection{Validation of our approaches}
+\label{Results}
+
+
+On veut montrer quoi ? :
+
+1) best plus rapide que les autres (simple, makhoul)
+2) avantage virtual load
+
+Est ce qu'on peut trouver des contre exemple?
+Topologies variées
+
+
+Simulation avec temps définies assez long et on mesure la qualité avec : volume de calcul effectué, volume de données échangées
+Mais aussi simulation avec temps court qui montre que seul best converge
+
+
+Expés avec ratio calcul/comm rapide et lent
+
+Quelques expés avec charge initiale aléatoire plutot que sur le premier proc
+
+Cadre processeurs homogènes
+
+Topologies statiques
+
+On ne tient pas compte de la vitesse des liens donc on la considère homogène
+
+Prendre un réseau hétérogène et rendre processeur homogène
+
+Taille : 10 100 très gros
+
+\section{Conclusion and perspectives}
+
+
+\bibliographystyle{spmpsci}
+\bibliography{biblio}
\end{document}
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: t
+%%% ispell-local-dictionary: "american"
+%%% End:
+
+% LocalWords: Raphaël Couturier Arnaud Giersch Abderrahmane Sider
+% LocalWords: Bertsekas Tsitsiklis SimGrid DASUD