1 \documentclass[smallextended]{svjour3}
2 \usepackage[utf8]{inputenc}
3 \usepackage[T1]{fontenc}
10 \title{Best effort strategy and virtual load for asynchronous iterative load balancing}
12 \author{Raphaël Couturier \and
17 \institute{F. Author \at
19 Tel.: +123-45-678910\\
21 \email{fauthor@example.com} % \\
22 % \emph{Present address:} of F. Author % if needed
33 Most of the time, asynchronous load balancing algorithms have extensively been
34 studied in a theoretical point of view. The Bertsekas and Tsitsiklis'
35 algorithm~\cite[section~7.4]{bertsekas+tsitsiklis.1997.parallel}
36 is certainly the most well known algorithm for which the convergence proof is
37 given. From a practical point of view, when a node wants to balance a part of
38 its load to some of its neighbors, the strategy is not described. In this
39 paper, we propose a strategy called \texttt{best effort} which tries to balance
40 the load of a node to all its less loaded neighbors while ensuring that all the
41 nodes concerned by the load balancing phase have the same amount of load.
42 Moreover, asynchronous iterative algorithms in which an asynchronous load
43 balancing algorithm is implemented most of the time can dissociate messages
44 concerning load transfers and message concerning load information. In order to
45 increase the converge of a load balancing algorithm, we propose a simple
46 heuristic called \texttt{virtual load} which allows a node that receives an load
47 information message to integrate the load that it will receive later in its
48 load (virtually) and consequently sends a (real) part of its load to some of its
49 neighbors. In order to validate our approaches, we have defined a simulator
50 based on SimGrid which allowed us to conduct many experiments.
55 \section{Introduction}
57 Load balancing algorithms are extensively used in parallel and distributed
58 applications in order to reduce the execution times. They can be applied in
59 different scientific fields from high performance computation to micro sensor
60 networks. They are iterative by nature. In literature many kinds of load
61 balancing algorithms have been studied. They can be classified according
62 different criteria: centralized or decentralized, in static or dynamic
63 environment, with homogeneous or heterogeneous load, using synchronous or
64 asynchronous iterations, with a static topology or a dynamic one which evolves
65 during time. In this work, we focus on asynchronous load balancing algorithms
66 where computer nodes are considered homogeneous and with homogeneous load with
67 no external load. In this context, Bertsekas and Tsitsiklis have proposed an
68 algorithm which is definitively a reference for many works. In their work, they
69 proved that under classical hypotheses of asynchronous iterative algorithms and
70 a special constraint avoiding \texttt{ping-pong} effect, an asynchronous
71 iterative algorithm converge to the uniform load distribution. This work has
72 been extended by many authors. For example,
73 DASUD~\cite{cortes+ripoll+cedo+al.2002.asynchronous} propose a version working
74 with integer load. {\bf Rajouter des choses ici}.
76 Although the Bertsekas and Tsitsiklis' algorithm describes the condition to
77 ensure the convergence, there is no indication or strategy to really implement
78 the load distribution. In other word, a node can send a part of its load to one
79 or many of its neighbors while all the convergence conditions are
80 followed. Consequently, we propose a new strategy called \texttt{best effort}
81 that tries to balance the load of a node to all its less loaded neighbors while
82 ensuring that all the nodes concerned by the load balancing phase have the same
83 amount of load. Moreover, when real asynchronous applications are considered,
84 using asynchronous load balancing algorithms can reduce the execution
85 times. Most of the times, it is simpler to distinguish load information messages
86 from data migration messages. Formers ones allows a node to inform its
87 neighbors of its current load. These messages are very small, they can be sent
88 quite often. For example, if an computing iteration takes a significant times
89 (ranging from seconds to minutes), it is possible to send a new load information
90 message at each neighbor at each iteration. Latter messages contains data that
91 migrates from one node to another one. Depending on the application, it may have
92 sense or not that nodes try to balance a part of their load at each computing
93 iteration. But the time to transfer a load message from a node to another one is
94 often much nore longer that to time to transfer a load information message. So,
95 when a node receives the information that later it will receive a data message,
96 it can take this information into account and it can consider that its new load
97 is larger. Consequently, it can send a part of it real load to some of its
98 neighbors if required. We call this trick the \texttt{virtual load} mecanism.
102 So, in this work, we propose a new strategy for improving the distribution of
103 the load and a simple but efficient trick that also improves the load
104 balacing. Moreover, we have conducted many simulations with simgrid in order to
105 validate our improvements are really efficient. Our simulations consider that in
106 order to send a message, a latency delays the sending and according to the
107 network performance and the message size, the time of the reception of the
110 In the following of this paper, Section~\ref{BT algo} describes the Bertsekas
111 and Tsitsiklis' asynchronous load balancing algorithm. Moreover, we present a
112 possible problem in the convergence conditions. Section~\ref{Best-effort}
113 presents the best effort strategy which provides an efficient way to reduce the
114 execution times. In Section~\ref{Virtual load}, the virtual load mecanism is
115 proposed. Simulations allowed to show that both our approaches are valid using a
116 quite realistic model detailed in Section~\ref{Simulations}. Finally we give a
117 conclusion and some perspectives to this work.
122 \section{Bertsekas and Tsitsiklis' asynchronous load balancing algorithm}
125 In order prove the convergence of asynchronous iterative load balancing
126 Bertesekas and Tsitsiklis proposed a model
127 in~\cite{bertsekas+tsitsiklis.1997.parallel}. Here we recall some notations.
128 Consider that $N={1,...,n}$ processors are connected through a network.
129 Communication links are represented by a connected undirected graph $G=(N,V)$
130 where $V$ is the set of links connecting differents processors. In this work, we
131 consider that processors are homogeneous for sake of simplicity. It is quite
132 easy to tackle the heterogeneous case~\cite{ElsMonPre02}. Load of processor $i$
133 at time $t$ is represented by $x_i(t)\geq 0$. Let $V(i)$ be the set of
134 neighbors of processor $i$. Each processor $i$ has an estimate of the load of
135 each of its neighbors $j \in V(i)$ represented by $x_j^i(t)$. According to
136 asynchronism and communication delays, this estimate may be outdated. We also
137 consider that the load is described by a continuous variable.
139 When a processor send a part of its load to one or some of its neighbors, the
140 transfer takes time to be completed. Let $s_{ij}(t)$ be the amount of load that
141 processor $i$ has transfered to processor $j$ at time $t$ and let $r_{ij}(t)$ be the
142 amount of load received by processor $j$ from processor $i$ at time $t$. Then
143 the amount of load of processor $i$ at time $t+1$ is given by:
145 x_i(t+1)=x_i(t)-\sum_{j\in V(i)} s_{ij}(t) + \sum_{j\in V(i)} r_{ji}(t)
149 \section{Best effort strategy}
154 \section{Virtual load}
157 \section{Simulations}
160 In order to test and validate our approaches, we wrote a simulator
162 framework~\cite{casanova+legrand+quinson.2008.simgrid}. The process
163 model is detailed in the next section (\ref{Sim model}), then the
164 results of the simulations are presented in section~\ref{Results}.
166 \subsection{Simulation model}
169 \subsection{Validation of our approaches}
173 On veut montrer quoi ? :
175 1) best plus rapide que les autres (simple, makhoul)
176 2) avantage virtual load
178 Est ce qu'on peut trouver des contre exemple?
182 Simulation avec temps définies assez long et on mesure la qualité avec : volume de calcul effectué, volume de données échangées
183 Mais aussi simulation avec temps court qui montre que seul best converge
186 Expés avec ratio calcul/comm rapide et lent
188 Quelques expés avec charge initiale aléatoire plutot que sur le premier proc
190 Cadre processeurs homogènes
194 On ne tient pas compte de la vitesse des liens donc on la considère homogène
196 Prendre un réseau hétérogène et rendre processeur homogène
198 Taille : 10 100 très gros
200 \section{Conclusion and perspectives}
203 \bibliographystyle{spmpsci}
204 \bibliography{biblio}
211 %%% ispell-local-dictionary: "american"
214 % LocalWords: Raphaël Couturier Arnaud Giersch Abderrahmane Sider
215 % LocalWords: Bertsekas Tsitsiklis SimGrid DASUD