2 \documentclass[smallextended]{svjour3}
3 \usepackage[utf8]{inputenc}
4 \usepackage[T1]{fontenc}
11 \title{Best effort strategy and virtual load for asynchronous iterative load balancing}
13 \author{Raphaël Couturier \and
18 \institute{F. Author \at
20 Tel.: +123-45-678910\\
22 \email{fauthor@example.com} % \\
23 % \emph{Present address:} of F. Author % if needed
Most of the time, asynchronous load balancing algorithms have been studied
from a theoretical point of view. The Bertsekas and Tsitsiklis
algorithm~\cite[section~7.4]{bertsekas+tsitsiklis.1997.parallel}
is certainly the best known algorithm for which a convergence proof is
given. From a practical point of view, however, when a node wants to balance a
part of its load to some of its neighbors, the strategy is not described. In
this paper, we propose a strategy called \texttt{best effort} which tries to
balance the load of a node towards all its less loaded neighbors while ensuring
that all the nodes involved in the load balancing phase end up with the same
amount of load. Moreover, asynchronous iterative algorithms in which an
asynchronous load balancing algorithm is implemented can usually dissociate
messages concerning load transfers from messages concerning load
information. In order to speed up the convergence of a load balancing
algorithm, we propose a simple heuristic called \texttt{virtual load} which
allows a node that receives a load information message to (virtually) integrate
the load it will receive later into its current load, and consequently to send
a (real) part of its load to some of its neighbors. In order to validate our
approaches, we have built a simulator based on SimGrid which allowed us to
conduct many experiments.
56 \section{Introduction}
Load balancing algorithms are extensively used in parallel and distributed
applications in order to reduce execution times. They can be applied in
different scientific fields, from high performance computing to micro sensor
networks. They are iterative by nature. In the literature, many kinds of load
balancing algorithms have been studied. They can be classified according to
different criteria: centralized or decentralized, for static or dynamic
environments, with homogeneous or heterogeneous load, using synchronous or
asynchronous iterations, with a static topology or a dynamic one which evolves
over time. In this work, we focus on asynchronous load balancing algorithms
where computing nodes are considered homogeneous, with homogeneous load and
no external load. In this context, Bertsekas and Tsitsiklis have proposed an
algorithm which is definitely a reference for many works. They proved that,
under the classical hypotheses of asynchronous iterative algorithms and
a special constraint avoiding the \texttt{ping-pong} effect, an asynchronous
iterative algorithm converges to the uniform load distribution. This work has
been extended by many authors. For example,
DASUD~\cite{cortes+ripoll+cedo+al.2002.asynchronous} proposes a version working
with integer loads. {\bf Add more things here}.
Although the Bertsekas and Tsitsiklis algorithm describes the conditions that
ensure convergence, it gives no indication or strategy on how to actually
implement the load distribution. In other words, a node can send a part of its
load to one or many of its neighbors while all the convergence conditions are
still satisfied. Consequently, we propose a new strategy called \texttt{best
effort} that tries to balance the load of a node towards all its less loaded
neighbors while ensuring that all the nodes involved in the load balancing
phase end up with the same amount of load. Moreover, when real asynchronous
applications are considered, using asynchronous load balancing algorithms can
reduce execution times. Most of the time, it is simpler to distinguish load
information messages from data migration messages. The former allow a node to
inform its neighbors of its current load. These messages are very small, so
they can be sent quite often. For example, if a computing iteration takes a
significant time (ranging from seconds to minutes), it is possible to send a
new load information message to each neighbor at each iteration. The latter
messages contain data that migrates from one node to another. Depending on the
application, it may or may not make sense for nodes to try to balance a part of
their load at each computing iteration. Moreover, the time to transfer a data
message from one node to another is often much longer than the time to transfer
a load information message. So, when a node receives the information that it
will later receive a data message, it can take this information into account
and consider that its new load will be larger. Consequently, it can send a part
of its real load to some of its neighbors if required. We call this trick the
\texttt{virtual load} mechanism.
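To make the idea concrete, the virtual load trick can be sketched as follows.
This is only our minimal illustration in Python; all the names
(\texttt{Node}, \texttt{on\_load\_info}, etc.) and the half-of-surplus
transfer rule are assumptions made for the sketch, not part of any
implementation discussed in this paper.

```python
# Minimal sketch of the virtual load heuristic (illustrative names).
# A node adds the load announced by incoming (but not yet delivered)
# data messages to its own load before deciding what to send.

class Node:
    def __init__(self, load):
        self.load = load       # real load currently held
        self.expected = 0.0    # load announced but not yet received

    def on_load_info(self, announced_amount):
        # A neighbor announced it will send us this amount later.
        self.expected += announced_amount

    def virtual_load(self):
        # Load used by the balancing decision: real + announced.
        return self.load + self.expected

    def decide_transfer(self, neighbor_estimate):
        # Send half of the surplus over a less loaded neighbor,
        # computed on the *virtual* load, but taken from the real load.
        surplus = self.virtual_load() - neighbor_estimate
        if surplus <= 0:
            return 0.0
        amount = min(self.load, surplus / 2)
        self.load -= amount
        return amount

n = Node(load=10.0)
n.on_load_info(4.0)            # a data message of 4 units is on its way
sent = n.decide_transfer(6.0)  # a neighbor is estimated at 6 units
# virtual load is 14, surplus is 8, so the node sends 4 real units
```

The point of the sketch is the order of operations: the balancing decision is
taken on the virtual load (here 14 instead of 10), so the node starts shedding
real load before the announced data message has even arrived.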
So, in this work, we propose a new strategy for improving the distribution of
the load and a simple but efficient trick that also improves the load
balancing. Moreover, we have conducted many simulations with SimGrid in order
to validate that our improvements are really efficient. Our simulations
consider that, in order to send a message, a latency delays the sending and
that, depending on the network performance and the message size, the reception
of the message is also delayed.
In the remainder of this paper, Section~\ref{BT algo} describes the Bertsekas
and Tsitsiklis asynchronous load balancing algorithm, and we point out a
possible problem in the convergence conditions. Section~\ref{Best-effort}
presents the best effort strategy, which provides an efficient way to reduce
execution times. In Section~\ref{Virtual load}, the virtual load mechanism is
proposed. Simulations, based on a quite realistic model detailed in
Section~\ref{Simulations}, allowed us to show that both our approaches are
valid. Finally, we give a conclusion and some perspectives on this work.
123 \section{Bertsekas and Tsitsiklis' asynchronous load balancing algorithm}
In order to prove the convergence of asynchronous iterative load balancing,
Bertsekas and Tsitsiklis proposed a model
in~\cite{bertsekas+tsitsiklis.1997.parallel}. Here we recall some notations.
Consider that $N=\{1,...,n\}$ processors are connected through a network.
Communication links are represented by a connected undirected graph $G=(N,V)$
where $V$ is the set of links connecting different processors. In this work, we
consider that processors are homogeneous for the sake of simplicity. It is
quite easy to tackle the heterogeneous case~\cite{ElsMonPre02}. The load of
processor $i$ at time $t$ is represented by $x_i(t)\geq 0$. Let $V(i)$ be the
set of neighbors of processor $i$. Each processor $i$ has an estimate of the
load of each of its neighbors $j \in V(i)$, represented by $x_j^i(t)$. Due to
asynchronism and communication delays, this estimate may be outdated. We also
consider that the load is described by a continuous variable.
When a processor sends a part of its load to one or some of its neighbors, the
transfer takes time to complete. Let $s_{ij}(t)$ be the amount of load that
processor $i$ has transferred to processor $j$ at time $t$ and let $r_{ij}(t)$
be the amount of load received by processor $j$ from processor $i$ at time $t$.
Then the amount of load of processor $i$ at time $t+1$ is given by:
146 x_i(t+1)=x_i(t)-\sum_{j\in V(i)} s_{ij}(t) + \sum_{j\in V(i)} r_{ji}(t)
Some conditions are required to ensure convergence. One of them can be
called the \texttt{ping-pong} condition, which specifies that:
154 x_i(t)-\sum _{k\in V(i)} s_{ik}(t) \geq x_j^i(t)+s_{ij}(t)
for any processor $i$ and any $j \in V(i)$ such that $x_i(t)>x_j^i(t)$. This
condition aims at preventing a processor from sending a part of its load and
ending up, after the sending, less loaded than the neighbors it believed to be
less loaded.
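As an illustration, the condition above can be checked numerically for a
planned set of transfers. The following Python sketch is ours and only mirrors
the notation of the model ($x_i$, $x_j^i$, $s_{ij}$); the function name and
the dictionary encoding are assumptions made for the sketch.

```python
# Sketch: checking the ping-pong condition for processor i's planned sends.
# x_i       : load of processor i at time t
# estimates : {j: x_j^i(t)}, estimated loads of the neighbors of i
# transfers : {j: s_ij(t)},  amounts i plans to send to each neighbor j

def satisfies_ping_pong(x_i, estimates, transfers):
    total_sent = sum(transfers.values())
    for j, estimate in estimates.items():
        # The condition only constrains neighbors believed less loaded than i.
        if x_i > estimate:
            s_ij = transfers.get(j, 0.0)
            if x_i - total_sent < estimate + s_ij:
                return False
    return True

# Processor with load 10 and one neighbor estimated at 4:
# sending 3 keeps 10 - 3 = 7 >= 4 + 3, which is allowed ...
assert satisfies_ping_pong(10.0, {2: 4.0}, {2: 3.0})
# ... but sending 4 would leave 6 < 4 + 4, violating the condition.
assert not satisfies_ping_pong(10.0, {2: 4.0}, {2: 4.0})
```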
Nevertheless, we think that this condition may lead to deadlocks in some
cases. For example, consider only three processors, where processor $1$ is
linked to processor $2$, which is itself linked to processor $3$ (i.e. a
simple chain of 3 processors). Now consider that we have the following values
at time $t$:
In this case, processor $2$ can send load either to processor $1$ or to
processor $3$. If it sends load to processor $1$, it will not satisfy condition
(\ref{eq:ping-pong}) because, after the sending, it will be less loaded than
$x_3^2(t)$. So we consider that the \texttt{ping-pong} condition is probably
too strong. We have not yet tried to establish another convergence proof
without this condition or with a weaker one.
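This blocking situation can be reproduced numerically. The concrete values in
the following Python sketch are our own hypothetical choices (they are not the
values of the example above), picked only so that processor $2$ is more loaded
than both of its estimates yet forbidden from making any useful transfer.

```python
# Hypothetical illustration of the blocking situation on a chain 1 -- 2 -- 3.
# All values below are ours, chosen for the sketch.

x2 = 10.0       # load of processor 2
x1_est = 0.0    # processor 2's estimate of processor 1, x_1^2(t)
x3_est = 9.5    # processor 2's estimate of processor 3, x_3^2(t)

def allowed(s_21):
    # Ping-pong condition applied to every neighbor j with x_2 > x_j^2:
    # the remaining load must stay >= estimate + amount sent to j.
    remaining = x2 - s_21
    ok_for_1 = remaining >= x1_est + s_21   # j = 1 (receives s_21)
    ok_for_3 = remaining >= x3_est + 0.0    # j = 3 (receives nothing)
    return ok_for_1 and ok_for_3

# A fair transfer of 5 units towards the empty processor 1 is forbidden,
# because processor 2 would fall below its estimate of processor 3:
assert not allowed(5.0)
# Only tiny transfers (here at most 0.5 units) are permitted:
assert allowed(0.5)
```

Under these assumed values, processor $2$ may move at most $0.5$ units towards
the empty processor $1$, even though the balanced distribution would require
moving much more, which is the kind of over-constraint discussed above.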
178 \section{Best effort strategy}
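As a first intuition of the strategy, a node can compute the average load over
itself and its less loaded neighbors, and send each of them just enough to
reach that average. The following Python sketch is only our illustration of
this idea under the assumption of instantaneous transfers; the function name
and data layout are ours.

```python
# Sketch of the best effort idea: bring a node and all its less loaded
# neighbors to the same (average) load.  Illustrative code and names.

def best_effort_transfers(my_load, neighbor_loads):
    """Return {neighbor: amount} so that, if applied instantly, this node
    and its less loaded neighbors all end up with the average load."""
    less_loaded = {j: l for j, l in neighbor_loads.items() if l < my_load}
    if not less_loaded:
        return {}
    # Average over this node and the selected neighbors.
    group = [my_load] + list(less_loaded.values())
    target = sum(group) / len(group)
    # Each less loaded neighbor is topped up to the target.
    return {j: target - l for j, l in less_loaded.items()}

transfers = best_effort_transfers(12.0, {1: 3.0, 2: 6.0, 3: 20.0})
# group = [12, 3, 6] -> target = 7; send 4 to node 1 and 1 to node 2;
# node 3, being more loaded, is left out of the balancing phase
assert transfers == {1: 4.0, 2: 1.0}
assert 12.0 - sum(transfers.values()) == 7.0
```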
183 \section{Virtual load}
186 \section{Simulations}
In order to test and validate our approaches, we wrote a simulator based on
the SimGrid
framework~\cite{casanova+legrand+quinson.2008.simgrid}. The process
model is detailed in the next section (\ref{Sim model}), then the
results of the simulations are presented in Section~\ref{Results}.
195 \subsection{Simulation model}
198 \subsection{Validation of our approaches}
What do we want to show?

1) best effort is faster than the other strategies (simple, makhoul)
2) the advantage of virtual load

Can we find counter-examples?

Simulations with fairly long execution times, measuring quality with: the volume of computation performed and the volume of data exchanged.
But also simulations with short times, showing that only best effort converges.

Experiments with fast and slow computation/communication ratios.

Some experiments with a random initial load rather than all the load on the first processor.

Setting: homogeneous processors.

We do not take the speed of the links into account, so we consider it homogeneous.

Take a heterogeneous network and make the processors homogeneous.

Sizes: 10, 100, very large.
229 \section{Conclusion and perspectives}
232 \bibliographystyle{spmpsci}
233 \bibliography{biblio}
240 %%% ispell-local-dictionary: "american"
243 % LocalWords: Raphaël Couturier Arnaud Giersch Abderrahmane Sider
244 % LocalWords: Bertsekas Tsitsiklis SimGrid DASUD