X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/loba-papers.git/blobdiff_plain/1ccb8a3cdf8051ce59428e89b46322f8b6326db2..11a214bdac1c00f5f5f8ed9ff672afdb27a23b7d:/supercomp11/supercomp11.tex diff --git a/supercomp11/supercomp11.tex b/supercomp11/supercomp11.tex index 2fc63f7..0479906 100644 --- a/supercomp11/supercomp11.tex +++ b/supercomp11/supercomp11.tex @@ -1,5 +1,9 @@ \documentclass[smallextended]{svjour3} +\usepackage[utf8]{inputenc} +\usepackage[T1]{fontenc} +\usepackage{mathptmx} +\usepackage{courier} \usepackage{graphicx} \begin{document} @@ -28,7 +32,8 @@ \begin{abstract} Most of the time, asynchronous load balancing algorithms have extensively been -studied in a theoretical point of view. The Bertsekas and Tsitsiklis' algorithm +studied in a theoretical point of view. The Bertsekas and Tsitsiklis' +algorithm~\cite[section~7.4]{bertsekas+tsitsiklis.1997.parallel} is certainly the most well known algorithm for which the convergence proof is given. From a practical point of view, when a node wants to balance a part of its load to some of its neighbors, the strategy is not described. In this @@ -40,7 +45,7 @@ balancing algorithm is implemented most of the time can dissociate messages concerning load transfers and message concerning load information. In order to increase the converge of a load balancing algorithm, we propose a simple heuristic called \texttt{virtual load} which allows a node that receives an load -information message to integrate the load that it will receive latter in its +information message to integrate the load that it will receive later in its load (virtually) and consequently sends a (real) part of its load to some of its neighbors. In order to validate our approaches, we have defined a simulator based on SimGrid which allowed us to conduct many experiments. @@ -65,8 +70,9 @@ algorithm which is definitively a reference for many works. In their work, they proved that under classical hypotheses of asynchronous iterative algorithms and a special constraint avoiding \texttt{ping-pong} effect, an asynchronous iterative algorithm converge to the uniform load distribution. This work has -been extended by many authors. For example, DASUD proposes a version working with -integer load. {\bf Rajouter des choses ici}. +been extended by many authors. For example, +DASUD~\cite{cortes+ripoll+cedo+al.2002.asynchronous} propose a version working +with integer load. {\bf Rajouter des choses ici}. Although the Bertsekas and Tsitsiklis' algorithm describes the condition to ensure the convergence, there is no indication or strategy to really implement @@ -117,7 +123,57 @@ conclusion and some perspectives to this work. \section{Bertsekas and Tsitsiklis' asynchronous load balancing algorithm} \label{BT algo} -Comment on the problem in the convergence condition. +In order prove the convergence of asynchronous iterative load balancing +Bertesekas and Tsitsiklis proposed a model +in~\cite{bertsekas+tsitsiklis.1997.parallel}. Here we recall some notations. +Consider that $N={1,...,n}$ processors are connected through a network. +Communication links are represented by a connected undirected graph $G=(N,V)$ +where $V$ is the set of links connecting differents processors. In this work, we +consider that processors are homogeneous for sake of simplicity. It is quite +easy to tackle the heterogeneous case~\cite{ElsMonPre02}. Load of processor $i$ +at time $t$ is represented by $x_i(t)\geq 0$. Let $V(i)$ be the set of +neighbors of processor $i$. Each processor $i$ has an estimate of the load of +each of its neighbors $j \in V(i)$ represented by $x_j^i(t)$. According to +asynchronism and communication delays, this estimate may be outdated. We also +consider that the load is described by a continuous variable. + +When a processor send a part of its load to one or some of its neighbors, the +transfer takes time to be completed. Let $s_{ij}(t)$ be the amount of load that +processor $i$ has transfered to processor $j$ at time $t$ and let $r_{ij}(t)$ be the +amount of load received by processor $j$ from processor $i$ at time $t$. Then +the amount of load of processor $i$ at time $t+1$ is given by: +\begin{equation} +x_i(t+1)=x_i(t)-\sum_{j\in V(i)} s_{ij}(t) + \sum_{j\in V(i)} r_{ji}(t) +\label{eq:ping-pong} +\end{equation} + + +Some conditions are required to ensure the convergence. One of them can be +called the \texttt{ping-pong} condition which specifies that: +\begin{equation} +x_i(t)-\sum _{k\in V(i)} s_{ik}(t) \geq x_j^i(t)+s_{ij}(t) +\end{equation} +for any processor $i$ and any $j \in V(i)$ such that $x_i(t)>x_j^i(t)$. This +condition aims at avoiding a processor to send a part of its load and beeing +less loaded after that. + +Nevertheless, we think that this condition may lead to deadlocks in some +cases. For example, if we consider only three processors and that processor $1$ +is linked to processor $2$ which is also linked to processor $3$ (i.e. a simple +chain wich 3 processors). Now consider we have the following values at time $t$: +\begin{eqnarray*} +x_1(t)=10 \\ +x_2(t)=100 \\ +x_3(t)=99.99\\ + x_3^2(t)=99.99\\ +\end{eqnarray*} +In this case, processor $2$ can either sends load to processor $1$ or processor +$3$. If it sends load to processor $1$ it will not satisfy condition +(\ref{eq:ping-pong}) because after the sending it will be less loaded that +$x_3^2(t)$. So we consider that the \texttt{ping-pong} condition is probably to +strong. Currently, we did not try to make another convergence proof without this +condition or with a weaker condition. + \section{Best effort strategy} \label{Best-effort} @@ -130,13 +186,59 @@ Comment on the problem in the convergence condition. \section{Simulations} \label{Simulations} +In order to test and validate our approaches, we wrote a simulator +using the SimGrid +framework~\cite{casanova+legrand+quinson.2008.simgrid}. The process +model is detailed in the next section (\ref{Sim model}), then the +results of the simulations are presented in section~\ref{Results}. + \subsection{Simulation model} +\label{Sim model} \subsection{Validation of our approaches} +\label{Results} + + +On veut montrer quoi ? : + +1) best plus rapide que les autres (simple, makhoul) +2) avantage virtual load + +Est ce qu'on peut trouver des contre exemple? +Topologies variées + + +Simulation avec temps définies assez long et on mesure la qualité avec : volume de calcul effectué, volume de données échangées +Mais aussi simulation avec temps court qui montre que seul best converge + +Expés avec ratio calcul/comm rapide et lent + +Quelques expés avec charge initiale aléatoire plutot que sur le premier proc + +Cadre processeurs homogènes + +Topologies statiques + +On ne tient pas compte de la vitesse des liens donc on la considère homogène + +Prendre un réseau hétérogène et rendre processeur homogène + +Taille : 10 100 très gros \section{Conclusion and perspectives} +\bibliographystyle{spmpsci} +\bibliography{biblio} \end{document} + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: t +%%% ispell-local-dictionary: "american" +%%% End: + +% LocalWords: Raphaël Couturier Arnaud Giersch Abderrahmane Sider +% LocalWords: Bertsekas Tsitsiklis SimGrid DASUD