Add reference for SimGrid.

[loba-papers.git] / supercomp11 / supercomp11.tex
diff --git a/supercomp11/supercomp11.tex b/supercomp11/supercomp11.tex

index 7daaa52bed489911e6845de60a1e20e27e21105b..6a48cd3113581c96dd4715238c48b8c17c0474ad 100644 (file)
--- a/supercomp11/supercomp11.tex
+++ b/supercomp11/supercomp11.tex
@@ -1,5 +1,8 @@
-
  \documentclass[smallextended]{svjour3}
+\usepackage[utf8]{inputenc}
+\usepackage[T1]{fontenc}
+\usepackage{mathptmx}
+\usepackage{courier}
  \usepackage{graphicx}
  
  \begin{document}
@@ -28,32 +31,185 @@
  \begin{abstract}
  
  Most of the  time, asynchronous load balancing algorithms  have extensively been
-studied in  a theoretical point of  view. The Bertsekas'  algorithm is certainly
-the most well  known algorithm for which the convergence proof  is given. From a
-practical point of view, when a node wants to balance a part of its load to some
-of its  neighbors, the strategy is not  described.  In this paper,  we propose a
-strategy called \texttt{best  effort} which tries to balance the  load of a node
-to all its less loaded neighbors  while ensuring that all the nodes concerned by
-the load balancing  phase have the same amount  of load.  Moreover, asynchronous
-iterative  algorithms  in which  an  asynchronous  load  balancing algorithm  is
-implemented most of  the time can dissociate messages  concerning load transfers
-and message concerning load information.  In order to increase the converge of a
-load balancing  algorithm, we propose a simple  heuristic called \texttt{virtual
-  load}  which  allows a  node  that receives  an  load  information message  to
-integrate  the load  that it  will receive  latter in  its load  (virtually) and
-consequently sends a (real) part of its  load to some of its neighbors. In order
-to validate our  approaches, we have defined a simulator  based on SimGrid which
-allowed us to conduct many experiments.
+studied in a theoretical point  of view. The Bertsekas and Tsitsiklis'
+algorithm~\cite[section~7.4]{bertsekas+tsitsiklis.1997.parallel}
+is certainly  the most well known  algorithm for which the  convergence proof is
+given. From a  practical point of view, when  a node wants to balance  a part of
+its  load to some  of its  neighbors, the  strategy is  not described.   In this
+paper, we propose a strategy  called \texttt{best effort} which tries to balance
+the load of a node to all  its less loaded neighbors while ensuring that all the
+nodes  concerned by  the load  balancing  phase have  the same  amount of  load.
+Moreover,  asynchronous  iterative  algorithms  in which  an  asynchronous  load
+balancing  algorithm is  implemented most  of the  time can  dissociate messages
+concerning load transfers and message  concerning load information.  In order to
+increase  the  converge of  a  load balancing  algorithm,  we  propose a  simple
+heuristic called \texttt{virtual load} which allows a node that receives an load
+information message  to integrate the  load that it  will receive later  in its
+load (virtually) and consequently sends a (real) part of its load to some of its
+neighbors.  In order to  validate our  approaches, we  have defined  a simulator
+based on SimGrid which allowed us to conduct many experiments.
  
  
  \end{abstract}
  
+\section{Introduction}
+
+Load  balancing algorithms  are  extensively used  in  parallel and  distributed
+applications in  order to  reduce the  execution times. They  can be  applied in
+different scientific  fields from high  performance computation to  micro sensor
+networks.   They are  iterative by  nature.  In  literature many  kinds  of load
+balancing  algorithms  have been  studied.   They  can  be classified  according
+different  criteria:   centralized  or  decentralized,  in   static  or  dynamic
+environment,  with  homogeneous  or  heterogeneous load,  using  synchronous  or
+asynchronous iterations, with  a static topology or a  dynamic one which evolves
+during time.  In  this work, we focus on  asynchronous load balancing algorithms
+where computer nodes  are considered homogeneous and with  homogeneous load with
+no external  load. In  this context, Bertsekas  and Tsitsiklis have  proposed an
+algorithm which is definitively a reference  for many works. In their work, they
+proved that under classical  hypotheses of asynchronous iterative algorithms and
+a  special  constraint   avoiding  \texttt{ping-pong}  effect,  an  asynchronous
+iterative algorithm  converge to  the uniform load  distribution. This  work has
+been extended by many authors. For example,
+DASUD~\cite{cortes+ripoll+cedo+al.2002.asynchronous} propose a version working
+with integer load. {\bf Rajouter des choses ici}.
+
+Although  the Bertsekas  and Tsitsiklis'  algorithm describes  the  condition to
+ensure the convergence,  there is no indication or  strategy to really implement
+the load distribution. In other word, a node  can send a part of its load to one
+or   many  of   its  neighbors   while  all   the  convergence   conditions  are
+followed. Consequently,  we propose a  new strategy called  \texttt{best effort}
+that tries to balance the load of  a node to all its less loaded neighbors while
+ensuring that all the nodes concerned  by the load balancing phase have the same
+amount of  load.  Moreover, when real asynchronous  applications are considered,
+using  asynchronous   load  balancing   algorithms  can  reduce   the  execution
+times. Most of the times, it is simpler to distinguish load information messages
+from  data  migration  messages.  Formers  ones  allows  a  node to  inform  its
+neighbors of its  current load. These messages are very small,  they can be sent
+quite often.  For example, if an  computing iteration takes  a significant times
+(ranging from seconds to minutes), it is possible to send a new load information
+message at each  neighbor at each iteration. Latter  messages contains data that
+migrates from one node to another one. Depending on the application, it may have
+sense or not  that nodes try to balance  a part of their load  at each computing
+iteration. But the time to transfer a load message from a node to another one is
+often much nore longer that to  time to transfer a load information message. So,
+when a node receives the information  that later it will receive a data message,
+it can take this information into account  and it can consider that its new load
+is larger.   Consequently, it can  send a part  of it real  load to some  of its
+neighbors if required. We call this trick the \texttt{virtual load} mecanism.
+
+
+
+So, in  this work, we propose a  new strategy for improving  the distribution of
+the  load  and  a  simple  but  efficient trick  that  also  improves  the  load
+balacing. Moreover, we have conducted  many simulations with simgrid in order to
+validate our improvements are really efficient. Our simulations consider that in
+order  to send a  message, a  latency delays  the sending  and according  to the
+network  performance and  the message  size, the  time of  the reception  of the
+message also varies.
+
+In the  following of this  paper, Section~\ref{BT algo} describes  the Bertsekas
+and Tsitsiklis'  asynchronous load balancing  algorithm. Moreover, we  present a
+possible  problem  in  the  convergence  conditions.   Section~\ref{Best-effort}
+presents the best effort strategy which  provides an efficient way to reduce the
+execution  times. In Section~\ref{Virtual  load}, the  virtual load  mecanism is
+proposed. Simulations allowed to show that both our approaches are valid using a
+quite realistic  model detailed in  Section~\ref{Simulations}. Finally we  give a
+conclusion and some perspectives to this work.
+
+
+
+
+\section{Bertsekas  and Tsitsiklis' asynchronous load balancing algorithm}
+\label{BT algo}
+
+In  order  prove  the  convergence  of  asynchronous  iterative  load  balancing
+Bertesekas         and        Tsitsiklis         proposed         a        model
+in~\cite{bertsekas+tsitsiklis.1997.parallel}.   Here we  recall  some notations.
+Consider  that  $N={1,...,n}$  processors   are  connected  through  a  network.
+Communication links  are represented by  a connected undirected  graph $G=(N,V)$
+where $V$ is the set of links connecting differents processors. In this work, we
+consider that  processors are  homogeneous for sake  of simplicity. It  is quite
+easy to tackle the  heterogeneous case~\cite{ElsMonPre02}. Load of processor $i$
+at  time $t$  is  represented  by $x_i(t)\geq  0$.   Let $V(i)$  be  the set  of
+neighbors of processor  $i$.  Each processor $i$ has an estimate  of the load of
+each  of its  neighbors $j  \in V(i)$  represented by  $x_j^i(t)$.  According to
+asynchronism and communication  delays, this estimate may be  outdated.  We also
+consider that the load is described by a continuous variable.
+
+When a processor  send a part of its  load to one or some of  its neighbors, the
+transfer takes time to be completed.  Let $s_{ij}(t)$ be the amount of load that
+processor $i$ has transfered to processor $j$ at time $t$ and let $r_{ij}(t)$ be the
+amount of  load received by processor $j$  from processor $i$ at  time $t$. Then
+the amount of load of processor $i$ at time $t+1$ is given by:
+\begin{equation}
+x_i(t+1)=x_i(t)-\sum_{j\in V(i)} s_{ij}(t) + \sum_{j\in V(i)} r_{ji}(t)
+\end{equation}
+
+
+\section{Best effort strategy}
+\label{Best-effort}
+
+
+
+\section{Virtual load}
+\label{Virtual load}
+
+\section{Simulations}
+\label{Simulations}
  
+In order to test and validate our approaches, we wrote a simulator
+using the SimGrid
+framework~\cite{casanova+legrand+quinson.2008.simgrid}.  The process
+model is detailed in the next section (\ref{Sim model}), then the
+results of the simulations are presented in section~\ref{Results}.
  
+\subsection{Simulation model}
+\label{Sim model}
  
-qsdqsd
+\subsection{Validation of our approaches}
+\label{Results}
  
  
+On veut montrer quoi ? :
  
+1) best plus rapide que les autres (simple, makhoul)
+2) avantage virtual load
+
+Est ce qu'on peut trouver des contre exemple?
+Topologies variées
+
+
+Simulation avec temps définies assez long et on mesure la qualité avec : volume de calcul effectué, volume de données échangées
+Mais aussi simulation avec temps court qui montre que seul best converge
+
+
+Expés avec ratio calcul/comm rapide et lent
+
+Quelques expés avec charge initiale aléatoire plutot que sur le premier proc
+
+Cadre processeurs homogènes
+
+Topologies statiques
+
+On ne tient pas compte de la vitesse des liens donc on la considère homogène
+
+Prendre un réseau hétérogène et rendre processeur homogène
+
+Taille : 10 100 très gros
+
+\section{Conclusion and perspectives}
+
+
+\bibliographystyle{spmpsci}
+\bibliography{biblio}
  
  \end{document}
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: t
+%%% ispell-local-dictionary: "american"
+%%% End:
+
+% LocalWords:  Raphaël Couturier Arnaud Giersch Abderrahmane Sider
+% LocalWords:  Bertsekas Tsitsiklis SimGrid DASUD