X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/loba-papers.git/blobdiff_plain/7f289476ca355b26a16f5eb6e2686286f3775245..5432212c4bb6f191fc44cb3d0e4b1e66aa3dd9e8:/loba-besteffort/loba-besteffort.tex diff --git a/loba-besteffort/loba-besteffort.tex b/loba-besteffort/loba-besteffort.tex index 6a76602..44a6853 100644 --- a/loba-besteffort/loba-besteffort.tex +++ b/loba-besteffort/loba-besteffort.tex @@ -134,20 +134,21 @@ order to send a message, a latency delays the sending and according to the network performance and the message size, the time of the reception of the message also varies. -In the following of this paper, Section~\ref{BT algo} describes the Bertsekas -and Tsitsiklis' asynchronous load balancing algorithm. Moreover, we present a -possible problem in the convergence conditions. Section~\ref{Best-effort} -presents the best effort strategy which provides an efficient way to reduce the -execution times. This strategy will be compared with other ones, presented in -Section~\ref{Other}. In Section~\ref{Virtual load}, the virtual load mechanism -is proposed. Simulations allowed to show that both our approaches are valid -using a quite realistic model detailed in Section~\ref{Simulations}. Finally we -give a conclusion and some perspectives to this work. +In the following of this paper, Section~\ref{sec.bt-algo} describes the +Bertsekas and Tsitsiklis' asynchronous load balancing algorithm. Moreover, we +present a possible problem in the convergence conditions. +Section~\ref{sec.besteffort} presents the best effort strategy which provides an +efficient way to reduce the execution times. This strategy will be compared +with other ones, presented in Section~\ref{sec.other}. In +Section~\ref{sec.virtual-load}, the virtual load mechanism is proposed. +Simulations allowed to show that both our approaches are valid using a quite +realistic model detailed in Section~\ref{sec.simulations}. Finally we give a +conclusion and some perspectives to this work. \section{Bertsekas and Tsitsiklis' asynchronous load balancing algorithm} -\label{BT algo} +\label{sec.bt-algo} In order prove the convergence of asynchronous iterative load balancing Bertsekas and Tsitsiklis proposed a model @@ -170,7 +171,7 @@ amount of load received by processor $j$ from processor $i$ at time $t$. Then the amount of load of processor $i$ at time $t+1$ is given by: \begin{equation} x_i(t+1)=x_i(t)-\sum_{j\in V(i)} s_{ij}(t) + \sum_{j\in V(i)} r_{ji}(t) -\label{eq:ping-pong} +\label{eq.ping-pong} \end{equation} @@ -193,9 +194,9 @@ x_2(t)=100 \\ x_3(t)=99.99\\ x_3^2(t)=99.99\\ \end{eqnarray*} -In this case, processor $2$ can either sends load to processor $1$ or processor -$3$. If it sends load to processor $1$ it will not satisfy condition -(\ref{eq:ping-pong}) because after the sending it will be less loaded that +In this case, processor $2$ can either sends load to processor $1$ or processor +$3$. If it sends load to processor $1$ it will not satisfy condition +(\ref{eq.ping-pong}) because after the sending it will be less loaded that $x_3^2(t)$. So we consider that the \emph{ping-pong} condition is probably to strong. Currently, we did not try to make another convergence proof without this condition or with a weaker condition. @@ -209,7 +210,7 @@ that they are sufficient to ensure the convergence of the load-balancing algorithm. \section{Best effort strategy} -\label{Best-effort} +\label{sec.besteffort} In this section we describe a new load-balancing strategy that we call \emph{best effort}. First, we explain the general idea behind this strategy, @@ -281,12 +282,12 @@ potentially wrong decision has a lower impact. Concretely, once $s_{ij}$ has been evaluated as before, it is simply divided by some configurable factor. That's what we named the ``parameter $k$'' in -Section~\ref{Results}. The amount of data to send is then $s_{ij}(t) = (\bar{x} -- x^i_j(t))/k$. -\FIXME[check that it's still named $k$ in Sec.~\ref{Results}]{} +Section~\ref{sec.results}. The amount of data to send is then $s_{ij}(t) = +(\bar{x} - x^i_j(t))/k$. +\FIXME[check that it's still named $k$ in Sec.~\ref{sec.results}]{} \section{Other strategies} -\label{Other} +\label{sec.other} Another load balancing strategy, working under the same conditions, was previously developed by Bahi, Giersch, and Makhoul in @@ -307,7 +308,7 @@ neighbor. \section{Virtual load} -\label{Virtual load} +\label{sec.virtual-load} In this section, we present the concept of \emph{virtual load}. In order to use this concept, load balancing messages must be sent using two different kinds @@ -337,7 +338,7 @@ information of the load they will receive, so they can take in into account. \FIXME{describe integer mode} \section{Simulations} -\label{Simulations} +\label{sec.simulations} In order to test and validate our approaches, we wrote a simulator using the SimGrid @@ -348,13 +349,12 @@ as the initial distribution of load, the interconnection topology, the characteristics of the running platform, etc. Then several metrics are issued that permit to compare the strategies. -The simulation model is detailed in the next section (\ref{Sim - model}), and the experimental contexts are described in -section~\ref{Contexts}. Then the results of the simulations are -presented in section~\ref{Results}. +The simulation model is detailed in the next section (\ref{sec.model}), and the +experimental contexts are described in section~\ref{sec.exp-context}. Then the +results of the simulations are presented in section~\ref{sec.results}. \subsection{Simulation model} -\label{Sim model} +\label{sec.model} In the simulation model the processors exchange messages which are of two kinds. First, there are \emph{control messages} which only carry @@ -491,10 +491,11 @@ iteratively runs the following operations: \end{algorithm} \paragraph{}\FIXME{ajouter des détails sur la gestion de la charge virtuelle ? -par ex, donner l'idée générale de l'implémentation. l'idée générale est déja décrite en section~\ref{Virtual load}} + par ex, donner l'idée générale de l'implémentation. l'idée générale est déja + décrite en section~\ref{sec.virtual-load}} \subsection{Experimental contexts} -\label{Contexts} +\label{sec.exp-context} In order to assess the performances of our algorithms, we ran our simulator with various parameters, and extracted several metrics, that @@ -602,6 +603,7 @@ Anyway, all these the experiments represent more than 240 hours of computing time. \subsubsection{Metrics} +\label{sec.metrics} In order to evaluate and compare the different load balancing strategies we had to define several metrics. Our goal, when choosing these metrics, was to have @@ -643,7 +645,7 @@ With these constraints in mind, we defined the following metrics: \subsection{Experimental results} -\label{Results} +\label{sec.results} In this section, the results for the different simulations will be presented, and we'll try to explain our observations. @@ -707,13 +709,13 @@ On each of the figures~\ref{fig.results1} and~\ref{fig.resultsN}, the results are given for the process topology being, from top to bottom, a line, a torus or an hypercube. -\FIXME{explain how to read the graphs} - -each bar -> times for an algorithm -recall the different times -no bar -> not run or did not converge in allocated time - -repeated for the different platform sizes. +Finally, on the graphs, the vertical bars show the measured times for each of +the algorithms. These measured times are, from bottom to top, the average idle +time, the average convergence date, and the maximum convergence date (see +Section~\ref{sec.metrics}). The measurements are repeated for the different +platform sizes. Some bars are missing, specially for large platforms. This is +either because the algorithm did not reach the convergence state in the +allocated time, or because we simply decided not to run it. \FIXME{donner les premières conclusions, annoncer le plan de la suite}