X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/loba-papers.git/blobdiff_plain/1630e7fa688ea1692b8a13f554924252b4047321..59636878becb2087601db617b593fd3f35181655:/loba-besteffort/loba-besteffort.tex

diff --git a/loba-besteffort/loba-besteffort.tex b/loba-besteffort/loba-besteffort.tex
index 9007b17..3ccb2b0 100644
--- a/loba-besteffort/loba-besteffort.tex
+++ b/loba-besteffort/loba-besteffort.tex
@@ -45,8 +45,13 @@
 \author{Arnaud Giersch\corref{cor}}
 \ead{arnaud.giersch@femto-st.fr}
 
-\address{FEMTO-ST, University of Franche-Comté\\
-  19 avenue du Maréchal Juin, BP 527, 90016 Belfort cedex, France}
+\address{%
+  Institut FEMTO-ST (UMR 6174),
+  Université de Franche-Comté (UFC),
+  Centre National de la Recherche Scientifique (CNRS),
+  École Nationale Supérieure de Mécanique et des Microtechniques (ENSMM),
+  Université de Technologie de Belfort Montbéliard (UTBM)\\
+  19 avenue du Maréchal Juin, BP 527, 90016 Belfort cedex, France}
 
 \cortext[cor]{Corresponding author.}
 
@@ -342,7 +347,7 @@ information of the load they will receive, so they can take in into account.
 
 In order to test and validate our approaches, we wrote a simulator
 using the SimGrid
-framework~\cite{casanova+legrand+quinson.2008.simgrid}. This
+framework~\cite{simgrid.web,casanova+legrand+quinson.2008.simgrid}. This
 simulator, which consists of about 2,700 lines of C++, allows to run
 the different load-balancing strategies under various parameters, such
 as the initial distribution of load, the interconnection topology, the
@@ -648,7 +653,7 @@ With these constraints in mind, we defined the following metrics:
 \label{sec.results}
 
 In this section, the results for the different simulations will be presented,
-and we'll try to explain our observations.
+and we will try to explain our observations.
 
 \subsubsection{Cluster vs grid platforms}
 
@@ -719,56 +724,103 @@ allocated time, or because we simply decided not to run it.
 
 \FIXME{annoncer le plan de la suite}
 
-\subsubsection{The \besteffort{} strategy with the load initially on only one
-  node}
+\subsubsection{The \besteffort{} and \makhoul{} strategies without virtual load}
 
-Before looking at the different variations, we'll first show that the plain
-\besteffort{} strategy is valuable, and may be as good as the \makhoul{}
-strategy. On the graphs from the figure~\ref{fig.results1}, these strategies
-are respectively labeled ``b'' and ``a''.
+Before looking at the different variations, we will first show that the plain
+\besteffort{} strategy is valuable, and may be as good as the \makhoul{}
+strategy. On Figures~\ref{fig.results1} and~\ref{fig.resultsN},
+these strategies are respectively labeled ``b'' and ``a''.
 
-twice faster on lines
-almost equivalent on torus
-worse on hcubes
+We can see that the relative performance of these strategies is mainly
+influenced by the application topology. The difference is the most important
+for the line topology, where the \besteffort{} strategy is nearly twice as fast
+as the \makhoul{} strategy. This can be explained by the fact that the
+\besteffort{} strategy tries to distribute the load fairly between all the
+nodes, and with the line topology it is easy to balance the load fairly.
 
--> interconnection
+On the contrary, for the hypercube topology, the \besteffort{} strategy performs
+worse than the \makhoul{} strategy. In this case, the \makhoul{} strategy,
+which tries to give more load to a few neighbors, reaches the equilibrium
+faster.
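+
+To make this contrast more concrete, here is a simplified reading of the fair
+rule used by the \besteffort{} strategy (the notation is ours, and we gloss
+over the exact way the set of target neighbors is selected): a node with load
+$\ell$ that levels itself with $m$ of its less loaded neighbors, of loads
+$\ell_1,\dots,\ell_m$, aims at the common level
+\[
+  \bar\ell = \frac{\ell + \sum_{j=1}^{m} \ell_j}{m+1},
+\]
+and sends $\bar\ell - \ell_j$ to neighbor $j$, keeping exactly $\bar\ell$ for
+itself. On a line, $m \le 2$, so the surplus $\ell - \bar\ell$ leaves the node
+in at most two large chunks; on a hypercube, it is split among many more
+neighbors, whose own concurrent leveling decisions perturb each other.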
-plus c'est connecté, moins bon est BE car à vouloir trop bien équilibrer
-localement, le processeurs se perturbent mutuellement. Du coup, makhoul qui
-équilibre moins bien localement est moins perturbé par ces interférences.
+For the torus topology, whose number of links lies between those of the line
+and of the hypercube, the \makhoul{} strategy is slightly better, but the
+difference is more nuanced when the initial load is only on one node. The only
+case where the \makhoul{} strategy is really faster than the \besteffort{}
+strategy is with the random initial distribution when communications are slow.
 
-\subsubsection{With the virtual load extension with the load initially on only
-  one node}
+Globally, the number of interconnection links is very important. The more
+interconnection links there are, the faster the \makhoul{} strategy is, because
+it quickly distributes a significant amount of load, even if unfairly, among
+all the neighbors. In contrast, the \besteffort{} strategy distributes the load
+fairly, so it is better suited to sparsely connected topologies.
 
-Dans ce cas légère amélioration de la cvg. max. Temps moyen de cvg. amélioré,
-mais plus de temps passé en idle, surtout quand les comms coutent cher.
 
-\subsubsection{The \besteffort{} strategy with an initial random load
-  distribution, and larger platforms}
+\subsubsection{Virtual load}
 
-Mêmes conclusions pour line et hcube.
-Sur tore, BE se fait exploser quand les comms coutent cher.
+The influence of virtual load is most of the time quite significant, compared
+to the same configuration without it. Sometimes it has no effect, but {\bf TO
+BE VERIFIED} it never has a negative effect on the load balancing in the cases
+we tested.
 
-\FIXME{virer les 1024 ?}
+On Figure~\ref{fig.results1}, when the load is initially on one node, it can be
+noticed that the average idle times are generally longer with the virtual load
+than without it. This can be explained by the fact that, with virtual load,
+processors exchange all the load they need to exchange as soon as the virtual
+load has been balanced between all the processors, and consequently they cannot
+compute at the beginning. This is especially noticeable when communications are
+slow (left part of Figure~\ref{fig.results1}).
 
-\subsubsection{With the virtual load extension with an initial random load
-  distribution}
+%Dans ce cas légère amélioration de la cvg. max. Temps moyen de cvg. amélioré,
+%mais plus de temps passé en idle, surtout quand les comms coutent cher.
 
-Soit c'est équivalent, soit on gagne -> surtout quand les comms coutent cher et
-qu'il y a beaucoup de voisins.
+%\subsubsection{The \besteffort{} strategy with an initial random load
+%  distribution, and larger platforms}
+
+%In
+%Mêmes conclusions pour line et hcube.
+%Sur tore, BE se fait exploser quand les comms coutent cher.
+
+%\FIXME{virer les 1024 ?}
+
+%\subsubsection{With the virtual load extension with an initial random load
+%  distribution}
+
+%Soit c'est équivalent, soit on gagne -> surtout quand les comms coutent cher et
+%qu'il y a beaucoup de voisins.
 
 \subsubsection{The $k$ parameter}
+\label{results-k}
+
+As explained previously, when communications are slow the \besteffort{}
+strategy is efficient. This is due to the fact that it tries to balance the
+load fairly, and consequently a significant amount of load is transferred
+between processors. In this situation, it is possible to reduce the convergence
+time by using the leveler parameter (parameter $k$).
+This solution is particularly efficient when the initial load is randomly
+distributed over the nodes with the torus and hypercube topologies and slow
+communications. When the virtual load mechanism is used, the effect of this
+parameter is also visible under the same conditions.
+
+
+
+\subsubsection{With integer load}
 
-Dans le cas où les comms coutent cher et ou BE se fait avoir, on peut ameliorer
-les perfs avec le param k.
+We also performed some experiments with integer load instead of real-valued
+load. In this case, the results globally exhibit the same behavior. The most
+interesting result, from our point of view, is that the virtual load mechanism
+allows processors in a line topology to converge to the uniform load
+distribution. Without the virtual load, most of the time, processors converge
+to what we call the ``stairway effect'', that is to say that the load of each
+processor differs only by one from that of its neighbors (for example, with 10
+processors, we obtain 10 9 8 7 6 6 7 8 9 10 instead of 8 8 8 8 8 8 8 8 8 8),
+presumably because the transfers that would smooth such a difference are
+smaller than one load unit.
 
-\subsubsection{With integer load, 1 ou N}
+%Cas normal, ligne -> converge pas (effet d'escalier).
+%Avec vload, ça converge.
 
-Cas normal, ligne -> converge pas (effet d'escalier).
-Avec vload, ça converge.
+%Dans les autres cas, résultats similaires au cas réel: redire que vload est
+%intéressant.
 
-Dans les autres cas, résultats similaires au cas réel: redire que vload est
-intéressant.
+\FIXME{add a curve with the integer load balancing}
 
 \FIXME{virer la metrique volume de comms}
 
@@ -840,4 +892,6 @@ Mésocentre de calcul de Franche-Comté.
 % LocalWords: SimGrid DASUD Comté asynchronism ji ik isend irecv Cortés et al
 % LocalWords: chan ctrl fifo Makhoul GFlop xml pre FEMTO Makhoul's fca bdee
 % LocalWords: cdde Contassot Vivier underlaid du de Maréchal Juin cedex calcul
-% LocalWords: biblio
+% LocalWords: biblio Institut UMR Université UFC Centre Scientifique CNRS des
+% LocalWords: École Nationale Supérieure Mécanique Microtechniques ENSMM UTBM
+% LocalWords: Technologie Bahi