X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/loba-papers.git/blobdiff_plain/47d970c595adbdf48d5340fe34cf3d81b012788a..1e4ddc1ac6f6e145312650aa924c3a98871a0dbc:/supercomp11/supercomp11.tex diff --git a/supercomp11/supercomp11.tex b/supercomp11/supercomp11.tex index bb50f38..2c1cb7c 100644 --- a/supercomp11/supercomp11.tex +++ b/supercomp11/supercomp11.tex @@ -364,13 +364,25 @@ During the simulation, each processor concurrently runs three threads: a \emph{receiving thread}, a \emph{computing thread}, and a \emph{load-balancing thread}, which we will briefly describe now. -\paragraph{Receiving thread} The receiving thread is in charge of -waiting for messages to come, either on the control channel, or on the -data channel. Its behavior is sketched by Algorithm~\ref{algo.recv}. -When a message is received, it is pushed in a buffer of -received message, to be later consumed by one of the other threads. -There are two such buffers, one for the control messages, and one for -the data messages. The buffers are implemented with a lock-free FIFO +For the sake of simplicity, a few details were voluntary omitted from +these descriptions. For an exhaustive presentation, we refer to the +actual source code that was used for the experiments% +\footnote{As mentioned before, our simulator relies on the SimGrid + framework~\cite{casanova+legrand+quinson.2008.simgrid}. For the + experiments, we used a pre-release of SimGrid 3.7 (Git commit + 67d62fca5bdee96f590c942b50021cdde5ce0c07, available from + \url{https://gforge.inria.fr/scm/?group_id=12})}, and which is +available at +\url{http://info.iut-bm.univ-fcomte.fr/staff/giersch/software/loba.tar.gz}. + +\subsubsection{Receiving thread} + +The receiving thread is in charge of waiting for messages to come, either on the +control channel, or on the data channel. Its behavior is sketched by +Algorithm~\ref{algo.recv}. When a message is received, it is pushed in a buffer +of received message, to be later consumed by one of the other threads. There +are two such buffers, one for the control messages, and one for the data +messages. The buffers are implemented with a lock-free FIFO \cite{sutter.2008.writing} to avoid contention between the threads. \begin{algorithm} @@ -395,9 +407,10 @@ the data messages. The buffers are implemented with a lock-free FIFO } \end{algorithm} -\paragraph{Computing thread} The computing thread is in charge of the -real load management. As exposed in Algorithm~\ref{algo.comp}, it -iteratively runs the following operations: +\subsubsection{Computing thread} + +The computing thread is in charge of the real load management. As exposed in +Algorithm~\ref{algo.comp}, it iteratively runs the following operations: \begin{itemize} \item if some load was received from the neighbors, get it; \item if there is some load to send to the neighbors, send it; @@ -436,10 +449,11 @@ example, when the current load is near zero). } \end{algorithm} -\paragraph{Load-balancing thread} The load-balancing thread is in -charge of running the load-balancing algorithm, and exchange the -control messages. As shown in Algorithm~\ref{algo.lb}, it iteratively -runs the following operations: +\subsubsection{Load-balancing thread} + +The load-balancing thread is in charge of running the load-balancing algorithm, +and exchange the control messages. As shown in Algorithm~\ref{algo.lb}, it +iteratively runs the following operations: \begin{itemize} \item get the control messages that were received from the neighbors; \item run the load-balancing algorithm; @@ -466,19 +480,7 @@ runs the following operations: } \end{algorithm} -\paragraph{} -For the sake of simplicity, a few details were voluntary omitted from -these descriptions. For an exhaustive presentation, we refer to the -actual source code that was used for the experiments% -\footnote{As mentioned before, our simulator relies on the SimGrid - framework~\cite{casanova+legrand+quinson.2008.simgrid}. For the - experiments, we used a pre-release of SimGrid 3.7 (Git commit - 67d62fca5bdee96f590c942b50021cdde5ce0c07, available from - \url{https://gforge.inria.fr/scm/?group_id=12})}, and which is -available at -\url{http://info.iut-bm.univ-fcomte.fr/staff/giersch/software/loba.tar.gz}. - -\FIXME{ajouter des détails sur la gestion de la charge virtuelle ? +\paragraph{}\FIXME{ajouter des détails sur la gestion de la charge virtuelle ? par ex, donner l'idée générale de l'implémentation. l'idée générale est déja décrite en section~\ref{Virtual load}} \subsection{Experimental contexts} @@ -488,7 +490,7 @@ In order to assess the performances of our algorithms, we ran our simulator with various parameters, and extracted several metrics, that we will describe in this section. -\paragraph{Load balancing strategies} +\subsubsection{Load balancing strategies} Several load balancing strategies were compared. We ran the experiments with the \emph{Best effort}, and with the \emph{Makhoul} strategies. \emph{Best @@ -507,7 +509,7 @@ To summarize the different load balancing strategies, we have: % This gives us as many as $4\times 2\times 2 = 16$ different strategies. -\paragraph{End of the simulation} +\subsubsection{End of the simulation} The simulations were run until the load was nearly balanced among the participating nodes. More precisely the simulation stops when each node holds @@ -520,7 +522,7 @@ real application we would have chosen a decentralized convergence detection algorithm, like the one described by Bahi, Contassot-Vivier, Couturier, and Vernier in \cite{10.1109/TPDS.2005.2}. -\paragraph{Platforms} +\subsubsection{Platforms} In order to show the behavior of the different strategies in different settings, we simulated the executions on two sorts of platforms. These two @@ -546,7 +548,7 @@ processor speeds were normalized, and we arbitrarily chose to fix them to Then we derived each sort of platform with four different number of computing nodes: 16, 64, 256, and 1024 nodes. -\paragraph{Configurations} +\subsubsection{Configurations} The distributed processes of the application were then logically organized along three possible topologies: a line, a torus or an hypercube. We ran tests where @@ -589,7 +591,7 @@ time. Anyway, all these the experiments represent more than 240 hours of computing time. -\paragraph{Metrics} +\subsubsection{Metrics} In order to evaluate and compare the different load balancing strategies we had to define several metrics. Our goal, when choosing these metrics, was to have @@ -630,9 +632,73 @@ With these constraints in mind, we defined the following metrics: \end{description} -\subsection{Validation of our approaches} +\subsection{Experimental results} \label{Results} +In this section, the results for the different simulations will be presented, +and we'll try to explain our observations. + +\subsubsection{Cluster vs grid platforms} + +As mentioned earlier, we simulated the different algorithms on two kinds of +physical platforms: clusters and grids. A first observation that we can make, +is that the graphs we draw from the data have a similar aspect for the two kinds +of platforms. The only noticeable difference is that the algorithms need a bit +more time to achieve the convergence on the grid platforms, than on clusters. +Nevertheless their relative performances remain generally identical. + +This suggests that the relative performances of the different strategies are not +influenced by the characteristics of the physical platform. The differences in +the convergence times can be explained by the fact that on the grid platforms, +distant sites are interconnected by links of smaller bandwith. + +Therefore, in the following, we'll only discuss the results for the grid +platforms. The different results are presented on the +figures~\ref{fig.results1} and~\ref{fig.resultsN}. + +\FIXME{explain how to read the graphs} +ratio 1:1 not given here + +\begin{figure*}[p] + \centering + \includegraphics[width=.5\linewidth]{data/graphs/R1-1:10-grid-line}% + \includegraphics[width=.5\linewidth]{data/graphs/R1-10:1-grid-line} + \includegraphics[width=.5\linewidth]{data/graphs/R1-1:10-grid-torus}% + \includegraphics[width=.5\linewidth]{data/graphs/R1-10:1-grid-torus} + \includegraphics[width=.5\linewidth]{data/graphs/R1-1:10-grid-hcube}% + \includegraphics[width=.5\linewidth]{data/graphs/R1-10:1-grid-hcube} + \caption{Real mode, initially on an only mode, comp/comm ratio = 1/10 (left), or 10/1 (right).} + \label{fig.results1} +\end{figure*} + +\begin{figure*}[p] + \centering + \includegraphics[width=.5\linewidth]{data/graphs/RN-1:10-grid-line}% + \includegraphics[width=.5\linewidth]{data/graphs/RN-10:1-grid-line} + \includegraphics[width=.5\linewidth]{data/graphs/RN-1:10-grid-torus}% + \includegraphics[width=.5\linewidth]{data/graphs/RN-10:1-grid-torus} + \includegraphics[width=.5\linewidth]{data/graphs/RN-1:10-grid-hcube}% + \includegraphics[width=.5\linewidth]{data/graphs/RN-10:1-grid-hcube} + \caption{Real mode, random initial distribution, comp/comm ratio = 1/10 (left), or 10/1 (right).} + \label{fig.resultsN} +\end{figure*} + +\subsubsection{Main results} + +On fig.~\ref{fig.results1}, \dots + +\subsubsection{With the virtual load extension} + +\subsubsection{The $k$ parameter} + +\subsubsection{With an initial random repartition, and larger platforms} + +\subsubsection{With integer load} + +\FIXME{what about the amount of data?} + +\begin{itshape} +\FIXME{remove that part} Dans cet ordre: ... - comparer be/makhoul -> be tient la route @@ -668,33 +734,31 @@ On constate quoi (vérifier avec les chiffres)? \end{itemize} -\begin{itshape} -On veut montrer quoi ? : -\FIXME{remove that part} +% On veut montrer quoi ? : -1) best plus rapide que les autres (simple, makhoul) -2) avantage virtual load +% 1) best plus rapide que les autres (simple, makhoul) +% 2) avantage virtual load -Est ce qu'on peut trouver des contre exemple? -Topologies variées +% Est ce qu'on peut trouver des contre exemple? +% Topologies variées -Simulation avec temps définies assez long et on mesure la qualité avec : volume de calcul effectué, volume de données échangées -Mais aussi simulation avec temps court qui montre que seul best converge +% Simulation avec temps définies assez long et on mesure la qualité avec : volume de calcul effectué, volume de données échangées +% Mais aussi simulation avec temps court qui montre que seul best converge -Expés avec ratio calcul/comm rapide et lent +% Expés avec ratio calcul/comm rapide et lent -Quelques expés avec charge initiale aléatoire plutot que sur le premier proc +% Quelques expés avec charge initiale aléatoire plutot que sur le premier proc -Cadre processeurs homogènes +% Cadre processeurs homogènes -Topologies statiques +% Topologies statiques -On ne tient pas compte de la vitesse des liens donc on la considère homogène +% On ne tient pas compte de la vitesse des liens donc on la considère homogène -Prendre un réseau hétérogène et rendre processeur homogène +% Prendre un réseau hétérogène et rendre processeur homogène -Taille : 10 100 très gros +% Taille : 10 100 très gros \end{itshape} \section{Conclusion and perspectives}