X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/loba-papers.git/blobdiff_plain/47d970c595adbdf48d5340fe34cf3d81b012788a..1e4ddc1ac6f6e145312650aa924c3a98871a0dbc:/supercomp11/supercomp11.tex?ds=inline

diff --git a/supercomp11/supercomp11.tex b/supercomp11/supercomp11.tex
index bb50f38..2c1cb7c 100644
--- a/supercomp11/supercomp11.tex
+++ b/supercomp11/supercomp11.tex
@@ -364,13 +364,25 @@ During the simulation, each processor concurrently runs three threads:
 a \emph{receiving thread}, a \emph{computing thread}, and a
 \emph{load-balancing thread}, which we will briefly describe now.
 
-\paragraph{Receiving thread} The receiving thread is in charge of
-waiting for messages to come, either on the control channel, or on the
-data channel.  Its behavior is sketched by Algorithm~\ref{algo.recv}.
-When a message is received, it is pushed in a buffer of
-received message, to be later consumed by one of the other threads.
-There are two such buffers, one for the control messages, and one for
-the data messages.  The buffers are implemented with a lock-free FIFO
+For the sake of simplicity, a few details were voluntary omitted from
+these descriptions.  For an exhaustive presentation, we refer to the
+actual source code that was used for the experiments%
+\footnote{As mentioned before, our simulator relies on the SimGrid
+  framework~\cite{casanova+legrand+quinson.2008.simgrid}.  For the
+  experiments, we used a pre-release of SimGrid 3.7 (Git commit
+  67d62fca5bdee96f590c942b50021cdde5ce0c07, available from
+  \url{https://gforge.inria.fr/scm/?group_id=12})}, and which is
+available at
+\url{http://info.iut-bm.univ-fcomte.fr/staff/giersch/software/loba.tar.gz}.
+
+\subsubsection{Receiving thread}
+
+The receiving thread is in charge of waiting for messages to come, either on the
+control channel, or on the data channel.  Its behavior is sketched by
+Algorithm~\ref{algo.recv}.  When a message is received, it is pushed in a buffer
+of received message, to be later consumed by one of the other threads.  There
+are two such buffers, one for the control messages, and one for the data
+messages.  The buffers are implemented with a lock-free FIFO
 \cite{sutter.2008.writing} to avoid contention between the threads.
 
 \begin{algorithm}
@@ -395,9 +407,10 @@ the data messages.  The buffers are implemented with a lock-free FIFO
   }
 \end{algorithm}
 
-\paragraph{Computing thread} The computing thread is in charge of the
-real load management.  As exposed in Algorithm~\ref{algo.comp}, it
-iteratively runs the following operations:
+\subsubsection{Computing thread}
+
+The computing thread is in charge of the real load management.  As exposed in
+Algorithm~\ref{algo.comp}, it iteratively runs the following operations:
 \begin{itemize}
 \item if some load was received from the neighbors, get it;
 \item if there is some load to send to the neighbors, send it;
@@ -436,10 +449,11 @@ example, when the current load is near zero).
   }
 \end{algorithm}
 
-\paragraph{Load-balancing thread} The load-balancing thread is in
-charge of running the load-balancing algorithm, and exchange the
-control messages.  As shown in Algorithm~\ref{algo.lb}, it iteratively
-runs the following operations:
+\subsubsection{Load-balancing thread}
+
+The load-balancing thread is in charge of running the load-balancing algorithm,
+and exchange the control messages.  As shown in Algorithm~\ref{algo.lb}, it
+iteratively runs the following operations:
 \begin{itemize}
 \item get the control messages that were received from the neighbors;
 \item run the load-balancing algorithm;
@@ -466,19 +480,7 @@ runs the following operations:
   }
 \end{algorithm}
 
-\paragraph{}
-For the sake of simplicity, a few details were voluntary omitted from
-these descriptions.  For an exhaustive presentation, we refer to the
-actual source code that was used for the experiments%
-\footnote{As mentioned before, our simulator relies on the SimGrid
-  framework~\cite{casanova+legrand+quinson.2008.simgrid}.  For the
-  experiments, we used a pre-release of SimGrid 3.7 (Git commit
-  67d62fca5bdee96f590c942b50021cdde5ce0c07, available from
-  \url{https://gforge.inria.fr/scm/?group_id=12})}, and which is
-available at
-\url{http://info.iut-bm.univ-fcomte.fr/staff/giersch/software/loba.tar.gz}.
-
-\FIXME{ajouter des dÃ©tails sur la gestion de la charge virtuelle ?
+\paragraph{}\FIXME{ajouter des dÃ©tails sur la gestion de la charge virtuelle ?
 par ex, donner l'idÃ©e gÃ©nÃ©rale de l'implÃ©mentation.  l'idÃ©e gÃ©nÃ©rale est dÃ©ja dÃ©crite en section~\ref{Virtual load}}
 
 \subsection{Experimental contexts}
@@ -488,7 +490,7 @@ In order to assess the performances of our algorithms, we ran our
 simulator with various parameters, and extracted several metrics, that
 we will describe in this section.
 
-\paragraph{Load balancing strategies}
+\subsubsection{Load balancing strategies}
 
 Several load balancing strategies were compared.  We ran the experiments with
 the \emph{Best effort}, and with the \emph{Makhoul} strategies.  \emph{Best
@@ -507,7 +509,7 @@ To summarize the different load balancing strategies, we have:
 %
 This gives us as many as $4\times 2\times 2 = 16$ different strategies.
 
-\paragraph{End of the simulation}
+\subsubsection{End of the simulation}
 
 The simulations were run until the load was nearly balanced among the
 participating nodes.  More precisely the simulation stops when each node holds
@@ -520,7 +522,7 @@ real application we would have chosen a decentralized convergence detection
 algorithm, like the one described by Bahi, Contassot-Vivier, Couturier, and
 Vernier in \cite{10.1109/TPDS.2005.2}.
 
-\paragraph{Platforms}
+\subsubsection{Platforms}
 
 In order to show the behavior of the different strategies in different
 settings, we simulated the executions on two sorts of platforms.  These two
@@ -546,7 +548,7 @@ processor speeds were normalized, and we arbitrarily chose to fix them to
 Then we derived each sort of platform with four different number of computing
 nodes: 16, 64, 256, and 1024 nodes.
 
-\paragraph{Configurations}
+\subsubsection{Configurations}
 
 The distributed processes of the application were then logically organized along
 three possible topologies: a line, a torus or an hypercube.  We ran tests where
@@ -589,7 +591,7 @@ time.
 Anyway, all these the experiments represent more than 240 hours of computing
 time.
 
-\paragraph{Metrics}
+\subsubsection{Metrics}
 
 In order to evaluate and compare the different load balancing strategies we had
 to define several metrics.  Our goal, when choosing these metrics, was to have
@@ -630,9 +632,73 @@ With these constraints in mind, we defined the following metrics:
 \end{description}
 
 
-\subsection{Validation of our approaches}
+\subsection{Experimental results}
 \label{Results}
 
+In this section, the results for the different simulations will be presented,
+and we'll try to explain our observations.
+
+\subsubsection{Cluster vs grid platforms}
+
+As mentioned earlier, we simulated the different algorithms on two kinds of
+physical platforms: clusters and grids.  A first observation that we can make,
+is that the graphs we draw from the data have a similar aspect for the two kinds
+of platforms.  The only noticeable difference is that the algorithms need a bit
+more time to achieve the convergence on the grid platforms, than on clusters.
+Nevertheless their relative performances remain generally identical.
+
+This suggests that the relative performances of the different strategies are not
+influenced by the characteristics of the physical platform.  The differences in
+the convergence times can be explained by the fact that on the grid platforms,
+distant sites are interconnected by links of smaller bandwith.
+
+Therefore, in the following, we'll only discuss the results for the grid
+platforms.  The different results are presented on the
+figures~\ref{fig.results1} and~\ref{fig.resultsN}.
+
+\FIXME{explain how to read the graphs}
+ratio 1:1 not given here
+
+\begin{figure*}[p]
+  \centering
+  \includegraphics[width=.5\linewidth]{data/graphs/R1-1:10-grid-line}%
+  \includegraphics[width=.5\linewidth]{data/graphs/R1-10:1-grid-line}
+  \includegraphics[width=.5\linewidth]{data/graphs/R1-1:10-grid-torus}%
+  \includegraphics[width=.5\linewidth]{data/graphs/R1-10:1-grid-torus}
+  \includegraphics[width=.5\linewidth]{data/graphs/R1-1:10-grid-hcube}%
+  \includegraphics[width=.5\linewidth]{data/graphs/R1-10:1-grid-hcube}
+  \caption{Real mode, initially on an only mode, comp/comm ratio = 1/10 (left), or 10/1 (right).}
+  \label{fig.results1}
+\end{figure*}
+
+\begin{figure*}[p]
+  \centering
+  \includegraphics[width=.5\linewidth]{data/graphs/RN-1:10-grid-line}%
+  \includegraphics[width=.5\linewidth]{data/graphs/RN-10:1-grid-line}
+  \includegraphics[width=.5\linewidth]{data/graphs/RN-1:10-grid-torus}%
+  \includegraphics[width=.5\linewidth]{data/graphs/RN-10:1-grid-torus}
+  \includegraphics[width=.5\linewidth]{data/graphs/RN-1:10-grid-hcube}%
+  \includegraphics[width=.5\linewidth]{data/graphs/RN-10:1-grid-hcube}
+  \caption{Real mode, random initial distribution, comp/comm ratio = 1/10 (left), or 10/1 (right).}
+  \label{fig.resultsN}
+\end{figure*}
+
+\subsubsection{Main results}
+
+On fig.~\ref{fig.results1}, \dots
+
+\subsubsection{With the virtual load extension}
+
+\subsubsection{The $k$ parameter}
+
+\subsubsection{With an initial random repartition,  and larger platforms}
+
+\subsubsection{With integer load}
+
+\FIXME{what about the amount of data?}
+
+\begin{itshape}
+\FIXME{remove that part}
 Dans cet ordre:
 ...
 - comparer be/makhoul -> be tient la route
@@ -668,33 +734,31 @@ On constate quoi (vÃ©rifier avec les chiffres)?
 
 \end{itemize}
 
-\begin{itshape}
-On veut montrer quoi ? :
-\FIXME{remove that part}
+% On veut montrer quoi ? :
 
-1) best plus rapide que les autres (simple, makhoul)
-2) avantage virtual load
+% 1) best plus rapide que les autres (simple, makhoul)
+% 2) avantage virtual load
 
-Est ce qu'on peut trouver des contre exemple?
-Topologies variÃ©es
+% Est ce qu'on peut trouver des contre exemple?
+% Topologies variÃ©es
 
 
-Simulation avec temps dÃ©finies assez long et on mesure la qualitÃ© avec : volume de calcul effectuÃ©, volume de donnÃ©es Ã©changÃ©es
-Mais aussi simulation avec temps court qui montre que seul best converge
+% Simulation avec temps dÃ©finies assez long et on mesure la qualitÃ© avec : volume de calcul effectuÃ©, volume de donnÃ©es Ã©changÃ©es
+% Mais aussi simulation avec temps court qui montre que seul best converge
 
-ExpÃ©s avec ratio calcul/comm rapide et lent
+% ExpÃ©s avec ratio calcul/comm rapide et lent
 
-Quelques expÃ©s avec charge initiale alÃ©atoire plutot que sur le premier proc
+% Quelques expÃ©s avec charge initiale alÃ©atoire plutot que sur le premier proc
 
-Cadre processeurs homogÃ¨nes
+% Cadre processeurs homogÃ¨nes
 
-Topologies statiques
+% Topologies statiques
 
-On ne tient pas compte de la vitesse des liens donc on la considÃ¨re homogÃ¨ne
+% On ne tient pas compte de la vitesse des liens donc on la considÃ¨re homogÃ¨ne
 
-Prendre un rÃ©seau hÃ©tÃ©rogÃ¨ne et rendre processeur homogÃ¨ne
+% Prendre un rÃ©seau hÃ©tÃ©rogÃ¨ne et rendre processeur homogÃ¨ne
 
-Taille : 10 100 trÃ¨s gros
+% Taille : 10 100 trÃ¨s gros
 \end{itshape}
 
 \section{Conclusion and perspectives}