wip

diff --git a/supercomp11/supercomp11.tex b/supercomp11/supercomp11.tex

index 3a1ec31ce717a0b652e07596b528264e97c93ab2..3010b1be6b6830b9cbd4615a113936e5a933fe49 100644
--- a/supercomp11/supercomp11.tex
+++ b/supercomp11/supercomp11.tex
@@ -468,28 +468,87 @@ available at
  
  In order to assess the performances of our algorithms, we ran our
  simulator with various parameters, and extracted several metrics, that
-we will describe in this section.  Overall, the experiments represent
-more than 240 hours of computing time.
+we will describe in this section.
  
  \paragraph{Load balancing strategies}
  
-We ran the experiments with the \emph{Best effort}, and with the \emph{Makhoul}
-strategies.  \emph{Best effort} was tested with parameter $k = 1$, $k = 2$, and
-$k = 4$.  Secondly, each strategy was run in its two variants: with, and without
-the management of \emph{virtual load}.  Finally, we tested each configuration
-with \emph{real}, and with \emph{integer} load.
-This gives us as many as 32 different strategies.
+Several load balancing strategies were compared.  We ran the experiments with
+the \emph{Best effort}, and with the \emph{Makhoul} strategies.  \emph{Best
+  effort} was tested with parameter $k = 1$, $k = 2$, and $k = 4$.  Secondly,
+each strategy was run in its two variants: with, and without the management of
+\emph{virtual load}.  Finally, we tested each configuration with \emph{real},
+and with \emph{integer} load.
+
+To summarize the different load balancing strategies, we have:
+\begin{description}
+\item[\textbf{strategies:}] \emph{Makhoul}, or \emph{Best effort} with $k\in
+  \{1,2,4\}$
+\item[\textbf{variants:}] with, or without virtual load
+\item[\textbf{domain:}] real load, or integer load
+\end{description}
+%
+This gives us as many as $4\times 2\times 2 = 16$ different strategies.
+
  
  \paragraph{Configurations}
+
+In order to show the behaviour of the different strategies in different
+settings, we simulated the executions on two sorts of platforms.  These two
+sorts of platforms differ by their underlying network topology.  On the one hand,
+we have homogeneous platforms, modeled as a cluster.  On the other hand, we have
+heterogeneous platforms, modeled as the interconnection of a number of clusters.
+The heterogeneous platform descriptions were created by taking a subset of the
+Grid'5000 infrastructure\footnote{Grid'5000 is a French large scale experimental
+  Grid (see \url{https://www.grid5000.fr/}).}, as described in the platform file
+\texttt{g5k.xml} distributed with SimGrid.  Note that the heterogeneity of the
+platform only comes from the network topology.  The processor speeds and
+network bandwidths were normalized, since our algorithms are currently not aware
+of such heterogeneity.  We arbitrarily chose to fix the processor speed to
+1~GFlop/s, and the network bandwidth to 125~MB/s, with a latency of 50~$\mu$s,
+except for the links between geographically distant sites, where the network
+bandwidth was fixed to 2.25~GB/s, with a latency of 500~$\mu$s.
+
+Then we derived each sort of platform with four different numbers of computing
+nodes: 16, 64, 256, and 1024 nodes.
+
+The distributed processes of the application were then logically organized along
+three possible topologies: a line, a torus, or a hypercube.  We ran tests where
+the total load was initially on a single node (at one end for the line topology),
+and other tests where the load was initially randomly distributed across all
+the participating nodes.
+
+For each of the preceding configurations, we finally had to choose the
+computation and communication costs of a load unit.  We chose them so as to
+have three different computation-to-communication cost ratios, and hence model
+three different kinds of applications:
+\begin{itemize}
+\item mainly communicating, with a computation/communication cost ratio of $1/10$;
+\item mainly computing, with a computation/communication cost ratio of $10/1$;
+\item balanced, with a computation/communication cost ratio of $1/1$.
+\end{itemize}
+
+To summarize the various configurations, we have:
  \begin{description}
-\item[\textbf{platforms}] homogeneous (cluster); heterogeneous (subset
-  of Grid5000)
-\item[\textbf{platform size}] platforms with 16, 64, 256, and 1024 nodes
-\item[\textbf{topologies}] line; torus; hypercube
-\item[\textbf{initial load distribution}] initially on a only node;
-  initially on all nodes
-\item[\textbf{comp/comm ratio}] $10/1$, $1/1$, $1/10$
+\item[\textbf{platforms:}] homogeneous (cluster), or heterogeneous (subset of
+  Grid'5000)
+\item[\textbf{platform sizes:}] platforms with 16, 64, 256, or 1024 nodes
+\item[\textbf{process topologies:}] line, torus, or hypercube
+\item[\textbf{initial load distribution:}] initially on a single node, or
+  initially randomly distributed over all nodes
+\item[\textbf{computation/communication ratio:}] $10/1$, $1/1$, or $1/10$
  \end{description}
+%
+This gives us as many as $2\times 4\times 3\times 2\times 3 = 144$ different
+configurations.
+%
+Combined with the various load balancing strategies, we had $16\times 144 =
+2304$ distinct settings to evaluate.  In fact, as will be shown later, we
+did not run all the strategies, nor all the configurations, for the bigger
+platforms with 1024 nodes, since the simulations would have run for too long a
+time.
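The counts quoted in this hunk can be cross-checked by enumerating the parameter space. The following is a minimal sketch in Python, not part of the paper or of the simulator: every label string and variable name is a hypothetical stand-in for the options listed in the two summaries, and the product is taken over the full space, ignoring the 1024-node settings that were not actually run.

```python
from itertools import product

# Hypothetical labels for the options listed in the text; the simulator's
# actual option names are not given in the source.
strategies = ["makhoul", "best-effort k=1", "best-effort k=2", "best-effort k=4"]
variants = ["with virtual load", "without virtual load"]
domains = ["real load", "integer load"]

platforms = ["homogeneous cluster", "heterogeneous Grid'5000 subset"]
sizes = [16, 64, 256, 1024]
topologies = ["line", "torus", "hypercube"]
distributions = ["single node", "random over all nodes"]
ratios = ["10/1", "1/1", "1/10"]

# One tuple per load balancing strategy (strategy x variant x domain).
load_balancers = list(product(strategies, variants, domains))

# One tuple per platform/application configuration.
configurations = list(product(platforms, sizes, topologies, distributions, ratios))

# Every strategy paired with every configuration.
settings = list(product(load_balancers, configurations))

print(len(load_balancers), len(configurations), len(settings))
```

Running the sketch prints `16 144 2304`, matching the figures stated in the added text.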
+
+Overall, these experiments represent more than 240 hours of computing
+time.
  
  \paragraph{Metrics}