+To assess the performance of our algorithms, we ran our simulator with various
+parameters and extracted several metrics, which we describe in this section.
+
+\subsubsection{Load balancing strategies}
+
+We compared several load balancing strategies. First, we ran the experiments
+with the \emph{Best effort} and the \emph{Makhoul} strategies, \emph{Best effort}
+being tested with parameter $k = 1$, $k = 2$, and $k = 4$. Secondly, each
+strategy was run in its two variants: with and without the management of
+\emph{virtual load}. Finally, we tested each configuration with \emph{real} and
+with \emph{integer} load.
+
+To summarize the different load balancing strategies, we have:
+\begin{description}
+\item[\textbf{strategies:}] \emph{Makhoul} or \emph{Best effort} with
+ $k\in \{1,2,4\}$
+\item[\textbf{variants:}] with or without virtual load
+\item[\textbf{domain:}] real or integer load
+\end{description}
+%
+This gives us as many as $4\times 2\times 2 = 16$ different strategies.
+
+\subsubsection{End of the simulation}
+
+The simulations were run until the load was nearly balanced among the
+participating nodes. More precisely, the simulation stops when the load held by
+each node has remained within 1\% of the average load during an arbitrarily
+chosen number of consecutive computing iterations (2000 in our case).
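+
+One way to state this criterion formally is the following (the notation is
+introduced here for illustration only). Let $\ell_i(t)$ denote the load held by
+node~$i$ after iteration~$t$, $N$ the number of nodes, and $T = 2000$ the length
+of the observation window. The simulation stops at the first iteration~$t$ such
+that
+\[
+  \forall \tau \in \{t-T+1, \dots, t\},\ \forall i \in \{1, \dots, N\}:\quad
+  \bigl|\ell_i(\tau) - \bar{\ell}(\tau)\bigr| < 0.01\,\bar{\ell}(\tau),
+  \qquad\mbox{where}\quad
+  \bar{\ell}(\tau) = \frac{1}{N}\sum_{j=1}^{N} \ell_j(\tau).
+\]
+For instance, with an average load of 1000 load units, every node has to hold
+strictly between 990 and 1010 load units during each of the last 2000
+iterations.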
+
+Note that this convergence detection was implemented in a centralized manner.
+This is easy to do within the simulator, but it is obviously not realistic. In
+a real application, we would use a decentralized convergence detection
+algorithm, such as the one described by Bahi, Contassot-Vivier, Couturier, and
+Vernier in \cite{10.1109/TPDS.2005.2}.
+
+\subsubsection{Platforms}
+
+To show how the different strategies behave in different settings, we
+simulated the executions on two kinds of platforms, which differ by their
+underlying network topology. On the one hand, we have homogeneous platforms,
+modeled as a cluster. On the other hand, we have heterogeneous platforms,
+modeled as the interconnection of several clusters.
+
+Each cluster was modeled as a fixed number of computing nodes interconnected
+through a backbone link. Each computing node has a computing power of
+1~GFlop/s and is connected to the backbone by a network link with a bandwidth
+of 125~MB/s and a latency of 50~$\mu$s. The backbone has a bandwidth of
+2.25~GB/s and a latency of 500~$\mu$s.
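+
+As a rough illustration of these parameters, consider a simple
+latency-plus-bandwidth model that ignores the refinements of SimGrid's network
+model, such as bandwidth sharing between concurrent flows. A message of $s$
+bytes sent between two nodes of the same cluster then crosses two node links
+and the backbone, whose bottleneck bandwidth is
+$\min(125\,\mbox{MB/s}, 2.25\,\mbox{GB/s}) = 125$~MB/s, so that its transfer
+time $D(s)$ is approximately
+\[
+  D(s) \approx (50 + 500 + 50)\,\mu\mbox{s} + \frac{s}{125\,\mbox{MB/s}}
+       = 600\,\mu\mbox{s} + \frac{s}{125\,\mbox{MB/s}}.
+\]
+For example, a 1~MB message would take about $0.6 + 8 = 8.6$~ms. Under the
+same simplified model, the backbone bandwidth is $2250/125 = 18$ times that of
+a node link, so about 18 node links can transfer at full speed simultaneously
+before the backbone becomes the bottleneck.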
+
+The heterogeneous platform descriptions were created by taking a subset of the
+Grid'5000 infrastructure\footnote{Grid'5000 is a French large-scale
+ experimental grid (see \url{https://www.grid5000.fr/}).}, as described in the
+platform file \texttt{g5k.xml} distributed with SimGrid. Note that the
+heterogeneity of these platforms only comes from their network topology: since
+our algorithms currently do not handle heterogeneous computing resources, the
+processor speeds were normalized and arbitrarily set to 1~GFlop/s.
+
+We then derived each kind of platform in four sizes: 16, 64, 256, and 1024
+computing nodes.
+
+\subsubsection{Configurations}
+
+The distributed processes of the application were then logically organized
+along one of three possible topologies: a line, a torus, or a hypercube. We ran
+tests where the total load was initially held by a single node (located at one
+end of the line for the line topology), and other tests where the load was
+initially distributed randomly across all the participating nodes. The total
+amount of load was set to 1000 load units per node, so that the average load is
+1000 load units.
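+
+To make the three logical topologies more concrete, the neighborhood
+$\mathcal{N}(i)$ of process $i \in \{0, \dots, N-1\}$ may be defined as below.
+This is only a sketch: the process numbering is chosen for illustration, and
+we assume here a two-dimensional $n \times n$ torus with $n = \sqrt{N}$, which
+is possible for all our platform sizes ($16 = 4^2$, $64 = 8^2$, $256 = 16^2$,
+$1024 = 32^2$); likewise, the hypercube requires $N = 2^d$, which holds with
+$d = 4, 6, 8, 10$.
+\[
+\begin{array}{ll}
+  \mbox{line:} &
+    \mathcal{N}(i) = \{\, j \in \{i-1, i+1\} : 0 \le j \le N-1 \,\},\\[1ex]
+  \mbox{torus:} &
+    \mathcal{N}(x + ny) = \{\, ((x \pm 1) \bmod n) + ny,\;
+                            x + n((y \pm 1) \bmod n) \,\},
+    \quad 0 \le x, y < n,\\[1ex]
+  \mbox{hypercube:} &
+    \mathcal{N}(i) = \{\, i \oplus 2^b : 0 \le b < d \,\},
+\end{array}
+\]
+where $\oplus$ denotes the bitwise exclusive or. With this numbering, a node has
+at most 2 neighbors on the line, 4 on the torus, and $d$ on the hypercube,
+while the diameters are respectively $N-1$, $2\lfloor n/2 \rfloor$, and $d$.
+For $N = 1024$, load initially placed at one end of the line must thus travel
+up to 1023 hops, compared with at most 32 hops on the torus and 10 on the
+hypercube.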
+
+Finally, for each of the preceding configurations, we had to choose the
+computation and communication costs of a load unit. We chose them so as to
+obtain three different computation/communication cost ratios, and hence to
+model three different kinds of applications:
+\begin{itemize}
+\item mainly communicating, with a computation/communication cost ratio of $1/10$;
+\item mainly computing, with a computation/communication cost ratio of $10/1$;
+\item balanced, with a computation/communication cost ratio of $1/1$.
+\end{itemize}
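+
+As an illustration of what these ratios mean on our platforms, assume, for
+illustration only, that the ratio $\rho$ compares the time needed to compute
+one load unit of $w$ flops on a 1~GFlop/s node with the time needed to transfer
+its $b$ bytes over a 125~MB/s node link of the homogeneous cluster, ignoring
+latency. Then
+\[
+  \rho = \frac{w / (10^9\,\mbox{flop/s})}{b / (125 \times 10^6\,\mbox{B/s})}
+       = \frac{w}{8\,b}
+  \quad\Longleftrightarrow\quad
+  \frac{w}{b} = 8\rho\ \mbox{flop/B},
+\]
+so that, under this assumption, the ratios $10/1$, $1/1$, and $1/10$ would
+correspond to $80$, $8$, and $0.8$ flop per byte of load unit, respectively.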
+
+To summarize the various configurations, we have:
+\begin{description}
+\item[\textbf{platforms:}] homogeneous (cluster) or heterogeneous (subset of
+ Grid'5000)
+\item[\textbf{platform sizes:}] platforms with 16, 64, 256, or 1024 nodes
+\item[\textbf{process topologies:}] line, torus, or hypercube
+\item[\textbf{initial load distribution:}] initially on a single node, or
+ randomly distributed over all nodes
+\item[\textbf{computation/communication ratio:}] $10/1$, $1/1$, or $1/10$
+\end{description}
+%
+This gives us as many as $2\times 4\times 3\times 2\times 3 = 144$ different
+configurations.
+%
+Combined with the various load balancing strategies, this gives $16\times 144 =
+2304$ distinct settings to evaluate. In practice, as will be shown later, we
+did not run all the strategies, nor all the configurations, on the largest
+platforms with 1024 nodes, since the simulations would have taken too long.
+
+Overall, these experiments represent more than 240 hours of computing time.
+
+\subsubsection{Metrics}
+
+To evaluate and compare the different load balancing strategies, we had to
+define several metrics. When choosing these metrics, our goal was to obtain
+quantities tending to a constant value, i.e., measures that no longer change
+once convergence is reached. Moreover, we wanted normalized values, so that
+they can be compared across different settings.
+
+With these constraints in mind, we defined the following metrics:
+%