X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/loba-papers.git/blobdiff_plain/c8b7a837b49a20dcbce655c5c891aea3f25542f5..f13866736c2d1dda2bc227e4a05626ea5535e7d6:/supercomp11/supercomp11.tex diff --git a/supercomp11/supercomp11.tex b/supercomp11/supercomp11.tex index 49477d7..1faf1b0 100644 --- a/supercomp11/supercomp11.tex +++ b/supercomp11/supercomp11.tex @@ -29,12 +29,12 @@ } \institute{R. Couturier \and A. Giersch \at - LIFC, University of Franche-Comté, Belfort, France \\ + FEMTO-ST, University of Franche-Comté, Belfort, France \\ % Tel.: +123-45-678910\\ % Fax: +123-45-678910\\ \email{% - raphael.couturier@univ-fcomte.fr, - arnaud.giersch@univ-fcomte.fr} + raphael.couturier@femto-st.fr, + arnaud.giersch@femto-st.fr} } \maketitle @@ -489,8 +489,19 @@ To summarize the different load balancing strategies, we have: % This gives us as many as $4\times 2\times 2 = 16$ different strategies. +\paragraph{End of the simulation} -\paragraph{Configurations} +The simulations were run until the load was nearly balanced among the +participating nodes. More precisely the simulation stops when each node holds +an amount of load at less than 1\% of the load average, during an arbitrary +number of computing iterations (2000 in our case). + +Note that this convergence detection was implemented in a centralized manner. +This is easy to do within the simulator, but it's obviously not realistic. In a +real application we would have chosen a decentralized convergence detection +algorithm, like the one described in \cite{10.1109/TPDS.2005.2}. + +\paragraph{Platforms} In order to show the behavior of the different strategies in different settings, we simulated the executions on two sorts of platforms. These two @@ -511,11 +522,15 @@ bandwidth was fixed to 2.25~GB/s, with a latency of 500~$\mu$s. Then we derived each sort of platform with four different number of computing nodes: 16, 64, 256, and 1024 nodes. +\paragraph{Configurations} + The distributed processes of the application were then logically organized along three possible topologies: a line, a torus or an hypercube. We ran tests where the total load was initially on an only node (at one end for the line topology), -and other tests where the load was initially randomly distributed across all -the participating nodes. +and other tests where the load was initially randomly distributed across all the +participating nodes. The total amount of load was fixed to a number of load +units equal to 1000 times the number of node. The average load is then of 1000 +load units. For each of the preceding configuration, we finally had to choose the computation and communication costs of a load unit. We chose them, such as to @@ -552,13 +567,45 @@ time. \paragraph{Metrics} +In order to evaluate and compare the different load balancing strategies we had +to define several metrics. Our goal, when choosing these metrics, was to have +something tending to a constant value, i.e. to have a measure which is not +changing anymore once the convergence state is reached. Moreover, we wanted to +have some normalized value, in order to be able to compare them across different +settings. + +With these constraints in mind, we defined the following metrics: +% \begin{description} -\item[\textbf{average idle time}] -\item[\textbf{average convergence date}] -\item[\textbf{maximum convergence date}] -\item[\textbf{data transfer amount}] relative to the total data amount +\item[\textbf{average idle time:}] that's the total time spent, when the nodes + don't hold any share of load, and thus have nothing to compute. This total + time is divided by the number of participating nodes, such as to have a number + that can be compared between simulations of different sizes. + + This metric is expected to give an idea of the ability of the strategy to + diffuse the load quickly. A smaller value is better. + +\item[\textbf{average convergence date:}] that's the average of the dates when + all nodes reached the convergence state. The dates are measured as a number + of (simulated) seconds since the beginning of the simulation. + +\item[\textbf{maximum convergence date:}] that's the date when the last node + reached the convergence state. + + These two dates give an idea of the time needed by the strategy to reach the + equilibrium state. A smaller value is better. + +\item[\textbf{data transfer amount:}] that's the sum of the amount of all data + transfers during the simulation. This sum is then normalized by dividing it + by the total amount of data present in the system. + + This metric is expected to give an idea of the efficiency of the strategy in + terms of data movements, i.e. its ability to reach the equilibrium with fewer + transfers. Again, a smaller value is better. + \end{description} + \subsection{Validation of our approaches} \label{Results}