Add citation for SimGrid.

[mpi-energy2.git] / Heter_paper.tex
diff --git a/Heter_paper.tex b/Heter_paper.tex

index 5be2093240fbadc541f75bc58e03a256235a58a0..a569aaab4350ac75c4b4374872edf0804c79b3fb 100644
--- a/Heter_paper.tex
+++ b/Heter_paper.tex
@@ -298,7 +298,7 @@ In this section we proposed an heterogeneous scaling algorithm, (figure~\ref{HSA
  The algorithm enumerates the suitable range of available scaling factors for each node in the heterogeneous cluster and returns a set of optimal frequency scaling factors, one for each node. Because the nodes of a heterogeneous cluster have different computing powers, their workloads are imbalanced: the fastest nodes wait at the barrier for the slowest nodes to finish their work, as in figure~\ref{fig:heter}. Our algorithm takes these imbalanced workloads into account when it starts searching for the best scaling factors: it selects the initial frequency of each node in proportion to the computation times gathered from the first iteration. As an example in figure~\ref{fig:st_freq}, the algorithm does not test the first frequencies of the fastest nodes until their frequencies converge to the frequency of the slowest node. If the algorithm started by changing the frequency of the slowest nodes from the beginning, we would lose performance and would not select the best tradeoff (the distance). This case would be similar to a homogeneous cluster in which all nodes scale their frequencies together from the beginning; in that case there is only a small distance between the energy and performance curves, see for example figure~\ref{fig:r1}. The algorithm then searches for the optimal frequency scaling factors from the selected initial frequencies to the last available ones.
  \begin{figure}[t]
    \centering
-    \includegraphics[scale=0.5]{fig/start_freq.pdf}
+    \includegraphics[scale=0.5]{fig/start_freq}
    \caption{Selecting the initial frequencies}
    \label{fig:st_freq}
  \end{figure}
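The selection of the initial frequencies described in the paragraph above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that a node's computation time is inversely proportional to its frequency and ignores communication time, and the function `initial_frequencies` and its arguments are hypothetical. Each node starts at the lowest available frequency that still lets it finish no later than the slowest node, which keeps its maximum frequency.

```python
from bisect import bisect_left


def initial_frequencies(comp_times, freq_tables):
    """Hypothetical helper: starting frequency of each node.

    comp_times[i]  -- computation time of node i measured during the first
                      iteration, with every node at its maximum frequency
    freq_tables[i] -- available frequencies of node i (any order, same unit)

    Assumption: computation time is inversely proportional to frequency, so
    node i would finish together with the slowest node if it ran at
    f_max * comp_times[i] / t_slowest. Communication time is ignored.
    """
    t_slowest = max(comp_times)
    start = []
    for t, freqs in zip(comp_times, freq_tables):
        freqs = sorted(freqs)
        f_max = freqs[-1]
        ideal = f_max * t / t_slowest
        # Round up to the lowest available frequency >= ideal, so the node
        # does not become slower than the slowest node.
        idx = min(bisect_left(freqs, ideal), len(freqs) - 1)
        start.append(freqs[idx])
    return start
```

For example, with first-iteration times of 1.0 and 2.0 time units, the faster node starts at the available frequency closest to (and not below) half of its maximum, while the slower node stays at its maximum frequency.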
@@ -391,9 +391,25 @@ called in the MPI program.
  
  \section{Experimental results}
  \label{sec.expe}
-The experiments of this work are executed on the simulator SimGrid/SMPI v3.10. We configure the simulator to use a heterogeneous cluster 
-with one core per node. The proposed heterogeneous cluster has four different types of nodes. Each node type has different characteristics, 
-such as the maximum frequency, the number of available frequencies, and the dynamic and static power values; see table~\ref{table:platform}. These different types of processing nodes simulate some real Intel processors. The cluster supports at most 144 nodes, a limit that follows from the characteristics of some of the NAS benchmark MPI programs used. We use the same number of nodes of each type when running the MPI programs; for example, if we execute a program on 8 nodes, 2 nodes of each type participate in the computation. The dynamic and static power values differ from one type to another: each node has dynamic and static power values proportional to its performance in GFlops; for more details see the Intel data sheets in~\cite{47}. For each node, the dynamic power accounts for 80\% and the static power for 20\% of the whole power consumption; the same assumption is made in~\cite{45,3}. These nodes are connected via an Ethernet network with 1~Gbit/s bandwidth.
+
+The experiments of this work are executed on the simulator SimGrid/SMPI
+v3.10~\cite{casanova+giersch+legrand+al.2014.versatile}. We configure the
+simulator to use a heterogeneous cluster with one core per node. The proposed
+heterogeneous cluster has four different types of nodes. Each node type has
+different characteristics, such as the maximum frequency, the number of
+available frequencies, and the dynamic and static power values; see
+table~\ref{table:platform}. These different types of processing nodes
+simulate some real Intel processors. The cluster supports at most 144 nodes, a
+limit that follows from the characteristics of some of the NAS benchmark MPI
+programs used. We use the same number of nodes of each type when running the
+MPI programs; for example, if we execute a program on 8 nodes, 2 nodes of each
+type participate in the computation. The dynamic and static power values
+differ from one type to another: each node has dynamic and static power values
+proportional to its performance in GFlops; for more details see the Intel data
+sheets in~\cite{47}. For each node, the dynamic power accounts for 80\% and
+the static power for 20\% of the whole power consumption; the same assumption
+is made in~\cite{45,3}. These nodes are connected via an Ethernet network with
+1~Gbit/s bandwidth.
  \begin{table}[htb]
    \caption{Heterogeneous node characteristics}
    % title of Table
@@ -674,11 +690,11 @@ The results of the previous section are obtained using a percentage of 80\% for
  \begin{figure}
    \centering
    \subfloat[Comparison of the average results on 8 nodes]{%
-    \includegraphics[width=.22\textwidth]{fig/sen_comp.pdf}\label{fig:sen_comp}}%
+    \includegraphics[width=.22\textwidth]{fig/sen_comp}\label{fig:sen_comp}}%
    \quad%
    \subfloat[Comparison of the selected frequency scaling factors for 8 nodes]{%
-    \includegraphics[width=.24\textwidth]{fig/three_scenarios.pdf}\label{fig:scales_comp}}
-  \label{fig:avg}
+    \includegraphics[width=.24\textwidth]{fig/three_scenarios}\label{fig:scales_comp}}
+  \label{fig:comp}
    \caption{The comparison of the three power scenarios}
  \end{figure}

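The node mix and the power split from the experimental setup described above can be written down as a small sketch. It only encodes what the text states (four node types used in equal numbers, and an 80%/20% split of each node's power into dynamic and static parts); the names `node_mix` and `split_power` are hypothetical, and no values from the platform table are reproduced.

```python
NUM_TYPES = 4        # four types of nodes in the simulated cluster
DYNAMIC_SHARE = 0.8  # 80% of a node's power consumption is dynamic
STATIC_SHARE = 0.2   # 20% of a node's power consumption is static


def node_mix(n_nodes, n_types=NUM_TYPES):
    """Hypothetical helper: number of nodes of each type taking part in a run.

    The same number of nodes is used from every type, so n_nodes must be
    a positive multiple of n_types (e.g. 8 nodes -> 2 nodes of each type).
    """
    if n_nodes <= 0 or n_nodes % n_types != 0:
        raise ValueError("n_nodes must be a positive multiple of n_types")
    return [n_nodes // n_types] * n_types


def split_power(total_power):
    """Hypothetical helper: split a node's total power into (dynamic, static)."""
    return DYNAMIC_SHARE * total_power, STATIC_SHARE * total_power
```

For instance, a run on 8 nodes yields two nodes per type, and a hypothetical node drawing 100 W would be modeled with 80 W of dynamic and 20 W of static power.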