X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/mpi-energy.git/blobdiff_plain/9c1a517c1bdb917fc08b852362114c77b7b1636d..fee64fcb60f40dd77da66ab700cae962ede23ad4:/paper.tex

diff --git a/paper.tex b/paper.tex
index 7e30a9c..a89ffb1 100644
--- a/paper.tex
+++ b/paper.tex
@@ -118,7 +118,6 @@ we conclude in Section~\ref{sec.concl} with a summary and some future work.
 
 \section{Related works}
 \label{sec.relwork}
-\AG{Consider introducing the models (sec.~\ref{sec.exe}) before related works}
 
 In this section, some heuristics to compute the scaling factor are presented and
 classified into two categories: offline and online methods.
@@ -133,7 +132,7 @@ values could be computed based on information retrieved by analyzing the code of
 the program and the computing system that will execute it.
 In~\cite{40}, Azevedo et al. detect during compilation the dependency points between
-tasks in a parallel program. This information is then used to lower the frequency of
+tasks in a multi-task program. This information is then used to lower the frequency of
 some processors in order to eliminate slack times. A slack time is the period of
 time during which a processor that has already finished its computation has to
 wait for a set of processors to finish their computations and send their results
 to the waiting processor in order to continue its task that is
@@ -156,7 +155,7 @@ To maintain the performance of the parallel program, they set the processor
 with the biggest load to the highest gear and then compute the scaling factor
 values for the rest of the processors. Although this model was built for parallel
 architectures, it can be adapted to distributed architectures by taking into
 account the communications. The primary contribution of our paper is presenting a
 new online scaling factor selection method which has the following characteristics:
 \begin{enumerate}
-\item It is based on Rauber's analytical model to predict the energy consumption and the execution time of the application with different frequency gears.
+\item It is based on Rauber and Rünger's analytical model to predict the energy consumption of the application with different frequency gears.
 \item It selects the frequency scaling factor that simultaneously optimizes energy reduction and maintains performance.
 \item It is well adapted to distributed architectures because it takes into account the communication time.
 \item It is well adapted to distributed applications with imbalanced tasks.
@@ -241,7 +240,7 @@ new frequency value~(\emph{P-state}) in the governor. The CPU governor is an
 interface driver supplied by the operating system's kernel to lower a core's
 frequency. This factor reduces the dynamic power quadratically, but it may degrade
 performance and thus increase the static energy because the execution time is
 increased~\cite{36}. If the tasks are sorted according to their execution times
 before scaling in a descending order, the total energy consumption model for a parallel
-homogeneous platform, as presented by Rauber et al.~\cite{3}, can be written as a function of the scaling factor \emph S, as in EQ~(\ref{eq:energy}).
+homogeneous platform, as presented by Rauber and Rünger~\cite{3}, can be written as a function of the scaling factor \emph{S}, as in EQ~(\ref{eq:energy}).
 \begin{equation}
   \label{eq:energy}
@@ -269,7 +268,6 @@ EQ~(\ref{eq:energy}).
 The optimal scaling factor is computed by minimizing the derivative of this
 energy equation with respect to \emph{S}, which produces EQ~(\ref{eq:sopt}):
 \begin{equation}
   \label{eq:sopt}
   S_{opt} = \sqrt[3]{\frac{2}{N} \cdot \frac{P_{dyn}}{P_{static}} \cdot
     \left( 1 + \sum_{i=2}^{N} \frac{T_i^3}{T_1^3} \right) }
 \end{equation}
-\JC{The following 2 sections can be merged easily}
 
 \section{Performance evaluation of MPI programs}
 \label{sec.mpip}
@@ -484,7 +482,7 @@ frequency by the new one, see EQ~(\ref{eq:s}). In our cluster there are 18
 available frequency states for each processor. This leads to 18 run states for
 each program. We use seven MPI programs of the NAS parallel benchmarks: CG, MG, EP, FT, BT, LU
-and SP. Figure~(\ref{fig:pred}) presents plots of the real execution times and the simulated ones. The maximum normalized error between the predicted execution time and the real time (SimGrid time) for all programs is between 0.0073 to 0.031. The better case is for CG and the worse case is for LU.
+and SP. Figure~(\ref{fig:pred}) presents plots of the real execution times and the simulated ones. The maximum normalized error between these two execution times varies between 0.0073 and 0.031 depending on the executed benchmark. The smallest prediction error was for CG and the largest one was for LU.
 
 \subsection{The experimental results for the scaling algorithm}
 The proposed algorithm was applied to seven MPI programs of the NAS benchmarks
 (EP, CG, MG, FT, BT, LU and SP) which were run with three classes (A, B and
@@ -496,7 +494,7 @@ respectively. Using EQ~(\ref{eq:energy}), we measure the energy consumption
 for all the NAS MPI programs while assuming that the dynamic power at the highest
 frequency is equal to \np[W]{20} and the static power is equal to \np[W]{4} for
 all experiments. These power values were also
-used by Rauber and Rünger in~\cite{3}. The results showed that the algorithm selected
+used by Rauber and Rünger in~\cite{3}. The results showed that the algorithm selected
 different scaling factors for each program depending on the communication
 features of the program as in the plots~(\ref{fig:nas}). These plots illustrate
 that there are different distances between the normalized energy and the normalized
@@ -708,8 +706,8 @@ In the near future, we would like to adapt this scaling factor selection method
 
 \section*{Acknowledgment}
 
-\AG{Jean-Claude, why did you remove the Mésocentre here?}
-As a PhD student, M. Ahmed Fanfakh, would like to thank the University of
+This work has been supported by the Labex ACTION project (contract ``ANR-11-LABX-01-01''). Computations have been performed on the supercomputer facilities of the
+Mésocentre de calcul de Franche-Comté. As a PhD student, M. Ahmed Fanfakh, would like to thank the University of
 Babylon (Iraq) for supporting his work.
 
 % trigger a \newpage just before the given reference
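
To make the selection rule in EQ~(\ref{eq:sopt}) concrete, here is a minimal sketch of how the optimal scaling factor could be evaluated and then mapped onto the 18 discrete frequency gears mentioned in the experiments. This is not the paper's implementation: the function names, the example task times, and the frequency list are hypothetical; only the formula, the definition S = f_max / f_new from EQ~(\ref{eq:s}), and the power values (20 W dynamic, 4 W static) come from the text above.

    from math import fsum

    def optimal_scaling_factor(times, p_dyn, p_static):
        """Evaluate EQ (eq:sopt):
        S_opt = cbrt((2/N) * (Pdyn/Pstatic) * (1 + sum_{i>=2} (T_i/T_1)^3)).

        `times` holds the computation time of each of the N processes,
        sorted in descending order, so times[0] is T_1 (the slowest task).
        """
        n = len(times)
        t1 = times[0]
        ratio_sum = 1.0 + fsum((t / t1) ** 3 for t in times[1:])
        return ((2.0 / n) * (p_dyn / p_static) * ratio_sum) ** (1.0 / 3.0)

    def nearest_gear(f_max, s_opt, available_freqs):
        """Map the continuous S_opt onto a discrete P-state.

        From EQ (eq:s), S = f_max / f_new, so the ideal new frequency is
        f_max / S_opt; we return the closest supported frequency.
        """
        target = f_max / s_opt
        return min(available_freqs, key=lambda f: abs(f - target))

    # Power values used in the paper's experiments (also used in [3]).
    P_DYN, P_STATIC = 20.0, 4.0

    # Hypothetical per-process computation times (seconds), descending order.
    times = [12.0, 10.5, 9.8, 7.2]
    s_opt = optimal_scaling_factor(times, P_DYN, P_STATIC)

    # Hypothetical list of the 18 available frequencies (GHz).
    freqs = [round(1.2 + 0.08 * i, 2) for i in range(18)]
    print(f"S_opt = {s_opt:.3f}, "
          f"selected gear = {nearest_gear(freqs[-1], s_opt, freqs)} GHz")

Rounding to the nearest supported gear is one plausible way to discretize S_opt; the paper only states that 18 P-states are available, not which rounding rule its algorithm uses.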
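The prediction-error figures quoted above (a maximum normalized error between 0.0073 and 0.031 across the benchmarks) can be reproduced from raw timings in a few lines. The excerpt does not spell out the error formula, so the common definition |T_real - T_pred| / T_real is assumed here, and the sample data is invented for illustration.

    def max_normalized_error(real_times, predicted_times):
        """Maximum normalized prediction error over a set of runs.

        Assumes error = |T_real - T_pred| / T_real for each run; the
        excerpt does not give the exact definition.
        """
        return max(abs(r - p) / r
                   for r, p in zip(real_times, predicted_times))

    # Invented real (measured) vs. simulated (SimGrid) times for one benchmark.
    real = [30.2, 15.4, 8.1]
    simulated = [30.0, 15.0, 8.3]
    print(f"max normalized error = "
          f"{max_normalized_error(real, simulated):.4f}")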