From: Arnaud Giersch Date: Mon, 17 Mar 2014 22:28:18 +0000 (+0100) Subject: Corrections + reindent source code. X-Git-Tag: ispa14_submission~33 X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/mpi-energy.git/commitdiff_plain/9a13cef00784821bbdca89ede93342ec0e21938e Corrections + reindent source code. Hint: use "git log -p --word-diff" to see the differences. --- diff --git a/paper.tex b/paper.tex index d7ced09..8039f59 100644 --- a/paper.tex +++ b/paper.tex @@ -40,16 +40,18 @@ Complete affiliation, add an email address, etc.} \begin{abstract} -The important technique for energy reduction of parallel systems is CPU frequency -scaling. This operation used by many researchers to reduce energy consumption in many -ways. Frequency scaling operation also has big impact on the performance. In some cases, -the performance degradation ratio is bigger than energy saving ratio when the frequency scaled -to down level. Therefore, the trade offs between the energy and performance becomes more -important topic when using this technique. In this paper we developed an algorithm that -select the frequency scaling factor for both energy and performance simultaneously. -This algorithm takes into account the communication times when selecting the frequency scaling -factor. It is works online without training or profiling to have very small overhead. -The algorithm has better energy-performance trade offs compared to other methods. + The important technique for energy reduction of parallel systems is CPU + frequency scaling. This operation is used by many researchers to reduce energy + consumption in many ways. Frequency scaling operation also has a big impact on + the performances. In some cases, the performance degradation ratio is bigger + than energy saving ratio when the frequency is scaled to low level. Therefore, + the trade offs between the energy and performance becomes more important topic + when using this technique. In this paper we developed an algorithm that select + the frequency scaling factor for both energy and performance simultaneously. + This algorithm takes into account the communication times when selecting the + frequency scaling factor. It works online without training or profiling to + have a very small overhead. The algorithm has better energy-performance trade + offs compared to other methods. \end{abstract} \section{Introduction} @@ -97,13 +99,16 @@ proposed algorithm with Rauber and Rünger methods~\cite{3}. The comparison's results show that our algorithm gives better energy-time trade off. -This paper is organized as follows: Section~\ref{sec.relwork} presents the works from other authors. -Section~\ref{sec.ptasks} shows the execution of parallel tasks and sources of idle times. Section~\ref{sec.energy} resumes the -energy model of homogeneous platform. Section~\ref{sec.mpip} evaluates the performance of MPI program. -Section~\ref{sec.verif} verifies the performance prediction model. Section~\ref{sec.compet} presents -the energy-performance trade offs objective function. Section~\ref{sec.optim} demonstrates the proposed -energy-performance algorithm. Section~\ref{sec.expe} presents the results of our experiments. -Section~\ref{sec.compare} shows the comparison results. Finally, we conclude in Section~\ref{sec.concl}. +This paper is organized as follows: Section~\ref{sec.relwork} presents the works +from other authors. Section~\ref{sec.ptasks} shows the execution of parallel +tasks and sources of idle times. Section~\ref{sec.energy} resumes the energy +model of homogeneous platform. Section~\ref{sec.mpip} evaluates the performance +of MPI program. Section~\ref{sec.verif} verifies the performance prediction +model. Section~\ref{sec.compet} presents the energy-performance trade offs +objective function. Section~\ref{sec.optim} demonstrates the proposed +energy-performance algorithm. Section~\ref{sec.expe} presents the results of our +experiments. Section~\ref{sec.compare} shows the comparison results. Finally, +we conclude in Section~\ref{sec.concl}. \section{Related Works} \label{sec.relwork} @@ -118,40 +123,43 @@ presented and classified in two parts: offline and online methods. The DVFS offline methods are static and are not executed during the runtime of the program. Some approaches used heuristics to select the best DVFS state -during the compilation phases as for example in Azevedo et al.~\cite{40}. They use -dynamic voltage scaling (DVS) algorithm to choose the DVS setting when there are dependency points -between tasks. While in~\cite{29}, Xie et al. used breadth-first search -algorithm to do that. Their goal is to save energy with time limits. Another -approach gathers and stores the runtime information for each DVFS state, then -selects the suitable DVFS offline to optimize energy-time -trade offs. As an example Rountree et al.~\cite{8}, use liner programming +during the compilation phases as for example in Azevedo et al.~\cite{40}. They +use dynamic voltage scaling (DVS) algorithm to choose the DVS setting when there +are dependency points between tasks. While in~\cite{29}, Xie et al. used +breadth-first search algorithm to do that. Their goal is to save energy with +time limits. Another approach gathers and stores the runtime information for +each DVFS state, then selects the suitable DVFS offline to optimize energy-time +trade offs. As an example, Rountree et al.~\cite{8} use liner programming algorithm, while in~\cite{38,34}, Cochran et al. use multi logistic regression algorithm for the same goal. The offline study that shows the DVFS impact on the -communication time of the MPI program is~\cite{17}, where Freeh et al. show that these -times do not change when the frequency is scaled down. +communication time of the MPI program is~\cite{17}, where Freeh et al. show that +these times do not change when the frequency is scaled down. \subsection{The online DVFS orientations} -The objective of online DVFS orientations works is to dynamically compute and set the frequency of -the CPU during the runtime of the program for saving energy. Estimating and -predicting approaches for the energy-time trade offs are developed by Kimura, Peraza, Yu-Liang et al. -~\cite{11,2,31}. These works select the best DVFS setting depending on the slack -times. These times happen when the processors have to wait for data from other -processors to compute their task. For example, during the synchronous -communications that take place in MPI programs, some processors are -idle. The optimal DVFS can be selected using learning methods. Therefore, in Dhiman, Hao Shen et al. -~\cite{39,19} used machine learning to converge to the suitable DVFS -configuration. Their learning algorithms take big time to converge when the -number of available frequencies is high. Also, the communication sections of the MPI -program can be used to save energy. In~\cite{1}, Lim et al. developed an -algorithm that detects the communication sections and changes the frequency -during these sections only. This approach changes the frequency many times -because an iteration may contain more than one communication section. The domain -of analytical modeling used for choosing the optimal frequency as in Rauber and Rünger~\cite{3}. they -developed an analytical mathematical model to determine the -optimal frequency scaling factor for any number of concurrent tasks. They set the slowest task to maximum frequency for maintaining performance. In this paper we compare our algorithm with -Rauber and Rünger model~\cite{3}, because their model can be used for any number of -concurrent tasks for homogeneous platforms. The primary contributions of this paper are: +The objective of the online DVFS orientations is to dynamically compute and set +the frequency of the CPU for saving energy during the runtime of the +programs. Estimating and predicting approaches for the energy-time trade offs +are developed by Kimura, Peraza, Yu-Liang et al. ~\cite{11,2,31}. These works +select the best DVFS setting depending on the slack times. These times happen +when the processors have to wait for data from other processors to compute their +task. For example, during the synchronous communications that take place in MPI +programs, some processors are idle. The optimal DVFS can be selected using +learning methods. Therefore, in Dhiman, Hao Shen et al. ~\cite{39,19} used +machine learning to converge to the suitable DVFS configuration. Their learning +algorithms take big time to converge when the number of available frequencies is +high. Also, the communication sections of the MPI program can be used to save +energy. In~\cite{1}, Lim et al. developed an algorithm that detects the +communication sections and changes the frequency during these sections +only. This approach changes the frequency many times because an iteration may +contain more than one communication section. The domain of analytical modeling +can also be used for choosing the optimal frequency as in Rauber and +Rünger~\cite{3}. They developed an analytical mathematical model to determine +the optimal frequency scaling factor for any number of concurrent tasks. They +set the slowest task to maximum frequency for maintaining performance. In this +paper we compare our algorithm with Rauber and Rünger model~\cite{3}, because +their model can be used for any number of concurrent tasks for homogeneous +platforms. The primary contributions of this paper are: \begin{enumerate} \item Selecting the frequency scaling factor for simultaneously optimizing energy and performance, while taking into account the communication time. @@ -232,16 +240,18 @@ maximum and the new frequency as in EQ~(\ref{eq:s}). \label{eq:s} S = \frac{F_\textit{max}}{F_\textit{new}} \end{equation} -The value of the scale $S$ is greater than 1 when changing the frequency to -any new frequency value~(\emph {P-state}) in governor, the CPU governor is an interface -driver supplied by the operating system kernel (e.g. Linux) to lowering core's frequency. -The scaling factor is equal to 1 when the frequency set is to the maximum frequency. -The energy consumption model for parallel homogeneous platform depends on the scaling factor \emph S. This factor reduces quadratically the dynamic power. Also, this factor increases the -static energy linearly because the execution time is increased~\cite{36}. The -energy model depending on the frequency scaling factor for homogeneous platform -for any number of concurrent tasks was developed by Rauber and Rünger~\cite{3}. This model -considers the two power metrics for measuring the energy of the parallel tasks as -in EQ~(\ref{eq:energy}): +The value of the scale $S$ is greater than 1 when changing the frequency to any +new frequency value~(\emph {P-state}) in governor, the CPU governor is an +interface driver supplied by the operating system kernel (e.g. Linux) to +lowering core's frequency. The scaling factor is equal to 1 when the frequency +set is to the maximum frequency. The energy consumption model for parallel +homogeneous platform depends on the scaling factor \emph S. This factor reduces +quadratically the dynamic power. Also, this factor increases the static energy +linearly because the execution time is increased~\cite{36}. The energy model +depending on the frequency scaling factor for homogeneous platform for any +number of concurrent tasks was developed by Rauber and Rünger~\cite{3}. This +model considers the two power metrics for measuring the energy of the parallel +tasks as in EQ~(\ref{eq:energy}): \begin{equation} \label{eq:energy} @@ -265,12 +275,13 @@ the time value $T_i$ depends on the new frequency value as in EQ~(\ref{eq:si}). = \frac{F_\textit{max}}{F_\textit{new}} \cdot \frac{T_1}{T_i} \end{equation} where $F$ is the number of available frequencies. In this paper we depend on -Rauber and Rünger energy model EQ~(\ref{eq:energy}) for two reasons: (1)-this model is used -for homogeneous platform that we work on in this paper. 2-we compare our -algorithm with Rauber and Rünger scaling model. Rauber and Rünger scaling factor that reduce - energy consumption derived from the EQ~(\ref{eq:energy}). They take the -derivation for this equation (to be minimized) and set it to zero to produce the -scaling factor as in EQ~(\ref{eq:sopt}). +Rauber and Rünger energy model EQ~(\ref{eq:energy}) for two reasons: (1) this +model is used for homogeneous platform that we work on in this paper, and (2) we +compare our algorithm with Rauber and Rünger scaling model. Rauber and Rünger +scaling factor that reduce energy consumption derived from the +EQ~(\ref{eq:energy}). They take the derivation for this equation (to be +minimized) and set it to zero to produce the scaling factor as in +EQ~(\ref{eq:sopt}). \begin{equation} \label{eq:sopt} S_\textit{opt} = \sqrt[3]{\frac{2}{n} \cdot \frac{P_\textit{dyn}}{P_\textit{static}} \cdot @@ -765,9 +776,11 @@ In this paper we develop the simultaneous energy-performance algorithm. It is wo \AG{Right?} Computations have been performed on the supercomputer facilities of the -Mésocentre de calcul de Franche-Comté. As a PhD student , M. Ahmed Fanfakh , would -likes to thank the University of Babylon /Iraq for supporting my scholarship program that allows me -working on this paper. +Mésocentre de calcul de Franche-Comté. +As a PhD student, M. Ahmed Fanfakh, would like to thank the University of +Babylon (Iraq) for supporting his scholarship program that allows him to work on +this paper. +\AG{What about simply: ``[...] for supporting his work.''} % trigger a \newpage just before the given reference % number - used to balance the columns on the last page