Complete affiliation, add an email address, etc.}
\begin{abstract}
  CPU frequency scaling is an important technique for reducing the energy
  consumption of parallel systems, and it has been used by many researchers in
  many ways. However, scaling the frequency also has a big impact on
  performance: in some cases, the performance degradation is larger than the
  energy saving when the frequency is scaled down to a low level. Therefore,
  the trade-off between energy and performance becomes an important issue when
  using this technique. In this paper, we develop an algorithm that selects the
  frequency scaling factor for both energy and performance simultaneously. The
  algorithm takes the communication times into account when selecting the
  scaling factor, and it works online, without training or profiling, so its
  overhead is very small. It achieves a better energy-performance trade-off
  than other methods.
\end{abstract}
\section{Introduction}
The comparison results show that our algorithm gives a better energy-time
trade-off.
This paper is organized as follows: Section~\ref{sec.relwork} presents related
works from other authors. Section~\ref{sec.ptasks} describes the execution of
parallel tasks and the sources of idle times. Section~\ref{sec.energy}
summarizes the energy model for a homogeneous platform. Section~\ref{sec.mpip}
evaluates the performance of an MPI program. Section~\ref{sec.verif} verifies
the performance prediction model. Section~\ref{sec.compet} presents the
energy-performance trade-off objective function. Section~\ref{sec.optim}
describes the proposed energy-performance algorithm. Section~\ref{sec.expe}
presents the results of our experiments. Section~\ref{sec.compare} shows the
comparison results. Finally, we conclude in Section~\ref{sec.concl}.
\section{Related Works}
\label{sec.relwork}
Offline DVFS methods are static and are not executed during the runtime of the
program. Some approaches use heuristics to select the best DVFS state
during the compilation phase, as for example in Azevedo et al.~\cite{40}. They
use a dynamic voltage scaling (DVS) algorithm to choose the DVS setting when
there are dependency points between tasks, while in~\cite{29}, Xie et al. use a
breadth-first search algorithm for the same purpose. Their goal is to save
energy within time limits. Another approach gathers and stores runtime
information for each DVFS state, then selects the suitable DVFS setting offline
to optimize the energy-time trade-off. For example, Rountree et al.~\cite{8}
use a linear programming algorithm, while in~\cite{38,34}, Cochran et al. use a
multi-logistic regression algorithm for the same goal. The offline study that
shows the impact of DVFS on the
communication time of an MPI program is~\cite{17}, where Freeh et al. show that
these times do not change when the frequency is scaled down.
\subsection{The online DVFS methods}
The objective of the online DVFS methods is to dynamically compute and set the
frequency of the CPU during the runtime of the program in order to save energy.
Estimating and predicting approaches for the energy-time trade-off were
developed by Kimura, Peraza and Yu-Liang et al.~\cite{11,2,31}. These works
select the best DVFS setting depending on the slack times, which occur when
some processors have to wait for data from other processors before computing
their tasks. For example, during the synchronous communications that take place
in MPI programs, some processors are idle. The optimal DVFS setting can also be
selected using learning methods: Dhiman et al. and Hao Shen et
al.~\cite{39,19} used machine learning to converge to the suitable DVFS
configuration, but their learning algorithms take a long time to converge when
the number of available frequencies is high. The communication sections of an
MPI program can also be used to save energy. In~\cite{1}, Lim et al. developed
an algorithm that detects the communication sections and changes the frequency
during these sections only. This approach changes the frequency many times per
iteration because an iteration may contain more than one communication section.
Analytical modeling can also be used for choosing the optimal frequency, as in
Rauber and Rünger~\cite{3}. They developed an analytical mathematical model to
determine the optimal frequency scaling factor for any number of concurrent
tasks, and they set the slowest task to the maximum frequency in order to
maintain performance. In this paper, we compare our algorithm with the Rauber
and Rünger model~\cite{3} because their model can be used for any number of
concurrent tasks on homogeneous platforms. The primary contributions of this
paper are:
\begin{enumerate}
\item Selecting the frequency scaling factor for simultaneously optimizing energy and performance,
while taking into account the communication time.
\label{eq:s}
S = \frac{F_\textit{max}}{F_\textit{new}}
\end{equation}
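As a simple illustration, and not as part of the proposed algorithm, the
following C sketch computes the scaling factor of every available frequency of
core 0 on a Linux machine. It assumes that the cpufreq driver exposes the list
of available frequencies through the sysfs file
\texttt{scaling\_available\_frequencies}, which is not the case for every
driver or kernel configuration.
\begin{verbatim}
/* Minimal sketch: compute the scaling factor S = F_max / F_new for every
 * available P-state of core 0. It relies on the Linux cpufreq sysfs
 * interface; the path below is an assumption and is not exposed by every
 * frequency driver. */
#include <stdio.h>
#include <stdlib.h>

#define FREQ_LIST \
  "/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies"
#define MAX_PSTATES 64

int main(void)
{
    unsigned long freqs[MAX_PSTATES]; /* frequencies in kHz */
    int n = 0;

    FILE *f = fopen(FREQ_LIST, "r");
    if (f == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }
    while (n < MAX_PSTATES && fscanf(f, "%lu", &freqs[n]) == 1)
        n++;
    fclose(f);
    if (n == 0) {
        fprintf(stderr, "no available frequencies found\n");
        return EXIT_FAILURE;
    }

    /* F_max is the highest frequency in the list. */
    unsigned long f_max = freqs[0];
    for (int i = 1; i < n; i++)
        if (freqs[i] > f_max)
            f_max = freqs[i];

    /* S = F_max / F_new: S = 1 at the maximum frequency, S > 1 otherwise. */
    for (int i = 0; i < n; i++)
        printf("P-state %d: %lu kHz, S = %.3f\n",
               i, freqs[i], (double) f_max / (double) freqs[i]);

    return EXIT_SUCCESS;
}
\end{verbatim}
Actually applying one of these frequencies would then go through the CPU
governor interface discussed below, for example by writing the chosen value to
the \texttt{scaling\_setspeed} file of the \texttt{userspace} governor.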
The value of the scaling factor $S$ is greater than 1 when the frequency is
changed to any lower frequency value~(\emph{P-state}) through the CPU governor,
an interface driver supplied by the operating system kernel (e.g. Linux) to
lower the frequency of the cores. The scaling factor is equal to 1 when the
frequency is set to the maximum frequency. The energy consumption model for a
parallel homogeneous platform depends on the scaling factor $S$: this factor
reduces the dynamic power quadratically, but it increases the static energy
linearly because the execution time is increased~\cite{36}. An energy model
depending on the frequency scaling factor, for a homogeneous platform and any
number of concurrent tasks, was developed by Rauber and Rünger~\cite{3}. This
model considers the two power metrics for measuring the energy of the parallel
tasks, as in EQ~(\ref{eq:energy}):
\begin{equation}
\label{eq:energy}
= \frac{F_\textit{max}}{F_\textit{new}} \cdot \frac{T_1}{T_i}
\end{equation}
where $F$ is the number of available frequencies. In this paper, we rely on the
Rauber and Rünger energy model of EQ~(\ref{eq:energy}) for two reasons: (1)
this model targets the homogeneous platforms that we work on in this paper, and
(2) we compare our algorithm with the Rauber and Rünger scaling model. The
Rauber and Rünger scaling factor that reduces the energy consumption is derived
from EQ~(\ref{eq:energy}): they take the derivative of this equation (to be
minimized), set it to zero, and obtain the scaling factor of
EQ~(\ref{eq:sopt}).
\begin{equation}
\label{eq:sopt}
S_\textit{opt} = \sqrt[3]{\frac{2}{n} \cdot \frac{P_\textit{dyn}}{P_\textit{static}} \cdot
\AG{Right?}
Computations have been performed on the supercomputer facilities of the
Mésocentre de calcul de Franche-Comté.
As a PhD student, M. Ahmed Fanfakh would like to thank the University of
Babylon (Iraq) for supporting his scholarship program that allows him to work
on this paper.
\AG{What about simply: ``[...] for supporting his work.''}
% trigger a \newpage just before the given reference
% number - used to balance the columns on the last page