From: Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr>
Date: Mon, 17 Mar 2014 22:28:18 +0000 (+0100)
Subject: Corrections + reindent source code.
X-Git-Tag: ispa14_submission~33
X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/mpi-energy.git/commitdiff_plain/9a13cef00784821bbdca89ede93342ec0e21938e?ds=sidebyside;hp=-c

Corrections + reindent source code.

Hint: use "git log -p --word-diff" to see the differences.
---

9a13cef00784821bbdca89ede93342ec0e21938e
diff --git a/paper.tex b/paper.tex
index d7ced09..8039f59 100644
--- a/paper.tex
+++ b/paper.tex
@@ -40,16 +40,18 @@
   Complete affiliation, add an email address, etc.}
 
 \begin{abstract}
-The important technique for energy reduction of parallel systems is CPU frequency 
-scaling. This operation used by many researchers to reduce energy consumption in many
-ways. Frequency scaling operation also has big impact on the performance. In some cases, 
-the performance degradation ratio is bigger than energy saving ratio  when the frequency scaled 
-to down level. Therefore, the trade offs between the energy and performance becomes more
-important topic when using this technique. In this paper we developed an algorithm that 
-select the frequency scaling factor for both  energy and performance simultaneously. 
-This algorithm takes into account the communication times when selecting the frequency scaling
-factor. It is works online without training or profiling to have very small overhead.
-The algorithm has better energy-performance trade offs compared to other methods.         
+  The important technique for energy reduction of parallel systems is CPU
+  frequency scaling. This operation is used by many researchers to reduce energy
+  consumption in many ways. Frequency scaling operation also has a big impact on
+  the performances. In some cases, the performance degradation ratio is bigger
+  than energy saving ratio when the frequency is scaled to low level. Therefore,
+  the trade offs between the energy and performance becomes more important topic
+  when using this technique. In this paper we developed an algorithm that select
+  the frequency scaling factor for both energy and performance simultaneously.
+  This algorithm takes into account the communication times when selecting the
+  frequency scaling factor. It works online without training or profiling to
+  have a very small overhead.  The algorithm has better energy-performance trade
+  offs compared to other methods.
 \end{abstract}
 
 \section{Introduction}
@@ -97,13 +99,16 @@ proposed algorithm with Rauber and RÃ¼nger methods~\cite{3}.
 The comparison's results show that our
 algorithm gives better energy-time trade off.
 
-This paper is organized as follows: Section~\ref{sec.relwork} presents the works from other authors. 
-Section~\ref{sec.ptasks} shows the execution of parallel tasks and sources of idle times. Section~\ref{sec.energy} resumes the 
-energy model of homogeneous platform. Section~\ref{sec.mpip} evaluates the performance of MPI program.
-Section~\ref{sec.verif} verifies the performance prediction model. Section~\ref{sec.compet} presents 
-the energy-performance trade offs objective function. Section~\ref{sec.optim} demonstrates the proposed 
-energy-performance algorithm. Section~\ref{sec.expe} presents the results of our experiments. 
-Section~\ref{sec.compare} shows the comparison results. Finally, we conclude in Section~\ref{sec.concl}.
+This paper is organized as follows: Section~\ref{sec.relwork} presents the works
+from other authors.  Section~\ref{sec.ptasks} shows the execution of parallel
+tasks and sources of idle times. Section~\ref{sec.energy} resumes the energy
+model of homogeneous platform. Section~\ref{sec.mpip} evaluates the performance
+of MPI program.  Section~\ref{sec.verif} verifies the performance prediction
+model. Section~\ref{sec.compet} presents the energy-performance trade offs
+objective function. Section~\ref{sec.optim} demonstrates the proposed
+energy-performance algorithm. Section~\ref{sec.expe} presents the results of our
+experiments.  Section~\ref{sec.compare} shows the comparison results. Finally,
+we conclude in Section~\ref{sec.concl}.
 
 \section{Related Works}
 \label{sec.relwork}
@@ -118,40 +123,43 @@ presented and classified in two parts: offline and online methods.
 
 The DVFS offline methods are static and are not executed during the runtime of
 the program. Some approaches used heuristics to select the best DVFS state
-during the compilation phases as for example in Azevedo et al.~\cite{40}. They use
-dynamic voltage scaling (DVS) algorithm to choose the DVS setting when there are dependency points
-between tasks. While in~\cite{29}, Xie et al. used breadth-first search
-algorithm to do that. Their goal is to save energy with time limits. Another
-approach gathers and stores the runtime information for each DVFS state, then
-selects the suitable DVFS offline to optimize energy-time
-trade offs. As an example Rountree et al.~\cite{8}, use liner programming
+during the compilation phases as for example in Azevedo et al.~\cite{40}. They
+use dynamic voltage scaling (DVS) algorithm to choose the DVS setting when there
+are dependency points between tasks. While in~\cite{29}, Xie et al. used
+breadth-first search algorithm to do that. Their goal is to save energy with
+time limits. Another approach gathers and stores the runtime information for
+each DVFS state, then selects the suitable DVFS offline to optimize energy-time
+trade offs. As an example, Rountree et al.~\cite{8} use liner programming
 algorithm, while in~\cite{38,34}, Cochran et al. use multi logistic regression
 algorithm for the same goal. The offline study that shows the DVFS impact on the
-communication time of the MPI program is~\cite{17}, where Freeh et al. show that these
-times do not change when the frequency is scaled down.
+communication time of the MPI program is~\cite{17}, where Freeh et al. show that
+these times do not change when the frequency is scaled down.
 
 \subsection{The online DVFS orientations}
 
-The objective of online DVFS orientations works is to dynamically compute and set the frequency of
-the CPU during the runtime of the program for saving energy. Estimating and
-predicting approaches for the energy-time trade offs are developed by Kimura, Peraza, Yu-Liang et al.
-~\cite{11,2,31}. These works select the best DVFS setting depending on the slack
-times. These times happen when the processors have to wait for data from other
-processors to compute their task. For example, during the synchronous
-communications that take place in MPI programs, some processors are
-idle. The optimal DVFS can be selected using learning methods. Therefore, in Dhiman, Hao Shen et al.
-~\cite{39,19} used machine learning to converge to the suitable DVFS
-configuration. Their learning algorithms take big time to converge when the
-number of available frequencies is high. Also, the communication sections of the MPI
-program can be used to save energy. In~\cite{1}, Lim et al. developed an
-algorithm that detects the communication sections and changes the frequency
-during these sections only. This approach changes the frequency many times
-because an iteration may contain more than one communication section. The domain
-of analytical modeling used for choosing the optimal frequency as in Rauber and RÃ¼nger~\cite{3}. they
-developed an analytical mathematical model to determine the
-optimal frequency scaling factor for any number of concurrent tasks. They set the slowest task to maximum frequency for maintaining performance.  In this paper we compare our algorithm with
-Rauber and RÃ¼nger model~\cite{3}, because their model can be used for any number of
-concurrent tasks for homogeneous platforms. The primary contributions of this paper are:
+The objective of the online DVFS orientations is to dynamically compute and set
+the frequency of the CPU for saving energy during the runtime of the
+programs. Estimating and predicting approaches for the energy-time trade offs
+are developed by Kimura, Peraza, Yu-Liang et al.  ~\cite{11,2,31}. These works
+select the best DVFS setting depending on the slack times. These times happen
+when the processors have to wait for data from other processors to compute their
+task. For example, during the synchronous communications that take place in MPI
+programs, some processors are idle. The optimal DVFS can be selected using
+learning methods. Therefore, in Dhiman, Hao Shen et al.  ~\cite{39,19} used
+machine learning to converge to the suitable DVFS configuration. Their learning
+algorithms take big time to converge when the number of available frequencies is
+high. Also, the communication sections of the MPI program can be used to save
+energy. In~\cite{1}, Lim et al. developed an algorithm that detects the
+communication sections and changes the frequency during these sections
+only. This approach changes the frequency many times because an iteration may
+contain more than one communication section. The domain of analytical modeling
+can also be used for choosing the optimal frequency as in Rauber and
+RÃ¼nger~\cite{3}. They developed an analytical mathematical model to determine
+the optimal frequency scaling factor for any number of concurrent tasks. They
+set the slowest task to maximum frequency for maintaining performance.  In this
+paper we compare our algorithm with Rauber and RÃ¼nger model~\cite{3}, because
+their model can be used for any number of concurrent tasks for homogeneous
+platforms. The primary contributions of this paper are:
 \begin{enumerate}
 \item Selecting the frequency scaling factor for simultaneously optimizing energy and performance,
    while taking into account the communication time.
@@ -232,16 +240,18 @@ maximum and the new frequency as in EQ~(\ref{eq:s}).
   \label{eq:s}
  S = \frac{F_\textit{max}}{F_\textit{new}}
 \end{equation}
-The value of the scale $S$ is greater than 1 when changing the frequency to
-any new frequency value~(\emph {P-state}) in governor, the CPU  governor is an interface 
-driver supplied by the operating system kernel  (e.g. Linux) to lowering core's frequency. 
-The scaling factor is equal to 1 when the frequency set is to the maximum frequency.  
-The energy consumption model for parallel homogeneous platform  depends on the scaling factor \emph S. This factor reduces quadratically the dynamic power.  Also, this factor increases the
-static energy linearly because the execution time is increased~\cite{36}.  The
-energy model depending on the frequency scaling factor for homogeneous platform
-for any number of concurrent tasks was developed by Rauber and RÃ¼nger~\cite{3}. This model
-considers the two power metrics for measuring the energy of the parallel tasks as
-in EQ~(\ref{eq:energy}):
+The value of the scale $S$ is greater than 1 when changing the frequency to any
+new frequency value~(\emph {P-state}) in governor, the CPU governor is an
+interface driver supplied by the operating system kernel (e.g. Linux) to
+lowering core's frequency.  The scaling factor is equal to 1 when the frequency
+set is to the maximum frequency.  The energy consumption model for parallel
+homogeneous platform depends on the scaling factor \emph S. This factor reduces
+quadratically the dynamic power.  Also, this factor increases the static energy
+linearly because the execution time is increased~\cite{36}.  The energy model
+depending on the frequency scaling factor for homogeneous platform for any
+number of concurrent tasks was developed by Rauber and RÃ¼nger~\cite{3}. This
+model considers the two power metrics for measuring the energy of the parallel
+tasks as in EQ~(\ref{eq:energy}):
 
 \begin{equation}
   \label{eq:energy}
@@ -265,12 +275,13 @@ the time value $T_i$ depends on the new frequency value as in EQ~(\ref{eq:si}).
       = \frac{F_\textit{max}}{F_\textit{new}} \cdot \frac{T_1}{T_i}
 \end{equation}
 where $F$ is the number of available frequencies. In this paper we depend on
-Rauber and RÃ¼nger energy model EQ~(\ref{eq:energy}) for two reasons: (1)-this model is used
-for homogeneous platform that we work on in this paper. 2-we compare our
-algorithm with Rauber and RÃ¼nger scaling model.  Rauber and RÃ¼nger scaling factor that reduce
- energy consumption derived from the EQ~(\ref{eq:energy}). They take the
-derivation for this equation (to be minimized) and set it to zero to produce the
-scaling factor as in EQ~(\ref{eq:sopt}).
+Rauber and RÃ¼nger energy model EQ~(\ref{eq:energy}) for two reasons: (1) this
+model is used for homogeneous platform that we work on in this paper, and (2) we
+compare our algorithm with Rauber and RÃ¼nger scaling model.  Rauber and RÃ¼nger
+scaling factor that reduce energy consumption derived from the
+EQ~(\ref{eq:energy}). They take the derivation for this equation (to be
+minimized) and set it to zero to produce the scaling factor as in
+EQ~(\ref{eq:sopt}).
 \begin{equation}
   \label{eq:sopt}
   S_\textit{opt} = \sqrt[3]{\frac{2}{n} \cdot \frac{P_\textit{dyn}}{P_\textit{static}} \cdot
@@ -765,9 +776,11 @@ In this paper we develop the simultaneous energy-performance algorithm. It is wo
 
 \AG{Right?}
 Computations have been performed on the supercomputer facilities of the
-MÃ©socentre de calcul de Franche-ComtÃ©. As a PhD student , M. Ahmed Fanfakh , would 
-likes to thank the University of Babylon /Iraq for supporting  my scholarship program that allows  me  
-working on this paper.    
+MÃ©socentre de calcul de Franche-ComtÃ©.
+As a PhD student, M. Ahmed Fanfakh, would like to thank the University of
+Babylon (Iraq) for supporting his scholarship program that allows him to work on
+this paper.
+\AG{What about simply: ``[...] for supporting his work.''}
 
 % trigger a \newpage just before the given reference
 % number - used to balance the columns on the last page