From: afanfakh <afanfakh@fanfakh.afanfakh>
Date: Mon, 17 Mar 2014 14:56:43 +0000 (+0100)
Subject: Merge branch 'master' of ssh://info.iut-bm.univ-fcomte.fr/mpi-energy
X-Git-Tag: ispa14_submission~34
X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/mpi-energy.git/commitdiff_plain/b2e13cc196d2e0b917988a193d737cdcb4d983ee?ds=inline;hp=-c

Merge branch 'master' of ssh://info.iut-bm.univ-fcomte.fr/mpi-energy

Conflicts:
	paper.tex
---

b2e13cc196d2e0b917988a193d737cdcb4d983ee
diff --combined paper.tex
index 8d86a9d,89f43e9..d7ced09
--- a/paper.tex
+++ b/paper.tex
@@@ -9,11 -9,10 +9,10 @@@
  \usepackage{listings}
  \usepackage{colortbl}
  \usepackage{amsmath}
- % \usepackage{sectsty}
- % \usepackage{titlesec}
- % \usepackage{secdot}
- %\usepackage[font={footnotesize,bt}]{caption}
- %\usepackage[font=scriptsize,labelfont=bf]{caption}
+ 
+ \usepackage[autolanguage,np]{numprint}
+ \renewcommand*\npunitcommand[1]{\text{#1}}
+ 
  \usepackage{xspace}
  \usepackage[textsize=footnotesize]{todonotes}
  \newcommand{\AG}[2][inline]{\todo[color=green!50,#1]{\sffamily\textbf{AG:} #2}\xspace}
@@@ -94,13 -93,13 +93,13 @@@ this algorithm to seven MPI benchmarks
  benchmarks (NPB v3.3) developed by NASA~\cite{44}. Our experiments are executed
  using the simulator SimGrid/SMPI v3.10~\cite{Casanova:2008:SGF:1397760.1398183}
  over an homogeneous distributed memory architecture. Furthermore, we compare the
- proposed algorithm with Rauber and R\"{u}nger methods~\cite{3}.
+ proposed algorithm with Rauber and RÃ¼nger methods~\cite{3}.
  The comparison's results show that our
  algorithm gives better energy-time trade off.
  
  This paper is organized as follows: Section~\ref{sec.relwork} presents the works from other authors. 
  Section~\ref{sec.ptasks} shows the execution of parallel tasks and sources of idle times. Section~\ref{sec.energy} resumes the 
- energy model of homogenous platform. Section~\ref{sec.mpip} evaluates the performance of MPI program. 
+ energy model of homogeneous platform. Section~\ref{sec.mpip} evaluates the performance of MPI program.
  Section~\ref{sec.verif} verifies the performance prediction model. Section~\ref{sec.compet} presents 
  the energy-performance trade offs objective function. Section~\ref{sec.optim} demonstrates the proposed 
  energy-performance algorithm. Section~\ref{sec.expe} presents the results of our experiments. 
@@@ -148,10 -147,10 +147,10 @@@ program can be used to save energy. In~
  algorithm that detects the communication sections and changes the frequency
  during these sections only. This approach changes the frequency many times
  because an iteration may contain more than one communication section. The domain
- of analytical modeling used for choosing the optimal frequency as inRauber and R\"{u}nger~\cite{3}. they
+ of analytical modeling used for choosing the optimal frequency as in Rauber and RÃ¼nger~\cite{3}. they
  developed an analytical mathematical model to determine the
  optimal frequency scaling factor for any number of concurrent tasks. They set the slowest task to maximum frequency for maintaining performance.  In this paper we compare our algorithm with
- Rauber and R\"{u}nger model~\cite{3}, because their model can be used for any number of
+ Rauber and RÃ¼nger model~\cite{3}, because their model can be used for any number of
  concurrent tasks for homogeneous platforms. The primary contributions of this paper are:
  \begin{enumerate}
  \item Selecting the frequency scaling factor for simultaneously optimizing energy and performance,
@@@ -200,7 -199,7 +199,7 @@@ The energy consumption by the processo
  dynamic and the static power. This general power formulation is used by many
  researchers~\cite{9,3,15,26}. The dynamic power of the CMOS processors
  $P_{dyn}$ is related to the switching activity $\alpha$, load capacitance $C_L$,
- the supply voltage $V$ and operational frequency $f$ respectively as follow :
+ the supply voltage $V$ and operational frequency $f$ respectively as follow:
  \begin{equation}
    \label{eq:pd}
    P_\textit{dyn} = \alpha \cdot C_L \cdot V^2 \cdot f
@@@ -240,7 -239,7 +239,7 @@@ The scaling factor is equal to 1 when t
  The energy consumption model for parallel homogeneous platform  depends on the scaling factor \emph S. This factor reduces quadratically the dynamic power.  Also, this factor increases the
  static energy linearly because the execution time is increased~\cite{36}.  The
  energy model depending on the frequency scaling factor for homogeneous platform
- for any number of concurrent tasks was developed by Rauber and R\"{u}nger~\cite{3}. This model
+ for any number of concurrent tasks was developed by Rauber and RÃ¼nger~\cite{3}. This model
  considers the two power metrics for measuring the energy of the parallel tasks as
  in EQ~(\ref{eq:energy}):
  
@@@ -266,9 -265,9 +265,9 @@@ the time value $T_i$ depends on the ne
        = \frac{F_\textit{max}}{F_\textit{new}} \cdot \frac{T_1}{T_i}
  \end{equation}
  where $F$ is the number of available frequencies. In this paper we depend on
- Rauber and R\"{u}nger energy model EQ~(\ref{eq:energy}) for two reasons: (1)-this model is used
+ Rauber and RÃ¼nger energy model EQ~(\ref{eq:energy}) for two reasons: (1)-this model is used
  for homogeneous platform that we work on in this paper. 2-we compare our
- algorithm with Rauber and R\"{u}nger scaling model.  Rauber and R\"{u}nger scaling factor that reduce
+ algorithm with Rauber and RÃ¼nger scaling model.  Rauber and RÃ¼nger scaling factor that reduce
   energy consumption derived from the EQ~(\ref{eq:energy}). They take the
  derivation for this equation (to be minimized) and set it to zero to produce the
  scaling factor as in EQ~(\ref{eq:sopt}).
@@@ -314,7 -313,7 +313,7 @@@ communication time consists of the begi
  sending or receiving till the message is synchronously sent or received. In this
  paper we predict the execution time of the program for any new scaling factor
  value. Depending on this prediction we can produce our energy-performance scaling
 -method as we will show in the coming sections. In the next section we make an
 +method as we will show in the coming sections. In the next section we make to finishan
  investigation study for the EQ~(\ref{eq:tnew}).
  
  \section{Performance Prediction Verification}
@@@ -342,7 -341,7 +341,7 @@@ In our cluster there are 18 available f
  2.5 GHz to 800 MHz, there is 100 MHz difference between two successive
  frequencies. For more details on the characteristics of the platform refer to
  table~(\ref{table:platform}). This lead to 18 run states for each program. We
- use seven MPI programs of the NAS parallel benchmarks : CG, MG, EP, FT, BT, LU
+ use seven MPI programs of the NAS parallel benchmarks: CG, MG, EP, FT, BT, LU
  and SP. The average normalized errors between the predicted execution time and
  the real time (SimGrid time) for all programs is between 0.0032 to 0.0133. AS an
  example, we are present the execution times of the NAS benchmarks as in the
@@@ -360,7 -359,7 +359,7 @@@ it is linear see~\cite{17}. The relatio
  is not straightforward. Moreover, they are not measured using the same metric.
  For solving this problem, we normalize the energy by calculating the ratio
  between the consumed energy with scaled frequency and the consumed energy
- without scaled frequency :
+ without scaled frequency:
  \begin{multline}
    \label{eq:enorm}
    E_\textit{Norm} = \frac{ E_\textit{Reduced}}{E_\textit{Original}} \\
@@@ -370,7 -369,7 +369,7 @@@
                P_\textit{dyn} \cdot \left(T_1+\sum_{i=2}^{N}\frac{T_i^3}{T_1^2}\right) +
                P_\textit{static} \cdot T_1 \cdot N }
  \end{multline}
- By the same way we can normalize the performance as follows :
+ By the same way we can normalize the performance as follows:
  \begin{equation}
    \label{eq:pnorm}
    P_\textit{Norm} = \frac{T_\textit{New}}{T_\textit{Old}}
@@@ -391,7 -390,7 +390,7 @@@ paper we are present a method to find t
  optimize both energy and performance simultaneously without adding big
  overheads.  Our solution for this problem is to make the optimization process
  have the same direction. Therefore, we inverse the equation of normalize
- performance as follows :
+ performance as follows:
  \begin{equation}
    \label{eq:pnorm_en}
    P^{-1}_\textit{Norm} = \frac{ T_\textit{Old}}{ T_\textit{New}}
@@@ -424,7 -423,7 +423,7 @@@ EQ~(\ref{eq:max}).  Our objective funct
  static power values stored in a data file. Moreover, this function works in
  optimal way when the energy function has a convex form with frequency scaling
  factor as shown in~\cite{15,3,19}. Energy measurement model is not the
- objective of this paper and we choose Rauber and R\"{u}nger model as an example with two
+ objective of this paper and we choose Rauber and RÃ¼nger model as an example with two
  reasons that mentioned before.
  
  \section{Optimal Scaling Factor for Performance and Energy}
@@@ -495,7 -494,7 +494,7 @@@ in the MPI program
  After obtaining the optimal scale factor from the EPSA algorithm. The program
  calculates the new frequency $F_i$ for each task proportionally to its time
  value $T_i$. By substitution of the EQ~(\ref{eq:s}) in the EQ~(\ref{eq:si}), we
- can calculate the new frequency $F_i$ as follows :
+ can calculate the new frequency $F_i$ as follows:
  \begin{equation}
    \label{eq:fi}
    F_i = \frac{F_\textit{max} \cdot T_i}{S_\textit{optimal} \cdot T_\textit{max}}
@@@ -525,11 -524,11 +524,11 @@@ frequencies
    \caption{Platform File Parameters}
    % title of Table
    \centering
-   \begin{tabular}{ | l | l | l |l | l |l |l |  p{2cm} |}
+   \begin{tabular}{|*{7}{l|}}
      \hline
      Max & Min & Backbone & Backbone&Link &Link& Sharing  \\
      Freq. & Freq. & Bandwidth & Latency & Bandwidth& Latency&Policy  \\ \hline
-     2.5 &800 & 2.25 GBps &$5\times 10^{-7} s$& 1 GBps & $5\times 10^{-5} s$ &Full  \\
+     \np{2.5} & \np{800} & \np[GBps]{2.25} &\np[$\mu$s]{0.5}& \np[GBps]{1} & \np[$\mu$s]{50} &Full  \\
      GHz& MHz&  & & &  &Duplex  \\\hline
    \end{tabular}
    \label{table:platform}
@@@ -545,7 -544,7 +544,7 @@@ inversed performance curves, because th
  for each MPI program.  When there are little or not communications, the inversed
  performance curve is very close to the energy curve. Then the distance between
  the two curves is very small. This lead to small energy savings. The opposite
 -happens when there are a lot of communication, the distance between the two
 +happens when there are a lot of communication, theto finish distance between the two
  curves is big.  This lead to more energy savings (e.g. CG and FT), see
  table~(\ref{table:factors results}). All discovered frequency scaling factors
  optimize both the energy and the performance simultaneously for all the NAS
@@@ -565,10 -564,13 +564,10 @@@ same time over all available scales
    \label{fig:nas}
  \end{figure*}
  \begin{table}[htb]
 -  \caption{Optimal Scaling Factors Results}
 +  \caption{The EPSA Scaling Factors Results}
    % title of Table
    \centering
-   \begin{tabular}{ | l | l | l |l | r |}
 -  \AG{Use the same number of decimals for all numbers in a column,
 -    and vertically align the numbers along the decimal points.
 -    The same for all the following tables.}
+   \begin{tabular}{|l|*{4}{r|}}
      \hline
      Program & Optimal & Energy  & Performance&Energy-Perf.\\
      Name & Scaling Factor& Saving \%&Degradation \% &Distance  \\ \hline
@@@ -595,7 -597,7 +594,7 @@@ EPSA to selects smaller scaling factor 
  \section{Comparing Results}
  \label{sec.compare}
  
- In this section, we compare our EPSA algorithm results with Rauber and R\"{u}nger
+ In this section, we compare our EPSA algorithm results with Rauber and RÃ¼nger
  methods~\cite{3}. He had two scenarios, the first is to reduce energy to optimal
  level without considering the performance as in EQ~(\ref{eq:sopt}). We refer to
  this scenario as $R_{E}$. The second scenario is similar to the first
@@@ -603,13 -605,13 +602,13 @@@ except setting the slower task to the m
  to keep the performance from degradation as mush as possible. We refer to this
  scenario as $R_{E-P}$. The comparison is made in tables~(\ref{table:compare
    Class A},\ref{table:compare Class B},\ref{table:compare Class C}). These
- tables show the results of our EPSA and Rauber and R\"{u}nger scenarios for all the NAS
+ tables show the results of our EPSA and Rauber and RÃ¼nger scenarios for all the NAS
  benchmarks programs for classes A,B and C.
  \begin{table}[p]
    \caption{Comparing Results for  The NAS Class A}
    % title of Table
    \centering
-   \begin{tabular}{ | l | l | l |l | l | r|  }
+   \begin{tabular}{|l|l|*{4}{r|}}
      \hline
      Method&Program&Factor& Energy& Performance &Energy-Perf.\\
      Name &Name&Value& Saving \%&Degradation \% &Distance
@@@ -650,7 -652,7 +649,7 @@@
    \caption{Comparing Results for The NAS Class B}
    % title of Table
    \centering
-   \begin{tabular}{ | l | l | l |l | l |r|  }
+   \begin{tabular}{|l|l|*{4}{r|}}
      \hline
      Method&Program&Factor& Energy& Performance &Energy-Perf.\\
      Name &Name&Value& Saving \%&Degradation \% &Distance
@@@ -692,7 -694,7 +691,7 @@@
    \caption{Comparing Results for The NAS Class C}
    % title of Table
    \centering
-   \begin{tabular}{ | l | l | l |l | l |r|  }
+   \begin{tabular}{|l|l|*{4}{r|}}
      \hline
      Method&Program&Factor& Energy& Performance &Energy-Perf.\\
      Name &Name&Value& Saving \%&Degradation \% &Distance
@@@ -733,7 -735,7 +732,7 @@@ As shown in these tables our scaling fa
  such as Rauber's scaling factor EQ~(\ref{eq:sopt}), but it is optimal for both
  the energy and the performance simultaneously. Our $EPSA$ optimal scaling factors
  has better simultaneous optimization for both the energy and the performance
- compared to Rauber and R\"{u}nger energy-performance method ($R_{E-P}$). Also, in
+ compared to Rauber and RÃ¼nger energy-performance method ($R_{E-P}$). Also, in
  ($R_{E-P}$) method when setting the frequency to maximum value for the
  slower task lead to a small improvement of the performance. Also the results
  show that this method keep or improve energy saving. Because of the energy
@@@ -743,7 -745,7 +742,7 @@@ increased
  Figure~(\ref{fig:compare}) shows the maximum distance between the energy saving
  percent and the performance degradation percent. Therefore, this means it is the
  same resultant of our objective function EQ~(\ref{eq:max}). Our algorithm always
- gives positive energy to performance trade offs while Rauber and R\"{u}nger method
+ gives positive energy to performance trade offs while Rauber and RÃ¼nger method
  ($R_{E-P}$) gives in some time negative trade offs such as in BT and
  EP. The positive trade offs with highest values lead to maximum energy savings
  concatenating with less performance degradation and this the objective of this
@@@ -755,20 -757,20 +754,20 @@@ than the first
    \includegraphics[width=.33\textwidth]{compare_class_A.pdf}
    \includegraphics[width=.33\textwidth]{compare_class_B.pdf}
    \includegraphics[width=.33\textwidth]{compare_class_c.pdf}
-   \caption{Comparing Our EPSA with Rauber and R\"{u}nger Methods}
+   \caption{Comparing Our EPSA with Rauber and RÃ¼nger Methods}
    \label{fig:compare}
  \end{figure}
 -
  \section{Conclusion}
  \label{sec.concl}
 -
 -\AG{the conclusion needs to be written\dots{} one day}
 +In this paper we develop the simultaneous energy-performance algorithm. It is works based on the trade off relation between the energy and performance. The results showed that when the scaling factor is big value leads to more energy saving. Also, it show that when the the scaling factor is small value leads to the fact that the scaling factor has bigger impact on performance than energy. Then the algorithm optimize the energy saving and performance in the same time to have positive trade off. The optimal trade off refer to maximum distance between the energy and the inversed performance curves. Also, the results explained when setting the slowest task to maximum frequency usually not have a big improvement on performance. 
  
  \section*{Acknowledgment}
  
  \AG{Right?}
  Computations have been performed on the supercomputer facilities of the
 -MÃ©socentre de calcul de Franche-ComtÃ©.
 +MÃ©socentre de calcul de Franche-ComtÃ©. As a PhD student , M. Ahmed Fanfakh , would 
 +likes to thank the University of Babylon /Iraq for supporting  my scholarship program that allows  me  
 +working on this paper.    
  
  % trigger a \newpage just before the given reference
  % number - used to balance the columns on the last page
@@@ -787,5 -789,5 +786,5 @@@
  %%% ispell-local-dictionary: "american"
  %%% End:
  
- %  LocalWords:  Badri Charr FIXME Tianhe DVFS HPC NAS NPB SMPI Rauber's Rauber
- %  LocalWords:  CMOS EQ $$EPSA$$ Franche ComtÃ© Tflop
+ %  LocalWords:  Fanfakh Charr FIXME Tianhe DVFS HPC NAS NPB SMPI Rauber's Rauber
+ %  LocalWords:  CMOS EQ EPSA Franche ComtÃ© Tflop RÃ¼nger