More remarks.

author Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr>

Tue, 18 Mar 2014 09:54:33 +0000 (10:54 +0100)

committer Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr>

Tue, 18 Mar 2014 09:54:33 +0000 (10:54 +0100)
diff --git a/paper.tex b/paper.tex

index 8039f59d1c098b1df5081f3ee0ec978ce698df2d..43d284fab5ae43a9fae64243006f904165684ba4 100644
--- a/paper.tex
+++ b/paper.tex
@@ -109,7 +109,7 @@ objective function. Section~\ref{sec.optim} demonstrates the proposed
  energy-performance algorithm. Section~\ref{sec.expe} presents the results of our
  experiments.  Section~\ref{sec.compare} shows the comparison results. Finally,
  we conclude in Section~\ref{sec.concl}.
-
+\AG{There are too many sections!}
  \section{Related Works}
  \label{sec.relwork}
  
@@ -189,7 +189,7 @@ platform. These tasks can exchange the data via synchronous message passing.
  Therefore, the execution time of a task consists of the computation time and the
  communication time. Moreover, the synchronous communications between tasks can
  lead to idle time while tasks wait at the synchronization barrier for other tasks to
-finish their communications (see figure~(\ref{fig:h1})). The imbalanced communications happen when nodes have to send/receive different amount of data or each node is communicates with different number of nodes. Another source for idle times is the imbalanced computations. This happen when processing different
+finish their communications (see figure~(\ref{fig:h1})). The imbalanced communications happen when nodes have to send/receive different amount of data or each node is communicates with different number of nodes. Another source for idle times is the imbalanced computations. This happens when processing different
  amounts of data on each processor  (see figure~(\ref{fig:h2})). In
  this case the fastest tasks have to wait at the synchronization barrier for the
  slowest tasks to finish their job. In both cases the overall execution time
@@ -223,6 +223,7 @@ design dependent parameter and $I_{leak}$ is a technology-dependent
  parameter. Energy consumed by an individual processor $E_{ind}$ is the summation
  of the dynamic and the static power multiplied by the execution time for example
  see~\cite{36,15}.
+\AG{What's an ``execution time for example'' ? Add the correct punctuation.}
  \begin{equation}
    \label{eq:eind}
     E_\textit{ind} = ( P_\textit{dyn} + P_\textit{static} ) \cdot T
@@ -309,11 +310,13 @@ communication process the processors remain idle until the communication has
  finished. For that reason any change in the frequency has no impact on the time
  of communication but it has obvious impact on the time of
  computation~\cite{17}. We have made many tests on a real cluster to prove that the
+\AG{Caution: in general, tests don't \emph{prove} anything}
  frequency scaling factor \emph S has a linear relation with computation time
  only. To predict the execution time of MPI program, the communication time and 
  the computation time for the slower task must be first precisely specified. Secondly, 
  these times are used to predict the execution time for any MPI program as a function of 
  the new scaling factor as in the EQ~(\ref{eq:tnew}).
+\AG{EQ~xx, without ``the''. Change everywhere.}
  \begin{equation}
    \label{eq:tnew}
   \textit  T_\textit{new} = T_\textit{Max Comp Old} \cdot S + T_{\textit{Max Comm Old}}
@@ -324,14 +327,15 @@ communication time consists of the beginning times which an MPI calls for
  sending or receiving till the message is synchronously sent or received. In this
  paper we predict the execution time of the program for any new scaling factor
  value. Depending on this prediction we can produce our energy-performance scaling
-method as we will show in the coming sections. In the next section we make to finishan
+method as we will show in the coming sections. In the next section we make to finishan\AG{finishan?}
  investigation study for the EQ~(\ref{eq:tnew}).
  
  \section{Performance Prediction Verification}
  \label{sec.verif}
  
+\AG{This section presents experimental results. It should be put just before Sec.~\ref{sec.expe}}
  In this section we evaluate the precision of our performance prediction methods
-on the NAS benchmark. We use the EQ~(\ref{eq:tnew}) that predicts the execution
+on the NAS benchmarks. We use the EQ~(\ref{eq:tnew}) that predicts the execution
  time for any scale value. The NAS programs run the class B for comparing the
  real execution time with the predicted execution time. Each program runs offline
  with all available scaling factors on 8 or 9 nodes to produce real execution
@@ -473,13 +477,17 @@ scaling factor for both energy and performance at the same time.
  \end{algorithm}
  The proposed EPSA algorithm works online during the execution time of the MPI
  program. It selects the optimal scaling factor by gathering some information
-from the program after one iteration. This algorithm has small execution time
-(between 0.00152 $ms$ for 4 nodes to 0.00665 $ms$ for 32 nodes). The data
+from the program after one iteration.
+\AG{Which information?}
+ This algorithm has small execution time
+(between 0.00152 $ms$ for 4 nodes to 0.00665 $ms$ for 32 nodes).
+\AG{Algorithmic complexity?}
+ The data
  required by this algorithm is the computation time and the communication time
  for each task from the first iteration only. When these times are measured, the
  MPI program calls the EPSA algorithm to choose the new frequency using the
-optimal scaling factor. Then the program set the new frequency to the
-system. The algorithm is called just one time during the execution of the
+optimal scaling factor. Then the program sets the new frequency to the
+system\AG[]{???}. The algorithm is called just one time during the execution of the
  program. The DVFS algorithm~(\ref{dvfs}) shows where and when the EPSA algorithm is called
  in the MPI program.
  %\begin{minipage}{\textwidth}
@@ -492,7 +500,7 @@ in the MPI program.
   \For {$J:=1$ to $Some-Iterations \; $}
    \State -Computations Section.
     \State -Communications Section.
-   \If {$(J==1)$} 
+   \If {$(J=1)$} 
       \State -Gather all times of computation and\par\hspace{13 pt} communication from each node.
       \State -Call EPSA with these times.
       \State -Calculate the new frequency from optimal scale.
@@ -502,7 +510,7 @@ in the MPI program.
  \end{algorithmic}
  \end{algorithm}
  
-After obtaining the optimal scale factor from the EPSA algorithm. The program
+After obtaining the optimal scale factor from the EPSA algorithm.\AG[]{comma} The program
  calculates the new frequency $F_i$ for each task proportionally to its time
  value $T_i$. By substitution of the EQ~(\ref{eq:s}) in the EQ~(\ref{eq:si}), we
  can calculate the new frequency $F_i$ as follows:
@@ -528,7 +536,9 @@ respectively. Our experiments are executed on the simulator SimGrid/SMPI
  v3.10. We design a platform file that simulates a cluster with one core per
  node. This cluster is a homogeneous architecture with distributed memory. The
  detailed characteristics of our platform file are shown in the
-table~(\ref{table:platform}). Each node in the cluster has 18 frequency values
+table~(\ref{table:platform}).
+\AG{Are those characteristics realistic?}
+ Each node in the cluster has 18 frequency values
  from 2.5 GHz to 800 MHz with 100 MHz difference between each two successive
  frequencies.
  \begin{table}[htb]
@@ -545,8 +555,10 @@ frequencies.
    \label{table:platform}
  \end{table}
  Depending on the EQ~(\ref{eq:energy}), we measure the energy consumption for all
-the NAS MPI programs while assuming the power dynamic is equal to 20W and the
-power static is equal to 4W for all experiments. We run the proposed EPSA
+the NAS MPI programs while assuming the power dynamic is equal to \np[W]{20} and
+the power static is equal to \np[W]{4} for all experiments.
+\AG{How did you choose those values (available frequencies, power consumption)?}
+ We run the proposed EPSA
  algorithm for all these programs. The results showed that the algorithm selected
  different scaling factors for each program depending on the communication
  features of the program as in the figure~(\ref{fig:nas}). This figure shows that
@@ -554,9 +566,9 @@ there are different distances between the normalized energy and the normalized
  inversed performance curves, because there are different communication features
  for each MPI program.  When there are little or not communications, the inversed
  performance curve is very close to the energy curve. Then the distance between
-the two curves is very small. This lead to small energy savings. The opposite
+the two curves is very small. This leads to small energy savings. The opposite
  happens when there are a lot of communication, theto finish distance between the two
-curves is big.  This lead to more energy savings (e.g. CG and FT), see
+curves is big.  This leads to more energy savings (e.g. CG and FT), see
  table~(\ref{table:factors results}). All discovered frequency scaling factors
  optimize both the energy and the performance simultaneously for all the NAS
  programs. In table~(\ref{table:factors results}), we record all optimal scaling
@@ -606,15 +618,15 @@ EPSA to selects smaller scaling factor values (inducing smaller energy savings).
  \label{sec.compare}
  
  In this section, we compare our EPSA algorithm results with Rauber and Rünger
-methods~\cite{3}. He had two scenarios, the first is to reduce energy to optimal
-level without considering the performance as in EQ~(\ref{eq:sopt}). We refer to
-this scenario as $R_{E}$. The second scenario is similar to the first
+methods~\cite{3}. They had two scenarios, the first is to reduce energy to
+optimal level without considering the performance as in EQ~(\ref{eq:sopt}). We
+refer to this scenario as $R_{E}$. The second scenario is similar to the first
  except setting the slower task to the maximum frequency (when the scale $S=1$)
  to keep the performance from degradation as mush as possible. We refer to this
  scenario as $R_{E-P}$. The comparison is made in tables~(\ref{table:compare
    Class A},\ref{table:compare Class B},\ref{table:compare Class C}). These
-tables show the results of our EPSA and Rauber and Rünger scenarios for all the NAS
-benchmarks programs for classes A,B and C.
+tables show the results of our EPSA and Rauber and Rünger scenarios for all the
+NAS benchmarks programs for classes A,B and C.
  \begin{table}[p]
    \caption{Comparing Results for  The NAS Class A}
    % title of Table
@@ -771,6 +783,7 @@ than the first.
  \section{Conclusion}
  \label{sec.concl}
  In this paper we develop the simultaneous energy-performance algorithm. It is works based on the trade off relation between the energy and performance. The results showed that when the scaling factor is big value leads to more energy saving. Also, it show that when the the scaling factor is small value leads to the fact that the scaling factor has bigger impact on performance than energy. Then the algorithm optimize the energy saving and performance in the same time to have positive trade off. The optimal trade off refer to maximum distance between the energy and the inversed performance curves. Also, the results explained when setting the slowest task to maximum frequency usually not have a big improvement on performance. 
+\AG{Needs to be better written.  Add some future works.}
  
  \section*{Acknowledgment}