-\AG{Consider introducing the models (sec.~\ref{sec.ptasks},
- maybe~\ref{sec.energy}) before related works}
-
-In the this section some heuristics to compute the scaling factor are
-presented and classified in two parts: offline and online methods.
-
-\subsection{The offline DVFS orientations}
-
-The DVFS offline methods are static and are not executed during the runtime of
-the program. Some approaches used heuristics to select the best DVFS state
-during the compilation phases as for example in Azevedo et al.~\cite{40}. They use
-dynamic voltage scaling (DVS) algorithm to choose the DVS setting when there are dependency points
-between tasks. While in~\cite{29}, Xie et al. used breadth-first search
-algorithm to do that. Their goal is to save energy with time limits. Another
-approach gathers and stores the runtime information for each DVFS state, then
-selects the suitable DVFS offline to optimize energy-time
-trade offs. As an example Rountree et al.~\cite{8}, use liner programming
-algorithm, while in~\cite{38,34}, Cochran et al. use multi logistic regression
-algorithm for the same goal. The offline study that shows the DVFS impact on the
-communication time of the MPI program is~\cite{17}, where Freeh et al. show that these
-times do not change when the frequency is scaled down.
-
-\subsection{The online DVFS orientations}
-
-The objective of online DVFS orientations works is to dynamically compute and set the frequency of
-the CPU during the runtime of the program for saving energy. Estimating and
-predicting approaches for the energy-time trade offs are developed by Kimura, Peraza, Yu-Liang et al.
-~\cite{11,2,31}. These works select the best DVFS setting depending on the slack
-times. These times happen when the processors have to wait for data from other
-processors to compute their task. For example, during the synchronous
-communications that take place in MPI programs, some processors are
-idle. The optimal DVFS can be selected using learning methods. Therefore, in Dhiman, Hao Shen et al.
-~\cite{39,19} used machine learning to converge to the suitable DVFS
-configuration. Their learning algorithms take big time to converge when the
-number of available frequencies is high. Also, the communication sections of the MPI
-program can be used to save energy. In~\cite{1}, Lim et al. developed an
-algorithm that detects the communication sections and changes the frequency
-during these sections only. This approach changes the frequency many times
-because an iteration may contain more than one communication section. The domain
-of analytical modeling used for choosing the optimal frequency as inRauber and R\"{u}nger~\cite{3}. they
-developed an analytical mathematical model to determine the
-optimal frequency scaling factor for any number of concurrent tasks. They set the slowest task to maximum frequency for maintaining performance. In this paper we compare our algorithm with
-Rauber and R\"{u}nger model~\cite{3}, because their model can be used for any number of
-concurrent tasks for homogeneous platforms. The primary contributions of this paper are:
+\AG{Consider introducing the models sec.~\ref{sec.exe} maybe before related works}
+
+In this section, some heuristics to compute the scaling factor are
+presented and classified into two categories: offline and online methods.
+
+\subsection{Offline scaling factor selection methods}
+
+The offline scaling factor selection methods are executed before the runtime of
+the program. They return static scaling factor values to the processors
+participating in the execution of the parallel program. On one hand, the scaling
+factor
+values could be computed based on information retrieved by analyzing the code of
+the program and the computing system that will execute it. In ~\cite{40},
+Azevedo et
+al. detect during compilation the dependency points between
+tasks in a parallel program. This information is then used to lower the frequency of
+some processors in order to eliminate slack times. A slack time is the period of time during which a processor that have already finished its computation, have to wait
+for a set of processors to finish their computations and send their results to the
+waiting processor in order to continue its task that is
+dependent on the results of computations being executed on other processors.
+Freeh et al. showed in ~\cite{17} that the
+communication times of MPI programs do not change when the frequency is scaled down.
+On the other hand, some offline scaling factor selection methods use the
+information gathered from previous full or
+partial executions of the program. A part or the whole program is usually executed over all the available frequency gears and the the execution time and the energy consumed with each frequency gear are measured. Then an heuristic or an exact method uses the retrieved information to compute the values of the scaling factor for the processors.
+ In~\cite{29}, Xie et al. use an exact exponential breadth-first search algorithm to compute the scaling factor values that give the optimal energy reduction while respecting a deadline for a sequential program. They also present a linear heuristic that approximates the optimal solution. In~\cite{8} , Rountree et al. use a linear programming
+algorithm, while in~\cite{38,34}, Cochran et al. use multi logistic regression algorithm for the same goal.
+The main drawback for these methods is that they all require executing a part or the whole program on all frequency gears for each new instance of the same program.
+
+\subsection{Online scaling factor selection methods}
+The online scaling factor selection methods are executed during the runtime of the program. They are usually integrated into iterative programs where the same block of instructions is executed many times. During the first few iterations, many informations are measured such as the execution time, the energy consumed using a multimeter, the slack times, ... Then a method will exploit these measurements to compute the scaling factor values for each processor. This operation, measurements and computing new scaling factor, can be repeated as much as needed if the iterations are not regular. Kimura, Peraza, Yu-Liang et al. ~\cite{11,2,31} used learning methods to select the appropriate scaling factor values to eliminate the slack times during runtime. However, as seen in ~\cite{39,19}, machine learning methods can take a lot of time to converge when the number of available gears is big. To reduce the impact of slack times, in~\cite{1}, Lim et al. developed an algorithm that detects the
+communication sections and changes the frequency during these sections
+only. This approach might change the frequency of each processor many times per iteration if an iteration
+contains more than one communication section. In ~\cite{3}, Rauber and Rünger used an analytical model that after measuring the energy consumed and the execution time with the highest frequency gear, it can predict the energy consumed and the execution time for every frequency gear . These predictions may be used to choose the optimal gear for each processor executing the parallel program to reduce energy consumption.
+To maintain the performance of the parallel program , they
+set the processor with the biggest load to the highest gear and then compute the scaling factor values for the rest of the processors. Although this model was built for parallel architectures, it can be adapted to distributed architectures by taking into account the communications.
+The primary contribution of this paper is presenting a new online scaling factor selection method which has the following characteristics :