-\AG{Consider introducing the models (sec.~\ref{sec.ptasks},
- maybe~\ref{sec.energy}) before related works}
-
-In the this section some heuristics to compute the scaling factor are
-presented and classified in two parts: offline and online methods.
-
-\subsection{The offline DVFS orientations}
-
-The DVFS offline methods are static and are not executed during the runtime of
-the program. Some approaches used heuristics to select the best DVFS state
-during the compilation phases as for example in Azevedo et al.~\cite{40}. They
-use dynamic voltage scaling (DVS) algorithm to choose the DVS setting when there
-are dependency points between tasks. While in~\cite{29}, Xie et al. used
-breadth-first search algorithm to do that. Their goal is to save energy with
-time limits. Another approach gathers and stores the runtime information for
-each DVFS state, then selects the suitable DVFS offline to optimize energy-time
-trade offs. As an example, Rountree et al.~\cite{8} use liner programming
-algorithm, while in~\cite{38,34}, Cochran et al. use multi logistic regression
-algorithm for the same goal. The offline study that shows the DVFS impact on the
-communication time of the MPI program is~\cite{17}, where Freeh et al. show that
-these times do not change when the frequency is scaled down.
-
-\subsection{The online DVFS orientations}
-
-The objective of the online DVFS orientations is to dynamically compute and set
-the frequency of the CPU for saving energy during the runtime of the
-programs. Estimating and predicting approaches for the energy-time trade offs
-are developed by Kimura, Peraza, Yu-Liang et al. ~\cite{11,2,31}. These works
-select the best DVFS setting depending on the slack times. These times happen
-when the processors have to wait for data from other processors to compute their
-task. For example, during the synchronous communications that take place in MPI
-programs, some processors are idle. The optimal DVFS can be selected using
-learning methods. Therefore, in Dhiman, Hao Shen et al. ~\cite{39,19} used
-machine learning to converge to the suitable DVFS configuration. Their learning
-algorithms take big time to converge when the number of available frequencies is
-high. Also, the communication sections of the MPI program can be used to save
-energy. In~\cite{1}, Lim et al. developed an algorithm that detects the
-communication sections and changes the frequency during these sections
-only. This approach changes the frequency many times because an iteration may
-contain more than one communication section. The domain of analytical modeling
-can also be used for choosing the optimal frequency as in Rauber and
-Rünger~\cite{3}. They developed an analytical mathematical model to determine
-the optimal frequency scaling factor for any number of concurrent tasks. They
-set the slowest task to maximum frequency for maintaining performance. In this
-paper we compare our algorithm with Rauber and Rünger model~\cite{3}, because
-their model can be used for any number of concurrent tasks for homogeneous
-platforms. The primary contributions of this paper are:
+
+In this section, some heuristics to compute the scaling factor are presented and
+classified into two categories: offline and online methods.
+
+\subsection{Offline scaling factor selection methods}
+
+The offline scaling factor selection methods are executed before the runtime of
+the program. They return static scaling factor values to the processors
+participating in the execution of the parallel program. On one hand, the
+scaling factor values could be computed based on information retrieved by
+analyzing the code of the program and the computing system that will execute it.
+In~\cite{40}, Azevedo et al. detect during compilation the dependency points
+between tasks in a multi-task program. This information is then used to lower
+the frequency of some processors in order to eliminate slack times. A slack
+time is the period of time during which a processor that have already finished
+its computation, have to wait for a set of processors to finish their
+computations and send their results to the waiting processor in order to
+continue its task that is dependent on the results of computations being
+executed on other processors. Freeh et al. showed in~\cite{17} that the
+communication times of MPI programs do not change when the frequency is scaled
+down. On the other hand, some offline scaling factor selection methods use the
+information gathered from previous full or partial executions of the program. A
+part or the whole program is usually executed over all the available frequency
+gears and the the execution time and the energy consumed with each frequency
+gear are measured. Then an heuristic or an exact method uses the retrieved
+information to compute the values of the scaling factor for the processors.
+In~\cite{29}, Xie et al. use an exact exponential breadth-first search algorithm
+to compute the scaling factor values that give the optimal energy reduction
+while respecting a deadline for a sequential program. They also present a
+linear heuristic that approximates the optimal solution. In~\cite{8} , Rountree
+et al. use a linear programming algorithm, while in~\cite{38,34}, Cochran et
+al. use multi logistic regression algorithm for the same goal. The main
+drawback for these methods is that they all require executing a part or the
+whole program on all frequency gears for each new instance of the same program.
+
+\subsection{Online scaling factor selection methods}
+
+The online scaling factor selection methods are executed during the runtime of
+the program. They are usually integrated into iterative programs where the same
+block of instructions is executed many times. During the first few iterations,
+many informations are measured such as the execution time, the energy consumed
+using a multimeter, the slack times, \dots{} Then a method will exploit these
+measurements to compute the scaling factor values for each processor. This
+operation, measurements and computing new scaling factor, can be repeated as
+much as needed if the iterations are not regular. Kimura, Peraza, Yu-Liang et
+al.~\cite{11,2,31} used varied heuristics to select the appropriate scaling
+factor values to eliminate the slack times during runtime. However, as seen
+in~\cite{39,19}, machine learning methods can take a lot of time to converge
+when the number of available gears is big. To reduce the impact of slack times,
+in~\cite{1}, Lim et al. developed an algorithm that detects the communication
+sections and changes the frequency during these sections only. This approach
+might change the frequency of each processor many times per iteration if an
+iteration contains more than one communication section. In~\cite{3}, Rauber and
+Rünger used an analytical model that can predict the consumed energy and the
+execution time for every frequency gear after measuring the consumed energy and
+the execution time with the highest frequency gear. These predictions may be
+used to choose the optimal gear for each processor executing the parallel
+program to reduce energy consumption. To maintain the performance of the
+parallel program , they set the processor with the biggest load to the highest
+gear and then compute the scaling factor values for the rest of the processors.
+Although this model was built for parallel architectures, it can be adapted to
+distributed architectures by taking into account the communications. The
+primary contribution of our paper is presenting a new online scaling factor
+selection method which has the following characteristics: