-\AG{Consider introducing the models (sec.~\ref{sec.ptasks},
- maybe~\ref{sec.energy}) before related works}
-
-In the this section some heuristics to compute the scaling factor are
-presented and classified in two parts: offline and online methods.
-
-\subsection{The offline DVFS orientations}
-
-The DVFS offline methods are static and are not executed during the runtime of
-the program. Some approaches used heuristics to select the best DVFS state
-during the compilation phases as for example in Azevedo et al.~\cite{40}. They use
-dynamic voltage scaling (DVS) algorithm to choose the DVS setting when there are dependency points
-between tasks. While in~\cite{29}, Xie et al. used breadth-first search
-algorithm to do that. Their goal is to save energy with time limits. Another
-approach gathers and stores the runtime information for each DVFS state, then
-selects the suitable DVFS offline to optimize energy-time
-trade offs. As an example Rountree et al.~\cite{8}, use liner programming
-algorithm, while in~\cite{38,34}, Cochran et al. use multi logistic regression
-algorithm for the same goal. The offline study that shows the DVFS impact on the
-communication time of the MPI program is~\cite{17}, where Freeh et al. show that these
-times do not change when the frequency is scaled down.
-
-\subsection{The online DVFS orientations}
-
-The objective of online DVFS orientations works is to dynamically compute and set the frequency of
-the CPU during the runtime of the program for saving energy. Estimating and
-predicting approaches for the energy-time trade offs are developed by Kimura, Peraza, Yu-Liang et al.
-~\cite{11,2,31}. These works select the best DVFS setting depending on the slack
-times. These times happen when the processors have to wait for data from other
-processors to compute their task. For example, during the synchronous
-communications that take place in MPI programs, some processors are
-idle. The optimal DVFS can be selected using learning methods. Therefore, in Dhiman, Hao Shen et al.
-~\cite{39,19} used machine learning to converge to the suitable DVFS
-configuration. Their learning algorithms take big time to converge when the
-number of available frequencies is high. Also, the communication sections of the MPI
-program can be used to save energy. In~\cite{1}, Lim et al. developed an
-algorithm that detects the communication sections and changes the frequency
-during these sections only. This approach changes the frequency many times
-because an iteration may contain more than one communication section. The domain
-of analytical modeling used for choosing the optimal frequency as in Rauber and Rünger~\cite{3}. they
-developed an analytical mathematical model to determine the
-optimal frequency scaling factor for any number of concurrent tasks. They set the slowest task to maximum frequency for maintaining performance. In this paper we compare our algorithm with
-Rauber and Rünger model~\cite{3}, because their model can be used for any number of
-concurrent tasks for homogeneous platforms. The primary contributions of this paper are:
+
+In this section, some heuristics to compute the scaling factor are presented and
+classified into two categories: offline and online methods.
+
+\subsection{Offline scaling factor selection methods}
+
+The offline scaling factor selection methods are executed before the runtime of
+the program. They return static scaling factor values to the processors
+participating in the execution of the parallel program. On one hand, the
+scaling factor values could be computed based on information retrieved by
+analyzing the code of the program and the computing system that will execute it.
+In~\cite{40}, Azevedo et al. detect during compilation the dependency points
+between tasks in a multi-task program. This information is then used to lower
+the frequency of some processors in order to eliminate slack times. A slack
+time is the period of time during which a processor that have already finished
+its computation, have to wait for a set of processors to finish their
+computations and send their results to the waiting processor in order to
+continue its task that is dependent on the results of computations being
+executed on other processors. Freeh et al. showed in~\cite{17} that the
+communication times of MPI programs do not change when the frequency is scaled
+down. On the other hand, some offline scaling factor selection methods use the
+information gathered from previous full or partial executions of the program. A
+part or the whole program is usually executed over all the available frequency
+gears and the the execution time and the energy consumed with each frequency
+gear are measured. Then an heuristic or an exact method uses the retrieved
+information to compute the values of the scaling factor for the processors.
+In~\cite{29}, Xie et al. use an exact exponential breadth-first search algorithm
+to compute the scaling factor values that give the optimal energy reduction
+while respecting a deadline for a sequential program. They also present a
+linear heuristic that approximates the optimal solution. In~\cite{8} , Rountree
+et al. use a linear programming algorithm, while in~\cite{38,34}, Cochran et
+al. use multi logistic regression algorithm for the same goal. The main
+drawback for these methods is that they all require executing a part or the
+whole program on all frequency gears for each new instance of the same program.
+
+\subsection{Online scaling factor selection methods}
+
+The online scaling factor selection methods are executed during the runtime of
+the program. They are usually integrated into iterative programs where the same
+block of instructions is executed many times. During the first few iterations,
+many informations are measured such as the execution time, the energy consumed
+using a multimeter, the slack times, \dots{} Then a method will exploit these
+measurements to compute the scaling factor values for each processor. This
+operation, measurements and computing new scaling factor, can be repeated as
+much as needed if the iterations are not regular. Kimura, Peraza, Yu-Liang et
+al.~\cite{11,2,31} used varied heuristics to select the appropriate scaling
+factor values to eliminate the slack times during runtime. However, as seen
+in~\cite{39,19}, machine learning methods can take a lot of time to converge
+when the number of available gears is big. To reduce the impact of slack times,
+in~\cite{1}, Lim et al. developed an algorithm that detects the communication
+sections and changes the frequency during these sections only. This approach
+might change the frequency of each processor many times per iteration if an
+iteration contains more than one communication section. In~\cite{3}, Rauber and
+Rünger used an analytical model that can predict the consumed energy and the
+execution time for every frequency gear after measuring the consumed energy and
+the execution time with the highest frequency gear. These predictions may be
+used to choose the optimal gear for each processor executing the parallel
+program to reduce energy consumption. To maintain the performance of the
+parallel program , they set the processor with the biggest load to the highest
+gear and then compute the scaling factor values for the rest of the processors.
+Although this model was built for parallel architectures, it can be adapted to
+distributed architectures by taking into account the communications. The
+primary contribution of our paper is presenting a new online scaling factor
+selection method which has the following characteristics: