-Energy reduction process for a high performance clusters recently performed using
-dynamic voltage and frequency scaling (DVFS) technique. DVFS is a technique enabled
-in a modern processors to scaled down both of the voltage and the frequency of
-the CPU while it is in the computing mode to reduce the energy consumption. DVFS is
-also allowed in the graphical processors GPUs, to achieved the same goal. Applying
-DVFS has a dramatical side effect if it is applied to minimum levels to gain more
-energy reduction, producing a high percentage of performance degradations for the
-parallel applications. Many researchers used different strategies to solve this
-nonlinear problem for example in~\cite{19,42}, their methods add big overheads to
-the algorithm to select the suitable frequency. In this paper we present a method
-to find the optimal set of frequency scaling factors for a heterogeneous cluster to
-simultaneously optimize both the energy and the execution time without adding a big
-overhead. This work is developed from our previous work of a homogeneous cluster~\cite{45}.
-Therefore we are interested to present some works that concerned the heterogeneous clusters
-enabled DVFS. In general, the heterogeneous cluster works fall into two categorizes:
-GPUs-CPUs heterogeneous clusters and CPUs-CPUs heterogeneous clusters. In GPUs-CPUs
-heterogeneous clusters some parallel tasks executed on a GPUs and the others executed
-on a CPUs. As an example of this works, Luley et al.~\cite{51}, proposed a heterogeneous
-cluster composed of Intel Xeon CPUs and NVIDIA GPUs. Their main goal is to determined the
-energy efficiency as a function of performance per watt, the best tradeoff is done when the
-performance per watt function is maximized. In the work of Kia Ma et al.~\cite{49},
-They developed a scheduling algorithm to distributed different workloads proportional
-to the computing power of the node to be executed on a CPU or a GPU, emphasize all tasks
-must be finished in the same time.
-Recently, Rong et al.~\cite{50}, Their study explain that a heterogeneous clusters enabled
-DVFS using GPUs and CPUs gave better energy and performance efficiency than other clusters
-composed of only CPUs. The CPUs-CPUs heterogeneous clusters consist of number of computing
-nodes all of the type CPU. Our work in this paper can be classified to this type of the
-clusters. As an example of this works see Naveen et al.~\cite{52} work, They developed a
-policy to dynamically assigned the frequency to a heterogeneous cluster. The goal is to
-minimizing a fixed metric of $energy*delay^2$. Where our proposed method is automatically
-optimized the relation between the energy and the delay of the iterative applications.
-Other works such as Lizhe et al.~\cite{53}, their algorithm divided the executed tasks into
-two types: the critical and non critical tasks. The algorithm scaled down the frequency of
-the non critical tasks as function to the amount of the slack and communication times that
-have with maximum of performance degradation percentage of 10\%. In our method there is no
-fixed bounds for performance degradation percentage and the bound is dynamically computed
-according to the energy and the performance tradeoff relation of the executed application.
-There are some approaches used a heterogeneous cluster composed from two different types
-of Intel and AMD processors such as~\cite{54} and \cite{55}, they predicated both the energy
-and the performance for each frequency gear, then the algorithm selected the best gear that gave
-the best tradeoff. In contrast our algorithm works over a heterogeneous platform composed of
-four different types of processors. Others approaches such as \cite{56} and \cite{57}, they
-are selected the best frequencies for a specified heterogeneous clusters offline using some
-heuristic methods. While our proposed algorithm works online during the execution time of
-iterative application. Greedy dynamic approach used by Chen et al.~\cite{58}, minimized
-the power consumption of a heterogeneous severs with time/space complexity, this approach
-had considerable overhead. In our proposed scaling algorithm has very small overhead and
-it is works without any previous analysis for the application time complexity.
+Dynamic voltage and frequency scaling (DVFS) is a technique used in modern
+processors to scale down both the voltage and the frequency of the CPU while
+computing, in order to reduce the energy consumption of the processor. DVFS is
+also available on GPUs to achieve the same goal. Reducing the frequency of a
+processor lowers its number of FLOPS and might degrade the performance of the
+application running on that processor, especially if it is compute bound.
+Therefore, selecting the appropriate frequency for a processor to satisfy some
+objectives, while taking into account all the constraints, is not a trivial
+operation. Many researchers have used different strategies to tackle this
+problem. Some of them developed online methods that
+compute the new frequency while executing the application, such
+as~\cite{Hao_Learning.based.DVFS,Spiliopoulos_Green.governors.Adaptive.DVFS}.
+Others used offline methods that might need to run the application and profile
+it before selecting the new frequency, such
+as~\cite{Rountree_Bounding.energy.consumption.in.MPI,Cochran_Pack_and_Cap_Adaptive_DVFS}.
+These methods can be heuristic, exact or brute-force methods that satisfy
+varied objectives such as energy reduction or performance. They can also be
+adapted to the execution environment and to the type of application:
+sequential, parallel or distributed architecture, homogeneous or heterogeneous
+platform, synchronous or asynchronous application, \dots{}
+
+In this paper, we are interested in reducing the energy consumption of message passing iterative synchronous applications running over heterogeneous platforms.
+Some work has already been done for such platforms, and it can be classified according to two types of heterogeneous platforms:
+\begin{itemize}
+
+\item the platform is composed of homogeneous GPUs and homogeneous CPUs.
+\item the platform is only composed of heterogeneous CPUs.
+
+\end{itemize}
+
+For the first type of platform, the compute-intensive parallel tasks are
+executed on the GPUs and the rest are executed on the CPUs. Luley et
+al.~\cite{Luley_Energy.efficiency.evaluation.and.benchmarking} proposed a
+heterogeneous cluster composed of Intel Xeon CPUs and NVIDIA GPUs. Their main
+goal was to maximize the energy efficiency of the platform during computation by
+maximizing the number of FLOPS per watt.
+In~\cite{KaiMa_Holistic.Approach.to.Energy.Efficiency.in.GPU-CPU}, Kai Ma et
+al. developed a scheduling algorithm that distributes workloads proportionally
+to the computing power of each node, which could be a GPU or a CPU, under the
+constraint that all the tasks must be completed at the same time.
+In~\cite{Rong_Effects.of.DVFS.on.K20.GPU}, Rong et al. showed that a
+heterogeneous (GPUs and CPUs) cluster with DVFS enabled gave better energy and
+performance efficiency than clusters composed only of CPUs.
+
+The work presented in this paper concerns the second type of platform, with
+heterogeneous CPUs. Many methods have been proposed to reduce the energy
+consumption of this type of platform. Naveen et
+al.~\cite{Naveen_Power.Efficient.Resource.Scaling} developed a method that
+minimizes the value of $\mathit{energy}\times \mathit{delay}^2$, where the
+delay is the sum of the slack times that occur during synchronous
+communications, by
+dynamically assigning new frequencies to the CPUs of the heterogeneous cluster.
+Lizhe et al.~\cite{Lizhe_Energy.aware.parallel.task.scheduling} proposed an
+algorithm that divides the executed tasks into two types: critical and
+non-critical tasks. The algorithm scales down the frequency of non-critical
+tasks proportionally to their slack and communication times while limiting the
+performance degradation to less than \np[\%]{10}.
+In~\cite{Joshi_Blackbox.prediction.of.impact.of.DVFS}, the authors used a
+heterogeneous cluster composed of two types of Intel and AMD processors and
+applied a gradient method to predict the impact of DVFS operations on
+performance.
+In~\cite{Shelepov_Scheduling.on.Heterogeneous.Multicore} and
+\cite{Li_Minimizing.Energy.Consumption.for.Frame.Based.Tasks}, the best
+frequencies for a specified heterogeneous cluster are selected offline using
+heuristic methods. Chen et
+al.~\cite{Chen_DVFS.under.quality.of.service.requirements} used a greedy dynamic
+programming approach to minimize the power consumption of heterogeneous servers
+while respecting given time constraints. This approach has a considerable
+overhead. In contrast to the papers described above, this paper presents the
+following contributions:
+\begin{enumerate}
+\item two new energy and performance models for message passing iterative
+ synchronous applications running over a heterogeneous platform. Both models
+ take into account communication and slack times. The models can predict the
+ required energy and the execution time of the application.
+
+\item a new online frequency selection algorithm for heterogeneous
+ platforms. The algorithm has a very small overhead and does not need any
+ training or profiling. It uses a new optimization function which
+ simultaneously maximizes the performance and minimizes the energy consumption
+ of a message passing iterative synchronous application.
+
+\end{enumerate}