+\end{keyword}
+
+\end{frontmatter}
+
+
+
+\section{Introduction}
+\label{sec.intro}
+The need for computing power is continually increasing. To partially
+satisfy this need, most supercomputer manufacturers simply add more computing
+nodes to their platforms. The resulting platforms may achieve a higher number
+of floating point operations per second (FLOPS), but their energy consumption
+and heat dissipation also increase. As an example, the Chinese supercomputer
+Tianhe-2 had the highest FLOPS in June 2015 according to the Top500 list
+\cite{TOP500_Supercomputers_Sites}. However, it was also the most power-hungry
+platform, with its more than 3 million cores consuming around 17.8 megawatts.
+Moreover, according to the U.S. Annual Energy Outlook 2015
+\cite{U.S_Annual.Energy.Outlook.2015}, the price of energy was approximately
+\$70 per megawatt-hour. Therefore, the energy consumed by
+the Tianhe-2 platform costs more than \$10 million each year.
+Computing platforms must therefore become more energy efficient and offer the
+highest possible number of FLOPS per watt, as does the Shoubu-ExaScaler from
+RIKEN, which topped the Green500 list in June 2015 \cite{Green500_List}.
+This heterogeneous platform delivers more than 7 GFLOPS per watt while consuming
+50.32 kilowatts.
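+
+As a rough check of the yearly cost quoted above, assume (an assumption made
+here for illustration only) that the platform draws its full 17.8 megawatts
+continuously over the $24 \times 365 = 8760$ hours of a year:
+\[
+  17.8~\mathrm{MW} \times 8760~\mathrm{h} \times 70~\mathrm{USD/MWh}
+  \approx 10.9 \times 10^{6}~\mathrm{USD}.
+\]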
+
+Besides platform improvements, there are many software and hardware techniques
+to lower the energy consumption of these platforms, such as dynamic voltage and
+frequency scaling (DVFS) and scheduling.
+DVFS is widely used to reduce the energy consumption of a
+processor by lowering its frequency
+\cite{Rizvandi_Some.Observations.on.Optimal.Frequency}. However, it also reduces
+the number of FLOPS executed by the processor, which may increase the execution
+time of the application running on that processor. Therefore, researchers use
+different optimization strategies to select the frequency that gives the best
+trade-off between energy reduction and performance degradation. In
+\cite{Our_first_paper} and \cite{pdsec2015}, a frequency selection algorithm
+was proposed to reduce the energy consumption of message passing
+applications with iterations running over homogeneous and heterogeneous
+clusters, respectively.
+The experiments showed significant reductions in energy consumption.
+All the experiments were conducted on the SimGrid
+simulator \cite{SimGrid}, which offers easy-to-use tools to describe homogeneous
+and heterogeneous platforms and to simulate the execution of message passing
+parallel applications over them.
+
+
+This paper presents the following contributions:
+\begin{enumerate}
+\item two new energy and performance models for message passing
+ synchronous applications with iterations running over a heterogeneous grid platform. Both models
+ take into account communications and slack times. The models can predict the
+ required energy and the execution time of the application.
+
+\item a new online frequency selection algorithm for heterogeneous grid
+  platforms. The algorithm has a very small overhead and needs neither
+  training nor profiling. It uses a new optimization function which
+  simultaneously maximizes the performance and minimizes the energy consumption
+  of a message passing synchronous application with iterations. The algorithm
+  was applied to the NAS parallel benchmarks and evaluated on a real testbed,
+  the Grid'5000 platform \cite{grid5000}.
+
+\end{enumerate}
+
+
+
+This paper is organized as follows. Section~\ref{sec.relwork} presents
+related work. Section~\ref{sec.exe} describes how the
+execution time of message passing programs can be predicted. It also presents
+an energy model that predicts the energy consumption of an application running
+over a grid platform. Section~\ref{sec.compet} presents the
+energy-performance objective function that maximizes the reduction of energy
+consumption while minimizing the degradation of the program's performance.
+Section~\ref{sec.optim} details the proposed frequency selection algorithm.
+Section~\ref{sec.expe} presents the results of applying the algorithm to the
+NAS parallel benchmarks and executing them on the Grid'5000 testbed.
+It also evaluates the algorithm on multi-core-per-node architectures and under
+three different power scenarios, and compares the proposed method with an
+existing method. Finally, Section~\ref{sec.concl} concludes the paper with a
+summary and some future work.