-The need and demand for more computing power have been increasing since the birth of the first computing unit and it is not expected to slow
-down in the coming years. To satisfy this demand, researchers and supercomputers
-constructors have been regularly increasing the number of computing cores and processors in
-supercomputers (for example in November 2013, according to the TOP500
-list~\cite{43}, the Tianhe-2 was the fastest supercomputer. It has more than 3
-millions of cores and delivers more than 33 Tflop/s while consuming 17808
-kW). This large increase in number of computing cores has led to large energy
-consumption by these architectures. Moreover, the price of energy is expected to
-continue its ascent according to the demand. For all these reasons energy
-reduction became an important topic in the high performance computing field. To
-tackle this problem, many researchers used DVFS (Dynamic Voltage Frequency
-Scaling) operations which reduce dynamically the frequency and voltage of cores
-and thus their energy consumption. Indeed, modern CPUs offer a set of acceptable frequencies which are usually called gears, and the user or the operating system can modify the frequency of the processor according to its needs. However, DVFS also degrades the
-performance of computation. Therefore researchers try to reduce the frequency to
-minimum when processors are idle (waiting for data from other processors or
-communicating with other processors). Moreover, depending on their objectives
-they use heuristics to find the best scaling factor during the computation. If
-they aim for performance they choose the best scaling factor that reduces the
-consumed energy while affecting as little as possible the performance. On the
-other hand, if they aim for energy reduction, the chosen scaling factor must
-produce the most energy efficient execution without considering the degradation
-of the performance. It is important to notice that lowering the frequency to
-minimum value does not always give the most energy efficient execution due to energy
-leakage. The best scaling factor might be chosen during execution (online) or
-during a pre-execution phase. In this paper, we present an
-algorithm that selects a frequency scaling factor that simultaneously takes into
-consideration the energy consumption by the CPU and the performance of the application. The
-main objective of HPC systems is to execute as fast as possible the application.
-Therefore, our algorithm selects the scaling factor online with
-very small footprint. The proposed algorithm takes into account the
-communication times of the MPI program to choose the scaling factor. This
-algorithm has ability to predict both energy consumption and execution time over
-all available scaling factors. The prediction achieved depends on some
-computing time information, gathered at the beginning of the runtime. We apply
-this algorithm to seven MPI benchmarks. These MPI programs are the NAS parallel
-benchmarks (NPB v3.3) developed by NASA~\cite{44}. Our experiments are executed
-using the simulator SimGrid/SMPI v3.10~\cite{Casanova:2008:SGF:1397760.1398183}
-over an homogeneous distributed memory architecture. Furthermore, we compare the
-proposed algorithm with Rauber and Rünger methods~\cite{3}.
-The comparison's results show that our algorithm gives better energy-time trade-off.
-
-This paper is organized as follows: Section~\ref{sec.relwork} presents some related works
-from other authors. Section~\ref{sec.exe} explains the execution of parallel
-tasks and the sources of slack times. It also presents an energy
-model for homogeneous platforms. Section~\ref{sec.mpip} describes how the performance
-of MPI programs can be predicted . Section~\ref{sec.compet} presents the energy-performance
-objective function that maximizes the reduction of energy consumption while minimizing the degradation of the program's performance. Section~\ref{sec.optim} details the proposed energy-performance algorithm. Section~\ref{sec.expe} verifies the accuracy of the performance prediction
-model and presents the results of the proposed algorithm. It also shows the comparison results between our method and other existing methods. Finally,
-we conclude in Section~\ref{sec.concl} with a summary and some future works.
+The need and demand for more computing power have been increasing since the
+birth of the first computing unit and it is not expected to slow down in the
+coming years. To satisfy this demand, researchers and supercomputers
+constructors have been regularly increasing the number of computing cores and
+processors in supercomputers (for example in November 2013, according to the
+TOP500 list~\cite{43}, the Tianhe-2 was the fastest supercomputer. It has more
+than 3 million of cores and delivers more than \np[Tflop/s]{33} while consuming
+\np[kW]{17808}). This large increase in number of computing cores has led to
+large energy consumption by these architectures. Moreover, the price of energy
+is expected to continue its ascent according to the demand. For all these
+reasons energy reduction has become an important topic in the high performance
+computing field. To tackle this problem, many researchers use DVFS (Dynamic
+Voltage Frequency Scaling) operations which reduce dynamically the frequency and
+voltage of cores and thus their energy consumption. Indeed, modern CPUs offer a
+set of acceptable frequencies which are usually called gears, and the user or
+the operating system can modify the frequency of the processor according to its
+needs. However, DVFS also degrades the performance of computation. Therefore
+researchers try to reduce the frequency to the minimum when processors are idle
+(waiting for data from other processors or communicating with other processors).
+Moreover, depending on their objectives, they use heuristics to find the best
+scaling factor during the computation. If they aim for performance they choose
+the best scaling factor that reduces the consumed energy while affecting as
+little as possible the performance. On the other hand, if they aim for energy
+reduction, the chosen scaling factor must produce the most energy efficient
+execution without considering the degradation of the performance. It is
+important to notice that lowering the frequency to the minimum value does not always
+give the most energy efficient execution due to energy leakage. The best
+scaling factor might be chosen during execution (online) or during a
+pre-execution phase. In this paper, we present an algorithm that selects a
+frequency scaling factor that simultaneously takes into consideration the energy
+consumption by the CPU and the performance of the application. The main
+objective of HPC systems is to execute as fast as possible the application.
+Therefore, our algorithm selects the scaling factor online with very small
+overhead. The proposed algorithm takes into account the communication times of
+the MPI program to choose the scaling factor. This algorithm has the ability to
+predict both energy consumption and execution time over all available scaling
+factors. The prediction achieved depends on some computing time information,
+gathered at the beginning of the runtime. We apply this algorithm to the NAS parallel benchmarks (NPB v3.3)~\cite{44}. Our experiments are executed using the simulator
+SimGrid/SMPI v3.10~\cite{Casanova:2008:SGF:1397760.1398183} over an homogeneous
+distributed memory architecture. Furthermore, we compare the proposed algorithm
+with Rauber and Rünger methods~\cite{3}. The comparison's results show that our
+algorithm gives better energy-time trade-off.
+
+This paper is organized as follows: Section~\ref{sec.relwork} presents some
+related works from other authors. Section~\ref{sec.exe} presents an energy
+model for homogeneous platforms. Section~\ref{sec.mpip} describes how the
+performance of MPI programs can be predicted. Section~\ref{sec.compet} presents
+the energy-performance objective function that maximizes the reduction of energy
+consumption while minimizing the degradation of the program's performance.
+Section~\ref{sec.optim} details the proposed energy-performance algorithm.
+Section~\ref{sec.expe} verifies the accuracy of the performance prediction model
+and presents the results of the proposed algorithm. It also shows the
+comparison results between our method and other existing methods. Finally, we
+conclude in Section~\ref{sec.concl} with a summary and some future works.
+