\maketitle
\begin{abstract}
-
+Green computing emphasizes energy conservation: minimizing the negative impact
+on the environment while achieving high performance and low operating costs. In
+high performance clusters, energy consumption can be reduced with the dynamic voltage and frequency
+scaling (DVFS) technique, which lowers the frequency of a CPU. However, scaling the frequency down to low levels
+significantly increases the performance degradation ratio. The selected frequencies
+must therefore give the best possible tradeoff between the energy consumption and the performance of the parallel program.
+
+In this paper, we present a new online heterogeneous scaling algorithm that selects the best vector
+of frequency scaling factors. These factors give the best tradeoff between the energy saving and the
+performance degradation. The algorithm has a small overhead and works without training or profiling.
+We also develop a new energy model for distributed iterative applications running on a heterogeneous cluster.
+The proposed algorithm is evaluated on the Simgrid simulator by running the NAS parallel benchmarks.
+It reduces the energy consumption by up to 35\% while limiting the performance degradation as much as possible.
\end{abstract}
\section{Introduction}
cluster composed of Intel Xeon CPUs and NVIDIA GPUs. Their main goal is to determine the
energy efficiency as a function of performance per watt; the best tradeoff is reached when the
performance per watt function is maximized. In the work of Kai Ma et al.
-~\cite{KaiMa_Holistic.Approach.to.Energy.Efficiency.in.GPU-CPU}, They developed a scheduling
+~\cite{KaiMa_Holistic.Approach.to.Energy.Efficiency.in.GPU-CPU}, they developed a scheduling
algorithm that distributes different workloads in proportion to the computing power of the node
-to be executed on a CPU or a GPU, emphasize all tasks must be finished in the same time.
+to be executed on CPU or GPU, emphasizing that all tasks must finish at the same time.
Recently, Rong et al.~\cite{Rong_Effects.of.DVFS.on.K20.GPU} showed that
heterogeneous clusters with DVFS-enabled GPUs and CPUs gave better energy and performance
efficiency than clusters composed only of CPUs.
\subsection{The results for different power consumption scenarios}
-
+\label{sec.compare}
The results of the previous section were obtained while using processors that consume during computation
an overall power which is 80\% composed of dynamic power and 20\% of static power. In this section,
these ratios are changed and two new power scenarios are considered in order to evaluate how the proposed
table~(\ref{table:platform}), it takes on average \np[ms]{0.04} for 4 nodes and \np[ms]{0.15} on average for 144 nodes
to compute the best scaling factors vector. The algorithm complexity is $O(F\cdot (N \cdot4) )$, where $F$ is the number
of iterations and $N$ is the number of computing nodes. The algorithm needs from 12 to 20 iterations to select the best
-vector of frequency scaling factors that gives the results of the section (\ref{sec.res}).
+vector of frequency scaling factors that gives the results of sections (\ref{sec.res}) and (\ref{sec.compare}).
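+
+As a rough illustration of this bound, and assuming that each term of the bound counts
+one elementary operation (an assumption made here only for the sake of the estimate),
+the largest values reported above, $F = 20$ iterations and $N = 144$ nodes, give
+\[
+F \cdot (N \cdot 4) = 20 \cdot (144 \cdot 4) = 11520 ,
+\]
+whereas, for instance, $F = 12$ and $N = 4$ give $12 \cdot (4 \cdot 4) = 192$. The cost of
+selecting the scaling factors therefore grows only linearly with the number of nodes.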
\section{Conclusion}
\label{sec.concl}
-
+In this paper, we have presented a new online heterogeneous scaling algorithm
+that selects the best possible vector of frequency scaling factors. This vector
+gives the maximum distance (optimal tradeoff) between the normalized energy and
+the performance curves. In addition, we developed a new energy model for measuring
+and predicting the energy consumption of distributed iterative applications running on a heterogeneous
+cluster. The proposed method was evaluated on the Simgrid/SMPI simulator, which was used to build a heterogeneous
+platform that executes the NAS parallel benchmarks. The experimental results showed that
+the proposed algorithm adapts its behaviour and selects different scaling factors when
+the number of computing nodes and both the static and the dynamic powers are changed.
+
+In the future, we plan to extend this method to asynchronous iterative applications,
+where each task does not wait for the other tasks to finish their work. This requires developing a new
+energy model for asynchronous iterative applications, where the number of iterations is not
+known in advance and depends on the global convergence of the iterative system.
\section*{Acknowledgment}
+
% trigger a \newpage just before the given reference
% number - used to balance the columns on the last page
% adjust value as needed - may need to be readjusted if