(DVFS) is one of them. It reduces the frequency of a CPU to lower its energy
consumption. However, lowering the frequency of a CPU might increase the
execution time of an application running on that processor. Therefore, the
-frequency that gives the best tradeoff between the energy consumption and the
+frequency that gives the best trade-off between the energy consumption and the
performance of an application must be selected.\\
In this paper, a new online frequencies selecting algorithm for heterogeneous
platforms is presented. It selects the frequency which tries to give the best
-tradeoff between energy saving and performance degradation, for each node
+trade-off between energy saving and performance degradation, for each node
computing the message passing iterative application. The algorithm has a small
overhead and works without training or profiling. It uses a new energy model for
message passing iterative applications running on a heterogeneous platform. The
-proposed algorithm is evaluated on the Simgrid simulator while running the NAS
+proposed algorithm is evaluated on the SimGrid simulator while running the NAS
parallel benchmarks. The experiments show that it reduces the energy
consumption by up to 35\% while limiting the performance degradation as much as
possible. Finally, the algorithm is compared to an existing method, the
the number of FLOPS executed by the processor which might increase the execution
time of the application running over that processor. Therefore, researchers use
different optimization strategies to select the frequency that gives the best
-tradeoff between the energy reduction and performance degradation ratio. In
+trade-off between the energy reduction and performance degradation ratio. In
\cite{Our_first_paper}, a frequency selecting algorithm was proposed to reduce
the energy consumption of message passing iterative applications running over
homogeneous platforms. The results of the experiments show significant energy
The work presented in this paper concerns the second type of platform, with heterogeneous CPUs.
Many methods were conceived to reduce the energy consumption of this type of platform. Naveen et al.~\cite{Naveen_Power.Efficient.Resource.Scaling}
-developed a method that minimizes the value of $energy*delay^2$ (the delay is the sum of slack times that happen during synchronous communications) by dynamically assigning new frequencies to the CPUs of the heterogeneous cluster. Lizhe et al.~\cite{Lizhe_Energy.aware.parallel.task.scheduling} proposed
+developed a method that minimizes the value of $energy\cdot delay^2$ (the delay is the sum of slack times that happen during synchronous communications) by dynamically assigning new frequencies to the CPUs of the heterogeneous cluster. Lizhe et al.~\cite{Lizhe_Energy.aware.parallel.task.scheduling} proposed
an algorithm that divides the executed tasks into two types: the critical and
non critical tasks. The algorithm scales down the frequency of non critical tasks proportionally to their slack and communication times while limiting the performance degradation percentage to less than 10\%. In~\cite{Joshi_Blackbox.prediction.of.impact.of.DVFS}, they developed
a heterogeneous cluster composed of two types
Therefore, the overall execution time of the program is the execution time of the slowest
task which has the highest computation time and no slack time.
- \begin{figure}[t]
+ \begin{figure}[!t]
\centering
\includegraphics[scale=0.6]{fig/commtasks}
\caption{Parallel tasks on a heterogeneous platform}
\textit T_\textit{new} =
\max_{i=1,2,\dots,N} ({TcpOld_{i}} \cdot S_{i}) + MinTcm
\end{equation}
-Where:\\
+Where:
\begin{equation}
\label{eq:perf2}
MinTcm = \min_{i=1,2,\dots,N} (Tcm_i)
\end{multline}
-\begin{figure}
+\begin{figure}[!t]
\centering
\subfloat[Homogeneous platform]{%
\includegraphics[width=.33\textwidth]{fig/homo}\label{fig:r1}}%
the difference between the faster nodes and the slower nodes is, the bigger the
maximum distance between the energy curve and the performance curve is while
the scaling factors are varying which results in bigger energy savings.
-\begin{figure}[t]
+\begin{figure}[!t]
\centering
\includegraphics[scale=0.5]{fig/start_freq}
\caption{Selecting the initial frequencies}
computing power (which corresponds to 80\% of its dynamic power and the
remaining 20\% to the static power), the same assumption was made in
\cite{Our_first_paper,Rauber_Analytical.Modeling.for.Energy}. Finally, These
-nodes were connected via an ethernet network with 1 Gbit/s bandwidth.
+nodes were connected via an Ethernet network with 1 Gbit/s bandwidth.
-\begin{table}[htb]
+\begin{table}[!t]
\caption{Heterogeneous nodes characteristics}
% title of Table
\centering
type &GFLOPS & Freq. & Freq. & Freq. & power & power \\
& & GHz & GHz &GHz & & \\
\hline
- 1 &40 & 2.5 & 1.2 & 0.1 & 20~w &4~w \\
+ 1 &40 & 2.5 & 1.2 & 0.1 & 20~W &4~W \\
\hline
- 2 &50 & 2.66 & 1.6 & 0.133 & 25~w &5~w \\
+ 2 &50 & 2.66 & 1.6 & 0.133 & 25~W &5~W \\
\hline
- 3 &60 & 2.9 & 1.2 & 0.1 & 30~w &6~w \\
+ 3 &60 & 2.9 & 1.2 & 0.1 & 30~W &6~W \\
\hline
- 4 &70 & 3.4 & 1.6 & 0.133 & 35~w &7~w \\
+ 4 &70 & 3.4 & 1.6 & 0.133 & 35~W &7~W \\
\hline
\end{tabular}
-\begin{table}[htb]
+\begin{table}[!t]
\caption{Running NAS benchmarks on 4 nodes }
% title of Table
\centering
\hline
\end{tabular}
\label{table:res_4n}
-\end{table}
+% \end{table}
-\begin{table}[htb]
+\medskip
+% \begin{table}[!t]
\caption{Running NAS benchmarks on 8 and 9 nodes }
% title of Table
\centering
\hline
\end{tabular}
\label{table:res_8n}
-\end{table}
+% \end{table}
-\begin{table}[htb]
+\medskip
+% \begin{table}[!t]
\caption{Running NAS benchmarks on 16 nodes }
% title of Table
\centering
\hline
\end{tabular}
\label{table:res_16n}
-\end{table}
+% \end{table}
-\begin{table}[htb]
+\medskip
+% \begin{table}[!t]
\caption{Running NAS benchmarks on 32 and 36 nodes }
% title of Table
\centering
\hline
\end{tabular}
\label{table:res_32n}
-\end{table}
+% \end{table}
-\begin{table}[htb]
+\medskip
+% \begin{table}[!t]
\caption{Running NAS benchmarks on 64 nodes }
% title of Table
\centering
\hline
\end{tabular}
\label{table:res_64n}
-\end{table}
-
+% \end{table}
-\begin{table}[htb]
+\medskip
+% \begin{table}[!t]
\caption{Running NAS benchmarks on 128 and 144 nodes }
% title of Table
\centering
-\begin{figure}
+\begin{figure}[!t]
\centering
\subfloat[Energy saving]{%
\includegraphics[width=.33\textwidth]{fig/energy}\label{fig:energy}}%
higher static power percentage in the first scenario which makes it more
relevant in the overall consumed energy. Indeed, the static energy is related
to the execution time and if the performance is degraded the amount of consumed
-static energy directly increas. Therefore, the proposed algorithm does not
+static energy directly increases. Therefore, the proposed algorithm does not
really significantly scale down much the frequencies of the nodes in order to
limit the increase of the execution time and thus limiting the effect of the
consumed static energy.
degradation.
- \begin{table}[htb]
+ \begin{table}[!t]
\caption{The results of the 70\%-30\% power scenario}
% title of Table
\centering
-\begin{table}[htb]
+\begin{table}[!t]
\caption{The results of the 90\%-10\% power scenario}
% title of Table
\centering
\end{table}
-\begin{figure}
+\begin{figure}[!t]
\centering
\subfloat[Comparison between the results on 8 nodes]{%
\includegraphics[width=.33\textwidth]{fig/sen_comp}\label{fig:sen_comp}}%
\label{sec.compare_EDP}
In this section, the scaling factors selection algorithm, called MaxDist,
is compared to Spiliopoulos et al. algorithm \cite{Spiliopoulos_Green.governors.Adaptive.DVFS}, called EDP.
-They developed a green governor that regularly applies an online frequency selecting algorithm to reduce the energy consumed by a multicore architecture without degrading much its performance. The algorithm selects the frequencies that minimize the energy and delay products, $EDP=Enegry*Delay$ using the predicted overall energy consumption and execution time delay for each frequency.
+They developed a green governor that regularly applies an online frequency selecting algorithm to reduce the energy consumed by a multicore architecture without degrading much its performance. The algorithm selects the frequencies that minimize the energy and delay products, $EDP=Energy\cdot Delay$ using the predicted overall energy consumption and execution time delay for each frequency.
To fairly compare both algorithms, the same energy and execution time models, equations (\ref{eq:energy}) and (\ref{eq:fnew}), were used for both algorithms to predict the energy consumption and the execution times. Also Spiliopoulos et al. algorithm was adapted to start the search from the
initial frequencies computed using the equation (\ref{eq:Fint}). The resulting algorithm is an exhaustive search algorithm that minimizes the EDP and has the initial frequencies values as an upper bound.
-Both algorithms were applied to the parallel NAS benchmarks to compare their efficiency. Table \ref{table:compare_EDP} presents the results of comparing the execution times and the energy consumptions for both versions of the NAS benchmarks while running the class C of each benchmark over 8 or 9 heterogeneous nodes. The results show that our algorithm provides better energy savings than Spiliopoulos et al. algorithm,
+Both algorithms were applied to the parallel NAS benchmarks to compare their efficiency. Table \ref{table:compare_EDP} presents the results of comparing the execution times and the energy consumption for both versions of the NAS benchmarks while running the class C of each benchmark over 8 or 9 heterogeneous nodes. The results show that our algorithm provides better energy savings than Spiliopoulos et al. algorithm,
on average it results in 29.76\% energy saving while their algorithm returns just 25.75\%. The average of performance degradation percentage is approximately the same for both algorithms, about 4\%.
For all benchmarks, our algorithm outperforms Spiliopoulos et al. algorithm in
-terms of energy and performance tradeoff, see figure (\ref{fig:compare_EDP}),
+terms of energy and performance trade-off, see figure (\ref{fig:compare_EDP}),
because it maximizes the distance between the energy saving and the performance
degradation values while giving the same weight for both metrics.
-\begin{table}[h]
+\begin{table}[!t]
\caption{Comparing the proposed algorithm}
\centering
\begin{tabular}{|l|l|l|l|l|l|l|l|}
-\begin{figure}[t]
+\begin{figure}[!t]
\centering
\includegraphics[scale=0.5]{fig/compare_EDP.pdf}
- \caption{Tradeoff comparison for NAS benchmarks class C}
+ \caption{Trade-off comparison for NAS benchmarks class C}
\label{fig:compare_EDP}
\end{figure}
\label{sec.concl}
In this paper, a new online frequency selecting algorithm has been presented. It
selects the best possible vector of frequency scaling factors that gives the
-maximum distance (optimal tradeoff) between the predicted energy and the
+maximum distance (optimal trade-off) between the predicted energy and the
predicted performance curves for a heterogeneous platform. This algorithm uses a
new energy model for measuring and predicting the energy of distributed
iterative applications running over heterogeneous platforms. To evaluate the
proposed method, it was applied on the NAS parallel benchmarks and executed over
-a heterogeneous platform simulated by Simgrid. The results of the experiments
+a heterogeneous platform simulated by SimGrid. The results of the experiments
showed that the algorithm reduces up to 35\% the energy consumption of a message
passing iterative method while limiting the degradation of the performance. The
algorithm also selects different scaling factors according to the percentage of
the computing and communication times, and according to the values of the static
and dynamic powers of the CPUs. Finally, the algorithm was compared to
Spiliopoulos et al. algorithm and the results showed that it outperforms their
-algorithm in terms of energy-time tradeoff.
+algorithm in terms of energy-time trade-off.
In the near future, this method will be applied to real heterogeneous platforms
to evaluate its performance in a real study case. It would also be interesting
%%% End:
% LocalWords: Fanfakh Charr FIXME Tianhe DVFS HPC NAS NPB SMPI Rauber's Rauber
-% LocalWords: CMOS EPSA Franche Comté Tflop Rünger IUT Maréchal Juin cedex
+% LocalWords: CMOS EPSA Franche Comté Tflop Rünger IUT Maréchal Juin cedex GPU
% LocalWords: de badri muslim MPI TcpOld TcmOld dNew dOld cp Sopt Tcp Tcm Ps
-% LocalWords: Scp Fmax Fdiff SimGrid GFlops Xeon EP BT
+% LocalWords: Scp Fmax Fdiff SimGrid GFlops Xeon EP BT GPUs CPUs AMD
+% LocalWords: Spiliopoulos scalability