the performance of an application must be selected.
In this paper, a new online frequency selecting algorithm for heterogeneous
- platforms is presented. It selects the frequencies and tries to give the best
+ platforms (heterogeneous CPUs) is presented. It selects the frequencies and tries to give the best
trade-off between energy saving and performance degradation, for each node
computing the message passing iterative application. The algorithm has a small
overhead and works without training or profiling. It uses a new energy model
for message passing iterative applications running on a heterogeneous
platform. The proposed algorithm is evaluated on the SimGrid simulator while
running the NAS parallel benchmarks. The experiments show that it reduces the
- energy consumption by up to \np[\%]{35} while limiting the performance
+ energy consumption by up to \np[\%]{34} while limiting the performance
degradation as much as possible. Finally, the algorithm is compared to an
- existing method, the comparison results showing that it outperforms the
- latter.
+ existing method. The comparison results show that it outperforms the
+ latter: on average, it saves \np[\%]{4} more energy while keeping the same performance.
\end{abstract}
goal. Reducing the frequency of a processor lowers its number of FLOPS and may
degrade the performance of the application running on that processor, especially
if it is compute bound. Therefore selecting the appropriate frequency for a
-processor to satisfy some objectives while taking into account all the
+processor to satisfy some objectives, while taking into account all the
constraints, is not a trivial operation. Many researchers used different
strategies to tackle this problem. Some of them developed online methods that
compute the new frequency while executing the application, such
$\Tnew$ and $\Told$ are computed as in (\ref{eq:pnorm}).
While the main goal is to optimize the energy and execution time at the same
-time, the normalized energy and execution time curves are not in the same
-direction. According to the equations~(\ref{eq:pnorm}) and (\ref{eq:enorm}), the
+time, the normalized energy and execution time curves do not evolve (increase/decrease) in the same way. According to the equations~(\ref{eq:pnorm}) and (\ref{eq:enorm}), the
vector of frequency scaling factors $S_1,S_2,\dots,S_N$ reduce both the energy
and the execution time simultaneously. But the main objective is to produce
maximum energy reduction with minimum execution time reduction.
This problem can be solved by making the optimization process for energy and
-execution time following the same direction. Therefore, the equation of the
+execution time follow the same evolution according to the vector of scaling factors. Therefore, the equation of the
normalized execution time is inverted which gives the normalized performance
equation, as follows:
\begin{multline}
\item[{$\Fmax[i]$}] array of the maximum frequencies for all nodes.
\item[{$\Pd[i]$}] array of the dynamic powers for all nodes.
\item[{$\Ps[i]$}] array of the static powers for all nodes.
- \item[{$\Fdiff[i]$}] array of the difference between two successive frequencies for all nodes.
+ \item[{$\Fdiff[i]$}] array of the differences between two successive frequencies for all nodes.
\end{description}
\Ensure $\Sopt[1],\Sopt[2] \dots, \Sopt[N]$ is a vector of optimal scaling factors
according to the computed initial frequency scaling factors. The resulting new
frequencies are highlighted in Figure~\ref{fig:st_freq}. This set of
frequencies can be considered as a higher bound for the search space of the
-optimal vector of frequencies because selecting frequency scaling factors higher
+optimal vector of frequencies because selecting scaling factors higher
than the higher bound will not improve the performance of the application and it
will increase its overall energy consumption. Therefore the algorithm that
selects the frequency scaling factors starts the search method from these
initial frequencies and takes a downward search direction toward lower
-frequencies. The algorithm iterates on all left frequencies, from the higher
+frequencies. The algorithm iterates on all remaining frequencies, from the higher
bound until all nodes reach their minimum frequencies, to compute their overall
energy consumption and performance, and select the optimal frequency scaling
factors vector. At each iteration the algorithm determines the slowest node
until they reach the higher bound. It can also be noticed that the higher the
difference between the faster nodes and the slower nodes is, the bigger the
maximum distance between the energy curve and the performance curve is while the
-scaling factors are varying which results in bigger energy savings.
+scaling factors are varying which results in bigger energy savings.
+Finally, in a homogeneous platform the energy consumption is increased when the scaling factor is very high.
+Indeed, the dynamic energy saved by reducing the frequency of the processor is offset by the significant increase of the execution time and thus the increase of the static energy. On the other hand, in a heterogeneous platform this is not the case.
\subsection{The evaluation of the proposed algorithm}
\label{sec.verif.algo}
very precise, the maximum normalized difference between the predicted execution
time and the real execution time is equal to 0.03 for all the NAS benchmarks.
-Since the proposed algorithm is not an exact method it does not test all the
+Since the proposed algorithm is not an exact method, it does not test all the
possible solutions (vectors of scaling factors) in the search space. To prove
its efficiency, it was compared on small instances to a brute force search
algorithm that tests all the possible solutions. The brute force algorithm was
\label{sec.expe}
To evaluate the efficiency and the overall energy consumption reduction of
-Algorithm~\ref{HSA}, it was applied to the NAS parallel benchmarks NPB v3.3. The
+Algorithm~\ref{HSA}, it was applied to the NAS parallel benchmarks NPB v3.3, which
+is composed of synchronous message passing applications. The
experiments were executed on the simulator SimGrid/SMPI which offers easy tools
to create a heterogeneous platform and run message passing applications over it.
The heterogeneous platform that was used in the experiments, had one core per
MG, FT, BT, LU and SP) and the benchmarks were executed with the three classes:
A, B and C. However, due to the lack of space in this paper, only the results of
the biggest class, C, are presented while being run on different number of
-nodes, ranging from 4 to 128 or 144 nodes depending on the benchmark being
+nodes, ranging from 8 to 128 or 144 nodes depending on the benchmark being
executed. Indeed, the benchmarks CG, MG, LU, EP and FT had to be executed on 1,
2, 4, 8, 16, 32, 64, or 128 nodes. The other benchmarks such as BT and SP had
to be executed on 1, 4, 9, 16, 36, 64, or 144 nodes.
\begin{table}[!t]
- \caption{Running NAS benchmarks on 4 nodes }
- % title of Table
- \centering
- \begin{tabular}{|*{7}{r|}}
- \hline
- \hspace{-2.2084pt}%
- Program & Execution & Energy & Energy & Performance & Distance \\
- name & time/s & consumption/J & saving\% & degradation\% & \\
- \hline
- CG & 64.64 & 3560.39 & 34.16 & 6.72 & 27.44 \\
- \hline
- MG & 18.89 & 1074.87 & 35.37 & 4.34 & 31.03 \\
- \hline
- EP & 79.73 & 5521.04 & 26.83 & 3.04 & 23.79 \\
- \hline
- LU & 308.65 & 21126.00 & 34.00 & 6.16 & 27.84 \\
- \hline
- BT & 360.12 & 21505.55 & 35.36 & 8.49 & 26.87 \\
- \hline
- SP & 234.24 & 13572.16 & 35.22 & 5.70 & 29.52 \\
- \hline
- FT & 81.58 & 4151.48 & 35.58 & 0.99 & 34.59 \\
- \hline
- \end{tabular}
- \label{table:res_4n}
+
% \end{table}
- \medskip
+
% \begin{table}[!t]
\caption{Running NAS benchmarks on 8 and 9 nodes }
% title of Table
energy consumption model (\ref{eq:energy}), with and without applying the
algorithm. The execution time was also measured for all these experiments. Then,
the energy saving and performance degradation percentages were computed for each
-instance. The results are presented in Tables~\ref{table:res_4n},
+instance. The results are presented in Tables
\ref{table:res_8n}, \ref{table:res_16n}, \ref{table:res_32n},
\ref{table:res_64n} and \ref{table:res_128n}. All these results are the average
values from many experiments for energy savings and performance degradation.
The tables show the experimental results for running the NAS parallel benchmarks
-on different number of nodes. The experiments show that the algorithm
-significantly reduces the energy consumption (up to \np[\%]{35}) and tries to
+on different numbers of nodes. The experiments show that the algorithm
+significantly reduces the energy consumption (up to \np[\%]{34}) and tries to
limit the performance degradation. They also show that the energy saving
percentage decreases when the number of computing nodes increases. This
reduction is due to the increase of the communication times compared to the
increase. While for the EP and SP benchmarks, the energy saving percentage is
not affected by the increase of the number of computing nodes, because in these
benchmarks there are little or no communications. Finally, the energy saving of
-the GC benchmark significantly decrease when the number of nodes increase
+the CG benchmark significantly decreases when the number of nodes increases
because this benchmark has more communications than the others. The second plot
shows that the performance degradation percentages of most of the benchmarks
decrease when they run on a big number of nodes because they spend more time
iterative applications running over heterogeneous platforms. To evaluate the
proposed method, it was applied on the NAS parallel benchmarks and executed over
a heterogeneous platform simulated by SimGrid. The results of the experiments
-showed that the algorithm reduces up to \np[\%]{35} the energy consumption of a
+showed that the algorithm reduces by up to \np[\%]{34} the energy consumption of a
message passing iterative method while limiting the degradation of the
performance. The algorithm also selects different scaling factors according to
the percentage of the computing and communication times, and according to the
\section*{Acknowledgment}
-This work has been partially supported by the Labex
-ACTION project (contract ``ANR-11-LABX-01-01''). As a PhD student,
-Mr. Ahmed Fanfakh, would like to thank the University of
-Babylon (Iraq) for supporting his work.
-
+This work has been partially supported by the Labex ACTION project (contract
+``ANR-11-LABX-01-01''). Computations have been performed on the supercomputer
+facilities of the Mésocentre de calcul de Franche-Comté. As a PhD student,
+Mr. Ahmed Fanfakh would like to thank the University of Babylon (Iraq) for
+supporting his work.
% trigger a \newpage just before the given reference
% number - used to balance the columns on the last page
%!PS-Adobe-2.0 EPSF-2.0
%%Title: energy.eps
%%Creator: gnuplot 4.6 patchlevel 0
-%%CreationDate: Thu Nov 6 09:05:32 2014
+%%CreationDate: Thu Feb 19 16:44:03 2015
%%DocumentFonts: (atend)
%%BoundingBox: 50 50 320 239
%%EndComments
/Author (afanfakh)
% /Producer (gnuplot)
% /Keywords ()
- /CreationDate (Thu Nov 6 09:05:32 2014)
+ /CreationDate (Thu Feb 19 16:44:03 2015)
/DOCINFO pdfmark
end
} ifelse
/Helvetica findfont 140 scalefont setfont
1.000 UL
LTb
-602 940 M
+602 948 M
63 0 V
4482 0 R
-63 0 V
/Helvetica findfont 190 scalefont setfont
-518 940 M
+518 948 M
( 5) Rshow
/Helvetica findfont 140 scalefont setfont
1.000 UL
LTb
-602 1291 M
+602 1308 M
63 0 V
4482 0 R
-63 0 V
/Helvetica findfont 140 scalefont setfont
1.000 UL
LTb
-602 1643 M
+602 1668 M
63 0 V
4482 0 R
-63 0 V
/Helvetica findfont 140 scalefont setfont
1.000 UL
LTb
-602 1994 M
+602 2028 M
63 0 V
4482 0 R
-63 0 V
/Helvetica findfont 140 scalefont setfont
1.000 UL
LTb
-602 2346 M
+602 2387 M
63 0 V
4482 0 R
-63 0 V
/Helvetica findfont 140 scalefont setfont
1.000 UL
LTb
-602 2697 M
+602 2747 M
63 0 V
4482 0 R
-63 0 V
/Helvetica findfont 140 scalefont setfont
1.000 UL
LTb
-602 3049 M
+602 3107 M
63 0 V
4482 0 R
-63 0 V
/Helvetica findfont 140 scalefont setfont
1.000 UL
LTb
-602 3400 M
+602 3467 M
63 0 V
4482 0 R
-63 0 V
/Helvetica findfont 140 scalefont setfont
1.000 UL
LTb
-1.000 UL
-LTb
-655 3338 N
-0 210 V
-4438 0 V
-0 -210 V
--4438 0 V
-Z stroke
% Begin plot #1
1.500 UP
2.000 UL
LT0
0.00 0.00 1.00 C LCb setrgbcolor
/Helvetica findfont 190 scalefont setfont
-1289 3443 M
+1259 3443 M
(CG) Rshow
/Helvetica findfont 140 scalefont setfont
LT0
-0.00 0.00 1.00 C 739 3443 M
+0.00 0.00 1.00 C 709 3443 M
298 0 V
-725 2989 M
-846 2785 L
-242 -348 V
-484 -713 V
-969 -564 V
-4479 870 L
-725 2989 Box
-846 2785 Box
-1088 2437 Box
-1572 1724 Box
-2541 1160 Box
-4479 870 Box
-888 3443 Box
+725 3047 M
+846 2838 L
+242 -357 V
+484 -730 V
+969 -578 V
+4479 876 L
+725 3047 Box
+846 2838 Box
+1088 2481 Box
+1572 1751 Box
+2541 1173 Box
+4479 876 Box
+858 3443 Box
% End plot #1
% Begin plot #2
1.500 UP
LT0
1.00 0.00 0.00 C LCb setrgbcolor
/Helvetica findfont 190 scalefont setfont
-1923 3443 M
+1893 3443 M
(MG) Rshow
/Helvetica findfont 140 scalefont setfont
LT0
-1.00 0.00 0.00 C 1373 3443 M
+1.00 0.00 0.00 C 1343 3443 M
298 0 V
-725 3074 M
-846 2963 L
-242 -90 V
-484 -251 V
+725 3134 M
+846 3020 L
+242 -93 V
+484 -256 V
969 24 V
-4479 1908 L
-725 3074 TriD
-846 2963 TriD
-1088 2873 TriD
-1572 2622 TriD
-2541 2646 TriD
-4479 1908 TriD
-1522 3443 TriD
+4479 1940 L
+725 3134 TriD
+846 3020 TriD
+1088 2927 TriD
+1572 2671 TriD
+2541 2695 TriD
+4479 1940 TriD
+1492 3443 TriD
% End plot #2
% Begin plot #3
1.500 UP
LT0
0.50 0.00 0.50 C LCb setrgbcolor
/Helvetica findfont 190 scalefont setfont
-2557 3443 M
+2527 3443 M
(EP) Rshow
/Helvetica findfont 140 scalefont setfont
LT0
-0.50 0.00 0.50 C 2007 3443 M
+0.50 0.00 0.50 C 1977 3443 M
298 0 V
-725 2474 M
+725 2519 M
121 15 V
242 -13 V
484 9 V
969 10 V
1938 -2 V
-725 2474 Star
-846 2489 Star
-1088 2476 Star
-1572 2485 Star
-2541 2495 Star
-4479 2493 Star
-2156 3443 Star
+725 2519 Star
+846 2534 Star
+1088 2521 Star
+1572 2530 Star
+2541 2540 Star
+4479 2538 Star
+2126 3443 Star
% End plot #3
% Begin plot #4
1.500 UP
LT0
0.18 0.31 0.31 C LCb setrgbcolor
/Helvetica findfont 190 scalefont setfont
-3191 3443 M
+3161 3443 M
(LU) Rshow
/Helvetica findfont 140 scalefont setfont
LT0
-0.18 0.31 0.31 C 2641 3443 M
+0.18 0.31 0.31 C 2611 3443 M
298 0 V
-725 2979 M
-846 2573 L
-242 40 V
-484 -365 V
-969 -116 V
-4479 1760 L
-725 2979 TriUF
-846 2573 TriUF
-1088 2613 TriUF
-1572 2248 TriUF
-2541 2132 TriUF
-4479 1760 TriUF
-2790 3443 TriUF
+725 3036 M
+846 2620 L
+242 41 V
+484 -374 V
+969 -118 V
+4479 1788 L
+725 3036 TriUF
+846 2620 TriUF
+1088 2661 TriUF
+1572 2287 TriUF
+2541 2169 TriUF
+4479 1788 TriUF
+2760 3443 TriUF
% End plot #4
% Begin plot #5
1.500 UP
LT0
0.18 0.55 0.34 C LCb setrgbcolor
/Helvetica findfont 190 scalefont setfont
-3825 3443 M
+3795 3443 M
(BT) Rshow
/Helvetica findfont 140 scalefont setfont
LT0
-0.18 0.55 0.34 C 3275 3443 M
+0.18 0.55 0.34 C 3245 3443 M
298 0 V
-725 3074 M
-876 2861 L
-212 184 V
-606 -23 V
-847 -183 V
-4964 2218 L
-725 3074 BoxF
-876 2861 BoxF
-1088 3045 BoxF
-1694 3022 BoxF
-2541 2839 BoxF
-4964 2218 BoxF
-3424 3443 BoxF
+725 3133 M
+876 2915 L
+212 189 V
+606 -24 V
+847 -187 V
+4964 2257 L
+725 3133 BoxF
+876 2915 BoxF
+1088 3104 BoxF
+1694 3080 BoxF
+2541 2893 BoxF
+4964 2257 BoxF
+3394 3443 BoxF
% End plot #5
% Begin plot #6
1.500 UP
LT0
0.85 0.65 0.13 C LCb setrgbcolor
/Helvetica findfont 190 scalefont setfont
-4459 3443 M
+4429 3443 M
(SP) Rshow
/Helvetica findfont 140 scalefont setfont
LT0
-0.85 0.65 0.13 C 3909 3443 M
+0.85 0.65 0.13 C 3879 3443 M
298 0 V
-725 3064 M
-876 2327 L
-212 -158 V
+725 3123 M
+876 2368 L
+212 -161 V
606 17 V
-847 149 V
-2423 132 V
-725 3064 Circle
-876 2327 Circle
-1088 2169 Circle
-1694 2186 Circle
-2541 2335 Circle
-4964 2467 Circle
-4058 3443 Circle
+847 152 V
+2423 136 V
+725 3123 Circle
+876 2368 Circle
+1088 2207 Circle
+1694 2224 Circle
+2541 2376 Circle
+4964 2512 Circle
+4028 3443 Circle
% End plot #6
% Begin plot #7
1.500 UP
LT0
0.55 0.00 0.00 C LCb setrgbcolor
/Helvetica findfont 190 scalefont setfont
-5093 3443 M
+5063 3443 M
(FT) Rshow
/Helvetica findfont 140 scalefont setfont
LT0
-0.55 0.00 0.00 C 4543 3443 M
+0.55 0.00 0.00 C 4513 3443 M
298 0 V
-725 3089 M
-846 2769 L
-242 40 V
-484 -597 V
-969 -207 V
-4479 1492 L
-725 3089 CircleF
-846 2769 CircleF
-1088 2809 CircleF
-1572 2212 CircleF
-2541 2005 CircleF
-4479 1492 CircleF
-4692 3443 CircleF
+725 3148 M
+846 2821 L
+242 41 V
+484 -612 V
+969 -212 V
+4479 1513 L
+725 3148 CircleF
+846 2821 CircleF
+1088 2862 CircleF
+1572 2250 CircleF
+2541 2038 CircleF
+4479 1513 CircleF
+4662 3443 CircleF
% End plot #7
1.000 UL
LTb
grestore
end
showpage
-%%Trailer
-%%DocumentFonts: Helvetica
%!PS-Adobe-2.0 EPSF-2.0
%%Title: heter2.eps
%%Creator: gnuplot 4.6 patchlevel 0
-%%CreationDate: Thu Nov 6 10:45:38 2014
+%%CreationDate: Thu Feb 19 12:00:04 2015
%%DocumentFonts: (atend)
%%BoundingBox: 50 50 320 239
%%EndComments
/Author (afanfakh)
% /Producer (gnuplot)
% /Keywords ()
- /CreationDate (Thu Nov 6 10:45:38 2014)
+ /CreationDate (Thu Feb 19 12:00:04 2015)
/DOCINFO pdfmark
end
} ifelse
LCb setrgbcolor
/Helvetica findfont 220 scalefont setfont
4496 3443 M
-(Normalize performance) Rshow
+(Normalized performance) Rshow
/Helvetica findfont 140 scalefont setfont
LT2
LC2 setrgbcolor
grestore
end
showpage
-%%Trailer
-%%DocumentFonts: Helvetica
%!PS-Adobe-2.0 EPSF-2.0
%%Title: per_deg.eps
%%Creator: gnuplot 4.6 patchlevel 0
-%%CreationDate: Thu Nov 6 09:06:50 2014
+%%CreationDate: Thu Feb 19 16:42:46 2015
%%DocumentFonts: (atend)
%%BoundingBox: 50 50 320 239
%%EndComments
/Author (afanfakh)
% /Producer (gnuplot)
% /Keywords ()
- /CreationDate (Thu Nov 6 09:06:50 2014)
+ /CreationDate (Thu Feb 19 16:42:46 2015)
/DOCINFO pdfmark
end
} ifelse
BackgroundColor 0 lt 3 1 roll 0 lt exch 0 lt or or not {BackgroundColor C 1.000 0 0 5400.00 3780.00 BoxColFill} if
1.000 UL
LTb
-602 662 M
+602 674 M
63 0 V
4482 0 R
-63 0 V
/Helvetica findfont 190 scalefont setfont
-518 662 M
+518 674 M
( 0) Rshow
/Helvetica findfont 140 scalefont setfont
1.000 UL
LTb
-602 1399 M
+602 1538 M
63 0 V
4482 0 R
-63 0 V
/Helvetica findfont 140 scalefont setfont
1.000 UL
LTb
-602 2136 M
+602 2402 M
63 0 V
4482 0 R
-63 0 V
/Helvetica findfont 140 scalefont setfont
1.000 UL
LTb
-602 2874 M
+602 3266 M
63 0 V
4482 0 R
-63 0 V
/Helvetica findfont 140 scalefont setfont
1.000 UL
LTb
-602 3611 M
-63 0 V
-4482 0 R
--63 0 V
-/Helvetica findfont 190 scalefont setfont
--4566 0 R
-( 20) Rshow
-/Helvetica findfont 140 scalefont setfont
-1.000 UL
-LTb
604 588 M
0 63 V
0 2960 R
LTb
1.000 UP
/Helvetica findfont 190 scalefont setfont
-649 728 M
+649 752 M
( ) Lshow
/Helvetica findfont 140 scalefont setfont
1.000 UL
LTb
-1.000 UL
-LTb
-655 3338 N
-0 210 V
-4438 0 V
-0 -210 V
--4438 0 V
-Z stroke
% Begin plot #1
1.500 UP
2.000 UL
LT0
0.00 0.00 1.00 C LCb setrgbcolor
/Helvetica findfont 190 scalefont setfont
-1289 3443 M
+1259 3443 M
(CG) Rshow
/Helvetica findfont 140 scalefont setfont
LT0
-0.00 0.00 1.00 C 739 3443 M
+0.00 0.00 1.00 C 709 3443 M
298 0 V
-725 1653 M
-121 59 V
-242 361 V
-484 -629 V
-2541 910 L
-4479 824 L
-725 1653 Box
-846 1712 Box
-1088 2073 Box
-1572 1444 Box
-2541 910 Box
-4479 824 Box
-888 3443 Box
+725 1835 M
+121 70 V
+242 423 V
+484 -737 V
+2541 965 L
+4479 865 L
+725 1835 Box
+846 1905 Box
+1088 2328 Box
+1572 1591 Box
+2541 965 Box
+4479 865 Box
+858 3443 Box
% End plot #1
% Begin plot #2
1.500 UP
LT0
1.00 0.00 0.00 C LCb setrgbcolor
/Helvetica findfont 190 scalefont setfont
-1923 3443 M
+1893 3443 M
(MG) Rshow
/Helvetica findfont 140 scalefont setfont
LT0
-1.00 0.00 0.00 C 1373 3443 M
+1.00 0.00 0.00 C 1343 3443 M
298 0 V
-725 1301 M
-121 307 V
-242 -54 V
-484 413 V
-969 812 V
-4479 2193 L
-725 1301 TriD
-846 1608 TriD
-1088 1554 TriD
-1572 1967 TriD
-2541 2779 TriD
-4479 2193 TriD
-1522 3443 TriD
+725 1424 M
+121 359 V
+242 -63 V
+484 483 V
+969 951 V
+4479 2469 L
+725 1424 TriD
+846 1783 TriD
+1088 1720 TriD
+1572 2203 TriD
+2541 3154 TriD
+4479 2469 TriD
+1492 3443 TriD
% End plot #2
% Begin plot #3
1.500 UP
LT0
0.50 0.00 0.50 C LCb setrgbcolor
/Helvetica findfont 190 scalefont setfont
-2557 3443 M
+2527 3443 M
(EP) Rshow
/Helvetica findfont 140 scalefont setfont
LT0
-0.50 0.00 0.50 C 2007 3443 M
+0.50 0.00 0.50 C 1977 3443 M
298 0 V
-725 1110 M
-846 735 L
-242 10 V
-484 -80 V
-969 456 V
-4479 666 L
-725 1110 Star
-846 735 Star
-1088 745 Star
-1572 665 Star
-2541 1121 Star
-4479 666 Star
-2156 3443 Star
+725 1199 M
+846 760 L
+242 12 V
+484 -94 V
+969 534 V
+4479 680 L
+725 1199 Star
+846 760 Star
+1088 772 Star
+1572 678 Star
+2541 1212 Star
+4479 680 Star
+2126 3443 Star
% End plot #3
% Begin plot #4
1.500 UP
LT0
0.18 0.31 0.31 C LCb setrgbcolor
/Helvetica findfont 190 scalefont setfont
-3191 3443 M
+3161 3443 M
(LU) Rshow
/Helvetica findfont 140 scalefont setfont
LT0
-0.18 0.31 0.31 C 2641 3443 M
+0.18 0.31 0.31 C 2611 3443 M
298 0 V
-725 1571 M
-846 663 L
-242 966 V
-484 -605 V
-969 180 V
-4479 1010 L
-725 1571 TriUF
-846 663 TriUF
-1088 1629 TriUF
-1572 1024 TriUF
-2541 1204 TriUF
-4479 1010 TriUF
-2790 3443 TriUF
+725 1739 M
+846 676 L
+242 1132 V
+484 -709 V
+969 211 V
+4479 1082 L
+725 1739 TriUF
+846 676 TriUF
+1088 1808 TriUF
+1572 1099 TriUF
+2541 1310 TriUF
+4479 1082 TriUF
+2760 3443 TriUF
% End plot #4
% Begin plot #5
1.500 UP
LT0
0.18 0.55 0.34 C LCb setrgbcolor
/Helvetica findfont 190 scalefont setfont
-3825 3443 M
+3795 3443 M
(BT) Rshow
/Helvetica findfont 140 scalefont setfont
LT0
-0.18 0.55 0.34 C 3275 3443 M
+0.18 0.55 0.34 C 3245 3443 M
298 0 V
-725 1913 M
-151 -87 V
-212 -309 V
-606 5 V
-847 951 V
-4964 851 L
-725 1913 BoxF
-876 1826 BoxF
-1088 1517 BoxF
-1694 1522 BoxF
-2541 2473 BoxF
-4964 851 BoxF
-3424 3443 BoxF
+725 2140 M
+876 2038 L
+212 -362 V
+606 6 V
+847 1114 V
+4964 896 L
+725 2140 BoxF
+876 2038 BoxF
+1088 1676 BoxF
+1694 1682 BoxF
+2541 2796 BoxF
+4964 896 BoxF
+3394 3443 BoxF
% End plot #5
% Begin plot #6
1.500 UP
LT0
0.85 0.65 0.13 C LCb setrgbcolor
/Helvetica findfont 190 scalefont setfont
-4459 3443 M
+4429 3443 M
(SP) Rshow
/Helvetica findfont 140 scalefont setfont
LT0
-0.85 0.65 0.13 C 3909 3443 M
+0.85 0.65 0.13 C 3879 3443 M
298 0 V
-725 1501 M
-876 1072 L
-212 154 V
-606 -55 V
-2541 666 L
+725 1658 M
+876 1155 L
+212 180 V
+606 -64 V
+2541 680 L
2423 3 V
-725 1501 Circle
-876 1072 Circle
-1088 1226 Circle
-1694 1171 Circle
-2541 666 Circle
-4964 669 Circle
-4058 3443 Circle
+725 1658 Circle
+876 1155 Circle
+1088 1335 Circle
+1694 1271 Circle
+2541 680 Circle
+4964 683 Circle
+4028 3443 Circle
% End plot #6
% Begin plot #7
1.500 UP
LT0
0.55 0.00 0.00 C LCb setrgbcolor
/Helvetica findfont 190 scalefont setfont
-5093 3443 M
+5063 3443 M
(FT) Rshow
/Helvetica findfont 140 scalefont setfont
LT0
-0.55 0.00 0.00 C 4543 3443 M
+0.55 0.00 0.00 C 4513 3443 M
298 0 V
-725 809 M
-121 228 V
-242 581 V
-484 -527 V
-969 289 V
-4479 1081 L
-725 809 CircleF
-846 1037 CircleF
-1088 1618 CircleF
-1572 1091 CircleF
-2541 1380 CircleF
-4479 1081 CircleF
-4692 3443 CircleF
+725 847 M
+121 267 V
+242 680 V
+484 -617 V
+969 339 V
+4479 1166 L
+725 847 CircleF
+846 1114 CircleF
+1088 1794 CircleF
+1572 1177 CircleF
+2541 1516 CircleF
+4479 1166 CircleF
+4662 3443 CircleF
% End plot #7
1.000 UL
LTb
grestore
end
showpage
+%%Trailer
+%%DocumentFonts: Helvetica
--- /dev/null
+============================== Standard 1 ==============================
+
+> *** Key Contributions: Please describe the key contributions of the
+ paper or lack thereof. Your comments should be specific and
+ justify your overall recommendation.
+
+This paper presents a new online frequency selecting algorithm for
+distributed iterative applications running on heterogeneous CPU nodes.
+Contrary to previous work (for homogeneous CPU), this heterogeneous
+context implies a vector of scaling factors and "slack times" before
+synchronizing the processes at each iteration. The models and the
+algorithm are clearly presented and detailed, and are validated on
+several benchmarks thanks to a simulator. Comparison with another
+scaling factor selection algorithm (which does not take into account
+communication times and heterogeneity) shows the relevance of this new
+algorithm which manages to significantly reduce the energy consumption
+with acceptable performance overhead.
+
+Overall, this is a very solid work, and the paper is well-written and
+very clear.
+
+The main flaw of this paper is that the evaluation is only done via a
+simulator. As mentioned in future work, evaluations on real
+heterogeneous CPU platforms (with real power measurements) will be
+necessary (as future work) to validate definitely this algorithm and
+the models.
+
+> *** Suggestions for Improvement: Additional comments and suggestions
+ for improvement in the technical content or the presentation.
+ Please be as detailed and constructive as you can be.
+
+The energy and performance models rely on compute-bound programs,
+where the computation time is linearly proportional to the processor
+frequency. Does this apply to all NAS benchmarks ? The authors should
+specify which NAS benchmarks are memory-bound (if any), and how their
+model applies to these memory-bound benchmarks.
+
+Moreover, in section III it seems that the authors assume that the
+communication time (without slack time) is the same for all processors
+provided they have the same communication volume. This could be
+pointed out more clearly in the paper. Also, does this apply to all
+NAS benchmarks? Does it also depend on the placement of the MPI
+processes? I assume that for the same communication volume, the
+communication time will differ whether the processes are on
+neighbouring nodes or are on distant nodes (especially with 128 or 144
+nodes).
+Could the authors discuss in the text?
+
+The authors consider that the communication time only applies to static
+power, which means that no CPU cycle is used for the MPI
+communications. Does this imply specific networks (like InfiniBand)
+with RDMA?
+This could be clarified in the paper.
+
+Finally, the algorithm applies to synchronous iterative applications:
+is this the case for all NAS benchmarks evaluated in this paper? This
+could also be specified in the paper.
+
+Figures 2a and 2b : I do not understand why the energy curve in Fig.2b
+does not have the same shape as the one in Fig.2a.
+Could the authors specify this in the text?
+
+Minor comments :
+- The authors could specify in the abstract that "heterogeneous
+ platforms" refer to heterogeneous CPUs (not to CPU-GPU nodes).
+- The terms "in the same direction" (used twice in section IV) are
+ unclear and should be rewritten.
+- Section V.A : replace "because selecting frequency scaling factors
+ higher than the higher bound" by "because selecting frequencies
+ higher than the higher bound"?
+
+> *** Significance: Assess the significance of the topic addressed in
+ the paper.
+
+Excellent (5)
+
+> *** Originality/Novelty (of contribution): How novel are the
+ concepts presented in the paper?
+
+Above average (4)
+
+> *** Technical Soundness: How strong are the techniques and
+ methodologies used in the paper?
+
+Excellent (5)
+
+> *** Overall Recommendation: Your final rating should be consistent
+ with your ratings on previous questions.
+
+Accept (5)
+
+============================== Standard 2 ==============================
+
+> *** Key Contributions: Please describe the key contributions of the
+ paper or lack thereof. Your comments should be specific and
+ justify your overall recommendation.
+
+The paper proposed a frequency selection algorithm for heterogeneous
+platforms. The algorithm proposed the maximum distance between the
+energy consumption and the performance to get the trade off scale
+factor. This is an interesting paper with a good attempt to cover many
+factors.
+
+The paper ran NPB benchmarks to verify the algorithm but there is no
+comparison between the results at the trade-off scale factor and
+those from all other possible scale factors without applying the
+algorithm. Without this, it is not reliable to validate the algorithm.
+
+> *** Suggestions for Improvement: Additional comments and suggestions
+ for improvement in the technical content or the presentation.
+ Please be as detailed and constructive as you can be.
+
+There are too many tables, i.e. II-VII, in section VI. Better to
+summarize them in a couple of figures.
+
+It is necessary to describe the overhead of the algorithm which is
+missed in the paper.
+
+> *** Significance: Assess the significance of the topic addressed in
+ the paper.
+
+Average (3)
+
+> *** Originality/Novelty (of contribution): How novel are the
+ concepts presented in the paper?
+
+Average (3)
+
+> *** Technical Soundness: How strong are the techniques and
+ methodologies used in the paper?
+
+Acceptable (3)
+
+> *** Overall Recommendation: Your final rating should be consistent
+ with your ratings on previous questions.
+
+Weak Accept (4)
+
+============================== Standard 3 ==============================
+
+> *** Key Contributions: Please describe the key contributions of the
+ paper or lack thereof. Your comments should be specific and
+ justify your overall recommendation.
+
+The paper develops DVFS performance models and an online algorithm to
+optimize time and energy for iterative message passing applications on
+a heterogeneous CPU cluster. An objective function is developed to
+express the time energy tradeoff. Results using a simulated framework
+show worthwhile energy gains for acceptable loss of execution time. A
+comparison with a more general pre-existing algorithm shows modest
+improvements in energy and energy-time tradeoff.
+
+The paper is well-written and is technically sound. Its significance
+is slightly diminished due to the fact that previous work has largely
+dealt with this issue on scenarios that are of stronger interests
+and/or are less specialized.
+
+> *** Suggestions for Improvement: Additional comments and suggestions
+ for improvement in the technical content or the presentation.
+ Please be as detailed and constructive as you can be.
+
+The abstract would be sharpened if it contained numbers relating to
+the performance degradation and comparison.
+
+III.A. The modelling of the communication time being independent of
+the frequency is questionable, even if it is backed up by a 10year old
+reference. While slack time is not affected, my own research has shown
+that communication bandwidth does clearly increase with frequency,
+albeit in a sub-linear fashion. The use of taking the minimum for
+communication time (3) needs better explanation, as it is
+counter-intuitive.
+
+I would like to see some explanation as to why it takes so many iterations
+for the algorithm to select the best vector, and whether this can be
+improved. While the NAS benchmarks have a standard number of
+iterations, it would be helpful to the reader to indicate what these
+are in VI.
+
+The results on a real heterogeneous platform in the future work will
+be interesting.
+
+There are a number of small grammatical errors:
+
+p2. ``to satisfy some objectives while taking into account all the
+constraints,'': a comma is needed before `while' to match the 2nd
+
+Fig2(b) normalize -> normalized
+
+p4 ``following the same direction'': use `follow'
+
+Alg1: F_diff_i: difference -> differences
+
+p6: on all left frequencies -> on all remaining frequencies
+
+while it lowers the frequency of all other nodes ->
+while it lowers the frequencies of all other nodes
+
+``the proposed algorithm is not an exact method it does'':
+put a : before it
+
+p8: on different number of nodes -> on different numbers of nodes
+the GC benchmark significantly decrease ->
+the CG benchmark significantly decreases
+
+> *** Significance: Assess the significance of the topic addressed in
+ the paper.
+
+Above average (4)
+
+> *** Originality/Novelty (of contribution): How novel are the
+ concepts presented in the paper?
+
+Above average (4)
+
+> *** Technical Soundness: How strong are the techniques and
+ methodologies used in the paper?
+
+Excellent (5)
+
+> *** Overall Recommendation: Your final rating should be consistent
+ with your ratings on previous questions.
+
+Strong Accept (6)
+
+============================== Standard 4 ==============================
+
+> *** Key Contributions: Please describe the key contributions of the
+ paper or lack thereof. Your comments should be specific and
+ justify your overall recommendation.
+
+In this paper, a new online frequency selecting algorithm for
+heterogeneous platforms is presented. It selects the frequencies and
+tries to give the best trade-off between energy saving and performance
+degradation, for each node computing the message passing iterative
+application. The algorithm has a small overhead and works without
+training or profiling. It uses a new energy model for message passing
+iterative applications running on a heterogeneous platform. The
+proposed algorithm is evaluated on the SimGrid simulator while running
+the NAS parallel benchmarks. The experiments show that it reduces the
+energy consumption by up to 35 % while limiting the performance
+degradation as much as possible. Finally, the algorithm is compared to
+an existing method, the comparison results showing that it outperforms
+the latter.
+
+> *** Suggestions for Improvement: Additional comments and suggestions
+ for improvement in the technical content or the presentation.
+ Please be as detailed and constructive as you can be.
+
+I did not see very clearly whether the proposed online algorithm can
+achieve the optimal selection. If it is only a heuristic, then how close
+is it to the optimal? I would like to see more theoretical or experimental
+results if possible, since the authors claim "the best trade-off
+between energy saving and performance degradation".
+
+> *** Significance: Assess the significance of the topic addressed in
+ the paper.
+
+Excellent (5)
+
+> *** Originality/Novelty (of contribution): How novel are the
+ concepts presented in the paper?
+
+Excellent (5)
+
+> *** Technical Soundness: How strong are the techniques and
+ methodologies used in the paper?
+
+Strong (4)
+
+> *** Overall Recommendation: Your final rating should be consistent
+ with your ratings on previous questions.
+
+Strong Accept (6)
+
+============================== Standard 5 ==============================
+
+> *** Key Contributions: Please describe the key contributions of the
+ paper or lack thereof. Your comments should be specific and
+ justify your overall recommendation.
+
+The paper considers the DVFS technique and presents an energy model
+for DVFS systems that also takes the communication time into
+consideration. A new algorithm for selecting the scaling factors is
+presented. The algorithm uses a vector of scaling factors, one for
+each node, and determines the scaling factors such that the best trade-off
+between minimizing the energy consumption and maximizing the
+performance for a synchronous iterative algorithm is reached. The
+algorithm works during execution time and uses the first iteration
+step for collecting the information required for the scaling factor
+selection. An experimental evaluation is given using the SimGrid
+environment.
+
+The paper is well written and structured and should be accepted. It
+is solid work and provides new contributions by extending earlier
+energy models with communication time concerns and proposes a new
+algorithm for DVFS control.
+
+> *** Suggestions for Improvement: Additional comments and suggestions
+ for improvement in the technical content or the presentation.
+ Please be as detailed and constructive as you can be.
+
+Algorithm 1 in Section V could be explained in more detail. As far as
+I can see, it tests all possible frequencies or scaling factors for
+the different nodes and selects the best as indicated by the model. I
+was wondering whether all combinations of scaling factors are tested
+or whether this is not necessary because of the behavior of the
+communication.
+The accuracy of the frequency selection depends on the accuracy of the
+model used for the computation of the scaling factors. It would be
+interesting to see how accurate the model is for real systems.
+However, I see that this might be difficult to capture in practice.
+
+> *** Significance: Assess the significance of the topic addressed in
+ the paper.
+
+Excellent (5)
+
+> *** Originality/Novelty (of contribution): How novel are the
+ concepts presented in the paper?
+
+Above average (4)
+
+> *** Technical Soundness: How strong are the techniques and
+ methodologies used in the paper?
+
+Excellent (5)
+
+> *** Overall Recommendation: Your final rating should be consistent
+ with your ratings on previous questions.
+
+Accept (5)