From: jean-claude
Date: Mon, 21 Sep 2015 07:41:02 +0000 (+0200)
Subject: Merge branch 'master' of ssh://info.iut-bm.univ-fcomte.fr/mpi-energy2
X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/mpi-energy2.git/commitdiff_plain/8114f38de9c2d7cc13b1a3582dbb60948732688b?hp=af7fb590e267eb74efdf62ecceb980c1ce7f3eb8

Merge branch 'master' of ssh://info.iut-bm.univ-fcomte.fr/mpi-energy2

merge
---

diff --git a/Heter_paper.tex b/Heter_paper.tex
index a9a1e6d..1ac7dcf 100644
--- a/Heter_paper.tex
+++ b/Heter_paper.tex
@@ -93,17 +93,17 @@ the performance of an application must be selected. In this paper, a new online frequency selecting algorithm for heterogeneous
- platforms is presented. It selects the frequencies and tries to give the best
+ platforms (heterogeneous CPUs) is presented. It selects the frequencies and tries to give the best
 trade-off between energy saving and performance degradation, for each node computing the message passing iterative application. The algorithm has a small overhead and works without training or profiling. It uses a new energy model for message passing iterative applications running on a heterogeneous platform. The proposed algorithm is evaluated on the SimGrid simulator while running the NAS parallel benchmarks. The experiments show that it reduces the
- energy consumption by up to \np[\%]{35} while limiting the performance
+ energy consumption by up to \np[\%]{34} while limiting the performance
 degradation as much as possible. Finally, the algorithm is compared to an
- existing method, the comparison results showing that it outperforms the
- latter.
+ existing method, the comparison results show that it outperforms the
+ latter, on average it saves \np[\%]{4} more energy while keeping the same performance.
 \end{abstract}
@@ -171,7 +171,7 @@ consumption of the processor. DVFS is also allowed in GPUs to achieve the same goal.
 Reducing the frequency of a processor lowers its number of FLOPS and may degrade the performance of the application running on that processor, especially if it is compute bound. Therefore selecting the appropriate frequency for a
-processor to satisfy some objectives while taking into account all the
+processor to satisfy some objectives, while taking into account all the
 constraints, is not a trivial operation. Many researchers used different strategies to tackle this problem. Some of them developed online methods that compute the new frequency while executing the application, such
@@ -500,14 +500,13 @@ Where $\Ereduced$ and $\Eoriginal$ are computed using (\ref{eq:energy}) and $\Tnew$ and $\Told$ are computed as in (\ref{eq:pnorm}). While the main goal is to optimize the energy and execution time at the same
-time, the normalized energy and execution time curves are not in the same
-direction. According to the equations~(\ref{eq:pnorm}) and (\ref{eq:enorm}), the
+time, the normalized energy and execution time curves do not evolve (increase/decrease) in the same way. According to the equations~(\ref{eq:pnorm}) and (\ref{eq:enorm}), the
 vector of frequency scaling factors $S_1,S_2,\dots,S_N$ reduce both the energy and the execution time simultaneously. But the main objective is to produce maximum energy reduction with minimum execution time reduction. This problem can be solved by making the optimization process for energy and
-execution time following the same direction. Therefore, the equation of the
+execution time follow the same evolution according to the vector of scaling factors. Therefore, the equation of the
 normalized execution time is inverted which gives the normalized performance equation, as follows:
 \begin{multline}
@@ -562,7 +561,7 @@ in~\cite{Zhuo_Energy.efficient.Dynamic.Task.Scheduling,Rauber_Analytical.Modelin
 \item[{$\Fmax[i]$}] array of the maximum frequencies for all nodes.
 \item[{$\Pd[i]$}] array of the dynamic powers for all nodes.
 \item[{$\Ps[i]$}] array of the static powers for all nodes.
- \item[{$\Fdiff[i]$}] array of the difference between two successive frequencies for all nodes.
+ \item[{$\Fdiff[i]$}] array of the differences between two successive frequencies for all nodes.
 \end{description}
 \Ensure $\Sopt[1],\Sopt[2] \dots, \Sopt[N]$ is a vector of optimal scaling factors
@@ -673,12 +672,12 @@ ascending order and the frequencies of the faster nodes are scaled down according to the computed initial frequency scaling factors. The resulting new frequencies are highlighted in Figure~\ref{fig:st_freq}. This set of frequencies can be considered as a higher bound for the search space of the
-optimal vector of frequencies because selecting frequency scaling factors higher
+optimal vector of frequencies because selecting scaling factors higher
 than the higher bound will not improve the performance of the application and it will increase its overall energy consumption. Therefore the algorithm that selects the frequency scaling factors starts the search method from these initial frequencies and takes a downward search direction toward lower
-frequencies. The algorithm iterates on all left frequencies, from the higher
+frequencies. The algorithm iterates on all remaining frequencies, from the higher
 bound until all nodes reach their minimum frequencies, to compute their overall energy consumption and performance, and select the optimal frequency scaling factors vector. At each iteration the algorithm determines the slowest node
@@ -700,7 +699,9 @@ power of scaled down nodes are lower than the slowest node. In other words, until they reach the higher bound. It can also be noticed that the higher the difference between the faster nodes and the slower nodes is, the bigger the maximum distance between the energy curve and the performance curve is while the
-scaling factors are varying which results in bigger energy savings.
+scaling factors are varying which results in bigger energy savings.
+Finally, in a homogeneous platform the energy consumption is increased when the scaling factor is very high.
+Indeed, the dynamic energy saved by reducing the frequency of the processor is compensated by the significant increase of the execution time and thus the increased of the static energy. On the other hand, in a heterogeneous platform this is not the case.
 \subsection{The evaluation of the proposed algorithm}
 \label{sec.verif.algo}
@@ -719,7 +720,7 @@ parallel benchmarks NPB v3.3 \cite{NAS.Parallel.Benchmarks}, running class B on very precise, the maximum normalized difference between the predicted execution time and the real execution time is equal to 0.03 for all the NAS benchmarks.
-Since the proposed algorithm is not an exact method it does not test all the
+Since the proposed algorithm is not an exact method, it does not test all the
 possible solutions (vectors of scaling factors) in the search space. To prove its efficiency, it was compared on small instances to a brute force search algorithm that tests all the possible solutions. The brute force algorithm was
@@ -761,7 +762,8 @@ frequency scaling factors that gives the results of the next sections.
 \label{sec.expe}
 To evaluate the efficiency and the overall energy consumption reduction of
-Algorithm~\ref{HSA}, it was applied to the NAS parallel benchmarks NPB v3.3. The
+Algorithm~\ref{HSA}, it was applied to the NAS parallel benchmarks NPB v3.3 which
+is composed of synchronous message passing applications. The
 experiments were executed on the simulator SimGrid/SMPI which offers easy tools to create a heterogeneous platform and run message passing applications over it. The heterogeneous platform that was used in the experiments, had one core per
@@ -790,40 +792,16 @@ The proposed algorithm was applied to the seven parallel NAS benchmarks (EP, CG, MG, FT, BT, LU and SP) and the benchmarks were executed with the three classes: A, B and C.
 However, due to the lack of space in this paper, only the results of the biggest class, C, are presented while being run on different number of
-nodes, ranging from 4 to 128 or 144 nodes depending on the benchmark being
+nodes, ranging from 8 to 128 or 144 nodes depending on the benchmark being
 executed. Indeed, the benchmarks CG, MG, LU, EP and FT had to be executed on 1, 2, 4, 8, 16, 32, 64, or 128 nodes. The other benchmarks such as BT and SP had to be executed on 1, 4, 9, 16, 36, 64, or 144 nodes.
 \begin{table}[!t]
- \caption{Running NAS benchmarks on 4 nodes }
- % title of Table
- \centering
- \begin{tabular}{|*{7}{r|}}
- \hline
- \hspace{-2.2084pt}%
- Program & Execution & Energy & Energy & Performance & Distance \\
- name & time/s & consumption/J & saving\% & degradation\% & \\
- \hline
- CG & 64.64 & 3560.39 & 34.16 & 6.72 & 27.44 \\
- \hline
- MG & 18.89 & 1074.87 & 35.37 & 4.34 & 31.03 \\
- \hline
- EP & 79.73 & 5521.04 & 26.83 & 3.04 & 23.79 \\
- \hline
- LU & 308.65 & 21126.00 & 34.00 & 6.16 & 27.84 \\
- \hline
- BT & 360.12 & 21505.55 & 35.36 & 8.49 & 26.87 \\
- \hline
- SP & 234.24 & 13572.16 & 35.22 & 5.70 & 29.52 \\
- \hline
- FT & 81.58 & 4151.48 & 35.58 & 0.99 & 34.59 \\
- \hline
- \end{tabular}
- \label{table:res_4n}
+ %
 \end{table}
- \medskip
+ %
 \begin{table}[!t]
 \caption{Running NAS benchmarks on 8 and 9 nodes }
 % title of Table
@@ -983,13 +961,13 @@ The overall energy consumption was computed for each instance according to the energy consumption model (\ref{eq:energy}), with and without applying the algorithm. The execution time was also measured for all these experiments. Then, the energy saving and performance degradation percentages were computed for each
-instance. The results are presented in Tables~\ref{table:res_4n},
+instance. The results are presented in Tables
 \ref{table:res_8n}, \ref{table:res_16n}, \ref{table:res_32n}, \ref{table:res_64n} and \ref{table:res_128n}.
 All these results are the average values from many experiments for energy savings and performance degradation. The tables show the experimental results for running the NAS parallel benchmarks
-on different number of nodes. The experiments show that the algorithm
-significantly reduces the energy consumption (up to \np[\%]{35}) and tries to
+on different numbers of nodes. The experiments show that the algorithm
+significantly reduces the energy consumption (up to \np[\%]{34}) and tries to
 limit the performance degradation. They also show that the energy saving percentage decreases when the number of computing nodes increases. This reduction is due to the increase of the communication times compared to the
@@ -1019,7 +997,7 @@ of the benchmarks MG, LU, BT and FT decrease linearly when the number of nodes increase. While for the EP and SP benchmarks, the energy saving percentage is not affected by the increase of the number of computing nodes, because in these benchmarks there are little or no communications. Finally, the energy saving of
-the GC benchmark significantly decrease when the number of nodes increase
+the CG benchmark significantly decreases when the number of nodes increase
 because this benchmark has more communications than the others. The second plot shows that the performance degradation percentages of most of the benchmarks decrease when they run on a big number of nodes because they spend more time
@@ -1228,7 +1206,7 @@ new energy model for measuring and predicting the energy of distributed iterative applications running over heterogeneous platforms. To evaluate the proposed method, it was applied on the NAS parallel benchmarks and executed over a heterogeneous platform simulated by SimGrid.
 The results of the experiments
-showed that the algorithm reduces up to \np[\%]{35} the energy consumption of a
+showed that the algorithm reduces up to \np[\%]{34} the energy consumption of a
 message passing iterative method while limiting the degradation of the performance. The algorithm also selects different scaling factors according to the percentage of the computing and communication times, and according to the
@@ -1248,11 +1226,11 @@ the iterative system.
 \section*{Acknowledgment}
-This work has been partially supported by the Labex
-ACTION project (contract ``ANR-11-LABX-01-01''). As a PhD student,
-Mr. Ahmed Fanfakh, would like to thank the University of
-Babylon (Iraq) for supporting his work.
-
+This work has been partially supported by the Labex ACTION project (contract
+``ANR-11-LABX-01-01''). Computations have been performed on the supercomputer
+facilities of the Mésocentre de calcul de Franche-Comté. As a PhD student,
+Mr. Ahmed Fanfakh, would like to thank the University of Babylon (Iraq) for
+supporting his work.
% trigger a \newpage just before the given reference % number - used to balance the columns on the last page diff --git a/edas.paper-1570085255.pdf b/edas.paper-1570085255.pdf new file mode 100644 index 0000000..c5db537 Binary files /dev/null and b/edas.paper-1570085255.pdf differ diff --git a/fig/avg_eq.pdf b/fig/avg_eq.pdf deleted file mode 100644 index 26c5a4a..0000000 Binary files a/fig/avg_eq.pdf and /dev/null differ diff --git a/fig/avg_neq.pdf b/fig/avg_neq.pdf deleted file mode 100644 index 028c084..0000000 Binary files a/fig/avg_neq.pdf and /dev/null differ diff --git a/fig/energy.eps b/fig/energy.eps index c45034b..2a54918 100644 --- a/fig/energy.eps +++ b/fig/energy.eps @@ -1,7 +1,7 @@ %!PS-Adobe-2.0 EPSF-2.0 %%Title: energy.eps %%Creator: gnuplot 4.6 patchlevel 0 -%%CreationDate: Thu Nov 6 09:05:32 2014 +%%CreationDate: Thu Feb 19 16:44:03 2015 %%DocumentFonts: (atend) %%BoundingBox: 50 50 320 239 %%EndComments @@ -432,7 +432,7 @@ SDict begin [ /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Nov 6 09:05:32 2014) + /CreationDate (Thu Feb 19 16:44:03 2015) /DOCINFO pdfmark end } ifelse @@ -460,17 +460,17 @@ LTb /Helvetica findfont 140 scalefont setfont 1.000 UL LTb -602 940 M +602 948 M 63 0 V 4482 0 R -63 0 V /Helvetica findfont 190 scalefont setfont -518 940 M +518 948 M ( 5) Rshow /Helvetica findfont 140 scalefont setfont 1.000 UL LTb -602 1291 M +602 1308 M 63 0 V 4482 0 R -63 0 V @@ -480,7 +480,7 @@ LTb /Helvetica findfont 140 scalefont setfont 1.000 UL LTb -602 1643 M +602 1668 M 63 0 V 4482 0 R -63 0 V @@ -490,7 +490,7 @@ LTb /Helvetica findfont 140 scalefont setfont 1.000 UL LTb -602 1994 M +602 2028 M 63 0 V 4482 0 R -63 0 V @@ -500,7 +500,7 @@ LTb /Helvetica findfont 140 scalefont setfont 1.000 UL LTb -602 2346 M +602 2387 M 63 0 V 4482 0 R -63 0 V @@ -510,7 +510,7 @@ LTb /Helvetica findfont 140 scalefont setfont 1.000 UL LTb -602 2697 M +602 2747 M 63 0 V 4482 0 R -63 0 V @@ -520,7 +520,7 @@ LTb /Helvetica findfont 
140 scalefont setfont 1.000 UL LTb -602 3049 M +602 3107 M 63 0 V 4482 0 R -63 0 V @@ -530,7 +530,7 @@ LTb /Helvetica findfont 140 scalefont setfont 1.000 UL LTb -602 3400 M +602 3467 M 63 0 V 4482 0 R -63 0 V @@ -669,39 +669,31 @@ LTb /Helvetica findfont 140 scalefont setfont 1.000 UL LTb -1.000 UL -LTb -655 3338 N -0 210 V -4438 0 V -0 -210 V --4438 0 V -Z stroke % Begin plot #1 1.500 UP 2.000 UL LT0 0.00 0.00 1.00 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont -1289 3443 M +1259 3443 M (CG) Rshow /Helvetica findfont 140 scalefont setfont LT0 -0.00 0.00 1.00 C 739 3443 M +0.00 0.00 1.00 C 709 3443 M 298 0 V -725 2989 M -846 2785 L -242 -348 V -484 -713 V -969 -564 V -4479 870 L -725 2989 Box -846 2785 Box -1088 2437 Box -1572 1724 Box -2541 1160 Box -4479 870 Box -888 3443 Box +725 3047 M +846 2838 L +242 -357 V +484 -730 V +969 -578 V +4479 876 L +725 3047 Box +846 2838 Box +1088 2481 Box +1572 1751 Box +2541 1173 Box +4479 876 Box +858 3443 Box % End plot #1 % Begin plot #2 1.500 UP @@ -709,25 +701,25 @@ LT0 LT0 1.00 0.00 0.00 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont -1923 3443 M +1893 3443 M (MG) Rshow /Helvetica findfont 140 scalefont setfont LT0 -1.00 0.00 0.00 C 1373 3443 M +1.00 0.00 0.00 C 1343 3443 M 298 0 V -725 3074 M -846 2963 L -242 -90 V -484 -251 V +725 3134 M +846 3020 L +242 -93 V +484 -256 V 969 24 V -4479 1908 L -725 3074 TriD -846 2963 TriD -1088 2873 TriD -1572 2622 TriD -2541 2646 TriD -4479 1908 TriD -1522 3443 TriD +4479 1940 L +725 3134 TriD +846 3020 TriD +1088 2927 TriD +1572 2671 TriD +2541 2695 TriD +4479 1940 TriD +1492 3443 TriD % End plot #2 % Begin plot #3 1.500 UP @@ -735,25 +727,25 @@ LT0 LT0 0.50 0.00 0.50 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont -2557 3443 M +2527 3443 M (EP) Rshow /Helvetica findfont 140 scalefont setfont LT0 -0.50 0.00 0.50 C 2007 3443 M +0.50 0.00 0.50 C 1977 3443 M 298 0 V -725 2474 M +725 2519 M 121 15 V 242 -13 V 484 9 V 969 10 V 1938 -2 V -725 2474 Star 
-846 2489 Star -1088 2476 Star -1572 2485 Star -2541 2495 Star -4479 2493 Star -2156 3443 Star +725 2519 Star +846 2534 Star +1088 2521 Star +1572 2530 Star +2541 2540 Star +4479 2538 Star +2126 3443 Star % End plot #3 % Begin plot #4 1.500 UP @@ -761,25 +753,25 @@ LT0 LT0 0.18 0.31 0.31 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont -3191 3443 M +3161 3443 M (LU) Rshow /Helvetica findfont 140 scalefont setfont LT0 -0.18 0.31 0.31 C 2641 3443 M +0.18 0.31 0.31 C 2611 3443 M 298 0 V -725 2979 M -846 2573 L -242 40 V -484 -365 V -969 -116 V -4479 1760 L -725 2979 TriUF -846 2573 TriUF -1088 2613 TriUF -1572 2248 TriUF -2541 2132 TriUF -4479 1760 TriUF -2790 3443 TriUF +725 3036 M +846 2620 L +242 41 V +484 -374 V +969 -118 V +4479 1788 L +725 3036 TriUF +846 2620 TriUF +1088 2661 TriUF +1572 2287 TriUF +2541 2169 TriUF +4479 1788 TriUF +2760 3443 TriUF % End plot #4 % Begin plot #5 1.500 UP @@ -787,25 +779,25 @@ LT0 LT0 0.18 0.55 0.34 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont -3825 3443 M +3795 3443 M (BT) Rshow /Helvetica findfont 140 scalefont setfont LT0 -0.18 0.55 0.34 C 3275 3443 M +0.18 0.55 0.34 C 3245 3443 M 298 0 V -725 3074 M -876 2861 L -212 184 V -606 -23 V -847 -183 V -4964 2218 L -725 3074 BoxF -876 2861 BoxF -1088 3045 BoxF -1694 3022 BoxF -2541 2839 BoxF -4964 2218 BoxF -3424 3443 BoxF +725 3133 M +876 2915 L +212 189 V +606 -24 V +847 -187 V +4964 2257 L +725 3133 BoxF +876 2915 BoxF +1088 3104 BoxF +1694 3080 BoxF +2541 2893 BoxF +4964 2257 BoxF +3394 3443 BoxF % End plot #5 % Begin plot #6 1.500 UP @@ -813,25 +805,25 @@ LT0 LT0 0.85 0.65 0.13 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont -4459 3443 M +4429 3443 M (SP) Rshow /Helvetica findfont 140 scalefont setfont LT0 -0.85 0.65 0.13 C 3909 3443 M +0.85 0.65 0.13 C 3879 3443 M 298 0 V -725 3064 M -876 2327 L -212 -158 V +725 3123 M +876 2368 L +212 -161 V 606 17 V -847 149 V -2423 132 V -725 3064 Circle -876 2327 Circle -1088 2169 Circle -1694 2186 
Circle -2541 2335 Circle -4964 2467 Circle -4058 3443 Circle +847 152 V +2423 136 V +725 3123 Circle +876 2368 Circle +1088 2207 Circle +1694 2224 Circle +2541 2376 Circle +4964 2512 Circle +4028 3443 Circle % End plot #6 % Begin plot #7 1.500 UP @@ -839,25 +831,25 @@ LT0 LT0 0.55 0.00 0.00 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont -5093 3443 M +5063 3443 M (FT) Rshow /Helvetica findfont 140 scalefont setfont LT0 -0.55 0.00 0.00 C 4543 3443 M +0.55 0.00 0.00 C 4513 3443 M 298 0 V -725 3089 M -846 2769 L -242 40 V -484 -597 V -969 -207 V -4479 1492 L -725 3089 CircleF -846 2769 CircleF -1088 2809 CircleF -1572 2212 CircleF -2541 2005 CircleF -4479 1492 CircleF -4692 3443 CircleF +725 3148 M +846 2821 L +242 41 V +484 -612 V +969 -212 V +4479 1513 L +725 3148 CircleF +846 2821 CircleF +1088 2862 CircleF +1572 2250 CircleF +2541 2038 CircleF +4479 1513 CircleF +4662 3443 CircleF % End plot #7 1.000 UL LTb @@ -874,5 +866,3 @@ stroke grestore end showpage -%%Trailer -%%DocumentFonts: Helvetica diff --git a/fig/energy.pdf b/fig/energy.pdf index 1128f8a..f16f58a 100644 Binary files a/fig/energy.pdf and b/fig/energy.pdf differ diff --git a/fig/heter.eps b/fig/heter.eps index ffc8e58..b1b59f5 100644 --- a/fig/heter.eps +++ b/fig/heter.eps @@ -1,7 +1,7 @@ %!PS-Adobe-2.0 EPSF-2.0 %%Title: heter2.eps %%Creator: gnuplot 4.6 patchlevel 0 -%%CreationDate: Thu Nov 6 10:45:38 2014 +%%CreationDate: Thu Feb 19 12:00:04 2015 %%DocumentFonts: (atend) %%BoundingBox: 50 50 320 239 %%EndComments @@ -432,7 +432,7 @@ SDict begin [ /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Nov 6 10:45:38 2014) + /CreationDate (Thu Feb 19 12:00:04 2015) /DOCINFO pdfmark end } ifelse @@ -582,7 +582,7 @@ LC2 setrgbcolor LCb setrgbcolor /Helvetica findfont 220 scalefont setfont 4496 3443 M -(Normalize performance) Rshow +(Normalized performance) Rshow /Helvetica findfont 140 scalefont setfont LT2 LC2 setrgbcolor @@ -657,5 +657,3 @@ stroke grestore end showpage 
-%%Trailer -%%DocumentFonts: Helvetica diff --git a/fig/heter.pdf b/fig/heter.pdf index cad42e2..006ad27 100644 Binary files a/fig/heter.pdf and b/fig/heter.pdf differ diff --git a/fig/per_deg.eps b/fig/per_deg.eps index aba2966..e1ef706 100644 --- a/fig/per_deg.eps +++ b/fig/per_deg.eps @@ -1,7 +1,7 @@ %!PS-Adobe-2.0 EPSF-2.0 %%Title: per_deg.eps %%Creator: gnuplot 4.6 patchlevel 0 -%%CreationDate: Thu Nov 6 09:06:50 2014 +%%CreationDate: Thu Feb 19 16:42:46 2015 %%DocumentFonts: (atend) %%BoundingBox: 50 50 320 239 %%EndComments @@ -432,7 +432,7 @@ SDict begin [ /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Nov 6 09:06:50 2014) + /CreationDate (Thu Feb 19 16:42:46 2015) /DOCINFO pdfmark end } ifelse @@ -450,17 +450,17 @@ newpath BackgroundColor 0 lt 3 1 roll 0 lt exch 0 lt or or not {BackgroundColor C 1.000 0 0 5400.00 3780.00 BoxColFill} if 1.000 UL LTb -602 662 M +602 674 M 63 0 V 4482 0 R -63 0 V /Helvetica findfont 190 scalefont setfont -518 662 M +518 674 M ( 0) Rshow /Helvetica findfont 140 scalefont setfont 1.000 UL LTb -602 1399 M +602 1538 M 63 0 V 4482 0 R -63 0 V @@ -470,7 +470,7 @@ LTb /Helvetica findfont 140 scalefont setfont 1.000 UL LTb -602 2136 M +602 2402 M 63 0 V 4482 0 R -63 0 V @@ -480,7 +480,7 @@ LTb /Helvetica findfont 140 scalefont setfont 1.000 UL LTb -602 2874 M +602 3266 M 63 0 V 4482 0 R -63 0 V @@ -490,16 +490,6 @@ LTb /Helvetica findfont 140 scalefont setfont 1.000 UL LTb -602 3611 M -63 0 V -4482 0 R --63 0 V -/Helvetica findfont 190 scalefont setfont --4566 0 R -( 20) Rshow -/Helvetica findfont 140 scalefont setfont -1.000 UL -LTb 604 588 M 0 63 V 0 2960 R @@ -624,44 +614,36 @@ LCb setrgbcolor LTb 1.000 UP /Helvetica findfont 190 scalefont setfont -649 728 M +649 752 M ( ) Lshow /Helvetica findfont 140 scalefont setfont 1.000 UL LTb -1.000 UL -LTb -655 3338 N -0 210 V -4438 0 V -0 -210 V --4438 0 V -Z stroke % Begin plot #1 1.500 UP 2.000 UL LT0 0.00 0.00 1.00 C LCb setrgbcolor /Helvetica findfont 190 
scalefont setfont -1289 3443 M +1259 3443 M (CG) Rshow /Helvetica findfont 140 scalefont setfont LT0 -0.00 0.00 1.00 C 739 3443 M +0.00 0.00 1.00 C 709 3443 M 298 0 V -725 1653 M -121 59 V -242 361 V -484 -629 V -2541 910 L -4479 824 L -725 1653 Box -846 1712 Box -1088 2073 Box -1572 1444 Box -2541 910 Box -4479 824 Box -888 3443 Box +725 1835 M +121 70 V +242 423 V +484 -737 V +2541 965 L +4479 865 L +725 1835 Box +846 1905 Box +1088 2328 Box +1572 1591 Box +2541 965 Box +4479 865 Box +858 3443 Box % End plot #1 % Begin plot #2 1.500 UP @@ -669,25 +651,25 @@ LT0 LT0 1.00 0.00 0.00 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont -1923 3443 M +1893 3443 M (MG) Rshow /Helvetica findfont 140 scalefont setfont LT0 -1.00 0.00 0.00 C 1373 3443 M +1.00 0.00 0.00 C 1343 3443 M 298 0 V -725 1301 M -121 307 V -242 -54 V -484 413 V -969 812 V -4479 2193 L -725 1301 TriD -846 1608 TriD -1088 1554 TriD -1572 1967 TriD -2541 2779 TriD -4479 2193 TriD -1522 3443 TriD +725 1424 M +121 359 V +242 -63 V +484 483 V +969 951 V +4479 2469 L +725 1424 TriD +846 1783 TriD +1088 1720 TriD +1572 2203 TriD +2541 3154 TriD +4479 2469 TriD +1492 3443 TriD % End plot #2 % Begin plot #3 1.500 UP @@ -695,25 +677,25 @@ LT0 LT0 0.50 0.00 0.50 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont -2557 3443 M +2527 3443 M (EP) Rshow /Helvetica findfont 140 scalefont setfont LT0 -0.50 0.00 0.50 C 2007 3443 M +0.50 0.00 0.50 C 1977 3443 M 298 0 V -725 1110 M -846 735 L -242 10 V -484 -80 V -969 456 V -4479 666 L -725 1110 Star -846 735 Star -1088 745 Star -1572 665 Star -2541 1121 Star -4479 666 Star -2156 3443 Star +725 1199 M +846 760 L +242 12 V +484 -94 V +969 534 V +4479 680 L +725 1199 Star +846 760 Star +1088 772 Star +1572 678 Star +2541 1212 Star +4479 680 Star +2126 3443 Star % End plot #3 % Begin plot #4 1.500 UP @@ -721,25 +703,25 @@ LT0 LT0 0.18 0.31 0.31 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont -3191 3443 M +3161 3443 M (LU) Rshow /Helvetica findfont 
140 scalefont setfont LT0 -0.18 0.31 0.31 C 2641 3443 M +0.18 0.31 0.31 C 2611 3443 M 298 0 V -725 1571 M -846 663 L -242 966 V -484 -605 V -969 180 V -4479 1010 L -725 1571 TriUF -846 663 TriUF -1088 1629 TriUF -1572 1024 TriUF -2541 1204 TriUF -4479 1010 TriUF -2790 3443 TriUF +725 1739 M +846 676 L +242 1132 V +484 -709 V +969 211 V +4479 1082 L +725 1739 TriUF +846 676 TriUF +1088 1808 TriUF +1572 1099 TriUF +2541 1310 TriUF +4479 1082 TriUF +2760 3443 TriUF % End plot #4 % Begin plot #5 1.500 UP @@ -747,25 +729,25 @@ LT0 LT0 0.18 0.55 0.34 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont -3825 3443 M +3795 3443 M (BT) Rshow /Helvetica findfont 140 scalefont setfont LT0 -0.18 0.55 0.34 C 3275 3443 M +0.18 0.55 0.34 C 3245 3443 M 298 0 V -725 1913 M -151 -87 V -212 -309 V -606 5 V -847 951 V -4964 851 L -725 1913 BoxF -876 1826 BoxF -1088 1517 BoxF -1694 1522 BoxF -2541 2473 BoxF -4964 851 BoxF -3424 3443 BoxF +725 2140 M +876 2038 L +212 -362 V +606 6 V +847 1114 V +4964 896 L +725 2140 BoxF +876 2038 BoxF +1088 1676 BoxF +1694 1682 BoxF +2541 2796 BoxF +4964 896 BoxF +3394 3443 BoxF % End plot #5 % Begin plot #6 1.500 UP @@ -773,25 +755,25 @@ LT0 LT0 0.85 0.65 0.13 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont -4459 3443 M +4429 3443 M (SP) Rshow /Helvetica findfont 140 scalefont setfont LT0 -0.85 0.65 0.13 C 3909 3443 M +0.85 0.65 0.13 C 3879 3443 M 298 0 V -725 1501 M -876 1072 L -212 154 V -606 -55 V -2541 666 L +725 1658 M +876 1155 L +212 180 V +606 -64 V +2541 680 L 2423 3 V -725 1501 Circle -876 1072 Circle -1088 1226 Circle -1694 1171 Circle -2541 666 Circle -4964 669 Circle -4058 3443 Circle +725 1658 Circle +876 1155 Circle +1088 1335 Circle +1694 1271 Circle +2541 680 Circle +4964 683 Circle +4028 3443 Circle % End plot #6 % Begin plot #7 1.500 UP @@ -799,25 +781,25 @@ LT0 LT0 0.55 0.00 0.00 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont -5093 3443 M +5063 3443 M (FT) Rshow /Helvetica findfont 140 scalefont 
setfont LT0 -0.55 0.00 0.00 C 4543 3443 M +0.55 0.00 0.00 C 4513 3443 M 298 0 V -725 809 M -121 228 V -242 581 V -484 -527 V -969 289 V -4479 1081 L -725 809 CircleF -846 1037 CircleF -1088 1618 CircleF -1572 1091 CircleF -2541 1380 CircleF -4479 1081 CircleF -4692 3443 CircleF +725 847 M +121 267 V +242 680 V +484 -617 V +969 339 V +4479 1166 L +725 847 CircleF +846 1114 CircleF +1088 1794 CircleF +1572 1177 CircleF +2541 1516 CircleF +4479 1166 CircleF +4662 3443 CircleF % End plot #7 1.000 UL LTb @@ -834,3 +816,5 @@ stroke grestore end showpage +%%Trailer +%%DocumentFonts: Helvetica diff --git a/fig/per_deg.pdf b/fig/per_deg.pdf index ab48039..e4211b5 100644 Binary files a/fig/per_deg.pdf and b/fig/per_deg.pdf differ diff --git a/pdsec15_review.txt b/pdsec15_review.txt new file mode 100644 index 0000000..7f5b887 --- /dev/null +++ b/pdsec15_review.txt @@ -0,0 +1,331 @@ +============================== Standard 1 ============================== + +> *** Key Contributions: Please describe the key contributions of the + paper or lack thereof. Your comments should be specific and + justify your overall recommendation. + +This paper presents a new online frequency selecting algorithm for +distributed iterative applications running on heterogeneous CPU nodes. +Contrary to previous work (for homogeneous CPU), this heterogeneous +context implies a vector of scaling factors and "slack times" before +synchronizing the processes at each iteration. The models and the +algorithm are clearly presented and detailed, and are validated on +several benchmarks thanks to a simulator. Comparison with another +scaling factor selection algorithm (which does not take into account +communication times and heterogeneity) shows the relevance of this new +algorithm which manages to significantly reduce the energy consumption +with acceptable performance overhead. + +Overall, this is a very solid work, and the paper is well-written and +very clear. 
+
+The main flaw of this paper is that the evaluation is only done via a
+simulator. As mentioned in future work, evaluations on real
+heterogeneous CPU platforms (with real power measurements) will be
+necessary (as future work) to validate definitely this algorithm and
+the models.
+
+> *** Suggestions for Improvement: Additional comments and suggestions
+ for improvement in the technical content or the presentation.
+ Please be as detailed and constructive as you can be.
+
+The energy and performance models rely on compute-bound programs,
+where the computation time is linearly proportional to the processor
+frequency. Does this apply to all NAS benchmarks ? The authors should
+specify which NAS benchmarks are memory-bound (if any), and how their
+model apply to these memory-bound benchmarks.
+
+Moreover, in section III it seems that the authors assume that the
+communication time (without slack time) is the same for all processors
+provided they have the same communication volume. This could be
+pointed out more clearly in the paper. Also, does this apply to all
+NAS benchmarks? Does it also depends on the placement of the MPI
+processes? I assume that for the same communication volume, the
+communication time will differ whether the processes are on
+neighbouring nodes or are on distant nodes (especially with 128 or 144
+nodes).
+Could the authors discuss in the text?
+
+The authors consider that the communication time only apply to static
+power, which means that no CPU cycle is used for the MPI
+communications. Does this implies specific networks (like Infiniband)
+with RDMA?
+This could be clarified in the paper.
+
+Finally, the algorithm applies to synchronous iterative applications:
+is this the case for all NAS benchmarks evaluated in this paper? This
+could also be specified in the paper.
+
+Figures 2a and 2b : I do not understand why the energy curve in Fig.2b
+does not have the same shape as the one in Fig.2a.
+Could the authors specify this in the text? + +Minor comments : +- The authors could specify in the abstract that "heterogeneous + platforms" refer to heterogeneous CPUs (not to CPU-GPU nodes). +- The terms "in the same direction" (used twice in section IV) are + unclear and should be rewritten. +- Section V.A : replace "because selecting frequency scaling factors + higher than the higher bound" by "because selecting frequencies + higher than the higher bound"? + +> *** Significance: Assess the significance of the topic addressed in + the paper. + +Excellent (5) + +> *** Originality/Novelty (of contribution): How novel are the + concepts presented in the paper? + +Above average (4) + +> *** Technical Soundness: How strong are the techniques and + methodologies used in the paper? + +Excellent (5) + +> *** Overall Recommendation: Your final rating should be consistent + with your ratings on previous questions. + +Accept (5) + +============================== Standard 2 ============================== + +> *** Key Contributions: Please describe the key contributions of the + paper or lack thereof. Your comments should be specific and + justify your overall recommendation. + +The paper proposed a frequency selection algorithm for heterogeneous +platforms. The algorithm proposed the maximum distance between the +energy consumption and the performance to get the trade off scale +factor. on This is an interesting paper with good trial to cover many +factors. + +The paper ran NPB benchmarks to verify the algorithm but there is no +comparison between the results at the the trade-off scale factor and +those from all other possible scale factors without applying the +algorithm. Without this, it is not reliable to validate the algorithm. + +> *** Suggestions for Improvement: Additional comments and suggestions + for improvement in the technical content or the presentation. + Please be as detailed and constructive as you can be. + +There are too much tables i.e. 
II-VII in section VI. It would be better to +summarize them in a couple of figures. + +It is necessary to describe the overhead of the algorithm, which is +missing from the paper. + +> *** Significance: Assess the significance of the topic addressed in + the paper. + +Average (3) + +> *** Originality/Novelty (of contribution): How novel are the + concepts presented in the paper? + +Average (3) + +> *** Technical Soundness: How strong are the techniques and + methodologies used in the paper? + +Acceptable (3) + +> *** Overall Recommendation: Your final rating should be consistent + with your ratings on previous questions. + +Weak Accept (4) + +============================== Standard 3 ============================== + +> *** Key Contributions: Please describe the key contributions of the + paper or lack thereof. Your comments should be specific and + justify your overall recommendation. + +The paper develops DVFS performance models and an online algorithm to +optimize time and energy for iterative message passing applications on +a heterogeneous CPU cluster. An objective function is developed to +express the time-energy tradeoff. Results using a simulated framework +show worthwhile energy gains for acceptable loss of execution time. A +comparison with a more general pre-existing algorithm shows modest +improvements in energy and energy-time tradeoff. + +The paper is well-written and is technically sound. Its significance +is slightly diminished due to the fact that previous work has largely +dealt with this issue on scenarios that are of stronger interest +and/or are less specialized. + +> *** Suggestions for Improvement: Additional comments and suggestions + for improvement in the technical content or the presentation. + Please be as detailed and constructive as you can be. + +The abstract would be sharpened if it contained numbers relating to +the performance degradation and comparison. + +III.A.
The modelling of the communication time being independent of +the frequency is questionable, even if it is backed up by a 10-year-old +reference. While slack time is not affected, my own research has shown +that communication bandwidth does clearly increase with frequency, +albeit in a sub-linear fashion. Taking the minimum for the +communication time (3) needs better explanation, as it is +counter-intuitive. + +I would like to see some explanation as to why it takes so many iterations +for the algorithm to select the best vector, and whether this can be +improved. While the NAS benchmarks have a standard number of +iterations, it would be helpful to the reader to indicate what these +are in VI. + +The results on a real heterogeneous platform in the future work will +be interesting. + +There are a number of small grammatical errors: + +p2. ``to satisfy some objectives while taking into account all the +constraints,'': a comma is needed before `while' to match the 2nd + +Fig2(b) normalize -> normalized + +p4 ``following the same direction'': use `follow' + +Alg1: F_diff_i: difference -> differences + +p6: on all left frequencies -> on all remaining frequencies + +while it lowers the frequency of all other nodes -> +while it lowers the frequencies of all other nodes + +``the proposed algorithm is not an exact method it does'': +put a : before it + +p8: on different number of nodes -> on different numbers of nodes +the GC benchmark significantly decrease -> +the CG benchmark significantly decreases + +> *** Significance: Assess the significance of the topic addressed in + the paper. + +Above average (4) + +> *** Originality/Novelty (of contribution): How novel are the + concepts presented in the paper? + +Above average (4) + +> *** Technical Soundness: How strong are the techniques and + methodologies used in the paper? + +Excellent (5) + +> *** Overall Recommendation: Your final rating should be consistent + with your ratings on previous questions.
+ +Strong Accept (6) + +============================== Standard 4 ============================== + +> *** Key Contributions: Please describe the key contributions of the + paper or lack thereof. Your comments should be specific and + justify your overall recommendation. + +In this paper, a new online frequency selecting algorithm for +heterogeneous platforms is presented. It selects the frequencies and +tries to give the best trade-off between energy saving and performance +degradation, for each node computing the message passing iterative +application. The algorithm has a small overhead and works without +training or profiling. It uses a new energy model for message passing +iterative applications running on a heterogeneous platform. The +proposed algorithm is evaluated on the SimGrid simulator while running +the NAS parallel benchmarks. The experiments show that it reduces the +energy consumption by up to 35 % while limiting the performance +degradation as much as possible. Finally, the algorithm is compared to +an existing method, the comparison results showing that it outperforms +the latter. + +> *** Suggestions for Improvement: Additional comments and suggestions + for improvement in the technical content or the presentation. + Please be as detailed and constructive as you can be. + +I did not see very clearly whether the proposed online algorithm can +achieve the optimal selection. If it is only a heuristic, then how close +is it to the optimal? I would like to see more theoretical or experimental +results if possible, since the authors claim "the best trade-off +between energy saving and performance degradation". + +> *** Significance: Assess the significance of the topic addressed in + the paper. + +Excellent (5) + +> *** Originality/Novelty (of contribution): How novel are the + concepts presented in the paper? + +Excellent (5) + +> *** Technical Soundness: How strong are the techniques and + methodologies used in the paper?
+ +Strong (4) + +> *** Overall Recommendation: Your final rating should be consistent + with your ratings on previous questions. + +Strong Accept (6) + +============================== Standard 5 ============================== + +> *** Key Contributions: Please describe the key contributions of the + paper or lack thereof. Your comments should be specific and + justify your overall recommendation. + +The paper considers the DVFS technique and presents an energy model +for DVFS systems that also takes the communication time into +consideration. A new algorithm for selecting the scaling factors is +presented. The algorithm uses a vector of scaling factors, one for +each node, and determines the scaling factors such that the best trade-off +between minimizing the energy consumption and maximizing the +performance for a synchronous iterative algorithm is reached. The +algorithm works during execution time and uses the first iteration +step for collecting the information required for the scaling factor +selection. An experimental evaluation is given using the SimGrid +environment. + +The paper is well written and structured and should be accepted. It +is solid work and provides new contributions by extending earlier +energy models with communication time concerns and proposes a new +algorithm for DVFS control. + +> *** Suggestions for Improvement: Additional comments and suggestions + for improvement in the technical content or the presentation. + Please be as detailed and constructive as you can be. + +Algorithm 1 in Section V could be explained in more detail. As far as +I can see, it tests all possible frequencies or scaling factors for +the different nodes and selects the best as indicated by the model. I +was wondering whether all combinations of scaling factors are tested +or whether this is not necessary because of the behavior of the +communication.
+The accuracy of the frequency selection depends on the accuracy of the +model used for the computation of the scaling factors. It would be +interesting to see how accurate the model is for real systems. +However, I see that this might be difficult to capture in practice. + +> *** Significance: Assess the significance of the topic addressed in + the paper. + +Excellent (5) + +> *** Originality/Novelty (of contribution): How novel are the + concepts presented in the paper? + +Above average (4) + +> *** Technical Soundness: How strong are the techniques and + methodologies used in the paper? + +Excellent (5) + +> *** Overall Recommendation: Your final rating should be consistent + with your ratings on previous questions. + +Accept (5)