1 ============================== Standard 1 ==============================
3 > *** Key Contributions: Please describe the key contributions of the
4 paper or lack thereof. Your comments should be specific and
5 justify your overall recommendation.
7 This paper presents a new online frequency selection algorithm for
8 distributed iterative applications running on heterogeneous CPU nodes.
9 Contrary to previous work (for homogeneous CPUs), this heterogeneous
10 context implies a vector of scaling factors and "slack times" before
11 synchronizing the processes at each iteration. The models and the
12 algorithm are clearly presented and detailed, and are validated on
13 several benchmarks using a simulator. Comparison with another
14 scaling factor selection algorithm (which does not take into account
15 communication times and heterogeneity) shows the relevance of this new
16 algorithm, which manages to significantly reduce the energy consumption
17 with acceptable performance overhead.
19 Overall, this is a very solid work, and the paper is well-written and
22 The main flaw of this paper is that the evaluation is only done via a
23 simulator. As mentioned in future work, evaluations on real
24 heterogeneous CPU platforms (with real power measurements) will be
25 necessary to definitively validate this algorithm and
28 > *** Suggestions for Improvement: Additional comments and suggestions
29 for improvement in the technical content or the presentation.
30 Please be as detailed and constructive as you can be.
32 The energy and performance models rely on compute-bound programs,
33 where the computation time is inversely proportional to the processor
34 frequency. Does this apply to all NAS benchmarks? The authors should
35 specify which NAS benchmarks are memory-bound (if any), and how their
36 model applies to these memory-bound benchmarks.
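To make the concern about memory-bound benchmarks concrete, here is a minimal sketch of how the predicted iteration time could differ. It assumes (this split is my illustration, not taken from the paper) that only the compute-bound part of an iteration stretches with the scaling factor S = f_max / f_new, while a memory-bound part is independent of the CPU frequency. The function name and all numbers are hypothetical.

```python
def scaled_time(t_compute, t_memory, f_max, f_new):
    """Predicted time of one iteration when the CPU runs at f_new instead of f_max.

    Assumption: the compute-bound part scales with S = f_max / f_new,
    the memory-bound part does not depend on the CPU frequency.
    """
    if not 0 < f_new <= f_max:
        raise ValueError("f_new must be in (0, f_max]")
    scaling_factor = f_max / f_new
    return t_compute * scaling_factor + t_memory


# Halving the frequency (S = 2) of a purely compute-bound iteration doubles its time.
compute_bound = scaled_time(t_compute=10.0, t_memory=0.0, f_max=2.0, f_new=1.0)
# The same 10 s iteration with 6 s of memory-bound work slows down much less.
memory_heavy = scaled_time(t_compute=4.0, t_memory=6.0, f_max=2.0, f_new=1.0)
print(compute_bound)  # 20.0
print(memory_heavy)   # 14.0
```

Under this assumption, a purely compute-bound model would overestimate the slowdown of a memory-bound benchmark, which is why the classification of the NAS benchmarks matters for the predicted trade-off.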
38 Moreover, in section III it seems that the authors assume that the
39 communication time (without slack time) is the same for all processors
40 provided they have the same communication volume. This could be
41 pointed out more clearly in the paper. Also, does this apply to all
42 NAS benchmarks? Does it also depend on the placement of the MPI
43 processes? I assume that for the same communication volume, the
44 communication time will differ whether the processes are on
45 neighbouring nodes or are on distant nodes (especially with 128 or 144
47 Could the authors discuss this in the text?
49 The authors consider that only static power is consumed during
50 communication, which means that no CPU cycle is used for the MPI
51 communications. Does this imply specific networks (like InfiniBand)
53 This could be clarified in the paper.
55 Finally, the algorithm applies to synchronous iterative applications:
56 is this the case for all NAS benchmarks evaluated in this paper? This
57 could also be specified in the paper.
59 Figures 2a and 2b: I do not understand why the energy curve in Fig. 2b
60 does not have the same shape as the one in Fig. 2a.
61 Could the authors explain this in the text?
64 - The authors could specify in the abstract that "heterogeneous
65 platforms" refer to heterogeneous CPUs (not to CPU-GPU nodes).
66 - The phrase "in the same direction" (used twice in Section IV) is
67 unclear and should be rewritten.
68 - Section V.A: replace "because selecting frequency scaling factors
69 higher than the higher bound" by "because selecting frequencies
70 higher than the higher bound"?
72 > *** Significance: Assess the significance of the topic addressed in
77 > *** Originality/Novelty (of contribution): How novel are the
78 concepts presented in the paper?
82 > *** Technical Soundness: How strong are the techniques and
83 methodologies used in the paper?
87 > *** Overall Recommendation: Your final rating should be consistent
88 with your ratings on previous questions.
92 ============================== Standard 2 ==============================
94 > *** Key Contributions: Please describe the key contributions of the
95 paper or lack thereof. Your comments should be specific and
96 justify your overall recommendation.
98 The paper proposes a frequency selection algorithm for heterogeneous
99 platforms. The algorithm uses the maximum distance between the
100 energy consumption and the performance to obtain the trade-off scale
101 factor. This is an interesting paper that makes a good attempt to cover many
104 The paper ran NPB benchmarks to verify the algorithm, but there is no
105 comparison between the results at the trade-off scale factor and
106 those from all other possible scale factors without applying the
107 algorithm. Without this comparison, the algorithm cannot be reliably validated.
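The kind of comparison asked for above can be sketched as a sweep over every available scale factor, reporting performance and energy for each so that the selected factor can be checked against all alternatives. The sketch assumes a simplified single-node model in which dynamic energy shrinks as S^-2 and static energy grows with S; the power values, the list of scale factors, and the function names are hypothetical and not taken from the paper.

```python
def normalized_energy(s, p_dyn=20.0, p_static=4.0):
    # Assumed model: dynamic energy scales as s**-2, static energy as s
    # (longer run time), normalized by the energy at s = 1.
    return (p_dyn * s ** -2 + p_static * s) / (p_dyn + p_static)


def normalized_performance(s):
    # Old time over new time for a compute-bound program slowed by factor s.
    return 1.0 / s


def sweep(scale_factors):
    """Return one (s, performance, energy) row per factor and the max-distance row."""
    rows = [(s, normalized_performance(s), normalized_energy(s)) for s in scale_factors]
    best = max(rows, key=lambda row: row[1] - row[2])
    return rows, best


scale_factors = [1.0, 1.25, 1.5, 1.75, 2.0, 2.5]
rows, best = sweep(scale_factors)
for s, perf, energy in rows:
    print(f"S={s:<5} perf={perf:.3f} energy={energy:.3f} distance={perf - energy:+.3f}")
print("selected scale factor:", best[0])
```

Reporting the full table, rather than only the selected row, is what would let a reader judge whether the maximum-distance criterion picks a sensible point.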
109 > *** Suggestions for Improvement: Additional comments and suggestions
110 for improvement in the technical content or the presentation.
111 Please be as detailed and constructive as you can be.
113 There are too many tables (II-VII) in Section VI. It would be better to
114 summarize them in a couple of figures.
116 It is necessary to describe the overhead of the algorithm which is
119 > *** Significance: Assess the significance of the topic addressed in
124 > *** Originality/Novelty (of contribution): How novel are the
125 concepts presented in the paper?
129 > *** Technical Soundness: How strong are the techniques and
130 methodologies used in the paper?
134 > *** Overall Recommendation: Your final rating should be consistent
135 with your ratings on previous questions.
139 ============================== Standard 3 Thomas R. ==============================
141 > *** Key Contributions: Please describe the key contributions of the
142 paper or lack thereof. Your comments should be specific and
143 justify your overall recommendation.
145 The paper develops DVFS performance models and an online algorithm to
146 optimize time and energy for iterative message passing applications on
147 a heterogeneous CPU cluster. An objective function is developed to
148 express the time-energy tradeoff. Results using a simulated framework
149 show worthwhile energy gains for acceptable loss of execution time. A
150 comparison with a more general pre-existing algorithm shows modest
151 improvements in energy and energy-time tradeoff.
153 The paper is well-written and is technically sound. Its significance
154 is slightly diminished by the fact that previous work has largely
155 dealt with this issue in scenarios that are of greater interest
156 and/or are less specialized.
158 > *** Suggestions for Improvement: Additional comments and suggestions
159 for improvement in the technical content or the presentation.
160 Please be as detailed and constructive as you can be.
162 The abstract would be sharpened if it contained numbers relating to
163 the performance degradation and comparison.
165 III.A. The modelling of the communication time being independent of
166 the frequency is questionable, even if it is backed up by a 10-year-old
167 reference. While slack time is not affected, my own research has shown
168 that communication bandwidth does clearly increase with frequency,
169 albeit in a sub-linear fashion. Taking the minimum for
170 communication time (3) needs better explanation, as it is
173 I would like to see some explanation as to why it takes so many iterations
174 for the algorithm to select the best vector, and whether this can be
175 improved. While the NAS benchmarks have a standard number of
176 iterations, it would be helpful to the reader to indicate what these
179 The results on a real heterogeneous platform in the future work will
182 There are a number of small grammatical errors:
184 p2. ``to satisfy some objectives while taking into account all the
185 constraints,'': a comma is needed before `while' to match the 2nd
187 Fig2(b) normalize -> normalized
189 p4 ``following the same direction'': use `follow'
191 Alg1: F_diff_i: difference -> differences
193 p6: on all left frequencies -> on all remaining frequencies
195 while it lowers the frequency of all other nodes ->
196 while it lowers the frequencies of all other nodes
198 ``the proposed algorithm is not an exact method it does'':
201 p8: on different number of nodes -> on different numbers of nodes
202 the GC benchmark significantly decrease ->
203 the CG benchmark significantly decreases
205 > *** Significance: Assess the significance of the topic addressed in
210 > *** Originality/Novelty (of contribution): How novel are the
211 concepts presented in the paper?
215 > *** Technical Soundness: How strong are the techniques and
216 methodologies used in the paper?
220 > *** Overall Recommendation: Your final rating should be consistent
221 with your ratings on previous questions.
225 ============================== Standard 4 ==============================
227 > *** Key Contributions: Please describe the key contributions of the
228 paper or lack thereof. Your comments should be specific and
229 justify your overall recommendation.
231 In this paper, a new online frequency selecting algorithm for
232 heterogeneous platforms is presented. It selects the frequencies and
233 tries to give the best trade-off between energy saving and performance
234 degradation, for each node computing the message passing iterative
235 application. The algorithm has a small overhead and works without
236 training or profiling. It uses a new energy model for message passing
237 iterative applications running on a heterogeneous platform. The
238 proposed algorithm is evaluated on the SimGrid simulator while running
239 the NAS parallel benchmarks. The experiments show that it reduces the
240 energy consumption by up to 35 % while limiting the performance
241 degradation as much as possible. Finally, the algorithm is compared to
242 an existing method, the comparison results showing that it outperforms
245 > *** Suggestions for Improvement: Additional comments and suggestions
246 for improvement in the technical content or the presentation.
247 Please be as detailed and constructive as you can be.
249 I did not see very clearly whether the proposed online algorithm can
250 achieve the optimal selection. If it is only a heuristic, how close is
251 it to the optimal? I would like to see more theoretical or experimental
252 results if possible, since the authors claim "the best trade-off
253 between energy saving and performance degradation".
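One experimental way to answer the "how close to the optimal" question on small instances is to compare a heuristic against an exhaustive search over all vectors of scaling factors. The sketch below is only an illustration: the greedy search is a stand-in heuristic of my own, not the paper's algorithm, and the objective, node parameters, scaling levels, and communication time are all hypothetical.

```python
from itertools import product

LEVELS = [1.0, 1.25, 1.5, 2.0]  # hypothetical scaling factors available on every node
# (dynamic power, static power, compute time at max frequency) per node, made-up values
NODES = [(20.0, 4.0, 10.0), (30.0, 6.0, 6.0), (15.0, 3.0, 8.0)]
T_COMM = 2.0  # assumed frequency-independent communication time per iteration


def objective(scales):
    """Normalized performance minus normalized energy for one vector of scaling factors."""
    t_iter = max(t * s for (_, _, t), s in zip(NODES, scales)) + T_COMM
    t_ref = max(t for _, _, t in NODES) + T_COMM
    # Every node pays static power for the whole iteration, including its slack time.
    energy = sum(pd * t * s ** -2 + ps * t_iter for (pd, ps, t), s in zip(NODES, scales))
    e_ref = sum(pd * t + ps * t_ref for pd, ps, t in NODES)
    return t_ref / t_iter - energy / e_ref


def exhaustive():
    # Tests all len(LEVELS) ** len(NODES) combinations; only feasible for tiny instances.
    return max(product(LEVELS, repeat=len(NODES)), key=objective)


def greedy():
    # Stand-in heuristic: slow down one node by one level whenever that improves the objective.
    idx = [0] * len(NODES)
    improved = True
    while improved:
        improved = False
        current = objective([LEVELS[i] for i in idx])
        for n in range(len(NODES)):
            if idx[n] + 1 < len(LEVELS):
                trial = idx.copy()
                trial[n] += 1
                if objective([LEVELS[i] for i in trial]) > current:
                    idx = trial
                    improved = True
                    break
    return tuple(LEVELS[i] for i in idx)


best = exhaustive()
found = greedy()
gap = objective(best) - objective(found)
print("exhaustive:", best, "heuristic:", found, "gap:", gap)
```

A gap of zero on such small instances would support the "best trade-off" claim empirically; a positive gap would quantify how far the heuristic is from the optimum under the assumed model.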
255 > *** Significance: Assess the significance of the topic addressed in
260 > *** Originality/Novelty (of contribution): How novel are the
261 concepts presented in the paper?
265 > *** Technical Soundness: How strong are the techniques and
266 methodologies used in the paper?
270 > *** Overall Recommendation: Your final rating should be consistent
271 with your ratings on previous questions.
275 ============================== Standard 5 ==============================
277 > *** Key Contributions: Please describe the key contributions of the
278 paper or lack thereof. Your comments should be specific and
279 justify your overall recommendation.
281 The paper considers the DVFS technique and presents an energy model
282 for DVFS systems that also takes the communication time into
283 consideration. A new algorithm for selecting the scaling factors is
284 presented. The algorithm uses a vector of scaling factors, one for
285 each node, and determines the scaling factors such that the best trade-off
286 between minimizing the energy consumption and maximizing the
287 performance for a synchronous iterative algorithm is reached. The
288 algorithm works during execution time and uses the first iteration
289 step for collecting the information required for the scaling factor
290 selection. An experimental evaluation is given using the SimGrid
293 The paper is well written and structured and should be accepted. It
294 is solid work that provides new contributions by extending earlier
295 energy models with communication time concerns and by proposing a new
296 algorithm for DVFS control.
298 > *** Suggestions for Improvement: Additional comments and suggestions
299 for improvement in the technical content or the presentation.
300 Please be as detailed and constructive as you can be.
302 Algorithm 1 in Section V could be explained in more detail. As far as
303 I can see, it tests all possible frequencies or scaling factors for
304 the different nodes and selects the best as indicated by the model. I
305 was wondering whether all combinations of scaling factors are tested
306 or whether this is not necessary because of the behavior of the
308 The accuracy of the frequency selection depends on the accuracy of the
309 model used for the computation of the scaling factors. It would be
310 interesting to see how accurate the model is for real systems.
311 However, I see that this might be difficult to capture in practice.
313 > *** Significance: Assess the significance of the topic addressed in
318 > *** Originality/Novelty (of contribution): How novel are the
319 concepts presented in the paper?
323 > *** Technical Soundness: How strong are the techniques and
324 methodologies used in the paper?
328 > *** Overall Recommendation: Your final rating should be consistent
329 with your ratings on previous questions.