============================== Standard 1 ==============================

> *** Key Contributions: Please describe the key contributions of the paper or lack thereof. Your comments should be specific and justify your overall recommendation.
This paper presents a new online frequency selection algorithm for distributed iterative applications running on heterogeneous CPU nodes. Unlike previous work (which targets homogeneous CPUs), this heterogeneous context implies a vector of scaling factors and "slack times" before the processes synchronize at each iteration. The models and the algorithm are clearly presented and detailed, and they are validated on several benchmarks using a simulator. A comparison with another scaling-factor selection algorithm (which accounts for neither communication times nor heterogeneity) shows the relevance of this new algorithm, which significantly reduces energy consumption with an acceptable performance overhead.

Overall, this is very solid work, and the paper is well written and very clear. The main flaw of this paper is that the evaluation is done only in simulation. As the authors acknowledge in their future work, evaluations on real heterogeneous CPU platforms (with real power measurements) will be necessary to definitively validate the algorithm and the models.

> *** Suggestions for Improvement: Additional comments and suggestions for improvement in the technical content or the presentation. Please be as detailed and constructive as you can be.
The energy and performance models assume compute-bound programs, where the computation time scales linearly with the frequency scaling factor (i.e., inversely with the processor frequency). Does this hold for all NAS benchmarks? The authors should specify which NAS benchmarks, if any, are memory-bound, and how their model applies to them.

Moreover, in Section III the authors seem to assume that the communication time (excluding slack time) is the same for all processors provided they have the same communication volume. This should be stated more clearly in the paper. Does this hold for all NAS benchmarks? Does it also depend on the placement of the MPI processes? I would expect that, for the same communication volume, the communication time differs depending on whether the processes are on neighbouring or distant nodes (especially with 128 or 144 nodes). Could the authors discuss this in the text?

The authors consider that only static power is consumed during the communication time, which means that no CPU cycles are used for MPI communications. Does this imply specific networks with RDMA support (such as InfiniBand)? This could be clarified in the paper.

Finally, the algorithm applies to synchronous iterative applications: is this the case for all NAS benchmarks evaluated in this paper? This should also be specified in the paper.

Figures 2a and 2b: I do not understand why the energy curve in Fig. 2b does not have the same shape as the one in Fig. 2a. Could the authors explain this in the text?

Minor comments:
- The authors could specify in the abstract that "heterogeneous platforms" refers to heterogeneous CPUs (not to CPU-GPU nodes).
- The phrase "in the same direction" (used twice in Section IV) is unclear and should be rewritten.
- Section V.A: replace "because selecting frequency scaling factors higher than the higher bound" with "because selecting frequencies higher than the higher bound"?

> *** Significance: Assess the significance of the topic addressed in the paper.
Excellent (5)

> *** Originality/Novelty (of contribution): How novel are the concepts presented in the paper?
Above average (4)

> *** Technical Soundness: How strong are the techniques and methodologies used in the paper?
Excellent (5)

> *** Overall Recommendation: Your final rating should be consistent with your ratings on previous questions.
Accept (5)

============================== Standard 2 ==============================

> *** Key Contributions: Please describe the key contributions of the paper or lack thereof. Your comments should be specific and justify your overall recommendation.
The paper proposes a frequency selection algorithm for heterogeneous platforms. The algorithm uses the maximum distance between the energy consumption and the performance to find the trade-off scaling factor. This is an interesting paper that makes a good attempt to cover many factors. The paper runs the NPB benchmarks to verify the algorithm, but there is no comparison between the results obtained at the trade-off scaling factor and those obtained with all other possible scaling factors (without applying the algorithm). Without this comparison, the algorithm cannot be reliably validated.

> *** Suggestions for Improvement: Additional comments and suggestions for improvement in the technical content or the presentation. Please be as detailed and constructive as you can be.
There are too many tables (Tables II-VII in Section VI); it would be better to summarize them in a couple of figures. The overhead of the algorithm, which is missing from the paper, should also be described.

> *** Significance: Assess the significance of the topic addressed in the paper.
Average (3)

> *** Originality/Novelty (of contribution): How novel are the concepts presented in the paper?
Average (3)

> *** Technical Soundness: How strong are the techniques and methodologies used in the paper?
Acceptable (3)

> *** Overall Recommendation: Your final rating should be consistent with your ratings on previous questions.
Weak Accept (4)

============================== Standard 3 ==============================

> *** Key Contributions: Please describe the key contributions of the paper or lack thereof. Your comments should be specific and justify your overall recommendation.
The paper develops DVFS performance models and an online algorithm to optimize time and energy for iterative message-passing applications on a heterogeneous CPU cluster. An objective function is developed to express the time-energy trade-off. Results obtained with a simulation framework show worthwhile energy gains for an acceptable loss in execution time. A comparison with a more general pre-existing algorithm shows modest improvements in energy and in the energy-time trade-off. The paper is well written and technically sound. Its significance is slightly diminished by the fact that previous work has largely addressed this issue in scenarios that are of stronger interest and/or less specialized.

> *** Suggestions for Improvement: Additional comments and suggestions for improvement in the technical content or the presentation. Please be as detailed and constructive as you can be.
The abstract would be sharpened if it contained numbers for the performance degradation and the comparison results.

III.A. Modelling the communication time as independent of the frequency is questionable, even if it is backed up by a 10-year-old reference. While slack time is not affected, my own research has shown that communication bandwidth clearly increases with frequency, albeit sub-linearly. Taking the minimum for the communication time in (3) needs a better explanation, as it is counter-intuitive.

I would like to see some explanation of why it takes so many iterations for the algorithm to select the best vector, and whether this can be improved.

While the NAS benchmarks have a standard number of iterations, it would be helpful to the reader to indicate what these are in Section VI.

The results on a real heterogeneous platform, planned as future work, will be interesting.

There are a number of small grammatical errors:
- p2: ``to satisfy some objectives while taking into account all the constraints,'': a comma is needed before `while' to match the second one
- Fig. 2(b): normalize -> normalized
- p4: ``following the same direction'': use `follow'
- Alg. 1: F_diff_i: difference -> differences
- p6: on all left frequencies -> on all remaining frequencies
  while it lowers the frequency of all other nodes -> while it lowers the frequencies of all other nodes
  ``the proposed algorithm is not an exact method it does'': put a colon before `it'
- p8: on different number of nodes -> on different numbers of nodes
  the GC benchmark significantly decrease -> the CG benchmark significantly decreases

> *** Significance: Assess the significance of the topic addressed in the paper.
Above average (4)

> *** Originality/Novelty (of contribution): How novel are the concepts presented in the paper?
Above average (4)

> *** Technical Soundness: How strong are the techniques and methodologies used in the paper?
Excellent (5)

> *** Overall Recommendation: Your final rating should be consistent with your ratings on previous questions.
Strong Accept (6)

============================== Standard 4 ==============================

> *** Key Contributions: Please describe the key contributions of the paper or lack thereof. Your comments should be specific and justify your overall recommendation.
In this paper, a new online frequency selection algorithm for heterogeneous platforms is presented. For each node running the message-passing iterative application, it selects the frequency that tries to give the best trade-off between energy saving and performance degradation. The algorithm has a small overhead and works without training or profiling. It uses a new energy model for message-passing iterative applications running on a heterogeneous platform. The proposed algorithm is evaluated with the SimGrid simulator running the NAS Parallel Benchmarks.
The experiments show that it reduces energy consumption by up to 35% while limiting performance degradation as much as possible. Finally, the algorithm is compared with an existing method, and the results show that it outperforms the latter.

> *** Suggestions for Improvement: Additional comments and suggestions for improvement in the technical content or the presentation. Please be as detailed and constructive as you can be.
It is not very clear to me whether the proposed online algorithm achieves the optimal selection. If it is only a heuristic, how close to the optimum does it get? I would like to see more theoretical or experimental results, if possible, since the authors claim "the best trade-off between energy saving and performance degradation".

> *** Significance: Assess the significance of the topic addressed in the paper.
Excellent (5)

> *** Originality/Novelty (of contribution): How novel are the concepts presented in the paper?
Excellent (5)

> *** Technical Soundness: How strong are the techniques and methodologies used in the paper?
Strong (4)

> *** Overall Recommendation: Your final rating should be consistent with your ratings on previous questions.
Strong Accept (6)

============================== Standard 5 ==============================

> *** Key Contributions: Please describe the key contributions of the paper or lack thereof. Your comments should be specific and justify your overall recommendation.
The paper considers the DVFS technique and presents an energy model for DVFS systems that also takes the communication time into account. A new algorithm for selecting the scaling factors is presented. The algorithm uses a vector of scaling factors, one for each node, and determines the scaling factors such that the best trade-off between minimizing energy consumption and maximizing performance is reached for a synchronous iterative algorithm. The algorithm works at runtime and uses the first iteration to collect the information required for scaling-factor selection. An experimental evaluation is given using the SimGrid environment.

The paper is well written and well structured and should be accepted. It is solid work that provides new contributions by extending earlier energy models to account for communication time and by proposing a new algorithm for DVFS control.

> *** Suggestions for Improvement: Additional comments and suggestions for improvement in the technical content or the presentation. Please be as detailed and constructive as you can be.
Algorithm 1 in Section V could be explained in more detail. As far as I can see, it tests all possible frequencies or scaling factors for the different nodes and selects the best one as indicated by the model. I was wondering whether all combinations of scaling factors are tested, or whether this is unnecessary because of the behavior of the communication.

The accuracy of the frequency selection depends on the accuracy of the model used to compute the scaling factors. It would be interesting to see how accurate the model is for real systems. However, I see that this might be difficult to capture in practice.

> *** Significance: Assess the significance of the topic addressed in the paper.
Excellent (5)

> *** Originality/Novelty (of contribution): How novel are the concepts presented in the paper?
Above average (4)

> *** Technical Soundness: How strong are the techniques and methodologies used in the paper?
Excellent (5)

> *** Overall Recommendation: Your final rating should be consistent with your ratings on previous questions.
Accept (5)