From 0f6674d9fe0512b7509ffc2e2a1e2d484c8354c6 Mon Sep 17 00:00:00 2001
From: Arnaud Giersch
Date: Fri, 16 May 2014 11:41:23 +0200
Subject: [PATCH 1/1] Add review to repository.

---
 ispa14_review.txt | 152 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 152 insertions(+)
 create mode 100644 ispa14_review.txt

diff --git a/ispa14_review.txt b/ispa14_review.txt
new file mode 100644
index 0000000..c3e8f63
--- /dev/null
+++ b/ispa14_review.txt
@@ -0,0 +1,152 @@
+----------------------- REVIEW 1 ---------------------
+PAPER: 36
+TITLE: Dynamic Frequency Scaling for Energy Consumption Reduction in Distributed MPI Programs
+AUTHORS: Jean-Claude Charr, Raphaël Couturier, Ahmed Fanfakh and Arnaud Giersch
+
+OVERALL EVALUATION: 0 (borderline paper)
+REVIEWER'S CONFIDENCE: 4 (high)
+
+----------- REVIEW -----------
+Paper Summary:
+This paper describes an online algorithm for estimating the energy consumption
+of a program when running in different DVFS states, or frequency gears. The
+algorithm uses information obtained at runtime on the compute vs. communication
+time of the application. Their system picks the frequency gear with the biggest
+delta between inverse performance and normalized energy. Their algorithm is
+evaluated using simulation of a small cluster (8 to 16 nodes) of homogeneous
+single-core nodes with an Ethernet network.
+
+Importance of Problem:
+Energy efficiency must be improved significantly – more than 20x – to meet the
+power budget goal for exascale-class supercomputers. This paper proposes an
+online algorithm for reducing the energy consumed by an application with as
+little performance impact as possible, thus improving energy efficiency.
+
+Strengths and weaknesses:
+The paper is well written overall. The background material is clear and the
+power models are presented in an understandable way. In my view the main
+weakness of this work is that it doesn’t sufficiently distinguish itself from
+existing work.
+This seems similar to several other dynamic DVFS control papers that I’ve read.
+In terms of execution, another weakness of this paper is that the evaluation
+section uses only simulation and only includes small-scale results. Other work
+has used real HPC systems at a much larger scale (100's or 1000's of nodes). I
+also question the wisdom of always choosing the largest gap between performance
+and energy curves. This seems to be equivalent to minimizing the Energy*Delay
+product metric. In many of the NPB results, performance is significantly
+affected (> 30%). This may be unacceptable for many HPC use cases. Others have
+used Energy*Delay^2 or Energy*Delay^3 metrics to further emphasize the
+importance of performance. Finally, the paper could be improved by examining
+additional workloads in addition to the NPB’s.
+
+Additional comments:
+- In the abstract, is the frequency / energy relationship really exponential?
+- Suggest changing footprint -> overhead
+- modelize -> model
+- What is “Backbone Bandwidth”? Since only 16 nodes were simulated, the
+  Ethernet switch simulated can easily be a full crossbar. Also, do you mean 1
+  gigabit per second Ethernet or 1 gigabyte per second Ethernet? What would
+  happen if a much faster network were used? Would that eliminate much of the
+  slack time opportunity?
+
+
+----------------------- REVIEW 2 ---------------------
+PAPER: 36
+TITLE: Dynamic Frequency Scaling for Energy Consumption Reduction in Distributed MPI Programs
+AUTHORS: Jean-Claude Charr, Raphaël Couturier, Ahmed Fanfakh and Arnaud Giersch
+
+OVERALL EVALUATION: 0 (borderline paper)
+REVIEWER'S CONFIDENCE: 4 (high)
+
+----------- REVIEW -----------
+The authors proposed an algorithm to optimize the performance and energy
+consumption for message-passing parallel programs. The algorithm is an
+extension of [4] in the references, by replacing the idle time due to the
+variance in the computation time between processes with the communication times.
+
+First, the authors have made a primitive but essential mistake. The
+(relative) performance is the inverse of the execution time ratio. In other
+words, EQ (12) is the relative performance metric (not the inverse of the
+performance).
+
+Before presenting the experimental results, they verified the accuracy of the
+predicted performance (for a given scaling factor S) by comparing it to what
+they measured on the simulator. This does not seem to be sufficient. Why did
+they not compare the predicted performance (and energy!) to those of executions
+on a real machine?
+
+The specifications of the simulated machine seem to be chosen in favor of their
+algorithm. For example, they assumed one core for each node. This implies any
+communication with other nodes must go through the gigabit Ethernet, stretching
+the communication time and giving more freedom to their algorithm. On the
+current and future machines, each 'processor' should have multiple cores (and
+each node may have more than one 'processor'). Therefore, some fraction of
+communication is performed using the shared memory (even if the parallel
+programs themselves are written with the message passing primitives).
+
+Moreover, the range of the clock frequency and its granularity (0.8 GHz to 2.4
+GHz with 0.1 GHz increment) is NOT impossible but seems to be chosen to support
+the results. If the number of available frequencies is smaller, the
+effectiveness of their algorithm should also be limited.
+
+While they claim that their algorithm works 'online', it is not an 'on-the-fly'
+algorithm. It requires a full iteration of running the program, and uses the
+results for the later executions. This assumes that the 2nd and later
+executions have the same performance and energy consumption against the selected
+scaling factor. This assumption may limit the applicability of their algorithm.
+
+A few more comments:
+
+- In Section VII-C, they compared their algorithm with that in [4].
+  Should they not include the option of maximum energy reduction with zero
+  performance
+
+- Their work is not specific to MPI and the last part of the title should be
+  "synchronous message passing program."
+
+- For the readability of the paper, the text should be more polished.
+  For example, use appropriate connectives.
+
+- Similarly, some sentences look redundant. For example, "To be able to predict
+  ..." (in the first paragraph of Section IV) could be "To predict ..."
+
+- Capitalization, such as "Table" and "Figure" (not "table" and "figure")
+
+- Use appropriate units (6.65us instead of 0.00665ms)
+
+- What is a 'platform file'? (e.g. Table 1)
+
+
+----------------------- REVIEW 3 ---------------------
+PAPER: 36
+TITLE: Dynamic Frequency Scaling for Energy Consumption Reduction in Distributed MPI Programs
+AUTHORS: Jean-Claude Charr, Raphaël Couturier, Ahmed Fanfakh and Arnaud Giersch
+
+OVERALL EVALUATION: 1 (weak accept)
+REVIEWER'S CONFIDENCE: 4 (high)
+
+----------- REVIEW -----------
+The paper presents an online algorithmic approach to reduce energy consumption
+in MPI programs. The authors consider the computation and communication times
+of the sub-tasks in the MPI program, and scale the voltage and frequency of the
+individual cores to achieve overall energy reduction that balances reduction in
+performance with the increase in energy savings.
+
+VF scaling is the dominant runtime method to cut down power consumption, hence
+approaches to identifying and exploiting program slack are very useful.
+The algorithm and the results are explained well.
+
+My central concern is the "run once" nature of the online approach. I'm not
+convinced that arriving at the scaling factors once after one iteration of the
+subtasks (ending in the barrier) is sufficient to determine the right scaling
+factors for the threads. I suspect the runtime interactions of the threads will
+lead to different threads becoming critical at various iterations.
+It would be good to do a comparison with an idealized scenario of the algorithm
+running every iteration, ignoring overheads.
+
+It would be useful to summarize the results in Tables III through V in a chart.
+
+The cost of core frequency transitions is ignored in the paper (minor).
+
+There are plenty of other works in this area. For example, how does your work
+compare to Thrifty barrier [ http://csl.cornell.edu/~martinez/doc/hpca04.pdf ]
+(minor)?
-- 
2.39.5