+----------------------- REVIEW 1 ---------------------
+PAPER: 36
+TITLE: Dynamic Frequency Scaling for Energy Consumption Reduction in Distributed MPI Programs
+AUTHORS: Jean-Claude Charr, Raphaël Couturier, Ahmed Fanfakh and Arnaud Giersch
+
+OVERALL EVALUATION: 0 (borderline paper)
+REVIEWER'S CONFIDENCE: 4 (high)
+
+----------- REVIEW -----------
+Paper Summary:
+This paper describes an online algorithm for estimating the energy consumption
+of a program when running in different DVFS states, or frequency gears. The
+algorithm uses information obtained at runtime on the compute vs. communication
+time of the application. Their system picks the frequency gear with the biggest
+delta between inverse performance and normalized energy. Their algorithm is
+evaluated using simulation of a small cluster (8 to 16 nodes) of homogeneous
+single-core nodes with Ethernet network.
+
+Importance of Problem:
+Energy efficiency must be improved significantly – more than 20x – to meet the
+power budget goal for exascale-class supercomputers. This paper proposes an
+online algorithm for reducing the energy consumed by an application with as
+little performance impact as possible, thus improving energy efficiency.
+
+Strengths and weaknesses:
+The paper is well written overall. The background material is clear and the
+power models are presented in an understandable way. In my view, the main
+weakness of this work is that it doesn’t sufficiently distinguish itself from
+existing work. This seems similar to several other dynamic DVFS control papers
+that I’ve read. In terms of execution, another weakness is that the evaluation
+section uses only simulation and only includes small-scale results.
+Other work has used real HPC systems at a much larger scale (100's or 1000's of
+nodes). I also question the wisdom of always choosing the largest gap between
+performance and energy curves. This seems to be equivalent to minimizing the
+Energy*Delay product metric. In many of the NPB results, performance is
+significantly affected (> 30%). This may be unacceptable for many HPC use
+cases. Others have used Energy*Delay^2 or Energy*Delay^3 metrics to further
+emphasize the importance of performance. Finally, the paper could be
+improved by examining workloads in addition to the NPBs.
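A minimal sketch of how the choice of metric shifts the selected gear, assuming a toy model that is not taken from the paper: execution time is compute time scaled by F_MAX/f plus a fixed communication time, dynamic power scales with the cube of frequency and is charged only during computation, and static power is charged for the whole run. All constants and function names (`exec_time`, `energy`, `best_gear`) are hypothetical; only the 0.8 to 2.4 GHz range in 0.1 GHz steps comes from the reviews.

```python
# Toy model only: every constant below is an illustrative assumption.
F_MAX = 2.4  # GHz, highest gear
GEARS = [g / 10 for g in range(8, 25)]  # 0.8 .. 2.4 GHz in 0.1 GHz steps


def exec_time(f, t_comp=8.0, t_comm=2.0):
    """Compute time stretches as F_MAX / f; communication time is fixed."""
    return t_comp * F_MAX / f + t_comm


def energy(f, p_dyn=20.0, p_static=4.0, t_comp=8.0, t_comm=2.0):
    """Dynamic energy over the scaled compute time plus static energy over the run."""
    t_comp_scaled = t_comp * F_MAX / f
    dynamic = p_dyn * (f / F_MAX) ** 3 * t_comp_scaled
    static = p_static * exec_time(f, t_comp, t_comm)
    return dynamic + static


def best_gear(n):
    """Return the gear minimizing Energy * Delay^n (n = 0 is pure energy)."""
    return min(GEARS, key=lambda f: energy(f) * exec_time(f) ** n)


choices = {n: best_gear(n) for n in (0, 1, 2, 3)}
for n, f in choices.items():
    print(f"E*D^{n}: {f:.1f} GHz")
```

Because execution time strictly decreases with frequency in this model, a larger exponent n can only keep the selected gear or move it to a higher frequency, which is the sense in which Energy*Delay^2 and Energy*Delay^3 weight performance more heavily.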
+
+Additional comments:
+- In abstract, is the frequency / energy relationship really exponential?
+- Suggest changing footprint -> overhead
+- modelize -> model
+- What is “Backbone Bandwidth”? Since only 16 nodes were simulated, the
+ Ethernet switch simulated can easily be a full crossbar. Also, do you mean 1
+ gigabit per second Ethernet or 1 gigabyte per second Ethernet? What would
+ happen if a much faster network was used? Would that eliminate much of the
+ slack time opportunity?
+
+
+----------------------- REVIEW 2 ---------------------
+PAPER: 36
+TITLE: Dynamic Frequency Scaling for Energy Consumption Reduction in Distributed MPI Programs
+AUTHORS: Jean-Claude Charr, Raphaël Couturier, Ahmed Fanfakh and Arnaud Giersch
+
+OVERALL EVALUATION: 0 (borderline paper)
+REVIEWER'S CONFIDENCE: 4 (high)
+
+----------- REVIEW -----------
+The authors proposed an algorithm to optimize the performance and energy
+consumption for message-passing parallel programs. The algorithm is an
+extension of [4] in the reference, by replacing the idle time due to the
+variance in the computation time between processes with the communication times.
+
+First, the authors have made a basic but essential mistake. The
+(relative) performance is the inverse of the execution time ratio. In other
+words, EQ (12) is the relative performance metric (not inverse of the
+performance).
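As a worked form of this point, under assumed notation that is not taken from the paper ($T_{\text{max}}$ for the execution time at the highest frequency, $T_{\text{new}}$ for the execution time at a lowered frequency):

```latex
\[
  \text{time ratio} = \frac{T_{\text{new}}}{T_{\text{max}}} \ge 1,
  \qquad
  P_{\text{rel}} = \frac{T_{\text{max}}}{T_{\text{new}}}
                 = \left(\frac{T_{\text{new}}}{T_{\text{max}}}\right)^{-1} \le 1 .
\]
```

For instance, with $T_{\text{max}} = 10$ s and $T_{\text{new}} = 12.5$ s, the time ratio is 1.25 and the relative performance is 0.8.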
+
+Before presenting the experimental results, they verified the accuracy of the
+predicted performance (for a given scaling factor S) by comparing it to what
+they measured on the simulator. This does not seem to be sufficient. Why did
+they not compare the predicted performance (and energy!) to those of executions
+on a real machine?
+
+The specifications of the simulated machine seem to be chosen in favor of their
+algorithm. For example, they assumed one core for each node. This implies any
+communication with other nodes must go through the gigabit Ethernet, stretching
+the communication time and giving more freedom to their algorithm. On
+current and future machines, each 'processor' should have multiple cores (and
+each node may have more than one 'processor'). Therefore, some fraction of
+communication is performed using the shared memory (even if the parallel
+programs themselves are written with the message passing primitives).
+
+Moreover, the range of the clock frequency and its granularity (0.8 GHz to 2.4
+GHz with 0.1 GHz increment) is NOT impossible but seems to be chosen to support
+the results. If the number of available frequencies were smaller, the
+effectiveness of their algorithm would also be limited.
+
+While they claim that their algorithm works 'online', it is not an 'on-the-fly'
+algorithm. It requires a full iteration of running the program, and uses the
+results for the later executions. This assumes that the 2nd and later
+executions have the same performance and energy consumption against the selected
+scaling factor. This assumption may limit the applicability of their algorithm.
+
+A few more comments:
+
+- In Section VII-C, they compared their algorithm with that in [4].
+ Should they not include the option of maximum energy reduction with zero
+ performance
+
+- Their work is not specific to MPI and the last part of the title should be
+ "synchronous message passing program."
+
+- For the readability of the paper, the text should be more polished.
+ For example, use appropriate connectives.
+
+- Similarly, some sentences look redundant. For example, "To be able to predict
+ ..." (in the first paragraph of Section IV) could be "To predict .."
+
+- Capitalization, such as "Table" and "Figure" (not "table" and "figure")
+
+- Use appropriate units (6.65us instead of 0.00665ms)
+
+- What is a 'platform file'? (e.g. Table 1)
+
+
+----------------------- REVIEW 3 ---------------------
+PAPER: 36
+TITLE: Dynamic Frequency Scaling for Energy Consumption Reduction in Distributed MPI Programs
+AUTHORS: Jean-Claude Charr, Raphaël Couturier, Ahmed Fanfakh and Arnaud Giersch
+
+OVERALL EVALUATION: 1 (weak accept)
+REVIEWER'S CONFIDENCE: 4 (high)
+
+----------- REVIEW -----------
+The paper presents an online algorithmic approach to reduce energy consumption
+in MPI programs. The authors consider the computation and communication times
+of the sub-tasks in the MPI program, and scale the voltage and frequency of the
+individual cores to achieve overall energy reduction that balances reduction in
+performance with the increase in energy savings.
+
+VF scaling is the dominant runtime method to cut down power consumption, hence
+approaches to identifying and exploiting program slack are very useful.
+The algorithm and the results are explained well.
+
+My central concern is the run-"once" nature of the online approach. I'm not
+convinced that arriving at the scaling factors once after one iteration of the
+subtasks (ending in the barrier) is sufficient to determine the right scaling
+factors of the threads. I suspect the runtime interactions of the threads will
+lead to different threads becoming critical at various iterations. It would be
+good to do a comparison with an idealized scenario of the algorithm running
+every iteration, ignoring overheads.
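A minimal sketch of the concern above, using made-up per-iteration compute times rather than any data from the paper: it finds the critical (slowest) process in each iteration and counts how often it differs from the one observed in the first iteration, which is the only iteration a run-once scheme would see. The table `compute_times` and the helper `critical_process` are hypothetical.

```python
# Hypothetical compute times in seconds: one row per iteration,
# one column per process. These values are illustrative only.
compute_times = [
    [4.0, 3.0, 3.5, 2.0],
    [3.0, 4.2, 3.1, 2.5],
    [4.1, 3.9, 3.0, 2.2],
    [2.8, 3.0, 4.5, 2.9],
]


def critical_process(times):
    """Index of the slowest process, i.e. the one all others wait for at the barrier."""
    return max(range(len(times)), key=times.__getitem__)


first_critical = critical_process(compute_times[0])
per_iteration = [critical_process(row) for row in compute_times]
mismatches = sum(1 for c in per_iteration if c != first_critical)

print("critical process per iteration:", per_iteration)
print("iterations where the first-iteration choice is stale:", mismatches)
```

With real traces in place of the hypothetical table, a mismatch count near zero would support computing the scaling factors once, while a high count would favor recomputing them every iteration.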
+
+It would be useful to summarize the results in Tables III through V in a chart.
+
+The cost of core frequency transitions is ignored in the paper (minor).
+
+There are plenty of other works in this area. For example, how does your work
+compare to Thrifty barrier [ http://csl.cornell.edu/~martinez/doc/hpca04.pdf ]
+(minor)?