Add review to repository.

author Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr>

Fri, 16 May 2014 09:41:23 +0000 (11:41 +0200)

committer Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr>

Fri, 16 May 2014 09:41:23 +0000 (11:41 +0200)
diff --git a/ispa14_review.txt b/ispa14_review.txt

new file mode 100644 (file)

index 0000000..c3e8f63
--- /dev/null
+++ b/ispa14_review.txt
@@ -0,0 +1,152 @@
+----------------------- REVIEW 1 ---------------------
+PAPER: 36
+TITLE: Dynamic Frequency Scaling for Energy Consumption Reduction in Distributed MPI Programs
+AUTHORS: Jean-Claude Charr, Raphaël Couturier, Ahmed Fanfakh and Arnaud Giersch
+
+OVERALL EVALUATION: 0 (borderline paper)
+REVIEWER'S CONFIDENCE: 4 (high)
+
+----------- REVIEW -----------
+Paper Summary:
+This paper describes an online algorithm for estimating the energy consumption
+of a program when running in different DVFS states, or frequency gears.  The
+algorithm uses information obtained at runtime on the compute vs. communication
+time of the application.  Their system picks the frequency gear with the biggest
+delta between inverse performance and normalized energy.  Their algorithm is
+evaluated using simulation of a small cluster (8 to 16 nodes) of homogeneous
+single-core nodes with Ethernet network.
+
+Importance of Problem:
+Energy efficiency must be improved significantly – more than 20x – to meet the
+power budget goal for exascale-class supercomputers.  This paper proposes an
+online algorithm for reducing the energy consumed by an application with as
+little performance impact as possible, thus improving energy efficiency.
+
+Strengths and weaknesses:
+The paper is well written overall.  The background material is clear and the
+power models are presented in an understandable way.  In my view the main
+weakness of this work is it doesn’t sufficiently distinguish itself from
+existing work.  This seems similar to several other dynamic DVFS control papers
+that I’ve read.  In terms of execution, another weakness of this paper is the
+evaluation section uses only simulation and only includes small-scale results.
+Other work has used real HPC systems at a much larger scale (100's or 1000's of
+nodes).  I also question the wisdom of always choosing the largest gap between
+performance and energy curves.  This seems to be equivalent to minimizing the
+Energy*Delay product metric.  In many of the NPB results, performance is
+significantly affected (> 30%).  This may be unacceptable for many HPC use
+cases.  Others have used Energy*Delay^2 or Energy*Delay^3 metrics to further
+emphasize the importance of performance.  Finally, the paper could be
+improved by examining additional workloads in addition to the NPB’s.
+
+Additional comments:
+- In abstract, is the frequency / energy relationship really exponential?
+- Suggest changing footprint -> overhead
+- modelize -> model
+- What is “Backbone Bandwidth”?  Since only 16 nodes were simulated, the
+  Ethernet switch simulated can easily be a full crossbar.  Also, do you mean 1
+  gigabit per second Ethernet or 1 gigabyte per second Ethernet?  What would
+  happen if a much faster network was used?  Would that eliminate much of the
+  slack time opportunity?
+
+
+----------------------- REVIEW 2 ---------------------
+PAPER: 36
+TITLE: Dynamic Frequency Scaling for Energy Consumption Reduction in Distributed MPI Programs
+AUTHORS: Jean-Claude Charr, Raphaël Couturier, Ahmed Fanfakh and Arnaud Giersch
+
+OVERALL EVALUATION: 0 (borderline paper)
+REVIEWER'S CONFIDENCE: 4 (high)
+
+----------- REVIEW -----------
+The authors proposed an algorithm to optimize the performance and energy
+consumption for message-passing parallel programs.  The algorithm is an
+extension of [4] in the reference, by replacing the idle time due to the
+variance in the computation time between processes with the communication times.
+
+First, the authors have made a primitive but an essential mistake.  The
+(relative) performance is the inverse of the execution time ratio. In other
+words, EQ (12) is the relative performance metric (not inverse of the
+performance).
+
+Before presenting the experimental results, they verified the accuracy of the
+predicted performance (for a given scaling factor S) by comparing it to what
+they measured on the simulator.  This does not seem to be sufficient. Why did
+they not compare the predicted performance (and energy!) to those of executions
+on a real machine?
+
+The specifications of the simulated machine seem to be chosen in favor of their
+algorithm. For example, they assumed one core for each node. This implies any
+communication with other nodes must go through the giga-bit Ethernet, stretching
+the communication time and giving more freedom to their algorithm.  On the
+current and future machines, each 'processor' should have multiple cores (and
+each node may have more than one 'processor').  Therefore, some fraction of
+communication is performed using the shared memory (even if the parallel programs
+themselves are written with the message passing primitives).
+
+Moreover, the range of the clock frequency and its granularity (0.8 GHz to 2.4
+GHz with 0.1 GHz increment) is NOT impossible but seems to be chosen to support
+the results. If the number of available frequencies is smaller, the
+effectiveness of their algorithm should also be limited.
+
+While they claim that their algorithm works 'online', it is not an 'on-the-fly'
+algorithm. It requires a full iteration of running the program, and uses the
+results for the later executions.  This assumes that the 2nd and later
+executions have the same performance and energy consumption against the selected
+scaling factor.  This assumption may limit the applicability of their algorithm.
+
+A few more comments:
+
+- In Section VII-C, they compared their algorithm with that in [4].
+  Should not they include the option of maximum energy reduction with zero
+  performance
+
+- Their work is not specific to MPI and the last part of the title should be
+  "synchronous message passing program."
+
+- For the readability of the paper, the text should be more polished. 
+  For example, use appropriate connectives. 
+
+- Similarly, some sentences look redundant. For example, "To be able to predict
+  ..." (in the first paragraph of Section IV) could be "To predict ..."
+
+- Capitalization, such as "Table" and "Figure" (not "table" and "figure")
+
+- Use appropriate units (6.65us instead of 0.00665ms) 
+
+- What is a 'platform file'? (e.g. Table 1)
+
+
+----------------------- REVIEW 3 ---------------------
+PAPER: 36
+TITLE: Dynamic Frequency Scaling for Energy Consumption Reduction in Distributed MPI Programs
+AUTHORS: Jean-Claude Charr, Raphaël Couturier, Ahmed Fanfakh and Arnaud Giersch
+
+OVERALL EVALUATION: 1 (weak accept)
+REVIEWER'S CONFIDENCE: 4 (high)
+
+----------- REVIEW -----------
+The paper presents an online algorithmic approach to reduce energy consumption
+in MPI programs.  The authors consider the computation and communication times
+of the sub-tasks in the MPI program, and scale the voltage and frequency of the
+individual cores to achieve overall energy reduction that balances reduction in
+performance with the increase in energy savings.
+
+VF scaling is the dominant runtime method to cut down power consumption, hence
+approaches to identifying and exploiting program slack are very useful.
+The algorithm and the results are explained well.
+
+My central concern is the run-"once" nature of the online approach.  I'm not
+convinced that arriving at the scaling factors once after one iteration of the
+subtasks (ending in the barrier) is sufficient to determine the right scaling
+factors of the threads.  I suspect the runtime interactions of the threads will
+lead to different threads becoming critical at various iterations.  It would be
+good to do a comparison with an idealized scenario of the algorithm running
+every iteration, ignoring overheads.
+
+It will be useful to summarize the results in Tables III through V in a chart.
+
+The cost of core frequency transitions is ignored in the paper (minor).
+
+There are plenty of other works in this area.  For example, how does your work
+compare to Thrifty barrier [ http://csl.cornell.edu/~martinez/doc/hpca04.pdf ]
+(minor)?