From 2fcbf808f481976ed501aaec9b1446407e7a4f72 Mon Sep 17 00:00:00 2001
From: Arnaud Giersch
Date: Tue, 1 Jul 2014 17:26:34 +0200
Subject: [PATCH] Incorporate first answers to reviewers.

---
 hpcc2014_reviews.txt | 344 ++++++++++++++++++++++++++++---------------
 1 file changed, 229 insertions(+), 115 deletions(-)

diff --git a/hpcc2014_reviews.txt b/hpcc2014_reviews.txt
index 7d07a3d..1d2861a 100644
--- a/hpcc2014_reviews.txt
+++ b/hpcc2014_reviews.txt
@@ -42,20 +42,42 @@ AUTHORS: Charles Emile Ramamonjisoa, David Laiymani, Arnaud Giersch,
 
 ----------- REVIEW -----------
 
-The contribution of the paper could be better described.
-
-The authors state that:
-
-"we show that SimGrid is an efficient simulation
-tool that has enabled .."
-
-If this is one of the goals of the paper to present the
-capabilities/strength of SimGrid, then they should compare it with
-other tools for the comparison of the two methods.
-
-Regarding the comparison of the two methods, the possible scalability
-expected in the case of larger platforms might also be commended /
-discussed.
+,----
+| The contribution of the paper could be better described.
+|
+| The authors state that:
+|
+| "we show that SimGrid is an efficient simulation
+| tool that has enabled .."
+|
+| If this is one of the goals of the paper to present the
+| capabilities/strength of SimGrid, then they should compare it with
+| other tools for the comparison of the two methods.
+`----
+
+[RCE] The goal of the paper is not to compare simulation tools and
+      to reach a conclusion about the performance of SimGrid. SimGrid
+      was chosen, among other possible tools, to carry out the
+      comparison between the 2 algorithms in async mode on a
+      distributed grid environment.
+
+      We can modify the sentence as follows: "we show that SimGrid is
+      one efficient simulation tool that has enabled .."
+
+
+,----
+| Regarding the comparison of the two methods, the possible scalability
+| expected in the case of larger platforms might also be commended /
+| discussed.
+`----
+
+[RCE] I think this possible scaling to larger platforms has
+      been commented on and discussed throughout the paper.
+      Even in the conclusion, we stated that the goal is not
+      only to succeed in running the program on a larger
+      platform (in terms of the number of cores and the number
+      of clusters) but also to be able to solve problems of
+      larger size.
 
 
 ----------------------- REVIEW 2 ---------------------
@@ -66,53 +88,119 @@ AUTHORS: Charles Emile Ramamonjisoa, David Laiymani, Arnaud Giersch,
 
 ----------- REVIEW -----------
 
-This paper describes the simulation of an adapted (authors say
-slightly changed) GMRES solver on the SimGrid simulation framework;
-the GMRES solver is changed from synchronous iterative solution to a
-asynchronous iteration scheme in order to overcome latencies when
-interconnecting computers in a Grid environment.
-
-The prejudice of the paper is that the GMRES algorithm is not using
-non-blocking communication to begin with.
-
-
-You mention that for running with SimGrid using SMPI, "little" or no
-modification need to be done to the original code: what kind of
-modifications are necessary -- and did You have to apply any
-modification to run with SMPI? (in a later section of the paper,
-changing / deleting global variables were mentioned -- due to the
-threaded execution of simulated MPI processes...)
-
-
-SimGrid uses a "fluid model" -- what does that mean?
-
-The local convergence criterion (k<=MaxIter) seams wrong and should
-rather read: k == MaxIter?
-
-As far as the reviewer can tell, SMPI removes heavy computation by
-making assumptions on the CPU performance of the simulated code --
-which however is not true with most Grid environments where You do
-have mixed architectures and mixed performance characteristics. How
-is this handled?
-
-However, the main gripe about this paper is the rather unrealistic
-assumption on bandwidth (5 Mbps!) and latency (20ms): the internal
-network of a cluster may be Infiniband, with bw of Gigabytes/sec and
-micro-second latency, while a second cluster may be reachable over
-Gigabit-Ethernet with 100-200x the latency... This would be a setup,
-where a (even slight) gain would provide more convincing results.
-
-
-
-Some knitpicks include:
-- Abstract: "Behaviours", please no plural
-- Sec II (and others): "As exposed" --> "As described"
-- Sec II: "And important idle times" --> better "useless idle times
-  used for synchronization"
-- Sec III: "by the mean of an XML file" --> "by means of an XML file".
-- SEC IV.B: did not encouter ... unless some code debugging" -->
-  please rewrite the unless part...
-- SEC V: "Hosts processors power" --> "Host processor power"
+,----
+| This paper describes the simulation of an adapted (authors say
+| slightly changed) GMRES solver on the SimGrid simulation framework;
+| the GMRES solver is changed from synchronous iterative solution to a
+| asynchronous iteration scheme in order to overcome latencies when
+| interconnecting computers in a Grid environment.
+`----
+
+[RCE] No, that is not quite it: we want to compare the GMRES
+      algorithm, executed in SYNC mode, with the multisplitting
+      algorithm, which is executed in ASYNC mode.
+
+,----
+| The prejudice of the paper is that the GMRES algorithm is not using
+| non-blocking communication to begin with.
+`----
+
+[RCE] As said just above, GMRES indeed remained SYNC, and thus
+      uses blocking communications.
+
+
+,----
+| You mention that for running with SimGrid using SMPI, "little" or no
+| modification need to be done to the original code: what kind of
+| modifications are necessary -- and did You have to apply any
+| modification to run with SMPI? (in a later section of the paper,
+| changing / deleting global variables were mentioned -- due to the
+| threaded execution of simulated MPI processes...)
+`----
+
+[RCE] The "minor" changes made to the code when running it in
+      SimGrid/SMPI, compared with a run on a real (MPI) environment,
+      come down to the following two points:
+      - All global variables were moved into a scope local to the
+        functions. This change required modifying the function
+        signatures (prototypes) so that these variables are passed
+        as arguments.
+      - The MPI_Isend, MPI_Irecv and MPI_Waitall sequence also
+        caused a problem in async mode. It was replaced by a
+        sequence of 6 Isend/Irecv/Wait calls instead.
+
+      To clarify, we can add a cross-reference in Section III:
+      "The SMPI interface implements about 80% of the MPI 2.0
+      standard [?] and supports applications written in C or
+      Fortran, with little or no modifications."
+      will be rewritten as:
+      "The SMPI interface implements about 80% of the MPI 2.0
+      standard [?] and supports applications written in C or
+      Fortran, with little or no modifications (cf. Section IV,
+      paragraph B)."
+
+
+,----
+| SimGrid uses a "fluid model" -- what does that mean?
+`----
+
+[RCE] Can Arnaud help here?
+      [AG] I'll do it.
+
+,----
+| The local convergence criterion (k<=MaxIter) seams wrong and should
+| rather read: k == MaxIter?
+`----
+
+[RCE] I think the reviewer is right. Lilia?
+
+,----
+| As far as the reviewer can tell, SMPI removes heavy computation by
+| making assumptions on the CPU performance of the simulated code --
+| which however is not true with most Grid environments where You do
+| have mixed architectures and mixed performance characteristics. How
+| is this handled?
+`----
+
+[RCE] SimGrid/SMPI accounts for this heterogeneity of the cluster
+      components in a grid, on the one hand, through a more or less
+      fine-grained definition of the characteristics of the nodes
+      making up the clusters (CPU power, RAM, …) and, on the other
+      hand, through a more or less detailed description of the
+      communication network between the clusters of the grid.
+
+,----
+| However, the main gripe about this paper is the rather unrealistic
+| assumption on bandwidth (5 Mbps!) and latency (20ms): the internal
+| network of a cluster may be Infiniband, with bw of Gigabytes/sec and
+| micro-second latency, while a second cluster may be reachable over
+| Gigabit-Ethernet with 100-200x the latency... This would be a setup,
+| where a (even slight) gain would provide more convincing results.
+`----
+
+[RCE] We must state that these "unrealistic" network characteristics
+      concern the INTER-cluster network. The INTRA-cluster network
+      is indeed within the given order of magnitude (Gbps of
+      bandwidth and ms of latency). However, the reviewer rightly
+      noticed that we pushed the inter-cluster network too hard ☺
+      But only at that price did we start to see an appreciable gain.
+
+
+
+,----
+| Some knitpicks include:
+| - Abstract: "Behaviours", please no plural
+| - Sec II (and others): "As exposed" --> "As described"
+| - Sec II: "And important idle times" --> better "useless idle times
+|   used for synchronization"
+| - Sec III: "by the mean of an XML file" --> "by means of an XML file".
+| - SEC IV.B: did not encouter ... unless some code debugging" -->
+|   please rewrite the unless part...
+| - SEC V: "Hosts processors power" --> "Host processor power"
+`----
+
+[RCE] We will take these remarks into account.
+      [AG] I have started with the easiest ones.
 
 
 ----------------------- REVIEW 3 ---------------------
@@ -123,49 +211,71 @@ AUTHORS: Charles Emile Ramamonjisoa, David Laiymani, Arnaud Giersch,
 
 ----------- REVIEW -----------
 
-The submitted paper purports to be the first simulation of
-asynchronous iterative algorithms and predicts that, for a particular
-cluster configurations with very high latency (20ms) and very low
-bandwidths (5/50 Mbit/s), an unpreconditioned asynchronous
-multisplitting algorithm will be faster than an unpreconditioned GMRES
-algorithm for solving a 3D Poisson equation.
-
-Several issues with respect to the relevance of these results deserve
-discussion:
-
-1) There is no substantial discussion of the fundamental additions to
-SimGrid that were required in order to support the simulation of
-asynchronous iterative algorithms. If no extensions were required,
-then I am unsure as to how this aspect of the work is a contribution.
-
-2) The model problem of a 3D Poisson equation with no preconditioner
-is regrettable due to the large number of fast solvers available that
-have been available for many decades. For this reason, as is, the
-results are not relevant to the solution of PDEs. However, a similar
-computational structure appears within the context of gradient descent
-methods for the solution of convex optimization problems, and
-asynchronous algorithms are quite common. I would humbly suggest such
-a model problem in the future unless either a more challenging PDE is
-tackled or a non-trivial preconditioner is incorporated.
-
-3) This is somewhat of a minor point, but I did not see an explicit
-discussion of the link between a global relative residual norm,
-|| A x - b|| / || b ||, and the local convergence criterion used in
-the asynchronous algorithm, which tested for the infinity norm of the
-local computation. When "precision" is reported in Table I, is it
-referring to a consistent global convergence criterion? And, if so,
-what is it precisely referring to?
-
-4) Typical latencies within clusters are on the order of a
-microsecond, and the latency used to produce Table I is more than
-three orders of magnitude higher (20ms). It would be helpful if more
-justification was given for why such a high latency is
-relevant. Furthermore, the chosen bandwidths (5 Mbit/s and 50 Mbit/s)
-are closer to a non-commercial home internet connection than a
-commercial ethernet connection.
-
-Overall, I feel that a significant number of issues should be
-addressed before publication would be warranted.
+,----
+| The submitted paper purports to be the first simulation of
+| asynchronous iterative algorithms and predicts that, for a particular
+| cluster configurations with very high latency (20ms) and very low
+| bandwidths (5/50 Mbit/s), an unpreconditioned asynchronous
+| multisplitting algorithm will be faster than an unpreconditioned GMRES
+| algorithm for solving a 3D Poisson equation.
+|
+| Several issues with respect to the relevance of these results deserve
+| discussion:
+|
+| 1) There is no substantial discussion of the fundamental additions to
+| SimGrid that were required in order to support the simulation of
+| asynchronous iterative algorithms. If no extensions were required,
+| then I am unsure as to how this aspect of the work is a contribution.
+`----
+
+[RCE] No extensions were made to SimGrid to handle the chosen
+      type of algorithm.
+
+,----
+| 2) The model problem of a 3D Poisson equation with no preconditioner
+| is regrettable due to the large number of fast solvers available that
+| have been available for many decades. For this reason, as is, the
+| results are not relevant to the solution of PDEs. However, a similar
+| computational structure appears within the context of gradient descent
+| methods for the solution of convex optimization problems, and
+| asynchronous algorithms are quite common. I would humbly suggest such
+| a model problem in the future unless either a more challenging PDE is
+| tackled or a non-trivial preconditioner is incorporated.
+`----
+
+[RCE] ??
+
+,----
+| 3) This is somewhat of a minor point, but I did not see an explicit
+| discussion of the link between a global relative residual norm,
+| || A x - b|| / || b ||, and the local convergence criterion used in
+| the asynchronous algorithm, which tested for the infinity norm of the
+| local computation. When "precision" is reported in Table I, is it
+| referring to a consistent global convergence criterion? And, if so,
+| what is it precisely referring to?
+`----
+
+[RCE] As I understand it, the "precision" in Table I is the
+      "tolerance threshold" (epsilon) mentioned in Section IV. It
+      indeed determines the global convergence criterion or
+      condition. Can Lilia confirm?
+
+,----
+| 4) Typical latencies within clusters are on the order of a
+| microsecond, and the latency used to produce Table I is more than
+| three orders of magnitude higher (20ms). It would be helpful if more
+| justification was given for why such a high latency is
+| relevant. Furthermore, the chosen bandwidths (5 Mbit/s and 50 Mbit/s)
+| are closer to a non-commercial home internet connection than a
+| commercial ethernet connection.
+`----
+
+[RCE] See the remarks above.
+
+,----
+| Overall, I feel that a significant number of issues should be
+| addressed before publication would be warranted.
+`----
 
 
 ----------------------- REVIEW 4 ---------------------
@@ -176,12 +286,16 @@ AUTHORS: Charles Emile Ramamonjisoa, David Laiymani, Arnaud Giersch,
 
 ----------- REVIEW -----------
 
-This is a very interesting paper devoted to the implementation in a
-grid environment of some asynchronous algorithm. These algorithms are
-indeed very powerfull, and the more latency, the more efficient are
-these algorithms. A comparison of a synchronous GMRES and an
-asynchronous multi-splitting is presented. The obtained results are
-interesting and confirm the efficiency of these methods.
+,----
+| This is a very interesting paper devoted to the implementation in a
+| grid environment of some asynchronous algorithm. These algorithms are
+| indeed very powerfull, and the more latency, the more efficient are
+| these algorithms. A comparison of a synchronous GMRES and an
+| asynchronous multi-splitting is presented. The obtained results are
+| interesting and confirm the efficiency of these methods.
+`----
+
+[RCE] Noted.
 
 
 ----------------------- REVIEW 5 ---------------------
@@ -192,8 +306,8 @@ AUTHORS: Charles Emile Ramamonjisoa, David Laiymani, Arnaud Giersch,
 
 ----------- REVIEW -----------
 
-This paper is a mix between a short and a long paper, it presents
-preliminary works on simulation of asynchronous iterative algorithms
-using SimGrid. I recommend to accept it as a short paper.
-
-
+,----
+| This paper is a mix between a short and a long paper, it presents
+| preliminary works on simulation of asynchronous iterative algorithms
+| using SimGrid. I recommend to accept it as a short paper.
+`----
-- 
2.39.5