X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/rce2015.git/blobdiff_plain/1e0619f9a2866a579adb1944e75301175eae8b8c..9793696f84bb746d5af3969fb0bc9105f9cb3a99:/paper.tex?ds=inline

diff --git a/paper.tex b/paper.tex
index 4d7ef2b..fa447f6 100644
--- a/paper.tex
+++ b/paper.tex
@@ -171,40 +171,24 @@ very different execution times. In this challenging context we think that the
 use of a simulation tool can greatly leverage the possibility of testing various
 platform scenarios.
 
-The main contribution of this paper is to show that the use of a simulation tool
-(i.e. the SimGrid toolkit~\cite{SimGrid}) in the context of real  parallel
-applications (i.e. large linear system solvers) can help developers to better
-tune their application for a given multi-core architecture. To show the validity
-of this approach we first compare the simulated execution of the multisplitting
-algorithm  with  the  GMRES   (Generalized   Minimal  Residual)
-solver~\cite{saad86} in synchronous mode. The simulation results allow us to
-determine which method to choose given a specified multi-core architecture.
-
-\LZK{Pas trop convainquant comme argument pour valider l'approche de simulation. \\On peut dire par exemple: on a pu simuler différents algos itératifs à large échelle (le plus connu GMRES et deux variantes de multisplitting) et la simulation nous a permis (sans avoir le vrai matériel) de déterminer quelle serait la meilleure solution pour une telle configuration de l'archi ou vice versa.\\A revoir...}
-\DL{OK : ajout d'une phrase précisant tout cela}
-
-Moreover the obtained results on different simulated multi-core architectures
-confirm the real results previously obtained on non simulated architectures.
+The  {\bf main  contribution  of  this paper}  is  to show  that  the  use of  a
+simulation tool (i.e. the SimGrid toolkit~\cite{SimGrid}) in the context of real
+parallel applications (i.e. large linear  system solvers) can help developers to
+better tune their  application for a given multi-core architecture.  To show the
+validity of this approach we first compare the simulated execution of the Krylov
+multisplitting  algorithm   with  the   GMRES  (Generalized   Minimal  Residual)
+solver~\cite{saad86} in  synchronous mode.  The simulation  results allow  us to
+determine  which method  to choose  given a  specified multi-core  architecture.
+Moreover, the results obtained on different simulated multi-core architectures
+confirm the real results previously obtained on non-simulated architectures.
 More precisely the simulated results are in accordance (i.e. with the same order
-of magnitude) with the works presented in~\cite{couturier15}, which show that the synchronous
-multisplitting method is more efficient than GMRES for large scale clusters.
-
-\LZK{Il n y a pas dans la partie expé cette comparaison et confirmation des
-résultats entre la simulation et l'exécution réelle des algos sur les vrais
-clusters.\\ Sinon on pourrait ajouter dans la partie expé une référence vers le
-journal supercomput de krylov multi pour confirmer que cette méthode est
-meilleure que GMRES sur les clusters large échelle.} \DL{OK ajout d'une phrase.
-Par contre je n'ai pas la ref. Merci de la mettre}
-
-Simulated results  also confirm  the efficiency  of the asynchronous
-multisplitting algorithm compared to the synchronous GMRES especially in case of
-geographically distant clusters.
-
-\LZK{P.S.: Pour tout le papier, le principal objectif n'est pas de faire des comparaisons entre des méthodes itératives!!\\Sinon, les deux algorithmes Krylov multisplitting synchrone et multisplitting asynchrone sont plus efficaces que GMRES sur des clusters à large échelle.\\Et préciser, si c'est vraiment le cas, que le multisplitting asynchrone est plus efficace et adapté aux clusters distants par rapport aux deux autres algos (je n'ai pas encore lu la partie expé)}
-\DL{Tu as raison on s'est posé la question de garder ou non cette partie des résultats. On a décidé de la garder pour avoir plus de chose à montrer. J'ai essayer de clarifier un peu}
+of magnitude)  with the works  presented in~\cite{couturier15}, which  show that
+the synchronous  multisplitting method  is more efficient  than GMRES  for large
+scale  clusters.   Simulated   results  also  confirm  the   efficiency  of  the
+asynchronous  multisplitting   algorithm  compared  to  the   synchronous  GMRES
+especially in case of geographically distant clusters.
 
-In
-this way and with a simple computing architecture (a laptop) SimGrid allows us
+In this way and with a simple computing architecture (a laptop) SimGrid allows us
 to run a test campaign of real parallel iterative applications on
 different simulated multi-core architectures.  To our knowledge, there is no
 related work on the large-scale multi-core simulation of a real synchronous and
@@ -217,8 +201,6 @@ Section~\ref{sec:04} details the different solvers that we use.  Finally our
 experimental results are presented in section~\ref{sec:expe} followed by some
 concluding remarks and perspectives.
 
-\LZK{Proposition d'un titre pour le papier: Grid-enabled simulation of large-scale linear iterative solvers.}
-
 
 \section{The asynchronous iteration model and the motivations of our work}
 \label{sec:asynchro}
@@ -643,9 +625,7 @@ speed inter-cluster  network (N1) and  also on  a less performant  network (N2).
 Figure~\ref{fig:02} shows that end users will reduce the execution time
 for both algorithms when using a grid architecture like 4x16 or 8x8: the reduction is by a factor of about $2$. The results also show that when
 the network speed drops (a variation of 12.5\%), the difference between the execution times of the two multisplitting algorithms can exceed 25\%.
-%\RC{c'est pas clair : la différence entre quoi et quoi?}
-%\DL{pas clair}
-%\RCE{Modifie}
+
 
 
 %\begin{wrapfigure}{l}{100mm}
@@ -790,10 +770,16 @@ on the  algorithms performance in  varying the CPU  power of the  clusters nodes
 from $1$ to $19$ GFlops.  The outputs  depicted in Figure~\ref{fig:06}  confirm the
 performance gain, around $95\%$ for both methods, after adding more
 powerful CPUs.
+\ \\
+%\DL{il faut une conclusion sur ces tests : ils confirment les résultats déjà
+%obtenus en grandeur réelle. Donc c'est une aide précieuse pour les dev. Pas
+%besoin de déployer sur une archi réelle}
 
-\DL{il faut une conclusion sur ces tests : ils confirment les résultats déjà
-obtenus en grandeur réelle. Donc c'est une aide précieuse pour les dev. Pas
-besoin de déployer sur une archi réelle}
+To conclude this series of experiments, SimGrid has allowed us to run many
+simulations with many parameter variations. Running all these experiments
+on a real platform is most of the time not possible. Moreover, the behavior of
+both the GMRES and Krylov multisplitting methods is in accordance with larger real
+executions on large-scale supercomputers~\cite{couturier15}.
 
 
 \subsection{Comparing GMRES in native synchronous mode and the multisplitting algorithm in asynchronous mode}
@@ -889,7 +875,29 @@ geographically distant clusters through the internet.
 
 
 \section{Conclusion}
-CONCLUSION
+
+In this paper we have presented the simulation of the execution of three
+different parallel solvers on some multi-core architectures. We have shown that
+the SimGrid toolkit is an interesting simulation tool that has allowed us to
+determine  which method  to choose  given a  specified multi-core  architecture.
+Moreover the simulated results are in accordance (i.e. with the same order of
+magnitude)  with the works  presented in~\cite{couturier15}. Simulated   results
+also  confirm  the   efficiency  of  the asynchronous  multisplitting
+algorithm  compared  to  the   synchronous  GMRES especially in case of
+geographically distant clusters.
+
+These results are important since it is very time consuming to find the optimal
+configuration and deployment requirements for a given application on a given
+multi-core architecture. Finding good resource allocation policies under
+varying CPU power, network speeds and loads is very challenging and
+labor-intensive. This problem is even more difficult for the asynchronous
+scheme, where a small parameter variation of the execution platform and of the
+application data can lead to very different numbers of iterations to reach
+convergence and thus to very different execution times.
+
+
+Our future works...
+
 
 
 %\section*{Acknowledgment}