X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/rce2015.git/blobdiff_plain/d0129a1639a935ba009f36957b8ac7927c103a45..208f531eb17804a7e29dde5f179efc719880f6ae:/paper.tex?ds=sidebyside

diff --git a/paper.tex b/paper.tex
index 34f7ec7..0df781f 100644
--- a/paper.tex
+++ b/paper.tex
@@ -174,10 +174,22 @@ applications (i.e. large linear system solvers) can help developers to better
 tune their application for a given multi-core architecture. To show the validity
 of this approach we first compare the simulated execution of the multisplitting
 algorithm  with  the  GMRES   (Generalized   Minimal  Residual)
-solver~\cite{saad86} in synchronous mode. The obtained results on different
+solver~\cite{saad86} in synchronous mode. 
+
+\LZK{Not a very convincing argument for validating the simulation approach. \\We could say, for example: we were able to simulate different iterative algorithms at large scale (the best-known one, GMRES, and two multisplitting variants), and the simulation allowed us (without having the real hardware) to determine which would be the best solution for a given architecture configuration, or vice versa.\\To be revised...}
+
+The obtained results on different
 simulated multi-core architectures confirm the real results previously obtained
-on non simulated architectures.  We also confirm  the efficiency  of the
-asynchronous  multisplitting algorithm  compared to the synchronous  GMRES. In
+on non simulated architectures.  
+
+\LZK{The experimental section does not contain this comparison and confirmation of results between the simulation and the real execution of the algorithms on actual clusters.\\ Otherwise, we could add to the experimental section a reference to the Krylov multisplitting article in the supercomputing journal to confirm that this method is better than GMRES on large-scale clusters.}
+
+We also confirm  the efficiency  of the
+asynchronous  multisplitting algorithm  compared to the synchronous  GMRES. 
+
+\LZK{P.S.: Throughout the paper, the main objective is not to compare iterative methods!!\\Otherwise, both algorithms, synchronous Krylov multisplitting and asynchronous multisplitting, are more efficient than GMRES on large-scale clusters.\\Also state, if this is really the case, that asynchronous multisplitting is more efficient and better suited to distant clusters than the other two algorithms (I have not read the experimental section yet)}
+
+In
 this way and with a simple computing architecture (a laptop) SimGrid allows us
 to run a test campaign of real parallel iterative applications on
 different simulated multi-core architectures.  To our knowledge, there is no
@@ -191,8 +203,10 @@ Section~\ref{sec:04} details the different solvers that we use.  Finally our
 experimental results are presented in section~\ref{sec:expe} followed by some
 concluding remarks and perspectives.
 
+\LZK{Proposed title for the paper: Grid-enabled simulation of large-scale linear iterative solvers.}
 
-\section{The asynchronous iteration model}
+
+\section{The asynchronous iteration model and the motivations of our work}
 \label{sec:asynchro}
 
 Asynchronous iterative methods have been  studied for many years theoretically and
@@ -216,6 +230,21 @@ point. In the  asynchronous model, the convergence detection is  more tricky as
 it   must  not   synchronize  all   the  processors.   Interested  readers   can
 consult~\cite{myBCCV05c,bahi07,ccl09:ij}.
 
+The number of iterations required to reach convergence is generally greater
+for the asynchronous scheme (this number depends on the delay of the
+messages). Note that this is not the case in the synchronous mode, where the
+number of iterations is the same as in the sequential mode. Consequently, the
+set of parameters of the platform (number of nodes, power of nodes,
+inter- and intra-cluster bandwidth and latency \ldots) and of the
+application can drastically change the number of iterations required to reach
+convergence. It follows that asynchronous iterative algorithms are difficult to
+optimize, since the financial and deployment costs on large-scale multi-core
+architectures are often very high. So, prior to deployment and tests, it
+seems very promising to be able to simulate the behavior of asynchronous
+iterative algorithms. The problem is then to show that the results produced
+by simulation are consistent with reality, i.e. of the same order of
+magnitude. To our knowledge, there is no study on this problem.
+
 \section{SimGrid}
  \label{sec:simgrid}
 
@@ -241,7 +270,7 @@ where $x_\ell$ are sub-vectors of the solution $x$, $b_\ell$ are the sub-vectors
 A_{\ell\ell} x_\ell = c_\ell,\mbox{~for~}\ell=1,\ldots,L,
 \label{eq:03}
 \end{equation}
-where right-hand sides $c_\ell=b_\ell-\sum_{m\neq\ell}A_{\ell m}x_m$ are computed using the shared vectors $x_m$. In this paper, we use the well-known iterative method GMRES ({\it Generalized Minimal RESidual})~\cite{saad86} as an inner iteration to approximate the solutions of the different splittings arising from the block Jacobi multisplitting of matrix $A$. The algorithm in Figure~\ref{01} shows the main key points of our block Jacobi two-stage method executed by a cluster of processors. In line~\ref{solve}, the linear sub-system~(\ref{eq:03}) is solved in parallel using GMRES method where $\MIG$ and $\TOLG$ are the maximum number of inner iterations and the tolerance threshold for GMRES respectively. The convergence of the two-stage multisplitting methods, based on synchronous or asynchronous iterations, has been studied by many authors for example~\cite{Bru95,bahi07}.
+where right-hand sides $c_\ell=b_\ell-\sum_{m\neq\ell}A_{\ell m}x_m$ are computed using the shared vectors $x_m$. In this paper, we use the well-known iterative method GMRES ({\it Generalized Minimal RESidual})~\cite{saad86} as an inner iteration to approximate the solutions of the different splittings arising from the block Jacobi multisplitting of matrix $A$. The algorithm in Figure~\ref{alg:01} shows the main key points of our block Jacobi two-stage method executed by a cluster of processors. In line~\ref{solve}, the linear sub-system~(\ref{eq:03}) is solved in parallel using GMRES method where $\MIG$ and $\TOLG$ are the maximum number of inner iterations and the tolerance threshold for GMRES respectively. The convergence of the two-stage multisplitting methods, based on synchronous or asynchronous iterations, has been studied by many authors for example~\cite{Bru95,bahi07}.
 
 \begin{figure}[t]
 %\begin{algorithm}[t]
@@ -529,7 +558,7 @@ and  4x8). We  can  observe  the low  sensitivity  of  the Krylov multisplitting
 in the  grid: on  average, the GMRES  (resp. Multisplitting)  algorithm performs
 $40\%$ better (resp. $48\%$) when running from 2x16=32 to 8x8=64 processors.
 
-\subsubsection{Running on two different inter-clusters network speeds \\} 
+\subsubsection{Running on two different inter-clusters network speeds \\}
 
 \begin{table} [ht!]
 \begin{center}
@@ -550,7 +579,7 @@ speed inter-cluster  network (N1) and  also on  a less performant  network (N2).
 Figure~\ref{fig:02} shows that end users can reduce the execution time
 of  both  algorithms  by using  a  grid  architecture  like  4x16 or  8x8:  the
 performance was increased  by a factor of  $2$. The results also show that when
-the  network speed  drops down (variation of 12.5\%), the  difference between  the two Multisplitting algorithms execution times can reach more than 25\%. 
+the  network speed  drops (variation of 12.5\%), the  difference between  the execution times of the two Multisplitting algorithms can reach more than 25\%.
 %\RC{this is unclear: the difference between what and what?}
 %\DL{unclear}
 %\RCE{Modified}