X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/rce2015.git/blobdiff_plain/b0ab45cb63050774fc164efad1e58659b71acfa6..f1ca3116c910d634d5282b5b3e4dc929cae46560:/paper.tex

diff --git a/paper.tex b/paper.tex
index 35ab88f..1391f0f 100644
--- a/paper.tex
+++ b/paper.tex
@@ -588,6 +588,7 @@ efficient for distributed systems with high latency networks.
 \includegraphics[width=100mm]{cluster_x_nodes_n1_x_n2.pdf}
 \caption{Various grid configurations with networks $N1$ vs. $N2$}
 \LZK{CE, replace the ``,'' decimal separators with ``.''}
+\RCE{ok}
 \label{fig:02}
 \end{figure}
 
@@ -639,7 +640,7 @@ both GMRES and  Krylov two-stage algorithms is in accordance  with larger real
 executions on large scale supercomputers~\cite{couturier15}.
 
 
-\subsection{Comparing GMRES in native synchronous mode and the multisplitting algorithm in asynchronous mode}
+\subsection{Comparison between synchronous GMRES and asynchronous two-stage multisplitting algorithms}
 
 The previous paragraphs highlight the benefit of simulating the behavior
 of  the application  before  any  deployment in  a  real  environment.  In  this
@@ -654,42 +655,34 @@ synchronization  with   the  other   processors.  Thus,  the   asynchronous  may
 theoretically reduce  the overall execution  time and can improve  the algorithm
 performance.
 
-In this section,  the Simgrid simulator is  used to compare the  behavior of the
-multisplitting in  asynchronous mode  with GMRES  in synchronous  mode.  Several
-benchmarks have  been performed with  various combination of the  grid resources
-(CPU, Network, input  matrix size, \ldots ). The test  conditions are summarized
-in  Table~\ref{tab:07}. In  order to  compare  the execution  times, this  table
+In this section,  the SimGrid simulator is  used to compare the  behavior of the
+two-stage algorithm in  asynchronous mode  with GMRES  in synchronous  mode.  Several
+benchmarks have  been performed with  various combinations of the  grid resources
+(CPU, Network, matrix size, \ldots). The test  conditions are summarized
+in  Table~\ref{tab:02}. In  order to  compare  the execution  times, Table~\ref{tab:03}
 reports the  relative gain between both  algorithms. It is defined  by the ratio
 between  the   execution  time  of   GMRES  and   the  execution  time   of  the
-multisplitting.  The  ratio  is  greater  than  one  because  the  asynchronous
+multisplitting.  
+\LZK{Which table reports the relative gains?? Surely not Table II!!}
+\RCE{Table III with the new numbering}
+The  ratio  is  greater  than  one  because  the  asynchronous
 multisplitting version is faster than GMRES.
 
-
-
-\begin{table} [htbp]
+\begin{table}[htbp]
 \centering
-\begin{tabular}{r c }
+\begin{tabular}{ll}
  \hline
- Grid Architecture & 2 $\times$ 50 totaling 100 processors\\ %\hline
- Processors Power & 1 GFlops to 1.5 GFlops\\
-   Intra-Network & bw=1.25 Gbits - lat=5.10$^{-5}$ \\ %\hline
-   Inter-Network & bw=5 Mbits - lat=2.10$^{-2}$\\
- Input matrix size & $N_{x}$ = From 62 to 150\\ %\hline
- Residual error precision & 10$^{-5}$ to 10$^{-9}$\\ \hline \\
+ Grid architecture                       & 2$\times$50 totaling 100 processors\\
+ Processor power                         & 1 GFlops to 1.5 GFlops \\
+ \multirow{2}{*}{Network inter-clusters} & $bw$=1.25 Gbits, $lat$=50$\mu$s \\
+                                         & $bw$=5 Mbits, $lat$=20ms\\
+ Matrix size                             & from $62^3$ to $150^3$\\
+ Residual error precision                & $10^{-5}$ to $10^{-9}$\\ \hline \\
  \end{tabular}
-\caption{Test conditions: GMRES in synchronous mode vs Krylov Multisplitting in asynchronous mode}
-\label{tab:07}
+\caption{Test conditions: GMRES in synchronous mode vs. Krylov two-stage in asynchronous mode}
+\label{tab:02}
 \end{table}
 
-Again,  comprehensive and  extensive tests  have been  conducted with  different
-parameters as  the CPU power, the  network parameters (bandwidth and  latency)
-and with different problem size. The  relative gains greater than $1$  between the
-two algorithms have  been captured after  each step  of the test.   In
-Table~\ref{tab:08}  are  reported the  best  grid  configurations allowing
-the  multisplitting method to  be more than  $2.5$ times faster  than the
-classical  GMRES.  These  experiments also  show the  relative tolerance  of the
-multisplitting algorithm when using a low speed network as usually observed with
-geographically distant clusters through the internet.
 
 % use the same column width for the following three tables
 \newlength{\mytablew}\settowidth{\mytablew}{\footnotesize\np{E-11}}
@@ -727,15 +720,24 @@ geographically distant clusters through the internet.
     \hline
   \end{mytable}
 %\end{table}
- \caption{Relative gain of the multisplitting algorithm compared with the classical GMRES}
- \label{tab:08}
+ \caption{Relative gains of the two-stage multisplitting algorithm compared with the classical GMRES}
+ \label{tab:03}
 \end{table}
 
+Again, comprehensive and extensive tests have been conducted with different
+parameters such as the CPU power, the network parameters (bandwidth and latency)
+and the problem size. The relative gains greater than $1$ between the
+two algorithms have been captured after each step of the test.
+Table~\ref{tab:03} reports the best grid configurations allowing
+the two-stage multisplitting algorithm to be more than $2.5$ times faster than the
+classical GMRES. These experiments also show the relative tolerance of the
+multisplitting algorithm to a low-speed network, as is usually observed with
+geographically distant clusters connected through the Internet.
 
-\section{Conclusion}
 
+\section{Conclusion}
 In this paper we have presented the simulation of the execution of three
-different parallel solvers on some multi-core architectures. We have show that
+different parallel solvers on some multi-core architectures. We have shown that
 the SimGrid toolkit is an interesting simulation tool that has allowed us to
 determine  which method  to choose  given a  specified multi-core  architecture.
 Moreover the simulated results are in accordance (i.e. with the same order of
@@ -757,7 +759,7 @@ converge and so to very different execution times.
 In future work, we plan to investigate how to simulate the behavior of really
 large scale applications. For example, if we are interested in simulating the
 execution of the solvers of this paper with thousands or even tens of thousands
-or core,  it is not possible  to do that with  SimGrid. In fact, this  tool will
+of cores,  it is not possible  to do that with  SimGrid. In fact, this  tool will
 perform the real computation. So we plan to focus our research on that problem.