RCE: Decimal point on the plots

[rce2015.git] / paper.tex
diff --git a/paper.tex b/paper.tex

index cdcce0078f8916dfcd325fd7038d929f8aa5c712..1391f0f3c275a4e255b7f2746bcf65a5fa08b703 100644 (file)
--- a/paper.tex
+++ b/paper.tex
@@ -588,6 +588,7 @@ efficient for distributed systems with high latency networks.
  \includegraphics[width=100mm]{cluster_x_nodes_n1_x_n2.pdf}
  \caption{Various grid configurations with networks $N1$ vs. $N2$}
  \LZK{CE, replace the ``,'' decimal separators with ``.''}
+\RCE{ok}
  \label{fig:02}
  \end{figure}
  
@@ -597,7 +598,7 @@ Figure~\ref{fig:03} shows the impact of the network latency on the performances
  \begin{figure}[ht]
  \centering
  \includegraphics[width=100mm]{network_latency_impact_on_execution_time.pdf}
-\caption{Network latency impacts on execution times}
+\caption{Network latency impacts on performance}
  \label{fig:03}
  \end{figure}
  
@@ -607,7 +608,7 @@ Figure~\ref{fig:04} reports the results obtained for the simulation of a grid of
  \begin{figure}[ht]
  \centering
  \includegraphics[width=100mm]{network_bandwith_impact_on_execution_time.pdf}
-\caption{Network bandwith impacts on execution time}
+\caption{Network bandwidth impacts on performance}
  \label{fig:04}
  \end{figure}
  
@@ -618,73 +619,28 @@ These findings may help a lot end users to setup the best and the optimal target
  \begin{figure}[ht]
  \centering
  \includegraphics[width=100mm]{pb_size_impact_on_execution_time.pdf}
-\caption{Problem size impacts on execution times}
+\caption{Problem size impacts on performance}
  \label{fig:05}
  \end{figure}
  
+\subsubsection{CPU power impacts on performance\\}
+Using the flexibility of the SimGrid simulator, we have tried to determine the impact of the CPU power of the processors in the different clusters on the performance of both algorithms. We have varied the CPU power from $1$~GFlops to $19$~GFlops. The simulation is conducted on a grid of 2$\times$16 processors interconnected by the network $N2$ (see Table~\ref{tab:01}) to solve a 3D Poisson problem of size $150^3$. The results depicted in Figure~\ref{fig:06} confirm the performance gain, about $95\%$ for both algorithms, after improving the CPU power of the processors.
  
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-\subsubsection{CPU Power impacts on performance\\}
-
-
-\begin{table} [htbp]
-\centering
-\begin{tabular}{r c }
- \hline
- Grid architecture & 2 $\times$ 16\\ %\hline
- Inter Network & N2 : $bw$=1Gbs - $lat$=5.10$^{-5}$ \\ %\hline
- Input matrix size & $N_{x} = 150 \times 150 \times 150$\\ 
- CPU Power & From 3 to 19 GFlops \\ \hline
- \end{tabular}
-\caption{Test conditions: CPU Power impacts}
-\label{tab:06}
-\end{table}
-
-\begin{figure} [ht!]
+\begin{figure}[ht]
  \centering
  \includegraphics[width=100mm]{cpu_power_impact_on_execution_time.pdf}
-\caption{CPU Power impacts on execution time}
+\caption{CPU power impacts on performance}
  \label{fig:06}
  \end{figure}
-
-Using the Simgrid  simulator flexibility, we have tried to  determine the impact
-on the  algorithms performance in  varying the CPU  power of the  clusters nodes
-from $1$ to $19$ GFlops.  The outputs  depicted in Figure~\ref{fig:06}  confirm the
-performance gain,  around $95\%$ for  both of the  two methods, after  adding more
-powerful CPU.
  \ \\
-%\DL{a conclusion on these tests is needed: they confirm the results already
-%obtained at real scale. So this is a valuable aid for developers. No
-%need to deploy on a real architecture}
-
  To conclude these series of experiments, with  SimGrid we have been able to make
  many simulations  with many parameters  variations. Doing all  these experiments
  with a real platform is most of  the time not possible. Moreover the behavior of
-both GMRES and  Krylov multisplitting methods is in accordance  with larger real
-executions on large scale supercomputer~\cite{couturier15}.
+both GMRES and  Krylov two-stage algorithms is in accordance  with larger real
+executions on large scale supercomputers~\cite{couturier15}.
  
  
-\subsection{Comparing GMRES in native synchronous mode and the multisplitting algorithm in asynchronous mode}
+\subsection{Comparison between synchronous GMRES and asynchronous two-stage multisplitting algorithms}
  
  The previous paragraphs  put in evidence the interests to  simulate the behavior
  of  the application  before  any  deployment in  a  real  environment.  In  this
@@ -699,42 +655,34 @@ synchronization  with   the  other   processors.  Thus,  the   asynchronous  may
  theoretically reduce  the overall execution  time and can improve  the algorithm
  performance.
  
-In this section,  the Simgrid simulator is  used to compare the  behavior of the
-multisplitting in  asynchronous mode  with GMRES  in synchronous  mode.  Several
-benchmarks have  been performed with  various combination of the  grid resources
-(CPU, Network, input  matrix size, \ldots ). The test  conditions are summarized
-in  Table~\ref{tab:07}. In  order to  compare  the execution  times, this  table
+In this section,  the SimGrid simulator is  used to compare the  behavior of the
+two-stage algorithm in  asynchronous mode  with GMRES  in synchronous  mode.  Several
+benchmarks have  been performed with  various combinations of the  grid resources
+(CPU, Network, matrix size, \ldots). The test  conditions are summarized
+in  Table~\ref{tab:02}. In  order to  compare  the execution  times, Table~\ref{tab:03}
  reports the  relative gain between both  algorithms. It is defined  by the ratio
  between  the   execution  time  of   GMRES  and   the  execution  time   of  the
-multisplitting.  The  ratio  is  greater  than  one  because  the  asynchronous
+multisplitting.  
+\LZK{Which table reports the relative gains?? Surely not Table II!!}
+\RCE{Table III with the new numbering}
+The  ratio  is  greater  than  one  because  the  asynchronous
  multisplitting version is faster than GMRES.
  
-
-
-\begin{table} [htbp]
+\begin{table}[htbp]
  \centering
-\begin{tabular}{r c }
+\begin{tabular}{ll}
   \hline
- Grid Architecture & 2 $\times$ 50 totaling 100 processors\\ %\hline
- Processors Power & 1 GFlops to 1.5 GFlops\\
-   Intra-Network & bw=1.25 Gbits - lat=5.10$^{-5}$ \\ %\hline
-   Inter-Network & bw=5 Mbits - lat=2.10$^{-2}$\\
- Input matrix size & $N_{x}$ = From 62 to 150\\ %\hline
- Residual error precision & 10$^{-5}$ to 10$^{-9}$\\ \hline \\
+ Grid architecture                       & 2$\times$50 totaling 100 processors\\
+ Processors Power                        & 1 GFlops to 1.5 GFlops \\
+ \multirow{2}{*}{Network inter-clusters} & $bw$=1.25 Gbits, $lat=50\mu$s \\
+                                         & $bw$=5 Mbits, $lat=20$ms\\
+ Matrix size                             & from $62^3$ to $150^3$\\
+ Residual error precision                & $10^{-5}$ to $10^{-9}$\\ \hline \\
   \end{tabular}
-\caption{Test conditions: GMRES in synchronous mode vs Krylov Multisplitting in asynchronous mode}
-\label{tab:07}
+\caption{Test conditions: GMRES in synchronous mode vs. Krylov two-stage in asynchronous mode}
+\label{tab:02}
  \end{table}
  
-Again,  comprehensive and  extensive tests  have been  conducted with  different
-parameters as  the CPU power, the  network parameters (bandwidth and  latency)
-and with different problem size. The  relative gains greater than $1$  between the
-two algorithms have  been captured after  each step  of the test.   In
-Table~\ref{tab:08}  are  reported the  best  grid  configurations allowing
-the  multisplitting method to  be more than  $2.5$ times faster  than the
-classical  GMRES.  These  experiments also  show the  relative tolerance  of the
-multisplitting algorithm when using a low speed network as usually observed with
-geographically distant clusters through the internet.
  
  % use the same column width for the following three tables
  \newlength{\mytablew}\settowidth{\mytablew}{\footnotesize\np{E-11}}
@@ -772,15 +720,24 @@ geographically distant clusters through the internet.
      \hline
    \end{mytable}
  %\end{table}
- \caption{Relative gain of the multisplitting algorithm compared with the classical GMRES}
- \label{tab:08}
+ \caption{Relative gains of the two-stage multisplitting algorithm compared with the classical GMRES}
+ \label{tab:03}
  \end{table}
  
+Again,  comprehensive and  extensive tests  have been  conducted with  different
+parameters such as the CPU power, the network parameters (bandwidth and latency)
+and with different problem sizes. The relative gains greater than $1$ between the
+two algorithms have been captured after each step of the test.
+Table~\ref{tab:03} reports the best grid configurations allowing
+the  two-stage multisplitting algorithm to  be more than  $2.5$ times faster  than the
+classical  GMRES.  These  experiments also  show the  relative tolerance  of the
+multisplitting algorithm when using a low speed network as usually observed with
+geographically distant clusters through the internet.
  
-\section{Conclusion}
  
+\section{Conclusion}
  In this paper we have presented the simulation of the execution of three
-different parallel solvers on some multi-core architectures. We have show that
+different parallel solvers on some multi-core architectures. We have shown that
  the SimGrid toolkit is an interesting simulation tool that has allowed us to
  determine  which method  to choose  given a  specified multi-core  architecture.
  Moreover the simulated results are in accordance (i.e. with the same order of
@@ -802,7 +759,7 @@ converge and so to very different execution times.
  In future works, we  plan to investigate how to simulate  the behavior of really
  large scale  applications. For  example, if  we are  interested to  simulate the
  execution of the solvers of this paper with thousand or even dozens of thousands
-or core,  it is not possible  to do that with  SimGrid. In fact, this  tool will
+of cores,  it is not possible  to do that with  SimGrid. In fact, this  tool will
  make the real computation. So we plan to focus our research on that problematic.