X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/rce2015.git/blobdiff_plain/96e726f9524ffdb8eaead14a943d80870ea1da65..538afc25a7a90630d2f90891d0a4d700bfe3460f:/paper.tex?ds=sidebyside

diff --git a/paper.tex b/paper.tex
index c3cfdbb..a2e23a2 100644
--- a/paper.tex
+++ b/paper.tex
@@ -442,8 +442,6 @@ In this section, experiments for both multisplitting algorithms are reported. Fi
 
 \subsection{The 3D Poisson problem}
 \label{3dpoisson}
-
-
 We use our two-stage algorithms to solve the well-known Poisson problem $\nabla^2\phi=f$~\cite{Polyanin01}. In three-dimensional Cartesian coordinates in $\mathbb{R}^3$, the problem takes the following form:
 \begin{equation}
 \frac{\partial^2}{\partial x^2}\phi(x,y,z)+\frac{\partial^2}{\partial y^2}\phi(x,y,z)+\frac{\partial^2}{\partial z^2}\phi(x,y,z)=f(x,y,z)\mbox{~in the domain~}\Omega
@@ -489,13 +487,7 @@ and on the other hand the execution time and the number of iterations to reach t
 simulated in the  simulator tool to run the program.  The following architectures
 have been configured in SimGrid : 2$\times$16, 4$\times$8, 4$\times$16, 8$\times$8 and 2$\times$50. The first number
 represents the number  of clusters in the grid and  the second number represents
-the number  of hosts (processors/cores)  in each  cluster. The network has been
-designed to  operate with a bandwidth  equals to 10Gbits (resp.  1Gbits/s) and a
-latency of 8.10$^{-6}$ seconds (resp.  5.10$^{-5}$) for the intra-clusters links
-(resp.  inter-clusters backbone links).  \\
-
-\LZK{Il me semble que le bw et lat des deux rÃ©seaux varient dans les expÃ©s d'une simu Ã  l'autre. On vire la derniÃ¨re phrase?}
-\RC{il me semble qu'on peut laisser ca}
+the number  of hosts (processors/cores)  in each  cluster. \\
 
 \textbf{Step 5}: Conduct an extensive and comprehensive testings
 within these configurations by varying the key parameters, especially
@@ -536,60 +528,57 @@ and  between distant  clusters.  This parameter is application dependent.
  a lower speed.  The network  between distant  clusters might  be a  bottleneck
  for  the global performance of the application.
 
-\subsection{Comparison of GMRES and Krylov two-stage algorithms in synchronous mode}
-
-In the scope  of this paper, our  first objective is to analyze  when the Krylov
-two-stage method has  better  performance  than   the  classical  GMRES method. With a synchronous  iterative method, better performance means a
-smaller number of iterations and execution time before reaching the convergence.
-In what follows, we will present the test conditions, the output results and our comments.
-
-%%RAPH : on vire ca, c'est pas clair et pas important
-%For a systematic study,  the experiments  should figure  out  that, for  various
-%grid  parameters values, the simulator will confirm Multisplitting method  better performance compared to classical GMRES, particularly on poor and slow networks.
-%\LZK{Pas du tout claire la derniÃ¨re phrase (For a systematic...)!!}
-%\RCE { Reformule autrement}
-
 
+\subsection{Comparison between GMRES and two-stage multisplitting algorithms in synchronous mode}
+In the scope of this paper, our first objective is to analyze when the synchronous Krylov two-stage method has better performance than the classical GMRES method. With a synchronous iterative method, better performance means a smaller number of iterations and execution time before reaching the convergence.
 
-%\subsubsection{Execution of the algorithms on various computational grid architectures and scaling up the input matrix size}
-\subsubsection{Simulations for various grid architectures and scaling-up matrix sizes}
-\ \\
-% environment
-
-\RC{Je ne comprends plus rien CE : pourquoi dans 5.4.1 il y a 2 network et aussi dans 5.4.2. Quelle est la diffÃ©rence? Dans la figure 3 de la section 5.4.1 pourquoi il n'y a pas N1 et N2?}
+Table~\ref{tab:01} summarizes the parameters used in the different simulations: the grid architectures, the network of inter-clusters backbone links and the matrix sizes of the 3D Poisson problem. However, for all simulations we fix the network parameters of the intra-clusters links: the bandwidth $bw$=10Gbs and the latency $lat$=8$\times$10$^{-6}$. In what follows, we will present the test conditions, the output results and our comments. 
 
 \begin{table} [ht!]
 \begin{center}
-\begin{tabular}{ll }
- \hline
- Grid architecture & 2$\times$16, 4$\times$8, 4$\times$16 and 8$\times$8\\ %\hline
- \multirow{2}{*}{Network} & Inter (N2): $bw$=1Gbs, $lat$=5$\times$10$^{-5}$ \\ %\hline
-                          & Intra (N1): $bw$=10Gbs, $lat$=8$\times$10$^{-6}$ \\
- \multirow{2}{*}{Matrix size}  & N$_{x}$ $\times$ N$_{y}$ $\times$ N$_{z}$ =150 $\times$ 150 $\times$ 150\\ %\hline
-  &  N$_{x}$ $\times$ N$_{y}$ $\times$ N$_{z}$  =170 $\times$ 170 $\times$ 170    \\ \hline
- \end{tabular}
-\caption{Test conditions: various grid configurations with the matrix sizes 150$^3$ or 170$^3$}
-%\LZK{Ce sont les caractÃ©ristiques du rÃ©seau intra ou inter clusters? Ce n'est pas prÃ©cisÃ©...}
-%\RCE{oui c est precise}
+\begin{tabular}{ll}
+\hline
+Grid architecture                       & 2$\times$16, 4$\times$8, 4$\times$16 and 8$\times$8\\ 
+\multirow{2}{*}{Network inter-clusters} & $N1$: $bw$=1Gbs, $lat$=5$\times$10$^{-5}$ \\ 
+                                        & $N2$: $bw$=10Gbs, $lat$=8$\times$10$^{-6}$ \\
+\multirow{2}{*}{Matrix size}            & $Mat1$: N$_{x}\times$N$_{y}\times$N$_{z}$=150$\times$150$\times$150\\
+                                        & $Mat2$: N$_{x}\times$N$_{y}\times$N$_{z}$=170$\times$170$\times$170 \\ \hline
+\end{tabular}
+\caption{Parameters for the different simulations}
 \label{tab:01}
 \end{center}
 \end{table}
+
+\subsubsection{Simulations for various grid architectures and scaling-up matrix sizes}
+\  \\
+% environment
+In this section, we analyze the simulations conducted on various grid configurations and for different sizes of the 3D Poisson problem. The parameters of the network between clusters is fixed to $N1$ (see Table~\ref{tab:01}. Figure~\ref{fig:01} shows, for all grid configurations and a given matrix size 170$^3$ elements, a non-variation in the number of iterations for the classical GMRES algorithm, which is not the case of the Krylov two-stage algorithm.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
 
 
-In  this  section,  we  analyze   the  simulations  conducted  on  various  grid
-configurations presented  in Table~\ref{tab:01}. It  should be noticed  that two
-networks are considered: N1 is  the network between clusters (inter-cluster) and
-N2 is the network inside  a cluster (intra-cluster).  Figure~\ref{fig:01} shows,
-for all  grid configurations  and a  given matrix size,  a non-variation  in the
-number of iterations for the classical GMRES algorithm, which is not the case of
-the Krylov two-stage algorithm.
-%% First,  the results in  Figure~\ref{fig:01}
-%% show for all grid configurations the non-variation of the number of iterations of
-%% classical  GMRES for  a given  input matrix  size; it is not  the case  for the
-%% multisplitting method.
-%\RC{CE attention tu n'as pas mis de label dans tes figures, donc c'est le bordel, j'en mets mais vÃ©rifie...}
-%\RC{Les lÃ©gendes ne sont pas explicites...}
-%\RCE{Corrige}
 
 \begin{figure} [htbp]
   \begin{center}
@@ -644,9 +633,9 @@ the  network speed  drops down (variation of 12.5\%), the  difference between  t
 \begin{figure} [htbp]
 \centering
 \includegraphics[width=100mm]{cluster_x_nodes_n1_x_n2.pdf}
-\caption{Various grid configurations with networks N1 vs N2
-\AG{\np{8E-6}, \np{5E-6} au lieu de 8E-6, 5E-6}}
-\RCE{Corrige}
+\caption{Various grid configurations with networks N1 vs N2}
+%\AG{\np{8E-6}, \np{5E-6} au lieu de 8E-6, 5E-6}}
+%\RCE{Corrige}
 \label{fig:02}
 \end{figure}
 %\end{wrapfigure}
@@ -670,18 +659,19 @@ the  network speed  drops down (variation of 12.5\%), the  difference between  t
 \begin{figure} [htbp]
 \centering
 \includegraphics[width=100mm]{network_latency_impact_on_execution_time.pdf}
-\caption{Network latency impacts on execution time
-\AG{\np{E-6}}}
+\caption{Network latency impacts on execution time}
+%\AG{\np{E-6}}}
 \label{fig:03}
 \end{figure}
 
-According to  the results of  Figure~\ref{fig:03}, a degradation of  the network
-latency from  $8.10^{-6}$ to  $6.10^{-5}$ implies an  absolute time  increase of
-more  than $75\%$  (resp.  $82\%$)  of the  execution  for  the classical  GMRES
-(resp.  Krylov multisplitting)  algorithm which means that the GMRES seems tolerate more the network latency variation with a less  rate increase  of  the  execution time. However, the execution time factor between the two algorithms varies from 2.2 to 1.5 times with a network latency decreasing from $8.10^{-6}$ to  $6.10^{-5}$.
+In Table~\ref{tab:03}, parameters  for the influence of the  network latency are
+reported.  According to the results of Figure~\ref{fig:03}, a degradation of the
+network  latency  from  $8.10^{-6}$  to $6.10^{-5}$  implies  an  absolute  time
+increase of more than $75\%$ (resp.   $82\%$) of the execution for the classical
+GMRES  (resp.   Krylov  multisplitting)  algorithm. The  execution  time  factor
+between the two algorithms  varies from 2.2 to 1.5 times  with a network latency
+decreasing from $8.10^{-6}$ to $6.10^{-5}$.
 
-\RC{Les  2  prÃ©cÃ©dentes phrases  me  semblent en contradiction....}  
-\RCE{Reformule}
 
 \subsubsection{Network bandwidth impacts on performance}
 \ \\
@@ -694,8 +684,9 @@ more  than $75\%$  (resp.  $82\%$)  of the  execution  for  the classical  GMRES
                           & $lat$= 5.10$^{-5}$ second \\
  Input matrix size & $N_{x} \times N_{y} \times N_{z} =150 \times 150 \times 150$\\ \hline \\
  \end{tabular}
-\caption{Test conditions: Network bandwidth impacts\RC{Qu'est ce qui varie ici? Il n'y a pas de variation dans le tableau}}
-\RCE{C est le bw}
+\caption{Test conditions: Network bandwidth impacts}
+%  \RC{Qu'est ce qui varie ici? Il n'y a pas de variation dans le tableau}
+%\RCE{C est le bw}
 \label{tab:04}
 \end{table}
 
@@ -723,7 +714,7 @@ of $40\%$ which is only around $24\%$ for the classical GMRES.
  \hline
  Grid Architecture & 4 $\times$ 8\\ %\hline
  Inter Network & $bw$=1Gbs - $lat$=5.10$^{-5}$ \\
- Input matrix size & $N_{x} \times N_{y} \times N_{z}$ = From 40$^{3}$ to 200$^{3}$\\ \hline
+ Input matrix size & $N_{x} \times N_{y} \times N_{z}$ = From 50$^{3}$ to 190$^{3}$\\ \hline
  \end{tabular}
 \caption{Test conditions: Input matrix size impacts}
 \label{tab:05}
@@ -737,20 +728,15 @@ of $40\%$ which is only around $24\%$ for the classical GMRES.
 \label{fig:05}
 \end{figure}
 
-In these experiments, the input matrix size  has been set from $N_{x} = N_{y}
-= N_{z} = 40$ to $200$ side elements  that is from $40^{3} = 64.000$ to $200^{3}
-= 8,000,000$  points. Obviously, as  shown in Figure~\ref{fig:05},  the execution
-time for  both algorithms increases when  the input matrix size  also increases.
-But the interesting results are:
-\begin{enumerate}
-  \item the important increase ($10$ times)  of the number of iterations needed to
-    reach the convergence for the classical GMRES algorithm particularly, when the matrix size
-    go beyond $N_{x}=150$; \RC{C'est toujours pas clair... ok le nommbre d'itÃ©rations est 10 fois plus long mais la suite de la phrase ne veut rien dire}
-    \RCE{Le nombre d'iterations augmente de 10 fois, cela surtout a partir de N=150}
-    
-\item the  classical GMRES execution time  is almost the double  for $N_{x}=140$
-  compared with the Krylov multisplitting method.
-\end{enumerate}
+In  these  experiments, the  input  matrix  size has  been  set  from $50^3$  to
+$190^3$. Obviously, as shown in Figure~\ref{fig:05}, the execution time for both
+algorithms increases when the input matrix size also increases.  For all problem
+sizes, GMRES is always slower than the Krylov multisplitting. Moreover, for this
+benchmark, it seems that  the greater the problem size is,  the bigger the ratio
+between both  algorithm execution  times is.  We can also  observ that  for some
+problem   sizes,  the   Krylov   multisplitting  convergence   varies  quite   a
+lot. Consequently the execution times in that cases also varies.
+
 
 These  findings may  help a  lot end  users to  setup the  best and  the optimal
 targeted environment for the application deployment when focusing on the problem