where the right-hand sides $c_\ell=b_\ell-\sum_{m\neq\ell}A_{\ell m}x_m$ are computed using the shared vectors $x_m$. In this paper, we use the well-known iterative method GMRES~\cite{saad86} as an inner iteration to approximate the solutions of the different splittings arising from the block Jacobi multisplitting of matrix $A$. The algorithm in Figure~\ref{alg:01} shows the key points of our block Jacobi two-stage method executed by a cluster of processors. In line~\ref{solve}, the linear sub-system~(\ref{eq:03}) is solved in parallel using the GMRES method, where $\MIG$ and $\TOLG$ are the maximum number of inner iterations and the tolerance threshold for GMRES, respectively. The convergence of two-stage multisplitting methods, based on synchronous or asynchronous iterations, has been studied by many authors, for example~\cite{Bru95,bahi07}.
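
As an illustration only, one outer iteration on cluster $\ell$ can be sketched as follows. The superscript $k$ for the outer iteration index and the symbol $c_\ell^{k}$ are notational assumptions introduced here for readability, and the sketch covers the synchronous case, in which every shared vector $x_m^{k}$ comes from the same outer iteration:
\[
A_{\ell\ell}\,x_\ell^{k+1} = c_\ell^{k},
\qquad
c_\ell^{k} = b_\ell-\sum_{m\neq\ell}A_{\ell m}\,x_m^{k}.
\]
The left-hand system is not solved exactly: GMRES is stopped after at most $\MIG$ inner iterations or as soon as its residual drops below $\TOLG$, so $x_\ell^{k+1}$ is only an approximation of the solution of the sub-system.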
%\caption{Block Jacobi two-stage multisplitting method}
The algorithm in Figure~\ref{alg:02} includes the residual minimization procedure, and the outer iteration is restarted with a new approximation $\tilde{x}$ every $s$ iterations. The least-squares problem~(\ref{eq:06}) is solved in parallel by all clusters using the CGLS method~\cite{Hestenes52}, where $\MIC$ is the maximum number of iterations and $\TOLC$ is the tolerance threshold for this method (line~\ref{cgls} in Figure~\ref{alg:02}).
%\caption{Krylov two-stage method using block Jacobi multisplitting}
inter-cluster communications. In the following, these parameters are described:
- \item hostfile: hosts description file.
+ \item hostfile: hosts description file,
\item platform: file describing the platform architecture: clusters (CPU power,
\dots{}), intra cluster network description, inter cluster network (bandwidth $bw$,
-latency $lat$, \dots{}).
+latency $lat$, \dots{}),
\item archi: grid computational description (number of clusters, number of
nodes/processors in each cluster).
on the one hand the algorithm execution mode (synchronous and asynchronous)
and on the other hand the execution time and the number of iterations to reach convergence. \\
-\textbf{Step 4 }: Set up the different grid testbed environments that will be
+\textbf{Step 4}: Set up the different grid testbed environments that will be
simulated in the simulator tool to run the program. The following architectures
have been configured in SimGrid: 2$\times$16, 4$\times$8, 4$\times$16, 8$\times$8 and 2$\times$50. The first number
represents the number of clusters in the grid and the second number represents
latency of 8.10$^{-6}$ seconds (resp. 5.10$^{-5}$) for the intra-cluster links
(resp. inter-cluster backbone links). \\
-\LZK{It seems to me that the bw and lat of the two networks vary in the experiments from one simulation to another. Should we drop the last sentence?}
-\RC{it seems to me that we can leave it}
+%\LZK{It seems to me that the bw and lat of the two networks vary in the experiments from one simulation to another. Should we drop the last sentence?}
+%\RC{it seems to me that we can leave it}
\textbf{Step 5}: Conduct extensive and comprehensive testing
within these configurations by varying the key parameters, especially
%\RC{The captions are not explicit...}
-\begin{figure} [ht!]
+\begin{figure} [htbp]
-\begin{figure} [ht!]
+\begin{figure} [htbp]
-\caption{Various grid configurations with networks N1 vs N2
-\AG{\np{8E-6}, \np{5E-6} instead of 8E-6, 5E-6}}
+\caption{Various grid configurations with networks N1 vs N2}
+%\AG{\np{8E-6}, \np{5E-6} instead of 8E-6, 5E-6}}
-\begin{figure} [ht!]
+\begin{figure} [htbp]
-\caption{Network latency impacts on execution time
+\caption{Network latency impacts on execution time}
-According to the results of Figure~\ref{fig:03}, a degradation of the network
-latency from $8.10^{-6}$ to $6.10^{-5}$ implies an absolute time increase of
-more than $75\%$ (resp. $82\%$) of the execution for the classical GMRES
-(resp. Krylov multisplitting) algorithm which means that the GMRES seems tolerate more the network latency variation with a less rate increase of the execution time. However, the execution time factor between the two algorithms varies from 2.2 to 1.5 times with a network latency decreasing from $8.10^{-6}$ to $6.10^{-5}$.
+Table~\ref{tab:03} reports the parameters used to study the influence of the
+network latency. According to the results of Figure~\ref{fig:03}, a degradation of the
+network latency from $8.10^{-6}$ to $6.10^{-5}$ implies an execution time
+increase of more than $75\%$ (resp. $82\%$) for the classical
+GMRES (resp. Krylov multisplitting) algorithm. The execution time factor
+between the two algorithms varies from 2.2 to 1.5 as the network latency
+increases from $8.10^{-6}$ to $6.10^{-5}$.
-\RC{The two preceding sentences seem contradictory to me....}
\subsubsection{Network bandwidth impacts on performance}
\ \\
& $lat$= 5.10$^{-5}$ second \\
Input matrix size & $N_{x} \times N_{y} \times N_{z} =150 \times 150 \times 150$\\ \hline \\
-\caption{Test conditions: Network bandwidth impacts\RC{What varies here? Nothing varies in the table}}
-\RCE{It is the bw}
+\caption{Test conditions: Network bandwidth impacts}
+% \RC{What varies here? Nothing varies in the table}
+%\RCE{It is the bw}
\begin{figure} [htbp]
-\caption{Network bandwith impacts on execution time
-\AG{``Execution time'' with a lowercase 't'}. Same for the other figures.}
+\caption{Network bandwidth impacts on execution time}
+%\AG{``Execution time'' with a lowercase 't'}. Same for the other figures.}