\end{equation}
where right-hand sides $c_\ell=b_\ell-\sum_{m\neq\ell}A_{\ell m}x_m$ are computed using the shared vectors $x_m$. In this paper, we use the well-known iterative method GMRES~\cite{saad86} as an inner iteration to approximate the solutions of the different splittings arising from the block Jacobi multisplitting of matrix $A$. The algorithm in Figure~\ref{alg:01} shows the main key points of our block Jacobi two-stage method executed by a cluster of processors. In line~\ref{solve}, the linear sub-system~(\ref{eq:03}) is solved in parallel using GMRES method where $\MIG$ and $\TOLG$ are the maximum number of inner iterations and the tolerance threshold for GMRES respectively. The convergence of the two-stage multisplitting methods, based on synchronous or asynchronous iterations, has been studied by many authors for example~\cite{Bru95,bahi07}.
-\begin{figure}[t]
+\begin{figure}[htpb]
%\begin{algorithm}[t]
%\caption{Block Jacobi two-stage multisplitting method}
\begin{algorithmic}[1]
\end{equation}
The algorithm in Figure~\ref{alg:02} includes the procedure of the residual minimization and the outer iteration is restarted with a new approximation $\tilde{x}$ at every $s$ iterations. The least-squares problem~(\ref{eq:06}) is solved in parallel by all clusters using CGLS method~\cite{Hestenes52} such that $\MIC$ is the maximum number of iterations and $\TOLC$ is the tolerance threshold for this method (line~\ref{cgls} in Figure~\ref{alg:02}).
-\begin{figure}[t]
+\begin{figure}[htbp]
%\begin{algorithm}[t]
%\caption{Krylov two-stage method using block Jacobi multisplitting}
\begin{algorithmic}[1]
inter-cluster communications. In the following, these parameters are described:
\begin{itemize}
- \item hostfile: hosts description file.
+ \item hostfile: hosts description file,
\item platform: file describing the platform architecture: clusters (CPU power,
\dots{}), intra cluster network description, inter cluster network (bandwidth $bw$,
-latency $lat$, \dots{}).
+latency $lat$, \dots{}),
\item archi : grid computational description (number of clusters, number of
nodes/processors in each cluster).
\end{itemize}
on the one hand the algorithm execution mode (synchronous and asynchronous)
and on the other hand the execution time and the number of iterations to reach the convergence. \\
-\textbf{Step 4 }: Set up the different grid testbed environments that will be
+\textbf{Step 4}: Set up the different grid testbed environments that will be
simulated in the simulator tool to run the program. The following architectures
have been configured in SimGrid : 2$\times$16, 4$\times$8, 4$\times$16, 8$\times$8 and 2$\times$50. The first number
represents the number of clusters in the grid and the second number represents
%\RC{Les légendes ne sont pas explicites...}
%\RCE{Corrige}
-\begin{figure} [ht!]
+\begin{figure} [htbp]
\begin{center}
\includegraphics[width=100mm]{cluster_x_nodes_nx_150_and_nx_170.pdf}
\end{center}
%\begin{wrapfigure}{l}{100mm}
-\begin{figure} [ht!]
+\begin{figure} [htbp]
\centering
\includegraphics[width=100mm]{cluster_x_nodes_n1_x_n2.pdf}
\caption{Various grid configurations with networks N1 vs N2
\label{tab:03}
\end{table}
-\begin{figure} [ht!]
+\begin{figure} [htbp]
\centering
\includegraphics[width=100mm]{network_latency_impact_on_execution_time.pdf}
\caption{Network latency impacts on execution time
\begin{figure} [htbp]
\centering
\includegraphics[width=100mm]{network_bandwith_impact_on_execution_time.pdf}
-\caption{Network bandwith impacts on execution time
-\AG{``Execution time'' avec un 't' minuscule}. Idem autres figures.}
-\RCE{Corrige}
+\caption{Network bandwith impacts on execution time}
+%\AG{``Execution time'' avec un 't' minuscule}. Idem autres figures.}
+%\RCE{Corrige}
\label{fig:04}
\end{figure}