+\subsection{Simulation of two-stage methods using the SimGrid framework}
+\label{sec:04.02}
+
+When simulating the application in SimGrid, our objective is, as in a real
+execution, to obtain accurate results (solutions of the problem) while also
+ensuring that the tests are reproducible under the same conditions. In our
+experience, very few modifications are required to adapt an MPI program to the
+SimGrid simulator using SMPI (Simulated MPI). The first modification is to
+include the SMPI libraries and related header files (smpi.h). The second
+modification is to eliminate all global variables, either by replacing them with
+local variables or by using the SimGrid configuration option called ``runtime
+automatic switching'' (smpi/privatize\_global\_variables). Indeed, global
+variables can cause side effects at runtime between the threads that SimGrid
+runs within a single process to simulate the grid environment. \RC{Should we
+drop this sentence?}The last modification, needed only in some cases, is to
+review the order of the MPI\_Isend, MPI\_Irecv and MPI\_Waitall calls, which
+might otherwise cause an infinite loop.
+
+
+\paragraph{SimGrid simulator parameters}
+\ \\ \noindent Before running a SimGrid benchmark, several parameters of the
+computation platform must be defined. For our experiments, we consider platforms
+in which several clusters are geographically distant, so that both intra-cluster
+and inter-cluster communications occur. These parameters are described below:
+
+\begin{itemize}
+ \item hostfile: file describing the hosts.
+ \item platform: file describing the platform architecture: clusters (CPU power,
+\dots{}), intra-cluster network and inter-cluster network (bandwidth bw,
+latency lat, \dots{}).
+ \item archi: description of the computational grid (number of clusters, number
+of nodes/processors in each cluster).
+\end{itemize}
+\noindent
+In addition, the following arguments are given to the programs at runtime:
+
+\begin{itemize}
+ \item maximum number of inner and outer iterations;
+ \item inner and outer precisions;
+ \item matrix size ($N_{x}$, $N_{y}$ and $N_{z}$);
+ \item matrix diagonal value: 6.0 for synchronous Krylov multisplitting experiments and 6.2 for asynchronous block Jacobi experiments; \RC{CE, please check this; I am quoting it from memory}
+ \item execution mode: synchronous or asynchronous.
+\end{itemize}
+
+Note also that both solvers were executed with the SimGrid configuration option -{}-cfg=smpi/running\_power, which sets the computational power (here 19~GFlops) of the simulator host machine.
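+
+For illustration, a launch consistent with the parameters above might look as
+follows. This is only a sketch: the file names, the program name
+\verb|./solver| and the process count are hypothetical, the program arguments
+are left elided, and the option syntax is assumed to be that of the SimGrid
+release in which these two configuration items exist. The value
+\verb|19e9| expresses 19~GFlops in flop/s.
+\begin{verbatim}
+smpirun -np 32 -hostfile hostfile.txt -platform platform.xml \
+    --cfg=smpi/running_power:19e9 \
+    --cfg=smpi/privatize_global_variables:yes \
+    ./solver ...
+\end{verbatim}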
+
+%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\section{Experimental Results}
+\label{sec:expe}
+
+
+\subsection{Study Setup and Simulation Methodology}
+
+To conduct our study, we adopted the following methodology,
+which can be reused for any grid-enabled application.
+
+\textbf{Step 1}: Choose, together with the end users, the class of
+algorithms or the application to be tested. Numerical parallel iterative
+algorithms were chosen for the study in this paper. \\
+
+\textbf{Step 2}: Collect the software needed for the
+experiments. In our case, we have two algorithm variants for
+solving the 3D Poisson problem: (1) the classical GMRES method (Algo-1) and (2) the multisplitting method (Algo-2). In addition, the SimGrid simulator was chosen to simulate the behavior of the
+distributed applications. SimGrid runs on the Mesocentre datacenter at the University of Franche-Comt\'e, and also in a virtual machine on a laptop. \\
+
+\textbf{Step 3}: Fix the criteria that will be used to compare and
+analyze the results. Within the scope of this study, we retain
+on the one hand the algorithm execution mode (synchronous or asynchronous)
+and on the other hand the execution time and the number of iterations
+needed by the application to reach convergence. \\
+
+\textbf{Step 4}: Set up the different grid testbed environments
+that will be simulated in the simulator to run the program. The
+following architectures were configured in SimGrid: 2x16, that is, a
+grid of 2 clusters with 16 hosts (processors/cores) each, as well as 4x8,
+4x16, 8x8 and 2x50. The network was designed to operate with a
+bandwidth of 10~Gbit/s (resp. 1~Gbit/s) and a latency of $8\times10^{-6}$
+seconds (resp. $5\times10^{-5}$ seconds) for the intra-cluster links (resp.
+inter-cluster backbone links). \\
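+
+As a rough first-order illustration of what these link parameters imply, the
+time to send a message of $S$ bits over a single uncontended link can be
+approximated by a linear latency/bandwidth model. This is only a sketch: the
+symbols $T$, $S$, $\mathit{lat}$ and $\mathit{bw}$ and the message size are
+assumptions introduced here, the latencies are taken in seconds, and the
+network model actually used by SimGrid is more elaborate than this formula.
+\[
+T(S) \approx \mathit{lat} + \frac{S}{\mathit{bw}}
+\]
+For a hypothetical message of $10^{6}$ bytes, that is $S = 8\times10^{6}$ bits,
+this gives
+\[
+T_{\mathrm{intra}} \approx 8\times10^{-6} + \frac{8\times10^{6}}{10^{10}} = 8.08\times10^{-4}\ \mathrm{s},
+\qquad
+T_{\mathrm{inter}} \approx 5\times10^{-5} + \frac{8\times10^{6}}{10^{9}} = 8.05\times10^{-3}\ \mathrm{s}.
+\]
+Under these assumptions, an inter-cluster transfer of this size takes roughly
+ten times longer than an intra-cluster one.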
+
+\textbf{Step 5}: Conduct extensive and comprehensive tests
+within these configurations by varying the key parameters, especially
+the CPU power, the network parameters and the size of the
+input matrix. Note that some parameters, such as certain program input arguments, must be kept fixed to allow comparison. \\