+\subsection{Simulation of two-stage methods using the SimGrid framework}
+One of our objectives when simulating the application in SimGrid is, as in real
+life, to obtain accurate results (i.e., correct solutions of the problem) while
+also ensuring test reproducibility under the same conditions. In our experience,
+very few modifications are required to adapt an MPI program to the SimGrid
+simulator using SMPI (Simulated MPI). The first modification is to include the
+SMPI libraries and the related header file (smpi.h). The second modification is
+to suppress all global variables, either by replacing them with local variables
+or by using the SimGrid option for ``runtime automatic switching'' of global
+variables (smpi/privatize\_global\_variables). Indeed, SimGrid runs all the
+simulated MPI processes as threads of a single operating-system process, so
+global variables are shared between them and can cause side effects at
+runtime. \RC{Should we remove this sentence?}Finally, in some cases, the
+sequence of MPI\_Isend, MPI\_Irecv and MPI\_Waitall calls had to be reviewed,
+since it might cause an infinite loop.
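The following C sketch (our own illustration, not SMPI code) shows why global variables become harmful once all simulated ranks live in one process. It emulates two ranks executed one after the other in the same address space; the names \texttt{emulate\_folded\_ranks}, \texttt{rank\_state} and the counters are hypothetical. With the global counter, the second rank sees the work done by the first one, whereas the privatized per-rank copy behaves as it would in a real distributed run.

```c
/* Hypothetical solver counter kept in a global variable. When all simulated
 * ranks are folded into one OS process, they all share this single copy. */
static int global_iterations = 0;

/* Rank body written as if the rank owned the whole process. */
static int rank_body_global(int local_work)
{
    for (int k = 0; k < local_work; k++)
        global_iterations++;            /* shared by every simulated rank */
    return global_iterations;           /* value this rank observes */
}

/* Privatized variant: the counter lives in per-rank storage. */
struct rank_state {
    int iterations;
};

static int rank_body_private(struct rank_state *st, int local_work)
{
    for (int k = 0; k < local_work; k++)
        st->iterations++;               /* touches only this rank's copy */
    return st->iterations;
}

/* Emulate the simulator running ranks 0..nranks-1 one after another in the
 * same address space; seen_global / seen_private receive what each rank
 * observes with the global and the privatized counter respectively. */
void emulate_folded_ranks(int nranks, int local_work,
                          int *seen_global, int *seen_private)
{
    global_iterations = 0;
    for (int r = 0; r < nranks; r++) {
        struct rank_state st = {0};     /* fresh private state per rank */
        seen_global[r]  = rank_body_global(local_work);
        seen_private[r] = rank_body_private(&st, local_work);
    }
}
```

For example, with two ranks performing 10 iterations each, the global version reports 10 for rank 0 and 20 for rank 1, while the privatized version reports 10 for both ranks.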
+\paragraph{SimGrid simulator parameters}
+\ \\ \noindent Before running a SimGrid benchmark, several parameters describing
+the computing platform must be defined. For our experiments, we consider
+platforms in which several clusters are geographically distant, so that
+communications are either intra-cluster or inter-cluster. These parameters are
+described below:
+\begin{itemize}
+ \item hostfile: host description file;
+ \item platform: file describing the platform architecture: clusters (CPU power,
+\dots{}), intra-cluster network description, inter-cluster network (bandwidth bw,
+latency lat, \dots{});
+ \item archi: description of the computing grid (number of clusters, number of
+nodes/processors in each cluster).
+\end{itemize}
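To illustrate how the archi description relates to the hostfile, the C sketch below parses an architecture string written as \texttt{CxH} (number of clusters times hosts per cluster, e.g.\ \texttt{2x16}) and writes one host name per line. The host naming scheme \texttt{c<cluster>-<host>.grid} and the function names are hypothetical; in practice, the names must match those declared in the platform file.

```c
#include <stdio.h>

/* Parse an architecture string "CxH" (e.g. "2x16") into the number of
 * clusters C and hosts per cluster H. Returns 0 on success, -1 otherwise. */
int parse_archi(const char *archi, int *clusters, int *hosts_per_cluster)
{
    if (sscanf(archi, "%dx%d", clusters, hosts_per_cluster) != 2)
        return -1;
    return (*clusters > 0 && *hosts_per_cluster > 0) ? 0 : -1;
}

/* Write one host name per line for every host of the grid. The naming
 * scheme "c<cluster>-<host>.grid" is hypothetical and must match the host
 * names declared in the platform file. Returns the number of hosts written,
 * or -1 on a write error. */
int write_hostfile(FILE *out, int clusters, int hosts_per_cluster)
{
    int count = 0;
    for (int c = 0; c < clusters; c++) {
        for (int h = 0; h < hosts_per_cluster; h++) {
            if (fprintf(out, "c%d-%d.grid\n", c, h) < 0)
                return -1;
            count++;
        }
    }
    return count;
}
```

Parsing \texttt{2x50}, for instance, yields 2 clusters of 50 hosts, and the generated hostfile then lists 100 hosts.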
+In addition, the following arguments are given to the programs at runtime:
+\begin{itemize}
+ \item maximum number of inner and outer iterations;
+ \item inner and outer precisions;
+ \item matrix size ($N_{x}$, $N_{y}$ and $N_{z}$);
+ \item matrix diagonal value: 6.0 for the synchronous Krylov multisplitting
+experiments and 6.2 for the asynchronous block Jacobi experiments;
+ \item execution mode: synchronous or asynchronous.
+\end{itemize}
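A program receiving these runtime arguments might collect them as in the C sketch below. The argument order (\texttt{max\_outer max\_inner outer\_eps inner\_eps nx ny nz diag sync|async}), the structure \texttt{run\_params} and the function names are assumptions made for illustration only; the actual solvers may read their parameters differently.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the runtime arguments listed above. */
struct run_params {
    long max_outer_iter, max_inner_iter;   /* iteration limits */
    double outer_eps, inner_eps;           /* outer and inner precisions */
    long nx, ny, nz;                       /* matrix size N_x, N_y, N_z */
    double diag;                           /* matrix diagonal value */
    int asynchronous;                      /* 0 = synchronous, 1 = asynchronous */
};

/* Parse a strictly positive integer; returns 0 on success, -1 otherwise. */
static int parse_long(const char *s, long *out)
{
    char *end;
    *out = strtol(s, &end, 10);
    return (end != s && *end == '\0' && *out > 0) ? 0 : -1;
}

/* Parse a strictly positive real; returns 0 on success, -1 otherwise. */
static int parse_double(const char *s, double *out)
{
    char *end;
    *out = strtod(s, &end);
    return (end != s && *end == '\0' && *out > 0.0) ? 0 : -1;
}

/* Parse argv in the assumed order:
 *   max_outer max_inner outer_eps inner_eps nx ny nz diag sync|async
 * Returns 0 on success, -1 on a missing or malformed argument. */
int parse_run_params(int argc, char **argv, struct run_params *p)
{
    if (argc != 10)
        return -1;
    if (parse_long(argv[1], &p->max_outer_iter) ||
        parse_long(argv[2], &p->max_inner_iter) ||
        parse_double(argv[3], &p->outer_eps) ||
        parse_double(argv[4], &p->inner_eps) ||
        parse_long(argv[5], &p->nx) ||
        parse_long(argv[6], &p->ny) ||
        parse_long(argv[7], &p->nz) ||
        parse_double(argv[8], &p->diag))
        return -1;
    if (strcmp(argv[9], "sync") == 0)
        p->asynchronous = 0;
    else if (strcmp(argv[9], "async") == 0)
        p->asynchronous = 1;
    else
        return -1;                         /* unknown execution mode */
    return 0;
}
```

Grouping the arguments in one structure keeps the solver code independent of how the values are supplied (command line, file or SMPI launcher).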
+It should also be noted that both solvers were executed with the SimGrid option
+\texttt{-{}-cfg=smpi/running\_power}, which sets the computational power of the
+host machine running the simulator (here 19~GFlops).
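As we understand this option, SMPI measures the time each computation burst takes on the machine running the simulation, converts it into an amount of floating-point operations using \texttt{smpi/running\_power}, and replays that amount of work on the simulated host at the speed declared in the platform file. The small C function below sketches this conversion; its name and parameters are hypothetical.

```c
/* Convert a computation burst measured on the machine running the
 * simulation into its duration on a simulated host.
 *   host_seconds    : wall-clock time measured for the burst
 *   running_power   : flop/s of the simulating machine (smpi/running_power)
 *   simulated_speed : flop/s declared for the simulated host */
double simulated_burst_time(double host_seconds, double running_power,
                            double simulated_speed)
{
    double flops = host_seconds * running_power;   /* amount of work in flop */
    return flops / simulated_speed;                /* simulated seconds */
}
```

For instance, a burst measured at 0.5~s on the 19~GFlops simulating machine corresponds to $9.5\times10^{9}$ flop, i.e.\ 9.5~s on a hypothetical simulated host of 1~GFlops.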
+\section{Experimental Results}
+In this section, experiments for both multisplitting algorithms are reported. First, the problem used in our experiments is described.
+\subsection{Study setup and Simulation Methodology}
+To conduct our study, we propose the following methodology,
+which can be reused for any grid-enabled application.\\
+\textbf{Step 1}: Choose, together with the end users, the class of algorithms or
+the application to be tested. Parallel iterative numerical algorithms
+were chosen for the study in this paper. \\
+\textbf{Step 2}: Collect the software materials needed for the
+experimentation. In our case, we have two algorithm variants for solving
+the 3D Poisson problem: (1) the classical GMRES method and (2) the multisplitting method. In addition, the SimGrid simulator was chosen to simulate the behavior of the
+distributed applications. SimGrid runs both on the Mesocentre datacenter of the University of Franche-Comt\'e and in a virtual machine on a laptop. \\
+\textbf{Step 3}: Define the criteria that will be used to compare and
+analyze the results. In the scope of this study, we retain,
+on the one hand, the algorithm execution mode (synchronous or asynchronous)
+and, on the other hand, the execution time and the number of iterations needed to reach convergence. \\
+\textbf{Step 4}: Set up the different grid testbed environments that will be
+simulated in the simulator tool to run the program. The following architectures
+have been configured in SimGrid: $2\times16$, $4\times8$, $4\times16$, $8\times8$ and $2\times50$. The first number
+is the number of clusters in the grid and the second number is
+the number of hosts (processors/cores) in each cluster. The network has been
+configured with a bandwidth of 10~Gbit/s (resp. 1~Gbit/s) and a
+latency of $8\times10^{-6}$~s (resp. $5\times10^{-5}$~s) for the intra-cluster links
+(resp. inter-cluster backbone links). \\
+\textbf{Step 5}: Conduct extensive and comprehensive tests
+on these configurations by varying the key parameters, especially
+the CPU power, the network parameters and the size of the
+input data. \\
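To give an order of magnitude of the gap between the two kinds of links configured in Step 4, the C sketch below applies a simple first-order model, transfer time $=$ latency $+$ size$/$bandwidth, with those link parameters. This is a deliberate simplification chosen for illustration, not SimGrid's network model, which additionally accounts for effects such as TCP behavior and contention; the names are hypothetical.

```c
/* Parameters of a network link. */
struct link_params {
    double bandwidth_bps;   /* bandwidth in bit/s */
    double latency_s;       /* latency in seconds */
};

/* Link parameters from Step 4: 10 Gbit/s and 8e-6 s inside a cluster,
 * 1 Gbit/s and 5e-5 s on the inter-cluster backbone. */
const struct link_params INTRA_CLUSTER = { 10e9, 8e-6 };
const struct link_params INTER_CLUSTER = { 1e9, 5e-5 };

/* First-order estimate (latency + size / bandwidth) of the time needed to
 * send `bytes` bytes over `link`. Not SimGrid's model: it ignores TCP
 * effects, contention and multi-hop routes. */
double transfer_time(struct link_params link, double bytes)
{
    return link.latency_s + (8.0 * bytes) / link.bandwidth_bps;
}
```

With these values, a hypothetical 1~MB ($10^{6}$ bytes) message takes about 0.81~ms on an intra-cluster link and about 8.05~ms on the inter-cluster backbone, i.e.\ roughly ten times longer.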