+\subsection{Simulation of two-stage methods using the SimGrid framework}
+\label{sec:04.02}
+
+One of our objectives when simulating the application in SimGrid is, as in real
+life, to get accurate results (solutions of the problem) but also to ensure the
+reproducibility of the tests under the same conditions. According to our
+experience, very few modifications are required to adapt an MPI program to the
+SimGrid simulator using SMPI (Simulated MPI). The first modification is to
+include the SMPI libraries and related header files (smpi.h). The second
+modification is to suppress all global variables, either by replacing them with
+local variables or by using a SimGrid selector called "runtime automatic
+switching" (smpi/privatize\_global\_variables). Indeed, global variables can
+generate side effects between the threads that SimGrid runs within the same
+process to simulate the grid environment. \RC{Should we drop this sentence?} The
+last modification, needed in some cases, is to review the sequence of the
+MPI\_Isend, MPI\_Irecv and MPI\_Waitall instructions, which might otherwise
+cause an infinite loop.
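+
+As an illustration of these adaptations, the following minimal sketch (a
+hypothetical halo-exchange kernel, not the actual code of our solvers) shows a
+global buffer turned into local storage and a single
+MPI\_Irecv/MPI\_Isend/MPI\_Waitall sequence per exchange phase, which avoids the
+ordering issue mentioned above.
+
+\begin{verbatim}
+/* Hypothetical halo-exchange kernel adapted for SMPI. */
+#include <mpi.h>
+/* #include <smpi.h>  -- only needed if SMPI-specific calls are used */
+
+/* double halo[2][N];   a global buffer like this would be shared by the
+   threads SimGrid runs within one simulated process; make it local
+   instead (or rely on smpi/privatize_global_variables). */
+
+static void exchange_halos(double *send_lo, double *send_hi,
+                           double *recv_lo, double *recv_hi, int n,
+                           int lo, int hi, MPI_Comm comm)
+{
+  MPI_Request req[4];
+  /* Post all receives, then all sends, then wait once for the whole
+     phase; interleaving waits between the posts is what may lead to
+     a deadlock or an infinite loop under simulation. */
+  MPI_Irecv(recv_lo, n, MPI_DOUBLE, lo, 0, comm, &req[0]);
+  MPI_Irecv(recv_hi, n, MPI_DOUBLE, hi, 1, comm, &req[1]);
+  MPI_Isend(send_lo, n, MPI_DOUBLE, lo, 1, comm, &req[2]);
+  MPI_Isend(send_hi, n, MPI_DOUBLE, hi, 0, comm, &req[3]);
+  MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
+}
+\end{verbatim}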
+
+
+\paragraph{SimGrid simulator parameters}
+\ \\ \noindent Before running a SimGrid benchmark, many parameters of the
+computation platform must be defined. For our experiments, we consider platforms
+in which several clusters are geographically distant, so there are both intra-
+and inter-cluster communications. These parameters are described below:
+
+\begin{itemize}
+ \item hostfile: file describing the hosts.
+ \item platform: file describing the platform architecture: clusters (CPU
+power, \dots{}), intra-cluster network description, inter-cluster network
+(bandwidth bw, latency lat, \dots{}).
+ \item archi: description of the grid computation (number of clusters, number
+of nodes/processors in each cluster).
+\end{itemize}
+\noindent
+In addition, the following arguments are given to the programs at runtime (a
+sketch of how they could be parsed is given after the list):
+
+\begin{itemize}
+ \item maximum number of inner and outer iterations;
+ \item inner and outer precisions;
+ \item matrix size ($N_x$, $N_y$ and $N_z$);
+ \item matrix diagonal value: 6.0 for the synchronous Krylov multisplitting experiments and 6.2 for the asynchronous block Jacobi experiments; \RC{CE, please check, I am saying this from memory}
+ \item execution mode: synchronous or asynchronous.
+\end{itemize}
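+
+For concreteness, a possible way to read these runtime arguments is sketched
+below; the argument names, order and structure are purely illustrative, as they
+are not fixed by the description above.
+
+\begin{verbatim}
+/* Illustrative only: the actual argument order used by the solvers is
+   not specified here.  Error checking is omitted. */
+#include <stdlib.h>
+#include <string.h>
+
+typedef struct {
+  int    max_inner, max_outer;   /* maximum inner/outer iterations     */
+  double eps_inner, eps_outer;   /* inner/outer precisions             */
+  int    nx, ny, nz;             /* matrix size Nx, Ny, Nz             */
+  double diag;                   /* matrix diagonal value (6.0 or 6.2) */
+  int    async;                  /* 0: synchronous, 1: asynchronous    */
+} run_params;
+
+static run_params parse_args(char **argv)
+{
+  run_params p;
+  p.max_inner = atoi(argv[1]);  p.max_outer = atoi(argv[2]);
+  p.eps_inner = atof(argv[3]);  p.eps_outer = atof(argv[4]);
+  p.nx = atoi(argv[5]);  p.ny = atoi(argv[6]);  p.nz = atoi(argv[7]);
+  p.diag  = atof(argv[8]);
+  p.async = (strcmp(argv[9], "async") == 0);
+  return p;
+}
+\end{verbatim}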
+
+It should also be noticed that both solvers have been executed with the SimGrid selector -cfg=smpi/running\_power which determines the computational power (here 19 GFlops) of the simulator host machine.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\section{Experimental Results}
+\label{sec:expe}
+
+In this section, experiments for both Multisplitting algorithms are reported. First, the problem used in our experiments is described.
+
+We use our two-stage algorithms to solve the well-known Poisson problem $\nabla^2\phi=f$~\cite{Polyanin01}. In three-dimensional Cartesian coordinates in $\mathbb{R}^3$, the problem takes the following form
+\begin{equation}
+\frac{\partial^2}{\partial x^2}\phi(x,y,z)+\frac{\partial^2}{\partial y^2}\phi(x,y,z)+\frac{\partial^2}{\partial z^2}\phi(x,y,z)=f(x,y,z)\mbox{~in the domain~}\Omega
+\label{eq:07}
+\end{equation}
+such that
+\begin{equation*}
+\phi(x,y,z)=0\mbox{~on the boundary~}\partial\Omega
+\end{equation*}
+where the real-valued function $\phi(x,y,z)$ is the solution sought, $f(x,y,z)$ is a known function and $\Omega=[0,1]^3$. The 3D discretization of the Laplace operator $\nabla^2$ with the finite difference scheme leads to a 7-point stencil on the computational grid. Discretizing each second derivative with central differences and solving for the central unknown, the numerical approximation of the Poisson problem on the three-dimensional grid is repeatedly computed as $\phi=\phi^\star$ such that
+\begin{equation}
+\begin{array}{ll}
+\phi^\star(x,y,z)= & \frac{1}{6}(\phi(x-h,y,z)+\phi(x+h,y,z) \\
+ & +\phi(x,y-h,z)+\phi(x,y+h,z) \\
+ & +\phi(x,y,z-h)+\phi(x,y,z+h)\\
+ & -h^2f(x,y,z))
+\end{array}
+\label{eq:08}
+\end{equation}
+until convergence, where $h$ is the grid spacing between two adjacent elements of the 3D computational grid.
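+
+A sequential sketch of one sweep of this update is given below; the array
+layout and names are ours, for illustration only, and the boundary values are
+kept at zero according to the Dirichlet condition.
+
+\begin{verbatim}
+/* One sweep of update (8) on an n x n x n grid with spacing h.
+   Layout and names are illustrative; boundary points keep phi = 0. */
+void jacobi_sweep(const double *phi, double *phi_new,
+                  const double *f, int n, double h)
+{
+  for (int i = 1; i < n - 1; i++)
+    for (int j = 1; j < n - 1; j++)
+      for (int k = 1; k < n - 1; k++) {
+        int c = (i * n + j) * n + k;                   /* (x, y, z)    */
+        phi_new[c] = (phi[c - n * n] + phi[c + n * n]  /* x - h, x + h */
+                    + phi[c - n]     + phi[c + n]      /* y - h, y + h */
+                    + phi[c - 1]     + phi[c + 1]      /* z - h, z + h */
+                    - h * h * f[c]) / 6.0;
+      }
+}
+\end{verbatim}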
+
+In the parallel context, the 3D Poisson problem is partitioned into $L\times p$ sub-problems such that $L$ is the number of clusters and $p$ is the number of processors in each cluster. We apply a three-dimensional partitioning instead of a row-by-row one in order to reduce the amount of data shared at the sub-problem boundaries. In this case, each processor is in charge of a parallelepipedic sub-problem and has at most six neighbors, in the same cluster or in distant clusters, with which it shares data at the boundaries.
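+
+A possible way to set up this partitioning, sketched below with MPI's Cartesian
+topology routines (not necessarily the implementation used in our solvers),
+builds a 3D process grid and retrieves the at most six neighbors of each
+processor.
+
+\begin{verbatim}
+/* Sketch of the 3D partitioning using MPI Cartesian topologies. */
+#include <mpi.h>
+
+void build_3d_topology(MPI_Comm comm, MPI_Comm *cart,
+                       int nbr[6])    /* -x, +x, -y, +y, -z, +z */
+{
+  int size, dims[3] = {0, 0, 0}, periods[3] = {0, 0, 0};
+  MPI_Comm_size(comm, &size);
+  MPI_Dims_create(size, 3, dims);     /* balanced 3D process grid */
+  MPI_Cart_create(comm, 3, dims, periods, 0, cart);
+  MPI_Cart_shift(*cart, 0, 1, &nbr[0], &nbr[1]);   /* x neighbors */
+  MPI_Cart_shift(*cart, 1, 1, &nbr[2], &nbr[3]);   /* y neighbors */
+  MPI_Cart_shift(*cart, 2, 1, &nbr[4], &nbr[5]);   /* z neighbors */
+  /* Processors on the domain boundary get MPI_PROC_NULL for the
+     missing sides, hence "at most" six neighbors. */
+}
+\end{verbatim}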
+
+\subsection{Study setup and simulation methodology}
+
+First, to conduct our study, we propose the following methodology
+which can be reused for any grid-enabled application.\\
+
+\textbf{Step 1}: Choose with the end users the class of algorithms or
+the application to be tested. Numerical parallel iterative algorithms
+have been chosen for the study in this paper. \\
+
+\textbf{Step 2}: Collect the software materials needed for the
+experimentation. In our case, we have two variants of algorithms for the
+resolution of the 3D Poisson problem: (1) the classical GMRES method; (2) the Multisplitting method. In addition, the SimGrid simulator has been chosen to simulate the behavior of the
+distributed applications. SimGrid runs on the Mesocentre datacenter of the University of Franche-Comte and also in a virtual machine on a laptop. \\
+
+\textbf{Step 3}: Fix the criteria which will be used for the future
+results comparison and analysis. In the scope of this study, we retain
+on the one hand the algorithm execution mode (synchronous or asynchronous)
+and on the other hand the execution time and the number of iterations needed to reach convergence. \\
+
+\textbf{Step 4}: Set up the different grid testbed environments that will be
+simulated in the simulator tool to run the program. The following architectures
+have been configured in SimGrid: 2$\times$16, 4$\times$8, 4$\times$16, 8$\times$8 and 2$\times$50. The first number
+represents the number of clusters in the grid and the second number represents
+the number of hosts (processors/cores) in each cluster. The network has been
+designed to operate with a bandwidth of 10 Gbit/s (resp. 1 Gbit/s) and a
+latency of $8\times10^{-6}$ seconds (resp. $5\times10^{-5}$ seconds) for the intra-cluster links
+(resp. the inter-cluster backbone links). \\
+
+\textbf{Step 5}: Conduct extensive and comprehensive testing
+within these configurations by varying the key parameters, especially
+the CPU power capacity, the network parameters and also the size of the
+input data. \\