-Parallelization of such algorithms generally involve the division of the problem into several \emph{blocks} that will
-be solved in parallel on multiple processing units. The latter will communicate each intermediate results before a new
-iteration starts and until the approximate solution is reached. These parallel computations can be performed either in
-\emph{synchronous} mode where a new iteration begins only when all nodes communications are completed,
-or in \emph{asynchronous} mode where processors can continue independently with few or no synchronization points. For
-instance in the \textit{Asynchronous Iterations~-- Asynchronous Communications (AIAC)} model~\cite{bcvc06:ij}, local
-computations do not need to wait for required data. Processors can then perform their iterations with the data present
-at that time. Even if the number of iterations required before the convergence is generally greater than for the
-synchronous case, AIAC algorithms can significantly reduce overall execution times by suppressing idle times due to
-synchronizations especially in a grid computing context (see~\cite{Bahi07} for more details).
-
-Parallel (synchronous or asynchronous) applications may have different
-configuration and deployment requirements. Quantifying their resource
-allocation policies and application scheduling algorithms in grid computing
-environments under varying load, CPU power and network speeds is very costly,
-very labor intensive and very time
-consuming~\cite{Calheiros:2011:CTM:1951445.1951450}. The case of AIAC
-algorithms is even more problematic since they are very sensible to the
-execution environment context. For instance, variations in the network bandwidth
-(intra and inter-clusters), in the number and the power of nodes, in the number
-of clusters\dots{} can lead to very different number of iterations and so to
-very different execution times. Then, it appears that the use of simulation
-tools to explore various platform scenarios and to run large numbers of
-experiments quickly can be very promising. In this way, the use of a simulation
-environment to execute parallel iterative algorithms found some interests in
-reducing the highly cost of access to computing resources: (1) for the
-applications development life cycle and in code debugging (2) and in production
-to get results in a reasonable execution time with a simulated infrastructure
-not accessible with physical resources. Indeed, the launch of distributed
-iterative asynchronous algorithms to solve a given problem on a large-scale
-simulated environment challenges to find optimal configurations giving the best
-results with a lowest residual error and in the best of execution time.
-
-To our knowledge, there is no existing work on the large-scale simulation of a
-real AIAC application. There are {\bf two contributions} in this paper. First we give a first
-approach of the simulation of AIAC algorithms using a simulation tool (i.e. the
-SimGrid toolkit~\cite{SimGrid}). Second, we confirm the effectiveness of the
-asynchronous multisplitting algorithm by comparing its performance with the synchronous
-GMRES. More precisely, we had implemented a program for solving large
-linear system of equations by numerical method GMRES (Generalized
-Minimal Residual) \cite{ref1}. We show, that with minor modifications of the
-initial MPI code, the SimGrid toolkit allows us to perform a test campaign of a
-real AIAC application on different computing architectures. The simulated
-results we obtained are in line with real results exposed in ??\AG[]{ref?}.
-SimGrid had allowed us to launch the application from a modest computing
-infrastructure by simulating different distributed architectures composed by
-clusters nodes interconnected by variable speed networks. With selected
-parameters on the network platforms (bandwidth, latency of inter cluster
-network) and on the clusters architecture (number, capacity calculation power)
-in the simulated environment, the experimental results have demonstrated not
-only the algorithm convergence within a reasonable time compared with the
-physical environment performance, but also a time saving of up to \np[\%]{40} in
-asynchronous mode.
-\AG{The previous sentence should be revised (split it in two?). As written, it
- may give the impression that the \np[\%]{40} gain is between a real execution
- and a simulated one!}
-
-This article is structured as follows: after this introduction, the next section will give a brief description of
-iterative asynchronous model. Then, the simulation framework SimGrid is presented with the settings to create various
-distributed architectures. The algorithm of the multisplitting method used by GMRES \LZK{??? GMRES does not use the multisplitting method! Otherwise, shouldn't we explain the choice of a multisplitting method?} written with MPI primitives and
-its adaptation to SimGrid with SMPI (Simulated MPI) is detailed in the next section. At last, the experiments results
-carried out will be presented before some concluding remarks and future works.
+Parallelization of such algorithms generally involves dividing the problem
+into several \emph{blocks} that are solved in parallel on multiple
+processing units. These units exchange their intermediate results before each
+new iteration starts, until the approximate solution is reached. These
+parallel computations can be performed either in \emph{synchronous} mode, where
+a new iteration begins only when all communications between nodes are
+completed, or in \emph{asynchronous} mode, where processors can continue
+independently with no synchronization points~\cite{bcvc06:ij}. In the
+asynchronous case, local computations do not need to wait for required data:
+processors perform their iterations with the data available at that time. Even
+if the number of iterations required to reach convergence is generally greater
+than in the synchronous case, asynchronous iterative algorithms can
+significantly reduce overall execution times by suppressing idle times due to
+synchronizations, especially in a grid computing context (see~\cite{Bahi07} for
+more details).
+
+Parallel applications based on a (synchronous or asynchronous) iteration model
+may have different configuration and deployment requirements. Quantifying the
+performance of their resource allocation policies and application scheduling
+algorithms in grid computing environments under varying load, CPU power and
+network speeds is very costly, labor intensive and time
+consuming~\cite{Calheiros:2011:CTM:1951445.1951450}. The case of asynchronous
+iterative algorithms is even more problematic since they are very sensitive to
+the execution environment. For instance, variations in the network bandwidth
+(intra- and inter-cluster), in the number and the power of nodes, or in the
+number of clusters can lead to very different numbers of iterations and thus to
+very different execution times. Simulation tools, which make it possible to
+explore various platform scenarios and to run large numbers of experiments
+quickly, therefore appear very promising. Using a simulation environment to
+execute parallel iterative algorithms helps reduce the high cost of access to
+computing resources: (1) during the application development life cycle and code
+debugging, and (2) in production, to get results in a reasonable execution time
+with a simulated infrastructure that is not accessible as physical resources.
+Indeed, launching distributed asynchronous iterative algorithms to solve a
+given problem on a large-scale simulated environment raises the challenge of
+finding the optimal configurations, that is, those giving the lowest residual
+error in the shortest execution time.
+
+
+To our knowledge, there is no existing work on the large-scale simulation of a
+real asynchronous iterative application. {\bf The contribution of the present
+  paper can be summarized in two main points}. First, we give a first approach
+to the simulation of asynchronous iterative algorithms using a simulation tool
+(i.e. the SimGrid toolkit~\cite{SimGrid}). Second, we confirm the
+effectiveness of the asynchronous multisplitting algorithm by comparing its
+performance with the synchronous GMRES (Generalized Minimal Residual)
+method~\cite{ref1}. Both codes can be used to solve large linear systems. In
+this paper, we focus on a 3D Poisson problem. We show that, with minor
+modifications of the initial MPI code, the SimGrid toolkit allows us to perform
+a test campaign of a real asynchronous iterative application on different
+computing architectures.
+% The simulated results we
+%obtained are in line with real results exposed in ??\AG[]{ref?}.
+SimGrid allowed us to launch the application from a modest computing
+infrastructure by simulating different distributed architectures composed of
+clusters of nodes interconnected by networks of variable speed. The parameters
+of the network platform are the bandwidth and the latency of the inter-cluster
+network. The parameters of the cluster architecture are the number of machines
+and the computation power of a machine. Simulations show that the asynchronous
+multisplitting algorithm can solve the 3D Poisson problem approximately twice
+as fast as GMRES with two distant clusters. In this way, we present an original
+solution to optimize the use of a simulation tool to efficiently run an
+asynchronous iterative parallel algorithm in a grid architecture.
+
+
+
+This article is structured as follows: after this introduction, the next section
+gives a brief description of the asynchronous iterative model. Then, the
+simulation framework SimGrid is presented with the settings used to create
+various distributed architectures. Next, the multisplitting method is
+presented; it relies on GMRES to solve each block obtained from the splitting.
+This code is written with MPI primitives, and its adaptation to SimGrid with
+SMPI (Simulated MPI) is detailed in the following section. Finally, the
+simulation results are presented before some concluding remarks and future
+work.
+