\section{Introduction}
-Parallel computing and high performance computing (HPC) are becoming
-more and more imperative for solving various problems raised by
-researchers on various scientific disciplines but also by industrial in
-the field. Indeed, the increasing complexity of these requested
-applications combined with a continuous increase of their sizes lead to
-write distributed and parallel algorithms requiring significant hardware
-resources (grid computing, clusters, broadband network, etc.) but
-also a non-negligible CPU execution time. We consider in this paper a
-class of highly efficient parallel algorithms called iterative executed
-in a distributed environment. As their name suggests, these algorithm
-solves a given problem that might be NP-complete complex by successive
-iterations ($X_{n +1} = f(X_{n})$) from an initial value $X_{0}$ to find
-an approximate value $X^*$ of the solution with a very low
-residual error. Several well-known methods demonstrate the convergence
-of these algorithms. Generally, to reduce the complexity and the
-execution time, the problem is divided into several \emph{pieces} that will
-be solved in parallel on multiple processing units. The latter will
-communicate each intermediate results before a new iteration starts
-until the approximate solution is reached. These distributed parallel
-computations can be performed either in \emph{synchronous} communication mode
-where a new iteration begin only when all nodes communications are
-completed, either \emph{asynchronous} mode where processors can continue
-independently without or few synchronization points. Despite the
-effectiveness of iterative approach, a major drawback of the method is
-the requirement of huge resources in terms of computing capacity,
-storage and high speed communication network. Indeed, limited physical
-resources are blocking factors for large-scale deployment of parallel
-algorithms.
-
-In recent years, the use of a simulation environment to execute parallel
-iterative algorithms found some interests in reducing the highly cost of
-access to computing resources: (1) for the applications development life
-cycle and in code debugging (2) and in production to get results in a
-reasonable execution time with a simulated infrastructure not accessible
-with physical resources. Indeed, the launch of distributed iterative
-asynchronous algorithms to solve a given problem on a large-scale
-simulated environment challenges to find optimal configurations giving
-the best results with a lowest residual error and in the best of
-execution time. According our knowledge, no testing of large-scale
-simulation of the class of algorithm solving to achieve real results has
-been undertaken to date. We had in the scope of this work implemented a
-program for solving large non-symmetric linear system of equations by
-numerical method GMRES (Generalized Minimal Residual) in the simulation
-environment SimGrid. The simulated platform had allowed us to launch
-the application from a modest computing infrastructure by simulating
-different distributed architectures composed by clusters nodes
-interconnected by variable speed networks. In addition, it has been
-permitted to show the effectiveness of asynchronous mode algorithm by
-comparing its performance with the synchronous mode time. With selected
-parameters on the network platforms (bandwidth, latency of inter cluster
-network) and on the clusters architecture (number, capacity calculation
-power) in the simulated environment, the experimental results have
-demonstrated not only the algorithm convergence within a reasonable time
-compared with the physical environment performance, but also a time
+Parallel computing and high performance computing (HPC) are becoming increasingly essential for solving problems
+raised by researchers in many scientific disciplines as well as by industry. Indeed, the growing complexity of these
+applications, combined with the continuous increase in their size, leads to distributed and parallel algorithms that
+require significant hardware resources (grid computing, clusters, broadband networks, etc.) as well as a
+non-negligible CPU execution time. In this paper, we consider a class of highly efficient
+parallel algorithms called \texttt{numerical iterative algorithms}, executed in a distributed environment. As their name
+suggests, these algorithms solve a given problem by successive iterations ($X_{n+1} = f(X_{n})$) from an initial value
+$X_{0}$ to find an approximate value $X^*$ of the solution with a very low residual error. Several well-known methods
+demonstrate the convergence of these algorithms \cite{}.
+
+Parallelization of such algorithms generally involves dividing the problem into several \emph{pieces} that are
+solved in parallel on multiple processing units. These units exchange their intermediate results before each new
+iteration starts, until the approximate solution is reached. These parallel computations can be performed
+either in \emph{synchronous} mode, where a new iteration begins only when all node communications are
+completed, or in \emph{asynchronous} mode, where processors can continue independently with few or no synchronization
+points.
+
+% DL: resume corrections here
+Despite the effectiveness of the iterative approach, a major drawback of the method is its requirement for huge
+resources in terms of computing capacity, storage and high-speed communication networks. Indeed, limited physical
+resources are a blocking factor for the large-scale deployment of parallel algorithms.
+
+In recent years, the use of simulation environments to execute parallel iterative algorithms has attracted interest as
+a way to reduce the high cost of access to computing resources: (1) during the application development life cycle and
+code debugging, and (2) in production, to get results in a reasonable execution time with a simulated infrastructure
+that is not accessible as physical resources. Indeed, launching distributed asynchronous iterative algorithms to solve
+a given problem on a large-scale simulated environment raises the challenge of finding optimal configurations that give
+the best results, with the lowest residual error and the shortest execution time. To our knowledge, no large-scale
+simulation of this class of algorithms producing real results has been undertaken to date. In the scope
+of this work, we implemented a program that solves large non-symmetric linear systems of equations with the numerical
+method GMRES (Generalized Minimal Residual) in the simulation environment SimGrid. The simulated platform allowed us to
+launch the application from a modest computing infrastructure by simulating different distributed architectures
+composed of clusters of nodes interconnected by variable-speed networks. In addition, it allowed us to show the
+effectiveness of the asynchronous mode of the algorithm by comparing its performance with that of the synchronous mode.
+With selected parameters for the network (bandwidth and latency of the inter-cluster network) and for the cluster
+architecture (number of clusters, computing power) in the simulated environment, the experimental results demonstrated
+not only the convergence of the algorithm within a reasonable time compared with the performance of the physical
+environment, but also a time
saving of up to \np[\%]{40} in asynchronous mode.
-This article is structured as follows: after this introduction, the next
-section will give a brief description of iterative asynchronous model.
-Then, the simulation framework SimGrid will be presented with the
-settings to create various distributed architectures. The algorithm of
-the multi-splitting method used by GMRES written with MPI primitives
-and its adaptation to SimGrid with SMPI (Simulated MPI) will be in the
-next section. At last, the experiments results carried out will be
-presented before the conclusion which we will announce the opening of
-our future work after the results.
+This article is structured as follows: after this introduction, the next section gives a brief description of
+the asynchronous iteration model. Then, the simulation framework SimGrid is presented, along with the settings used to
+create various distributed architectures. The following section presents the algorithm of the multi-splitting method
+used by GMRES, written with MPI primitives, and its adaptation to SimGrid with SMPI (Simulated MPI). Finally, the
+experimental results are presented before the conclusion, which outlines our future work.
-\section{The asynchronous iteration model}
-
-As exposed in the introduction, parallel iterative methods are now
-widely used in many scientific domains. They can be classified in three main classes
-depending on how iterations and communications are managed (for more
-details readers can refer to \cite{bcvc02:ip}). In the
-\textit{Synchronous Iterations - Synchronous Communications (SISC)}
-model data are exchanged at the end of each iteration. All the
-processors must begin the same iteration at the same time and
-important idle times on processors are generated. The
-\textit{Synchronous Iterations - Asynchronous Communications (SIAC)}
-model can be compared to the previous one except that data required on
-another processor are sent asynchronously i.e. without stopping
-current computations. This technique allows to partially overlap
-communications by computations but unfortunately, the overlapping is
-only partial and important idle times remain. It is clear that, in a
-grid computing context, where the number of computational nodes is large,
-heterogeneous and widely distributed, the idle times generated by
-synchronizations are very penalizing. One way to overcome this problem
-is to use the \textit{Asynchronous Iterations - Asynchronous
- Communications (AIAC)} model. Here, local computations do not need
-to wait for required data. Processors can then perform their
-iterations with the data present at that time. Figure \ref{fig:aiac}
-illustrates this model where the grey blocks represent the computation
-phases, the white spaces the idle times and the arrows the
-communications. With this algorithmic model, the number of iterations
-required before the convergence is generally greater than for the two
-former classes. But, and as detailed in \cite{bcvc06:ij}, AIAC
-algorithms can significantly reduce overall execution times by
-suppressing idle times due to synchronizations especially in a grid
-computing context.
+\section{Motivations and scientific context}
+
+As explained in the introduction, parallel iterative methods are now widely used in many scientific domains. They can be
+classified into three main classes depending on how iterations and communications are managed (for more details, readers
+can refer to \cite{bcvc02:ip}). In the \textit{Synchronous Iterations - Synchronous Communications (SISC)} model, data
+are exchanged at the end of each iteration. All the processors must begin the same iteration at the same time, which
+generates significant idle times on processors. The \textit{Synchronous Iterations - Asynchronous Communications
+(SIAC)} model is similar to the previous one, except that data required by another processor are sent asynchronously,
+i.e. without stopping current computations. This technique allows communications to be overlapped with computations,
+but unfortunately the overlap is only partial and significant idle times remain. It is clear that, in a grid
+computing context, where the computational nodes are numerous, heterogeneous and widely distributed, the idle
+times generated by synchronizations are very penalizing. One way to overcome this problem is to use the
+\textit{Asynchronous Iterations - Asynchronous Communications (AIAC)} model. Here, local computations do not need to
+wait for required data. Processors can then perform their iterations with the data present at that time. Figure
+\ref{fig:aiac} illustrates this model, where the grey blocks represent the computation phases, the white spaces the idle
+times and the arrows the communications. With this algorithmic model, the number of iterations required before
+convergence is generally greater than for the two former classes. However, as detailed in \cite{bcvc06:ij}, AIAC
+algorithms can significantly reduce overall execution times by suppressing idle times due to synchronizations, especially
+in a grid computing context.
\begin{figure}[htbp]
\centering
\end{figure}
+It is very challenging to develop efficient applications for large-scale, heterogeneous and distributed platforms such
+as computing grids. Researchers and engineers have to develop techniques for maximizing application performance on these
+multi-cluster platforms, by redesigning the applications and/or by using novel algorithms that can account for the
+composite and heterogeneous nature of the platform. Unfortunately, the deployment of such applications on these very
+large-scale systems is very costly, labor intensive and time consuming. In this context, it appears that the use of
+simulation tools to explore various platform scenarios at will and to run enormous numbers of experiments quickly can be
+very promising. Several works...
+
+In the context of AIAC algorithms, the use of simulation tools is even more relevant. Indeed, this class of applications
+is very sensitive to the execution environment. For instance, variations in the network bandwidth (intra- and
+inter-cluster), in the number and the power of nodes, or in the number of clusters can lead to very different numbers of
+iterations and thus to very different execution times.
+
+
\section{SimGrid}