of asynchronous iterative algorithms. For that, we compare the behaviour of a
synchronous GMRES algorithm with an asynchronous multisplitting one, using
simulations in which we vary several platform parameters. Both codes are real
MPI codes. Simulations allow us to see when the multisplitting algorithm can be
more efficient than the GMRES one to solve a 3D Poisson problem.
The increasing complexity of these applications, combined with the continuous
growth of their sizes, leads to the design of distributed and parallel
algorithms requiring significant hardware resources (grid computing, clusters,
broadband networks, etc.) but also a non-negligible CPU execution time. We
consider in this paper a class of highly efficient
parallel algorithms called \emph{iterative algorithms} executed in a distributed environment. As their name
suggests, these algorithms solve a given problem by successive iterations ($X_{n +1} = f(X_{n})$) from an initial value
$X_{0}$ to find an approximate value $X^*$ of the solution with a very low
residual error. Several well-known studies demonstrate the convergence of these
algorithms~\cite{BT89,Bahi07}.
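As a minimal illustration of such a scheme (a sketch, not the paper's multisplitting code; the function name and the example map are chosen here for illustration), a fixed-point iteration $X_{n+1} = f(X_{n})$ stopped on the size of the update can be written as:

```python
import numpy as np

def fixed_point(f, x0, tol=1e-10, max_iter=1000):
    """Iterate x_{n+1} = f(x_n) until the update ||x_{n+1} - x_n|| drops below tol."""
    x = x0
    for n in range(max_iter):
        x_new = f(x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new, n + 1
        x = x_new
    return x, max_iter

# Example: x = cos(x), a classical contraction with fixed point ~0.739085.
x_star, iters = fixed_point(np.cos, np.zeros(3))
```

In the parallel setting considered in this paper, $f$ is distributed over several processors and the cost of the iteration is dominated by computation and communication, which is precisely what the synchronous/asynchronous distinction below affects.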
Compared to the synchronous case, Asynchronous Iterations~-- Asynchronous
Communications (AIAC) algorithms can significantly reduce overall execution
times by suppressing idle times due to synchronizations, especially in a grid
computing context (see~\cite{Bahi07} for more details).
Parallel (synchronous or asynchronous) applications may have different
configuration and deployment requirements. Quantifying their resource
allocation policies and application scheduling algorithms in grid computing
environments under varying load, CPU power and network speeds is very costly,
very labor intensive and very
time-consuming~\cite{Calheiros:2011:CTM:1951445.1951450}. The case of AIAC
algorithms is even more problematic since they are very sensitive to the
execution environment context. For instance, variations in the network bandwidth
(intra and inter-clusters), in the number and the power of nodes, in the number
of clusters\dots{} can lead to very different numbers of iterations and thus to
very different execution times. It therefore appears that the use of simulation
tools to explore various platform scenarios and to run large numbers of
experiments quickly can be very promising. In this way, using a simulation
environment to execute parallel iterative algorithms helps to reduce the high
cost of access to computing resources: (1) during the application development
life cycle and for code debugging, and (2) in production, to get results in a
reasonable execution time on a simulated infrastructure that is not accessible
with physical resources. Indeed, running distributed asynchronous iterative
algorithms to solve a given problem on a large-scale simulated environment
challenges us to find optimal configurations giving the best results with the
lowest residual error in the best execution time.
To our knowledge, there is no existing work on the large-scale simulation of a
real AIAC application. {\bf The contribution of the present paper can be
  summarised in two main points}. First, we give a first approach to the
simulation of AIAC algorithms using a simulation tool (i.e. the SimGrid
toolkit~\cite{SimGrid}). Second, we confirm the effectiveness of the
asynchronous multisplitting algorithm by comparing its performance with the
synchronous GMRES (Generalized Minimal Residual) method~\cite{ref1}. Both these
codes can be used to solve large linear systems. In this paper, we focus on a 3D
Poisson problem. We show that, with minor modifications of the initial MPI
code, the SimGrid toolkit allows us to perform a test campaign of a real AIAC
application on different computing architectures.
SimGrid allowed us to launch the application from a modest computing
infrastructure by simulating different distributed architectures composed of
cluster nodes interconnected by variable speed networks. The parameters of the
network platforms are the bandwidth and the latency of the inter-cluster
network. The parameters of the cluster architecture are the number of machines
and the computation power of a machine. Simulations show that the asynchronous
multisplitting algorithm can solve the 3D Poisson problem approximately twice as
fast as GMRES with two distant clusters.

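To fix ideas on the target problem, the 3D Poisson operator discretised with the usual 7-point finite-difference stencil can be assembled from 1D Laplacians via Kronecker products. This is an illustrative sketch, not the paper's MPI code; the function name and grid size are chosen here for the example:

```python
import numpy as np

def poisson3d(n):
    """Dense 7-point finite-difference Laplacian on an n x n x n grid
    (Dirichlet boundaries), built from 1D operators via Kronecker products."""
    T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1D Laplacian
    I = np.eye(n)
    return (np.kron(np.kron(T, I), I)
            + np.kron(np.kron(I, T), I)
            + np.kron(np.kron(I, I), T))

# Example: solve A x = b on a small 4 x 4 x 4 grid (64 unknowns).
A = poisson3d(4)
b = np.ones(A.shape[0])
x = np.linalg.solve(A, b)
```

A real solver would of course use a sparse or matrix-free representation; the dense direct solve here only serves as a reference for the tiny example.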
This article is structured as follows. After this introduction, the next section
gives a brief description of the asynchronous iteration model. Then, the
simulation framework SimGrid is presented, together with the settings used to
create various distributed architectures. The multisplitting method is then
presented; it is based on GMRES to solve each block obtained from the
splitting. This code is written with MPI primitives and its adaptation to
SimGrid with SMPI (Simulated MPI) is detailed in the next section. Finally, the
simulation results are presented before some concluding remarks and future
work.
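The multisplitting idea can be sketched as follows (a minimal synchronous block-Jacobi variant, assuming a row-block splitting; a dense direct solve stands in here for the paper's inner GMRES solver, and all names are illustrative):

```python
import numpy as np

def multisplitting(A, b, blocks, tol=1e-8, max_iter=500):
    """Synchronous block-Jacobi multisplitting sketch: each block solves its own
    diagonal sub-system (a direct solve standing in for the inner GMRES solver)
    using the other blocks' values from the previous iteration."""
    n = len(b)
    x = np.zeros(n)
    for it in range(max_iter):
        x_new = x.copy()
        for idx in blocks:
            rest = np.setdiff1d(np.arange(n), idx)
            # Move the coupling terms of the other blocks to the right-hand side.
            rhs = b[idx] - A[np.ix_(idx, rest)] @ x[rest]
            x_new[idx] = np.linalg.solve(A[np.ix_(idx, idx)], rhs)
        if np.linalg.norm(x_new - x) < tol:
            return x_new, it + 1
        x = x_new
    return x, max_iter

# Example: a small diagonally dominant system split into two blocks.
n = 6
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, iters = multisplitting(A, b, blocks=[np.arange(0, 3), np.arange(3, 6)])
```

In the asynchronous version studied in the paper, each block is handled by a cluster and uses whatever values of the other blocks have been received so far, instead of waiting for the previous iteration to complete everywhere.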
\section{Motivations and scientific context}
As exposed in the introduction, parallel iterative methods are now widely used
in many scientific domains. They can be classified into three main classes
depending on how iterations and communications are managed (for more details
readers can refer to~\cite{bcvc06:ij}). In the \textit{Synchronous Iterations~--
  Synchronous Communications (SISC)} model, data are exchanged at the end of
each iteration. All the processors must begin the same iteration at the same
time, which generates important idle times on processors. The
\textit{Synchronous Iterations~-- Asynchronous Communications (SIAC)} model can
be compared to the previous one, except that data required on another processor
are sent asynchronously, i.e. without stopping current computations. This
technique allows communications to be partially overlapped by computations but,
unfortunately, the overlapping is only partial and important idle times
remain. It is clear that, in a grid computing context, where the computational
nodes are numerous, heterogeneous and widely distributed, the idle times
generated by synchronizations are very penalizing. One way to overcome this
problem is to use the \textit{Asynchronous Iterations~-- Asynchronous
  Communications (AIAC)} model. Here, local computations do not need to wait for
required data. Processors can then perform their iterations with the data
present at that time. Figure~\ref{fig:aiac} illustrates this model, where the
gray blocks represent the computation phases. With this algorithmic model, the
number of iterations required before convergence is generally greater than for
the two former classes. But, as detailed in~\cite{bcvc06:ij}, AIAC algorithms
can significantly reduce overall execution times by suppressing idle times due
to synchronizations, especially in a grid computing context.
\begin{figure}[!t]
\centering
\label{fig:aiac}
\end{figure}
In the context of asynchronous algorithms, the number of iterations needed to
reach convergence depends on the delay of messages. With synchronous iterations,
the number of iterations is exactly the same as in the sequential mode (if the
parallelization process does not change the algorithm). So the difficulty with
asynchronous algorithms comes from the fact that it is necessary to run the
algorithm with real data. In fact, from one execution to another the order of
messages will change and the number of iterations to reach convergence will also
change. According to all the parameters of the platform (number of nodes, power
of nodes, inter- and intra-cluster bandwidth and latency, etc.) and of the
algorithm (number of splittings with the multisplitting algorithm), the
multisplitting code will obtain the solution more or less quickly. Of course,
the GMRES method also depends on the same parameters. As it is difficult to have
access to many clusters, grids or supercomputers with many different network
parameters, it is interesting to be able to simulate the behavior of
asynchronous iterative algorithms before running real experiments.
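The dependence of the iteration count on message delays can be illustrated with a small toy model (an assumption-laden sketch, not the paper's MPI code; the function name and the staleness model are chosen here for illustration): a Jacobi-like iteration in which each component may read stale values of the other components, mimicking delayed messages between processors.

```python
import numpy as np

def jacobi_with_staleness(A, b, max_delay, tol=1e-8, max_iter=5000, seed=1):
    """Toy model of asynchronous iterations: each component is updated using
    possibly stale values of the others (staleness drawn uniformly in
    [0, max_delay] past iterations). max_delay = 0 recovers the synchronous
    Jacobi method; max_delay > 0 mimics delayed messages."""
    rng = np.random.default_rng(seed)
    n = len(b)
    history = [np.zeros(n)]  # past iterates, as seen through the "network"
    for it in range(max_iter):
        x = history[-1]
        x_new = np.empty(n)
        for i in range(n):
            lag = int(rng.integers(0, max_delay + 1))
            stale = history[max(0, len(history) - 1 - lag)]
            off_diag = A[i] @ stale - A[i, i] * stale[i]
            x_new[i] = (b[i] - off_diag) / A[i, i]
        history.append(x_new)
        if np.linalg.norm(x_new - x) < tol:
            return it + 1  # iterations needed to converge
    return max_iter

# Diagonally dominant system: the asynchronous iterations still converge,
# but the iteration count depends on the (random) delays.
n = 8
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
sync_iters = jacobi_with_staleness(A, b, max_delay=0)
async_iters = jacobi_with_staleness(A, b, max_delay=3)
```

Changing the seed (i.e. the message ordering) changes `async_iters` but not `sync_iters`, which is exactly the variability that makes real data and simulation necessary for studying asynchronous algorithms.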