X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/hpcc2014.git/blobdiff_plain/b9756213bfa3fbf7c5667385f5070b003b9bf0b3..c9066e476a4a5b2bdba5c2eb18ad0aa2ad1d2bb3:/hpcc.tex diff --git a/hpcc.tex b/hpcc.tex index abcf399..e043c78 100644 --- a/hpcc.tex +++ b/hpcc.tex @@ -82,7 +82,7 @@ paper, we show that it is interesting to use SimGrid to simulate the behaviors of asynchronous iterative algorithms. For that, we compare the behaviour of a synchronous GMRES algorithm with an asynchronous multisplitting one with simulations in which we choose some parameters. Both codes are real MPI -codes. Experiments allow us to see when the multisplitting algorithm can be more +codes. Simulations allow us to see when the multisplitting algorithm can be more efficient than the GMRES one to solve a 3D Poisson problem. @@ -135,57 +135,66 @@ iterative asynchronous algorithms to solve a given problem on a large-scale simulated environment challenges to find optimal configurations giving the best results with a lowest residual error and in the best of execution time. -To our knowledge, there is no existing work on the large-scale simulation of a -real AIAC application. There are {\bf two contributions} in this paper. First we give a first -approach of the simulation of AIAC algorithms using a simulation tool (i.e. the -SimGrid toolkit~\cite{SimGrid}). Second, we confirm the effectiveness of the -asynchronous multisplitting algorithm by comparing its performance with the synchronous -GMRES. More precisely, we had implemented a program for solving large -linear system of equations by numerical method GMRES (Generalized -Minimal Residual) \cite{ref1}. We show, that with minor modifications of the -initial MPI code, the SimGrid toolkit allows us to perform a test campaign of a -real AIAC application on different computing architectures. The simulated -results we obtained are in line with real results exposed in ??\AG[]{ref?}. -SimGrid had allowed us to launch the application from a modest computing -infrastructure by simulating different distributed architectures composed by -clusters nodes interconnected by variable speed networks. With selected -parameters on the network platforms (bandwidth, latency of inter cluster -network) and on the clusters architecture (number, capacity calculation power) -in the simulated environment, the experimental results have demonstrated not -only the algorithm convergence within a reasonable time compared with the -physical environment performance, but also a time saving of up to \np[\%]{40} in -asynchronous mode. -\AG{Il faudrait revoir la phrase précédente (couper en deux?). Là, on peut - avoir l'impression que le gain de \np[\%]{40} est entre une exécution réelle - et une exécution simulée!} - -This article is structured as follows: after this introduction, the next section will give a brief description of -iterative asynchronous model. Then, the simulation framework SimGrid is presented with the settings to create various -distributed architectures. The algorithm of the multisplitting method used by GMRES \LZK{??? GMRES n'utilise pas la méthode de multisplitting! Sinon ne doit on pas expliquer le choix d'une méthode de multisplitting?} written with MPI primitives and -its adaptation to SimGrid with SMPI (Simulated MPI) is detailed in the next section. At last, the experiments results -carried out will be presented before some concluding remarks and future works. +To our knowledge, there is no existing work on the large-scale simulation of a +real AIAC application. 
+{\bf The contribution of the present paper can be
+  summarised in two main points}. First, we present a first approach to the
+simulation of AIAC algorithms using a simulation tool (namely the SimGrid
+toolkit~\cite{SimGrid}). Second, we confirm the effectiveness of the
+asynchronous multisplitting algorithm by comparing its performance with that of
+the synchronous GMRES (Generalized Minimal Residual) method~\cite{ref1}. Both
+codes can be used to solve large linear systems. In this paper, we focus on a 3D
+Poisson problem. We show that, with minor modifications of the initial MPI
+code, the SimGrid toolkit allows us to perform a test campaign of a real AIAC
+application on different computing architectures.
+% The simulated results we
+%obtained are in line with real results exposed in ??\AG[]{ref?}.
+SimGrid allowed us to launch the application from a modest computing
+infrastructure by simulating different distributed architectures composed of
+cluster nodes interconnected by networks of variable speed. The parameters of
+the network platform are the bandwidth and the latency of the inter-cluster
+network. The parameters of the cluster architecture are the number of machines
+and the computation power of a machine. Simulations show that, with two distant
+clusters, the asynchronous multisplitting algorithm can solve the 3D Poisson
+problem approximately twice as fast as GMRES.
+
+This article is structured as follows: after this introduction, the next
+section gives a brief description of the asynchronous iteration model. Then,
+the simulation framework SimGrid is presented, together with the settings used
+to create various distributed architectures. The multisplitting method is then
+presented; it relies on GMRES to solve each block resulting from the
+splitting. This code is written with MPI primitives, and its adaptation to
+SimGrid with SMPI (Simulated MPI) is detailed in the next section. Finally, the
+results of the simulations carried out are presented, before some concluding
+remarks and future work.
 
 \section{Motivations and scientific context}
 
-As exposed in the introduction, parallel iterative methods are now widely used in many scientific domains. They can be
-classified in three main classes depending on how iterations and communications are managed (for more details readers
-can refer to~\cite{bcvc06:ij}). In the \textit{Synchronous Iterations~-- Synchronous Communications (SISC)} model data
-are exchanged at the end of each iteration. All the processors must begin the same iteration at the same time and
-important idle times on processors are generated. The \textit{Synchronous Iterations~-- Asynchronous Communications
-(SIAC)} model can be compared to the previous one except that data required on another processor are sent asynchronously
-i.e. without stopping current computations. This technique allows to partially overlap communications by computations
-but unfortunately, the overlapping is only partial and important idle times remain. It is clear that, in a grid
-computing context, where the number of computational nodes is large, heterogeneous and widely distributed, the idle
-times generated by synchronizations are very penalizing. One way to overcome this problem is to use the
-\textit{Asynchronous Iterations~-- Asynchronous Communications (AIAC)} model. Here, local computations do not need to
-wait for required data. Processors can then perform their iterations with the data present at that time. Figure~\ref{fig:aiac}
-illustrates this model where the gray blocks represent the computation phases, the white spaces the idle
-times and the arrows the communications.
-\AG{There are no ``white spaces'' on the figure.}
-With this algorithmic model, the number of iterations required before the
-convergence is generally greater than for the two former classes. But, and as detailed in~\cite{bcvc06:ij}, AIAC
-algorithms can significantly reduce overall execution times by suppressing idle times due to synchronizations especially
-in a grid computing context.\LZK{Répétition par rapport à l'intro}
+As exposed in the introduction, parallel iterative methods are now widely used
+in many scientific domains. They can be classified in three main classes,
+depending on how iterations and communications are managed (for more details
+readers can refer to~\cite{bcvc06:ij}). In the \textit{Synchronous Iterations~--
+  Synchronous Communications (SISC)} model, data are exchanged at the end of
+each iteration. All the processors must begin the same iteration at the same
+time, and important idle times are generated on the processors. The
+\textit{Synchronous Iterations~-- Asynchronous Communications (SIAC)} model can
+be compared to the previous one, except that the data required on another
+processor are sent asynchronously, i.e.\ without stopping the current
+computations. This technique makes it possible to partially overlap
+communications with computations but, unfortunately, the overlapping is only
+partial and important idle times remain. It is clear that, in a grid computing
+context, where the computational nodes are numerous, heterogeneous and widely
+distributed, the idle times generated by synchronizations are very penalizing.
+One way to overcome this problem is to use the \textit{Asynchronous
+  Iterations~-- Asynchronous Communications (AIAC)} model. Here, local
+computations do not need to wait for the required data. Processors can then
+perform their iterations with the data available at that time.
+Figure~\ref{fig:aiac} illustrates this model, where the gray blocks represent
+the computation phases. With this algorithmic model, the number of iterations
+required before convergence is generally greater than for the two former
+classes. However, as detailed in~\cite{bcvc06:ij}, AIAC algorithms can
+significantly reduce overall execution times by suppressing the idle times due
+to synchronizations, especially in a grid computing context.
+%\LZK{Repetition with respect to the introduction}
 
 \begin{figure}[!t]
   \centering
@@ -194,26 +203,38 @@ in a grid computing context.\LZK{Répétition par rapport à l'intro}
   \label{fig:aiac}
 \end{figure}
 
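+To make this asynchronous scheme more concrete, the following fragment gives a
+minimal, purely illustrative C/MPI skeleton of an AIAC-style iteration
+loop. It is only a sketch, not the multisplitting code evaluated in this paper:
+the \texttt{local\_update} and \texttt{local\_residual} routines are toy
+placeholders, the processes are organised in a simple ring, and the termination
+test is local, whereas a real AIAC code requires a global convergence detection
+mechanism.
+\begin{verbatim}
+#include <mpi.h>
+#include <stdlib.h>
+
+#define N      1000   /* local block size (arbitrary for this sketch)   */
+#define MAX_IT 1000   /* safety bound on the number of local iterations */
+
+/* Toy contracting update and residual, standing in for the real
+   local solver (GMRES on one block) and its residual computation.  */
+static void local_update(double *x, const double *halo) {
+  for (int i = 0; i < N; i++) x[i] = 0.4 * x[i] + 0.4 * halo[i];
+}
+static double local_residual(const double *x) {
+  double r = 0.0;
+  for (int i = 0; i < N; i++) r += x[i] * x[i];
+  return r;
+}
+
+int main(int argc, char **argv) {
+  MPI_Init(&argc, &argv);
+  int rank, size;
+  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+  MPI_Comm_size(MPI_COMM_WORLD, &size);
+  int left  = (rank - 1 + size) % size;   /* ring of neighbours */
+  int right = (rank + 1) % size;
+
+  double *x    = calloc(N, sizeof(double));
+  double *halo = calloc(N, sizeof(double));
+  for (int i = 0; i < N; i++) x[i] = 1.0;
+  MPI_Request sreq = MPI_REQUEST_NULL;
+
+  for (int it = 0; it < MAX_IT; it++) {
+    /* Complete the previous send (if any) so that x can be reused. */
+    MPI_Wait(&sreq, MPI_STATUS_IGNORE);
+    /* Drain the messages already arrived from the left neighbour and
+       keep only the most recent one: the process never blocks here. */
+    int flag = 1;
+    while (flag) {
+      MPI_Iprobe(left, 0, MPI_COMM_WORLD, &flag, MPI_STATUS_IGNORE);
+      if (flag)
+        MPI_Recv(halo, N, MPI_DOUBLE, left, 0, MPI_COMM_WORLD,
+                 MPI_STATUS_IGNORE);
+    }
+    /* Iterate with whatever data are locally available right now.    */
+    local_update(x, halo);
+    /* Send the new local data asynchronously to the right neighbour. */
+    MPI_Isend(x, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &sreq);
+    /* A real AIAC code would use a global convergence detection and a
+       proper termination protocol instead of this local test.        */
+    if (local_residual(x) < 1e-8) break;
+  }
+  MPI_Wait(&sreq, MPI_STATUS_IGNORE);
+  free(x); free(halo);
+  MPI_Finalize();
+  return 0;
+}
+\end{verbatim}
+The point illustrated here is that a process never waits for fresh data from
+its neighbours: it uses the most recent values already received and immediately
+starts its next local iteration, which is precisely what removes the idle times
+due to synchronizations.
+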
+\RC{I would rather drop AIAC and simply say asynchronous algorithms... to be discussed}
+
+%% It is very challenging to develop efficient applications for large scale,
+%% heterogeneous and distributed platforms such as computing grids. Researchers and
+%% engineers have to develop techniques for maximizing application performance of
+%% these multi-cluster platforms, by redesigning the applications and/or by using
+%% novel algorithms that can account for the composite and heterogeneous nature of
+%% the platform. Unfortunately, the deployment of such applications on these very
+%% large scale systems is very costly, labor intensive and time consuming. In this
+%% context, it appears that the use of simulation tools to explore various platform
+%% scenarios at will and to run enormous numbers of experiments quickly can be very
+%% promising. Several works\dots{}
+
+%% \AG{Several works\dots{} what?\\
+% Is the following paragraph already in the intro?}
+In the context of asynchronous algorithms, the number of iterations needed to
+reach convergence depends on the message delays. With synchronous iterations,
+the number of iterations is exactly the same as in the sequential mode (if the
+parallelization process does not change the algorithm). So the difficulty with
+asynchronous algorithms comes from the fact that it is necessary to really
+execute the algorithm: from one execution to another, the order of the messages
+changes, and so does the number of iterations needed to reach convergence.
+Depending on all the parameters of the platform (number of nodes, power of the
+nodes, inter- and intra-cluster bandwidth and latency, \dots) and of the
+algorithm (number of splittings for the multisplitting algorithm), the
+multisplitting code will obtain the solution more or less quickly. Of course,
+the GMRES method also depends on the same parameters. As it is difficult to have
+access to many clusters, grids or supercomputers with many different network
+parameters, it is interesting to be able to simulate the behavior of
+asynchronous iterative algorithms before running real experiments. A
+hypothetical platform description illustrating these parameters is sketched
+below.
+
 
-It is very challenging to develop efficient applications for large scale,
-heterogeneous and distributed platforms such as computing grids. Researchers and
-engineers have to develop techniques for maximizing application performance of
-these multi-cluster platforms, by redesigning the applications and/or by using
-novel algorithms that can account for the composite and heterogeneous nature of
-the platform. Unfortunately, the deployment of such applications on these very
-large scale systems is very costly, labor intensive and time consuming. In this
-context, it appears that the use of simulation tools to explore various platform
-scenarios at will and to run enormous numbers of experiments quickly can be very
-promising. Several works\dots{}
-
-\AG{Several works\dots{} what?\\
- Le paragraphe suivant se trouve déjà dans l'intro ?}
-In the context of AIAC algorithms, the use of simulation tools is even more
-relevant. Indeed, this class of applications is very sensible to the execution
-environment context. For instance, variations in the network bandwidth (intra
-and inter-clusters), in the number and the power of nodes, in the number of
-clusters\dots{} can lead to very different number of iterations and so to very
-different execution times.
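+As an illustration of these platform parameters, the following hypothetical
+SimGrid platform description declares two clusters; the number of nodes
+(attribute \texttt{radical}), the computation power of each node and the
+intra-cluster bandwidth and latency appear explicitly as attributes. The syntax
+follows the SimGrid~3 platform format (attribute names and accepted units may
+differ in other SimGrid versions), and the values are arbitrary examples, not
+the settings used in our experiments.
+\begin{verbatim}
+<?xml version='1.0'?>
+<!DOCTYPE platform SYSTEM
+   "http://simgrid.gforge.inria.fr/simgrid.dtd">
+<platform version="3">
+  <AS id="AS0" routing="Full">
+    <!-- 16 nodes per cluster, 1 Gflop/s per node;
+         bandwidths in bytes/s, latencies in seconds -->
+    <cluster id="cluster0" prefix="c0-" suffix=".me"
+             radical="0-15" power="1E9"
+             bw="1.25E8" lat="5E-5"
+             bb_bw="2.25E9" bb_lat="5E-4"/>
+    <cluster id="cluster1" prefix="c1-" suffix=".me"
+             radical="0-15" power="1E9"
+             bw="1.25E8" lat="5E-5"
+             bb_bw="2.25E9" bb_lat="5E-4"/>
+    <!-- the inter-cluster link (bandwidth and latency
+         between the two sites) and the corresponding
+         route are omitted from this sketch -->
+  </AS>
+</platform>
+\end{verbatim}
+Such a file, together with a hostfile listing the simulated nodes, is passed to
+\texttt{smpirun}, so that the same MPI binary is executed on the simulated
+platform instead of a real one.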