X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/hpcc2014.git/blobdiff_plain/201b9e3feb0d4318510a67c834f9645f04c4e8d0..c9066e476a4a5b2bdba5c2eb18ad0aa2ad1d2bb3:/hpcc.tex?ds=sidebyside diff --git a/hpcc.tex b/hpcc.tex index dabef52..e043c78 100644 --- a/hpcc.tex +++ b/hpcc.tex @@ -82,7 +82,7 @@ paper, we show that it is interesting to use SimGrid to simulate the behaviors of asynchronous iterative algorithms. For that, we compare the behaviour of a synchronous GMRES algorithm with an asynchronous multisplitting one with simulations in which we choose some parameters. Both codes are real MPI -codes. Experiments allow us to see when the multisplitting algorithm can be more +codes. Simulations allow us to see when the multisplitting algorithm can be more efficient than the GMRES one to solve a 3D Poisson problem. @@ -97,7 +97,7 @@ problems raised by researchers on various scientific disciplines but also by in increasing complexity of these requested applications combined with a continuous increase of their sizes lead to write distributed and parallel algorithms requiring significant hardware resources (grid computing, clusters, broadband network, etc.) but also a non-negligible CPU execution time. We consider in this paper a class of highly efficient -parallel algorithms called \emph{numerical iterative algorithms} executed in a distributed environment. As their name +parallel algorithms called \emph{iterative algorithms} executed in a distributed environment. As their name suggests, these algorithms solve a given problem by successive iterations ($X_{n +1} = f(X_{n})$) from an initial value $X_{0}$ to find an approximate value $X^*$ of the solution with a very low residual error. Several well-known methods demonstrate the convergence of these algorithms~\cite{BT89,Bahi07}. @@ -113,79 +113,88 @@ at that time. Even if the number of iterations required before the convergence i synchronous case, AIAC algorithms can significantly reduce overall execution times by suppressing idle times due to synchronizations especially in a grid computing context (see~\cite{Bahi07} for more details). -Parallel numerical applications (synchronous or asynchronous) may have different -configuration and deployment requirements. Quantifying their resource -allocation policies and application scheduling algorithms in grid computing -environments under varying load, CPU power and network speeds is very costly, -very labor intensive and very time -consuming~\cite{Calheiros:2011:CTM:1951445.1951450}. The case of AIAC -algorithms is even more problematic since they are very sensible to the +Parallel (synchronous or asynchronous) applications may have different +configuration and deployment requirements. Quantifying their resource +allocation policies and application scheduling algorithms in grid computing +environments under varying load, CPU power and network speeds is very costly, +very labor intensive and very time +consuming~\cite{Calheiros:2011:CTM:1951445.1951450}. The case of AIAC +algorithms is even more problematic since they are very sensible to the execution environment context. For instance, variations in the network bandwidth -(intra and inter-clusters), in the number and the power of nodes, in the number -of clusters\dots{} can lead to very different number of iterations and so to -very different execution times. Then, it appears that the use of simulation -tools to explore various platform scenarios and to run large numbers of -experiments quickly can be very promising. In this way, the use of a simulation -environment to execute parallel iterative algorithms found some interests in -reducing the highly cost of access to computing resources: (1) for the -applications development life cycle and in code debugging (2) and in production -to get results in a reasonable execution time with a simulated infrastructure -not accessible with physical resources. Indeed, the launch of distributed -iterative asynchronous algorithms to solve a given problem on a large-scale -simulated environment challenges to find optimal configurations giving the best +(intra and inter-clusters), in the number and the power of nodes, in the number +of clusters\dots{} can lead to very different number of iterations and so to +very different execution times. Then, it appears that the use of simulation +tools to explore various platform scenarios and to run large numbers of +experiments quickly can be very promising. In this way, the use of a simulation +environment to execute parallel iterative algorithms found some interests in +reducing the highly cost of access to computing resources: (1) for the +applications development life cycle and in code debugging (2) and in production +to get results in a reasonable execution time with a simulated infrastructure +not accessible with physical resources. Indeed, the launch of distributed +iterative asynchronous algorithms to solve a given problem on a large-scale +simulated environment challenges to find optimal configurations giving the best results with a lowest residual error and in the best of execution time. -To our knowledge, there is no existing work on the large-scale simulation of a -real AIAC application. The aim of this paper is twofold. First we give a first -approach of the simulation of AIAC algorithms using a simulation tool (i.e. the -SimGrid toolkit~\cite{SimGrid}). Second, we confirm the effectiveness of -asynchronous mode algorithms by comparing their performance with the synchronous -mode. More precisely, we had implemented a program for solving large -linear system of equations by numerical method GMRES (Generalized -Minimal Residual) \cite{ref1}. We show, that with minor modifications of the -initial MPI code, the SimGrid toolkit allows us to perform a test campaign of a -real AIAC application on different computing architectures. The simulated -results we obtained are in line with real results exposed in ??\AG[]{ref?}. -SimGrid had allowed us to launch the application from a modest computing -infrastructure by simulating different distributed architectures composed by -clusters nodes interconnected by variable speed networks. With selected -parameters on the network platforms (bandwidth, latency of inter cluster -network) and on the clusters architecture (number, capacity calculation power) -in the simulated environment, the experimental results have demonstrated not -only the algorithm convergence within a reasonable time compared with the -physical environment performance, but also a time saving of up to \np[\%]{40} in -asynchronous mode. -\AG{Il faudrait revoir la phrase précédente (couper en deux?). Là, on peut - avoir l'impression que le gain de \np[\%]{40} est entre une exécution réelle - et une exécution simulée!} - -This article is structured as follows: after this introduction, the next section will give a brief description of -iterative asynchronous model. Then, the simulation framework SimGrid is presented with the settings to create various -distributed architectures. The algorithm of the multisplitting method used by GMRES \LZK{??? GMRES n'utilise pas la méthode de multisplitting! Sinon ne doit on pas expliquer le choix d'une méthode de multisplitting?} written with MPI primitives and -its adaptation to SimGrid with SMPI (Simulated MPI) is detailed in the next section. At last, the experiments results -carried out will be presented before some concluding remarks and future works. +To our knowledge, there is no existing work on the large-scale simulation of a +real AIAC application. {\bf The contribution of the present paper can be + summarised in two main points}. First we give a first approach of the +simulation of AIAC algorithms using a simulation tool (i.e. the SimGrid +toolkit~\cite{SimGrid}). Second, we confirm the effectiveness of the +asynchronous multisplitting algorithm by comparing its performance with the +synchronous GMRES (Generalized Minimal Residual) \cite{ref1}. Both these codes +can be used to solve large linear systems. In this paper, we focus on a 3D +Poisson problem. We show, that with minor modifications of the initial MPI +code, the SimGrid toolkit allows us to perform a test campaign of a real AIAC +application on different computing architectures. +% The simulated results we +%obtained are in line with real results exposed in ??\AG[]{ref?}. +SimGrid had allowed us to launch the application from a modest computing +infrastructure by simulating different distributed architectures composed by +clusters nodes interconnected by variable speed networks. Parameters of the +network platforms are the bandwidth and the latency of inter cluster +network. Parameters on the cluster's architecture are the number of machines and +the computation power of a machine. Simulations show that the asynchronous +multisplitting algorithm can solve the 3D Poisson problem approximately twice +faster than GMRES with two distant clusters. + + + +This article is structured as follows: after this introduction, the next section +will give a brief description of iterative asynchronous model. Then, the +simulation framework SimGrid is presented with the settings to create various +distributed architectures. Then, the multisplitting method is presented, it is +based on GMRES to solve each block obtained of the splitting. This code is +written with MPI primitives and its adaptation to SimGrid with SMPI (Simulated +MPI) is detailed in the next section. At last, the simulation results carried +out will be presented before some concluding remarks and future works. \section{Motivations and scientific context} -As exposed in the introduction, parallel iterative methods are now widely used in many scientific domains. They can be -classified in three main classes depending on how iterations and communications are managed (for more details readers -can refer to~\cite{bcvc06:ij}). In the \textit{Synchronous Iterations~-- Synchronous Communications (SISC)} model data -are exchanged at the end of each iteration. All the processors must begin the same iteration at the same time and -important idle times on processors are generated. The \textit{Synchronous Iterations~-- Asynchronous Communications -(SIAC)} model can be compared to the previous one except that data required on another processor are sent asynchronously -i.e. without stopping current computations. This technique allows to partially overlap communications by computations -but unfortunately, the overlapping is only partial and important idle times remain. It is clear that, in a grid -computing context, where the number of computational nodes is large, heterogeneous and widely distributed, the idle -times generated by synchronizations are very penalizing. One way to overcome this problem is to use the -\textit{Asynchronous Iterations~-- Asynchronous Communications (AIAC)} model. Here, local computations do not need to -wait for required data. Processors can then perform their iterations with the data present at that time. Figure~\ref{fig:aiac} -illustrates this model where the gray blocks represent the computation phases, the white spaces the idle -times and the arrows the communications. -\AG{There are no ``white spaces'' on the figure.} -With this algorithmic model, the number of iterations required before the -convergence is generally greater than for the two former classes. But, and as detailed in~\cite{bcvc06:ij}, AIAC -algorithms can significantly reduce overall execution times by suppressing idle times due to synchronizations especially -in a grid computing context.\LZK{Répétition par rapport à l'intro} +As exposed in the introduction, parallel iterative methods are now widely used +in many scientific domains. They can be classified in three main classes +depending on how iterations and communications are managed (for more details +readers can refer to~\cite{bcvc06:ij}). In the \textit{Synchronous Iterations~-- + Synchronous Communications (SISC)} model data are exchanged at the end of each +iteration. All the processors must begin the same iteration at the same time and +important idle times on processors are generated. The \textit{Synchronous + Iterations~-- Asynchronous Communications (SIAC)} model can be compared to the +previous one except that data required on another processor are sent +asynchronously i.e. without stopping current computations. This technique +allows to partially overlap communications by computations but unfortunately, +the overlapping is only partial and important idle times remain. It is clear +that, in a grid computing context, where the number of computational nodes is +large, heterogeneous and widely distributed, the idle times generated by +synchronizations are very penalizing. One way to overcome this problem is to use +the \textit{Asynchronous Iterations~-- Asynchronous Communications (AIAC)} +model. Here, local computations do not need to wait for required +data. Processors can then perform their iterations with the data present at that +time. Figure~\ref{fig:aiac} illustrates this model where the gray blocks +represent the computation phases. With this algorithmic model, the number of +iterations required before the convergence is generally greater than for the two +former classes. But, and as detailed in~\cite{bcvc06:ij}, AIAC algorithms can +significantly reduce overall execution times by suppressing idle times due to +synchronizations especially in a grid computing context. +%\LZK{Répétition par rapport à l'intro} \begin{figure}[!t] \centering @@ -194,26 +203,38 @@ in a grid computing context.\LZK{Répétition par rapport à l'intro} \label{fig:aiac} \end{figure} +\RC{Je serais partant de virer AIAC et laisser asynchronous algorithms... à voir} + +%% It is very challenging to develop efficient applications for large scale, +%% heterogeneous and distributed platforms such as computing grids. Researchers and +%% engineers have to develop techniques for maximizing application performance of +%% these multi-cluster platforms, by redesigning the applications and/or by using +%% novel algorithms that can account for the composite and heterogeneous nature of +%% the platform. Unfortunately, the deployment of such applications on these very +%% large scale systems is very costly, labor intensive and time consuming. In this +%% context, it appears that the use of simulation tools to explore various platform +%% scenarios at will and to run enormous numbers of experiments quickly can be very +%% promising. Several works\dots{} + +%% \AG{Several works\dots{} what?\\ +% Le paragraphe suivant se trouve déjà dans l'intro ?} +In the context of asynchronous algorithms, the number of iterations to reach the +convergence depends on the delay of messages. With synchronous iterations, the +number of iterations is exactly the same than in the sequential mode (if the +parallelization process does not change the algorithm). So the difficulty with +asynchronous algorithms comes from the fact it is necessary to run the algorithm +with real data. In fact, from an execution to another the order of messages will +change and the number of iterations to reach the convergence will also change. +According to all the parameters of the platform (number of nodes, power of +nodes, inter and intra clusrters bandwith and latency, ....) and of the +algorithm (number of splitting with the multisplitting algorithm), the +multisplitting code will obtain the solution more or less quickly. Or course, +the GMRES method also depends of the same parameters. As it is difficult to have +access to many clusters, grids or supercomputers with many different network +parameters, it is interesting to be able to simulate the behaviors of +asynchronous iterative algoritms before being able to runs real experiments. + -It is very challenging to develop efficient applications for large scale, -heterogeneous and distributed platforms such as computing grids. Researchers and -engineers have to develop techniques for maximizing application performance of -these multi-cluster platforms, by redesigning the applications and/or by using -novel algorithms that can account for the composite and heterogeneous nature of -the platform. Unfortunately, the deployment of such applications on these very -large scale systems is very costly, labor intensive and time consuming. In this -context, it appears that the use of simulation tools to explore various platform -scenarios at will and to run enormous numbers of experiments quickly can be very -promising. Several works\dots{} - -\AG{Several works\dots{} what?\\ - Le paragraphe suivant se trouve déjà dans l'intro ?} -In the context of AIAC algorithms, the use of simulation tools is even more -relevant. Indeed, this class of applications is very sensible to the execution -environment context. For instance, variations in the network bandwidth (intra -and inter-clusters), in the number and the power of nodes, in the number of -clusters\dots{} can lead to very different number of iterations and so to very -different execution times.