X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/hpcc2014.git/blobdiff_plain/5b2dbb0b083f40235d97c8e8ee2b5834dd2096cc..d08d0574c6782fe3a320861ccd6ec60a2ab6025e:/hpcc.tex?ds=inline diff --git a/hpcc.tex b/hpcc.tex index 0dccef3..29d00a1 100644 --- a/hpcc.tex +++ b/hpcc.tex @@ -45,14 +45,14 @@ \begin{document} -\title{Simulation of Asynchronous Iterative Numerical Algorithms Using SimGrid} +\title{Simulation of Asynchronous Iterative Algorithms Using SimGrid} \author{% \IEEEauthorblockN{% Charles Emile Ramamonjisoa\IEEEauthorrefmark{1}, + Lilia Ziane Khodja\IEEEauthorrefmark{2}, David Laiymani\IEEEauthorrefmark{1}, - Arnaud Giersch\IEEEauthorrefmark{1}, - Lilia Ziane Khodja\IEEEauthorrefmark{2} and + Arnaud Giersch\IEEEauthorrefmark{1} and Raphaël Couturier\IEEEauthorrefmark{1} } \IEEEauthorblockA{\IEEEauthorrefmark{1}% @@ -71,34 +71,20 @@ \maketitle -\RC{Ordre des auteurs pas définitif.} \begin{abstract} -\AG{L'abstract est AMHA incompréhensible et ne donne pas envie de lire la suite.} -In recent years, the scalability of large-scale implementation in a -distributed environment of algorithms becoming more and more complex has -always been hampered by the limits of physical computing resources -capacity. One solution is to run the program in a virtual environment -simulating a real interconnected computers architecture. The results are -convincing and useful solutions are obtained with far fewer resources -than in a real platform. However, challenges remain for the convergence -and efficiency of a class of algorithms that concern us here, namely -numerical parallel iterative algorithms executed in asynchronous mode, -especially in a large scale level. Actually, such algorithm requires a -balance and a compromise between computation and communication time -during the execution. Two important factors determine the success of the -experimentation: the convergence of the iterative algorithm on a large -scale and the execution time reduction in asynchronous mode. Once again, -from the current work, a simulated environment like SimGrid provides -accurate results which are difficult or even impossible to obtain in a -physical platform by exploiting the flexibility of the simulator on the -computing units clusters and the network structure design. Our -experimental outputs showed a saving of up to \np[\%]{40} for the algorithm -execution time in asynchronous mode compared to the synchronous one with -a residual precision up to \np{E-11}. Such successful results open -perspectives on experimentations for running the algorithm on a -simulated large scale growing environment and with larger problem size. - -\LZK{Long\ldots} + +Synchronous iterative algorithms are often less scalable than asynchronous +iterative ones. Performing large scale experiments with different kind of +network parameters is not easy because with supercomputers such parameters are +fixed. So one solution consists in using simulations first in order to analyze +what parameters could influence or not the behaviors of an algorithm. In this +paper, we show that it is interesting to use SimGrid to simulate the behaviors +of asynchronous iterative algorithms. For that, we compare the behaviour of a +synchronous GMRES algorithm with an asynchronous multisplitting one with +simulations in which we choose some parameters. Both codes are real MPI +codes. Simulations allow us to see when the multisplitting algorithm can be more +efficient than the GMRES one to solve a 3D Poisson problem. + % no keywords for IEEE conferences % Keywords: Algorithm distributed iterative asynchronous simulation SimGrid @@ -111,7 +97,7 @@ problems raised by researchers on various scientific disciplines but also by in increasing complexity of these requested applications combined with a continuous increase of their sizes lead to write distributed and parallel algorithms requiring significant hardware resources (grid computing, clusters, broadband network, etc.) but also a non-negligible CPU execution time. We consider in this paper a class of highly efficient -parallel algorithms called \emph{numerical iterative algorithms} executed in a distributed environment. As their name +parallel algorithms called \emph{iterative algorithms} executed in a distributed environment. As their name suggests, these algorithms solve a given problem by successive iterations ($X_{n +1} = f(X_{n})$) from an initial value $X_{0}$ to find an approximate value $X^*$ of the solution with a very low residual error. Several well-known methods demonstrate the convergence of these algorithms~\cite{BT89,Bahi07}. @@ -127,57 +113,60 @@ at that time. Even if the number of iterations required before the convergence i synchronous case, AIAC algorithms can significantly reduce overall execution times by suppressing idle times due to synchronizations especially in a grid computing context (see~\cite{Bahi07} for more details). -Parallel numerical applications (synchronous or asynchronous) may have different -configuration and deployment requirements. Quantifying their resource -allocation policies and application scheduling algorithms in grid computing -environments under varying load, CPU power and network speeds is very costly, -very labor intensive and very time -consuming~\cite{Calheiros:2011:CTM:1951445.1951450}. The case of AIAC -algorithms is even more problematic since they are very sensible to the +Parallel (synchronous or asynchronous) applications may have different +configuration and deployment requirements. Quantifying their resource +allocation policies and application scheduling algorithms in grid computing +environments under varying load, CPU power and network speeds is very costly, +very labor intensive and very time +consuming~\cite{Calheiros:2011:CTM:1951445.1951450}. The case of AIAC +algorithms is even more problematic since they are very sensible to the execution environment context. For instance, variations in the network bandwidth -(intra and inter-clusters), in the number and the power of nodes, in the number -of clusters\dots{} can lead to very different number of iterations and so to -very different execution times. Then, it appears that the use of simulation -tools to explore various platform scenarios and to run large numbers of -experiments quickly can be very promising. In this way, the use of a simulation -environment to execute parallel iterative algorithms found some interests in -reducing the highly cost of access to computing resources: (1) for the -applications development life cycle and in code debugging (2) and in production -to get results in a reasonable execution time with a simulated infrastructure -not accessible with physical resources. Indeed, the launch of distributed -iterative asynchronous algorithms to solve a given problem on a large-scale -simulated environment challenges to find optimal configurations giving the best +(intra and inter-clusters), in the number and the power of nodes, in the number +of clusters\dots{} can lead to very different number of iterations and so to +very different execution times. Then, it appears that the use of simulation +tools to explore various platform scenarios and to run large numbers of +experiments quickly can be very promising. In this way, the use of a simulation +environment to execute parallel iterative algorithms found some interests in +reducing the highly cost of access to computing resources: (1) for the +applications development life cycle and in code debugging (2) and in production +to get results in a reasonable execution time with a simulated infrastructure +not accessible with physical resources. Indeed, the launch of distributed +iterative asynchronous algorithms to solve a given problem on a large-scale +simulated environment challenges to find optimal configurations giving the best results with a lowest residual error and in the best of execution time. -To our knowledge, there is no existing work on the large-scale simulation of a -real AIAC application. The aim of this paper is twofold. First we give a first -approach of the simulation of AIAC algorithms using a simulation tool (i.e. the -SimGrid toolkit~\cite{SimGrid}). Second, we confirm the effectiveness of -asynchronous mode algorithms by comparing their performance with the synchronous -mode. More precisely, we had implemented a program for solving large -linear system of equations by numerical method GMRES (Generalized -Minimal Residual) \cite{ref1}. We show, that with minor modifications of the -initial MPI code, the SimGrid toolkit allows us to perform a test campaign of a -real AIAC application on different computing architectures. The simulated -results we obtained are in line with real results exposed in ??\AG[]{ref?}. -SimGrid had allowed us to launch the application from a modest computing -infrastructure by simulating different distributed architectures composed by -clusters nodes interconnected by variable speed networks. With selected -parameters on the network platforms (bandwidth, latency of inter cluster -network) and on the clusters architecture (number, capacity calculation power) -in the simulated environment, the experimental results have demonstrated not -only the algorithm convergence within a reasonable time compared with the -physical environment performance, but also a time saving of up to \np[\%]{40} in -asynchronous mode. -\AG{Il faudrait revoir la phrase précédente (couper en deux?). Là, on peut - avoir l'impression que le gain de \np[\%]{40} est entre une exécution réelle - et une exécution simulée!} - -This article is structured as follows: after this introduction, the next section will give a brief description of -iterative asynchronous model. Then, the simulation framework SimGrid is presented with the settings to create various -distributed architectures. The algorithm of the multisplitting method used by GMRES \LZK{??? GMRES n'utilise pas la méthode de multisplitting! Sinon ne doit on pas expliquer le choix d'une méthode de multisplitting?} written with MPI primitives and -its adaptation to SimGrid with SMPI (Simulated MPI) is detailed in the next section. At last, the experiments results -carried out will be presented before some concluding remarks and future works. +To our knowledge, there is no existing work on the large-scale simulation of a +real AIAC application. {\bf The contribution of the present paper can be + summarised in two main points}. First we give a first approach of the +simulation of AIAC algorithms using a simulation tool (i.e. the SimGrid +toolkit~\cite{SimGrid}). Second, we confirm the effectiveness of the +asynchronous multisplitting algorithm by comparing its performance with the +synchronous GMRES (Generalized Minimal Residual) \cite{ref1}. Both these codes +can be used to solve large linear systems. In this paper, we focus on a 3D +Poisson problem. We show, that with minor modifications of the initial MPI +code, the SimGrid toolkit allows us to perform a test campaign of a real AIAC +application on different computing architectures. +% The simulated results we +%obtained are in line with real results exposed in ??\AG[]{ref?}. +SimGrid had allowed us to launch the application from a modest computing +infrastructure by simulating different distributed architectures composed by +clusters nodes interconnected by variable speed networks. Parameters of the +network platforms are the bandwidth and the latency of inter cluster +network. Parameters on the cluster's architecture are the number of machines and +the computation power of a machine. Simulations show that the asynchronous +multisplitting algorithm can solve the 3D Poisson problem approximately twice +faster than GMRES with two distant clusters. + + + +This article is structured as follows: after this introduction, the next section +will give a brief description of iterative asynchronous model. Then, the +simulation framework SimGrid is presented with the settings to create various +distributed architectures. Then, the multisplitting method is presented, it is +based on GMRES to solve each block obtained of the splitting. This code is +written with MPI primitives and its adaptation to SimGrid with SMPI (Simulated +MPI) is detailed in the next section. At last, the simulation results carried +out will be presented before some concluding remarks and future works. \section{Motivations and scientific context}