From: RCE
Date: Sun, 27 Apr 2014 17:09:56 +0000 (+0200)
Subject: Merge
X-Git-Tag: hpcc2014_submission~57
X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/hpcc2014.git/commitdiff_plain/e4b14590c5b948dac64c900d8d83e990a3f79122

Merge

Merge branch 'master' of ssh://info.iut-bm.univ-fcomte.fr/hpcc2014

Conflicts:
	hpcc.tex
---

e4b14590c5b948dac64c900d8d83e990a3f79122
diff --cc hpcc.tex
index 6f7f7ca,49caa2f..640f3ae
--- a/hpcc.tex
+++ b/hpcc.tex
@@@ -127,55 -113,60 +113,90 @@@ at that time. Even if the number of ite
  synchronous case, AIAC algorithms can significantly reduce overall execution
  times by suppressing idle times due to synchronizations especially in a grid
  computing context (see~\cite{Bahi07} for more details).
- Parallel numerical applications (synchronous or asynchronous) may have different
- configuration and deployment requirements. Quantifying their resource
- allocation policies and application scheduling algorithms in grid computing
- environments under varying load, CPU power and network speeds is very costly,
- very labor intensive and very time
- consuming~\cite{Calheiros:2011:CTM:1951445.1951450}. The case of AIAC
- algorithms is even more problematic since they are very sensible to the
+ Parallel (synchronous or asynchronous) applications may have different
+ configuration and deployment requirements. Quantifying their resource
+ allocation policies and application scheduling algorithms in grid computing
+ environments under varying load, CPU power and network speeds is very costly,
+ very labor intensive and very
+ time-consuming~\cite{Calheiros:2011:CTM:1951445.1951450}. The case of AIAC
+ algorithms is even more problematic since they are very sensitive to the
  execution environment context. For instance, variations in the network bandwidth
- (intra and inter-clusters), in the number and the power of nodes, in the number
- of clusters\dots{} can lead to very different number of iterations and so to
- very different execution times. Then, it appears that the use of simulation
- tools to explore various platform scenarios and to run large numbers of
- experiments quickly can be very promising. In this way, the use of a simulation
- environment to execute parallel iterative algorithms found some interests in
- reducing the highly cost of access to computing resources: (1) for the
- applications development life cycle and in code debugging (2) and in production
- to get results in a reasonable execution time with a simulated infrastructure
- not accessible with physical resources. Indeed, the launch of distributed
- iterative asynchronous algorithms to solve a given problem on a large-scale
- simulated environment challenges to find optimal configurations giving the best
+ (intra and inter-clusters), in the number and the power of nodes, in the number
+ of clusters\dots{} can lead to very different numbers of iterations and so to
+ very different execution times. The use of simulation tools to explore various
+ platform scenarios and to run large numbers of experiments quickly therefore
+ appears very promising. Simulating the execution of parallel iterative
+ algorithms is of interest for reducing the high cost of access to computing
+ resources: (1) during the application development life cycle and for code
+ debugging, and (2) in production, to get results in a reasonable execution time
+ on a simulated infrastructure when the corresponding physical resources are not
+ accessible. Indeed, running distributed asynchronous iterative algorithms on a
+ large-scale simulated environment raises the challenge of finding the optimal
+ configurations giving the best
  results with the lowest residual error in the shortest execution time.
++<<<<<<< HEAD
+To our knowledge, there is no existing work on the large-scale simulation of a
+real AIAC application. The aim of this paper is twofold.
First, we give a first approach to the simulation of AIAC algorithms using a
+simulation tool (i.e. the SimGrid toolkit~\cite{SimGrid}). Second, we confirm
+the effectiveness of asynchronous mode algorithms by comparing their
+performance with the synchronous mode. More precisely, we implemented a program
+for solving large linear systems of equations with the GMRES (Generalized
+Minimal Residual) numerical method~\cite{ref1}. We show that, with minor
+modifications of the initial MPI code, the SimGrid toolkit allows us to perform
+a test campaign of a real AIAC application on different computing
+architectures. The simulated results we obtained are in line with real results
+exposed in ??\AG[]{ref?}. SimGrid allowed us to launch the application from a
+modest computing infrastructure by simulating different distributed
+architectures composed of cluster nodes interconnected by variable-speed
+networks. In the simulated environment, after setting appropriate network and
+cluster parameters such as the network bandwidth, the latency or the processor
+power, the experimental results demonstrated an execution time saving of up to
+\np[\%]{40} in asynchronous mode compared to the synchronous mode.
+\AG{The previous sentence should be reworked (split in two?). As written, one
+ may get the impression that the \np[\%]{40} gain is between a real execution
+ and a simulated one!}
+\CER{The sentence has been modified}
+
+This article is structured as follows: after this introduction, the next
+section gives a brief description of the asynchronous iterative model. Then,
+the simulation framework SimGrid is presented with the settings to create
+various distributed architectures. The algorithm of the multisplitting method
+based on GMRES \LZK{??? GMRES does not use the multisplitting method!
+Otherwise, shouldn't we explain the choice of a multisplitting method?}
+\CER{The sentence has been corrected} written with MPI primitives and its
+adaptation to SimGrid with SMPI (Simulated MPI) is detailed in the next
+section. At last, the experimental results will be presented before some
+concluding remarks and future works.
++=======
+ To our knowledge, there is no existing work on the large-scale simulation of a
+ real AIAC application. {\bf The contribution of the present paper can be
+ summarised in two main points}. First, we give a first approach to the
+ simulation of AIAC algorithms using a simulation tool (i.e. the SimGrid
+ toolkit~\cite{SimGrid}). Second, we confirm the effectiveness of the
+ asynchronous multisplitting algorithm by comparing its performance with the
+ synchronous GMRES (Generalized Minimal Residual) method~\cite{ref1}. Both
+ these codes can be used to solve large linear systems. In this paper, we focus
+ on a 3D Poisson problem. We show that, with minor modifications of the initial
+ MPI code, the SimGrid toolkit allows us to perform a test campaign of a real
+ AIAC application on different computing architectures.
+ % The simulated results we
+ %obtained are in line with real results exposed in ??\AG[]{ref?}.
+ SimGrid allowed us to launch the application from a modest computing
+ infrastructure by simulating different distributed architectures composed of
+ cluster nodes interconnected by variable-speed networks. The parameters of the
+ network platforms are the bandwidth and the latency of the inter-cluster
+ network. The parameters of the cluster architecture are the number of machines
+ and the computational power of each machine. Simulations show that the
+ asynchronous multisplitting algorithm can solve the 3D Poisson problem
+ approximately twice as fast as GMRES with two distant clusters.
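The two-cluster platforms used in such a campaign are described to SimGrid in an XML platform file. The fragment below is a hypothetical sketch only: the host counts match the experiments, but the power, bandwidth and latency values, the host and router names, and the exact attribute set are assumptions based on the SimGrid~3 platform format, not the configuration actually used in the paper.

```xml
<?xml version="1.0"?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="3">
  <AS id="AS0" routing="Full">
    <!-- two clusters of 50 hosts each (all numeric values illustrative) -->
    <cluster id="cluster1" prefix="c1-" suffix=".net" radical="0-49"
             power="1Gf" bw="125MBps" lat="50us"/>
    <cluster id="cluster2" prefix="c2-" suffix=".net" radical="0-49"
             power="1Gf" bw="125MBps" lat="50us"/>
    <!-- slow inter-cluster link: degrading its bandwidth and latency
         penalizes the synchronous mode -->
    <link id="backbone" bandwidth="10MBps" latency="20ms"/>
    <ASroute src="cluster1" dst="cluster2"
             gw_src="c1-cluster1_router.net" gw_dst="c2-cluster2_router.net">
      <link_ctn id="backbone"/>
    </ASroute>
  </AS>
</platform>
```

An MPI program compiled with `smpicc` can then be run on such a simulated platform with `smpirun -platform platform.xml -np 100 ./program` (command names from the SMPI toolchain).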
+
+
+
+ This article is structured as follows: after this introduction, the next
+ section gives a brief description of the asynchronous iterative model. Then,
+ the simulation framework SimGrid is presented with the settings to create
+ various distributed architectures. Then, the multisplitting method is
+ presented; it is based on GMRES to solve each block obtained from the
+ splitting. This code is written with MPI primitives and its adaptation to
+ SimGrid with SMPI (Simulated MPI) is detailed in the next section. At last,
+ the simulation results will be presented before some concluding remarks and
+ future works.
++>>>>>>> 6785b9ef58de0db67c33ca901c7813f3dfdc76e0

 \section{Motivations and scientific context}

@@@ -438,54 -482,21 +513,20 @@@ study that the results depend on the fo
 A priori, obtaining a relative gain greater than 1 would be difficult in a local
 area network configuration where the synchronous mode takes advantage of the
 rapid exchange of information on such high-speed links. Thus, the methodology
-adopted was to launch the application on clustered network. In this last
+adopted was to launch the application on a clustered network. In this
 configuration, degrading the inter-cluster network performance will penalize the
 synchronous mode, making it possible to get a relative gain greater than 1. This
-simulates the case of distant clusters linked with long distance network like
-Internet.
+simulates the case of distant clusters linked with a long-distance network, as
+in a grid computing context.

- \AG{This part about the 3D Poisson problem
- % so we know it is not a plaice or a sole (/me tired)
- is not in the right place. It should be placed earlier.}
- In this paper, we solve the 3D Poisson problem whose mathematical model is
- \begin{equation}
- \left\{
- \begin{array}{l}
- \nabla^2 u = f \text{~in~} \Omega \\
- u = 0 \text{~on~} \Gamma = \partial\Omega
- \end{array}
- \right.
- \label{eq:02}
- \end{equation}
- where $\nabla^2$ is the Laplace operator, $f$ and $u$ are real-valued
- functions, and $\Omega=[0,1]^3$. The spatial discretization with a finite
- difference scheme reduces problem~(\ref{eq:02}) to a system of sparse linear
- equations. The general iteration scheme of our multisplitting method in a 3D
- domain using a seven-point stencil could be written as
- \begin{equation}
- \begin{array}{ll}
- u^{k+1}(x,y,z)= & u^k(x,y,z) - \frac{1}{6}\times\\
- & (u^k(x-1,y,z) + u^k(x+1,y,z) + \\
- & u^k(x,y-1,z) + u^k(x,y+1,z) + \\
- & u^k(x,y,z-1) + u^k(x,y,z+1)),
- \end{array}
- \label{eq:03}
- \end{equation}
- where the matrix $A$ of the discretized linear system, with $N_x\times
- N_y\times N_z$ unknowns, is sparse, symmetric and positive definite.
-
- The parallel solving of the 3D Poisson problem with our multisplitting method
- requires a data partitioning of the problem between clusters and between
- processors within a cluster. We have chosen the 3D partitioning instead of the
- row-by-row partitioning in order to reduce the data exchanges at sub-domain
- boundaries. Figure~\ref{fig:4.2} shows an example of the data partitioning of
- the 3D Poisson problem between two clusters of processors, where each
- sub-problem is assigned to a processor. In this context, a processor has at
- most six neighbors, within its cluster or in distant clusters, with which it
- shares data at sub-domain boundaries.
-
- \begin{figure}[!t]
- \centering
- \includegraphics[width=80mm,keepaspectratio]{partition}
- \caption{Example of the 3D data partitioning between two clusters of processors.}
- \label{fig:4.2}
- \end{figure}
-
-As a first step, the algorithm was run on a network consisting of two clusters
-containing 50 hosts each, totaling 100 hosts.
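The seven-point update above can be sketched as a plain sequential Jacobi sweep. The code below is an illustrative sketch only, not the paper's multisplitting/GMRES implementation: it assumes the standard Jacobi update $u^{k+1} = \frac{1}{6}\bigl(\sum \text{neighbors} - h^2 f\bigr)$ for $\nabla^2 u = f$ on an $N^3$ interior grid with $u = 0$ on the boundary, and the grid size `N` and the helper `run_sweeps` are hypothetical names introduced here.

```c
#include <string.h>

#define N 8  /* interior points per dimension (illustrative; the paper uses 62-171) */

/* One synchronous Jacobi sweep of the seven-point stencil for the 3D
   Poisson problem nabla^2 u = f with u = 0 on the boundary.
   Returns the maximum pointwise change, usable as a stopping criterion. */
static double jacobi_sweep(double u[N+2][N+2][N+2],
                           double unew[N+2][N+2][N+2],
                           double f[N+2][N+2][N+2], double h)
{
    double diff = 0.0;
    for (int i = 1; i <= N; i++)
        for (int j = 1; j <= N; j++)
            for (int k = 1; k <= N; k++) {
                unew[i][j][k] = (u[i-1][j][k] + u[i+1][j][k] +
                                 u[i][j-1][k] + u[i][j+1][k] +
                                 u[i][j][k-1] + u[i][j][k+1] -
                                 h * h * f[i][j][k]) / 6.0;
                double d = unew[i][j][k] - u[i][j][k];
                if (d < 0.0) d = -d;
                if (d > diff) diff = d;
            }
    return diff;
}

/* Hypothetical driver: run a number of sweeps with a constant source
   term f = 1 and return the last maximum pointwise change. */
double run_sweeps(int sweeps)
{
    static double u[N+2][N+2][N+2], unew[N+2][N+2][N+2], f[N+2][N+2][N+2];
    double h = 1.0 / (N + 1), diff = 0.0;
    memset(u, 0, sizeof u);
    memset(unew, 0, sizeof unew);
    for (int i = 0; i < N + 2; i++)
        for (int j = 0; j < N + 2; j++)
            for (int k = 0; k < N + 2; k++)
                f[i][j][k] = 1.0;
    for (int s = 0; s < sweeps; s++) {
        diff = jacobi_sweep(u, unew, f, h);
        memcpy(u, unew, sizeof u);  /* synchronous update after a full sweep */
    }
    return diff;
}
```

In the parallel setting described above, each processor would own one sub-domain and exchange its boundary planes with its (at most six) neighbors via MPI before each sweep; in asynchronous mode a sweep simply uses the most recently received boundary values instead of waiting.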
Various combinations of the above
-factors have provided the results shown in Table~\ref{tab.cluster.2x50} with a
-matrix size ranging from $N_x = N_y = N_z = \text{62}$ to 171 elements or from
-$\text{62}^\text{3} = \text{\np{238328}}$ to $\text{171}^\text{3} =
-\text{\np{5000211}}$ entries.
+% As a first step,
+The algorithm was run on a network of two clusters with 50 hosts each, totaling
+100 hosts. Various combinations of the above factors have provided the results
+shown in Table~\ref{tab.cluster.2x50}. With a 3D matrix size ranging from
+$N_x = N_y = N_z = \text{62}$ to 150 elements (that is, from
+$\text{62}^\text{3} = \text{\np{238328}}$ to $\text{150}^\text{3} =
+\text{\np{3375000}}$ entries), convergence is obtained in asynchronous mode on
+average 2.5 times faster than in synchronous mode.
 \AG{Explain how to read the tables.}
-
+\CER{I reworded the sentence based on the reading of the table. More details
+will be given in the Interpretations and comments section}
 % use the same column width for the following three tables
 \newlength{\mytablew}\settowidth{\mytablew}{\footnotesize\np{E-11}}
 \newenvironment{mytable}[1]{% #1: number of columns for data