synchronous case, AIAC algorithms can significantly reduce overall execution times by suppressing the idle times due to
synchronizations, especially in a grid computing context (see~\cite{Bahi07} for more details).
Parallel applications (synchronous or asynchronous) may have different
configuration and deployment requirements. Quantifying their resource
allocation policies and application scheduling algorithms in grid computing
environments under varying load, CPU power and network speeds is costly,
labor intensive and time
consuming~\cite{Calheiros:2011:CTM:1951445.1951450}. The case of AIAC
algorithms is even more problematic since they are very sensitive to the
execution environment context. For instance, variations in the network bandwidth
(intra- and inter-cluster), in the number and the power of nodes, or in the
number of clusters\dots{} can lead to very different numbers of iterations and
thus to very different execution times. It therefore appears very promising to
use simulation tools to explore various platform scenarios and to run large
numbers of experiments quickly. In this respect, executing parallel iterative
algorithms in a simulation environment reduces the high cost of access to
computing resources: (1) during the application development life cycle and for
code debugging, and (2) in production, to get results in a reasonable execution
time on a simulated infrastructure that is not accessible with physical
resources. Indeed, running distributed asynchronous iterative algorithms to
solve a given problem on a large-scale simulated environment raises the
challenge of finding the optimal configurations, i.e., those giving the best
results with the lowest residual error in the best execution time.
To our knowledge, there is no existing work on the large-scale simulation of a
real AIAC application. The contribution of the present paper can be summarized
in two main points. First, we give a first approach to the simulation of AIAC
algorithms using a simulation tool (i.e., the SimGrid
toolkit~\cite{SimGrid}). Second, we confirm the effectiveness of the
asynchronous multisplitting algorithm by comparing its performance with that of
the synchronous GMRES (Generalized Minimal Residual) method~\cite{ref1}. Both
codes can be used to solve large linear systems; in this paper, we focus on a
3D Poisson problem. We show that, with minor modifications of the initial MPI
code, the SimGrid toolkit allows us to perform a test campaign of a real AIAC
application on different computing architectures.
SimGrid allowed us to launch the application from a modest computing
infrastructure by simulating different distributed architectures composed of
cluster nodes interconnected by variable speed networks. The parameters of the
network platforms are the bandwidth and the latency of the inter-cluster
network. The parameters of the cluster architecture are the number of machines
and the computation power of a machine. Simulations show that the asynchronous
multisplitting algorithm can solve the 3D Poisson problem approximately twice
as fast as GMRES with two distant clusters.

This article is structured as follows. After this introduction, the next
section gives a brief description of the asynchronous iteration model. Then,
the simulation framework SimGrid is presented, together with the settings used
to create various distributed architectures. Next, the multisplitting method is
presented; it relies on GMRES to solve each block resulting from the
splitting. This code is written with MPI primitives, and its adaptation to
SimGrid with SMPI (Simulated MPI) is detailed in the following section.
Finally, the simulation results are presented before some concluding remarks
and future work.
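In practice, such network and cluster parameters are described in a SimGrid
platform file. The fragment below is only an illustrative sketch of a
two-cluster platform in SimGrid's XML platform description; all identifiers,
host powers, bandwidths and latencies are hypothetical placeholder values, not
the ones used in our experiments:

```xml
<?xml version="1.0"?>
<!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
<platform version="3">
  <AS id="AS0" routing="Full">
    <!-- two clusters of 50 hosts each (placeholder power/bw/lat values) -->
    <cluster id="cluster0" prefix="c0-" suffix=".me" radical="0-49"
             power="1E9" bw="1.25E8" lat="5E-5"/>
    <cluster id="cluster1" prefix="c1-" suffix=".me" radical="0-49"
             power="1E9" bw="1.25E8" lat="5E-5"/>
    <!-- slower long-distance link between the two clusters -->
    <link id="backbone" bandwidth="1.25E7" latency="5E-4"/>
    <ASroute src="cluster0" dst="cluster1"
             gw_src="c0-cluster0_router.me" gw_dst="c1-cluster1_router.me">
      <link_ctn id="backbone"/>
    </ASroute>
  </AS>
</platform>
```

Degrading the bandwidth and latency attributes of the inter-cluster link is
then enough to emulate distant clusters.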
\section{Motivations and scientific context}
A priori, obtaining a relative gain greater than 1 would be difficult in a
local area network configuration, where the synchronous mode takes advantage of
the rapid exchange of information on such high-speed links. Thus, the
methodology adopted was to launch the application on a clustered network. In
this configuration, degrading the inter-cluster network performance penalizes
the synchronous mode, making it possible to obtain a relative gain greater
than 1. This setting simulates the case of distant clusters linked by a
long-distance network, as in a grid computing context.
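The relative gain discussed here can be made explicit, assuming (as is usual)
that it is defined as the ratio of the synchronous to the asynchronous
execution time:
\begin{equation*}
  \text{relative gain} = \frac{T_{\text{sync}}}{T_{\text{async}}},
\end{equation*}
so that a value greater than 1 means that the asynchronous mode is faster.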
In this paper, we solve the 3D Poisson problem whose mathematical model is
\begin{equation}
\left\{
\begin{array}{l}
\nabla^2 u = f \text{~in~} \Omega \\
u =0 \text{~on~} \Gamma =\partial\Omega
\end{array}
\right.
\label{eq:02}
\end{equation}
where $\nabla^2$ is the Laplace operator, $f$ and $u$ are real-valued
functions, and $\Omega=[0,1]^3$. The spatial discretization with a finite
difference scheme reduces problem~(\ref{eq:02}) to a system of sparse linear
equations. The general iteration scheme of our multisplitting method in a 3D
domain using a seven-point stencil can be written as
\begin{equation}
\begin{array}{ll}
u^{k+1}(x,y,z)= & \frac{1}{6}\times(u^k(x-1,y,z) + u^k(x+1,y,z) + \\
& u^k(x,y-1,z) + u^k(x,y+1,z) + \\
& u^k(x,y,z-1) + u^k(x,y,z+1) \\
& - h^2 f(x,y,z)),
\end{array}
\label{eq:03}
\end{equation}
where $h$ is the discretization step and the matrix $A$ of the discretized
linear system, of size $N_x\times N_y\times N_z$, is sparse, symmetric and
positive definite.

The parallel solving of the 3D Poisson problem with our multisplitting method
requires a data partitioning of the problem between clusters and between
processors within a cluster. We have chosen the 3D partitioning instead of the
row-by-row partitioning in order to reduce the data exchanges at sub-domain
boundaries. Figure~\ref{fig:4.2} shows an example of the data partitioning of
the 3D Poisson problem between two clusters of processors, where each
sub-problem is assigned to a processor. In this context, a processor has at
most six neighbors, within its own cluster or in distant clusters, with which
it shares data at sub-domain boundaries.

\begin{figure}[!t]
\centering
\includegraphics[width=80mm,keepaspectratio]{partition}
\caption{Example of the 3D data partitioning between two clusters of processors.}
\label{fig:4.2}
\end{figure}
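To make the scheme concrete, one relaxation sweep of the seven-point
discretization can be sketched as follows. This is a minimal serial kernel
given for illustration only; it is not the actual multisplitting/MPI code, and
the function name, grid sizes (nx, ny, nz), mesh step h and array layout are
hypothetical choices:

```c
#include <stddef.h>

/* One Jacobi sweep of the seven-point stencil for the 3D Poisson
 * problem: unew = (sum of the six neighbors - h^2 * f) / 6.
 * u, f and unew are (nx+2)*(ny+2)*(nz+2) arrays with a one-cell
 * ghost layer holding the Dirichlet boundary condition u = 0. */
#define IDX(i, j, k) (((size_t)(i) * (ny + 2) + (j)) * (nz + 2) + (k))

void jacobi_sweep(int nx, int ny, int nz, double h,
                  const double *u, const double *f, double *unew)
{
    for (int i = 1; i <= nx; i++)
        for (int j = 1; j <= ny; j++)
            for (int k = 1; k <= nz; k++)
                unew[IDX(i, j, k)] =
                    (u[IDX(i - 1, j, k)] + u[IDX(i + 1, j, k)] +
                     u[IDX(i, j - 1, k)] + u[IDX(i, j + 1, k)] +
                     u[IDX(i, j, k - 1)] + u[IDX(i, j, k + 1)] -
                     h * h * f[IDX(i, j, k)]) / 6.0;
}
```

In the parallel code, each processor applies such a sweep to its own sub-domain
and exchanges the ghost-layer values with its at most six neighbors.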
The algorithm was run on a network of two clusters with 50 hosts each,
totaling 100 hosts. Various combinations of the above factors provided the
results shown in Table~\ref{tab.cluster.2x50}. With a 3D matrix size ranging
from $N_x = N_y = N_z = 62$ to $150$ elements (that is, from $62^3 =
\np{238328}$ to $150^3 = \np{3375000}$ entries), convergence in asynchronous
mode is obtained on average 2.5 times faster than in synchronous mode.
% use the same column width for the following three tables
\newlength{\mytablew}\settowidth{\mytablew}{\footnotesize\np{E-11}}
\newenvironment{mytable}[1]{% #1: number of columns for data