X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/hpcc2014.git/blobdiff_plain/b9756213bfa3fbf7c5667385f5070b003b9bf0b3..c9066e476a4a5b2bdba5c2eb18ad0aa2ad1d2bb3:/hpcc.tex diff --git a/hpcc.tex b/hpcc.tex index abcf399..e043c78 100644 --- a/hpcc.tex +++ b/hpcc.tex @@ -82,7 +82,7 @@ paper, we show that it is interesting to use SimGrid to simulate the behaviors of asynchronous iterative algorithms. For that, we compare the behaviour of a synchronous GMRES algorithm with an asynchronous multisplitting one with simulations in which we choose some parameters. Both codes are real MPI -codes. Experiments allow us to see when the multisplitting algorithm can be more +codes. Simulations allow us to see when the multisplitting algorithm can be more efficient than the GMRES one to solve a 3D Poisson problem. @@ -135,57 +135,66 @@ iterative asynchronous algorithms to solve a given problem on a large-scale simulated environment challenges to find optimal configurations giving the best results with a lowest residual error and in the best of execution time. -To our knowledge, there is no existing work on the large-scale simulation of a -real AIAC application. There are {\bf two contributions} in this paper. First we give a first -approach of the simulation of AIAC algorithms using a simulation tool (i.e. the -SimGrid toolkit~\cite{SimGrid}). Second, we confirm the effectiveness of the -asynchronous multisplitting algorithm by comparing its performance with the synchronous -GMRES. More precisely, we had implemented a program for solving large -linear system of equations by numerical method GMRES (Generalized -Minimal Residual) \cite{ref1}. We show, that with minor modifications of the -initial MPI code, the SimGrid toolkit allows us to perform a test campaign of a -real AIAC application on different computing architectures. The simulated -results we obtained are in line with real results exposed in ??\AG[]{ref?}. -SimGrid had allowed us to launch the application from a modest computing -infrastructure by simulating different distributed architectures composed by -clusters nodes interconnected by variable speed networks. With selected -parameters on the network platforms (bandwidth, latency of inter cluster -network) and on the clusters architecture (number, capacity calculation power) -in the simulated environment, the experimental results have demonstrated not -only the algorithm convergence within a reasonable time compared with the -physical environment performance, but also a time saving of up to \np[\%]{40} in -asynchronous mode. -\AG{Il faudrait revoir la phrase précédente (couper en deux?). Là, on peut - avoir l'impression que le gain de \np[\%]{40} est entre une exécution réelle - et une exécution simulée!} - -This article is structured as follows: after this introduction, the next section will give a brief description of -iterative asynchronous model. Then, the simulation framework SimGrid is presented with the settings to create various -distributed architectures. The algorithm of the multisplitting method used by GMRES \LZK{??? GMRES n'utilise pas la méthode de multisplitting! Sinon ne doit on pas expliquer le choix d'une méthode de multisplitting?} written with MPI primitives and -its adaptation to SimGrid with SMPI (Simulated MPI) is detailed in the next section. At last, the experiments results -carried out will be presented before some concluding remarks and future works. +To our knowledge, there is no existing work on the large-scale simulation of a +real AIAC application. 
+{\bf The contribution of the present paper can be
+  summarised in two main points}. First, we present a first approach to the
+simulation of AIAC algorithms using a simulation tool (namely the SimGrid
+toolkit~\cite{SimGrid}). Second, we confirm the effectiveness of the
+asynchronous multisplitting algorithm by comparing its performance with that of
+the synchronous GMRES (Generalized Minimal Residual) method~\cite{ref1}. Both
+codes can be used to solve large linear systems. In this paper, we focus on a 3D
+Poisson problem. We show that, with minor modifications of the initial MPI
+code, the SimGrid toolkit allows us to perform a test campaign of a real AIAC
+application on different computing architectures.
+% The simulated results we
+%obtained are in line with real results exposed in ??\AG[]{ref?}.
+SimGrid allowed us to launch the application from a modest computing
+infrastructure by simulating different distributed architectures composed of
+cluster nodes interconnected by networks of variable speed. The parameters of
+the network platform are the bandwidth and the latency of the inter-cluster
+network. The parameters of the cluster architecture are the number of machines
+and the computation power of a machine. Simulations show that, with two distant
+clusters, the asynchronous multisplitting algorithm can solve the 3D Poisson
+problem approximately twice as fast as GMRES.
+
+This article is structured as follows: after this introduction, the next
+section gives a brief description of the asynchronous iteration model. Then,
+the simulation framework SimGrid is presented, together with the settings used
+to create various distributed architectures. The multisplitting method is then
+presented; it relies on GMRES to solve each block resulting from the
+splitting. This code is written with MPI primitives, and its adaptation to
+SimGrid with SMPI (Simulated MPI) is detailed in the next section. Finally, the
+results of the simulations carried out are presented, before some concluding
+remarks and future work.
 
 \section{Motivations and scientific context}
 
-As exposed in the introduction, parallel iterative methods are now widely used in many scientific domains. They can be
-classified in three main classes depending on how iterations and communications are managed (for more details readers
-can refer to~\cite{bcvc06:ij}). In the \textit{Synchronous Iterations~-- Synchronous Communications (SISC)} model data
-are exchanged at the end of each iteration. All the processors must begin the same iteration at the same time and
-important idle times on processors are generated. The \textit{Synchronous Iterations~-- Asynchronous Communications
-(SIAC)} model can be compared to the previous one except that data required on another processor are sent asynchronously
-i.e. without stopping current computations. This technique allows to partially overlap communications by computations
-but unfortunately, the overlapping is only partial and important idle times remain. It is clear that, in a grid
-computing context, where the number of computational nodes is large, heterogeneous and widely distributed, the idle
-times generated by synchronizations are very penalizing. One way to overcome this problem is to use the
-\textit{Asynchronous Iterations~-- Asynchronous Communications (AIAC)} model. Here, local computations do not need to
-wait for required data. Processors can then perform their iterations with the data present at that time. Figure~\ref{fig:aiac}
-illustrates this model where the gray blocks represent the computation phases, the white spaces the idle
-times and the arrows the communications.
-\AG{There are no ``white spaces'' on the figure.}
-With this algorithmic model, the number of iterations required before the
-convergence is generally greater than for the two former classes. But, and as detailed in~\cite{bcvc06:ij}, AIAC
-algorithms can significantly reduce overall execution times by suppressing idle times due to synchronizations especially
-in a grid computing context.\LZK{Répétition par rapport à l'intro}
+As exposed in the introduction, parallel iterative methods are now widely used
+in many scientific domains. They can be classified in three main classes,
+depending on how iterations and communications are managed (for more details
+readers can refer to~\cite{bcvc06:ij}). In the \textit{Synchronous Iterations~--
+  Synchronous Communications (SISC)} model, data are exchanged at the end of
+each iteration. All the processors must begin the same iteration at the same
+time, and important idle times are generated on the processors. The
+\textit{Synchronous Iterations~-- Asynchronous Communications (SIAC)} model can
+be compared to the previous one, except that the data required on another
+processor are sent asynchronously, i.e.\ without stopping the current
+computations. This technique makes it possible to partially overlap
+communications with computations but, unfortunately, the overlapping is only
+partial and important idle times remain. It is clear that, in a grid computing
+context, where the computational nodes are numerous, heterogeneous and widely
+distributed, the idle times generated by synchronizations are very penalizing.
+One way to overcome this problem is to use the \textit{Asynchronous
+  Iterations~-- Asynchronous Communications (AIAC)} model. Here, local
+computations do not need to wait for the required data. Processors can then
+perform their iterations with the data available at that time.
+Figure~\ref{fig:aiac} illustrates this model, where the gray blocks represent
+the computation phases. With this algorithmic model, the number of iterations
+required before convergence is generally greater than for the two former
+classes. However, as detailed in~\cite{bcvc06:ij}, AIAC algorithms can
+significantly reduce overall execution times by suppressing the idle times due
+to synchronizations, especially in a grid computing context.
+%\LZK{Repetition with respect to the introduction}
 
 \begin{figure}[!t]
   \centering
@@ -194,26 +203,38 @@ in a grid computing context.\LZK{Répétition par rapport à l'intro}
   \label{fig:aiac}
 \end{figure}
 
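+To make this asynchronous scheme more concrete, the following fragment gives a
+minimal, purely illustrative C/MPI skeleton of an AIAC-style iteration
+loop. It is only a sketch, not the multisplitting code evaluated in this paper:
+the \texttt{local\_update} and \texttt{local\_residual} routines are toy
+placeholders, the processes are organised in a simple ring, and the termination
+test is local, whereas a real AIAC code requires a global convergence detection
+mechanism.
+\begin{verbatim}
+#include <mpi.h>
+#include <stdlib.h>
+
+#define N      1000   /* local block size (arbitrary for this sketch)   */
+#define MAX_IT 1000   /* safety bound on the number of local iterations */
+
+/* Toy contracting update and residual, standing in for the real
+   local solver (GMRES on one block) and its residual computation.  */
+static void local_update(double *x, const double *halo) {
+  for (int i = 0; i < N; i++) x[i] = 0.4 * x[i] + 0.4 * halo[i];
+}
+static double local_residual(const double *x) {
+  double r = 0.0;
+  for (int i = 0; i < N; i++) r += x[i] * x[i];
+  return r;
+}
+
+int main(int argc, char **argv) {
+  MPI_Init(&argc, &argv);
+  int rank, size;
+  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+  MPI_Comm_size(MPI_COMM_WORLD, &size);
+  int left  = (rank - 1 + size) % size;   /* ring of neighbours */
+  int right = (rank + 1) % size;
+
+  double *x    = calloc(N, sizeof(double));
+  double *halo = calloc(N, sizeof(double));
+  for (int i = 0; i < N; i++) x[i] = 1.0;
+  MPI_Request sreq = MPI_REQUEST_NULL;
+
+  for (int it = 0; it < MAX_IT; it++) {
+    /* Complete the previous send (if any) so that x can be reused. */
+    MPI_Wait(&sreq, MPI_STATUS_IGNORE);
+    /* Drain the messages already arrived from the left neighbour and
+       keep only the most recent one: the process never blocks here. */
+    int flag = 1;
+    while (flag) {
+      MPI_Iprobe(left, 0, MPI_COMM_WORLD, &flag, MPI_STATUS_IGNORE);
+      if (flag)
+        MPI_Recv(halo, N, MPI_DOUBLE, left, 0, MPI_COMM_WORLD,
+                 MPI_STATUS_IGNORE);
+    }
+    /* Iterate with whatever data are locally available right now.    */
+    local_update(x, halo);
+    /* Send the new local data asynchronously to the right neighbour. */
+    MPI_Isend(x, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &sreq);
+    /* A real AIAC code would use a global convergence detection and a
+       proper termination protocol instead of this local test.        */
+    if (local_residual(x) < 1e-8) break;
+  }
+  MPI_Wait(&sreq, MPI_STATUS_IGNORE);
+  free(x); free(halo);
+  MPI_Finalize();
+  return 0;
+}
+\end{verbatim}
+The point illustrated here is that a process never waits for fresh data from
+its neighbours: it uses the most recent values already received and immediately
+starts its next local iteration, which is precisely what removes the idle times
+due to synchronizations.
+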
+\RC{I would rather drop AIAC and simply say asynchronous algorithms... to be discussed}
+
+%% It is very challenging to develop efficient applications for large scale,
+%% heterogeneous and distributed platforms such as computing grids. Researchers and
+%% engineers have to develop techniques for maximizing application performance of
+%% these multi-cluster platforms, by redesigning the applications and/or by using
+%% novel algorithms that can account for the composite and heterogeneous nature of
+%% the platform. Unfortunately, the deployment of such applications on these very
+%% large scale systems is very costly, labor intensive and time consuming. In this
+%% context, it appears that the use of simulation tools to explore various platform
+%% scenarios at will and to run enormous numbers of experiments quickly can be very
+%% promising. Several works\dots{}
+
+%% \AG{Several works\dots{} what?\\
+% Is the following paragraph already in the intro?}
+In the context of asynchronous algorithms, the number of iterations needed to
+reach convergence depends on the message delays. With synchronous iterations,
+the number of iterations is exactly the same as in the sequential mode (if the
+parallelization process does not change the algorithm). So the difficulty with
+asynchronous algorithms comes from the fact that it is necessary to really
+execute the algorithm: from one execution to another, the order of the messages
+changes, and so does the number of iterations needed to reach convergence.
+Depending on all the parameters of the platform (number of nodes, power of the
+nodes, inter- and intra-cluster bandwidth and latency, \dots) and of the
+algorithm (number of splittings for the multisplitting algorithm), the
+multisplitting code will obtain the solution more or less quickly. Of course,
+the GMRES method also depends on the same parameters. As it is difficult to have
+access to many clusters, grids or supercomputers with many different network
+parameters, it is interesting to be able to simulate the behavior of
+asynchronous iterative algorithms before running real experiments. A
+hypothetical platform description illustrating these parameters is sketched
+below.
+
 
-It is very challenging to develop efficient applications for large scale,
-heterogeneous and distributed platforms such as computing grids. Researchers and
-engineers have to develop techniques for maximizing application performance of
-these multi-cluster platforms, by redesigning the applications and/or by using
-novel algorithms that can account for the composite and heterogeneous nature of
-the platform. Unfortunately, the deployment of such applications on these very
-large scale systems is very costly, labor intensive and time consuming. In this
-context, it appears that the use of simulation tools to explore various platform
-scenarios at will and to run enormous numbers of experiments quickly can be very
-promising. Several works\dots{}
-
-\AG{Several works\dots{} what?\\
- Le paragraphe suivant se trouve déjà dans l'intro ?}
-In the context of AIAC algorithms, the use of simulation tools is even more
-relevant. Indeed, this class of applications is very sensible to the execution
-environment context. For instance, variations in the network bandwidth (intra
-and inter-clusters), in the number and the power of nodes, in the number of
-clusters\dots{} can lead to very different number of iterations and so to very
-different execution times.
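+As an illustration of these platform parameters, the following hypothetical
+SimGrid platform description declares two clusters; the number of nodes
+(attribute \texttt{radical}), the computation power of each node and the
+intra-cluster bandwidth and latency appear explicitly as attributes. The syntax
+follows the SimGrid~3 platform format (attribute names and accepted units may
+differ in other SimGrid versions), and the values are arbitrary examples, not
+the settings used in our experiments.
+\begin{verbatim}
+<?xml version='1.0'?>
+<!DOCTYPE platform SYSTEM
+   "http://simgrid.gforge.inria.fr/simgrid.dtd">
+<platform version="3">
+  <AS id="AS0" routing="Full">
+    <!-- 16 nodes per cluster, 1 Gflop/s per node;
+         bandwidths in bytes/s, latencies in seconds -->
+    <cluster id="cluster0" prefix="c0-" suffix=".me"
+             radical="0-15" power="1E9"
+             bw="1.25E8" lat="5E-5"
+             bb_bw="2.25E9" bb_lat="5E-4"/>
+    <cluster id="cluster1" prefix="c1-" suffix=".me"
+             radical="0-15" power="1E9"
+             bw="1.25E8" lat="5E-5"
+             bb_bw="2.25E9" bb_lat="5E-4"/>
+    <!-- the inter-cluster link (bandwidth and latency
+         between the two sites) and the corresponding
+         route are omitted from this sketch -->
+  </AS>
+</platform>
+\end{verbatim}
+Such a file, together with a hostfile listing the simulated nodes, is passed to
+\texttt{smpirun}, so that the same MPI binary is executed on the simulated
+platform instead of a real one.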