Add AIAC.pdf

[hpcc2014.git] / hpcc.tex
diff --git a/hpcc.tex b/hpcc.tex

index ab6e020e98836594820e875ab29cc7ab656875d7..161a02c6b67cfee72e1ba8c5ed93851d02b5dec8 100644
--- a/hpcc.tex
+++ b/hpcc.tex
@@ -101,75 +101,95 @@ simulated large scale growing environment and with larger problem size.
  
  \section{Introduction}
  
-Parallel computing and high performance computing (HPC) are becoming 
-more and more imperative for solving various problems raised by 
-researchers on various scientific disciplines but also by industrial in 
-the field. Indeed, the increasing complexity of these requested 
-applications combined with a continuous increase of their sizes lead to 
-write distributed and parallel algorithms requiring significant hardware 
-resources (grid computing, clusters, broadband network, etc.) but
-also a non-negligible CPU execution time. We consider in this paper a
-class of highly efficient parallel algorithms called iterative executed 
-in a distributed environment. As their name suggests, these algorithm 
-solves a given problem that might be NP-complete complex by successive
-iterations ($X_{n +1} = f(X_{n})$) from an initial value $X_{0}$ to find
-an approximate value $X^*$ of the solution with a very low
-residual error. Several well-known methods demonstrate the convergence 
-of these algorithms. Generally, to reduce the complexity and the 
-execution time, the problem is divided into several \emph{pieces} that will
-be solved in parallel on multiple processing units. The latter will 
-communicate each intermediate results before a new iteration starts 
-until the approximate solution is reached. These distributed parallel 
-computations can be performed either in \emph{synchronous} communication mode
-where a new iteration begin only when all nodes communications are 
-completed, either \emph{asynchronous} mode where processors can continue
-independently without or few synchronization points. Despite the 
-effectiveness of iterative approach, a major drawback of the method is 
-the requirement of huge resources in terms of computing capacity, 
-storage and high speed communication network. Indeed, limited physical 
-resources are blocking factors for large-scale deployment of parallel 
-algorithms. 
-
-In recent years, the use of a simulation environment to execute parallel 
-iterative algorithms found some interests in reducing the highly cost of 
-access to computing resources: (1) for the applications development life 
-cycle and in code debugging (2) and in production to get results in a 
-reasonable execution time with a simulated infrastructure not accessible 
-with physical resources. Indeed, the launch of distributed iterative 
-asynchronous algorithms to solve a given problem on a large-scale 
-simulated environment challenges to find optimal configurations giving 
-the best results with a lowest residual error and in the best of 
-execution time. According our knowledge, no testing of large-scale 
-simulation of the class of algorithm solving to achieve real results has 
-been undertaken to date. We had in the scope of this work implemented a 
-program for solving large non-symmetric linear system of equations by 
-numerical method GMRES (Generalized Minimal Residual) in the simulation
-environment SimGrid. The simulated platform had allowed us to launch
-the application from a modest computing infrastructure by simulating 
-different distributed architectures composed by clusters nodes 
-interconnected by variable speed networks. In addition, it has been 
-permitted to show the effectiveness of asynchronous mode algorithm by 
-comparing its performance with the synchronous mode time. With selected 
-parameters on the network platforms (bandwidth, latency of inter cluster 
-network) and on the clusters architecture (number, capacity calculation 
-power) in the simulated environment, the experimental results have
-demonstrated not only the algorithm convergence within a reasonable time 
-compared with the physical environment performance, but also a time 
+Parallel computing and high performance computing (HPC) are becoming more and more imperative for solving various
+problems raised by researchers in many scientific disciplines and by industrial practitioners. Indeed, the
+increasing complexity of these applications, combined with the continuous growth of their sizes, leads to
+distributed and parallel algorithms that require significant hardware resources (grid computing, clusters, broadband
+networks, etc.) as well as non-negligible CPU execution time. We consider in this paper a class of highly efficient
+parallel algorithms called \texttt{numerical iterative algorithms} executed in a distributed environment. As their name
+suggests, these algorithms solve a given problem by successive iterations ($X_{n+1} = f(X_{n})$) from an initial value
+$X_{0}$ to find an approximate value $X^*$ of the solution with a very low residual error. Several well-known methods
+demonstrate the convergence of these algorithms \cite{}.
+
+Parallelizing such algorithms generally involves dividing the problem into several \emph{pieces} that are
+solved in parallel on multiple processing units. These units exchange their intermediate results before each new
+iteration starts, until the approximate solution is reached. These parallel computations can be performed
+either in \emph{synchronous} mode, where a new iteration begins only when all node communications are
+completed, or in \emph{asynchronous} mode, where processors can continue independently with few or no synchronization
+points.
+
+% DL: resume the correction here
+Despite the effectiveness of the iterative approach, a major drawback of the method is the requirement of huge
+resources in terms of computing capacity, storage and high speed communication networks. Indeed, limited physical
+resources are a blocking factor for the large-scale deployment of parallel algorithms.
+
+In recent years, the use of simulation environments to execute parallel iterative algorithms has attracted interest as a
+way to reduce the high cost of access to computing resources: (1) during the application development life cycle and code
+debugging, and (2) in production, to get results in a reasonable execution time with a simulated infrastructure that is not
+accessible with physical resources. Indeed, running distributed asynchronous iterative algorithms to solve a
+given problem on a large-scale simulated environment raises the challenge of finding the configurations that give the best results,
+with the lowest residual error and the shortest execution time. To our knowledge, no large-scale
+simulation of this class of algorithms producing real results has been undertaken to date. In the scope
+of this work, we have implemented a program for solving large non-symmetric linear systems of equations with the
+GMRES (Generalized Minimal Residual) numerical method in the SimGrid simulation environment. The simulated platform allowed us to
+launch the application from a modest computing infrastructure by simulating different distributed architectures
+composed of clusters of nodes interconnected by networks of various speeds. In addition, it allowed us to show the
+effectiveness of the asynchronous algorithm by comparing its execution time with that of the synchronous mode. With selected
+parameters for the network (bandwidth and latency of the inter-cluster network) and for the cluster architecture
+(number, computing power) in the simulated environment, the experimental results demonstrate not only
+the convergence of the algorithm within a reasonable time compared with the physical environment, but also a time
  saving of up to \np[\%]{40} in asynchronous mode.
  
-This article is structured as follows: after this introduction, the next 
-section will give a brief description of iterative asynchronous model. 
-Then, the simulation framework SimGrid will be presented with the
-settings to create various distributed architectures. The algorithm of 
-the multi-splitting method used by GMRES written with MPI primitives
-and its adaptation to SimGrid with SMPI (Simulated MPI) will be in the
-next section. At last, the experiments results carried out will be
-presented before the conclusion which we will announce the opening of 
-our future work after the results.
+This article is structured as follows: after this introduction, the next section gives a brief description of the
+asynchronous iteration model. Then, the SimGrid simulation framework is presented with the settings used to create
+various distributed architectures. The algorithm of the multi-splitting method used by GMRES, written with MPI
+primitives, and its adaptation to SimGrid with SMPI (Simulated MPI) are described in the following section. Finally, the
+experimental results are presented before the conclusion, which also outlines the directions of our
+future work.
   
-\section{The asynchronous iteration model}
+\section{Motivations and scientific context}
+
+As explained in the introduction, parallel iterative methods are now widely used in many scientific domains. They can be
+classified into three main classes depending on how iterations and communications are managed (for more details readers
+can refer to \cite{bcvc02:ip}). In the \textit{Synchronous Iterations - Synchronous Communications (SISC)} model, data
+are exchanged at the end of each iteration. All the processors must begin the same iteration at the same time, so
+significant idle times are generated on the processors. The \textit{Synchronous Iterations - Asynchronous Communications
+(SIAC)} model is similar to the previous one, except that data required on another processor are sent asynchronously,
+i.e. without stopping current computations. This technique makes it possible to overlap communications with computations
+but, unfortunately, the overlap is only partial and significant idle times remain. It is clear that, in a grid
+computing context, where the computational nodes are numerous, heterogeneous and widely distributed, the idle
+times generated by synchronizations are very penalizing. One way to overcome this problem is to use the
+\textit{Asynchronous Iterations - Asynchronous Communications (AIAC)} model. Here, local computations do not need to
+wait for required data. Processors can then perform their iterations with the data present at that time. Figure
+\ref{fig:aiac} illustrates this model, where the grey blocks represent the computation phases, the white spaces the idle
+times and the arrows the communications. With this algorithmic model, the number of iterations required before
+convergence is generally greater than for the two previous classes. However, as detailed in \cite{bcvc06:ij}, AIAC
+algorithms can significantly reduce overall execution times by suppressing idle times due to synchronizations, especially
+in a grid computing context.
+
+\begin{figure}[htbp]
+  \centering
+    \includegraphics[width=8cm]{AIAC.pdf}
+  \caption{The Asynchronous Iterations - Asynchronous Communications model}
+  \label{fig:aiac}
+\end{figure}
+
+
+It is very challenging to develop efficient applications for large scale, heterogeneous and distributed platforms such
+as computing grids. Researchers and engineers have to develop techniques for maximizing application performance on these
+multi-cluster platforms, by redesigning the applications and/or by using novel algorithms that can account for the
+composite and heterogeneous nature of the platform. Unfortunately, the deployment of such applications on these very
+large scale systems is very costly, labor intensive and time consuming. In this context, the use of
+simulation tools to explore various platform scenarios at will and to run enormous numbers of experiments quickly appears
+very promising. Several works...
+
+In the context of AIAC algorithms, the use of simulation tools is even more relevant. Indeed, this class of applications
+is very sensitive to the execution environment. For instance, variations in the network bandwidth (intra- and
+inter-cluster), in the number and the power of nodes, or in the number of clusters can lead to very different numbers of
+iterations and thus to very different execution times.
+
+
  
-\DL{Describe the asynchronous model. I will take care of it}
  
  \section{SimGrid}