X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/hpcc2014.git/blobdiff_plain/a9757b4dc9aad25ed2dc6884c0bd638a0f101c31..15c7337ca6150b7e463ce966f188445f2d55e95a:/hpcc.tex?ds=sidebyside diff --git a/hpcc.tex b/hpcc.tex index 5ab9faf..161a02c 100644 --- a/hpcc.tex +++ b/hpcc.tex @@ -101,103 +101,71 @@ simulated large scale growing environment and with larger problem size. \section{Introduction} -Parallel computing and high performance computing (HPC) are becoming -more and more imperative for solving various problems raised by -researchers on various scientific disciplines but also by industrial in -the field. Indeed, the increasing complexity of these requested -applications combined with a continuous increase of their sizes lead to -write distributed and parallel algorithms requiring significant hardware -resources (grid computing, clusters, broadband network, etc.) but -also a non-negligible CPU execution time. We consider in this paper a -class of highly efficient parallel algorithms called iterative executed -in a distributed environment. As their name suggests, these algorithm -solves a given problem that might be NP-complete complex by successive -iterations ($X_{n +1} = f(X_{n})$) from an initial value $X_{0}$ to find -an approximate value $X^*$ of the solution with a very low -residual error. Several well-known methods demonstrate the convergence -of these algorithms. Generally, to reduce the complexity and the -execution time, the problem is divided into several \emph{pieces} that will -be solved in parallel on multiple processing units. The latter will -communicate each intermediate results before a new iteration starts -until the approximate solution is reached. 
These distributed parallel -computations can be performed either in \emph{synchronous} communication mode -where a new iteration begin only when all nodes communications are -completed, either \emph{asynchronous} mode where processors can continue -independently without or few synchronization points. Despite the -effectiveness of iterative approach, a major drawback of the method is -the requirement of huge resources in terms of computing capacity, -storage and high speed communication network. Indeed, limited physical -resources are blocking factors for large-scale deployment of parallel -algorithms. - -In recent years, the use of a simulation environment to execute parallel -iterative algorithms found some interests in reducing the highly cost of -access to computing resources: (1) for the applications development life -cycle and in code debugging (2) and in production to get results in a -reasonable execution time with a simulated infrastructure not accessible -with physical resources. Indeed, the launch of distributed iterative -asynchronous algorithms to solve a given problem on a large-scale -simulated environment challenges to find optimal configurations giving -the best results with a lowest residual error and in the best of -execution time. According our knowledge, no testing of large-scale -simulation of the class of algorithm solving to achieve real results has -been undertaken to date. We had in the scope of this work implemented a -program for solving large non-symmetric linear system of equations by -numerical method GMRES (Generalized Minimal Residual) in the simulation -environment SimGrid. The simulated platform had allowed us to launch -the application from a modest computing infrastructure by simulating -different distributed architectures composed by clusters nodes -interconnected by variable speed networks. 
In addition, it has been -permitted to show the effectiveness of asynchronous mode algorithm by -comparing its performance with the synchronous mode time. With selected -parameters on the network platforms (bandwidth, latency of inter cluster -network) and on the clusters architecture (number, capacity calculation -power) in the simulated environment, the experimental results have -demonstrated not only the algorithm convergence within a reasonable time -compared with the physical environment performance, but also a time +Parallel computing and high performance computing (HPC) are becoming increasingly essential for solving the various +problems raised by researchers in many scientific disciplines, and also by industrial practitioners. Indeed, the +increasing complexity of these applications, combined with a continuous increase in their sizes, leads to +distributed and parallel algorithms that require significant hardware resources (grid computing, clusters, broadband +network, etc.) as well as a non-negligible CPU execution time. In this paper we consider a class of highly efficient +parallel algorithms called \texttt{numerical iterative algorithms}, executed in a distributed environment. As their name +suggests, these algorithms solve a given problem by successive iterations ($X_{n+1} = f(X_{n})$) from an initial value +$X_{0}$ to find an approximate value $X^*$ of the solution with a very low residual error. Several well-known methods +demonstrate the convergence of these algorithms \cite{}. + +Parallelization of such algorithms generally involves dividing the problem into several \emph{pieces} that are +solved in parallel on multiple processing units. These units exchange their intermediate results before each new +iteration starts, until the approximate solution is reached. 
These parallel computations can be performed +either in \emph{synchronous} communication mode, where a new iteration begins only when all node communications are +completed, or in \emph{asynchronous} mode, where processors can continue independently with few or no synchronization +points. + +% DL: resume corrections here +Despite the effectiveness of the iterative approach, a major drawback of the method is that it requires huge +resources in terms of computing capacity, storage and high speed communication networks. Indeed, limited physical +resources are blocking factors for the large-scale deployment of parallel algorithms. + +In recent years, the use of a simulation environment to execute parallel iterative algorithms has attracted interest as a way of +reducing the high cost of access to computing resources: (1) during the application development life cycle and code +debugging, and (2) in production, to get results in a reasonable execution time with a simulated infrastructure that is not +accessible with physical resources. Indeed, running distributed iterative asynchronous algorithms to solve a +given problem on a large-scale simulated environment raises the challenge of finding optimal configurations that give the best results, +with the lowest residual error and the shortest execution time. To our knowledge, no large-scale +simulation of this class of algorithms that achieves real results has been undertaken to date. In the scope +of this work, we implemented a program for solving large non-symmetric linear systems of equations by the numerical method +GMRES (Generalized Minimal Residual) in the simulation environment SimGrid. The simulated platform allowed us to +launch the application from a modest computing infrastructure by simulating different distributed architectures +composed of cluster nodes interconnected by variable speed networks. 
In addition, it allowed us to show the +effectiveness of the asynchronous mode of the algorithm by comparing its performance with that of the synchronous mode. With selected +parameters for the network platforms (bandwidth, latency of the inter-cluster network) and for the cluster architecture +(number, computing power) in the simulated environment, the experimental results have demonstrated not only +that the algorithm converges within a reasonable time compared with the physical environment performance, but also a time saving of up to \np[\%]{40} in asynchronous mode. -This article is structured as follows: after this introduction, the next -section will give a brief description of iterative asynchronous model. -Then, the simulation framework SimGrid will be presented with the -settings to create various distributed architectures. The algorithm of -the multi-splitting method used by GMRES written with MPI primitives -and its adaptation to SimGrid with SMPI (Simulated MPI) will be in the -next section. At last, the experiments results carried out will be -presented before the conclusion which we will announce the opening of -our future work after the results. +This article is structured as follows: after this introduction, the next section gives a brief description of +the iterative asynchronous model. Then, the simulation framework SimGrid is presented with the settings used to create +various distributed architectures. The algorithm of the multi-splitting method used by GMRES, written with MPI +primitives, and its adaptation to SimGrid with SMPI (Simulated MPI) are described in the following section. Finally, the experimental +results are presented before the conclusion, in which we outline the directions of our future work suggested by +the results. -\section{The asynchronous iteration model} - -As exposed in the introduction, parallel iterative methods are now -widely used in many scientific domains. 
They can be classified in three main classes -depending on how iterations and communications are managed (for more -details readers can refer to \cite{bcvc02:ip}). In the -\textit{Synchronous Iterations - Synchronous Communications (SISC)} -model data are exchanged at the end of each iteration. All the -processors must begin the same iteration at the same time and -important idle times on processors are generated. The -\textit{Synchronous Iterations - Asynchronous Communications (SIAC)} -model can be compared to the previous one except that data required on -another processor are sent asynchronously i.e. without stopping -current computations. This technique allows to partially overlap -communications by computations but unfortunately, the overlapping is -only partial and important idle times remain. It is clear that, in a -grid computing context, where the number of computational nodes is large, -heterogeneous and widely distributed, the idle times generated by -synchronizations are very penalizing. One way to overcome this problem -is to use the \textit{Asynchronous Iterations - Asynchronous - Communications (AIAC)} model. Here, local computations do not need -to wait for required data. Processors can then perform their -iterations with the data present at that time. Figure \ref{fig:aiac} -illustrates this model where the grey blocks represent the computation -phases, the white spaces the idle times and the arrows the -communications. With this algorithmic model, the number of iterations -required before the convergence is generally greater than for the two -former classes. But, and as detailed in \cite{bcvc06:ij}, AIAC -algorithms can significantly reduce overall execution times by -suppressing idle times due to synchronizations especially in a grid -computing context. +\section{Motivations and scientific context} + +As exposed in the introduction, parallel iterative methods are now widely used in many scientific domains. 
They can be +classified into three main classes depending on how iterations and communications are managed (for more details readers +can refer to \cite{bcvc02:ip}). In the \textit{Synchronous Iterations - Synchronous Communications (SISC)} model, data +are exchanged at the end of each iteration. All the processors must begin the same iteration at the same time, and +significant idle times are generated on the processors. The \textit{Synchronous Iterations - Asynchronous Communications +(SIAC)} model is similar to the previous one, except that data required on another processor are sent asynchronously, +i.e. without stopping current computations. This technique overlaps communications with computations, +but unfortunately the overlap is only partial and significant idle times remain. It is clear that, in a grid +computing context, where the computational nodes are numerous, heterogeneous and widely distributed, the idle +times generated by synchronizations are very penalizing. One way to overcome this problem is to use the +\textit{Asynchronous Iterations - Asynchronous Communications (AIAC)} model. Here, local computations do not need to +wait for required data. Processors can then perform their iterations with the data present at that time. Figure +\ref{fig:aiac} illustrates this model, where the grey blocks represent the computation phases, the white spaces the idle +times and the arrows the communications. With this algorithmic model, the number of iterations required before +convergence is generally greater than for the two former classes. However, as detailed in \cite{bcvc06:ij}, AIAC +algorithms can significantly reduce overall execution times by suppressing idle times due to synchronizations, especially +in a grid computing context. \begin{figure}[htbp] \centering @@ -207,6 +175,20 @@ computing context. 
\end{figure} +It is very challenging to develop efficient applications for large scale, heterogeneous and distributed platforms such +as computing grids. Researchers and engineers have to develop techniques for maximizing application performance on these +multi-cluster platforms, by redesigning the applications and/or by using novel algorithms that can account for the +composite and heterogeneous nature of the platform. Unfortunately, the deployment of such applications on these very +large scale systems is very costly, labor intensive and time consuming. In this context, it appears that the use of +simulation tools to explore various platform scenarios at will and to run enormous numbers of experiments quickly can be +very promising. Several works... + +In the context of AIAC algorithms, the use of simulation tools is even more relevant. Indeed, this class of applications +is very sensitive to the execution environment. For instance, variations in the network bandwidth (intra- and +inter-cluster), in the number and the power of nodes, in the number of clusters... can lead to very different numbers of +iterations and thus to very different execution times. + + \section{SimGrid}