X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/hpcc2014.git/blobdiff_plain/fadf3e9c2be61f86ddebab4672ca769096168535..992c04fea16bc7e6f2f98c23e4047533dcd273ad:/hpcc.tex diff --git a/hpcc.tex b/hpcc.tex index b4d0be3..d44a7b4 100644 --- a/hpcc.tex +++ b/hpcc.tex @@ -320,10 +320,12 @@ \usepackage[T1]{fontenc} \usepackage[utf8]{inputenc} -%\usepackage{amsmath} +\usepackage{amsfonts,amssymb} +\usepackage{amsmath} +\usepackage{algorithm} +\usepackage{algpseudocode} %\usepackage{amsthm} -%\usepackage{amsfonts} -%\usepackage{graphicx} +\usepackage{graphicx} %\usepackage{xspace} \usepackage[american]{babel} % Extension pour les graphiques EPS @@ -333,6 +335,14 @@ % et l'affichage correct des URL (commande \url{http://example.com}) %\usepackage{hyperref} +\algnewcommand\algorithmicinput{\textbf{Input:}} +\algnewcommand\Input{\item[\algorithmicinput]} + +\algnewcommand\algorithmicoutput{\textbf{Output:}} +\algnewcommand\Output{\item[\algorithmicoutput]} + + + \begin{document} % @@ -344,7 +354,7 @@ % author names and affiliations % use a multiple column layout for up to three different % affiliations -\author{\IEEEauthorblockN{Raphaël Couturier and Arnaud Giersch and David Laiymani and Charles-Emile Ramamonjisoa} +\author{\IEEEauthorblockN{Raphaël Couturier and Arnaud Giersch and David Laiymani and Charles Emile Ramamonjisoa} \IEEEauthorblockA{Femto-ST Institute - DISC Department\\ Université de Franche-Comté\\ Belfort\\ @@ -398,8 +408,71 @@ The abstract goes here. \section{Introduction} -Présenter un bref état de l'art sur la simulation d'algos parallèles. Présenter rapidement les algos itératifs asynchrones et leurs avantages. Parler de leurs inconvénients en particulier la difficulté de déploiement à grande échelle donc il serait bien de simuler. Dire qu'à notre connaissance il n'existe pas de simulation de ce type d'algo. -Présenter les travaux et les résultats obtenus. Annoncer le plan. +Parallel computing and high performance computing (HPC) are becoming +more and more imperative for solving various problems raised by +researchers on various scientific disciplines but also by industrial in +the field. Indeed, the increasing complexity of these requested +applications combined with a continuous increase of their sizes lead to +write distributed and parallel algorithms requiring significant hardware +resources ( grid computing , clusters, broadband network ,etc... ) but +also a non- negligible CPU execution time. We consider in this paper a +class of highly efficient parallel algorithms called iterative executed +in a distributed environment. As their name suggests, these algorithm +solves a given problem that might be NP- complete complex by successive +iterations (X$_{n +1 }$= f (X$_{n}$) ) from an initial value X +$_{0}$ to find an approximate value X* of the solution with a very low +residual error. Several well-known methods demonstrate the convergence +of these algorithms. Generally, to reduce the complexity and the +execution time, the problem is divided into several "pieces" that will +be solved in parallel on multiple processing units. The latter will +communicate each intermediate results before a new iteration starts +until the approximate solution is reached. These distributed parallel +computations can be performed either in "synchronous" communication mode +where a new iteration begin only when all nodes communications are +completed, either "asynchronous" mode where processors can continue +independently without or few synchronization points. Despite the +effectiveness of iterative approach, a major drawback of the method is +the requirement of huge resources in terms of computing capacity, +storage and high speed communication network. Indeed, limited physical +resources are blocking factors for large-scale deployment of parallel +algorithms. + +In recent years, the use of a simulation environment to execute parallel +iterative algorithms found some interests in reducing the highly cost of +access to computing resources: (1) for the applications development life +cycle and in code debugging (2) and in production to get results in a +reasonable execution time with a simulated infrastructure not accessible +with physical resources. Indeed, the launch of distributed iterative +asynchronous algorithms to solve a given problem on a large-scale +simulated environment challenges to find optimal configurations giving +the best results with a lowest residual error and in the best of +execution time. According our knowledge, no testing of large-scale +simulation of the class of algorithm solving to achieve real results has +been undertaken to date. We had in the scope of this work implemented a +program for solving large non-symmetric linear system of equations by +numerical method GMRES (Generalized Minimal Residual ) in the simulation +environment Simgrid . The simulated platform had allowed us to launch +the application from a modest computing infrastructure by simulating +different distributed architectures composed by clusters nodes +interconnected by variable speed networks. In addition, it has been +permitted to show the effectiveness of asynchronous mode algorithm by +comparing its performance with the synchronous mode time. With selected +parameters on the network platforms (bandwidth, latency of inter cluster +network) and on the clusters architecture (number, capacity calculation +power) in the simulated environment , the experimental results have +demonstrated not only the algorithm convergence within a reasonable time +compared with the physical environment performance, but also a time +saving of up to 40 \% in asynchronous mode. + +This article is structured as follows: after this introduction, the next +section will give a brief description of iterative asynchronous model. +Then, the simulation framework SIMGRID will be presented with the +settings to create various distributed architectures. The algorithm of +the multi -splitting method used by GMRES written with MPI primitives +and its adaptation to Simgrid with SMPI (Simulation MPI ) will be in the +next section . At last, the experiments results carried out will be +presented before the conclusion which we will announce the opening of +our future work after the results. \section{The asynchronous iteration model} @@ -409,9 +482,79 @@ Décrire le modèle asynchrone. Je m'en charge (DL) Décrire SimGrid (Arnaud) -\section{Simulation of the multi-splitting method} -Décrire le problème (algo) traité ainsi que le processus d'adaptation à SimGrid. + + + + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\section{Simulation of the multisplitting method} +%Décrire le problème (algo) traité ainsi que le processus d'adaptation à SimGrid. +Let $Ax=b$ be a large sparse system of $n$ linear equations in $\mathbb{R}$, where $A$ is a sparse square and nonsingular matrix, $x$ is the solution vector and $y$ is the right-hand side vector. We use a multisplitting method based on the block Jacobi partitioning to solve this linear system on a large scale platform composed of $L$ clusters of processors. In this case, we apply a row-by-row splitting without overlapping +\[ +\left(\begin{array}{ccc} +A_{11} & \cdots & A_{1L} \\ +\vdots & \ddots & \vdots\\ +A_{L1} & \cdots & A_{LL} +\end{array} \right) +\times +\left(\begin{array}{c} +X_1 \\ +\vdots\\ +X_L +\end{array} \right) += +\left(\begin{array}{c} +Y_1 \\ +\vdots\\ +Y_L +\end{array} \right)\] +in such a way that successive rows of matrix $A$ and both vectors $x$ and $b$ are assigned to one cluster, where for all $l,i\in\{1,\ldots,L\}$ $A_{li}$ is a rectangular block of $A$ of size $n_l\times n_i$, $X_l$ and $Y_l$ are sub-vectors of $x$ and $y$, respectively, each of size $n_l$ and $\sum_{l} n_l=\sum_{i} n_i=n$. + +The multisplitting method proceeds by iteration to solve in parallel the linear system by $L$ clusters of processors, in such a way each sub-system +\begin{equation} +\left\{ +\begin{array}{l} +A_{ll}X_l = Y_l \mbox{,~such that}\\ +Y_l = B_l - \displaystyle\sum_{i=1,i\neq l}^{L}A_{li}X_i, +\end{array} +\right. +\label{eq:4.1} +\end{equation} +is solved independently by a cluster and communication are required to update the right-hand side sub-vectors $Y_l$, such that the sub-vectors $X_i$ represent the data dependencies between the clusters. As each sub-system (\ref{eq:4.1}) is solved in parallel by a cluster of processors, our multisplitting method uses an iterative method as an inner solver which is easier to parallelize and more scalable than a direct method. In this work, we use the parallel GMRES method~\cite{ref1} which is one of the most used iterative method by many researchers. + +\begin{algorithm} +\caption{A multisplitting solver with inner iteration GMRES method} +\begin{algorithmic}[1] +\Input $A_l$ (local sparse matrix), $B_l$ (local right-hand side), $x^0$ (initial guess) +\Output $X_l$ (local solution vector)\vspace{0.2cm} +\State Load $A_l$, $B_l$, $x^0$ +\State Initialize the shared vector $\hat{x}=x^0$ +\For {$k=1,2,3,\ldots$ until the global convergence} +\State $x^0=\hat{x}$ +\State Inner iteration solver: \Call{InnerSolver}{$x^0$, $k$} +\State Exchange the local solution ${X}_l^k$ with the neighboring clusters and copy the shared vector elements in $\hat{x}$ +\EndFor + +\Statex + +\Function {InnerSolver}{$x^0$, $k$} +\State Compute the local right-hand side: $Y_l = B_l - \sum^L_{i=1,i\neq l}A_{li}X_i^0$ +\State Solving the local splitting $A_{ll}X_l^k=Y_l$ using the parallel GMRES method, such that $X_l^0$ is the local initial guess +\State \Return $X_l^k$ +\EndFunction +\end{algorithmic} +\label{algo:01} +\end{algorithm} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + + + + + + \section{Experimental results} @@ -479,21 +622,22 @@ lat latency , ... ). \centering \caption{2 clusters X 50 nodes} \label{tab.cluster.2x50} - \includegraphics[width=209pt]{img-1.eps} + \includegraphics[width=209pt]{img1.jpg} \end{table} \begin{table} \centering - \caption{3 clusters X 33 n\oe{}uds} + \caption{3 clusters X 33 nodes} \label{tab.cluster.3x33} - \includegraphics[width=209pt]{img-1.eps} + \includegraphics[width=209pt]{img2.jpg} \end{table} \begin{table} \centering - \caption{3 clusters X 67 noeuds} + \caption{3 clusters X 67 nodes} \label{tab.cluster.3x67} - \includegraphics[width=128pt]{img-2.eps} +% \includegraphics[width=160pt]{img3.jpg} + \includegraphics[scale=0.5]{img3.jpg} \end{table} \paragraph*{Interpretations and comments} @@ -650,7 +794,7 @@ The authors would like to thank... % http://www.michaelshell.org/tex/ieeetran/bibtex/ \bibliographystyle{IEEEtran} % argument is your BibTeX string definitions and bibliography database(s) -\bibliography{bib/hpccBib} +\bibliography{hpccBib} % % manually copy in the resultant .bbl file % set second argument of \begin to the number of references