X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/hpcc2014.git/blobdiff_plain/c9f1e655cef3e735867e6000202cb1f982f05d58..9db72b31bc5ae56df6c06f94f031aeb35876dd01:/hpcc.tex diff --git a/hpcc.tex b/hpcc.tex index 5fbeca1..2e791d7 100644 --- a/hpcc.tex +++ b/hpcc.tex @@ -1,577 +1,346 @@ - -%% bare_conf.tex -%% V1.3 -%% 2007/01/11 -%% by Michael Shell -%% See: -%% http://www.michaelshell.org/ -%% for current contact information. -%% -%% This is a skeleton file demonstrating the use of IEEEtran.cls -%% (requires IEEEtran.cls version 1.7 or later) with an IEEE conference paper. -%% -%% Support sites: -%% http://www.michaelshell.org/tex/ieeetran/ -%% http://www.ctan.org/tex-archive/macros/latex/contrib/IEEEtran/ -%% and -%% http://www.ieee.org/ - -%%************************************************************************* -%% Legal Notice: -%% This code is offered as-is without any warranty either expressed or -%% implied; without even the implied warranty of MERCHANTABILITY or -%% FITNESS FOR A PARTICULAR PURPOSE! -%% User assumes all risk. -%% In no event shall IEEE or any contributor to this code be liable for -%% any damages or losses, including, but not limited to, incidental, -%% consequential, or any other damages, resulting from the use or misuse -%% of any information contained here. -%% -%% All comments are the opinions of their respective authors and are not -%% necessarily endorsed by the IEEE. -%% -%% This work is distributed under the LaTeX Project Public License (LPPL) -%% ( http://www.latex-project.org/ ) version 1.3, and may be freely used, -%% distributed and modified. A copy of the LPPL, version 1.3, is included -%% in the base LaTeX documentation of all distributions of LaTeX released -%% 2003/12/01 or later. -%% Retain all contribution notices and credits. -%% ** Modified files should be clearly indicated as such, including ** -%% ** renaming them and changing author support contact information. 
** -%% -%% File list of work: IEEEtran.cls, IEEEtran_HOWTO.pdf, bare_adv.tex, -%% bare_conf.tex, bare_jrnl.tex, bare_jrnl_compsoc.tex -%%************************************************************************* - -% *** Authors should verify (and, if needed, correct) their LaTeX system *** -% *** with the testflow diagnostic prior to trusting their LaTeX platform *** -% *** with production work. IEEE's font choices can trigger bugs that do *** -% *** not appear when using other class files. *** -% The testflow support page is at: -% http://www.michaelshell.org/tex/testflow/ - - - -% Note that the a4paper option is mainly intended so that authors in -% countries using A4 can easily print to A4 and see how their papers will -% look in print - the typesetting of the document will not typically be -% affected with changes in paper size (but the bottom and side margins will). -% Use the testflow package mentioned above to verify correct handling of -% both paper sizes by the user's LaTeX system. -% -% Also note that the "draftcls" or "draftclsnofoot", not "draft", option -% should be used if it is desired that the figures are to be displayed in -% draft mode. -% \documentclass[conference]{IEEEtran} -% Add the compsoc option for Computer Society conferences. -% -% If IEEEtran.cls has not been installed into the LaTeX system files, -% manually specify the path to it like: -% \documentclass[conference]{../sty/IEEEtran} - - - - - -% Some very useful LaTeX packages include: -% (uncomment the ones you want to load) - - -% *** CITATION PACKAGES *** -% -%\usepackage{cite} -% cite.sty was written by Donald Arseneau -% V1.6 and later of IEEEtran pre-defines the format of the cite.sty package -% \cite{} output to follow that of IEEE. Loading the cite package will -% result in citation numbers being automatically sorted and properly -% "compressed/ranged". e.g., [1], [9], [2], [7], [5], [6] without using -% cite.sty will become [1], [2], [5]--[7], [9] using cite.sty. 
cite.sty's -% \cite will automatically add leading space, if needed. Use cite.sty's -% noadjust option (cite.sty V3.8 and later) if you want to turn this off. -% cite.sty is already installed on most LaTeX systems. Be sure and use -% version 4.0 (2003-05-27) and later if using hyperref.sty. cite.sty does -% not currently provide for hyperlinked citations. -% The latest version can be obtained at: -% http://www.ctan.org/tex-archive/macros/latex/contrib/cite/ -% The documentation is contained in the cite.sty file itself. - - - - - - -% *** GRAPHICS RELATED PACKAGES *** -% -\ifCLASSINFOpdf - % \usepackage[pdftex]{graphicx} - % declare the path(s) where your graphic files are - % \graphicspath{{../pdf/}{../jpeg/}} - % and their extensions so you won't have to specify these with - % every instance of \includegraphics - % \DeclareGraphicsExtensions{.pdf,.jpeg,.png} -\else - % or other class option (dvipsone, dvipdf, if not using dvips). graphicx - % will default to the driver specified in the system graphics.cfg if no - % driver is specified. - % \usepackage[dvips]{graphicx} - % declare the path(s) where your graphic files are - % \graphicspath{{../eps/}} - % and their extensions so you won't have to specify these with - % every instance of \includegraphics - % \DeclareGraphicsExtensions{.eps} -\fi -% graphicx was written by David Carlisle and Sebastian Rahtz. It is -% required if you want graphics, photos, etc. graphicx.sty is already -% installed on most LaTeX systems. The latest version and documentation can -% be obtained at: -% http://www.ctan.org/tex-archive/macros/latex/required/graphics/ -% Another good source of documentation is "Using Imported Graphics in -% LaTeX2e" by Keith Reckdahl which can be found as epslatex.ps or -% epslatex.pdf at: http://www.ctan.org/tex-archive/info/ -% -% latex, and pdflatex in dvi mode, support graphics in encapsulated -% postscript (.eps) format. 
pdflatex in pdf mode supports graphics -% in .pdf, .jpeg, .png and .mps (metapost) formats. Users should ensure -% that all non-photo figures use a vector format (.eps, .pdf, .mps) and -% not a bitmapped formats (.jpeg, .png). IEEE frowns on bitmapped formats -% which can result in "jaggedy"/blurry rendering of lines and letters as -% well as large increases in file sizes. -% -% You can find documentation about the pdfTeX application at: -% http://www.tug.org/applications/pdftex - - - - - -% *** MATH PACKAGES *** -% -%\usepackage[cmex10]{amsmath} -% A popular package from the American Mathematical Society that provides -% many useful and powerful commands for dealing with mathematics. If using -% it, be sure to load this package with the cmex10 option to ensure that -% only type 1 fonts will utilized at all point sizes. Without this option, -% it is possible that some math symbols, particularly those within -% footnotes, will be rendered in bitmap form which will result in a -% document that can not be IEEE Xplore compliant! -% -% Also, note that the amsmath package sets \interdisplaylinepenalty to 10000 -% thus preventing page breaks from occurring within multiline equations. Use: -%\interdisplaylinepenalty=2500 -% after loading amsmath to restore such page breaks as IEEEtran.cls normally -% does. amsmath.sty is already installed on most LaTeX systems. The latest -% version and documentation can be obtained at: -% http://www.ctan.org/tex-archive/macros/latex/required/amslatex/math/ - - - - - -% *** SPECIALIZED LIST PACKAGES *** -% -%\usepackage{algorithmic} -% algorithmic.sty was written by Peter Williams and Rogerio Brito. -% This package provides an algorithmic environment fo describing algorithms. -% You can use the algorithmic environment in-text or within a figure -% environment to provide for a floating algorithm. 
Do NOT use the algorithm -% floating environment provided by algorithm.sty (by the same authors) or -% algorithm2e.sty (by Christophe Fiorio) as IEEE does not use dedicated -% algorithm float types and packages that provide these will not provide -% correct IEEE style captions. The latest version and documentation of -% algorithmic.sty can be obtained at: -% http://www.ctan.org/tex-archive/macros/latex/contrib/algorithms/ -% There is also a support site at: -% http://algorithms.berlios.de/index.html -% Also of interest may be the (relatively newer and more customizable) -% algorithmicx.sty package by Szasz Janos: -% http://www.ctan.org/tex-archive/macros/latex/contrib/algorithmicx/ - - - - -% *** ALIGNMENT PACKAGES *** -% -%\usepackage{array} -% Frank Mittelbach's and David Carlisle's array.sty patches and improves -% the standard LaTeX2e array and tabular environments to provide better -% appearance and additional user controls. As the default LaTeX2e table -% generation code is lacking to the point of almost being broken with -% respect to the quality of the end results, all users are strongly -% advised to use an enhanced (at the very least that provided by array.sty) -% set of table tools. array.sty is already installed on most systems. The -% latest version and documentation can be obtained at: -% http://www.ctan.org/tex-archive/macros/latex/required/tools/ - - -%\usepackage{mdwmath} -%\usepackage{mdwtab} -% Also highly recommended is Mark Wooding's extremely powerful MDW tools, -% especially mdwmath.sty and mdwtab.sty which are used to format equations -% and tables, respectively. The MDWtools set is already installed on most -% LaTeX systems. The lastest version and documentation is available at: -% http://www.ctan.org/tex-archive/macros/latex/contrib/mdwtools/ - - -% IEEEtran contains the IEEEeqnarray family of commands that can be used to -% generate multiline equations as well as matrices, tables, etc., of high -% quality. 
- - -%\usepackage{eqparbox} -% Also of notable interest is Scott Pakin's eqparbox package for creating -% (automatically sized) equal width boxes - aka "natural width parboxes". -% Available at: -% http://www.ctan.org/tex-archive/macros/latex/contrib/eqparbox/ - - - - - -% *** SUBFIGURE PACKAGES *** -%\usepackage[tight,footnotesize]{subfigure} -% subfigure.sty was written by Steven Douglas Cochran. This package makes it -% easy to put subfigures in your figures. e.g., "Figure 1a and 1b". For IEEE -% work, it is a good idea to load it with the tight package option to reduce -% the amount of white space around the subfigures. subfigure.sty is already -% installed on most LaTeX systems. The latest version and documentation can -% be obtained at: -% http://www.ctan.org/tex-archive/obsolete/macros/latex/contrib/subfigure/ -% subfigure.sty has been superceeded by subfig.sty. - - - -%\usepackage[caption=false]{caption} -%\usepackage[font=footnotesize]{subfig} -% subfig.sty, also written by Steven Douglas Cochran, is the modern -% replacement for subfigure.sty. However, subfig.sty requires and -% automatically loads Axel Sommerfeldt's caption.sty which will override -% IEEEtran.cls handling of captions and this will result in nonIEEE style -% figure/table captions. To prevent this problem, be sure and preload -% caption.sty with its "caption=false" package option. This is will preserve -% IEEEtran.cls handing of captions. 
Version 1.3 (2005/06/28) and later -% (recommended due to many improvements over 1.2) of subfig.sty supports -% the caption=false option directly: -%\usepackage[caption=false,font=footnotesize]{subfig} -% -% The latest version and documentation can be obtained at: -% http://www.ctan.org/tex-archive/macros/latex/contrib/subfig/ -% The latest version and documentation of caption.sty can be obtained at: -% http://www.ctan.org/tex-archive/macros/latex/contrib/caption/ - - - - -% *** FLOAT PACKAGES *** -% -%\usepackage{fixltx2e} -% fixltx2e, the successor to the earlier fix2col.sty, was written by -% Frank Mittelbach and David Carlisle. This package corrects a few problems -% in the LaTeX2e kernel, the most notable of which is that in current -% LaTeX2e releases, the ordering of single and double column floats is not -% guaranteed to be preserved. Thus, an unpatched LaTeX2e can allow a -% single column figure to be placed prior to an earlier double column -% figure. The latest version and documentation can be found at: -% http://www.ctan.org/tex-archive/macros/latex/base/ - - - -%\usepackage{stfloats} -% stfloats.sty was written by Sigitas Tolusis. This package gives LaTeX2e -% the ability to do double column floats at the bottom of the page as well -% as the top. (e.g., "\begin{figure*}[!b]" is not normally possible in -% LaTeX2e). It also provides a command: -%\fnbelowfloat -% to enable the placement of footnotes below bottom floats (the standard -% LaTeX2e kernel puts them above bottom floats). This is an invasive package -% which rewrites many portions of the LaTeX2e float routines. It may not work -% with other packages that modify the LaTeX2e float routines. The latest -% version and documentation can be obtained at: -% http://www.ctan.org/tex-archive/macros/latex/contrib/sttools/ -% Documentation is contained in the stfloats.sty comments as well as in the -% presfull.pdf file. 
Do not use the stfloats baselinefloat ability as IEEE -% does not allow \baselineskip to stretch. Authors submitting work to the -% IEEE should note that IEEE rarely uses double column equations and -% that authors should try to avoid such use. Do not be tempted to use the -% cuted.sty or midfloat.sty packages (also by Sigitas Tolusis) as IEEE does -% not format its papers in such ways. - - - - - -% *** PDF, URL AND HYPERLINK PACKAGES *** -% -%\usepackage{url} -% url.sty was written by Donald Arseneau. It provides better support for -% handling and breaking URLs. url.sty is already installed on most LaTeX -% systems. The latest version can be obtained at: -% http://www.ctan.org/tex-archive/macros/latex/contrib/misc/ -% Read the url.sty source comments for usage information. Basically, -% \url{my_url_here}. - -% *** Do not adjust lengths that control margins, column widths, etc. *** -% *** Do not use packages that alter fonts (such as pslatex). *** -% There should be no need to do such things with IEEEtran.cls V1.6 and later. -% (Unless specifically asked to do so by the journal or conference you plan -% to submit to, of course. 
) - - \usepackage[T1]{fontenc} -\usepackage{ucs} -%\usepackage[utf8x]{inputenc} -\usepackage{lmodern} -\usepackage{color} -%% Jolis entetes %% -\usepackage[Glenn]{fncychap} -%\usepackage{amsmath} +\usepackage[utf8]{inputenc} +\usepackage{amsfonts,amssymb} +\usepackage{amsmath} +\usepackage{algorithm} +\usepackage{algpseudocode} %\usepackage{amsthm} -%\usepackage{amsfonts} -%\usepackage{graphicx} +\usepackage{graphicx} %\usepackage{xspace} -% Definition des marges -\usepackage{vmargin} -\setpapersize[portrait]{A4} -\usepackage[francais]{babel} -% Extension pour les graphiques EPS -%\usepackage[dvips]{graphicx} -\usepackage[pdftex,final]{graphicx} +\usepackage[american]{babel} % Extension pour les liens intra-documents (tagged PDF) % et l'affichage correct des URL (commande \url{http://example.com}) -\usepackage{hyperref} +%\usepackage{hyperref} -\ifCLASSINFOpdf - \usepackage[pdftex]{graphicx} - \DeclareGraphicsExtensions{.pdf,.jpeg,.png} -\else -\fi +\usepackage{url} +\DeclareUrlCommand\email{\urlstyle{same}} +\usepackage[autolanguage,np]{numprint} +\AtBeginDocument{% + \renewcommand*\npunitcommand[1]{\text{#1}} + \npthousandthpartsep{}} +\algnewcommand\algorithmicinput{\textbf{Input:}} +\algnewcommand\Input{\item[\algorithmicinput]} -% correct bad hyphenation here -\hyphenation{op-tical net-works semi-conduc-tor} +\algnewcommand\algorithmicoutput{\textbf{Output:}} +\algnewcommand\Output{\item[\algorithmicoutput]} \begin{document} -% -% paper title -% can use linebreaks \\ within to get better formatting as desired -\title{Simulation of Asynchronous Iterative Numerical Algorithms Using SimGrid} +\title{Simulation of Asynchronous Iterative Numerical Algorithms Using SimGrid} -% author names and affiliations -% use a multiple column layout for up to three different -% affiliations -\author{\IEEEauthorblockN{Raphaël Couturier and Arnaud Giersch and David Laiymani and Charles-Emile Ramamonjisoa} -\IEEEauthorblockA{Femto-ST Institute - DISC Department\\ -Université de 
Franche-Comté\\ -Belfort\\ -Email: raphael.couturier@univ-fcomte.fr} -%\and -%\IEEEauthorblockN{Arnaud Giersch} -%\IEEEauthorblockA{Twentieth Century Fox\\ -%Springfield, USA\\ -%Email: homer@thesimpsons.com} -%\and -%\IEEEauthorblockN{James Kirk\\ and Montgomery Scott} -%\IEEEauthorblockA{Starfleet Academy\\ -%San Francisco, California 96678-2391\\ -%Telephone: (800) 555--1212\\ -%Fax: (888) 555--1212 +\author{% + \IEEEauthorblockN{% + Raphaël Couturier, + Arnaud Giersch, + David Laiymani and + Charles Emile Ramamonjisoa + } + \IEEEauthorblockA{% + Femto-ST Institute - DISC Department\\ + Université de Franche-Comté\\ + Belfort\\ + Email: \email{raphael.couturier@univ-fcomte.fr} + } } - - -% make the title area \maketitle - \begin{abstract} -%\boldmath The abstract goes here. \end{abstract} -% IEEEtran.cls defaults to using nonbold math in the Abstract. -% This preserves the distinction between vectors and scalars. However, -% if the conference you are submitting to favors bold math in the abstract, -% then you can use LaTeX's standard command \boldmath at the very start -% of the abstract to achieve this. Many IEEE journals/conferences frown on -% math in the abstract anyway. -% no keywords +\section{Introduction} +Parallel computing and high performance computing (HPC) are becoming +more and more imperative for solving various problems raised by +researchers in various scientific disciplines and also by industry. +Indeed, the increasing complexity of these +applications, combined with a continuous increase in their size, leads to +distributed and parallel algorithms that require significant hardware +resources (grid computing, clusters, broadband networks, etc.) and +a non-negligible CPU execution time. We consider in this paper a +class of highly efficient parallel algorithms, called iterative algorithms, executed +in a distributed environment.
As their name suggests, these algorithms +solve a given problem, which might be NP-complete, by successive +iterations ($X_{n+1} = f(X_{n})$) from an initial value $X_{0}$ to find +an approximate value $X^*$ of the solution with a very low +residual error. Several well-known results establish the convergence +of these algorithms. Generally, to reduce the complexity and the +execution time, the problem is divided into several ``pieces'' that are +solved in parallel on multiple processing units. These units +exchange their intermediate results before a new iteration starts, +until the approximate solution is reached. These distributed parallel +computations can be performed either in ``synchronous'' mode, +where a new iteration begins only when all node communications are +completed, or in ``asynchronous'' mode, where processors continue +independently with few or no synchronization points. Despite the +effectiveness of the iterative approach, a major drawback is +that it requires huge resources in terms of computing capacity, +storage and high-speed communication networks. Indeed, limited physical +resources are a blocking factor for the large-scale deployment of parallel +algorithms. + +In recent years, the use of a simulation environment to execute parallel +iterative algorithms has attracted interest as a way to reduce the high cost of +access to computing resources: (1) during the application development life +cycle and code debugging, and (2) in production, to get results in a +reasonable execution time on a simulated infrastructure that is not accessible +as physical resources. Indeed, when distributed asynchronous iterative +algorithms are launched on a large-scale +simulated environment, the challenge is to find the configurations that give +the lowest residual error in the shortest +execution time.
To our knowledge, no large-scale +simulation of this class of algorithms producing real results has +been undertaken to date. In this work, we implemented a +program that solves large non-symmetric linear systems of equations with +the GMRES (Generalized Minimal Residual) numerical method in the SimGrid +simulation environment. The simulated platform allowed us to launch +the application from a modest computing infrastructure by simulating +different distributed architectures composed of clusters of nodes +interconnected by networks of variable speed. In addition, it allowed us +to show the effectiveness of the asynchronous mode of the algorithm by +comparing its execution time with that of the synchronous mode. With selected +parameters for the network (bandwidth and latency of the inter-cluster +network) and for the cluster architecture (number of clusters, computing +power) in the simulated environment, the experimental results +demonstrate not only that the algorithm converges within a reasonable time +compared with the physical environment performance, but also a time +saving of up to \np[\%]{40} in asynchronous mode. + +This article is structured as follows: after this introduction, the next +section gives a brief description of the asynchronous iteration model. +Then, the SimGrid simulation framework is presented with the +settings used to create various distributed architectures. The +multisplitting method with GMRES as inner solver, written with MPI primitives, +and its adaptation to SimGrid with SMPI (Simulated MPI) are described in the +following section. Finally, the experimental results are +presented before the conclusion, which outlines +our future work. + +\section{The asynchronous iteration model} +Describe the asynchronous model.
I will take care of it (DL) +\section{SimGrid} -% For peer review papers, you can put extra information on the cover -% page as needed: -% \ifCLASSOPTIONpeerreview -% \begin{center} \bfseries EDICS Category: 3-BBND \end{center} -% \fi -% -% For peerreview papers, this IEEEtran command inserts a page break and -% creates the second title. It will be ignored for other modes. -\IEEEpeerreviewmaketitle +Describe SimGrid (Arnaud) -\section{Introduction} -Present a brief state of the art on the simulation of parallel algorithms. Briefly present asynchronous iterative algorithms and their advantages. Discuss their drawbacks, in particular the difficulty of large-scale deployment, hence the value of simulation. State that, to our knowledge, no simulation of this type of algorithm exists. -Present the work and the results obtained. Announce the outline. - -\section{The asynchronous iteration model} -Describe the asynchronous model. I will take care of it (DL) -\section{SimGrid} -Describe SimGrid (Arnaud) +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\section{Simulation of the multisplitting method} +%Describe the problem (algorithm) addressed and the process of adapting it to SimGrid. +Let $Ax=b$ be a large sparse system of $n$ linear equations in $\mathbb{R}$, where $A$ is a sparse square and nonsingular matrix, $x$ is the solution vector and $b$ is the right-hand side vector. We use a multisplitting method based on the block Jacobi partitioning to solve this linear system on a large-scale platform composed of $L$ clusters of processors.
In this case, we apply a row-by-row splitting without overlapping +\[ +\left(\begin{array}{ccc} +A_{11} & \cdots & A_{1L} \\ +\vdots & \ddots & \vdots\\ +A_{L1} & \cdots & A_{LL} +\end{array} \right) +\times +\left(\begin{array}{c} +X_1 \\ +\vdots\\ +X_L +\end{array} \right) += +\left(\begin{array}{c} +B_1 \\ +\vdots\\ +B_L +\end{array} \right)\] +in such a way that successive rows of matrix $A$ and of both vectors $x$ and $b$ are assigned to one cluster, where for all $l,i\in\{1,\ldots,L\}$ $A_{li}$ is a rectangular block of $A$ of size $n_l\times n_i$, $X_l$ and $B_l$ are sub-vectors of $x$ and $b$, respectively, each of size $n_l$, and $\sum_{l} n_l=\sum_{i} n_i=n$. -\section{Simulation of the multi-splitting method} +The multisplitting method proceeds by iteration to solve the linear system in parallel on $L$ clusters of processors, in such a way that each sub-system +\begin{equation} +\left\{ +\begin{array}{l} +A_{ll}X_l = Y_l \mbox{,~such that}\\ +Y_l = B_l - \displaystyle\sum_{i=1,i\neq l}^{L}A_{li}X_i, +\end{array} +\right. +\label{eq:4.1} +\end{equation} +is solved independently by a cluster, and communications are required to update the right-hand side sub-vectors $Y_l$, such that the sub-vectors $X_i$ represent the data dependencies between the clusters. As each sub-system (\ref{eq:4.1}) is solved in parallel by a cluster of processors, our multisplitting method uses an iterative method as an inner solver, which is easier to parallelize and more scalable than a direct method. In this work, we use the parallel GMRES method~\cite{ref1}, which is one of the most widely used iterative methods. -Describe the problem (algorithm) addressed and the process of adapting it to SimGrid.
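The outer iteration built on sub-system (\ref{eq:4.1}) can be illustrated with a minimal sequential sketch. It rests on several assumptions that are not in the paper: the function names `solve_dense` and `multisplitting` are hypothetical, the blocks are visited one after another in a plain loop instead of being distributed over clusters, the matrix is stored densely, and the inner solver is a small Gaussian elimination standing in for parallel GMRES. Only the synchronous variant is shown, where every block uses the previous outer iterate.

```python
def solve_dense(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting.

    Stand-in for the inner solver; the paper uses parallel GMRES instead.
    """
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def multisplitting(a, b, blocks, tol=1e-10, max_outer=200):
    """Synchronous block-Jacobi multisplitting on a dense matrix.

    blocks is a list of (start, stop) row ranges, one per (hypothetical) cluster.
    Returns the approximate solution and the number of outer iterations used.
    """
    n = len(b)
    x = [0.0] * n  # shared vector, initial guess x^0 = 0
    for k in range(1, max_outer + 1):
        x_new = x[:]
        for start, stop in blocks:
            rows = range(start, stop)
            # Y_l = B_l - sum_{i != l} A_li X_i, using the previous outer iterate.
            y = [
                b[r] - sum(a[r][c] * x[c] for c in range(n) if not start <= c < stop)
                for r in rows
            ]
            a_ll = [a[r][start:stop] for r in rows]
            # Solve the local system A_ll X_l = Y_l.
            x_new[start:stop] = solve_dense(a_ll, y)
        diff = max(abs(u - v) for u, v in zip(x_new, x))
        x = x_new
        if diff < tol:
            return x, k
    return x, max_outer


if __name__ == "__main__":
    # Small diagonally dominant test matrix with diagonal value 6.0.
    a = [
        [6.0, -1.0, 0.0, 0.0],
        [-1.0, 6.0, -1.0, 0.0],
        [0.0, -1.0, 6.0, -1.0],
        [0.0, 0.0, -1.0, 6.0],
    ]
    b = [5.0, 4.0, 4.0, 5.0]  # chosen so that the exact solution is all ones
    solution, outer_iterations = multisplitting(a, b, [(0, 2), (2, 4)])
    print(solution, outer_iterations)
```

In an asynchronous variant, each block would instead use whatever values of the other blocks have arrived so far, without waiting for the end of the current outer iteration; that behavior depends on message timing and is not reproduced by this sequential sketch.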
+\begin{algorithm} +\caption{A multisplitting solver with inner iteration GMRES method} +\begin{algorithmic}[1] +\Input $A_l$ (local sparse matrix), $B_l$ (local right-hand side), $x^0$ (initial guess) +\Output $X_l$ (local solution vector)\vspace{0.2cm} +\State Load $A_l$, $B_l$, $x^0$ +\State Initialize the shared vector $\hat{x}=x^0$ +\For {$k=1,2,3,\ldots$ until the global convergence} +\State $x^0=\hat{x}$ +\State Inner iteration solver: \Call{InnerSolver}{$x^0$, $k$} +\State Exchange the local solution ${X}_l^k$ with the neighboring clusters and copy the shared vector elements in $\hat{x}$ +\EndFor -\section{Experimental results} +\Statex -\section{Conclusion} +\Function {InnerSolver}{$x^0$, $k$} +\State Compute the local right-hand side: $Y_l = B_l - \sum^L_{i=1,i\neq l}A_{li}X_i^0$ +\State Solving the local splitting $A_{ll}X_l^k=Y_l$ using the parallel GMRES method, such that $X_l^0$ is the local initial guess +\State \Return $X_l^k$ +\EndFunction +\end{algorithmic} +\label{algo:01} +\end{algorithm} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -% An example of a floating figure using the graphicx package. -% Note that \label must occur AFTER (or within) \caption. -% For figures, \caption should occur after the \includegraphics. -% Note that IEEEtran v1.7 and later has special internal code that -% is designed to preserve the operation of \label within \caption -% even when the captionsoff option is in effect. However, because -% of issues like this, it may be the safest practice to put all your -% \label just after \caption rather than within \caption{}. -% -% Reminder: the "draftcls" or "draftclsnofoot", not "draft", class -% option should be used if it is desired that the figures are to be -% displayed while in draft mode. 
-% -%\begin{figure}[!t] -%\centering -%\includegraphics[width=2.5in]{myfigure} -% where an .eps filename suffix will be assumed under latex, -% and a .pdf suffix will be assumed for pdflatex; or what has been declared -% via \DeclareGraphicsExtensions. -%\caption{Simulation Results} -%\label{fig_sim} -%\end{figure} - -% Note that IEEE typically puts floats only at the top, even when this -% results in a large percentage of a column being occupied by floats. - - -% An example of a double column floating figure using two subfigures. -% (The subfig.sty package must be loaded for this to work.) -% The subfigure \label commands are set within each subfloat command, the -% \label for the overall figure must come after \caption. -% \hfil must be used as a separator to get equal spacing. -% The subfigure.sty package works much the same way, except \subfigure is -% used instead of \subfloat. -% -%\begin{figure*}[!t] -%\centerline{\subfloat[Case I]\includegraphics[width=2.5in]{subfigcase1}% -%\label{fig_first_case}} -%\hfil -%\subfloat[Case II]{\includegraphics[width=2.5in]{subfigcase2}% -%\label{fig_second_case}}} -%\caption{Simulation results} -%\label{fig_sim} -%\end{figure*} -% -% Note that often IEEE papers with subfigures do not employ subfigure -% captions (using the optional argument to \subfloat), but instead will -% reference/describe all of them (a), (b), etc., within the main caption. - - -% An example of a floating table. Note that, for IEEE style tables, the -% \caption command should come BEFORE the table. Table text will default to -% \footnotesize as IEEE normally uses this smaller font for tables. -% The \label must come after \caption as always. 
-% -%\begin{table}[!t] -%% increase table row spacing, adjust to taste -%\renewcommand{\arraystretch}{1.3} -% if using array.sty, it might be a good idea to tweak the value of -% \extrarowheight as needed to properly center the text within the cells -%\caption{An Example of a Table} -%\label{table_example} -%\centering -%% Some packages, such as MDW tools, offer better commands for making tables -%% than the plain LaTeX2e tabular which is used here. -%\begin{tabular}{|c||c|} -%\hline -%One & Two\\ -%\hline -%Three & Four\\ -%\hline -%\end{tabular} -%\end{table} - - -% Note that IEEE does not put floats in the very first column - or typically -% anywhere on the first page for that matter. Also, in-text middle ("here") -% positioning is not used. Most IEEE journals/conferences use top floats -% exclusively. Note that, LaTeX2e, unlike IEEE journals/conferences, places -% footnotes above bottom floats. This can be corrected via the \fnbelowfloat -% command of the stfloats package. - - - - - - - -% conference papers do not normally have an appendix - - -% use section* for acknowledgement -\section*{Acknowledgment} -The authors would like to thank... +\section{Experimental results} -% trigger a \newpage just before the given reference -% number - used to balance the columns on the last page -% adjust value as needed - may need to be readjusted if -% the document is modified later -%\IEEEtriggeratref{8} -% The "triggered" command can be changed if desired: -%\IEEEtriggercmd{\enlargethispage{-5in}} +When the ``real'' application runs in the simulation environment and produces +the expected results, varying the input parameters and the program arguments +allows us to compare outputs from the code execution. We have noticed from this +study that the results depend on the following parameters: (1) at the network +level, we found that the most critical values are the bandwidth (bw) and the +network latency (lat). (2) Hosts power (GFlops) can also influence on the +results. 
And finally, (3) when submitting job batches for execution, the +argument values passed to the program, such as the maximum number of iterations or +the ``external'' precision, are critical: they must ensure not only the convergence of the +algorithm but also the main objective of the simulation +experiments, namely an execution time in asynchronous mode lower than in synchronous +mode, in other words a ``speedup'' less than 1 (Speedup = Execution +time in asynchronous mode / Execution time in synchronous mode). + +A priori, obtaining a speedup less than 1 would be difficult in a local area +network configuration, where the synchronous mode takes advantage of the rapid +exchange of information over such high-speed links. Thus, the methodology adopted +was to launch the application on a clustered network. In this configuration, +degrading the inter-cluster network performance ``penalizes'' the synchronous +mode, which makes it possible to get a speedup lower than 1. This simulates the case of +clusters linked by a long-distance network such as the Internet. + +As a first step, the algorithm was run on a network consisting of two clusters +containing fifty hosts each, totaling one hundred hosts. Various combinations of +the above factors provided the results shown in Table~\ref{tab.cluster.2x50}, with a matrix size +ranging from Nx = Ny = Nz = 62 to 171 elements, that is, from $62^{3} = \np{238328}$ to +$171^{3} = \np{5000211}$ entries. + +Then we changed the network configuration to three clusters containing +respectively 33, 33 and 34 hosts, again one hundred hosts in total. +In the same way as above, a judicious choice of key parameters +gave the results in Table~\ref{tab.cluster.3x33}, which shows speedups less than 1 with +a matrix size from 62 to 100 elements.
+
+In a final step, we attempted to scale up the three-cluster configuration to
+two hundred hosts; the results are recorded in Table~\ref{tab.cluster.3x67}.
+
+Note that the program was run with the following parameters:
+
+\paragraph*{SMPI parameters}
+
+\begin{itemize}
+  \item HOSTFILE: description file of the hosts.
+  \item PLATFORM: description file of the platform architecture: clusters (CPU
+power, \dots{}), intra-cluster network description, inter-cluster network
+(bandwidth bw, latency lat, \dots{}).
+\end{itemize}
+
+
+\paragraph*{Arguments of the program}
+
+\begin{itemize}
+  \item Description of the cluster architecture;
+  \item Maximum number of internal and external iterations;
+  \item Internal and external precisions;
+  \item Matrix size NX, NY and NZ;
+  \item Matrix diagonal value = 6.0;
+  \item Execution mode: synchronous or asynchronous.
+\end{itemize}
+
+\begin{table}
+  \centering
+  \caption{2 clusters X 50 nodes}
+  \label{tab.cluster.2x50}
+  \includegraphics[width=209pt]{img1.jpg}
+\end{table}
+
+\begin{table}
+  \centering
+  \caption{3 clusters X 33 nodes}
+  \label{tab.cluster.3x33}
+  \includegraphics[width=209pt]{img2.jpg}
+\end{table}
+
+\begin{table}
+  \centering
+  \caption{3 clusters X 67 nodes}
+  \label{tab.cluster.3x67}
+%  \includegraphics[width=160pt]{img3.jpg}
+  \includegraphics[scale=0.5]{img3.jpg}
+\end{table}
+
+\paragraph*{Interpretations and comments}
+
+Overall, for the configurations with two or three clusters totaling one hundred
+hosts (Tables~\ref{tab.cluster.2x50} and~\ref{tab.cluster.3x33}), some
+combinations of the parameters gave a speedup less than 1, showing the
+effectiveness of the asynchronous mode compared to the synchronous
+mode.
+
+In the two-cluster configuration, Table~\ref{tab.cluster.2x50} shows that with
+a degraded inter-cluster network set to \np[Mbits/s]{5} of bandwidth, a latency
+on the order of a hundredth of a millisecond and a host power of one GFlops, an
+efficiency of about \np[\%]{40} in asynchronous mode is obtained for a matrix
+size of 62 elements. The result remains stable even when the external precision
+varies from \np{E-5} to \np{E-9}. When the problem size was increased to 100
+elements, the CPU power had to be increased by \np[\%]{50}, to
+\np[GFlops]{1.5}, for the algorithm to converge with the same order of
+asynchronous mode efficiency. With the same host power but an inter-cluster
+bandwidth increased to \np[Mbits/s]{50}, an efficiency of about \np[\%]{40} is
+obtained with a high external precision of \np{E-11} for a matrix size from 110
+to 150 side elements.
+
+For the three-cluster architecture with a total of 100 hosts,
+Table~\ref{tab.cluster.3x33} shows that it was difficult to find a combination
+giving an asynchronous efficiency below \np[\%]{80}. Indeed, for a matrix size
+of 62 elements, equality between the performance of the two modes (synchronous
+and asynchronous) is achieved with an inter-cluster bandwidth of
+\np[Mbits/s]{10} and a latency of \np{E-1} ms. To reach an efficiency of
+\np[\%]{78} with a matrix size of 100 points, it was necessary to degrade the
+inter-cluster network bandwidth from 5 to \np[Mbits/s]{2}.
+
+A last attempt was made with a more powerful configuration of three clusters
+with 200 nodes in total. Convergence with a speedup of \np[\%]{90} was obtained
+with a bandwidth of \np[Mbits/s]{1}, as shown in Table~\ref{tab.cluster.3x67}.
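+
+To make these figures easier to read, the ratios can be converted into
+acceleration factors. The following is only a sketch: it assumes that a value
+below 1 denotes the asynchronous execution time divided by the synchronous one,
+written here $\rho = T_{\mathrm{async}} / T_{\mathrm{sync}}$ (the symbols
+$\rho$, $T_{\mathrm{async}}$ and $T_{\mathrm{sync}}$ are introduced for
+illustration and are not taken from the tables). Under that reading, the
+synchronous run is $1/\rho$ times longer than the asynchronous one:
+\[
+  \rho = 0.40 \;\Rightarrow\; \frac{1}{\rho} = 2.5, \qquad
+  \rho = 0.78 \;\Rightarrow\; \frac{1}{\rho} \approx 1.28, \qquad
+  \rho = 0.90 \;\Rightarrow\; \frac{1}{\rho} \approx 1.11 .
+\]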
-% references section
+\section{Conclusion}
-% can use a bibliography generated by BibTeX as a .bbl file
-% BibTeX documentation can be easily obtained at:
-% http://www.ctan.org/tex-archive/biblio/bibtex/contrib/doc/
-% The IEEEtran BibTeX style support page is at:
-% http://www.michaelshell.org/tex/ieeetran/bibtex/
-\bibliographystyle{IEEEtran}
-% argument is your BibTeX string definitions and bibliography database(s)
-\bibliography{bib/hpccBib}
-%
-% manually copy in the resultant .bbl file
-% set second argument of \begin to the number of references
-% (used to reserve space for the reference number labels box)
-%\begin{thebibliography}{1}
-%
-%\bibitem{IEEEhowto:kopka}
-%H.~Kopka and P.~W. Daly, \emph{A Guide to \LaTeX}, 3rd~ed.\hskip 1em plus
-% 0.5em minus 0.4em\relax Harlow, England: Addison-Wesley, 1999.
-%
-%\end{thebibliography}
+\section*{Acknowledgment}
+The authors would like to thank\dots{}
-% that's all folks
-\end{document}
+% trigger a \newpage just before the given reference
+% number - used to balance the columns on the last page
+% adjust value as needed - may need to be readjusted if
+% the document is modified later
+\bibliographystyle{IEEEtran}
+\bibliography{hpccBib}
+\end{document}
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: t
+%%% fill-column: 80
+%%% ispell-local-dictionary: "american"
+%%% End: