From: David Laiymani Date: Tue, 28 Apr 2015 12:13:38 +0000 (+0200) Subject: Premier draft de l'intro X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/rce2015.git/commitdiff_plain/f721cbed7d2388830aaa885dc59b8a9bab852a32?ds=inline;hp=a91b3c5e161102f4d9c2bb345a28eefc71167ada Premier draft de l'intro --- diff --git a/paper.tex b/paper.tex index 73c98e4..87949b4 100644 --- a/paper.tex +++ b/paper.tex @@ -113,6 +113,47 @@ simulation tool before. \maketitle \section{Introduction} +The use of multi-core architectures for solving large scientific problems seems to become imperative in a lot of cases. +Whatever the scale of these architectures (distributed clusters, computational grids, embedded multi-core \ldots) they are generally +well adapted to execute complexe parallel applications operating on a large amount of data. Unfortunately, users (industrials or scientists), +who need such computational resources may not have an easy access to such efficient architectures. The cost of using the platform and/or the cost of +testing and deploying an application are often very important. So, in this context it is difficult to optimize a given application for a given +architecture. In this way and in order to reduce the access cost to these computing resources it seems very interesting to use a simulation environment. +The advantages are numerous: development life cycle, code debugging, ability to obtain results quickly \ldots + +In this paper we focus on a class of highly efficient parallel algorithms called \emph{iterative algorithms}. The +parallel scheme of iterative methods is quite simple. It generally involves the division of the problem +into several \emph{blocks} that will be solved in parallel on multiple +processing units. Then each processing unit has to +compute an iteration, to send/receive some data dependencies to/from +its neighbors and to iterate this process until the convergence of +the method. Several well-known methods demonstrate the convergence of these algorithms~\cite{BT89,Bahi07}. +In this processing mode a task cannot begin a new iteration while it +has not received data dependencies from its neighbors. We say that the iteration computation follows a synchronous scheme. +In the asynchronous scheme a task can compute a new iteration without having to +wait for the data dependencies coming from its neighbors. Both +communication and computations are asynchronous inducing that there is +no more idle times, due to synchronizations, between two +iterations~\cite{bcvc06:ij}. This model presents some advantages and drawbacks that we detail in section 2 but even if the number of iterations required to converge is +generally greater than for the synchronous case, it appears that the asynchronous iterative scheme can significantly reduce overall execution +times by suppressing idle times due to synchronizations~\cite{Bahi07} for more details. + +Nevertheless, in both cases (synchronous or asynchronous) it is very time consuming to find optimal configuration and deployment requirements +for a given application on a given multi-core architecture. Finding good resource allocations policies under varying CPU power, network speeds and +loads is very challenging and labor intensive.~\cite{Calheiros:2011:CTM:1951445.1951450}. This problematic is even more difficult for the asynchronous scheme +where variations of the parameters of the execution platform can lead to very different number of iterations required to converge and so to very different execution times. +In this challenging context we think that the use of a simulation tool can leverage the possibility of testing various platform scenarios. + +The main contribution of this paper is to show that the use of a simulation tool (i.e. the SimGrid toolkit~\cite{SimGrid}) in the context of real +parallel applications (i.e. large linear system solver) can help developers to better tune their application for a given multi-core architecture. +To show the validity of this approach we first compare the simulated execution of the multisplitting algorithm with the GMRES (Generalized Minimal Residual) solver +\cite{ref1} both in synchronous mode. The obtained results on different simulated multi-core architectures confirm the results previously obtained on non simulated architecture. +We also confirm the efficiency of the asynchronous multisplitting algorithm comparing to the synchronous GMRES. In this way and with a simple computing architecture (a laptop) +SimGrid allows us (with small modifications of the MPI code) to run a test campaign of a real parallel iterative applications on different simulated multi-core architectures. +To our knowledge, there is no related work on the large-scale multi-core simulation of a real synchronous and asynchronous iterative application. + +This paper is organized as follows: + \section{The asynchronous iteration model}