+\section{Introduction}
+
+The use of multi-core architectures for solving large
+scientific problems is becoming imperative in many cases.
+Whatever the scale of these architectures (distributed clusters, computational
+grids, embedded multi-core,~\ldots) they are generally well adapted to execute
+complex parallel applications operating on a large amount of data.
+Unfortunately, users (industrial users or scientists) who need such computational
+resources may not have easy access to such efficient architectures. The cost
+of using the platform and/or the cost of testing and deploying an application
+is often very high. In this context, it is difficult to optimize a
+given application for a given architecture. To reduce
+the cost of access to these computing resources, a
+simulation environment is an attractive option. Its advantages are numerous: a shorter development life cycle,
+easier code debugging, the ability to obtain results quickly,~\ldots provided that
+the simulation results are consistent with the real ones.
+
+In this paper we focus on a class of highly efficient parallel algorithms called
+\emph{iterative algorithms}. The parallel scheme of iterative methods is quite
+simple. It generally involves the division of the problem into several
+\emph{blocks} that will be solved in parallel on multiple processing
+units. Each processing unit computes an iteration, exchanges
+data dependencies with its neighbors, and repeats this process until the
+method converges. Several well-known results establish the
+convergence of these algorithms~\cite{BT89,bahi07}. In this processing mode, a
+task cannot begin a new iteration until it has received the data dependencies
+from its neighbors. We say that the iteration computation follows a synchronous
+scheme. In the asynchronous scheme, a task can compute a new iteration without
+having to wait for the data dependencies coming from its neighbors. Both
+communications and computations are asynchronous, so that there are no more
+idle times due to synchronizations between two iterations~\cite{bcvc06:ij}.
+This model has advantages and drawbacks that we detail in
+Section~\ref{sec:asynchro}. Although the number of iterations required to
+converge is generally greater than in the synchronous case,
+the asynchronous iterative scheme can significantly reduce overall execution
+times by suppressing idle times due to synchronizations~(see~\cite{bahi07}
+for more details).
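+
+To make the distinction concrete, the two schemes can be sketched as follows.
+The notation below is introduced only for illustration and is an assumption:
+we suppose that the problem is written as a fixed point $x = f(x)$, where
+$x = (x_1, \ldots, x_n)$ is split into $n$ blocks and block $i$ is updated by
+processing unit $i$ through the component $f_i$ of $f$. In the synchronous
+scheme, every block is computed from the values of the previous iteration:
+\[
+  x_i^{k+1} = f_i\left(x_1^{k}, \ldots, x_n^{k}\right), \qquad i = 1, \ldots, n.
+\]
+In the asynchronous scheme, each unit uses the latest version of each block
+that it has received so far, which may be outdated:
+\[
+  x_i^{k+1} = f_i\left(x_1^{s_1^i(k)}, \ldots, x_n^{s_n^i(k)}\right),
+  \qquad 0 \le s_j^i(k) \le k,
+\]
+where $s_j^i(k)$ (a hypothetical symbol) denotes the iteration index of the
+version of block $j$ available to unit $i$ at iteration $k$. The synchronous
+scheme is the special case $s_j^i(k) = k$ for all $i$, $j$ and $k$.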
+
+Nevertheless, in both cases (synchronous or asynchronous), it is very time
+consuming to find the optimal configuration and deployment requirements for a given
+application on a given multi-core architecture. Finding good resource
+allocation policies under varying CPU power, network speeds and loads is very
+challenging and labor intensive~\cite{Calheiros:2011:CTM:1951445.1951450}. This
+problem is even more difficult for the asynchronous scheme, where variations
+in the parameters of the execution platform can lead to very different numbers of
+iterations required to converge, and hence to very different execution times. In
+this challenging context, we think that a simulation tool can greatly
+ease the testing of various platform scenarios.
+
+The main contribution of this paper is to show that the use of a simulation tool
+(i.e., the SimGrid toolkit~\cite{SimGrid}) in the context of real parallel
+applications (i.e., large linear system solvers) can help developers to better
+tune their applications for a given multi-core architecture. To show the validity
+of this approach, we first compare the simulated execution of the multisplitting
+algorithm with the GMRES (Generalized Minimal Residual)
+solver~\cite{saad86} in synchronous mode. The results obtained on different
+simulated multi-core architectures confirm the real results previously obtained
+on non-simulated architectures. We also confirm the efficiency of the
+asynchronous multisplitting algorithm compared to the synchronous GMRES. In
+this way, and with a simple computing architecture (a laptop), SimGrid allows us
+to run a test campaign of a real parallel iterative application on
+different simulated multi-core architectures. To our knowledge, there is no
+related work on the large-scale multi-core simulation of a real synchronous and
+asynchronous iterative application.
+
+This paper is organized as follows. Section~\ref{sec:asynchro} presents the
+iteration model we use and, more particularly, the asynchronous scheme.
+Section~\ref{sec:simgrid} presents the SimGrid simulation toolkit.
+Section~\ref{sec:04} details the different solvers that we use. Finally, our
+experimental results are presented in Section~\ref{sec:expe}, followed by some
+concluding remarks and perspectives.
+