\newcommand{\Prec}{\mathit{prec}}
\newcommand{\Ratio}{\mathit{Ratio}}
-%\usepackage{xspace}
-%\usepackage[textsize=footnotesize]{todonotes}
-%\newcommand{\LZK}[2][inline]{%
-%\todo[color=green!40,#1]{\sffamily\textbf{LZK:} #2}\xspace}
+\def\changemargin#1#2{\list{}{\rightmargin#2\leftmargin#1}\item[]}
+\let\endchangemargin=\endlist
\title{A scalable multisplitting algorithm for solving large sparse linear systems}
\date{}
solvers. However, most good preconditioners are not scalable when thousands of
cores are used.
-Traditional iterative solvers have global synchronizations that penalize the
-scalability. Two possible solutions consists either in using asynchronous
-iterative methods~\cite{ref18} or to use multisplitting algorithms. In this
-paper, we will reconsider the use of a multisplitting method. In opposition to
-traditional multisplitting method that suffer from slow convergence, as
-proposed in~\cite{huang1993krylov}, the use of a minimization process can
-drastically improve the convergence.
+%Traditional iterative solvers have global synchronizations that penalize the
+%scalability. Two possible solutions consists either in using asynchronous
+%iterative methods~\cite{ref18} or to use multisplitting algorithms. In this
+%paper, we will reconsider the use of a multisplitting method. In opposition to
+%traditional multisplitting method that suffer from slow convergence, as
+%proposed in~\cite{huang1993krylov}, the use of a minimization process can
+%drastically improve the convergence.
+
+Traditional parallel iterative solvers are based on fine-grain computations that
+frequently require data exchanges between computing nodes and involve global
+synchronizations that penalize the scalability. They are particularly penalized
+on large-scale architectures or on distributed platforms composed of distant
+clusters interconnected by a high-latency network. It is therefore imperative to
+develop coarse-grain algorithms that reduce the communications in parallel
+iterative solvers. Two possible solutions are to use asynchronous iterative
+methods~\cite{ref18} or multisplitting algorithms. In this paper, we reconsider
+the use of a multisplitting method. In contrast to traditional multisplitting
+methods, which suffer from slow convergence, the use of a minimization process,
+as proposed in~\cite{huang1993krylov}, can drastically improve the convergence.
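+
+To make this contrast concrete, we briefly recall the classical parallel
+multisplitting iteration in the sense of O'Leary and White~\cite{o1985multi};
+the notation used here ($M_\ell$, $N_\ell$, $E_\ell$) is only illustrative and
+may differ from the one introduced in Section~\ref{sec:03}. Given $L$ splittings
+$A=M_\ell-N_\ell$, $\ell=1,\ldots,L$, and nonnegative diagonal weighting
+matrices $E_\ell$ such that $\sum_{\ell=1}^{L}E_\ell=I$, the iterates are
+defined by
+% Illustrative sketch; the exact formulation is given in Section~\ref{sec:03}.
+\[
+x^{k+1}=\sum_{\ell=1}^{L}E_\ell\,M_\ell^{-1}\left(N_\ell x^{k}+b\right),
+\qquad k=0,1,\ldots,
+\]
+where each local solve $y_\ell^{k}=M_\ell^{-1}(N_\ell x^{k}+b)$ can be performed
+independently on one cluster. Roughly speaking, a Krylov multisplitting method
+replaces the fixed weights $E_\ell$ by coefficients obtained from a small
+minimization of the residual norm over the local contributions $y_\ell^{k}$,
+which is what accelerates the convergence.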
+
+The present paper is organized as follows. First, Section~\ref{sec:02} presents
+related work and the principle of multisplitting methods. Then,
+Section~\ref{sec:03} presents the algorithm of our Krylov multisplitting method,
+which is based on inner-outer iterations. Finally, in Section~\ref{sec:04},
+parallel experiments carried out on the HECToR architecture compare the
+performance of the Krylov multisplitting algorithm with that of the classical
+GMRES algorithm for solving a 3D Poisson problem.
%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%
-\section{Related works and presention of the multisplitting method}
+\section{Related works and presentation of the multisplitting method}
+\label{sec:02}
A general framework for studying parallel multisplitting has been presented in~\cite{o1985multi}
by O'Leary and White. Convergence conditions are given for the
most general case. Many authors improved multisplitting algorithms by proposing
%%%%%%%%%%%%%%%%%%%%%%%%
\section{A two-stage method with a minimization}
+\label{sec:03}
Let $Ax=b$ be a given large and sparse linear system of $n$ equations to solve in parallel on $L$ clusters of processors, physically adjacent or geographically distant, where $A\in\mathbb{R}^{n\times n}$ is a square and non-singular matrix, $x\in\mathbb{R}^{n}$ is the solution vector and $b\in\mathbb{R}^{n}$ is the right-hand side vector. The multisplitting of this linear system is defined as follows
\begin{equation}
\left\{
%%%%%%%%%%%%%%%%%%%%%%%%
\section{Experiments}
+\label{sec:04}
In order to illustrate the interest of our algorithm, we have compared it with
the GMRES method, which is widely used in many situations. We have chosen to
focus on only one problem which is very simple to
preconditioners are not scalable when using many cores.
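+
+To fix ideas, with the usual seven-point finite-difference discretization of the
+3D Poisson equation $-\Delta u=f$ on a uniform $N\times N\times N$ grid with
+mesh size $h$ (the discretization details given here are only indicative), each
+interior unknown satisfies
+% Illustrative discretization sketch; the actual problem sizes are given in Table~\ref{tab1}.
+\[
+\frac{6u_{i,j,k}-u_{i-1,j,k}-u_{i+1,j,k}-u_{i,j-1,k}-u_{i,j+1,k}-u_{i,j,k-1}-u_{i,j,k+1}}{h^{2}}=f_{i,j,k},
+\]
+so that the resulting sparse linear system has $n=N^{3}$ unknowns and at most
+seven nonzero entries per row.
+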
%Doing many experiments with many cores is not easy and requires to access to a supercomputer with several hours for developing a code and then improving it.
-In the following we present some experiments we could achieved out on the
-Hector architecture, the previous UK's high-end computing resource, funded by
-the UK Research Councils, which has been stopped in the early 2014.
+In the following we present some experiments that we carried out on the HECToR
+architecture, a UK high-end computing resource funded by the UK Research
+Councils~\cite{hector}. It is a Cray XE6 supercomputer; each node is equipped
+with two 16-core AMD Opteron processors running at 2.3 GHz and 32 GB of memory,
+and the nodes are interconnected by a 3D torus network.
Table~\ref{tab1} shows the results of the experiments. The first column shows
the size of the 3D Poisson problem. The size is chosen in order to have
\begin{table}[htbp]
\begin{center}
+\begin{changemargin}{-1.8cm}{0cm}
+\begin{small}
\begin{tabular}{|c|c||c|c|c||c|c|c||c|}
\hline
\multirow{2}{*}{Pb size}&\multirow{2}{*}{Nb. cores} & \multicolumn{3}{c||}{GMRES} & \multicolumn{3}{c||}{Krylov Multisplitting} & \multirow{2}{*}{Ratio}\\
\end{tabular}
\caption{Comparison of the classical GMRES and Krylov multisplitting algorithms for the 3D Poisson problem on HECToR}
\label{tab1}
+\end{small}
+\end{changemargin}
\end{center}
\end{table}
preconditioning techniques for Krylov iterative methods and multisplitting
methods with overlapping blocks.
+\section{Acknowledgements}
+The authors would like to thank Mark Bull of EPCC for his fruitful remarks, and
+acknowledge the use of the HECToR facilities.
%Other applications (=> other matrices)\\
%Larger experiments\\