X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/Krylov_multi.git/blobdiff_plain/e82d3db730093a1ae5fd3f4ecb5442ee0eaef981..0dcb5e02d747971e35e25b6b2f31d13ac692cc11:/krylov_multi.tex?ds=inline diff --git a/krylov_multi.tex b/krylov_multi.tex index d687ad8..275700c 100644 --- a/krylov_multi.tex +++ b/krylov_multi.tex @@ -17,10 +17,8 @@ \newcommand{\Prec}{\mathit{prec}} \newcommand{\Ratio}{\mathit{Ratio}} -%\usepackage{xspace} -%\usepackage[textsize=footnotesize]{todonotes} -%\newcommand{\LZK}[2][inline]{% -%\todo[color=green!40,#1]{\sffamily\textbf{LZK:} #2}\xspace} +\def\changemargin#1#2{\list{}{\rightmargin#2\leftmargin#1}\item[]} +\let\endchangemargin=\endlist \title{A scalable multisplitting algorithm for solving large sparse linear systems} \date{} @@ -67,19 +65,41 @@ Preconditioners can be used in order to increase the convergence of iterative solvers. However, most of the good preconditioners are not scalable when thousands of cores are used. -Traditional iterative solvers have global synchronizations that penalize the -scalability. Two possible solutions consists either in using asynchronous -iterative methods~\cite{ref18} or to use multisplitting algorithms. In this -paper, we will reconsider the use of a multisplitting method. In opposition to -traditional multisplitting method that suffer from slow convergence, as -proposed in~\cite{huang1993krylov}, the use of a minimization process can -drastically improve the convergence. +%Traditional iterative solvers have global synchronizations that penalize the +%scalability. Two possible solutions consists either in using asynchronous +%iterative methods~\cite{ref18} or to use multisplitting algorithms. In this +%paper, we will reconsider the use of a multisplitting method. In opposition to +%traditional multisplitting method that suffer from slow convergence, as +%proposed in~\cite{huang1993krylov}, the use of a minimization process can +%drastically improve the convergence. + +Traditional parallel iterative solvers are based on fine-grain computations that +frequently require data exchanges between computing nodes and have global +synchronizations that penalize the scalability. Particularly, they are more +penalized on large scale architectures or on distributed platforms composed of +distant clusters interconnected by a high-latency network. It is therefore +imperative to develop coarse-grain based algorithms to reduce the communications +in the parallel iterative solvers. Two possible solutions consists either in +using asynchronous iterative methods~\cite{ref18} or to use multisplitting +algorithms. In this paper, we will reconsider the use of a multisplitting +method. In opposition to traditional multisplitting method that suffer from slow +convergence, as proposed in~\cite{huang1993krylov}, the use of a minimization +process can drastically improve the convergence. + +The present paper is organized as follows. First, Section~\ref{sec:02} presents +some related works and the principle of multisplitting methods. Then, in +Section~\ref{sec:03} is presented the algorithm of our Krylov multisplitting +method based on inner-outer iterations. Finally, in Section~\ref{sec:04}, the +parallel experiments on Hector architecture show the performances of the Krylov +multisplitting algorithm compared to the classical GMRES algorithm to solve a 3D +Poisson problem. %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% -\section{Related works and presention of the multisplitting method} +\section{Related works and presentation of the multisplitting method} +\label{sec:02} A general framework for studying parallel multisplitting has been presented in~\cite{o1985multi} by O'Leary and White. Convergence conditions are given for the most general case. Many authors improved multisplitting algorithms by proposing @@ -142,6 +162,7 @@ In the case where the diagonal weighting matrices $E_\ell$ have only zero and on %%%%%%%%%%%%%%%%%%%%%%%% \section{A two-stage method with a minimization} +\label{sec:03} Let $Ax=b$ be a given large and sparse linear system of $n$ equations to solve in parallel on $L$ clusters of processors, physically adjacent or geographically distant, where $A\in\mathbb{R}^{n\times n}$ is a square and non-singular matrix, $x\in\mathbb{R}^{n}$ is the solution vector and $b\in\mathbb{R}^{n}$ is the right-hand side vector. The multisplitting of this linear system is defined as follows \begin{equation} \left\{ @@ -253,6 +274,7 @@ The main key points of our Krylov multisplitting method to solve a large sparse %%%%%%%%%%%%%%%%%%%%%%%% \section{Experiments} +\label{sec:04} In order to illustrate the interest of our algorithm. We have compared our algorithm with the GMRES method which is a very well used method in many situations. We have chosen to focus on only one problem which is very simple to @@ -277,10 +299,10 @@ preconditioners are not scalable when using many cores. %Doing many experiments with many cores is not easy and requires to access to a supercomputer with several hours for developing a code and then improving it. In the following we present some experiments we could achieved out on the Hector -architecture, a UK's high-end computing resource, funded by the UK -Research Councils. This is a Cray XE6 supercomputer, equipped with two 16-core -AMD Opteron 2.3 Ghz and 32 GB of memory. Machines are interconnected with a 3D -torus. +architecture, a UK's high-end computing resource, funded by the UK Research +Councils~\cite{hector}. This is a Cray XE6 supercomputer, equipped with two +16-core AMD Opteron 2.3 Ghz and 32 GB of memory. Machines are interconnected +with a 3D torus. Table~\ref{tab1} shows the result of the experiments. The first column shows the size of the 3D Poisson problem. The size is chosen in order to have @@ -302,6 +324,8 @@ is reached. The precision and the maximum number of iterations of CGNR method ar \begin{table}[htbp] \begin{center} +\begin{changemargin}{-1.8cm}{0cm} +\begin{small} \begin{tabular}{|c|c||c|c|c||c|c|c||c|} \hline \multirow{2}{*}{Pb size}&\multirow{2}{*}{Nb. cores} & \multicolumn{3}{c||}{GMRES} & \multicolumn{3}{c||}{Krylov Multisplitting} & \multirow{2}{*}{Ratio}\\ @@ -320,6 +344,8 @@ $743^3$ & 8,192 (4x2,048) & 704.4 & 87,822 & 4.80e-07 & 110.3 & \end{tabular} \caption{Results} \label{tab1} +\end{small} +\end{changemargin} \end{center} \end{table} @@ -366,6 +392,8 @@ intend to investigate the convergence improvements of our method by using preconditioning techniques for Krylov iterative methods and multisplitting methods with overlapping blocks. +\section{Acknowledgement} +The authors would like to thank Mark Bull of the EPCC his fruitful remarks and the facilities of HECToR. %Other applications (=> other matrices)\\ %Larger experiments\\