X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/Krylov_multi.git/blobdiff_plain/fd654bb9a37efb3530aed230af886466bbb7f42a..662f69e0fc2f13849c19868a71423817e6305cf9:/krylov_multi.tex?ds=inline diff --git a/krylov_multi.tex b/krylov_multi.tex index 92730ac..9e423d1 100644 --- a/krylov_multi.tex +++ b/krylov_multi.tex @@ -17,10 +17,8 @@ \newcommand{\Prec}{\mathit{prec}} \newcommand{\Ratio}{\mathit{Ratio}} -%\usepackage{xspace} -%\usepackage[textsize=footnotesize]{todonotes} -%\newcommand{\LZK}[2][inline]{% -%\todo[color=green!40,#1]{\sffamily\textbf{LZK:} #2}\xspace} +\def\changemargin#1#2{\list{}{\rightmargin#2\leftmargin#1}\item[]} +\let\endchangemargin=\endlist \title{A scalable multisplitting algorithm for solving large sparse linear systems} \date{} @@ -67,19 +65,41 @@ Preconditioners can be used in order to increase the convergence of iterative solvers. However, most of the good preconditioners are not scalable when thousands of cores are used. -Traditional iterative solvers have global synchronizations that penalize the -scalability. Two possible solutions consists either in using asynchronous -iterative methods~\cite{ref18} or to use multisplitting algorithms. In this -paper, we will reconsider the use of a multisplitting method. In opposition to -traditional multisplitting method that suffer from slow convergence, as -proposed in~\cite{huang1993krylov}, the use of a minimization process can -drastically improve the convergence. +%Traditional iterative solvers have global synchronizations that penalize the +%scalability. Two possible solutions consists either in using asynchronous +%iterative methods~\cite{ref18} or to use multisplitting algorithms. In this +%paper, we will reconsider the use of a multisplitting method. In opposition to +%traditional multisplitting method that suffer from slow convergence, as +%proposed in~\cite{huang1993krylov}, the use of a minimization process can +%drastically improve the convergence. + +Traditional parallel iterative solvers are based on fine-grain computations that +frequently require data exchanges between computing nodes and have global +synchronizations that penalize the scalability. Particularly, they are more +penalized on large scale architectures or on distributed platforms composed of +distant clusters interconnected by a high-latency network. It is therefore +imperative to develop coarse-grain based algorithms to reduce the communications +in the parallel iterative solvers. Two possible solutions consists either in +using asynchronous iterative methods~\cite{ref18} or to use multisplitting +algorithms. In this paper, we will reconsider the use of a multisplitting +method. In opposition to traditional multisplitting method that suffer from slow +convergence, as proposed in~\cite{huang1993krylov}, the use of a minimization +process can drastically improve the convergence. + +The present paper is organized as follows. First, Section~\ref{sec:02} presents +some related works and the principle of multisplitting methods. Then, in +Section~\ref{sec:03} is presented the algorithm of our Krylov multisplitting +method based on inner-outer iterations. Finally, in Section~\ref{sec:04}, the +parallel experiments on Hector architecture show the performances of the Krylov +multisplitting algorithm compared to the classical GMRES algorithm to solve a 3D +Poisson problem. %%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%% -\section{Related works and presention of the multisplitting method} +\section{Related works and presentation of the multisplitting method} +\label{sec:02} A general framework for studying parallel multisplitting has been presented in~\cite{o1985multi} by O'Leary and White. Convergence conditions are given for the most general case. Many authors improved multisplitting algorithms by proposing @@ -142,6 +162,7 @@ In the case where the diagonal weighting matrices $E_\ell$ have only zero and on %%%%%%%%%%%%%%%%%%%%%%%% \section{A two-stage method with a minimization} +\label{sec:03} Let $Ax=b$ be a given large and sparse linear system of $n$ equations to solve in parallel on $L$ clusters of processors, physically adjacent or geographically distant, where $A\in\mathbb{R}^{n\times n}$ is a square and non-singular matrix, $x\in\mathbb{R}^{n}$ is the solution vector and $b\in\mathbb{R}^{n}$ is the right-hand side vector. The multisplitting of this linear system is defined as follows \begin{equation} \left\{ @@ -253,6 +274,7 @@ The main key points of our Krylov multisplitting method to solve a large sparse %%%%%%%%%%%%%%%%%%%%%%%% \section{Experiments} +\label{sec:04} In order to illustrate the interest of our algorithm. We have compared our algorithm with the GMRES method which is a very well used method in many situations. We have chosen to focus on only one problem which is very simple to @@ -277,10 +299,10 @@ preconditioners are not scalable when using many cores. %Doing many experiments with many cores is not easy and requires to access to a supercomputer with several hours for developing a code and then improving it. In the following we present some experiments we could achieved out on the Hector -architecture, the previous UK's high-end computing resource, funded by the UK -Research Councils. This is a Cray XE6 supercomputer, equipped with two 16-core -AMD Opteron 2.3 Ghz and 32 GB of memory. Machines are interconnected with a 3D -torus. +architecture, a UK's high-end computing resource, funded by the UK Research +Councils~\cite{hector}. This is a Cray XE6 supercomputer, equipped with two +16-core AMD Opteron 2.3 Ghz and 32 GB of memory. Machines are interconnected +with a 3D torus. Table~\ref{tab1} shows the result of the experiments. The first column shows the size of the 3D Poisson problem. The size is chosen in order to have @@ -302,6 +324,8 @@ is reached. The precision and the maximum number of iterations of CGNR method ar \begin{table}[htbp] \begin{center} +\begin{changemargin}{-1.4cm}{0cm} +\begin{footnotesize} \begin{tabular}{|c|c||c|c|c||c|c|c||c|} \hline \multirow{2}{*}{Pb size}&\multirow{2}{*}{Nb. cores} & \multicolumn{3}{c||}{GMRES} & \multicolumn{3}{c||}{Krylov Multisplitting} & \multirow{2}{*}{Ratio}\\ @@ -320,6 +344,38 @@ $743^3$ & 8,192 (4x2,048) & 704.4 & 87,822 & 4.80e-07 & 110.3 & \end{tabular} \caption{Results} \label{tab1} +\end{footnotesize} +\end{changemargin} +\end{center} +\end{table} + + + + +\begin{table}[htbp] +\begin{center} +\begin{changemargin}{-1.8cm}{0cm} +\begin{small} +\begin{tabular}{|c|c||c|c|c||c|c|c||c|} +\hline +\multirow{2}{*}{Pb size}&\multirow{2}{*}{Nb. cores} & \multicolumn{3}{c||}{GMRES} & \multicolumn{3}{c||}{Krylov Multisplitting} & \multirow{2}{*}{Ratio}\\ + \cline{3-8} + & & Time (s) & nb Iter. & $\Delta$ & Time (s)& nb Iter. & $\Delta$ & \\ +\hline +$468^3$ & 2,048 (2x1,024) & 299.7 & 41,028 & 5.02e-8 & 48.4 & 691(6,146) & 8.24e-08 & 6.19 \\ +\hline +$590^3$ & 4,096 (2x2,048) & 433.1 & 55,494 & 4.92e-7 & 74.1 & 1,101(8,211) & 6.62e-08 & 5.84 \\ +\hline +$743^3$ & 8,192 (2x4,096) & 704.4 & 87,822 & 4.80e-07 & 151.2 & 3,061(14,914) & 5.87e-08 & 4.65 \\ +\hline +$743^3$ & 8,192 (4x2,048) & 704.4 & 87,822 & 4.80e-07 & 110.3 & 1,531(12,721) & 1.47e-07& 6.39 \\ +\hline + +\end{tabular} +\caption{Results} +\label{tab1} +\end{small} +\end{changemargin} \end{center} \end{table} @@ -366,6 +422,8 @@ intend to investigate the convergence improvements of our method by using preconditioning techniques for Krylov iterative methods and multisplitting methods with overlapping blocks. +\section{Acknowledgement} +The authors would like to thank Mark Bull of the EPCC his fruitful remarks and the facilities of HECToR. %Other applications (=> other matrices)\\ %Larger experiments\\