01-05-2014b

[Krylov_multi.git] / krylov_multi.tex
diff --git a/krylov_multi.tex b/krylov_multi.tex

index ff5e8480a0d6617212e4fac787a2380e312c6e02..9e423d1751084b923d9169bb1c3a2d6e3573462b 100644 (file)
--- a/krylov_multi.tex
+++ b/krylov_multi.tex
@@ -17,10 +17,8 @@
  \newcommand{\Prec}{\mathit{prec}}
  \newcommand{\Ratio}{\mathit{Ratio}}
  
-%\usepackage{xspace}
-%\usepackage[textsize=footnotesize]{todonotes}
-%\newcommand{\LZK}[2][inline]{%
-%\todo[color=green!40,#1]{\sffamily\textbf{LZK:} #2}\xspace}
+\def\changemargin#1#2{\list{}{\rightmargin#2\leftmargin#1}\item[]}
+\let\endchangemargin=\endlist
  
  \title{A scalable multisplitting algorithm for solving large sparse linear systems} 
  \date{}
@@ -67,19 +65,41 @@ Preconditioners can be  used in order to increase  the convergence of iterative
  solvers.   However, most  of the  good preconditioners  are not  scalable when
  thousands of cores are used.
  
-Traditional iterative  solvers have  global synchronizations that  penalize the
-scalability.   Two  possible solutions  consists  either  in using  asynchronous
-iterative  methods~\cite{ref18} or  to  use multisplitting  algorithms. In  this
-paper, we will  reconsider the use of a multisplitting  method. In opposition to
-traditional  multisplitting  method  that  suffer  from  slow  convergence,  as
-proposed  in~\cite{huang1993krylov},  the  use  of a  minimization  process  can
-drastically improve the convergence.
+%Traditional iterative  solvers have  global synchronizations that  penalize the
+%scalability.   Two  possible solutions  consists  either  in using  asynchronous
+%iterative  methods~\cite{ref18} or  to  use multisplitting  algorithms. In  this
+%paper, we will  reconsider the use of a multisplitting  method. In opposition to
+%traditional  multisplitting  method  that  suffer  from  slow  convergence,  as
+%proposed  in~\cite{huang1993krylov},  the  use  of a  minimization  process  can
+%drastically improve the convergence.
+
+Traditional parallel iterative solvers are based on fine-grain computations that
+frequently  require  data exchanges  between  computing  nodes  and have  global
+synchronizations  that penalize  the  scalability. Particularly,  they are  more
+penalized on large  scale architectures or on distributed  platforms composed of
+distant  clusters interconnected  by  a high-latency  network.  It is  therefore
+imperative to develop coarse-grain based algorithms to reduce the communications
+in the  parallel iterative  solvers. Two possible  solutions consists  either in
+using  asynchronous  iterative  methods~\cite{ref18}  or to  use  multisplitting
+algorithms.  In this  paper,  we will  reconsider  the use  of a  multisplitting
+method. In opposition to traditional multisplitting method that suffer from slow
+convergence, as  proposed in~\cite{huang1993krylov},  the use of  a minimization
+process can drastically improve the convergence.
+
+The present paper is  organized as follows. First, Section~\ref{sec:02} presents
+some  related  works and  the  principle  of  multisplitting methods.  Then,  in
+Section~\ref{sec:03}  is presented  the algorithm  of our  Krylov multisplitting
+method based  on inner-outer  iterations. Finally, in  Section~\ref{sec:04}, the
+parallel experiments on Hector architecture  show the performances of the Krylov
+multisplitting algorithm compared to the classical GMRES algorithm to solve a 3D
+Poisson problem.
  
  
  %%%%%%%%%%%%%%%%%%%%%%%%
  %%%%%%%%%%%%%%%%%%%%%%%%
  
-\section{Related works and presention of the multisplitting method}
+\section{Related works and presentation of the multisplitting method}
+\label{sec:02}
  A general framework  for studying parallel multisplitting has  been presented in~\cite{o1985multi}
  by O'Leary and White. Convergence conditions are given for the
  most general case.  Many authors improved multisplitting algorithms by proposing
@@ -142,6 +162,7 @@ In the case where the diagonal weighting matrices $E_\ell$ have only zero and on
  %%%%%%%%%%%%%%%%%%%%%%%%
  
  \section{A two-stage method with a minimization}
+\label{sec:03}
  Let $Ax=b$ be a given large and sparse linear system of $n$ equations to solve in parallel on $L$ clusters of processors, physically adjacent or geographically distant, where $A\in\mathbb{R}^{n\times n}$ is a square and non-singular matrix, $x\in\mathbb{R}^{n}$ is the solution vector and $b\in\mathbb{R}^{n}$ is the right-hand side vector. The multisplitting of this linear system is defined as follows
  \begin{equation}
  \left\{
@@ -253,6 +274,7 @@ The main key points of our Krylov multisplitting method to solve a large sparse
  %%%%%%%%%%%%%%%%%%%%%%%%
  
  \section{Experiments}
+\label{sec:04}
  In order to illustrate  the interest  of our algorithm. We have  compared our
  algorithm  with  the  GMRES  method  which  is a very  well  used  method  in  many
  situations.  We have chosen to focus on only one problem which is very simple to
@@ -276,9 +298,11 @@ preconditioner  it  is   possible  to  reduce  the  number   of  iterations  but
  preconditioners are not scalable when using many cores.
  
  %Doing many experiments  with many cores is  not easy and requires to  access to a supercomputer  with several  hours for  developing  a code  and then  improving it. 
-In the following we present  some experiments we could achieved out on the
-Hector architecture,  the previous UK's  high-end computing resource,  funded by
-the UK Research Councils, which has been stopped in the early 2014.
+In the following we present some experiments we could achieved out on the Hector
+architecture,  a UK's  high-end computing  resource, funded  by the  UK Research
+Councils~\cite{hector}.  This is  a Cray  XE6 supercomputer,  equipped  with two
+16-core AMD  Opteron 2.3 Ghz  and 32 GB  of memory. Machines  are interconnected
+with a 3D torus.
  
  Table~\ref{tab1} shows  the result of  the experiments.  The first  column shows
  the  size of  the  3D Poisson  problem.  The size  is chosen  in  order to  have
@@ -300,6 +324,8 @@ is reached. The precision and the maximum number of iterations of CGNR method ar
  
  \begin{table}[htbp]
  \begin{center}
+\begin{changemargin}{-1.4cm}{0cm}
+\begin{footnotesize}
  \begin{tabular}{|c|c||c|c|c||c|c|c||c|} 
  \hline
  \multirow{2}{*}{Pb size}&\multirow{2}{*}{Nb. cores} &  \multicolumn{3}{c||}{GMRES} &  \multicolumn{3}{c||}{Krylov Multisplitting} & \multirow{2}{*}{Ratio}\\
@@ -318,6 +344,38 @@ $743^3$ & 8,192 (4x2,048)        & 704.4     & 87,822    & 4.80e-07 &  110.3   &
  \end{tabular}
  \caption{Results}
  \label{tab1}
+\end{footnotesize}
+\end{changemargin}
+\end{center}
+\end{table}
+
+
+
+
+\begin{table}[htbp]
+\begin{center}
+\begin{changemargin}{-1.8cm}{0cm}
+\begin{small}
+\begin{tabular}{|c|c||c|c|c||c|c|c||c|} 
+\hline
+\multirow{2}{*}{Pb size}&\multirow{2}{*}{Nb. cores} &  \multicolumn{3}{c||}{GMRES} &  \multicolumn{3}{c||}{Krylov Multisplitting} & \multirow{2}{*}{Ratio}\\
+ \cline{3-8}
+           &                   &  Time (s) & nb Iter. & $\Delta$  &   Time (s)& nb Iter. & $\Delta$ & \\
+\hline
+$468^3$ & 2,048 (2x1,024)        &  299.7    & 41,028    & 5.02e-8  &  48.4    & 691(6,146) & 8.24e-08  & 6.19   \\
+\hline
+$590^3$ & 4,096 (2x2,048)        &  433.1    & 55,494    & 4.92e-7  &  74.1    & 1,101(8,211) & 6.62e-08  & 5.84   \\
+\hline
+$743^3$ & 8,192 (2x4,096)        & 704.4     & 87,822    & 4.80e-07 &  151.2   & 3,061(14,914) & 5.87e-08 & 4.65    \\
+\hline
+$743^3$ & 8,192 (4x2,048)        & 704.4     & 87,822    & 4.80e-07 &  110.3   & 1,531(12,721) & 1.47e-07& 6.39  \\
+\hline
+
+\end{tabular}
+\caption{Results}
+\label{tab1}
+\end{small}
+\end{changemargin}
  \end{center}
  \end{table}
  
@@ -364,6 +422,8 @@ intend  to investigate  the  convergence  improvements of  our  method by  using
  preconditioning  techniques  for  Krylov  iterative methods  and  multisplitting
  methods with overlapping blocks.
  
+\section{Acknowledgement}
+The authors would like to thank Mark Bull of the EPCC his fruitful remarks and the facilities of HECToR.
  
  %Other applications (=> other matrices)\\
  %Larger experiments\\