new results

[GMRES2stage.git] / paper.tex
diff --git a/paper.tex b/paper.tex

index 36298aa5ea5d0e07e8be169e7918b43b4b4d74dc..c559dda03cd9b5b228f03dbdc0fcc82e10248f45 100644 (file)
--- a/paper.tex
+++ b/paper.tex
@@ -367,31 +367,29 @@
  %
  % paper title
  % can use linebreaks \\ within to get better formatting as desired
-\title{A Krylov two-stage algorithm to solve large sparse linear systems}
+\title{TSARM: A Two-Stage Algorithm with least-square Residual Minimization to solve large sparse linear systems}
  %où
  %\title{A two-stage algorithm with error minimization to solve large sparse linear systems}
  %où
  %\title{???}
  
  
+
+
+
  % author names and affiliations
  % use a multiple column layout for up to two different
  % affiliations
  
-\author{\IEEEauthorblockN{Rapha\"el Couturier}
-\IEEEauthorblockA{Femto-ST Institute - DISC Department\\
-Universit\'e de Franche-Comt\'e, IUT de Belfort-Montb\'eliard\\
-19 avenue de Mar\'echal Juin, BP 527 \\
-90016 Belfort Cedex, France\\
-Email: raphael.couturier@univ-fcomte.fr}
-\and
-\IEEEauthorblockN{Lilia Ziane Khodja}
-\IEEEauthorblockA{Centre de Recherche INRIA Bordeaux Sud-Ouest\\
-200 avenue de la Vieille Tour\\
-33405 Talence Cedex, France\\
+\author{\IEEEauthorblockN{Rapha\"el Couturier\IEEEauthorrefmark{1}, Lilia Ziane Khodja \IEEEauthorrefmark{2} and Christophe Guyeux\IEEEauthorrefmark{1}}
+\IEEEauthorblockA{\IEEEauthorrefmark{1} Femto-ST Institute, University of Franche Comte, France\\
+Email: \{raphael.couturier,christophe.guyeux\}@univ-fcomte.fr}
+\IEEEauthorblockA{\IEEEauthorrefmark{2} INRIA Bordeaux Sud-Ouest, France\\
  Email: lilia.ziane@inria.fr}
  }
  
+
+
  % conference papers do not typically use \thanks and this command
  % is locked out in conference mode. If really needed, such as for
  % the acknowledgment of grants, issue a \IEEEoverridecommandlockouts
@@ -428,7 +426,16 @@ Email: lilia.ziane@inria.fr}
  
  
  \begin{abstract}
-%The abstract goes here. DO NOT USE SPECIAL CHARACTERS, SYMBOLS, OR MATH IN YOUR TITLE OR ABSTRACT.
+In  this paper  we propose  a  two stage  iterative method  which increases  the
+convergence of Krylov iterative methods,  typically those of GMRES variants. The
+principle of  our approach  is to  build an external  iteration over  the Krylov
+method  and to  save  the current  residual  frequently (for  example, for  each
+restart of GMRES). Then after a given number of outer iterations, a minimization
+step is applied on the matrix composed of the save residuals in order to compute
+a  better solution and  make a  new iteration  if necessary.  We prove  that our
+method  has the  same  convergence property  than  the inner  method used.  Some
+experiments using up  to 16,394 cores show that compared  to GMRES our algorithm
+can be around 7 times faster.
  \end{abstract}
  
  \begin{IEEEkeywords}
@@ -539,12 +546,14 @@ Iterative Krylov methods; sparse linear systems; error minimization; PETSc; %à
  % no \IEEEPARstart
  % You must have at least 2 lines in the paragraph with the drop letter
  % (should never be an issue)
-Iterative  methods are become  more attractive  than direct  ones to  solve very
-large sparse linear  systems. They are more effective in  a parallel context and
-require less memory  and arithmetic operations than direct  methods. A number of
-iterative methods are proposed and adapted by many researchers and the increased
-need for solving  very large sparse linear systems  triggered the development of
-efficient iterative techniques suitable for the parallel processing.
+
+Iterative methods  became more attractive than  direct ones to  solve very large
+sparse  linear systems.  Iterative  methods  are more  effecient  in a  parallel
+context,  with  thousands  of  cores,  and  require  less  memory  and  arithmetic
+operations than direct  methods. A number of iterative  methods are proposed and
+adapted by many researchers and the increased need for solving very large sparse
+linear  systems  triggered the  development  of  efficient iterative  techniques
+suitable for the parallel processing.
  
  Most of the successful iterative methods currently available are based on Krylov
  subspaces which  consist in forming a  basis of a sequence  of successive matrix
@@ -566,13 +575,16 @@ large clusters.
  In this  paper we propose a  two-stage algorithm based on  two nested iterations
  called inner-outer  iterations.  This algorithm  consists in solving  the sparse
  linear system iteratively  with a small number of  inner iterations and restarts
-the outer step with a new solution minimizing some error functions over a Krylov
-subspace. This algorithm is iterative  and easy to parallelize on large clusters
-and the minimization technique improves its convergence and performances.
+the outer  step with a  new solution minimizing  some error functions  over some
+previous residuals. This algorithm is iterative and easy to parallelize on large
+clusters   and  the   minimization  technique   improves  its   convergence  and
+performances.
  
  The present paper is organized  as follows. In Section~\ref{sec:02} some related
-works are presented. Section~\ref{sec:03} presents our two-stage algorithm based
-on   Krylov  subspace   iteration  methods.   Section~\ref{sec:04}   shows  some
+works are presented. Section~\ref{sec:03} presents our two-stage algorithm using
+a  least-square  residual   minimization.  Section~\ref{sec:04}  describes  some
+convergence results on this method.
+   Section~\ref{sec:05}  shows   some
  experimental results obtained on large  clusters of our algorithm using routines
  of PETSc toolkit.
  %%%*********************************************************
@@ -637,8 +649,8 @@ direct method in the parallel context.
    \Input $A$ (sparse matrix), $b$ (right-hand side)
    \Output $x$ (solution vector)\vspace{0.2cm}
    \State Set the initial guess $x^0$
-  \For {$k=1,2,3,\ldots$ until convergence}
-    \State Solve iteratively $Ax^k=b$
+  \For {$k=1,2,3,\ldots$ until convergence} \label{algo:conv}
+    \State Solve iteratively $Ax^k=b$  \label{algo:solve}
      \State $S_{k~mod~s}=x^k$ 
      \If {$k$ mod $s=0$ {\bf and} not convergence}
        \State Compute dense matrix $R=AS$
@@ -657,18 +669,20 @@ reused with the new values of the residuals.
  %%%*********************************************************
  %%%*********************************************************
  
-
+\section{Convergence results}
+\label{sec:04}
  
  %%%*********************************************************
  %%%*********************************************************
  \section{Experiments using petsc}
-\label{sec:04}
+\label{sec:05}
  
  
  In order to see the influence of our algorithm with only one processor, we first
  show  a comparison  with the  standard version  of GMRES  and our  algorithm. In
  table~\ref{tab:01},  we  show  the  matrices  we  have used  and  some  of  them
-characteristics.
+characteristics. For all  the matrices, the name, the field,  the number of rows
+and the number of nonzero elements is given.
  
  \begin{table}
  \begin{center}
@@ -689,7 +703,28 @@ torso3             & 2D/3D problem & 259,156 & 4,429,042 \\
  \end{center}
  \end{table}
  
-
+The following  parameters have been chosen  for our experiments.   As by default
+the restart  of GMRES is performed every  30 iterations, we have  chosen to stop
+the     GMRES    every     30    iterations     (line     \ref{algo:solve}    in
+Algorithm~\ref{algo:01}).   $s$ is  set to  8. CGLS  is chosen  to  minimize the
+least-squares  problem.  Two  conditions  are  used to  stop  CGLS,  either  the
+precision is under $1e-40$ or the  number of iterations is greater to $20$.  The
+external   precision    is   set    to   $1e-10$   (line    \ref{algo:conv}   in
+Algorithm~\ref{algo:01}).  Those  experiments have been performed  on a Intel(R)
+Core(TM) i7-3630QM CPU @ 2.40GHz with the version 3.5.1 of PETSc.
+
+
+In  Table~\ref{tab:02}, some  experiments comparing  the solving  of  the linear
+systems obtained with the previous matrices  with a GMRES variant and with out 2
+stage algorithm are  given. In the second column, it can  be noticed that either
+gmres or fgmres is used to  solve the linear system.  According to the matrices,
+different preconditioner is  used.  With the 2 stage  algorithm, the same solver
+and  the same  preconditionner  is used.   This  Table shows  that  the 2  stage
+algorithm  can  drastically  reduce  the  number  of  iterations  to  reach  the
+convergence when the  number of iterations for the normal GMRES  is more or less
+greater than  500. In fact  this also depends  on tow parameters: the  number of
+iterations  to  stop  GMRES  and   the  number  of  iterations  to  perform  the
+minimization.
  
  
  \begin{table}
@@ -697,32 +732,76 @@ torso3             & 2D/3D problem & 259,156 & 4,429,042 \\
  \begin{tabular}{|c|c|r|r|r|r|} 
  \hline
  
- \multirow{2}{*}{Matrix name}  & Solver /   & \multicolumn{2}{c|}{gmres variant} & \multicolumn{2}{c|}{2 stage} \\
+ \multirow{2}{*}{Matrix name}  & Solver /   & \multicolumn{2}{c|}{gmres variant} & \multicolumn{2}{c|}{2 stage CGLS} \\ 
+\cline{3-6}
         &  precond             & Time  & \# Iter.  & Time  & \# Iter.  \\\hline \hline
  
  crashbasis         & gmres / none             &  15.65     & 518  &  14.12 & 450  \\
-parabolic\_fem     & gmres / ilu           &  2152  & ?? & 724 & ?? \\
+parabolic\_fem     & gmres / ilu           & 1009.94   & 7573 & 401.52 & 2970 \\
  epb3               & fgmres / sor             &  8.67     & 600  &  8.21 & 540  \\
  atmosmodj          &  fgmres / sor & 104.23  & 451 & 88.97 & 366  \\
  bfwa398            & gmres / none  & 1.42 & 9612 & 0.28 & 1650 \\
-torso3             & fgmres/sor  & 565  & 37.70 & 34.97 & 510 \\
+torso3             & fgmres / sor  & 37.70 & 565 & 34.97 & 510 \\
  \hline
  
  \end{tabular}
-\caption{Comparison of GMRES and 2 stage GMRES algorithms in sequential with some matrices, time is expressed in seconds.}
+\caption{Comparison of (F)GMRES and 2 stage (F)GMRES algorithms in sequential with some matrices, time is expressed in seconds.}
  \label{tab:02}
  \end{center}
  \end{table}
  
  
-Param : retart 30 iters
-cols = 8
-iter cgls = 20
-cgls prec = 1e-40
-prec = 1e-10
-Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
  
  
+Larger experiments ....
+
+\begin{table*}
+\begin{center}
+\begin{tabular}{|r|r|r|r|r|r|r|r|r|} 
+\hline
+
+  nb. cores & precond   & \multicolumn{2}{c|}{gmres variant} & \multicolumn{2}{c|}{2 stage CGLS} &  \multicolumn{2}{c|}{2 stage LSQR} & best gain \\ 
+\cline{3-8}
+             &                       & Time  & \# Iter.  & Time  & \# Iter. & Time  & \# Iter. & \\\hline \hline
+  2,048      & mg                    & 403.49   & 18,210    & 73.89  & 3,060   & 77.84  & 3,270  & 5.46 \\
+  2,048      & sor                   & 745.37   & 57,060    & 87.31  & 6,150   & 104.21 & 7,230  & 8.53 \\
+  4,096      & mg                    & 562.25   & 25,170    & 97.23  & 3,990   & 89.71  & 3,630  & 6.27 \\
+  4,096      & sor                   & 912.12   & 70,194    & 145.57 & 9,750   & 168.97 & 10,980 & 6.26 \\
+  8,192      & mg                    & 917.02   & 40,290    & 148.81 & 5,730   & 143.03 & 5,280  & 6.41 \\
+  8,192      & sor                   & 1,404.53 & 106,530   & 212.55 & 12,990  & 180.97 & 10,470 & 7.76 \\
+  16,384     & mg                    & 1,430.56 & 63,930    & 237.17 & 8,310   & 244.26 & 7,950  & 6.03 \\
+  16,384     & sor                   & 2,852.14 & 216,240   & 418.46 & 21,690  & 505.26 & 23,970 & 6.82 \\
+\hline
+
+\end{tabular}
+\caption{Comparison of FGMRES and 2 stage FGMRES algorithms for ex15 of Petsc with 25000 components per core on Juqueen (threshold 1e-3, restart=30, s=12),  time is expressed in seconds.}
+\label{tab:03}
+\end{center}
+\end{table*}
+
+
+\begin{table*}
+\begin{center}
+\begin{tabular}{|r|r|r|r|r|r|r|r|r|} 
+\hline
+
+  nb. cores & threshold   & \multicolumn{2}{c|}{gmres variant} & \multicolumn{2}{c|}{2 stage CGLS} &  \multicolumn{2}{c|}{2 stage LSQR} & best gain \\ 
+\cline{3-8}
+             &                       & Time  & \# Iter.  & Time  & \# Iter. & Time  & \# Iter. & \\\hline \hline
+  2,048      & 8e-5                  & 108.88 & 16,560  & 23.06  &  3,630  & 22.79  & 3,630   & 4.77 \\
+  2,048      & 6e-5                  & 194.01 & 30,270  & 35.50  &  5,430  & 27.74  & 4,350   & 6.99 \\
+  4,096      & 7e-5                  & 160.59 & 22,530  & 35.15  &  5,130  & 29.21  & 4,350   & 5.49 \\
+  4,096      & 6e-5                  & 249.27 & 35,520  & 52.13  &  7,950  & 39.24  & 5,790   & 6.35 \\
+  8,192      & 6e-5                  & 149.54 & 17,280  & 28.68  &  3,810  & 29.05  & 3,990  & 5.21 \\
+  8,192      & 5e-5                  & 792.11 & 109,590 & 76.83  &  10,470  & 65.20  & 9,030  & 12.14 \\
+  16,384     & 4e-5                  & 718.61 & 86,400 & 98.98  &  10,830  & 131.86  & 14,790  & 7.26 \\
+\hline
+
+\end{tabular}
+\caption{Comparison of FGMRES  and 2 stage FGMRES algorithms for ex54 of Petsc (both with the MG preconditioner) with 25000 components per core on Curie (restart=30, s=12),  time is expressed in seconds.}
+\label{tab:04}
+\end{center}
+\end{table*}
  %%%*********************************************************
  %%%*********************************************************
  
@@ -737,6 +816,12 @@ Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
  %%%*********************************************************
  
  
+future plan : \\
+- study other kinds of matrices, problems, inner solvers\\
+- adaptative number of outer iterations to minimize\\
+- other methods to minimize the residuals?\\
+- implement our solver inside PETSc
+
  
  % conference papers do not normally have an appendix
  
@@ -746,10 +831,10 @@ Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
  %%%*********************************************************
  %%%*********************************************************
  \section*{Acknowledgment}
-%The authors would like to thank...
-%more thanks here
-%%%*********************************************************
-%%%*********************************************************
+This  paper  is   partially  funded  by  the  Labex   ACTION  program  (contract
+ANR-11-LABX-01-01).   We acknowledge PRACE  for awarding  us access  to resource
+Curie and Juqueen respectively based in France and Germany.
+
  
  
  % trigger a \newpage just before the given reference