dfmqslk

[GMRES2stage.git] / paper.tex
diff --git a/paper.tex b/paper.tex

index 16beac7e509d9b87c1bd444249ad71a7807fe5cd..87640739334500189fe37d1e42f428902550e45f 100644 (file)
--- a/paper.tex
+++ b/paper.tex
@@ -439,7 +439,7 @@ GMRES.
  \end{abstract}
  
  \begin{IEEEkeywords}
-Iterative Krylov methods; sparse linear systems; residual minimization; PETSc; %à voir... 
+Iterative Krylov methods; sparse linear systems; two stage iteration; least-squares residual minimization; PETSc
  \end{IEEEkeywords}
  
  
@@ -547,38 +547,42 @@ Iterative Krylov methods; sparse linear systems; residual minimization; PETSc; %
  % You must have at least 2 lines in the paragraph with the drop letter
  % (should never be an issue)
  
-Iterative methods have recently become more attractive than  direct ones to  solve very large
-sparse  linear systems.  They are more  efficient  in a  parallel
-context,  supporting  thousands  of  cores,  and they require  less  memory  and  arithmetic
-operations than direct  methods. This is why new iterative  methods are frequently 
-proposed or adapted by researchers, and the increasing need to solve very large sparse
-linear  systems  has triggered the  development  of such efficient iterative  techniques
-suitable for parallel processing.
-
-Most of the successful iterative methods currently available are based on so-called ``Krylov
-subspaces''. They  consist in forming a  basis of successive matrix
-powers multiplied by an initial vector, which can be for instance the residual. These methods use vectors orthogonality of the Krylov  subspace  basis in order to solve  linear
-systems.  The  most known iterative  Krylov subspace methods  are conjugate
-gradient and GMRES ones (Generalized Minimal RESidual).
-
-
-However,  iterative  methods suffer  from scalability  problems  on parallel
-computing  platforms  with many  processors, due  to  their need  of  reduction
-operations, and to  collective    communications   to  achieve   matrix-vector
+Iterative methods have recently become more attractive than direct ones to solve
+very large sparse  linear systems\cite{Saad2003}.  They are more  efficient in a
+parallel context,  supporting thousands of  cores, and they require  less memory
+and  arithmetic operations than  direct methods~\cite{bahicontascoutu}.  This is
+why new iterative methods are frequently proposed or adapted by researchers, and
+the increasing need to solve very  large sparse linear systems has triggered the
+development  of  such  efficient  iterative  techniques  suitable  for  parallel
+processing.
+
+Most  of the  successful  iterative  methods currently  available  are based  on
+so-called ``Krylov  subspaces''. They consist  in forming a basis  of successive
+matrix powers  multiplied by an  initial vector, which  can be for  instance the
+residual. These methods  use vectors orthogonality of the  Krylov subspace basis
+in  order to solve  linear systems.   The most  known iterative  Krylov subspace
+methods are conjugate gradient and GMRES ones (Generalized Minimal RESidual).
+
+
+However,  iterative  methods  suffer   from  scalability  problems  on  parallel
+computing  platforms  with many  processors,  due  to  their need  of  reduction
+operations,   and  to   collective  communications   to   achieve  matrix-vector
  multiplications. The  communications on large  clusters with thousands  of cores
-and  large  sizes of  messages  can  significantly  affect the  performances  of these
-iterative methods. As a consequence, Krylov subspace iteration methods are often used
-with preconditioners in practice, to increase their convergence and accelerate their
-performances.  However, most  of the  good preconditioners  are not  scalable on
-large clusters.
-
-In this research work, a two-stage algorithm based on  two nested iterations
-called inner-outer  iterations is proposed.  This algorithm  consists in solving  the sparse
-linear system iteratively  with a small number of  inner iterations, and restarting
-the outer  step with a  new solution minimizing  some error functions  over some
-previous residuals. This algorithm is iterative and easy to parallelize on large
-clusters. Furthermore,  the   minimization  technique   improves  its   convergence  and
-performances.
+and large sizes  of messages can significantly affect  the performances of these
+iterative methods. As a consequence, Krylov subspace iteration methods are often
+used  with  preconditioners  in  practice,  to increase  their  convergence  and
+accelerate their  performances.  However, most  of the good  preconditioners are
+not scalable on large clusters.
+
+In  this research work,  a two-stage  algorithm based  on two  nested iterations
+called inner-outer  iterations is proposed.  This algorithm  consists in solving
+the sparse  linear system iteratively with  a small number  of inner iterations,
+and  restarting  the  outer step  with  a  new  solution minimizing  some  error
+functions  over some previous  residuals. For  further information  on two-stage
+iteration      methods,     interested      readers      are     invited      to
+consult~\cite{Nichols:1973:CTS}. Two-stage algorithms are easy to parallelize on
+large clusters.  Furthermore,  the least-squares minimization technique improves
+its convergence and performances.
  
  The present  article is  organized as follows.   Related works are  presented in
  Section~\ref{sec:02}. Section~\ref{sec:03} details the two-stage algorithm using
@@ -627,6 +631,12 @@ composed by  the $s$  last solutions  that have been  computed during  the inner
  iterations phase.   In the remainder,  the $i$-th column  vector of $S$  will be
  denoted by $S_i$.
  
+$\|r_n\| \leq \left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{n/2} \|r_0\|,$
+In the general case, where A is not positive definite, we have
+
+$\|r_n\| \le \inf_{p \in P_n} \|p(A)\| \le \kappa_2(V) \inf_{p \in P_n} \max_{\lambda \in \sigma(A)} |p(\lambda)| \|r_0\|, \,$
+
+
  At each $s$ iterations, another kind of minimization step is applied in order to
  compute a new  solution $x$. For that, the previous  residuals of $Ax=b$ are computed by
  the inner iterations with $(b-AS)$. The minimization of the residuals is obtained by  
@@ -652,7 +662,7 @@ appropriate than a single direct method in a parallel context.
    \State Set the initial guess $x_0$
    \For {$k=1,2,3,\ldots$ until convergence (error$<\epsilon_{tsirm}$)} \label{algo:conv}
      \State  $[x_k,error]=Solve(A,b,x_{k-1},max\_iter_{kryl})$   \label{algo:solve}
-    \State $S_{k \mod s}=x_k$ \label{algo:store}
+    \State $S_{k \mod s}=x_k$ \label{algo:store} \Comment{update column (k mod s) of S}
      \If {$k \mod s=0$ {\bf and} error$>\epsilon_{kryl}$}
        \State $R=AS$ \Comment{compute dense matrix} \label{algo:matrix_mul}
              \State $\alpha=Least\_Squares(R,b,max\_iter_{ls})$ \label{algo:}
@@ -800,10 +810,12 @@ than the one of the GMRES method.
  
  
  In order to see the influence of our algorithm with only one processor, we first
-show  a comparison  with the  standard version  of GMRES  and our  algorithm. In
-Table~\ref{tab:01},  we  show  the  matrices  we  have used  and  some  of  them
-characteristics. For all  the matrices, the name, the field,  the number of rows
-and the number of nonzero elements are given.
+show a comparison with GMRES or FGMRES and our algorithm. In Table~\ref{tab:01},
+we  show the  matrices we  have  used and  some of  them characteristics.  Those
+matrices  are   chosen  from   the  Davis  collection   of  the   University  of
+Florida~\cite{Dav97}. They are matrices arising in real-world applications.  For
+all the  matrices, the name,  the field,  the number of  rows and the  number of
+nonzero elements are given.
  
  \begin{table}[htbp]
  \begin{center}
@@ -836,13 +848,14 @@ Core(TM) i7-3630QM CPU @ 2.40GHz with the version 3.5.1 of PETSc.
  In  Table~\ref{tab:02}, some  experiments comparing  the solving  of  the linear
  systems obtained with the previous matrices  with a GMRES variant and with out 2
  stage algorithm are  given. In the second column, it can  be noticed that either
-gmres or fgmres is used to  solve the linear system.  According to the matrices,
-different  preconditioner is used.   With TSIRM,  the same  solver and  the same
-preconditionner are used.  This Table shows that TSIRM can drastically reduce the
-number of iterations to reach the  convergence when the number of iterations for
-the normal GMRES is more or less  greater than 500. In fact this also depends on
-tow  parameters: the  number  of iterations  to  stop GMRES  and  the number  of
-iterations to perform the minimization.
+GRMES or  FGMRES (Flexible  GMRES)~\cite{Saad:1993} is used  to solve  the linear
+system.   According to  the matrices,  different preconditioner  is  used.  With
+TSIRM, the same solver and the  same preconditionner are used.  This Table shows
+that  TSIRM  can  drastically reduce  the  number  of  iterations to  reach  the
+convergence when the  number of iterations for the normal GMRES  is more or less
+greater than  500. In fact  this also depends  on tow parameters: the  number of
+iterations  to  stop  GMRES  and   the  number  of  iterations  to  perform  the
+minimization.
  
  
  \begin{table}[htbp]