\end{abstract}
\begin{IEEEkeywords}
Iterative Krylov methods; sparse linear systems; two-stage iteration; least-squares residual minimization; PETSc
\end{IEEEkeywords}
% You must have at least 2 lines in the paragraph with the drop letter
% (should never be an issue)
Iterative methods have recently become more attractive than direct ones for solving very large sparse linear systems~\cite{Saad2003}. They are more efficient in a parallel context, scaling to thousands of cores, and they require less memory and fewer arithmetic operations than direct methods~\cite{bahicontascoutu}. This is why new iterative methods are frequently proposed or adapted by researchers, and the increasing need to solve very large sparse linear systems has triggered the development of efficient iterative techniques suitable for parallel processing.

Most of the successful iterative methods currently available are based on so-called ``Krylov subspaces''. They consist in building a basis of successive matrix powers applied to an initial vector, which can be, for instance, the residual. These methods exploit the orthogonality of the vectors of the Krylov subspace basis to solve linear systems. The best-known iterative Krylov subspace methods are the conjugate gradient and GMRES (Generalized Minimal RESidual) methods.
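More precisely, given a nonsingular matrix $A$ and a vector $v$ (typically the initial residual $r_0=b-Ax_0$), the Krylov subspace of order $m$ is
\[
\mathcal{K}_m(A,v)=\mathrm{span}\{v, Av, A^2v, \ldots, A^{m-1}v\},
\]
and the iterates of these methods are sought in the affine space $x_0+\mathcal{K}_m(A,r_0)$.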
However, iterative methods suffer from scalability problems on parallel computing platforms with many processors, owing to their need for reduction operations and for collective communications to perform matrix-vector multiplications. On large clusters with thousands of cores, communications
involving large messages can significantly degrade the performance of these iterative methods. As a consequence, Krylov subspace methods are often used with preconditioners in practice, to improve their convergence and accelerate their performance. However, most good preconditioners do not scale well on large clusters.

In this research work, a two-stage algorithm based on two nested iterations, called inner-outer iterations, is proposed. This algorithm consists in solving the sparse linear system iteratively with a small number of inner iterations, and restarting the outer step with a new solution that minimizes an error function over some previous residuals. For further information on two-stage iteration methods, interested readers are invited to consult~\cite{Nichols:1973:CTS}. Two-stage algorithms are easy to parallelize on large clusters. Furthermore, the least-squares minimization technique improves the convergence and the performance of the method.
The present article is organized as follows. Related works are presented in Section~\ref{sec:02}. Section~\ref{sec:03} details the two-stage algorithm, in which the inner solver is a Krylov-based one. In order to accelerate its convergence, the outer solver periodically applies a least-squares minimization on the residuals computed by the inner one.
At each outer iteration, the sparse linear system $Ax=b$ is partially solved using only $m$ iterations of an iterative method, the latter being initialized with the last obtained approximation. The GMRES method~\cite{Saad86}, or any of its variants, can potentially be used as the inner solver. The current approximation of the Krylov method is then stored inside an $n \times s$ matrix $S$, which is composed of the $s$ last solutions computed during the inner iteration phases. In the remainder, the $i$-th column vector of $S$ will be denoted by $S_i$.
When $A$ is positive definite, the residual norm of GMRES satisfies
\[
\|r_n\| \leq \left( 1-\frac{\lambda_{\mathrm{min}}^2\big(\frac{1}{2}(A^T + A)\big)}{\lambda_{\mathrm{max}}(A^T A)} \right)^{n/2} \|r_0\|.
\]
In the general case, where $A$ is not positive definite but is diagonalizable as $A=V\Lambda V^{-1}$, we have
\[
\|r_n\| \leq \inf_{p \in P_n} \|p(A)\|\, \|r_0\| \leq \kappa_2(V) \inf_{p \in P_n} \max_{\lambda \in \sigma(A)} |p(\lambda)|\, \|r_0\|,
\]
where $P_n$ denotes the set of polynomials $p$ of degree at most $n$ with $p(0)=1$, $\sigma(A)$ is the spectrum of $A$, and $\kappa_2(V)$ is the condition number of the eigenvector matrix $V$.
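As a rough illustration of the first bound, with purely illustrative numbers: if $\lambda_{\mathrm{min}}\big(\frac{1}{2}(A^T+A)\big)=1$ and $\lambda_{\mathrm{max}}(A^TA)=100$, the bound only guarantees that the residual norm shrinks by a factor $(1-1/100)^{1/2}\approx 0.995$ per iteration, so about $460$ iterations may be needed to gain a single order of magnitude on the residual. Such slow convergence is precisely the situation in which the outer least-squares acceleration proposed in this work is expected to pay off.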
Every $s$ iterations, another kind of minimization step is applied in order to compute a new solution $x$. For that, the residuals of $Ax=b$ associated with the solutions computed during the inner iterations are obtained from $(b-AS)$. The minimization of the residuals is obtained by solving the least-squares problem $\min_{\alpha} \|b-R\alpha\|_2$, where $R=AS$ and $\alpha$ is a vector of $s$ coefficients; the new solution is then given by $x=S\alpha$.
\Output $x$ (solution vector)\vspace{0.2cm}
\State Set the initial guess $x_0$
\For {$k=1,2,3,\ldots$ until convergence (error$<\epsilon_{tsirm}$)} \label{algo:conv}
\State $[x_k,error]=Solve(A,b,x_{k-1},max\_iter_{kryl})$ \label{algo:solve}
\State $S_{k \mod s}=x_k$ \label{algo:store} \Comment{update column $k \mod s$ of $S$}
\If {$k \mod s=0$ {\bf and} error$>\epsilon_{kryl}$}
\State $R=AS$ \Comment{compute dense matrix} \label{algo:matrix_mul}
\State $\alpha=Least\_Squares(R,b,max\_iter_{ls})$ \label{algo:}
\label{algo:01}
\end{algorithm}
Algorithm~\ref{algo:01} summarizes the principle of the proposed method. The outer iteration is inside the \emph{for} loop. On line~\ref{algo:solve}, the Krylov method is called for a maximum of $max\_iter_{kryl}$ iterations. In practice, we suggest setting this parameter equal to the restart number of the GMRES-like method. Moreover, a tolerance threshold must be specified for the solver. In practice, this threshold must be much smaller than the convergence threshold of the TSIRM algorithm (\emph{i.e.}, $\epsilon_{tsirm}$). After the call to the $Solve$ function, we obtain the vector $x_k$ and the associated error, defined by $\|Ax_k-b\|_2$.

On line~\ref{algo:store}, $S_{k \mod s}=x_k$ consists in copying the solution $x_k$ into column $k \mod s$ of $S$. After the minimization, the matrix $S$ is reused with the new values of the residuals. To solve the minimization problem, an iterative method is used. Two parameters are required for that: the maximum number of iterations and the threshold used to stop the method.
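To make the structure of Algorithm~\ref{algo:01} more concrete, the following is a minimal sequential sketch of the method in Python, using SciPy's GMRES as the inner solver and a dense least-squares solve in place of the iterative minimization described above. The function name \texttt{tsirm}, its keyword arguments and the inner tolerance are illustrative choices only; the sketch also omits the $\epsilon_{kryl}$ test, the preconditioner and the parallel, PETSc-based implementation used in our experiments.
\begin{verbatim}
import numpy as np
from scipy.sparse.linalg import gmres

def tsirm(A, b, s=10, max_iter_kryl=30, eps_tsirm=1e-10,
          max_outer=10000):
    x = np.zeros(b.shape[0])        # initial guess x_0
    S = np.zeros((b.shape[0], s))   # s last inner solutions, column-wise
    for k in range(1, max_outer + 1):
        # Inner stage: one restart cycle of GMRES (at most max_iter_kryl
        # iterations), initialized with the last approximation.  On older
        # SciPy versions the keyword is 'tol' instead of 'rtol'.
        x, _ = gmres(A, b, x0=x, restart=max_iter_kryl,
                     maxiter=1, rtol=1e-14)
        error = np.linalg.norm(A @ x - b)
        S[:, k % s] = x             # update column (k mod s) of S
        if error < eps_tsirm:       # convergence of the outer iteration
            break
        if k % s == 0:
            # Outer stage: minimize ||b - A S alpha||_2, then x = S alpha
            R = A @ S
            alpha, *_ = np.linalg.lstsq(R, b, rcond=None)
            x = S @ alpha
    return x
\end{verbatim}
With the default arguments, this sketch performs at most thirty GMRES iterations per outer step and one least-squares correction every ten outer iterations; as in Algorithm~\ref{algo:01}, only the $s$ most recent approximations are kept in $S$.
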
Let us summarize the most important parameters of TSIRM:
\begin{itemize}
In order to assess the behavior of our algorithm on a single processor, we first
present a comparison between GMRES or FGMRES and our algorithm. In Table~\ref{tab:01}, we show the matrices we have used and some of their characteristics. Those matrices are chosen from the Davis collection of the University of Florida~\cite{Dav97} and arise in real-world applications. For each matrix, the name, the field, the number of rows and the number of nonzero elements are given.
\begin{table}[htbp]
\begin{center}
In Table~\ref{tab:02}, we report experiments in which the linear systems associated with the previous matrices are solved with a GMRES variant and with our two-stage algorithm. In the second column, it can be noticed that either
GMRES or FGMRES (Flexible GMRES)~\cite{Saad:1993} is used to solve the linear system. Depending on the matrix, a different preconditioner is used. With TSIRM, the same solver and the same preconditioner are used. This table shows that TSIRM can drastically reduce the number of iterations needed to reach convergence when the number of iterations of standard GMRES is roughly greater than 500. In fact, this also depends on two parameters: the number of iterations after which GMRES is stopped and the number of iterations used to perform the minimization.
\begin{table}[htbp]
In order to perform larger experiments, we have tested some example applications of PETSc. Those applications are available in the KSP part of PETSc, which is devoted to scalable linear equation solvers:
\begin{itemize}
finite elements. For this example, the user can define the scaling $\alpha$ of the material coefficient in an embedded circle.
\end{itemize}
For more technical details on these applications, interested readers are invited to read the codes available in the PETSc sources. Those problems have been chosen because they scale well on many cores, which is not the case for the other problems that we have tested.
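All these example applications rely on PETSc's KSP linear solvers. As a purely illustrative sketch (it is not taken from the PETSc example codes, and the small 1D Laplacian matrix and the Jacobi preconditioner are arbitrary choices made here), the following petsc4py snippet shows how an FGMRES solver with a preconditioner is typically configured, which is essentially what these applications do at a much larger scale.
\begin{verbatim}
from petsc4py import PETSc

n = 1000
A = PETSc.Mat().createAIJ([n, n], nnz=3)   # tridiagonal 1D Laplacian
for i in range(n):
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    A.setValue(i, i, 2.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

b = A.createVecLeft()           # right-hand side
b.set(1.0)
x = A.createVecRight()          # solution vector

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType(PETSc.KSP.Type.FGMRES)           # Krylov solver
ksp.getPC().setType(PETSc.PC.Type.JACOBI)    # preconditioner
ksp.setTolerances(rtol=1e-10, max_it=30000)
ksp.setFromOptions()   # honor -ksp_type, -pc_type, ... at run time
ksp.solve(b, x)
print("iterations:", ksp.getIterationNumber())
\end{verbatim}
Thanks to the call to \texttt{setFromOptions()}, the solver and the preconditioner can also be selected at run time through PETSc's options database (for instance with \texttt{-ksp\_type fgmres} and \texttt{-pc\_type jacobi}), which is how the example applications are usually driven.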
In the following, larger experiments are described on two large-scale architectures: Curie and Juqueen. Both are supercomputers, composed of 80,640 cores for Curie and 458,752 cores for Juqueen. Those machines are respectively hosted by GENCI in France and by the Jülich Supercomputing Centre in Germany. They belong, with other similar architectures, to the PRACE initiative (Partnership for Advanced Computing in Europe), which aims at providing high-performance supercomputing resources to enhance research in Europe. The Curie architecture is composed of Intel E5-2680 processors at 2.7 GHz, with 2 GB of memory per core. The Juqueen architecture is composed of IBM PowerPC A2 processors at 1.6 GHz, with 1 GB of memory per core. Both architectures are equipped with a dedicated high-speed network.
{\bf Description of preconditioners}\\