new

[GMRES2stage.git] / paper.tex
diff --git a/paper.tex b/paper.tex

index 82cc12a7ac92e635e52bcc3bb8cc8165844c6f4a..a4c9b268faf6197c47d7c288188adc6bf299b3de 100644 (file)
--- a/paper.tex
+++ b/paper.tex
@@ -771,7 +771,7 @@ where $M$ is the symmetric part of $A$, $\alpha = \lambda_{min}(M)^2$ and $\beta
  \begin{proof}
  Let us first recall that the residue is under control when considering the GMRES algorithm on a positive definite matrix, and it is bounded as follows:
  \begin{equation*}
-\|r_k\| \leq \left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{km/2} \|r_0\| .
+\|r_k\| \leq \left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{k/2} \|r_0\| .
  \end{equation*}
  Additionally, when $A$ is a positive real matrix with symmetric part $M$, then the residual norm provided at the $m$-th step of GMRES satisfies:
  \begin{equation*}
@@ -784,13 +784,18 @@ These well-known results can be found, \emph{e.g.}, in~\cite{Saad86}.
  We will now prove by a mathematical induction that, for each $k \in \mathbb{N}^\ast$, 
  $||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{mk}{2}} ||r_0||$ when $A$ is positive, and $\|r_k\| \leq \left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{km/2} \|r_0\|$ when $A$ is positive definite.
  
-The base case is obvious, as for $k=1$, the TSIRM algorithm simply consists in applying GMRES($m$) once, leading to a new residual $r_1$ which follows the inductive hypothesis due to the results recalled above.
+The base case is obvious, as for $k=1$, the TSIRM algorithm simply consists in applying GMRES($m$) once, leading to a new residual $r_1$ that follows the inductive hypothesis due, to the results recalled above.
  
-Suppose now that the claim holds for all $m=1, 2, \hdots, k-1$, that is, $\forall m \in \{1,2,\hdots, k-1\}$, $||r_m|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||$.
+Suppose now that the claim holds for all $m=1, 2, \hdots, k-1$, that is, $\forall m \in \{1,2,\hdots, k-1\}$, $||r_m|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||$ in the positive case, and $\|r_k\| \leq \left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{km/2} \|r_0\|$ in the definite positive one.
  We will show that the statement holds too for $r_k$. Two situations can occur:
  \begin{itemize}
-\item If $k \mod m \neq 0$, then the TSIRM algorithm consists in executing GMRES once. In that case, we obtain $||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_{k-1}||\leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||$ by the inductive hypothesis.
-\item Else, the TSIRM algorithm consists in two stages: a first GMRES($m$) execution leads to a temporary $x_k$ whose residue satisfies $||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_{k-1}||\leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||$, and a least squares resolution.
+\item If $k \not\equiv 0 ~(\textrm{mod}\ m)$, then the TSIRM algorithm consists in executing GMRES once. In that case and by using the inductive hypothesis, we obtain either $||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_{k-1}||\leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||$ if $A$ is positive, or $\|r_k\| \leqslant \left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{m/2} \|r_{k-1}\|$ $\leqslant$ $\left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{km/2} \|r_{0}\|$ in the positive definite case.
+\item Else, the TSIRM algorithm consists in two stages: a first GMRES($m$) execution leads to a temporary $x_k$ whose residue satisfies:
+\begin{itemize}
+\item $||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_{k-1}||\leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||$ in the positive case, 
+\item $\|r_k\| \leqslant \left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{m/2} \|r_{k-1}\|$ $\leqslant$ $\left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{km/2} \|r_{0}\|$ in the positive definite one,
+\end{itemize}
+and a least squares resolution.
  Let $\operatorname{span}(S) = \left \{ {\sum_{i=1}^k \lambda_i v_i \Big| k \in \mathbb{N}, v_i \in S, \lambda _i \in \mathbb{R}} \right \}$ be the linear span of a set of real vectors $S$. So,\\
  $\min_{\alpha \in \mathbb{R}^s} ||b-R\alpha ||_2 = \min_{\alpha \in \mathbb{R}^s} ||b-AS\alpha ||_2$
  
@@ -801,14 +806,16 @@ $\begin{array}{ll}
  & \leqslant \min_{\lambda \in \mathbb{R}} ||b-\lambda Ax_{k} ||_2\\
  & \leqslant ||b-Ax_{k}||_2\\
  & = ||r_k||_2\\
-& \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||,
+& \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||, \textrm{ if $A$ is positive,}\\
+& \leqslant \left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{km/2} \|r_{0}\|, \textrm{ if $A$ is}\\
+& \textrm{positive definite,} 
  \end{array}$
  \end{itemize}
  which concludes the induction and the proof.
  \end{proof}
  
-We can remark that, at each iterate, the residue of the TSIRM algorithm is lower 
-than the one of the GMRES method.
+%We can remark that, at each iterate, the residue of the TSIRM algorithm is lower 
+%than the one of the GMRES method.
  
  %%%*********************************************************
  %%%*********************************************************
@@ -816,13 +823,13 @@ than the one of the GMRES method.
  \label{sec:05}
  
  
-In order to see the influence of our algorithm with only one processor, we first
-show a comparison with GMRES or FGMRES and our algorithm. In Table~\ref{tab:01},
-we  show the  matrices we  have  used and  some of  them characteristics.  Those
-matrices  are   chosen  from   the  Davis  collection   of  the   University  of
-Florida~\cite{Dav97}. They are matrices arising in real-world applications.  For
-all the  matrices, the name,  the field,  the number of  rows and the  number of
-nonzero elements are given.
+In order to see the behavior of the proposal when considering only one processor, a first
+comparison with GMRES or FGMRES and the new algorithm detailed previously has been experimented. 
+Matrices that have been used with their characteristics (names, fields, rows, and nonzero coefficients) are detailed in 
+Table~\ref{tab:01}.  These latter, which are real-world applications matrices, 
+have been extracted 
+ from   the  Davis  collection,   University  of
+Florida~\cite{Dav97}.
  
  \begin{table}[htbp]
  \begin{center}
@@ -842,8 +849,9 @@ torso3             & 2D/3D problem & 259,156 & 4,429,042 \\
  \label{tab:01}
  \end{center}
  \end{table}
-
-The following  parameters have been chosen  for our experiments.   As by default
+Chosen parameters are detailed below.
+%The following  parameters have been chosen  for our experiments.   
+As by default
  the restart  of GMRES is performed every  30 iterations, we have  chosen to stop
  the GMRES every 30 iterations (\emph{i.e.} $max\_iter_{kryl}=30$).  $s$ is set to 8. CGLS is
  chosen  to minimize  the least-squares  problem with  the  following parameters:
@@ -923,8 +931,16 @@ by core. The Juqueen architecture is composed  of IBM PowerPC A2 at 1.6 GHz with
  speed network.
  
  
+In  many situations, using  preconditioners is  essential in  order to  find the
+solution of a linear system.  There are many preconditioners available in PETSc.
+For parallel applications all  the preconditioners based on matrix factorization
+are  not  available. In  our  experiments, we  have  tested  different kinds  of
+preconditioners, however  as it is  not the subject  of this paper, we  will not
+present results with many preconditioners. In  practise, we have chosen to use a
+multigrid (mg)  and successive  over-relaxation (sor). For  more details  on the
+preconditioner in PETSc please consult~\cite{petsc-web-page}.
+
  
-{\bf Description of preconditioners}\\
  
  \begin{table*}[htbp]
  \begin{center}
@@ -952,8 +968,7 @@ speed network.
  
  Table~\ref{tab:03} shows  the execution  times and the  number of  iterations of
  example ex15  of PETSc on the  Juqueen architecture. Different  numbers of cores
-are  studied ranging  from  2,048  up-to 16,383.   Two  preconditioners have  been
-tested: {\it mg} and {\it sor}.   For those experiments,  the number  of components  (or unknowns  of the
+are  studied ranging  from  2,048  up-to 16,383 with the two preconditioners {\it mg} and {\it sor}.   For those experiments,  the number  of components  (or unknowns  of the
  problems)  per core  is fixed  to 25,000,  also called  weak  scaling. This
  number can seem relatively small. In fact, for some applications that need a lot
  of  memory, the  number of  components per  processor requires  sometimes  to be
@@ -1020,7 +1035,21 @@ the number of iterations. So, the overall benefit of using TSIRM is interesting.
  \end{table*}
  
  
-In Table~\ref{tab:04}, some experiments with example ex54 on the Curie architecture are reported.
+In  Table~\ref{tab:04},  some  experiments   with  example  ex54  on  the  Curie
+architecture are reported.  For this  application, we fixed $\alpha=0.6$.  As it
+can be seen in that Table, the size of the problem has a strong influence on the
+number of iterations to reach the  convergence. That is why we have preferred to
+change the threshold.  If we set  it to $1e-3$ as with the previous application,
+only one iteration is necessray  to reach the convergence. So Table~\ref{tab:04}
+shows the results  of differents executions with differents  number of cores and
+differents thresholds. As  with the previous example, we  can observe that TSIRM
+is faster than FGMRES. The ratio greatly depends on the number of iterations for
+FMGRES to reach the threshold. The greater the number of iterations to reach the
+convergence is, the  better the ratio between our algorithm  and FMGRES is. This
+experiment is  also a  weak scaling with  approximately $25,000$  components per
+core. It can also  be observed that the difference between CGLS  and LSQR is not
+significant. Both can be good but it seems not possible to know in advance which
+one will be the best.
  
  
  \begin{table*}[htbp]