\end{abstract}
\begin{IEEEkeywords}
-Iterative Krylov methods; sparse linear systems; residual minimization; PETSc; %à voir...
+Iterative Krylov methods; sparse linear systems; two stage iteration; least-squares residual minimization; PETSc
\end{IEEEkeywords}
% You must have at least 2 lines in the paragraph with the drop letter
% (should never be an issue)
-Iterative methods have recently become more attractive than direct ones to solve very large
-sparse linear systems. They are more efficient in a parallel
-context, supporting thousands of cores, and they require less memory and arithmetic
-operations than direct methods. This is why new iterative methods are frequently
-proposed or adapted by researchers, and the increasing need to solve very large sparse
-linear systems has triggered the development of such efficient iterative techniques
-suitable for parallel processing.
-
-Most of the successful iterative methods currently available are based on so-called ``Krylov
-subspaces''. They consist in forming a basis of successive matrix
-powers multiplied by an initial vector, which can be for instance the residual. These methods use vectors orthogonality of the Krylov subspace basis in order to solve linear
-systems. The most known iterative Krylov subspace methods are conjugate
-gradient and GMRES ones (Generalized Minimal RESidual).
-
-
-However, iterative methods suffer from scalability problems on parallel
-computing platforms with many processors, due to their need of reduction
-operations, and to collective communications to achieve matrix-vector
+Iterative methods have recently become more attractive than direct ones to solve
+very large sparse linear systems\cite{Saad2003}. They are more efficient in a
+parallel context, supporting thousands of cores, and they require less memory
+and arithmetic operations than direct methods~\cite{bahicontascoutu}. This is
+why new iterative methods are frequently proposed or adapted by researchers, and
+the increasing need to solve very large sparse linear systems has triggered the
+development of such efficient iterative techniques suitable for parallel
+processing.
+
+Most of the successful iterative methods currently available are based on
+so-called ``Krylov subspaces''. They consist in forming a basis of successive
+matrix powers multiplied by an initial vector, which can be for instance the
+residual. These methods use vectors orthogonality of the Krylov subspace basis
+in order to solve linear systems. The most known iterative Krylov subspace
+methods are conjugate gradient and GMRES ones (Generalized Minimal RESidual).
+
+
+However, iterative methods suffer from scalability problems on parallel
+computing platforms with many processors, due to their need of reduction
+operations, and to collective communications to achieve matrix-vector
multiplications. The communications on large clusters with thousands of cores
-and large sizes of messages can significantly affect the performances of these
-iterative methods. As a consequence, Krylov subspace iteration methods are often used
-with preconditioners in practice, to increase their convergence and accelerate their
-performances. However, most of the good preconditioners are not scalable on
-large clusters.
-
-In this research work, a two-stage algorithm based on two nested iterations
-called inner-outer iterations is proposed. This algorithm consists in solving the sparse
-linear system iteratively with a small number of inner iterations, and restarting
-the outer step with a new solution minimizing some error functions over some
-previous residuals. This algorithm is iterative and easy to parallelize on large
-clusters. Furthermore, the minimization technique improves its convergence and
-performances.
+and large sizes of messages can significantly affect the performances of these
+iterative methods. As a consequence, Krylov subspace iteration methods are often
+used with preconditioners in practice, to increase their convergence and
+accelerate their performances. However, most of the good preconditioners are
+not scalable on large clusters.
+
+In this research work, a two-stage algorithm based on two nested iterations
+called inner-outer iterations is proposed. This algorithm consists in solving
+the sparse linear system iteratively with a small number of inner iterations,
+and restarting the outer step with a new solution minimizing some error
+functions over some previous residuals. For further information on two-stage
+iteration methods, interested readers are invited to
+consult~\cite{Nichols:1973:CTS}. Two-stage algorithms are easy to parallelize on
+large clusters. Furthermore, the least-squares minimization technique improves
+its convergence and performances.
The present article is organized as follows. Related works are presented in
Section~\ref{sec:02}. Section~\ref{sec:03} details the two-stage algorithm using
\State Set the initial guess $x_0$
\For {$k=1,2,3,\ldots$ until convergence (error$<\epsilon_{tsirm}$)} \label{algo:conv}
\State $[x_k,error]=Solve(A,b,x_{k-1},max\_iter_{kryl})$ \label{algo:solve}
- \State $S_{k \mod s}=x_k$ \label{algo:store}
+ \State $S_{k \mod s}=x_k$ \label{algo:store} \Comment{update column (k mod s) of S}
\If {$k \mod s=0$ {\bf and} error$>\epsilon_{kryl}$}
\State $R=AS$ \Comment{compute dense matrix} \label{algo:matrix_mul}
\State $\alpha=Least\_Squares(R,b,max\_iter_{ls})$ \label{algo:}
\section{Convergence results}
\label{sec:04}
-Let us recall the following result, see~\cite{Saad86} for further readings.
-\begin{proposition}
-\label{prop:saad}
-Suppose that $A$ is a positive real matrix with symmetric part $M$. Then the residual norm provided at the $m$-th step of GMRES satisfies:
-\begin{equation}
-||r_m|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_0|| ,
-\end{equation}
-where $\alpha = \lambda_{min}(M)^2$ and $\beta = \lambda_{max}(A^T A)$, which proves
-the convergence of GMRES($m$) for all $m$ under that assumption regarding $A$.
-\end{proposition}
We can now claim that,
\begin{proposition}
-If $A$ is a positive real matrix and GMRES($m$) is used as solver, then the TSIRM algorithm is convergent. Furthermore,
-let $r_k$ be the
+\label{prop:saad}
+If $A$ is either a definite positive or a positive matrix and GMRES($m$) is used as solver, then the TSIRM algorithm is convergent.
+
+Furthermore, let $r_k$ be the
$k$-th residue of TSIRM, then
-we still have:
+we have the following boundaries:
+\begin{itemize}
+\item when $A$ is positive:
\begin{equation}
||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0|| ,
\end{equation}
-where $\alpha$ and $\beta$ are defined as in Proposition~\ref{prop:saad}.
+where $M$ is the symmetric part of $A$, $\alpha = \lambda_{min}(M)^2$ and $\beta = \lambda_{max}(A^T A)$;
+\item when $A$ is positive definite:
+\begin{equation}
+\|r_k\| \leq \left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{km/2} \|r_0\|.
+\end{equation}
+\end{itemize}
+%In the general case, where A is not positive definite, we have
+%$\|r_n\| \le \inf_{p \in P_n} \|p(A)\| \le \kappa_2(V) \inf_{p \in P_n} \max_{\lambda \in \sigma(A)} |p(\lambda)| \|r_0\|, .$
\end{proposition}
\begin{proof}
-We will prove by a mathematical induction that, for each $k \in \mathbb{N}^\ast$,
-$||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{mk}{2}} ||r_0||.$
+Let us first recall that the residue is under control when considering the GMRES algorithm on a positive definite matrix, and it is bounded as follows:
+\begin{equation*}
+\|r_k\| \leq \left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{k/2} \|r_0\| .
+\end{equation*}
+Additionally, when $A$ is a positive real matrix with symmetric part $M$, then the residual norm provided at the $m$-th step of GMRES satisfies:
+\begin{equation*}
+||r_m|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_0|| ,
+\end{equation*}
+where $\alpha$ and $\beta$ are defined as in Proposition~\ref{prop:saad}, which proves
+the convergence of GMRES($m$) for all $m$ under such assumptions regarding $A$.
+These well-known results can be found, \emph{e.g.}, in~\cite{Saad86}.
+
+We will now prove by a mathematical induction that, for each $k \in \mathbb{N}^\ast$,
+$||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{mk}{2}} ||r_0||$ when $A$ is positive, and $\|r_k\| \leq \left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{km/2} \|r_0\|$ when $A$ is positive definite.
-The base case is obvious, as for $k=1$, the TSIRM algorithm simply consists in applying GMRES($m$) once, leading to a new residual $r_1$ which follows the inductive hypothesis due to Proposition~\ref{prop:saad}.
+The base case is obvious, as for $k=1$, the TSIRM algorithm simply consists in applying GMRES($m$) once, leading to a new residual $r_1$ that follows the inductive hypothesis due, to the results recalled above.
-Suppose now that the claim holds for all $m=1, 2, \hdots, k-1$, that is, $\forall m \in \{1,2,\hdots, k-1\}$, $||r_m|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||$.
+Suppose now that the claim holds for all $m=1, 2, \hdots, k-1$, that is, $\forall m \in \{1,2,\hdots, k-1\}$, $||r_m|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||$ in the positive case, and $\|r_k\| \leq \left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{km/2} \|r_0\|$ in the definite positive one.
We will show that the statement holds too for $r_k$. Two situations can occur:
\begin{itemize}
-\item If $k \mod m \neq 0$, then the TSIRM algorithm consists in executing GMRES once. In that case, we obtain $||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_{k-1}||\leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||$ by the inductive hypothesis.
-\item Else, the TSIRM algorithm consists in two stages: a first GMRES($m$) execution leads to a temporary $x_k$ whose residue satisfies $||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_{k-1}||\leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||$, and a least squares resolution.
+\item If $k \not\equiv 0 ~(\textrm{mod}\ m)$, then the TSIRM algorithm consists in executing GMRES once. In that case and by using the inductive hypothesis, we obtain either $||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_{k-1}||\leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||$ if $A$ is positive, or $\|r_k\| \leqslant \left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{m/2} \|r_{k-1}\|$ $\leqslant$ $\left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{km/2} \|r_{0}\|$ in the positive definite case.
+\item Else, the TSIRM algorithm consists in two stages: a first GMRES($m$) execution leads to a temporary $x_k$ whose residue satisfies:
+\begin{itemize}
+\item $||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_{k-1}||\leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||$ in the positive case,
+\item $\|r_k\| \leqslant \left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{m/2} \|r_{k-1}\|$ $\leqslant$ $\left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{km/2} \|r_{0}\|$ in the positive definite one,
+\end{itemize}
+and a least squares resolution.
Let $\operatorname{span}(S) = \left \{ {\sum_{i=1}^k \lambda_i v_i \Big| k \in \mathbb{N}, v_i \in S, \lambda _i \in \mathbb{R}} \right \}$ be the linear span of a set of real vectors $S$. So,\\
$\min_{\alpha \in \mathbb{R}^s} ||b-R\alpha ||_2 = \min_{\alpha \in \mathbb{R}^s} ||b-AS\alpha ||_2$
& \leqslant \min_{\lambda \in \mathbb{R}} ||b-\lambda Ax_{k} ||_2\\
& \leqslant ||b-Ax_{k}||_2\\
& = ||r_k||_2\\
-& \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||,
+& \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||, \textrm{ if $A$ is positive,}\\
+& \leqslant \left( 1-\frac{\lambda_{\mathrm{min}}^2(1/2(A^T + A))}{ \lambda_{\mathrm{max}}(A^T A)} \right)^{km/2} \|r_{0}\|, \textrm{ if $A$ is}\\
+& \textrm{positive definite,}
\end{array}$
\end{itemize}
which concludes the induction and the proof.
\end{proof}
-We can remark that, at each iterate, the residue of the TSIRM algorithm is lower
-than the one of the GMRES method.
+%We can remark that, at each iterate, the residue of the TSIRM algorithm is lower
+%than the one of the GMRES method.
%%%*********************************************************
%%%*********************************************************
In order to see the influence of our algorithm with only one processor, we first
-show a comparison with the standard version of GMRES and our algorithm. In
-Table~\ref{tab:01}, we show the matrices we have used and some of them
-characteristics. For all the matrices, the name, the field, the number of rows
-and the number of nonzero elements are given.
+show a comparison with GMRES or FGMRES and our algorithm. In Table~\ref{tab:01},
+we show the matrices we have used and some of them characteristics. Those
+matrices are chosen from the Davis collection of the University of
+Florida~\cite{Dav97}. They are matrices arising in real-world applications. For
+all the matrices, the name, the field, the number of rows and the number of
+nonzero elements are given.
\begin{table}[htbp]
\begin{center}
In Table~\ref{tab:02}, some experiments comparing the solving of the linear
systems obtained with the previous matrices with a GMRES variant and with out 2
stage algorithm are given. In the second column, it can be noticed that either
-gmres or fgmres is used to solve the linear system. According to the matrices,
-different preconditioner is used. With TSIRM, the same solver and the same
-preconditionner are used. This Table shows that TSIRM can drastically reduce the
-number of iterations to reach the convergence when the number of iterations for
-the normal GMRES is more or less greater than 500. In fact this also depends on
-tow parameters: the number of iterations to stop GMRES and the number of
-iterations to perform the minimization.
+GRMES or FGMRES (Flexible GMRES)~\cite{Saad:1993} is used to solve the linear
+system. According to the matrices, different preconditioner is used. With
+TSIRM, the same solver and the same preconditionner are used. This Table shows
+that TSIRM can drastically reduce the number of iterations to reach the
+convergence when the number of iterations for the normal GMRES is more or less
+greater than 500. In fact this also depends on tow parameters: the number of
+iterations to stop GMRES and the number of iterations to perform the
+minimization.
\begin{table}[htbp]