\end{abstract}
\begin{IEEEkeywords}
-Iterative Krylov methods; sparse linear systems; error minimization; PETSc; %à voir...
+Iterative Krylov methods; sparse linear systems; residual minimization; PETSc; %à voir...
\end{IEEEkeywords}
The present paper is organized as follows. In Section~\ref{sec:02} some related
works are presented. Section~\ref{sec:03} presents our two-stage algorithm using
a least-square residual minimization. Section~\ref{sec:04} describes some
-convergence results on this method. Section~\ref{sec:05} shows some
-experimental results obtained on large clusters of our algorithm using routines
-of PETSc toolkit. Finally Section~\ref{sec:06} concludes and gives some
-perspectives.
+convergence results on this method. In Section~\ref{sec:05}, parallization
+details of TSARM are given. Section~\ref{sec:06} shows some experimental
+results obtained on large clusters of our algorithm using routines of PETSc
+toolkit. Finally Section~\ref{sec:06} concludes and gives some perspectives.
%%%*********************************************************
%%%*********************************************************
%%%*********************************************************
%%%*********************************************************
-\section{A Krylov two-stage algorithm}
+\section{Two-stage algorithm with least-square residuals minimization}
\label{sec:03}
A two-stage algorithm is proposed to solve large sparse linear systems of the
form $Ax=b$, where $A\in\mathbb{R}^{n\times n}$ is a sparse and square
inner-outer iteration solver based on iterative Krylov methods. The main key
points of our solver are given in Algorithm~\ref{algo:01}.
-In order to accelerate the convergence, the outer iteration is implemented as an
-iterative Krylov method which minimizes some error functions over a Krylov
-subspace~\cite{saad96}. At each iteration, the sparse linear system $Ax=b$ is
-solved iteratively with an iterative method, for example GMRES
-method~\cite{saad86} or some of its variants, and the Krylov subspace that we
-used is spanned by a basis $S$ composed of successive solutions issued from the
-inner iteration
-\begin{equation}
- S = \{x^1, x^2, \ldots, x^s\} \text{,~} s\leq n.
-\end{equation}
-The advantage of such a Krylov subspace is that we neither need an orthogonal
-basis nor any synchronization between processors to generate this basis. The
-algorithm is periodically restarted every $s$ iterations with a new initial
-guess $x=S\alpha$ which minimizes the residual norm $\|b-Ax\|_2$ over the Krylov
-subspace spanned by vectors of $S$, where $\alpha$ is a solution of the normal
-equations
-\begin{equation}
- R^TR\alpha = R^Tb,
-\end{equation}
-which is associated with the least-squares problem
+In order to accelerate the convergence, the outer iteration periodically applies
+a least-square minimization on the residuals computed by the inner solver. The
+inner solver is a Krylov based solver which does not required to be changed.
+
+At each outer iteration, the sparse linear system $Ax=b$ is solved, only for $m$
+iterations, using an iterative method restarting with the previous solution. For
+example, the GMRES method~\cite{Saad86} or some of its variants can be used as a
+inner solver. The current solution of the Krylov method is saved inside a matrix
+$S$ composed of successive solutions computed by the inner iteration.
+
+Periodically, every $s$ iterations, the minimization step is applied in order to
+compute a new solution $x$. For that, the previous residuals are computed with
+$(b-AS)$. The minimization of the residuals is obtained by
\begin{equation}
\underset{\alpha\in\mathbb{R}^{s}}{min}\|b-R\alpha\|_2
\label{eq:01}
\end{equation}
-such that $R=AS$ is a dense rectangular matrix in $\mathbb{R}^{n\times s}$,
-$s\ll n$, and $R^T$ denotes the transpose of matrix $R$. We use an iterative
-method to solve the least-squares problem~(\ref{eq:01}) such as CGLS
-~\cite{hestenes52} or LSQR~\cite{paige82} which are more appropriate than a
-direct method in the parallel context.
+with $R=AS$. Then the new solution $x$ is computed with $x=S\alpha$.
+
+
+In practice, $R$ is a dense rectangular matrix in $\mathbb{R}^{n\times s}$,
+$s\ll n$. In order to minimize~(\ref{eq:01}), a least-square method such as
+CGLS ~\cite{Hestenes52} or LSQR~\cite{Paige82} is used. Those methods are more
+appropriate than a direct method in a parallel context.
\begin{algorithm}[t]
-\caption{A Krylov two-stage algorithm}
+\caption{TSARM}
\begin{algorithmic}[1]
\Input $A$ (sparse matrix), $b$ (right-hand side)
\Output $x$ (solution vector)\vspace{0.2cm}
\State Set the initial guess $x^0$
- \For {$k=1,2,3,\ldots$ until convergence} \label{algo:conv}
- \State Solve iteratively $Ax^k=b$ \label{algo:solve}
- \State $S_{k~mod~s}=x^k$
- \If {$k$ mod $s=0$ {\bf and} not convergence}
- \State Compute dense matrix $R=AS$
- \State Solve least-squares problem $\underset{\alpha\in\mathbb{R}^{s}}{min}\|b-R\alpha\|_2$
- \State Compute minimizer $x^k=S\alpha$
+ \For {$k=1,2,3,\ldots$ until convergence (error$<\epsilon_{kryl}$)} \label{algo:conv}
+ \State $x^k=Solve(A,b,x^{k-1},max\_iter_{kryl})$ \label{algo:solve}
+ \State retrieve error
+ \State $S_{k~mod~s}=x^k$ \label{algo:store}
+ \If {$k$ mod $s=0$ {\bf and} error$>\epsilon_{kryl}$}
+ \State $R=AS$ \Comment{compute dense matrix} \label{algo:matrix_mul}
+ \State Solve least-squares problem $\underset{\alpha\in\mathbb{R}^{s}}{min}\|b-R\alpha\|_2$ \label{algo:}
+ \State $x^k=S\alpha$ \Comment{compute new solution}
\EndIf
\EndFor
\end{algorithmic}
\label{algo:01}
\end{algorithm}
-Operation $S_{k~ mod~ s}=x^k$ consists in copying the residual $x_k$ into the
-column $k~ mod~ s$ of the matrix $S$. After the minimization, the matrix $S$ is
-reused with the new values of the residuals.
+Algorithm~\ref{algo:01} summarizes the principle of our method. The outer
+iteration is inside the for loop. Line~\ref{algo:solve}, the Krylov method is
+called for a maximum of $max\_iter_{kryl}$ iterations. In practice, we suggest to set this parameter
+equals to the restart number of the GMRES-like method. Moreover, a tolerance
+threshold must be specified for the solver. In practice, this threshold must be
+much smaller than the convergence threshold of the TSARM algorithm (i.e.
+$\epsilon$). Line~\ref{algo:store}, $S_{k~ mod~ s}=x^k$ consists in copying the
+solution $x_k$ into the column $k~ mod~ s$ of the matrix $S$. After the
+minimization, the matrix $S$ is reused with the new values of the residuals. To
+solve the minimization problem, an iterative method is used. Two parameters are
+required for that: the maximum number of iteration and the threshold to stop the
+method.
+
+To summarize, the important parameters of TSARM are:
+\begin{itemize}
+\item $\epsilon_{kryl}$ the threshold to stop the method of the krylov method
+\item $max\_iter_{kryl}$ the maximum number of iterations for the krylov method
+\item $s$ the number of outer iterations before applying the minimization step
+\item $max\_iter_{ls}$ the maximum number of iterations for the iterative least-square method
+\item $\epsilon_{ls}$ the threshold to stop the least-square method
+\end{itemize}
%%%*********************************************************
%%%*********************************************************
\section{Convergence results}
\label{sec:04}
+
+
%%%*********************************************************
%%%*********************************************************
-\section{Experiments using petsc}
+\section{Parallelization}
\label{sec:05}
+The parallelisation of TSARM relies on the parallelization of all its
+parts. More precisely, except the least-square step, all the other parts are
+obvious to achieve out in parallel. In order to develop a parallel version of
+our code, we have chosen to use PETSc~\cite{petsc-web-page}. For
+line~\ref{algo:matrix_mul} the matrix-matrix multiplication is implemented and
+efficient since the matrix $A$ is sparse and since the matrix $S$ contains few
+colums in practice. As explained previously, at least two methods seem to be
+interesting to solve the least-square minimization, CGLS and LSQR.
+
+In the following we remind the CGLS algorithm. The LSQR method follows more or
+less the same principle but it take more place, so we briefly explain the parallelization of CGLS which is similar to LSQR.
+
+\begin{algorithm}[t]
+\caption{CGLS}
+\begin{algorithmic}[1]
+ \Input $A$ (matrix), $b$ (right-hand side)
+ \Output $x$ (solution vector)\vspace{0.2cm}
+ \State $r=b-Ax$
+ \State $p=A'r$
+ \State $s=p$
+ \State $g=||s||^2_2$
+ \For {$k=1,2,3,\ldots$ until convergence (g$<\epsilon_{ls}$)} \label{algo2:conv}
+ \State $q=Ap$
+ \State $\alpha=g/||q||^2_2$
+ \State $x=x+alpha*p$
+ \State $r=r-alpha*q$
+ \State $s=A'*r$
+ \State $g_{old}=g$
+ \State $g=||s||^2_2$
+ \State $\beta=g/g_{old}$
+ \EndFor
+\end{algorithmic}
+\label{algo:02}
+\end{algorithm}
+
+
+In each iteration of CGLS, there is two matrix-vector multiplications and some
+classical operations: dots, norm, multiplication and addition on vectors. All
+these operations are easy to implement in PETSc or similar environment.
+
+%%%*********************************************************
+%%%*********************************************************
+\section{Experiments using petsc}
+\label{sec:06}
+
In order to see the influence of our algorithm with only one processor, we first
show a comparison with the standard version of GMRES and our algorithm. In
\begin{tabular}{|c|c|r|r|r|r|}
\hline
- \multirow{2}{*}{Matrix name} & Solver / & \multicolumn{2}{c|}{gmres variant} & \multicolumn{2}{c|}{2 stage CGLS} \\
+ \multirow{2}{*}{Matrix name} & Solver / & \multicolumn{2}{c|}{GMRES} & \multicolumn{2}{c|}{TSARM CGLS} \\
\cline{3-6}
& precond & Time & \# Iter. & Time & \# Iter. \\\hline \hline
-Larger experiments ....
+Larger experiments ....\\
+
+Describe the problems ex15 and ex54
\begin{table*}
\begin{center}
\begin{tabular}{|r|r|r|r|r|r|r|r|r|}
\hline
- nb. cores & precond & \multicolumn{2}{c|}{gmres variant} & \multicolumn{2}{c|}{2 stage CGLS} & \multicolumn{2}{c|}{2 stage LSQR} & best gain \\
+ nb. cores & precond & \multicolumn{2}{c|}{GMRES} & \multicolumn{2}{c|}{TSARM CGLS} & \multicolumn{2}{c|}{TSARM LSQR} & best gain \\
\cline{3-8}
& & Time & \# Iter. & Time & \# Iter. & Time & \# Iter. & \\\hline \hline
2,048 & mg & 403.49 & 18,210 & 73.89 & 3,060 & 77.84 & 3,270 & 5.46 \\
\begin{tabular}{|r|r|r|r|r|r|r|r|r|}
\hline
- nb. cores & threshold & \multicolumn{2}{c|}{gmres variant} & \multicolumn{2}{c|}{2 stage CGLS} & \multicolumn{2}{c|}{2 stage LSQR} & best gain \\
+ nb. cores & threshold & \multicolumn{2}{c|}{GMRES} & \multicolumn{2}{c|}{TSARM CGLS} & \multicolumn{2}{c|}{TSARM LSQR} & best gain \\
\cline{3-8}
& & Time & \# Iter. & Time & \# Iter. & Time & \# Iter. & \\\hline \hline
2,048 & 8e-5 & 108.88 & 16,560 & 23.06 & 3,630 & 22.79 & 3,630 & 4.77 \\
\label{tab:04}
\end{center}
\end{table*}
+
+
+
+
+
+\begin{table*}
+\begin{center}
+\begin{tabular}{|r|r|r|r|r|r|r|r|r|r|}
+\hline
+
+ nb. cores & \multicolumn{2}{c|}{GMRES} & \multicolumn{2}{c|}{TSARM CGLS} & \multicolumn{2}{c|}{TSARM LSQR} & \multicolumn{3}{c|}{efficiency} \\
+\cline{2-10}
+ & Time & \# Iter. & Time & \# Iter. & Time & \# Iter. & GMRES & TS CGLS & TS LSQR\\\hline \hline
+ 512 & 3,969.69 & 33,120 & 709.57 & 5,790 & 622.76 & 5,070 & 1 & 1 & 1 \\
+ 1024 & 1,530.06 & 25,860 & 290.95 & 4,830 & 307.71 & 5,070 & 1.30 & 1.21 & 1.01 \\
+ 2048 & 919.62 & 31,470 & 237.52 & 8,040 & 194.22 & 6,510 & 1.08 & .75 & .80\\
+ 4096 & 405.60 & 28,380 & 111.67 & 7,590 & 91.72 & 6,510 & 1.22 & .79 & .84 \\
+ 8192 & 785.04 & 109,590 & 76.07 & 10,470 & 69.42 & 9,030 & .32 & .58 & .56 \\
+
+\hline
+
+\end{tabular}
+\caption{Comparison of FGMRES and 2 stage FGMRES algorithms for ex54 of Petsc (both with the MG preconditioner) with 204,919,225 components on Curie with different number of cores (restart=30, s=12, threshol 5e-5), time is expressed in seconds.}
+\label{tab:05}
+\end{center}
+\end{table*}
+
%%%*********************************************************
%%%*********************************************************
%%%*********************************************************
%%%*********************************************************
\section{Conclusion}
-\label{sec:06}
+\label{sec:07}
%The conclusion goes here. this is more of the conclusion
%%%*********************************************************
%%%*********************************************************
future plan : \\
- study other kinds of matrices, problems, inner solvers\\
+- test the influence of all the parameters\\
- adaptative number of outer iterations to minimize\\
- other methods to minimize the residuals?\\
- implement our solver inside PETSc
% http://www.ctan.org/tex-archive/biblio/bibtex/contrib/doc/
% The IEEEtran BibTeX style support page is at:
% http://www.michaelshell.org/tex/ieeetran/bibtex/
-%\bibliographystyle{IEEEtran}
+\bibliographystyle{IEEEtran}
% argument is your BibTeX string definitions and bibliography database(s)
-%\bibliography{IEEEabrv,../bib/paper}
+\bibliography{biblio}
%
% <OR> manually copy in the resultant .bbl file
% set second argument of \begin to the number of references
% (used to reserve space for the reference number labels box)
-\begin{thebibliography}{1}
+%% \begin{thebibliography}{1}
-\bibitem{saad86} Y.~Saad and M.~H.~Schultz, \emph{GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems}, SIAM Journal on Scientific and Statistical Computing, 7(3):856--869, 1986.
+%% \bibitem{saad86} Y.~Saad and M.~H.~Schultz, \emph{GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems}, SIAM Journal on Scientific and Statistical Computing, 7(3):856--869, 1986.
-\bibitem{saad96} Y.~Saad, \emph{Iterative Methods for Sparse Linear Systems}, PWS Publishing, New York, 1996.
+%% \bibitem{saad96} Y.~Saad, \emph{Iterative Methods for Sparse Linear Systems}, PWS Publishing, New York, 1996.
-\bibitem{hestenes52} M.~R.~Hestenes and E.~Stiefel, \emph{Methods of conjugate gradients for solving linear system}, Journal of Research of National Bureau of Standards, B49:409--436, 1952.
+%% \bibitem{hestenes52} M.~R.~Hestenes and E.~Stiefel, \emph{Methods of conjugate gradients for solving linear system}, Journal of Research of National Bureau of Standards, B49:409--436, 1952.
-\bibitem{paige82} C.~C.~Paige and A.~M.~Saunders, \emph{LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares}, ACM Trans. Math. Softw. 8(1):43--71, 1982.
-\end{thebibliography}
+%% \bibitem{paige82} C.~C.~Paige and A.~M.~Saunders, \emph{LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares}, ACM Trans. Math. Softw. 8(1):43--71, 1982.
+%% \end{thebibliography}