X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/GMRES2stage.git/blobdiff_plain/317020501fc8da3f9e443457700821b26f66da55..d40f73fb29cd99acadda6620fc198fc09066584e:/paper.tex diff --git a/paper.tex b/paper.tex index 436909a..16beac7 100644 --- a/paper.tex +++ b/paper.tex @@ -381,7 +381,7 @@ % affiliations \author{\IEEEauthorblockN{Rapha\"el Couturier\IEEEauthorrefmark{1}, Lilia Ziane Khodja\IEEEauthorrefmark{2}, and Christophe Guyeux\IEEEauthorrefmark{1}} -\IEEEauthorblockA{\IEEEauthorrefmark{1} Femto-ST Institute, University of Franche Comte, France\\ +\IEEEauthorblockA{\IEEEauthorrefmark{1} Femto-ST Institute, University of Franche-Comt\'e, France\\ Email: \{raphael.couturier,christophe.guyeux\}@univ-fcomte.fr} \IEEEauthorblockA{\IEEEauthorrefmark{2} INRIA Bordeaux Sud-Ouest, France\\ Email: lilia.ziane@inria.fr} @@ -564,7 +564,7 @@ gradient and GMRES ones (Generalized Minimal RESidual). However, iterative methods suffer from scalability problems on parallel computing platforms with many processors, due to their need of reduction -operations, and to collective communications to achive matrix-vector +operations, and to collective communications to achieve matrix-vector multiplications. The communications on large clusters with thousands of cores and large sizes of messages can significantly affect the performances of these iterative methods. As a consequence, Krylov subspace iteration methods are often used @@ -618,22 +618,23 @@ It can be summarized as follows: the inner solver is a Krylov based one. In order to accelerate its convergence, the outer solver periodically applies a least-squares minimization on the residuals computed by the inner one. %Tsolver which does not required to be changed. -At each outer iteration, the sparse linear system $Ax=b$ is partially -solved using only $m$ -iterations of an iterative method, this latter being initialized with the -best known approximation previously obtained. -GMRES method~\cite{Saad86}, or any of its variants, can be used for instance as an -inner solver. The current approximation of the Krylov method is then stored inside a matrix -$S$ composed by the successive solutions that are computed during inner iterations. +At each outer iteration, the sparse linear system $Ax=b$ is partially solved +using only $m$ iterations of an iterative method, this latter being initialized +with the last obtained approximation. GMRES method~\cite{Saad86}, or any of its +variants, can potentially be used as inner solver. The current approximation of +the Krylov method is then stored inside a $n \times s$ matrix $S$, which is +composed by the $s$ last solutions that have been computed during the inner +iterations phase. In the remainder, the $i$-th column vector of $S$ will be +denoted by $S_i$. -At each $s$ iterations, the minimization step is applied in order to +At each $s$ iterations, another kind of minimization step is applied in order to compute a new solution $x$. For that, the previous residuals of $Ax=b$ are computed by the inner iterations with $(b-AS)$. The minimization of the residuals is obtained by \begin{equation} \underset{\alpha\in\mathbb{R}^{s}}{min}\|b-R\alpha\|_2 \label{eq:01} \end{equation} -with $R=AS$. Then the new solution $x$ is computed with $x=S\alpha$. +with $R=AS$. The new solution $x$ is then computed with $x=S\alpha$. In practice, $R$ is a dense rectangular matrix belonging in $\mathbb{R}^{n\times s}$, @@ -650,8 +651,7 @@ appropriate than a single direct method in a parallel context. \Output $x$ (solution vector)\vspace{0.2cm} \State Set the initial guess $x_0$ \For {$k=1,2,3,\ldots$ until convergence (error$<\epsilon_{tsirm}$)} \label{algo:conv} - \State $x_k=Solve(A,b,x_{k-1},max\_iter_{kryl})$ \label{algo:solve} - \State retrieve error + \State $[x_k,error]=Solve(A,b,x_{k-1},max\_iter_{kryl})$ \label{algo:solve} \State $S_{k \mod s}=x_k$ \label{algo:store} \If {$k \mod s=0$ {\bf and} error$>\epsilon_{kryl}$} \State $R=AS$ \Comment{compute dense matrix} \label{algo:matrix_mul} @@ -663,18 +663,22 @@ appropriate than a single direct method in a parallel context. \label{algo:01} \end{algorithm} -Algorithm~\ref{algo:01} summarizes the principle of our method. The outer -iteration is inside the for loop. Line~\ref{algo:solve}, the Krylov method is -called for a maximum of $max\_iter_{kryl}$ iterations. In practice, we suggest to set this parameter -equals to the restart number of the GMRES-like method. Moreover, a tolerance -threshold must be specified for the solver. In practice, this threshold must be -much smaller than the convergence threshold of the TSIRM algorithm (\emph{i.e.} -$\epsilon_{tsirm}$). Line~\ref{algo:store}, $S_{k \mod s}=x^k$ consists in copying the -solution $x_k$ into the column $k \mod s$ of the matrix $S$, where $S$ is a matrix of size $n\times s$ whose column vector $i$ is denoted by $S_i$. After the -minimization, the matrix $S$ is reused with the new values of the residuals. To -solve the minimization problem, an iterative method is used. Two parameters are -required for that: the maximum number of iterations and the threshold to stop the -method. +Algorithm~\ref{algo:01} summarizes the principle of the proposed method. The +outer iteration is inside the \emph{for} loop. Line~\ref{algo:solve}, the Krylov +method is called for a maximum of $max\_iter_{kryl}$ iterations. In practice, +we suggest to set this parameter equal to the restart number in the GMRES-like +method. Moreover, a tolerance threshold must be specified for the solver. In +practice, this threshold must be much smaller than the convergence threshold of +the TSIRM algorithm (\emph{i.e.}, $\epsilon_{tsirm}$). We also consider that +after the call of the $Solve$ function, we obtain the vector $x_k$ and the error +which is defined by $||Ax^k-b||_2$. + + Line~\ref{algo:store}, +$S_{k \mod s}=x^k$ consists in copying the solution $x_k$ into the column $k +\mod s$ of $S$. After the minimization, the matrix $S$ is reused with the new +values of the residuals. To solve the minimization problem, an iterative method +is used. Two parameters are required for that: the maximum number of iterations +and the threshold to stop the method. Let us summarize the most important parameters of TSIRM: \begin{itemize} @@ -735,44 +739,55 @@ these operations are easy to implement in PETSc or similar environment. \section{Convergence results} \label{sec:04} -Let us recall the following result, see~\cite{Saad86}. +Let us recall the following result, see~\cite{Saad86} for further readings. \begin{proposition} \label{prop:saad} Suppose that $A$ is a positive real matrix with symmetric part $M$. Then the residual norm provided at the $m$-th step of GMRES satisfies: \begin{equation} ||r_m|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_0|| , \end{equation} -where $\alpha = \lambda_min(M)^2$ and $\beta = \lambda_max(A^T A)$, which proves +where $\alpha = \lambda_{min}(M)^2$ and $\beta = \lambda_{max}(A^T A)$, which proves the convergence of GMRES($m$) for all $m$ under that assumption regarding $A$. \end{proposition} We can now claim that, \begin{proposition} -If $A$ is a positive real matrix and GMRES($m$) is used as solver, then the TSIRM algorithm is convergent. Furthermore, we still have +If $A$ is a positive real matrix and GMRES($m$) is used as solver, then the TSIRM algorithm is convergent. Furthermore, +let $r_k$ be the +$k$-th residue of TSIRM, then +we still have: \begin{equation} -||r_m|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_0|| , +||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0|| , \end{equation} where $\alpha$ and $\beta$ are defined as in Proposition~\ref{prop:saad}. \end{proposition} \begin{proof} -Let $r_k = b-Ax_k$, where $x_k$ is the approximation of the solution after the -$k$-th iterate of TSIRM. -We will prove that $r_k \rightarrow 0$ when $k \rightarrow +\infty$. +We will prove by a mathematical induction that, for each $k \in \mathbb{N}^\ast$, +$||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{mk}{2}} ||r_0||.$ -Each step of the TSIRM algorithm \\ +The base case is obvious, as for $k=1$, the TSIRM algorithm simply consists in applying GMRES($m$) once, leading to a new residual $r_1$ which follows the inductive hypothesis due to Proposition~\ref{prop:saad}. -Let $\operatorname{span}(S) = \left \{ {\sum_{i=1}^k \lambda_i v_i \Big| k \in \mathbb{N}, v_i \in S, \lambda _i \in \mathbb{R}} \right \}$ be the linear span of a set of vectors $S$. So,\\ +Suppose now that the claim holds for all $m=1, 2, \hdots, k-1$, that is, $\forall m \in \{1,2,\hdots, k-1\}$, $||r_m|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||$. +We will show that the statement holds too for $r_k$. Two situations can occur: +\begin{itemize} +\item If $k \mod m \neq 0$, then the TSIRM algorithm consists in executing GMRES once. In that case, we obtain $||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_{k-1}||\leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||$ by the inductive hypothesis. +\item Else, the TSIRM algorithm consists in two stages: a first GMRES($m$) execution leads to a temporary $x_k$ whose residue satisfies $||r_k|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_{k-1}||\leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||$, and a least squares resolution. +Let $\operatorname{span}(S) = \left \{ {\sum_{i=1}^k \lambda_i v_i \Big| k \in \mathbb{N}, v_i \in S, \lambda _i \in \mathbb{R}} \right \}$ be the linear span of a set of real vectors $S$. So,\\ $\min_{\alpha \in \mathbb{R}^s} ||b-R\alpha ||_2 = \min_{\alpha \in \mathbb{R}^s} ||b-AS\alpha ||_2$ $\begin{array}{ll} -& = \min_{x \in span\left(S_{k-s}, S_{k-s+1}, \hdots, S_{k-1} \right)} ||b-AS\alpha ||_2\\ -& = \min_{x \in span\left(x_{k-s}, x_{k-s}+1, \hdots, x_{k-1} \right)} ||b-AS\alpha ||_2\\ -& \leqslant \min_{x \in span\left( x_{k-1} \right)} ||b-Ax ||_2\\ -& \leqslant \min_{\lambda \in \mathbb{R}} ||b-\lambda Ax_{k-1} ||_2\\ -& \leqslant ||b-Ax_{k-1}||_2 . +& = \min_{x \in span\left(S_{k-s+1}, S_{k-s+2}, \hdots, S_{k} \right)} ||b-AS\alpha ||_2\\ +& = \min_{x \in span\left(x_{k-s+1}, x_{k-s}+2, \hdots, x_{k} \right)} ||b-AS\alpha ||_2\\ +& \leqslant \min_{x \in span\left( x_{k} \right)} ||b-Ax ||_2\\ +& \leqslant \min_{\lambda \in \mathbb{R}} ||b-\lambda Ax_{k} ||_2\\ +& \leqslant ||b-Ax_{k}||_2\\ +& = ||r_k||_2\\ +& \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{km}{2}} ||r_0||, \end{array}$ +\end{itemize} +which concludes the induction and the proof. \end{proof} We can remark that, at each iterate, the residue of the TSIRM algorithm is lower @@ -857,7 +872,7 @@ torso3 & fgmres / sor & 37.70 & 565 & 34.97 & 510 \\ -In order to perform larger experiments, we have tested some example applications +In order to perform larger experiments, we have tested some example applications of PETSc. Those applications are available in the ksp part which is suited for scalable linear equations solvers: \begin{itemize} @@ -870,11 +885,23 @@ scalable linear equations solvers: finite elements. For this example, the user can define the scaling of material coefficient in embedded circle called $\alpha$. \end{itemize} -For more technical details on these applications, interested readers are invited -to read the codes available in the PETSc sources. Those problems have been -chosen because they are scalable with many cores which is not the case of other problems that we have tested. +For more technical details on these applications, interested readers are invited +to read the codes available in the PETSc sources. Those problems have been +chosen because they are scalable with many cores which is not the case of other +problems that we have tested. + +In the following larger experiments are described on two large scale +architectures: Curie and Juqeen. Both these architectures are supercomputer +composed of 80,640 cores for Curie and 458,752 cores for Juqueen. Those machines +are respectively hosted by GENCI in France and Jülich Supercomputing Centre in +Germany. They belongs with other similar architectures of the PRACE initiative ( +Partnership for Advanced Computing in Europe) which aims at proposing high +performance supercomputing architecture to enhance research in Europe. The Curie +architecture is composed of Intel E5-2680 processors at 2.7 GHz with 2Gb memory +by core. The Juqueen architecture is composed of IBM PowerPC A2 at 1.6 GHz with +1Gb memory per core. Both those architecture are equiped with a dedicated high +speed network. -In the following larger experiments are described on two large scale architectures: Curie and Juqeen... {\bf description...}\\ {\bf Description of preconditioners}\\ @@ -1018,13 +1045,22 @@ In Table~\ref{tab:04}, some experiments with example ex54 on the Curie architect %%%********************************************************* %%%********************************************************* - -future plan : \\ -- study other kinds of matrices, problems, inner solvers\\ -- test the influence of all parameters\\ -- adaptative number of outer iterations to minimize\\ -- other methods to minimize the residuals?\\ -- implement our solver inside PETSc +A novel two-stage iterative algorithm has been proposed in this article, +in order to accelerate the convergence Krylov iterative methods. +Our TSIRM proposal acts as a merger between Krylov based solvers and +a least-squares minimization step. +The convergence of the method has been proven in some situations, while +experiments up to 16,394 cores have been led to verify that TSIRM runs +5 or 7 times faster than GMRES. + + +For future work, the authors' intention is to investigate +other kinds of matrices, problems, and inner solvers. The +influence of all parameters must be tested too, while +other methods to minimize the residuals must be regarded. +The number of outer iterations to minimize should become +adaptative to improve the overall performances of the proposal. +Finally, this solver will be implemented inside PETSc. % conference papers do not normally have an appendix