X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/GMRES2stage.git/blobdiff_plain/6f7da323faf523827b19731b7fcedde8f8e8fbea..860c8942588be4c114424b511f6a525063e42087:/paper.tex diff --git a/paper.tex b/paper.tex index 8085604..b0ec851 100644 --- a/paper.tex +++ b/paper.tex @@ -425,16 +425,17 @@ Email: lilia.ziane@inria.fr} \begin{abstract} -In this article, a two-stage iterative algorithm is proposed to improve the -convergence of Krylov based iterative methods, typically those of GMRES variants. The -principle of the proposed approach is to build an external iteration over the Krylov -method, and to frequently store its current residual (at each -GMRES restart for instance). After a given number of outer iterations, a minimization -step is applied on the matrix composed by the saved residuals, in order to -compute a better solution and to make new iterations if required. It is proven that -the proposal has the same convergence properties than the inner embedded method itself. -Experiments using up to 16,394 cores also show that the proposed algorithm -runs around 5 or 7 times faster than GMRES. +In this article, a two-stage iterative algorithm is proposed to improve the +convergence of Krylov based iterative methods, typically those of GMRES +variants. The principle of the proposed approach is to build an external +iteration over the Krylov method, and to frequently store its current residual +(at each GMRES restart for instance). After a given number of outer iterations, +a least-squares minimization step is applied on the matrix composed by the saved +residuals, in order to compute a better solution and to make new iterations if +required. It is proven that the proposal has the same convergence properties +than the inner embedded method itself. Experiments using up to 16,394 cores +also show that the proposed algorithm runs around 5 or 7 times faster than +GMRES. \end{abstract} \begin{IEEEkeywords} @@ -702,19 +703,21 @@ less the same principle but it takes more place, so we briefly explain the paral \begin{algorithmic}[1] \Input $A$ (matrix), $b$ (right-hand side) \Output $x$ (solution vector)\vspace{0.2cm} - \State $r=b-Ax$ - \State $p=A'r$ - \State $s=p$ - \State $g=||s||^2_2$ - \For {$k=1,2,3,\ldots$ until convergence (g$<\epsilon_{ls}$)} \label{algo2:conv} - \State $q=Ap$ - \State $\alpha=g/||q||^2_2$ - \State $x=x+alpha*p$ - \State $r=r-alpha*q$ - \State $s=A'*r$ - \State $g_{old}=g$ - \State $g=||s||^2_2$ - \State $\beta=g/g_{old}$ + \State Let $x_0$ be an initial approximation + \State $r_0=b-Ax_0$ + \State $p_1=A^Tr_0$ + \State $s_0=p_1$ + \State $\gamma=||s_0||^2_2$ + \For {$k=1,2,3,\ldots$ until convergence ($\gamma<\epsilon_{ls}$)} \label{algo2:conv} + \State $q_k=Ap_k$ + \State $\alpha_k=\gamma/||q_k||^2_2$ + \State $x_k=x_{k-1}+\alpha_kp_k$ + \State $r_k=r_{k-1}-\alpha_kq_k$ + \State $s_k=A^Tr_k$ + \State $\gamma_{old}=\gamma$ + \State $\gamma=||s_k||^2_2$ + \State $\beta_k=\gamma/\gamma_{old}$ + \State $p_{k+1}=s_k+\beta_kp_k$ \EndFor \end{algorithmic} \label{algo:02} @@ -897,7 +900,7 @@ corresponds to 30*12, there are $max\_iter_{ls}$ which corresponds to 15. \begin{figure}[htbp] \centering \includegraphics[width=0.45\textwidth]{nb_iter_sec_ex15_juqueen} -\caption{Number of iterations per second with ex15 and the same parameters than in Table~\ref{tab:03}} +\caption{Number of iterations per second with ex15 and the same parameters than in Table~\ref{tab:03} (weak scaling)} \label{fig:01} \end{figure} @@ -965,6 +968,13 @@ In Table~\ref{tab:04}, some experiments with example ex54 on the Curie architect \end{center} \end{table*} +\begin{figure}[htbp] +\centering + \includegraphics[width=0.45\textwidth]{nb_iter_sec_ex54_curie} +\caption{Number of iterations per second with ex54 and the same parameters than in Table~\ref{tab:05} (strong scaling)} +\label{fig:02} +\end{figure} + %%%********************************************************* %%%*********************************************************