X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/GMRES2stage.git/blobdiff_plain/0c77825b587a6051bceb28a3a8762f660f3c6c52..317020501fc8da3f9e443457700821b26f66da55:/paper.tex diff --git a/paper.tex b/paper.tex index 3b19b2d..436909a 100644 --- a/paper.tex +++ b/paper.tex @@ -380,7 +380,7 @@ % use a multiple column layout for up to two different % affiliations -\author{\IEEEauthorblockN{Rapha\"el Couturier\IEEEauthorrefmark{1}, Lilia Ziane Khodja \IEEEauthorrefmark{2}, and Christophe Guyeux\IEEEauthorrefmark{1}} +\author{\IEEEauthorblockN{Rapha\"el Couturier\IEEEauthorrefmark{1}, Lilia Ziane Khodja\IEEEauthorrefmark{2}, and Christophe Guyeux\IEEEauthorrefmark{1}} \IEEEauthorblockA{\IEEEauthorrefmark{1} Femto-ST Institute, University of Franche Comte, France\\ Email: \{raphael.couturier,christophe.guyeux\}@univ-fcomte.fr} \IEEEauthorblockA{\IEEEauthorrefmark{2} INRIA Bordeaux Sud-Ouest, France\\ @@ -669,8 +669,8 @@ called for a maximum of $max\_iter_{kryl}$ iterations. In practice, we sugges equals to the restart number of the GMRES-like method. Moreover, a tolerance threshold must be specified for the solver. In practice, this threshold must be much smaller than the convergence threshold of the TSIRM algorithm (\emph{i.e.} -$\epsilon_{tsirm}$). Line~\ref{algo:store}, $S_{k~ mod~ s}=x^k$ consists in copying the -solution $x_k$ into the column $k~ mod~ s$ of the matrix $S$. After the +$\epsilon_{tsirm}$). Line~\ref{algo:store}, $S_{k \mod s}=x^k$ consists in copying the +solution $x_k$ into the column $k \mod s$ of the matrix $S$, where $S$ is a matrix of size $n\times s$ whose column vector $i$ is denoted by $S_i$. After the minimization, the matrix $S$ is reused with the new values of the residuals. To solve the minimization problem, an iterative method is used. Two parameters are required for that: the maximum number of iterations and the threshold to stop the @@ -686,13 +686,13 @@ Let us summarize the most important parameters of TSIRM: \end{itemize} -The parallelisation of TSIRM relies on the parallelization of all its +The parallelization of TSIRM relies on the parallelization of all its parts. More precisely, except the least-squares step, all the other parts are obvious to achieve out in parallel. In order to develop a parallel version of our code, we have chosen to use PETSc~\cite{petsc-web-page}. For line~\ref{algo:matrix_mul} the matrix-matrix multiplication is implemented and efficient since the matrix $A$ is sparse and since the matrix $S$ contains few -colums in practice. As explained previously, at least two methods seem to be +columns in practice. As explained previously, at least two methods seem to be interesting to solve the least-squares minimization, CGLS and LSQR. In the following we remind the CGLS algorithm. The LSQR method follows more or @@ -737,6 +737,7 @@ these operations are easy to implement in PETSc or similar environment. \label{sec:04} Let us recall the following result, see~\cite{Saad86}. \begin{proposition} +\label{prop:saad} Suppose that $A$ is a positive real matrix with symmetric part $M$. Then the residual norm provided at the $m$-th step of GMRES satisfies: \begin{equation} ||r_m|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_0|| , @@ -745,12 +746,14 @@ where $\alpha = \lambda_min(M)^2$ and $\beta = \lambda_max(A^T A)$, which proves the convergence of GMRES($m$) for all $m$ under that assumption regarding $A$. \end{proposition} -<<<<<<< HEAD -======= We can now claim that, \begin{proposition} -If $A$ is a positive real matrix and GMRES($m$) is used as solver, then the TSIRM algorithm is convergent. +If $A$ is a positive real matrix and GMRES($m$) is used as solver, then the TSIRM algorithm is convergent. Furthermore, we still have +\begin{equation} +||r_m|| \leqslant \left(1-\dfrac{\alpha}{\beta}\right)^{\frac{m}{2}} ||r_0|| , +\end{equation} +where $\alpha$ and $\beta$ are defined as in Proposition~\ref{prop:saad}. \end{proposition} \begin{proof} @@ -758,9 +761,22 @@ Let $r_k = b-Ax_k$, where $x_k$ is the approximation of the solution after the $k$-th iterate of TSIRM. We will prove that $r_k \rightarrow 0$ when $k \rightarrow +\infty$. -Each step of the TSIRM algorithm +Each step of the TSIRM algorithm \\ + +Let $\operatorname{span}(S) = \left \{ {\sum_{i=1}^k \lambda_i v_i \Big| k \in \mathbb{N}, v_i \in S, \lambda _i \in \mathbb{R}} \right \}$ be the linear span of a set of vectors $S$. So,\\ +$\min_{\alpha \in \mathbb{R}^s} ||b-R\alpha ||_2 = \min_{\alpha \in \mathbb{R}^s} ||b-AS\alpha ||_2$ + +$\begin{array}{ll} +& = \min_{x \in span\left(S_{k-s}, S_{k-s+1}, \hdots, S_{k-1} \right)} ||b-AS\alpha ||_2\\ +& = \min_{x \in span\left(x_{k-s}, x_{k-s}+1, \hdots, x_{k-1} \right)} ||b-AS\alpha ||_2\\ +& \leqslant \min_{x \in span\left( x_{k-1} \right)} ||b-Ax ||_2\\ +& \leqslant \min_{\lambda \in \mathbb{R}} ||b-\lambda Ax_{k-1} ||_2\\ +& \leqslant ||b-Ax_{k-1}||_2 . +\end{array}$ \end{proof} ->>>>>>> 84e15020344b77e5497c4a516cc20b472b2914cd + +We can remark that, at each iterate, the residue of the TSIRM algorithm is lower +than the one of the GMRES method. %%%********************************************************* %%%********************************************************* @@ -832,7 +848,7 @@ torso3 & fgmres / sor & 37.70 & 565 & 34.97 & 510 \\ \hline \end{tabular} -\caption{Comparison of (F)GMRES and 2 stage (F)GMRES algorithms in sequential with some matrices, time is expressed in seconds.} +\caption{Comparison of (F)GMRES and TSIRM with (F)GMRES in sequential with some matrices, time is expressed in seconds.} \label{tab:02} \end{center} \end{table} @@ -891,7 +907,7 @@ Table~\ref{tab:03} shows the execution times and the number of iterations of example ex15 of PETSc on the Juqueen architecture. Different numbers of cores are studied ranging from 2,048 up-to 16,383. Two preconditioners have been tested: {\it mg} and {\it sor}. For those experiments, the number of components (or unknowns of the -problems) per processor is fixed to 25,000, also called weak scaling. This +problems) per core is fixed to 25,000, also called weak scaling. This number can seem relatively small. In fact, for some applications that need a lot of memory, the number of components per processor requires sometimes to be small. @@ -938,7 +954,7 @@ the number of iterations. So, the overall benefit of using TSIRM is interesting. \begin{tabular}{|r|r|r|r|r|r|r|r|r|} \hline - nb. cores & threshold & \multicolumn{2}{c|}{GMRES} & \multicolumn{2}{c|}{TSIRM CGLS} & \multicolumn{2}{c|}{TSIRM LSQR} & best gain \\ + nb. cores & threshold & \multicolumn{2}{c|}{FGMRES} & \multicolumn{2}{c|}{TSIRM CGLS} & \multicolumn{2}{c|}{TSIRM LSQR} & best gain \\ \cline{3-8} & & Time & \# Iter. & Time & \# Iter. & Time & \# Iter. & \\\hline \hline 2,048 & 8e-5 & 108.88 & 16,560 & 23.06 & 3,630 & 22.79 & 3,630 & 4.77 \\ @@ -951,7 +967,7 @@ the number of iterations. So, the overall benefit of using TSIRM is interesting. \hline \end{tabular} -\caption{Comparison of FGMRES and 2 stage FGMRES algorithms for ex54 of Petsc (both with the MG preconditioner) with 25000 components per core on Curie (restart=30, s=12), time is expressed in seconds.} +\caption{Comparison of FGMRES and TSIRM with FGMRES algorithms for ex54 of Petsc (both with the MG preconditioner) with 25,000 components per core on Curie (restart=30, s=12), time is expressed in seconds.} \label{tab:04} \end{center} \end{table*} @@ -965,9 +981,9 @@ In Table~\ref{tab:04}, some experiments with example ex54 on the Curie architect \begin{tabular}{|r|r|r|r|r|r|r|r|r|r|r|} \hline - nb. cores & \multicolumn{2}{c|}{GMRES} & \multicolumn{2}{c|}{TSIRM CGLS} & \multicolumn{2}{c|}{TSIRM LSQR} & best gain & \multicolumn{3}{c|}{efficiency} \\ + nb. cores & \multicolumn{2}{c|}{FGMRES} & \multicolumn{2}{c|}{TSIRM CGLS} & \multicolumn{2}{c|}{TSIRM LSQR} & best gain & \multicolumn{3}{c|}{efficiency} \\ \cline{2-7} \cline{9-11} - & Time & \# Iter. & Time & \# Iter. & Time & \# Iter. & & GMRES & TS CGLS & TS LSQR\\\hline \hline + & Time & \# Iter. & Time & \# Iter. & Time & \# Iter. & & FGMRES & TS CGLS & TS LSQR\\\hline \hline 512 & 3,969.69 & 33,120 & 709.57 & 5,790 & 622.76 & 5,070 & 6.37 & 1 & 1 & 1 \\ 1024 & 1,530.06 & 25,860 & 290.95 & 4,830 & 307.71 & 5,070 & 5.25 & 1.30 & 1.21 & 1.01 \\ 2048 & 919.62 & 31,470 & 237.52 & 8,040 & 194.22 & 6,510 & 4.73 & 1.08 & .75 & .80\\ @@ -977,7 +993,7 @@ In Table~\ref{tab:04}, some experiments with example ex54 on the Curie architect \hline \end{tabular} -\caption{Comparison of FGMRES and 2 stage FGMRES algorithms for ex54 of Petsc (both with the MG preconditioner) with 204,919,225 components on Curie with different number of cores (restart=30, s=12, threshol 5e-5), time is expressed in seconds.} +\caption{Comparison of FGMRES and TSIRM with FGMRES for ex54 of Petsc (both with the MG preconditioner) with 204,919,225 components on Curie with different number of cores (restart=30, s=12, threshold 5e-5), time is expressed in seconds.} \label{tab:05} \end{center} \end{table*} @@ -1005,7 +1021,7 @@ In Table~\ref{tab:04}, some experiments with example ex54 on the Curie architect future plan : \\ - study other kinds of matrices, problems, inner solvers\\ -- test the influence of all the parameters\\ +- test the influence of all parameters\\ - adaptative number of outer iterations to minimize\\ - other methods to minimize the residuals?\\ - implement our solver inside PETSc @@ -1020,7 +1036,7 @@ future plan : \\ %%%********************************************************* \section*{Acknowledgment} This paper is partially funded by the Labex ACTION program (contract -ANR-11-LABX-01-01). We acknowledge PRACE for awarding us access to resource +ANR-11-LABX-01-01). We acknowledge PRACE for awarding us access to resources Curie and Juqueen respectively based in France and Germany. @@ -1063,5 +1079,3 @@ Curie and Juqueen respectively based in France and Germany. % that's all folks \end{document} - -