X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/GMRES2stage.git/blobdiff_plain/bda6e7ef79c89806bf219192f386105c428e1134..aeb5859d1996432c60c4346b9f961298caa11f5a:/paper.tex

diff --git a/paper.tex b/paper.tex
index acb46bb..b1c6f59 100644
--- a/paper.tex
+++ b/paper.tex
@@ -439,7 +439,7 @@ can be around 7 times faster.
 \end{abstract}
 
 \begin{IEEEkeywords}
-Iterative Krylov methods; sparse linear systems; error minimization; PETSc; %à voir...
+Iterative Krylov methods; sparse linear systems; residual minimization; PETSc; % to be decided...
 \end{IEEEkeywords}
 
 
@@ -583,10 +583,10 @@ performances.
 The present paper is organized as follows. In Section~\ref{sec:02} some
 related works are presented. Section~\ref{sec:03} presents our two-stage
 algorithm using a least-square residual minimization. Section~\ref{sec:04} describes some
-convergence results on this method. Section~\ref{sec:05} shows some
-experimental results obtained on large clusters of our algorithm using routines
-of PETSc toolkit. Finally Section~\ref{sec:06} concludes and gives some
-perspectives.
+convergence results on this method. In Section~\ref{sec:05}, parallelization
+details of TSARM are given. Section~\ref{sec:06} shows experimental results
+obtained with our algorithm on large clusters, using routines of the PETSc
+toolkit. Finally, Section~\ref{sec:07} concludes and gives some perspectives.
 
 %%%*********************************************************
 %%%*********************************************************
@@ -604,7 +604,7 @@ perspectives.
 %%%*********************************************************
 %%%*********************************************************
 
-\section{A Krylov two-stage algorithm}
+\section{Two-stage algorithm with least-squares residual minimization}
 \label{sec:03}
 A two-stage algorithm is proposed to solve large sparse linear systems of the
 form $Ax=b$, where $A\in\mathbb{R}^{n\times n}$ is a sparse and square
@@ -644,12 +644,12 @@ appropriate than a direct method in a parallel context.
   \Input $A$ (sparse matrix), $b$ (right-hand side)
   \Output $x$ (solution vector)\vspace{0.2cm}
   \State Set the initial guess $x^0$
-  \For {$k=1,2,3,\ldots$ until convergence (error$<\epsilon$)} \label{algo:conv}
-  \State $x^k=Solve(A,b,x^{k-1},m)$ \label{algo:solve}
+  \For {$k=1,2,3,\ldots$ until convergence (error$<\epsilon_{kryl}$)} \label{algo:conv}
+  \State $x^k=Solve(A,b,x^{k-1},max\_iter_{kryl})$ \label{algo:solve}
   \State retrieve error
   \State $S_{k~mod~s}=x^k$ \label{algo:store}
-  \If {$k$ mod $s=0$ {\bf and} error$>\epsilon$}
-  \State $R=AS$ \Comment{compute dense matrix}
+  \If {$k$ mod $s=0$ {\bf and} error$>\epsilon_{kryl}$}
+  \State $R=AS$ \Comment{compute dense matrix} \label{algo:matrix_mul}
   \State Solve least-squares problem $\underset{\alpha\in\mathbb{R}^{s}}{min}\|b-R\alpha\|_2$ \label{algo:}
   \State $x^k=S\alpha$ \Comment{compute new solution}
   \EndIf
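
For illustration only, here is a minimal sequential sketch of the two-stage loop of Algorithm~\ref{algo:01} in Python (NumPy/SciPy). It is not the authors' implementation: the function name, the toy test problem and all parameter values are placeholders, SciPy's restarted GMRES stands in for the inner Krylov solver (its own tolerance is left at the library default, whereas the text below recommends a much smaller inner threshold), and the least-squares problem is solved directly rather than iteratively.

\begin{verbatim}
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def tsarm(A, b, s=12, restart=30, eps=1e-5, max_outer=200):
    """Sketch of the two-stage method: restarted GMRES cycles plus a
    periodic least-squares minimization over the last s iterates."""
    n = b.size
    x = np.zeros(n)
    S = np.zeros((n, s))                 # columns store the last s iterates
    norm_b = np.linalg.norm(b)
    for k in range(1, max_outer + 1):
        # inner solve: one GMRES cycle of at most `restart` iterations
        x, _ = spla.gmres(A, b, x0=x, restart=restart, maxiter=1)
        error = np.linalg.norm(b - A @ x) / norm_b   # retrieve error
        S[:, k % s] = x                              # S_{k mod s} = x^k
        if k % s == 0 and error > eps:
            R = A @ S                                # R = A S (n x s, dense)
            alpha = np.linalg.lstsq(R, b, rcond=None)[0]  # min ||b - R alpha||_2
            x = S @ alpha                            # new iterate
        if error < eps:
            break
    return x

# toy test problem: 2D Poisson matrix (5-point stencil) on a 50 x 50 grid
m = 50
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = (sp.kron(sp.identity(m), T) + sp.kron(T, sp.identity(m))).tocsr()
b = np.ones(m * m)
x = tsarm(A, b)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
\end{verbatim}
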
@@ -660,19 +660,24 @@ appropriate than a direct method in a parallel context.
 
 Algorithm~\ref{algo:01} summarizes the principle of our method. The outer
 iteration is inside the for loop. Line~\ref{algo:solve}, the Krylov method is
-called for a maximum of $m$ iterations. In practice, we suggest to choose $m$
+called for a maximum of $max\_iter_{kryl}$ iterations. In practice, we suggest setting this parameter
 equal to the restart number of the GMRES-like method. Moreover, a tolerance
-threshold must be specified for the solver. In practise, this threshold must be
-much smaller than the convergence threshold of the TSARM algorithm
-(i.e. $\epsilon$). Line~\ref{algo:store}, $S_{k~ mod~ s}=x^k$ consists in
-copying the solution $x_k$ into the column $k~ mod~ s$ of the matrix $S$. After
-the minimization, the matrix $S$ is reused with the new values of the residuals. % à continuer Line
-
-To summarize, the important parameters of are:
+threshold must be specified for the solver. In practice, this threshold must be
+much smaller than the convergence threshold of the TSARM algorithm (i.e.
+$\epsilon_{kryl}$). Line~\ref{algo:store}, $S_{k~ mod~ s}=x^k$ consists in copying the
+solution $x^k$ into the column $k~ mod~ s$ of the matrix $S$. After the
+minimization, the matrix $S$ is reused with the new values of the residuals. To
+solve the minimization problem, an iterative method is used. Two parameters are
+required: the maximum number of iterations and the threshold to stop the
+method.
+
+To summarize, the important parameters of TSARM are:
 \begin{itemize}
-\item $\epsilon$ the threshold to stop the method
-\item $m$ the number of iterations for the krylov method
+\item $\epsilon_{kryl}$ the threshold to stop the method, checked on the error of the Krylov stage
+\item $max\_iter_{kryl}$ the maximum number of iterations for the Krylov method
 \item $s$ the number of outer iterations before applying the minimization step
+\item $max\_iter_{ls}$ the maximum number of iterations for the iterative least-squares method
+\item $\epsilon_{ls}$ the threshold to stop the least-squares method
 \end{itemize}
 
 %%%*********************************************************
@@ -681,11 +686,58 @@ To summarize, the important parameters of are:
 \section{Convergence results}
 \label{sec:04}
+
+
 
 %%%*********************************************************
 %%%*********************************************************
-\section{Experiments using petsc}
+\section{Parallelization}
 \label{sec:05}
+The parallelization of TSARM relies on the parallelization of all its
+parts. More precisely, except for the least-squares step, all the other parts
+are straightforward to carry out in parallel. In order to develop a parallel
+version of our code, we have chosen to use PETSc~\cite{petsc-web-page}. For
+line~\ref{algo:matrix_mul}, the matrix-matrix multiplication is implemented
+efficiently since the matrix $A$ is sparse and since the matrix $S$ contains
+few columns in practice. As explained previously, at least two methods seem
+interesting to solve the least-squares minimization: CGLS and LSQR.
+
+In the following we recall the CGLS algorithm. The LSQR method follows more or
+less the same principle but is longer to describe, so we only detail the parallelization of CGLS, which is similar to that of LSQR.
+
+\begin{algorithm}[t]
+\caption{CGLS}
+\begin{algorithmic}[1]
+  \Input $A$ (matrix), $b$ (right-hand side)
+  \Output $x$ (solution vector)\vspace{0.2cm}
+  \State $r=b-Ax$
+  \State $p=A^T r$
+  \State $s=p$
+  \State $g=||s||^2_2$
+  \For {$k=1,2,3,\ldots$ until convergence ($g<\epsilon_{ls}$)} \label{algo2:conv}
+    \State $q=Ap$
+    \State $\alpha=g/||q||^2_2$
+    \State $x=x+\alpha p$
+    \State $r=r-\alpha q$
+    \State $s=A^T r$
+    \State $g_{old}=g$
+    \State $g=||s||^2_2$
+    \State $\beta=g/g_{old}$
+    \State $p=s+\beta p$
+  \EndFor
+\end{algorithmic}
+\label{algo:02}
+\end{algorithm}
+
+
+In each iteration of CGLS, there are two matrix-vector multiplications (one
+with $A$ and one with $A^T$) and some classical operations: dot products,
+norms, scalings and additions on vectors. All these operations are easy to
+implement in PETSc or similar environments.
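
To make the cost of one CGLS iteration concrete, here is a small sequential NumPy transcription of Algorithm~\ref{algo:02}. It is only a sketch following the notation of the listing above (the function name, default parameter values and the random test problem are invented for the example); a parallel version performs the same operations with distributed PETSc objects, the two products corresponding to MatMult and MatMultTranspose and the vector updates to AXPY and dot operations.

\begin{verbatim}
import numpy as np

def cgls(A, b, eps_ls=1e-12, max_iter_ls=200):
    """Sequential CGLS sketch: iteratively minimizes ||b - A x||_2."""
    x = np.zeros(A.shape[1])
    r = b - A @ x
    p = A.T @ r
    s = p.copy()
    g = np.dot(s, s)                  # g = ||s||_2^2
    for _ in range(max_iter_ls):
        if g < eps_ls:                # convergence test (g < eps_ls)
            break
        q = A @ p                     # first matrix-vector product
        alpha = g / np.dot(q, q)
        x = x + alpha * p
        r = r - alpha * q
        s = A.T @ r                   # second matrix-vector product (with A^T)
        g_old = g
        g = np.dot(s, s)
        beta = g / g_old
        p = s + beta * p              # update of the search direction
    return x

# example: least-squares problem min ||b - R alpha||_2 with a tall matrix R
rng = np.random.default_rng(0)
R = rng.standard_normal((100, 12))
b = rng.standard_normal(100)
alpha = cgls(R, b)
print(np.linalg.norm(R.T @ (b - R @ alpha)))   # normal-equations residual, near 0
\end{verbatim}
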
+
+%%%*********************************************************
+%%%*********************************************************
+\section{Experiments using PETSc}
+\label{sec:06}
+
 In order to see the influence of our algorithm with only one processor, we
 first show a comparison with the standard version of GMRES and our
 algorithm. In
@@ -762,7 +814,9 @@ torso3 & fgmres / sor & 37.70 & 565 & 34.97 & 510 \\
 
 
 
-Larger experiments ....
+Larger experiments ....\\
+
+Describe the problems ex15 and ex54
 
 \begin{table*}
 \begin{center}
...
 \label{tab:04}
 \end{center}
 \end{table*}
+
+
+
+
+
+\begin{table*}
+\begin{center}
+\begin{tabular}{|r|r|r|r|r|r|r|r|}
+\hline
+
+ nb. cores & \multicolumn{2}{c|}{FGMRES} & \multicolumn{2}{c|}{two-stage CGLS} & \multicolumn{2}{c|}{two-stage LSQR} & best gain \\
+\cline{2-7}
+ & Time & \# Iter. & Time & \# Iter. & Time & \# Iter. & \\\hline \hline
+ 512 & 3,969.69 & 33,120 & 709.57 & 5,790 & 622.76 & 5,070 & \\
+ 1024 & 1,530.06 & 25,860 & 290.95 & 4,830 & 307.71 & 5,070 & \\
+ 2048 & 919.62 & 31,470 & 237.52 & 8,040 & 194.22 & 6,510 & \\
+ 4096 & 405.60 & 28,380 & 111.67 & 7,590 & 91.72 & 6,510 & \\
+ 8192 & 785.04 & 109,590 & 76.07 & 10,470 & 69.42 & 9,030 & \\
+
+\hline
+
+\end{tabular}
+\caption{Comparison of FGMRES and two-stage FGMRES algorithms for ex54 of PETSc (both with the MG preconditioner) with 204,919,225 components on Curie with different numbers of cores (restart=30, s=12, threshold 5e-5); time is expressed in seconds.}
+\label{tab:05}
+\end{center}
+\end{table*}
+
 %%%*********************************************************
 %%%*********************************************************
@@ -819,7 +900,7 @@ Larger experiments ....
 %%%*********************************************************
 %%%*********************************************************
 \section{Conclusion}
-\label{sec:06}
+\label{sec:07}
 %The conclusion goes here. this is more of the conclusion
 %%%*********************************************************
 %%%*********************************************************
@@ -827,6 +908,7 @@ Larger experiments ....
 
 future plan : \\
 - study other kinds of matrices, problems, inner solvers\\
+- test the influence of all the parameters\\
 - adaptative number of outer iterations to minimize\\
 - other methods to minimize the residuals?\\
 - implement our solver inside PETSc