X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/GMRES2stage.git/blobdiff_plain/afeb88ec33744268bc138cf5d9cc5406a9599763..d40f73fb29cd99acadda6620fc198fc09066584e:/paper.tex diff --git a/paper.tex b/paper.tex index d00cbe1..16beac7 100644 --- a/paper.tex +++ b/paper.tex @@ -618,15 +618,14 @@ It can be summarized as follows: the inner solver is a Krylov based one. In order to accelerate its convergence, the outer solver periodically applies a least-squares minimization on the residuals computed by the inner one. %Tsolver which does not required to be changed. -At each outer iteration, the sparse linear system $Ax=b$ is partially -solved using only $m$ -iterations of an iterative method, this latter being initialized with the -last obtained approximation. -GMRES method~\cite{Saad86}, or any of its variants, can potentially be used as -inner solver. The current approximation of the Krylov method is then stored inside a $n \times s$ matrix -$S$, which is composed by the $s$ last solutions that have been computed during -the inner iterations phase. -In the remainder, the $i$-th column vector of $S$ will be denoted by $S_i$. +At each outer iteration, the sparse linear system $Ax=b$ is partially solved +using only $m$ iterations of an iterative method, this latter being initialized +with the last obtained approximation. GMRES method~\cite{Saad86}, or any of its +variants, can potentially be used as inner solver. The current approximation of +the Krylov method is then stored inside a $n \times s$ matrix $S$, which is +composed by the $s$ last solutions that have been computed during the inner +iterations phase. In the remainder, the $i$-th column vector of $S$ will be +denoted by $S_i$. At each $s$ iterations, another kind of minimization step is applied in order to compute a new solution $x$. For that, the previous residuals of $Ax=b$ are computed by @@ -652,8 +651,7 @@ appropriate than a single direct method in a parallel context. \Output $x$ (solution vector)\vspace{0.2cm} \State Set the initial guess $x_0$ \For {$k=1,2,3,\ldots$ until convergence (error$<\epsilon_{tsirm}$)} \label{algo:conv} - \State $x_k=Solve(A,b,x_{k-1},max\_iter_{kryl})$ \label{algo:solve} - \State retrieve error + \State $[x_k,error]=Solve(A,b,x_{k-1},max\_iter_{kryl})$ \label{algo:solve} \State $S_{k \mod s}=x_k$ \label{algo:store} \If {$k \mod s=0$ {\bf and} error$>\epsilon_{kryl}$} \State $R=AS$ \Comment{compute dense matrix} \label{algo:matrix_mul} @@ -665,19 +663,22 @@ appropriate than a single direct method in a parallel context. \label{algo:01} \end{algorithm} -Algorithm~\ref{algo:01} summarizes the principle of the proposed method. The outer -iteration is inside the \emph{for} loop. Line~\ref{algo:solve}, the Krylov method is -called for a maximum of $max\_iter_{kryl}$ iterations. In practice, we suggest to set this parameter -equal to the restart number in the GMRES-like method. Moreover, a tolerance -threshold must be specified for the solver. In practice, this threshold must be -much smaller than the convergence threshold of the TSIRM algorithm (\emph{i.e.}, -$\epsilon_{tsirm}$). Line~\ref{algo:store}, $S_{k \mod s}=x^k$ consists in copying the -solution $x_k$ into the column $k \mod s$ of $S$. -After the -minimization, the matrix $S$ is reused with the new values of the residuals. To -solve the minimization problem, an iterative method is used. Two parameters are -required for that: the maximum number of iterations and the threshold to stop the -method. +Algorithm~\ref{algo:01} summarizes the principle of the proposed method. The +outer iteration is inside the \emph{for} loop. Line~\ref{algo:solve}, the Krylov +method is called for a maximum of $max\_iter_{kryl}$ iterations. In practice, +we suggest to set this parameter equal to the restart number in the GMRES-like +method. Moreover, a tolerance threshold must be specified for the solver. In +practice, this threshold must be much smaller than the convergence threshold of +the TSIRM algorithm (\emph{i.e.}, $\epsilon_{tsirm}$). We also consider that +after the call of the $Solve$ function, we obtain the vector $x_k$ and the error +which is defined by $||Ax^k-b||_2$. + + Line~\ref{algo:store}, +$S_{k \mod s}=x^k$ consists in copying the solution $x_k$ into the column $k +\mod s$ of $S$. After the minimization, the matrix $S$ is reused with the new +values of the residuals. To solve the minimization problem, an iterative method +is used. Two parameters are required for that: the maximum number of iterations +and the threshold to stop the method. Let us summarize the most important parameters of TSIRM: \begin{itemize} @@ -871,7 +872,7 @@ torso3 & fgmres / sor & 37.70 & 565 & 34.97 & 510 \\ -In order to perform larger experiments, we have tested some example applications +In order to perform larger experiments, we have tested some example applications of PETSc. Those applications are available in the ksp part which is suited for scalable linear equations solvers: \begin{itemize} @@ -884,9 +885,10 @@ scalable linear equations solvers: finite elements. For this example, the user can define the scaling of material coefficient in embedded circle called $\alpha$. \end{itemize} -For more technical details on these applications, interested readers are invited -to read the codes available in the PETSc sources. Those problems have been -chosen because they are scalable with many cores which is not the case of other problems that we have tested. +For more technical details on these applications, interested readers are invited +to read the codes available in the PETSc sources. Those problems have been +chosen because they are scalable with many cores which is not the case of other +problems that we have tested. In the following larger experiments are described on two large scale architectures: Curie and Juqeen. Both these architectures are supercomputer @@ -897,7 +899,8 @@ Partnership for Advanced Computing in Europe) which aims at proposing high performance supercomputing architecture to enhance research in Europe. The Curie architecture is composed of Intel E5-2680 processors at 2.7 GHz with 2Gb memory by core. The Juqueen architecture is composed of IBM PowerPC A2 at 1.6 GHz with -1Gb memory per core. +1Gb memory per core. Both those architecture are equiped with a dedicated high +speed network.