X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/GMRES2stage.git/blobdiff_plain/0f544a712fbfaa8e36e2d89273b1ecf21085669c..afeb88ec33744268bc138cf5d9cc5406a9599763:/paper.tex diff --git a/paper.tex b/paper.tex index 3a51e45..d00cbe1 100644 --- a/paper.tex +++ b/paper.tex @@ -626,15 +626,16 @@ GMRES method~\cite{Saad86}, or any of its variants, can potentially be used as inner solver. The current approximation of the Krylov method is then stored inside a $n \times s$ matrix $S$, which is composed by the $s$ last solutions that have been computed during the inner iterations phase. +In the remainder, the $i$-th column vector of $S$ will be denoted by $S_i$. -At each $s$ iterations, the minimization step is applied in order to +At each $s$ iterations, another kind of minimization step is applied in order to compute a new solution $x$. For that, the previous residuals of $Ax=b$ are computed by the inner iterations with $(b-AS)$. The minimization of the residuals is obtained by \begin{equation} \underset{\alpha\in\mathbb{R}^{s}}{min}\|b-R\alpha\|_2 \label{eq:01} \end{equation} -with $R=AS$. Then the new solution $x$ is computed with $x=S\alpha$. +with $R=AS$. The new solution $x$ is then computed with $x=S\alpha$. In practice, $R$ is a dense rectangular matrix belonging in $\mathbb{R}^{n\times s}$, @@ -664,14 +665,15 @@ appropriate than a single direct method in a parallel context. \label{algo:01} \end{algorithm} -Algorithm~\ref{algo:01} summarizes the principle of our method. The outer -iteration is inside the for loop. Line~\ref{algo:solve}, the Krylov method is +Algorithm~\ref{algo:01} summarizes the principle of the proposed method. The outer +iteration is inside the \emph{for} loop. Line~\ref{algo:solve}, the Krylov method is called for a maximum of $max\_iter_{kryl}$ iterations. In practice, we suggest to set this parameter -equals to the restart number of the GMRES-like method. Moreover, a tolerance +equal to the restart number in the GMRES-like method. Moreover, a tolerance threshold must be specified for the solver. In practice, this threshold must be -much smaller than the convergence threshold of the TSIRM algorithm (\emph{i.e.} +much smaller than the convergence threshold of the TSIRM algorithm (\emph{i.e.}, $\epsilon_{tsirm}$). Line~\ref{algo:store}, $S_{k \mod s}=x^k$ consists in copying the -solution $x_k$ into the column $k \mod s$ of the matrix $S$, where $S$ is a matrix of size $n\times s$ whose column vector $i$ is denoted by $S_i$. After the +solution $x_k$ into the column $k \mod s$ of $S$. +After the minimization, the matrix $S$ is reused with the new values of the residuals. To solve the minimization problem, an iterative method is used. Two parameters are required for that: the maximum number of iterations and the threshold to stop the @@ -736,7 +738,7 @@ these operations are easy to implement in PETSc or similar environment. \section{Convergence results} \label{sec:04} -Let us recall the following result, see~\cite{Saad86}. +Let us recall the following result, see~\cite{Saad86} for further readings. \begin{proposition} \label{prop:saad} Suppose that $A$ is a positive real matrix with symmetric part $M$. Then the residual norm provided at the $m$-th step of GMRES satisfies: @@ -886,7 +888,17 @@ For more technical details on these applications, interested readers are invite to read the codes available in the PETSc sources. Those problems have been chosen because they are scalable with many cores which is not the case of other problems that we have tested. -In the following larger experiments are described on two large scale architectures: Curie and Juqeen... {\bf description...}\\ +In the following larger experiments are described on two large scale +architectures: Curie and Juqeen. Both these architectures are supercomputer +composed of 80,640 cores for Curie and 458,752 cores for Juqueen. Those machines +are respectively hosted by GENCI in France and Jülich Supercomputing Centre in +Germany. They belongs with other similar architectures of the PRACE initiative ( +Partnership for Advanced Computing in Europe) which aims at proposing high +performance supercomputing architecture to enhance research in Europe. The Curie +architecture is composed of Intel E5-2680 processors at 2.7 GHz with 2Gb memory +by core. The Juqueen architecture is composed of IBM PowerPC A2 at 1.6 GHz with +1Gb memory per core. + {\bf Description of preconditioners}\\