new

[GMRES2stage.git] / paper.tex
diff --git a/paper.tex b/paper.tex

index 896ac71233815d6aa5c7b3f00399c26d575d4248..16beac7e509d9b87c1bd444249ad71a7807fe5cd 100644 (file)
--- a/paper.tex
+++ b/paper.tex
@@ -618,15 +618,14 @@ It can be summarized as follows: the
  inner solver is a Krylov based one. In order to accelerate its convergence, the 
  outer solver periodically applies a least-squares minimization  on the residuals computed by  the inner one. %Tsolver which does not required to be changed.
  
-At each outer iteration, the sparse linear system $Ax=b$ is partially 
-solved using only $m$
-iterations of an iterative method, this latter being initialized with the 
-last obtained approximation. 
-GMRES method~\cite{Saad86}, or any of its variants, can potentially be used as
-inner solver. The current approximation of the Krylov method is then stored inside a $n \times s$ matrix
-$S$, which is composed by the $s$ last solutions that have been computed during 
-the inner iterations phase.
-In the remainder, the $i$-th column vector of $S$ will be denoted by $S_i$. 
+At each  outer iteration,  the sparse linear  system $Ax=b$ is  partially solved
+using only $m$ iterations of  an iterative method, this latter being initialized
+with the last obtained approximation.  GMRES method~\cite{Saad86}, or any of its
+variants, can potentially be used  as inner solver. The current approximation of
+the Krylov  method is then  stored inside  a $n \times  s$ matrix $S$,  which is
+composed by  the $s$  last solutions  that have been  computed during  the inner
+iterations phase.   In the remainder,  the $i$-th column  vector of $S$  will be
+denoted by $S_i$.
  
  At each $s$ iterations, another kind of minimization step is applied in order to
  compute a new  solution $x$. For that, the previous  residuals of $Ax=b$ are computed by
@@ -652,8 +651,7 @@ appropriate than a single direct method in a parallel context.
    \Output $x$ (solution vector)\vspace{0.2cm}
    \State Set the initial guess $x_0$
    \For {$k=1,2,3,\ldots$ until convergence (error$<\epsilon_{tsirm}$)} \label{algo:conv}
-    \State  $x_k=Solve(A,b,x_{k-1},max\_iter_{kryl})$   \label{algo:solve}
-    \State retrieve error
+    \State  $[x_k,error]=Solve(A,b,x_{k-1},max\_iter_{kryl})$   \label{algo:solve}
      \State $S_{k \mod s}=x_k$ \label{algo:store}
      \If {$k \mod s=0$ {\bf and} error$>\epsilon_{kryl}$}
        \State $R=AS$ \Comment{compute dense matrix} \label{algo:matrix_mul}
@@ -665,19 +663,22 @@ appropriate than a single direct method in a parallel context.
  \label{algo:01}
  \end{algorithm}
  
-Algorithm~\ref{algo:01}  summarizes  the principle  of  the proposed  method.  The  outer
-iteration is  inside the \emph{for}  loop. Line~\ref{algo:solve}, the Krylov  method is
-called for a  maximum of $max\_iter_{kryl}$ iterations.  In practice, we  suggest to set this parameter
-equal to  the restart  number in the  GMRES-like method. Moreover,  a tolerance
-threshold must be specified for the  solver. In practice, this threshold must be
-much  smaller  than the  convergence  threshold  of  the TSIRM  algorithm  (\emph{i.e.},
-$\epsilon_{tsirm}$).  Line~\ref{algo:store}, $S_{k \mod s}=x^k$ consists in copying the
-solution  $x_k$  into the  column  $k \mod s$ of $S$.
-After  the
-minimization, the matrix $S$ is reused with the new values of the residuals.  To
-solve the minimization problem, an  iterative method is used. Two parameters are
-required for that: the maximum number of iterations and the threshold to stop the
-method.
+Algorithm~\ref{algo:01} summarizes  the principle  of the proposed  method.  The
+outer iteration is inside the \emph{for} loop. Line~\ref{algo:solve}, the Krylov
+method is called  for a maximum of $max\_iter_{kryl}$  iterations.  In practice,
+we suggest to  set this parameter equal to the restart  number in the GMRES-like
+method. Moreover,  a tolerance  threshold must be  specified for the  solver. In
+practice, this threshold must be  much smaller than the convergence threshold of
+the  TSIRM algorithm  (\emph{i.e.}, $\epsilon_{tsirm}$).  We also  consider that
+after the call of the $Solve$ function, we obtain the vector $x_k$ and the error
+which is defined by $||Ax^k-b||_2$.
+
+  Line~\ref{algo:store},
+$S_{k \mod  s}=x^k$ consists in  copying the solution  $x_k$ into the  column $k
+\mod s$ of $S$.   After the minimization, the matrix $S$ is  reused with the new
+values of the residuals.  To solve the minimization problem, an iterative method
+is used. Two parameters are required  for that: the maximum number of iterations
+and the threshold to stop the method.
  
  Let us summarize the most important parameters of TSIRM:
  \begin{itemize}
@@ -871,7 +872,7 @@ torso3             & fgmres / sor  & 37.70 & 565 & 34.97 & 510 \\
  
  
  
-In order to perform larger  experiments, we have tested some example applications
+In order to perform larger experiments, we have tested some example applications
  of PETSc. Those  applications are available in the ksp part  which is suited for
  scalable linear equations solvers:
  \begin{itemize}
@@ -884,11 +885,23 @@ scalable linear equations solvers:
    finite elements. For this example, the user can define the scaling of material
    coefficient in embedded circle called $\alpha$.
  \end{itemize}
-For more technical details on  these applications, interested readers are invited
-to  read the  codes available  in the  PETSc sources.   Those problems  have been
-chosen because they  are scalable with many cores which is not the case of other problems that we have tested.
+For more technical details on these applications, interested readers are invited
+to read  the codes  available in  the PETSc sources.   Those problems  have been
+chosen because they are scalable with many  cores which is not the case of other
+problems that we have tested.
+
+In  the  following   larger  experiments  are  described  on   two  large  scale
+architectures:  Curie and  Juqeen.  Both  these architectures  are supercomputer
+composed of 80,640 cores for Curie and 458,752 cores for Juqueen. Those machines
+are respectively hosted  by GENCI in France and  Jülich Supercomputing Centre in
+Germany. They belongs with other similar architectures of the PRACE initiative (
+Partnership  for Advanced  Computing in  Europe)  which aims  at proposing  high
+performance supercomputing architecture to enhance research in Europe. The Curie
+architecture is composed of Intel E5-2680  processors at 2.7 GHz with 2Gb memory
+by core. The Juqueen architecture is composed  of IBM PowerPC A2 at 1.6 GHz with
+1Gb memory per  core. Both those architecture are equiped  with a dedicated high
+speed network.
  
-In the following larger experiments are described on two large scale architectures: Curie and Juqeen... {\bf description...}\\
  
  
  {\bf Description of preconditioners}\\