[GMRES2stage.git] / paper.tex
diff --git a/paper.tex b/paper.tex

index 416c76ec7649311395e170b2a06be816e215aa20..a4c9b268faf6197c47d7c288188adc6bf299b3de 100644
--- a/paper.tex
+++ b/paper.tex
@@ -849,8 +849,9 @@ torso3             & 2D/3D problem & 259,156 & 4,429,042 \\
  \label{tab:01}
  \end{center}
  \end{table}
-
-The following  parameters have been chosen  for our experiments.   As by default
+The parameters chosen for our experiments are detailed below.
+%The following  parameters have been chosen  for our experiments.   
+As by default
  the restart  of GMRES is performed every  30 iterations, we have  chosen to stop
  the GMRES every 30 iterations (\emph{i.e.} $max\_iter_{kryl}=30$).  $s$ is set to 8. CGLS is
  chosen  to minimize  the least-squares  problem with  the  following parameters:
@@ -930,8 +931,16 @@ by core. The Juqueen architecture is composed  of IBM PowerPC A2 at 1.6 GHz with
  speed network.
  
  
+In  many situations, using  preconditioners is  essential in  order to  find the
+solution of a linear system.  There are many preconditioners available in PETSc.
+For parallel applications, the preconditioners based on matrix factorization
+are  not  available. In  our  experiments, we  have  tested  different kinds  of
+preconditioners; however, as this is not the subject of this paper, we do not
+present results with many preconditioners. In  practice, we have chosen to use
+multigrid (mg) and successive over-relaxation (sor) preconditioners. For more details on the
+preconditioners in PETSc, please consult~\cite{petsc-web-page}.
+
  
  
-{\bf Description of preconditioners}\\
  
  \begin{table*}[htbp]
  \begin{center}
@@ -959,8 +968,7 @@ speed network.
  
  Table~\ref{tab:03} shows  the execution  times and the  number of  iterations of
  example ex15  of PETSc on the  Juqueen architecture. Different  numbers of cores
-are  studied ranging  from  2,048  up-to 16,383.   Two  preconditioners have  been
-tested: {\it mg} and {\it sor}.   For those experiments,  the number  of components  (or unknowns  of the
+are  studied, ranging  from  2,048  up to 16,383, with the two preconditioners {\it mg} and {\it sor}.   For those experiments,  the number  of components  (or unknowns  of the
  problems)  per core  is fixed  to 25,000,  also called  weak  scaling. This
  number can seem relatively small. In fact, for some applications that need a lot
  of  memory, the  number of  components per  processor requires  sometimes  to be
@@ -1027,7 +1035,21 @@ the number of iterations. So, the overall benefit of using TSIRM is interesting.
  \end{table*}
  
  
-In Table~\ref{tab:04}, some experiments with example ex54 on the Curie architecture are reported.
+In  Table~\ref{tab:04},  some  experiments   with  example  ex54  on  the  Curie
+architecture are reported.  For this  application, we fixed $\alpha=0.6$.  As
+can be seen in that table, the size of the problem has a strong influence on the
+number of iterations needed to reach convergence. That is why we have preferred to
+change the threshold.  If we set  it to $10^{-3}$ as with the previous application,
+only one iteration is necessary to reach convergence. So Table~\ref{tab:04}
+shows the results of different executions with different numbers of cores and
+different thresholds. As  with the previous example, we  can observe that TSIRM
+is faster than FGMRES. The ratio greatly depends on the number of iterations for
+FGMRES to reach the threshold. The greater the number of iterations needed to reach
+convergence, the better the ratio between our algorithm and FGMRES. This
+experiment is also a weak scaling one, with approximately 25,000 components per
+core. It can also  be observed that the difference between CGLS  and LSQR is not
+significant. Both can be good, but it does not seem possible to know in advance which
+one will be the best.
  
  
  \begin{table*}[htbp]