v2

[Krylov_multi.git] / krylov_multi.tex
diff --git a/krylov_multi.tex b/krylov_multi.tex

index d687ad844fbcd030335560f35764dbb81b0e49ac..87ee2279999d06ae61bfa857b6c2ae4aab60c1bd 100644 (file)
--- a/krylov_multi.tex
+++ b/krylov_multi.tex
@@ -67,19 +67,24 @@ Preconditioners can be  used in order to increase  the convergence of iterative
  solvers.   However, most  of the  good preconditioners  are not  scalable when
  thousands of cores are used.
  
  solvers.   However, most  of the  good preconditioners  are not  scalable when
  thousands of cores are used.
  
-Traditional iterative  solvers have  global synchronizations that  penalize the
-scalability.   Two  possible solutions  consists  either  in using  asynchronous
-iterative  methods~\cite{ref18} or  to  use multisplitting  algorithms. In  this
-paper, we will  reconsider the use of a multisplitting  method. In opposition to
-traditional  multisplitting  method  that  suffer  from  slow  convergence,  as
-proposed  in~\cite{huang1993krylov},  the  use  of a  minimization  process  can
-drastically improve the convergence.
+%Traditional iterative  solvers have  global synchronizations that  penalize the
+%scalability.   Two  possible solutions  consists  either  in using  asynchronous
+%iterative  methods~\cite{ref18} or  to  use multisplitting  algorithms. In  this
+%paper, we will  reconsider the use of a multisplitting  method. In opposition to
+%traditional  multisplitting  method  that  suffer  from  slow  convergence,  as
+%proposed  in~\cite{huang1993krylov},  the  use  of a  minimization  process  can
+%drastically improve the convergence.
+
+Traditional parallel iterative solvers are based on fine-grain computations that frequently require data exchanges between computing nodes and have global synchronizations that penalize the scalability. Particularly, they are more penalized on large scale architectures or on distributed platforms composed of distant clusters interconnected by a high-latency network. It is therefore imperative to develop coarse-grain based algorithms to reduce the communications in the parallel iterative solvers. Two  possible solutions consists either in using asynchronous iterative methods~\cite{ref18} or to use multisplitting algorithms. In this paper, we will reconsider the use of a multisplitting method. In opposition to traditional multisplitting method that suffer from slow convergence, as proposed in~\cite{huang1993krylov}, the use of a minimization process can drastically improve the convergence.
+
+The present paper is organized as follows. First in Section~\ref{sec:02} is given some related works and the main principle of multisplitting methods. Then, in Section~\ref{sec:03} is presented the algorithm of our Krylov multisplitting method based on inner-outer iterations. Finally, in Section~\ref{sec:04}, the parallel experiments on Hector architecture show the performances of the Krylov multisplitting algorithm compared to the classical GMRES algorithm to solve a 3D Poisson problem.
  
  
  %%%%%%%%%%%%%%%%%%%%%%%%
  %%%%%%%%%%%%%%%%%%%%%%%%
  
  
  
  %%%%%%%%%%%%%%%%%%%%%%%%
  %%%%%%%%%%%%%%%%%%%%%%%%
  
-\section{Related works and presention of the multisplitting method}
+\section{Related works and presentation of the multisplitting method}
+\label{sec:02}
  A general framework  for studying parallel multisplitting has  been presented in~\cite{o1985multi}
  by O'Leary and White. Convergence conditions are given for the
  most general case.  Many authors improved multisplitting algorithms by proposing
  A general framework  for studying parallel multisplitting has  been presented in~\cite{o1985multi}
  by O'Leary and White. Convergence conditions are given for the
  most general case.  Many authors improved multisplitting algorithms by proposing
@@ -142,6 +147,7 @@ In the case where the diagonal weighting matrices $E_\ell$ have only zero and on
  %%%%%%%%%%%%%%%%%%%%%%%%
  
  \section{A two-stage method with a minimization}
  %%%%%%%%%%%%%%%%%%%%%%%%
  
  \section{A two-stage method with a minimization}
+\label{sec:03}
  Let $Ax=b$ be a given large and sparse linear system of $n$ equations to solve in parallel on $L$ clusters of processors, physically adjacent or geographically distant, where $A\in\mathbb{R}^{n\times n}$ is a square and non-singular matrix, $x\in\mathbb{R}^{n}$ is the solution vector and $b\in\mathbb{R}^{n}$ is the right-hand side vector. The multisplitting of this linear system is defined as follows
  \begin{equation}
  \left\{
  Let $Ax=b$ be a given large and sparse linear system of $n$ equations to solve in parallel on $L$ clusters of processors, physically adjacent or geographically distant, where $A\in\mathbb{R}^{n\times n}$ is a square and non-singular matrix, $x\in\mathbb{R}^{n}$ is the solution vector and $b\in\mathbb{R}^{n}$ is the right-hand side vector. The multisplitting of this linear system is defined as follows
  \begin{equation}
  \left\{
@@ -253,6 +259,7 @@ The main key points of our Krylov multisplitting method to solve a large sparse
  %%%%%%%%%%%%%%%%%%%%%%%%
  
  \section{Experiments}
  %%%%%%%%%%%%%%%%%%%%%%%%
  
  \section{Experiments}
+\label{sec:04}
  In order to illustrate  the interest  of our algorithm. We have  compared our
  algorithm  with  the  GMRES  method  which  is a very  well  used  method  in  many
  situations.  We have chosen to focus on only one problem which is very simple to
  In order to illustrate  the interest  of our algorithm. We have  compared our
  algorithm  with  the  GMRES  method  which  is a very  well  used  method  in  many
  situations.  We have chosen to focus on only one problem which is very simple to
@@ -277,10 +284,10 @@ preconditioners are not scalable when using many cores.
  
  %Doing many experiments  with many cores is  not easy and requires to  access to a supercomputer  with several  hours for  developing  a code  and then  improving it. 
  In the following we present some experiments we could achieved out on the Hector
  
  %Doing many experiments  with many cores is  not easy and requires to  access to a supercomputer  with several  hours for  developing  a code  and then  improving it. 
  In the following we present some experiments we could achieved out on the Hector
-architecture, a UK's high-end  computing resource, funded by  the UK
-Research Councils. This  is a Cray XE6 supercomputer,  equipped with two 16-core
-AMD Opteron 2.3 Ghz  and 32 GB of memory. Machines are  interconnected with a 3D
-torus.
+architecture,  a UK's  high-end computing  resource, funded  by the  UK Research
+Councils~\cite{hector}.  This is  a Cray  XE6 supercomputer,  equipped  with two
+16-core AMD  Opteron 2.3 Ghz  and 32 GB  of memory. Machines  are interconnected
+with a 3D torus.
  
  Table~\ref{tab1} shows  the result of  the experiments.  The first  column shows
  the  size of  the  3D Poisson  problem.  The size  is chosen  in  order to  have
  
  Table~\ref{tab1} shows  the result of  the experiments.  The first  column shows
  the  size of  the  3D Poisson  problem.  The size  is chosen  in  order to  have
@@ -366,6 +373,8 @@ intend  to investigate  the  convergence  improvements of  our  method by  using
  preconditioning  techniques  for  Krylov  iterative methods  and  multisplitting
  methods with overlapping blocks.
  
  preconditioning  techniques  for  Krylov  iterative methods  and  multisplitting
  methods with overlapping blocks.
  
+\section{Acknowledgement}
+The authors would like to thank Mark Bull of the EPCC his fruitful remarks and the facilities of HECToR.
  
  %Other applications (=> other matrices)\\
  %Larger experiments\\
  
  %Other applications (=> other matrices)\\
  %Larger experiments\\