09-01-2014

[Krylov_multi.git] / krylov_multi.tex
diff --git a/krylov_multi.tex b/krylov_multi.tex

index b96a5b9aec0e9a27f59c84477910225b2fb1ae37..48d9e2e191998300b47a9534c27a1dd89c622534 100644 (file)
--- a/krylov_multi.tex
+++ b/krylov_multi.tex
@@ -21,12 +21,12 @@
  \begin{abstract}
  In  this  paper we  revist  the  krylov  multisplitting algorithm  presented  in
  \cite{huang1993krylov}  which  uses  a  scalar  method to  minimize  the  krylov
  \begin{abstract}
  In  this  paper we  revist  the  krylov  multisplitting algorithm  presented  in
  \cite{huang1993krylov}  which  uses  a  scalar  method to  minimize  the  krylov
-iterations computed by a multisplitting algorithm. Our new algorithm is simply a
-parallel multisplitting algorithm  with few blocks of large  size and a parallel
-krylov  minimization  is used  to  improve  the  convergence. Some  large  scale
-experiments  with a 3D  Poisson problem  are presented.  They show  the obtained
-improvements compared to a classical GMRES both in terms of number of iterations
-and execution times.
+iterations computed by a multisplitting algorithm. Our new algorithm is based on
+a  parallel multisplitting  algorithm  with few  blocks  of large  size using  a
+parallel GMRES method inside each block and on a parallel krylov minimization in
+order to improve the convergence. Some large scale experiments with a 3D Poisson
+problem  are presented.   They  show  the obtained  improvements  compared to  a
+classical GMRES both in terms of number of iterations and execution times.
  \end{abstract}
  
  
  \end{abstract}
  
  
@@ -38,14 +38,106 @@ and execution times.
  
  Iterative methods are used to solve  large sparse linear systems of equations of
  the form  $Ax=b$ because they are  easier to parallelize than  direct ones. Many
  
  Iterative methods are used to solve  large sparse linear systems of equations of
  the form  $Ax=b$ because they are  easier to parallelize than  direct ones. Many
-iterative  methods have  been proposed  and adpated  by many  researchers.  When
-solving large  linear systems  with many cores,  iterative methods  often suffer
-from  scalability  problems.    This  is  due  to  their   need  for  collective
-communications to perform matrix-vector products and reduction operations.
-
-%%%%% Lilia
-% doit-on définir le principe et les préliminaires du multisplitting dans l'intro ou dans l'autre section? 
-% valides-tu le titre de la 2eme section? celle que je voudrai rédiger.
+iterative  methods have  been proposed  and  adapted by  many researchers.   For
+example, the GMRES method and the  Conjugate Gradient method are very well known
+and  used by  many researchers  ~\cite{S96}. Both  the method  are based  on the
+Krylov subspace which consists in forming  a basis of the sequence of successive
+matrix powers times the initial residual.
+
+When  solving large  linear systems  with  many cores,  iterative methods  often
+suffer  from scalability problems.   This is  due to  their need  for collective
+communications  to  perform  matrix-vector  products and  reduction  operations.
+Preconditionners can be  used in order to increase  the convergence of iterative
+solvers.   However, most  of the  good preconditionners  are not  sclalable when
+thousands of cores are used.
+
+
+A completer...
+On ne peut pas parler de tout...\\
+
+
+
+
+%%%%%%%%%%%%%%%%%%%%%%%
+%% BEGIN
+%%%%%%%%%%%%%%%%%%%%%%%
+The key idea  of the multisplitting method for  solving a large system
+of linear equations $Ax=b$ consists  in partitioning the matrix $A$ in
+$L$ several ways
+\begin{equation}
+A = M_l - N_l,~l\in\{1,\ldots,L\},
+\label{eq01}
+\end{equation}
+where $M_l$ are nonsingular matrices. Then the linear system is solved
+by iteration based on the multisplittings as follows
+\begin{equation}
+x^{k+1}=\displaystyle\sum^L_{l=1} E_l M^{-1}_l (N_l x^k + b),~k=1,2,3,\ldots
+\label{eq02}
+\end{equation}
+where $E_l$ are non-negative and diagonal weighting matrices such that
+$\sum^L_{l=1}E_l=I$ ($I$ is an identity matrix).  Thus the convergence
+of such a method is dependent on the condition
+\begin{equation}
+\rho(\displaystyle\sum^L_{l=1}E_l M^{-1}_l N_l)<1.
+\label{eq03}
+\end{equation}
+
+The advantage of  the multisplitting method is that  at each iteration
+$k$ there are $L$ different linear sub-systems
+\begin{equation}
+v_l^k=M^{-1}_l N_l x_l^{k-1} + M^{-1}_l b,~l\in\{1,\ldots,L\},
+\label{eq04}
+\end{equation}
+to be solved  independently by a direct or  an iterative method, where
+$v_l^k$  is   the  solution  of   the  local  sub-system.   Thus,  the
+calculations  of $v_l^k$  may be  performed in  parallel by  a  set of
+processors.   A multisplitting  method using  an iterative  method for
+solving the $L$ linear  sub-systems is called an inner-outer iterative
+method or a  two-stage method.  The results $v_l^k$  obtained from the
+different splittings~(\ref{eq04}) are combined to compute the solution
+$x^k$ of the linear system by using the diagonal weighting matrices
+\begin{equation}
+x^k = \displaystyle\sum^L_{l=1} E_l v_l^k,
+\label{eq05}
+\end{equation}    
+In the case where the diagonal weighting matrices $E_l$ have only zero
+and   one   factors  (i.e.   $v_l^k$   are   disjoint  vectors),   the
+multisplitting method is non-overlapping  and corresponds to the block
+Jacobi method.
+%%%%%%%%%%%%%%%%%%%%%%%
+%% END
+%%%%%%%%%%%%%%%%%%%%%%%
+
+\section{Related works}
+
+
+A general framework  for studying parallel multisplitting has  been presented in
+\cite{o1985multi} by O'Leary and White. Convergence conditions are given for the
+most general case.  Many authors improved multisplitting algorithms by proposing
+for  example  an  asynchronous  version  \cite{bru1995parallel}  and  convergence
+conditions  \cite{bai1999block,bahi2000asynchronous}   in  this  case   or  other
+two-stage algorithms~\cite{frommer1992h,bru1995parallel}.
+
+In  \cite{huang1993krylov},  the  authors  proposed  a  parallel  multisplitting
+algorithm in which all the tasks except  one are devoted to solve a sub-block of
+the splitting  and to send their  local solution to  the first task which  is in
+charge to  combine the vectors at  each iteration.  These vectors  form a Krylov
+basis for  which the first task minimizes  the error function over  the basis to
+increase the convergence, then the other tasks receive the update solution until
+convergence of the global system. 
+
+
+
+In \cite{couturier2008gremlins}, the  authors proposed practical implementations
+of multisplitting algorithms that take benefit from multisplitting algorithms to
+solve large scale linear systems. Inner  solvers could be based on scalar direct
+method with the LU method or scalar iterative one with GMRES.
+
+In~\cite{prace-multi},  the  authors  have  proposed a  parallel  multisplitting
+algorithm in which large block are solved using a GMRES solver. The authors have
+performed large scale experimentations upto  32.768 cores and they conclude that
+asynchronous  multisplitting algorithm  could more  efficient  than traditionnal
+solvers on exascale architecture with hunders of thousands of cores.
  
  
  %%%%%%%%%%%%%%%%%%%%%%%%
  
  
  %%%%%%%%%%%%%%%%%%%%%%%%
@@ -53,6 +145,56 @@ communications to perform matrix-vector products and reduction operations.
  
  
  \section{A two-stage method with a minimization}
  
  
  \section{A two-stage method with a minimization}
+Let $Ax=b$ be a given sparse and large linear system of $n$ equations
+to solve in parallel on $L$ clusters, physically adjacent or geographically
+distant, where $A\in\mathbb{R}^{n\times n}$ is a square and nonsingular
+matrix, $x\in\mathbb{R}^{n}$ is the solution vector and $b\in\mathbb{R}^{n}$
+is the right-hand side vector. The multisplitting of this linear system 
+is defined as follows:
+\begin{equation}
+\left\{
+\begin{array}{lll}
+A & = & [A_{1}, \ldots, A_{L}]\\
+x & = & [X_{1}, \ldots, X_{L}]\\
+b & = & [B_{1}, \ldots, B_{L}]
+\end{array}
+\right.
+\label{sec03:eq01}
+\end{equation}  
+where for $l\in\{1,\ldots,L\}$ $A_l$ is a rectangular block of size $n_l\times n$
+and $X_l$ and $B_l$ are sub-vectors of size $n_l$, such that $\sum_ln_l=n$. In this
+case, we use a row-by-row splitting without overlapping in such a way that successive
+rows of the sparse matrix $A$ and both vectors $x$ and $b$ are assigned to one cluster.
+So, the multisplitting format of the linear system is defined as follows:
+\begin{equation}
+\forall l\in\{1,\ldots,L\} \mbox{,~} \displaystyle\sum_{i=1}^{l-1}A_{li}X_i + A_{ll}X_l + \displaystyle\sum_{i=l+1}^{L}A_{li}X_i = B_l, 
+\label{sec03:eq02}
+\end{equation} 
+where $A_{li}$ is a block of size $n_l\times n_i$ of the rectangular matrix $A_l$, $X_i\neq X_l$
+is a sub-vector of size $n_i$ of the solution vector $x$ and $\sum_{i<l}n_i+\sum_{i>l}n_i+n_l=n$,
+for all $i\in\{1,\ldots,l-1,l+1,\ldots,L\}$. 
+
+The multisplitting method proceeds by iteration for solving the linear system in such a
+way each sub-system
+\begin{equation}
+\left\{
+\begin{array}{l}
+A_{ll}X_l = Y_l \mbox{,~such that}\\
+Y_l = B_l - \displaystyle\sum_{i=1,i\neq l}^{L}A_{li}X_i,
+\end{array}
+\right.
+\label{sec03:eq03}
+\end{equation}
+is solved independently by a cluster of processors and communication are required to
+update the right-hand side vectors $Y_l$, such that the vectors $X_i$ represent the data
+dependencies between the clusters. In this work, we use the GMRES method as an inner
+iteration method for solving the sub-systems~(\ref{sec03:eq03}). It is a well-known
+iterative method which gives good performances for solving sparse linear systems in
+parallel on a cluster of processors. 
+
+
+
+
  
  
  %%%%%%%%%%%%%%%%%%%%%%%%
  
  
  %%%%%%%%%%%%%%%%%%%%%%%%