X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/Krylov_multi.git/blobdiff_plain/fd654bb9a37efb3530aed230af886466bbb7f42a..662f69e0fc2f13849c19868a71423817e6305cf9:/krylov_multi.tex?ds=inline

diff --git a/krylov_multi.tex b/krylov_multi.tex
index 92730ac..9e423d1 100644
--- a/krylov_multi.tex
+++ b/krylov_multi.tex
@@ -17,10 +17,8 @@
 \newcommand{\Prec}{\mathit{prec}}
 \newcommand{\Ratio}{\mathit{Ratio}}
 
-%\usepackage{xspace}
-%\usepackage[textsize=footnotesize]{todonotes}
-%\newcommand{\LZK}[2][inline]{%
-%\todo[color=green!40,#1]{\sffamily\textbf{LZK:} #2}\xspace}
+\def\changemargin#1#2{\list{}{\rightmargin#2\leftmargin#1}\item[]}
+\let\endchangemargin=\endlist
 
 \title{A scalable multisplitting algorithm for solving large sparse linear systems} 
 \date{}
@@ -67,19 +65,41 @@ Preconditioners can be  used in order to increase  the convergence of iterative
 solvers.   However, most  of the  good preconditioners  are not  scalable when
 thousands of cores are used.
 
-Traditional iterative  solvers have  global synchronizations that  penalize the
-scalability.   Two  possible solutions  consists  either  in using  asynchronous
-iterative  methods~\cite{ref18} or  to  use multisplitting  algorithms. In  this
-paper, we will  reconsider the use of a multisplitting  method. In opposition to
-traditional  multisplitting  method  that  suffer  from  slow  convergence,  as
-proposed  in~\cite{huang1993krylov},  the  use  of a  minimization  process  can
-drastically improve the convergence.
+%Traditional iterative  solvers have  global synchronizations that  penalize the
+%scalability.   Two  possible solutions  consists  either  in using  asynchronous
+%iterative  methods~\cite{ref18} or  to  use multisplitting  algorithms. In  this
+%paper, we will  reconsider the use of a multisplitting  method. In opposition to
+%traditional  multisplitting  method  that  suffer  from  slow  convergence,  as
+%proposed  in~\cite{huang1993krylov},  the  use  of a  minimization  process  can
+%drastically improve the convergence.
+
+Traditional parallel iterative solvers are based on fine-grain computations that
+frequently  require  data exchanges  between  computing  nodes  and have  global
+synchronizations  that penalize  the  scalability. Particularly,  they are  more
+penalized on large  scale architectures or on distributed  platforms composed of
+distant  clusters interconnected  by  a high-latency  network.  It is  therefore
+imperative to develop coarse-grain based algorithms to reduce the communications
+in the  parallel iterative  solvers. Two possible  solutions consists  either in
+using  asynchronous  iterative  methods~\cite{ref18}  or to  use  multisplitting
+algorithms.  In this  paper,  we will  reconsider  the use  of a  multisplitting
+method. In opposition to traditional multisplitting method that suffer from slow
+convergence, as  proposed in~\cite{huang1993krylov},  the use of  a minimization
+process can drastically improve the convergence.
+
+The present paper is  organized as follows. First, Section~\ref{sec:02} presents
+some  related  works and  the  principle  of  multisplitting methods.  Then,  in
+Section~\ref{sec:03}  is presented  the algorithm  of our  Krylov multisplitting
+method based  on inner-outer  iterations. Finally, in  Section~\ref{sec:04}, the
+parallel experiments on Hector architecture  show the performances of the Krylov
+multisplitting algorithm compared to the classical GMRES algorithm to solve a 3D
+Poisson problem.
 
 
 %%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%
 
-\section{Related works and presention of the multisplitting method}
+\section{Related works and presentation of the multisplitting method}
+\label{sec:02}
 A general framework  for studying parallel multisplitting has  been presented in~\cite{o1985multi}
 by O'Leary and White. Convergence conditions are given for the
 most general case.  Many authors improved multisplitting algorithms by proposing
@@ -142,6 +162,7 @@ In the case where the diagonal weighting matrices $E_\ell$ have only zero and on
 %%%%%%%%%%%%%%%%%%%%%%%%
 
 \section{A two-stage method with a minimization}
+\label{sec:03}
 Let $Ax=b$ be a given large and sparse linear system of $n$ equations to solve in parallel on $L$ clusters of processors, physically adjacent or geographically distant, where $A\in\mathbb{R}^{n\times n}$ is a square and non-singular matrix, $x\in\mathbb{R}^{n}$ is the solution vector and $b\in\mathbb{R}^{n}$ is the right-hand side vector. The multisplitting of this linear system is defined as follows
 \begin{equation}
 \left\{
@@ -253,6 +274,7 @@ The main key points of our Krylov multisplitting method to solve a large sparse
 %%%%%%%%%%%%%%%%%%%%%%%%
 
 \section{Experiments}
+\label{sec:04}
 In order to illustrate  the interest  of our algorithm. We have  compared our
 algorithm  with  the  GMRES  method  which  is a very  well  used  method  in  many
 situations.  We have chosen to focus on only one problem which is very simple to
@@ -277,10 +299,10 @@ preconditioners are not scalable when using many cores.
 
 %Doing many experiments  with many cores is  not easy and requires to  access to a supercomputer  with several  hours for  developing  a code  and then  improving it. 
 In the following we present some experiments we could achieved out on the Hector
-architecture, the  previous UK's high-end  computing resource, funded by  the UK
-Research Councils. This  is a Cray XE6 supercomputer,  equipped with two 16-core
-AMD Opteron 2.3 Ghz  and 32 GB of memory. Machines are  interconnected with a 3D
-torus.
+architecture,  a UK's  high-end computing  resource, funded  by the  UK Research
+Councils~\cite{hector}.  This is  a Cray  XE6 supercomputer,  equipped  with two
+16-core AMD  Opteron 2.3 Ghz  and 32 GB  of memory. Machines  are interconnected
+with a 3D torus.
 
 Table~\ref{tab1} shows  the result of  the experiments.  The first  column shows
 the  size of  the  3D Poisson  problem.  The size  is chosen  in  order to  have
@@ -302,6 +324,8 @@ is reached. The precision and the maximum number of iterations of CGNR method ar
 
 \begin{table}[htbp]
 \begin{center}
+\begin{changemargin}{-1.4cm}{0cm}
+\begin{footnotesize}
 \begin{tabular}{|c|c||c|c|c||c|c|c||c|} 
 \hline
 \multirow{2}{*}{Pb size}&\multirow{2}{*}{Nb. cores} &  \multicolumn{3}{c||}{GMRES} &  \multicolumn{3}{c||}{Krylov Multisplitting} & \multirow{2}{*}{Ratio}\\
@@ -320,6 +344,38 @@ $743^3$ & 8,192 (4x2,048)        & 704.4     & 87,822    & 4.80e-07 &  110.3   &
 \end{tabular}
 \caption{Results}
 \label{tab1}
+\end{footnotesize}
+\end{changemargin}
+\end{center}
+\end{table}
+
+
+
+
+\begin{table}[htbp]
+\begin{center}
+\begin{changemargin}{-1.8cm}{0cm}
+\begin{small}
+\begin{tabular}{|c|c||c|c|c||c|c|c||c|} 
+\hline
+\multirow{2}{*}{Pb size}&\multirow{2}{*}{Nb. cores} &  \multicolumn{3}{c||}{GMRES} &  \multicolumn{3}{c||}{Krylov Multisplitting} & \multirow{2}{*}{Ratio}\\
+ \cline{3-8}
+           &                   &  Time (s) & nb Iter. & $\Delta$  &   Time (s)& nb Iter. & $\Delta$ & \\
+\hline
+$468^3$ & 2,048 (2x1,024)        &  299.7    & 41,028    & 5.02e-8  &  48.4    & 691(6,146) & 8.24e-08  & 6.19   \\
+\hline
+$590^3$ & 4,096 (2x2,048)        &  433.1    & 55,494    & 4.92e-7  &  74.1    & 1,101(8,211) & 6.62e-08  & 5.84   \\
+\hline
+$743^3$ & 8,192 (2x4,096)        & 704.4     & 87,822    & 4.80e-07 &  151.2   & 3,061(14,914) & 5.87e-08 & 4.65    \\
+\hline
+$743^3$ & 8,192 (4x2,048)        & 704.4     & 87,822    & 4.80e-07 &  110.3   & 1,531(12,721) & 1.47e-07& 6.39  \\
+\hline
+
+\end{tabular}
+\caption{Results}
+\label{tab1}
+\end{small}
+\end{changemargin}
 \end{center}
 \end{table}
 
@@ -366,6 +422,8 @@ intend  to investigate  the  convergence  improvements of  our  method by  using
 preconditioning  techniques  for  Krylov  iterative methods  and  multisplitting
 methods with overlapping blocks.
 
+\section{Acknowledgement}
+The authors would like to thank Mark Bull of the EPCC his fruitful remarks and the facilities of HECToR.
 
 %Other applications (=> other matrices)\\
 %Larger experiments\\