+\documentclass{article}
+\usepackage[utf8]{inputenc}
+\usepackage{amsfonts,amssymb}
+\usepackage{amsmath}
+\usepackage{graphicx}
+
+\title{A scalable multisplitting algorithm for solving large sparse linear systems}
+
+
+
+\begin{document}
+\author{Raphaël Couturier \and Lilia Ziane Khodja}
+
+\maketitle
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%
+
+
+\begin{abstract}
+In this paper we revisit the Krylov multisplitting algorithm presented in
+\cite{huang1993krylov}, which uses a sequential method to minimize over the
+Krylov iterates computed by a multisplitting algorithm. Our new algorithm is
+based on a parallel multisplitting algorithm with few blocks of large size,
+using a parallel GMRES method inside each block, and on a parallel Krylov
+minimization in order to improve the convergence. Some large scale experiments
+with a 3D Poisson problem are presented. They show the improvements obtained
+compared to a classical GMRES method, both in terms of number of iterations and
+execution time.
+\end{abstract}
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%
+
+
+\section{Introduction}
+
+Iterative methods are used to solve large sparse linear systems of equations of
+the form $Ax=b$ because they are easier to parallelize than direct ones. Many
+iterative methods have been proposed and adapted by different researchers. For
+example, the GMRES method and the Conjugate Gradient method are very well known
+and widely used~\cite{S96}. Both methods are based on the Krylov subspace,
+which consists in forming a basis from the sequence of successive matrix powers
+applied to the initial residual.
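+More precisely, given an initial guess $x^0$ with residual $r^0=b-Ax^0$,
+these methods search for an approximate solution at step $m$ in the Krylov
+subspace
+\begin{equation*}
+\mathcal{K}_m(A,r^0)=\mbox{span}\{r^0, Ar^0, A^2r^0, \ldots, A^{m-1}r^0\}.
+\end{equation*}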
+
+When solving large linear systems with many cores, iterative methods often
+suffer from scalability problems. This is due to their need for collective
+communications to perform matrix-vector products and reduction operations.
+Preconditioners can be used in order to accelerate the convergence of iterative
+solvers. However, most of the good preconditioners are not scalable when
+thousands of cores are used.
+
+
+
+
+
+
+%%%%%%%%%%%%%%%%%%%%%%%
+%% BEGIN
+%%%%%%%%%%%%%%%%%%%%%%%
+The key idea of the multisplitting method for solving a large system
+of linear equations $Ax=b$ consists in partitioning the matrix $A$ in
+$L$ different ways
+\begin{equation}
+A = M_l - N_l,~l\in\{1,\ldots,L\},
+\label{eq01}
+\end{equation}
+where the $M_l$ are nonsingular matrices. Then the linear system is solved
+iteratively based on the multisplittings as follows
+\begin{equation}
+x^{k+1}=\displaystyle\sum^L_{l=1} E_l M^{-1}_l (N_l x^k + b),~k=1,2,3,\ldots
+\label{eq02}
+\end{equation}
+where the $E_l$ are non-negative diagonal weighting matrices such that
+$\sum^L_{l=1}E_l=I$ ($I$ is the identity matrix). Thus the convergence
+of such a method depends on the condition
+\begin{equation}
+\rho(\displaystyle\sum^L_{l=1}E_l M^{-1}_l N_l)<1.
+\label{eq03}
+\end{equation}
+
+The advantage of the multisplitting method is that at each iteration
+$k$ there are $L$ different linear sub-systems
+\begin{equation}
+v_l^k=M^{-1}_l N_l x^{k-1} + M^{-1}_l b,~l\in\{1,\ldots,L\},
+\label{eq04}
+\end{equation}
+to be solved independently by a direct or an iterative method, where
+$v_l^k$ is the solution of the local sub-system. Thus, the
+calculations of $v_l^k$ may be performed in parallel by a set of
+processors. A multisplitting method using an iterative method for
+solving the $L$ linear sub-systems is called an inner-outer iterative
+method or a two-stage method. The results $v_l^k$ obtained from the
+different splittings~(\ref{eq04}) are combined to compute the solution
+$x^k$ of the linear system by using the diagonal weighting matrices
+\begin{equation}
+x^k = \displaystyle\sum^L_{l=1} E_l v_l^k.
+\label{eq05}
+\end{equation}
+In the case where the diagonal weighting matrices $E_l$ contain only zero
+and one entries (i.e. the $v_l^k$ are disjoint vectors), the
+multisplitting method is non-overlapping and corresponds to the block
+Jacobi method.
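+
+To illustrate, the following minimal sketch (in Python with NumPy; the test
+matrix, the two splittings and the weighting matrices are arbitrary
+illustrative choices) implements iteration~(\ref{eq02}) for $L=2$ and checks
+the convergence condition~(\ref{eq03}):
+\begin{verbatim}
+import numpy as np
+
+# Small diagonally dominant test system Ax = b (hypothetical example).
+n = 6
+A = (4.0 * np.eye(n) + np.diag(-np.ones(n - 1), 1)
+     + np.diag(-np.ones(n - 1), -1))
+b = np.ones(n)
+
+# Two splittings A = M_l - N_l (an arbitrary illustrative choice):
+# M_1 keeps the diagonal and superdiagonal, M_2 the diagonal and subdiagonal.
+M1 = np.diag(np.diag(A)) + np.diag(np.diag(A, 1), 1)
+M2 = np.diag(np.diag(A)) + np.diag(np.diag(A, -1), -1)
+N1, N2 = M1 - A, M2 - A
+
+# Non-negative diagonal weighting matrices with E_1 + E_2 = I.
+E1 = np.diag([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
+E2 = np.eye(n) - E1
+
+# Convergence condition: spectral radius of sum_l E_l M_l^{-1} N_l below 1.
+H = E1 @ np.linalg.solve(M1, N1) + E2 @ np.linalg.solve(M2, N2)
+print("spectral radius:", max(abs(np.linalg.eigvals(H))))
+
+# Multisplitting iteration: x^{k+1} = sum_l E_l M_l^{-1} (N_l x^k + b).
+x = np.zeros(n)
+for k in range(100):
+    v1 = np.linalg.solve(M1, N1 @ x + b)  # local solution, splitting 1
+    v2 = np.linalg.solve(M2, N2 @ x + b)  # local solution, splitting 2
+    x = E1 @ v1 + E2 @ v2                 # weighted combination
+print("residual norm:", np.linalg.norm(b - A @ x))
+\end{verbatim}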
+%%%%%%%%%%%%%%%%%%%%%%%
+%% END
+%%%%%%%%%%%%%%%%%%%%%%%
+
+\section{Related works}
+
+
+A general framework for studying parallel multisplitting methods has been
+presented in \cite{o1985multi} by O'Leary and White. Convergence conditions are
+given for the most general case. Many authors have improved multisplitting
+algorithms by proposing, for example, an asynchronous version
+\cite{bru1995parallel} and convergence conditions for this case
+\cite{bai1999block,bahi2000asynchronous}, or other two-stage
+algorithms~\cite{frommer1992h,bru1995parallel}.
+
+In \cite{huang1993krylov}, the authors proposed a parallel multisplitting
+algorithm in which all the tasks except one are devoted to solving a sub-block
+of the splitting and send their local solution to the first task, which is in
+charge of combining the vectors at each iteration. These vectors form a Krylov
+basis over which the first task minimizes the error function in order to
+accelerate the convergence; the other tasks then receive the updated solution
+until convergence of the global system.
+
+
+
+In \cite{couturier2008gremlins}, the authors proposed practical implementations
+of multisplitting algorithms to solve large scale linear systems. The inner
+solvers can be based on a sequential direct method (LU) or on a sequential
+iterative one (GMRES).
+
+In~\cite{prace-multi}, the authors proposed a parallel multisplitting algorithm
+in which large blocks are solved using a GMRES solver. The authors performed
+large scale experiments with up to 32,768 cores and concluded that an
+asynchronous multisplitting algorithm could be more efficient than traditional
+solvers on exascale architectures with hundreds of thousands of cores.
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%
+
+
+\section{A two-stage method with a minimization}
+Let $Ax=b$ be a given large and sparse linear system of $n$ equations
+to be solved in parallel on $L$ clusters, physically adjacent or
+geographically distant, where $A\in\mathbb{R}^{n\times n}$ is a square
+and nonsingular matrix, $x\in\mathbb{R}^{n}$ is the solution vector
+and $b\in\mathbb{R}^{n}$ is the right-hand side vector. The
+multisplitting of this linear system is defined as follows:
+\begin{equation}
+\left\{
+\begin{array}{lll}
+A & = & [A_{1}, \ldots, A_{L}]\\
+x & = & [X_{1}, \ldots, X_{L}]\\
+b & = & [B_{1}, \ldots, B_{L}]
+\end{array}
+\right.
+\label{sec03:eq01}
+\end{equation}
+where for $l\in\{1,\ldots,L\}$, $A_l$ is a rectangular block of size
+$n_l\times n$ and $X_l$ and $B_l$ are sub-vectors of size $n_l$, such
+that $\sum_ln_l=n$. In this case, we use a row-by-row splitting
+without overlapping in such a way that successive rows of the sparse
+matrix $A$ and both vectors $x$ and $b$ are assigned to one cluster.
+So, the multisplitting format of the linear system is defined as
+follows:
+\begin{equation}
+\forall l\in\{1,\ldots,L\} \mbox{,~} \displaystyle\sum_{i=1}^{l-1}A_{li}X_i + A_{ll}X_l + \displaystyle\sum_{i=l+1}^{L}A_{li}X_i = B_l,
+\label{sec03:eq02}
+\end{equation}
+where $A_{li}$ is a block of size $n_l\times n_i$ of the rectangular
+matrix $A_l$, $X_i$ ($i\neq l$) is a sub-vector of size $n_i$ of the
+solution vector $x$, and $\sum_{i\neq l}n_i+n_l=n$.
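+For instance, with $L=2$ clusters the multisplitting
+format~(\ref{sec03:eq02}) reduces to the two coupled block equations
+\begin{equation*}
+\left\{
+\begin{array}{l}
+A_{11}X_1 + A_{12}X_2 = B_1,\\
+A_{21}X_1 + A_{22}X_2 = B_2,
+\end{array}
+\right.
+\end{equation*}
+where the first cluster holds the block row $A_1=[A_{11}, A_{12}]$ together
+with $X_1$ and $B_1$, and the second cluster holds $A_2=[A_{21}, A_{22}]$
+together with $X_2$ and $B_2$.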
+
+The multisplitting method proceeds by iteration to solve the linear
+system in such a way that each sub-system
+\begin{equation}
+\left\{
+\begin{array}{l}
+A_{ll}X_l = Y_l \mbox{,~such that}\\
+Y_l = B_l - \displaystyle\sum_{i=1,i\neq l}^{L}A_{li}X_i,
+\end{array}
+\right.
+\label{sec03:eq03}
+\end{equation}
+is solved independently by a cluster of processors and communications
+are required to update the right-hand side vectors $Y_l$, such that
+the vectors $X_i$ represent the data dependencies between the
+clusters. In this work, we use the GMRES method as an inner iteration
+method for solving the sub-systems~(\ref{sec03:eq03}). It is a
+well-known iterative method which gives good performance for solving
+sparse linear systems in parallel on a cluster of processors.
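+
+As an illustration, the following minimal sketch (in Python with SciPy; the
+sparse test matrix and the block sizes are hypothetical, and the $L$ clusters
+are simulated sequentially on a single machine) updates the right-hand sides
+$Y_l$ and solves the sub-systems~(\ref{sec03:eq03}) with restarted GMRES:
+\begin{verbatim}
+import numpy as np
+import scipy.sparse as sp
+from scipy.sparse.linalg import gmres
+
+# Hypothetical sparse test system, partitioned row-by-row into L blocks.
+n, L = 100, 2
+A = (sp.random(n, n, density=0.05, random_state=0) + 4.0 * sp.eye(n)).tocsr()
+b = np.ones(n)
+sizes = [n // L] * L                    # block sizes n_l (equal here)
+offsets = np.cumsum([0] + sizes)        # row range of each cluster
+
+x = np.zeros(n)
+for outer in range(20):                 # outer multisplitting iterations
+    x_new = x.copy()
+    for l in range(L):                  # each cluster, simulated in turn
+        rows = slice(offsets[l], offsets[l + 1])
+        A_l = A[rows, :]                # rectangular block A_l (n_l x n)
+        A_ll = A_l[:, rows]             # square diagonal block A_{ll}
+        # Y_l = B_l - sum_{i != l} A_{li} X_i: remove the contribution
+        # of the unknowns held by the other clusters.
+        y_l = b[rows] - (A_l @ x - A_ll @ x[rows])
+        # Inner iterations: solve A_{ll} X_l = Y_l with restarted GMRES.
+        x_new[rows], info = gmres(A_ll, y_l, x0=x[rows], restart=30)
+    x = x_new
+print("residual norm:", np.linalg.norm(b - A @ x))
+\end{verbatim}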
+
+It should be noted that the convergence of the inner iterative solver
+for the different linear sub-systems~(\ref{sec03:eq03}) does not
+necessarily imply the convergence of the multisplitting method. It
+strongly depends on the properties of the sparse linear system to be
+solved and the computing
+environment~\cite{o1985multi,ref18}. Furthermore, the multisplitting
+of the linear system among several clusters of processors increases
+the spectral radius of the iteration matrix, thereby slowing the
+convergence. In this paper, we build on the work presented
+in~\cite{huang1993krylov} to accelerate the convergence and improve the
+scalability of the multisplitting methods.
+
+In order to accelerate the convergence, we implement the outer
+iteration of the multisplitting solver as a Krylov subspace method
+which minimizes some error function over a Krylov subspace~\cite{S96}.
+The Krylov subspace we use is spanned by a basis composed of
+successive solutions obtained by solving the $L$
+splittings~(\ref{sec03:eq03})
+\begin{equation}
+S=\{x^1,x^2,\ldots,x^s\},~s\leq n,
+\label{sec03:eq04}
+\end{equation}
+where for $k\in\{1,\ldots,s\}$, $x^k=[X_1^k,\ldots,X_L^k]$ is a
+solution of the global linear system.
+The advantage of such a Krylov subspace is that we need neither an
+orthogonal basis nor synchronizations between the different clusters
+to generate this basis.
+
+The multisplitting method is periodically restarted every $s$ iterations
+with a new initial guess $\tilde{x}=S\alpha$, where $S$ denotes the $n\times s$
+matrix whose columns are the vectors $x^1,\ldots,x^s$ and $\alpha$ minimizes
+the error function $\|b-Ax\|_2$ over the Krylov subspace spanned by these
+vectors. So, $\alpha$ is defined as the solution of the large overdetermined
+linear system
+\begin{equation}
+B\alpha=b,
+\label{sec03:eq05}
+\end{equation}
+where $B=AS$ is a dense rectangular matrix of size $n\times s$ and $s\ll n$. This leads
+us to solve the system of normal equations
+\begin{equation}
+B^TB\alpha=B^Tb,
+\label{sec03:eq06}
+\end{equation}
+which is associated with the least squares problem
+\begin{equation}
+\text{minimize}~\|b-B\alpha\|_2,
+\label{sec03:eq07}
+\end{equation}
+where $B^T$ denotes the transpose of the matrix $B$. Since $B$ (i.e. $AS$) and
+$b$ are split among $L$ clusters, the symmetric positive definite system~(\ref{sec03:eq06})
+is solved in parallel. Thus, an iterative method would be more appropriate than
+a direct one for solving this system. We use the parallel conjugate gradient
+method applied to the normal equations, known as CGNR~\cite{S96,refCGNR}.
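+
+A minimal sketch of this restart procedure is given below (in Python with
+NumPy/SciPy; the matrix $S$ of iterates is replaced by random vectors, and
+the sequential conjugate gradient solver stands in for the parallel CGNR):
+\begin{verbatim}
+import numpy as np
+from scipy.sparse.linalg import cg
+
+def restart_guess(A, b, S):
+    # New initial guess x~ = S @ alpha, where alpha minimizes
+    # ||b - (AS) alpha||_2 through the normal equations (CGNR).
+    B = A @ S                           # dense n x s matrix, s << n
+    alpha, info = cg(B.T @ B, B.T @ b)  # conjugate gradient on B^T B
+    return S @ alpha
+
+# Hypothetical usage: S should hold the last s multisplitting iterates
+# as columns; random vectors stand in for them here.
+n, s = 100, 5
+rng = np.random.default_rng(0)
+A = 4.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
+b = np.ones(n)
+S = rng.standard_normal((n, s))
+x_tilde = restart_guess(A, b, S)
+print("residual norm:", np.linalg.norm(b - A @ x_tilde))
+\end{verbatim}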
+
+%%% Write the algorithm(s)
+%%% Discuss the synchronizations between processors and clusters
+
+
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%
+
+\bibliographystyle{plain}
+\bibliography{biblio}
+
+\end{document}