+where for $j\in\{1,\ldots,s\}$, $x^j=[X_1^j,\ldots,X_L^j]$ is a
+solution of the global linear system. The advantage of such a Krylov
+subspace is that we need neither an orthogonal basis nor
+synchronizations between the different clusters to generate this
+basis.
+
+The multisplitting method is periodically restarted every $s$
+iterations with a new initial guess $\tilde{x}=S\alpha$ which
+minimizes the error function $\|b-Ax\|_2$ over the Krylov subspace
+spanned by the vectors of $S$. So, $\alpha$ is defined as the
+solution of the large overdetermined linear system
+\begin{equation}
+R\alpha=b,
+\label{sec03:eq05}
+\end{equation}
+where $R=AS$ is a dense rectangular matrix of size $n\times s$ and
+$s\ll n$. This leads us to solve the system of normal equations
+\begin{equation}
+R^TR\alpha=R^Tb,
+\label{sec03:eq06}
+\end{equation}
+which is associated with the least squares problem
+\begin{equation}
+\text{minimize}~\|b-R\alpha\|_2,
+\label{sec03:eq07}
+\end{equation}
+where $R^T$ denotes the transpose of the matrix $R$. Since $R$ (i.e.
+$AS$) and $b$ are split among $L$ clusters, the symmetric positive
+definite system~(\ref{sec03:eq06}) is solved in parallel. Thus, an
+iterative method would be more appropriate than a direct one to solve
+this system. We use the parallel conjugate gradient method for the
+normal equations CGNR~\cite{S96,refCGNR}.
+
+\begin{algorithm}[!t]
+\caption{A two-stage linear solver with inner iteration GMRES method}
+\begin{algorithmic}[1]
+\Input $A_l$ (local sparse matrix), $B_l$ (local right-hand side), $x^0$ (initial guess)
+\Output $X_l$ (local solution vector)\vspace{0.2cm}
+\State Load $A_l$, $B_l$, $x^0$
+\State Initialize the minimizer $\tilde{x}^0=x^0$
+\For {$k=1,2,3,\ldots$ until the global convergence}
+\State Restart with $x^0=\tilde{x}^{k-1}$: \textbf{for} $j=1,2,\ldots,s$ \textbf{do}
+\State\hspace{0.5cm} Inner iteration solver: \Call{InnerSolver}{$x^0$, $j$}
+\State\hspace{0.5cm} Construct the basis $S$: add the column vector $X_l^j$ to the matrix $S_l^k$
+\State\hspace{0.5cm} Exchange the local solution vector $X_l^j$ with the neighboring clusters
+\State\hspace{0.5cm} Compute the dense matrix $R$: $R_l^{k,j}=\sum^L_{i=1}A_{li}X_i^j$
+\State\textbf{end for}
+\State Minimization $\|b-R\alpha\|_2$: \Call{UpdateMinimizer}{$S_l$, $R$, $b$, $k$}
+\State Local solution of the linear system $Ax=b$: $X_l^k=\tilde{X}_l^k$
+\State Exchange the local minimizer $\tilde{X}_l^k$ with the neighboring clusters
+\EndFor
+
+\Statex
+
+\Function {InnerSolver}{$x^0$, $j$}
+\State Compute the local right-hand side: $Y_l = B_l - \sum^L_{i=1,i\neq l}A_{li}X_i^0$
+\State Solving the local splitting $A_{ll}X_l^j=Y_l$ using the parallel GMRES method, such that $X_l^0$ is the initial guess
+\State \Return $X_l^j$
+\EndFunction
+
+\Statex
+
+\Function {UpdateMinimizer}{$S_l$, $R$, $b$, $k$}
+\State Solving the normal equations $(R^k)^TR^k\alpha^k=(R^k)^Tb$ in parallel by $L$ clusters using the parallel CGNR method
+\State Compute the local minimizer: $\tilde{X}_l^k=S_l^k\alpha^k$
+\State \Return $\tilde{X}_l^k$
+\EndFunction
+\end{algorithmic}
+\label{algo:01}
+\end{algorithm}
+
+The main key points of the multisplitting method to solve a large
+sparse linear system are given in Algorithm~\ref{algo:01}. This
+algorithm is based on a two-stage method with a minimization using the
+GMRES iterative method as an inner solver. It is executed in parallel
+by each cluster of processors. The matrices and vectors with the
+subscript $l$ represent the local data for the cluster $l$, where
+$l\in\{1,\ldots,L\}$. The two-stage solver uses two different parallel
+iterative algorithms: the GMRES method to solve each splitting on a
+cluster of processors, and the CGNR method executed in parallel by all
+clusters to minimize the function error over the Krylov subspace
+spanned by $S$. The algorithm requires two global synchronizations
+between the $L$ clusters. The first one is performed at line~$12$ in
+Algorithm~\ref{algo:01} to exchange the local values of the vector
+solution $x$ (i.e. the minimizer $\tilde{x}$) required to restart the
+multisplitting solver. The second one is needed to construct the
+matrix $R$ of the Krylov subspace. We choose to perform this latter
+synchronization $s$ times in every outer iteration $k$ (line~$7$ in
+Algorithm~\ref{algo:01}). This is a straightforward way to compute the
+matrix-matrix multiplication $R=AS$. We implement all
+synchronizations by using the MPI collective communication
+subroutines.