+used. It is well known that the spectral radius of the matrices arising from such
+problems is very close to 1. Moreover, the larger the number of discretization
+points, the closer to 1 the spectral radius is. Hence, solving a linear system
+obtained from a 3D Poisson problem requires a large number of iterations. A
+preconditioner can reduce the number of iterations, but
+preconditioners do not scale well when many cores are used.
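+
+As an illustration only (the Jacobi splitting is an assumption made here for
+concreteness, not necessarily the iteration considered above), take the 3D
+Poisson problem discretized with the standard 7-point stencil on a uniform grid
+with $n$ interior points per direction and Dirichlet boundary conditions. The
+Jacobi iteration matrix $J$ then has spectral radius
+\[
+\rho(J) = \cos\left(\frac{\pi}{n+1}\right) \approx 1 - \frac{\pi^2}{2(n+1)^2},
+\]
+so that $1-\rho(J)$ decreases as $1/n^2$. For $n=468$, this gives
+$1-\rho(J) \approx 2.2\times 10^{-5}$.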
+
+%Doing many experiments with many cores is not easy and requires to access to a supercomputer with several hours for developing a code and then improving it.
+In the following, we present some experiments we carried out on the Hector
+architecture, the UK's high-end computing resource, funded by the UK Research
+Councils~\cite{hector}. This is a Cray XE6 supercomputer whose nodes are each equipped with two
+16-core AMD Opteron 2.3 GHz processors and 32 GB of memory. Nodes are interconnected
+through a 3D torus network.
+
+Table~\ref{tab1} shows the results of the experiments. The first column gives
+the size of the 3D Poisson problem. The size is chosen so as to have
+approximately 50,000 components per core. The second column gives the
+number of cores used, with the decomposition used for the
+Krylov multisplitting in brackets. The third and sixth columns respectively show
+the execution times of the GMRES and the Krylov multisplitting codes. The fourth
+and seventh columns give the number of iterations. For the
+multisplitting code, the total number of inner iterations is given in
+brackets. The last column gives the ratio between the GMRES and the
+multisplitting execution times. For the GMRES code (alone and in the multisplitting version), the
+restart parameter is fixed to 16. The precision of the GMRES version is fixed to
+1e-6. For the multisplitting, there are two precisions: one for the external
+solver, which is fixed to 1e-6, and another one for the inner solver (GMRES), which
+is fixed to 1e-10. Although a high precision is requested for the inner solver, we also
+fix a maximum number of iterations for each internal step; in practice, this
+limit is set to 10. An internal iteration therefore ends
+when either the precision or the maximum number of internal iterations
+is reached. The precision and the maximum number of iterations of the CGNR method are fixed to 1e-25 and 20 respectively. The size of the Krylov subspace basis $S$ is fixed to 10 vectors.
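+
+The inner stopping rule described above can be summarized as follows, where
+$r^{(k)}$ denotes the residual of the inner GMRES solve after $k$ inner
+iterations. The notation, the choice of norm and whether the residual is scaled
+are assumptions made here for illustration:
+\[
+k_{\mathrm{stop}} = \min\left\{ k \geq 0 \;:\; \|r^{(k)}\| \leq 10^{-10} \ \mbox{ or } \ k = 10 \right\}.
+\]
+With these settings, each inner solve performs at most 10 iterations, whatever
+precision has been reached.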
+
+\begin{table}[htbp]
+\begin{center}
+\begin{changemargin}{-1.8cm}{0cm}
+\begin{small}
+\begin{tabular}{|c|c||c|c|c||c|c|c||c|}
+\hline
+\multirow{2}{*}{Problem size}&\multirow{2}{*}{Nb. cores} & \multicolumn{3}{c||}{GMRES} & \multicolumn{3}{c||}{Krylov Multisplitting} & \multirow{2}{*}{Ratio}\\
+ \cline{3-8}
+ & & Time (s) & Nb. iter. & $\Delta$ & Time (s) & Nb. iter. & $\Delta$ & \\
+\hline
+$468^3$ & 2,048 (2x1,024) & 299.7 & 41,028 & 5.02e-08 & 48.4 & 691(6,146) & 8.24e-08 & 6.19 \\
+\hline
+$590^3$ & 4,096 (2x2,048) & 433.1 & 55,494 & 4.92e-07 & 74.1 & 1,101(8,211) & 6.62e-08 & 5.84 \\
+\hline
+$743^3$ & 8,192 (2x4,096) & 704.4 & 87,822 & 4.80e-07 & 151.2 & 3,061(14,914) & 5.87e-08 & 4.65 \\
+\hline
+$743^3$ & 8,192 (4x2,048) & 704.4 & 87,822 & 4.80e-07 & 110.3 & 1,531(12,721) & 1.47e-07& 6.39 \\
+\hline
+
+\end{tabular}
+\caption{Results of the GMRES and Krylov multisplitting codes for the 3D Poisson problem}
+\label{tab1}
+\end{small}
+\end{changemargin}
+\end{center}
+\end{table}
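+
+The problem sizes of Table~\ref{tab1} can be checked against the target of
+approximately 50,000 components per core by dividing the number of unknowns by
+the number of cores:
+\[
+\frac{468^3}{2{,}048} \approx 50{,}050, \qquad
+\frac{590^3}{4{,}096} \approx 50{,}141, \qquad
+\frac{743^3}{8{,}192} \approx 50{,}070.
+\]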
+
+
+From these experiments, it can be observed that the multisplitting version is
+always faster than the GMRES version. The speedup of the
+multisplitting version over GMRES ranges between 4.65 and 6.39. It can be noticed that the number of
+iterations is drastically reduced with the multisplitting version, even though it remains
+non-negligible. Moreover, with 8,192 cores, we can see that using 4 clusters gives
+better performance than using only 2 clusters. In fact, we can notice that the
+precision with 2 clusters is slightly better, but in both cases the precision is
+below the specified threshold.
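+
+The last column of Table~\ref{tab1} is consistent with the ratio of the GMRES
+execution time to the Krylov multisplitting execution time. The symbols
+$T_{\mathrm{GMRES}}$ and $T_{\mathrm{MS}}$ are introduced here only for
+illustration. For the first and last rows of the table, respectively,
+\[
+\frac{T_{\mathrm{GMRES}}}{T_{\mathrm{MS}}} = \frac{299.7}{48.4} \approx 6.19
+\qquad \mbox{and} \qquad
+\frac{T_{\mathrm{GMRES}}}{T_{\mathrm{MS}}} = \frac{704.4}{110.3} \approx 6.39.
+\]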