+
+In Table~\ref{tab:03}, we can notice that TSIRM is always faster than FGMRES. The last
+column shows the ratio between FGMRES and the best version of TSIRM according to
+the minimization procedure: CGLS or LSQR. Even if we have computed the worst
+case between CGLS and LSQR, it is clear that TSIRM is always faster than
+FGMRES. For this example, the multigrid preconditioner is faster than SOR. The
+gain between TSIRM and FGMRES is more or less similar for the two
+preconditioners. Looking at the number of iterations to reach the convergence,
+it is obvious that TSIRM allows the reduction of the number of iterations. It
+should be noticed that for TSIRM, in those experiments, only the iterations of
+the Krylov solver are taken into account. Iterations of CGLS or LSQR were not
+recorded but they are time-consuming. In general each $max\_iter_{kryl}*s$ which
+corresponds to 30*12, there are $max\_iter_{ls}$ which corresponds to 15.
+
+\begin{figure}[htbp]
+\centering
+ \includegraphics[width=0.45\textwidth]{nb_iter_sec_ex15_juqueen}
+\caption{Number of iterations per second with ex15 and the same parameters than in Table~\ref{tab:03} (weak scaling)}
+\label{fig:01}
+\end{figure}
+
+
+In Figure~\ref{fig:01}, the number of iterations per second corresponding to
+Table~\ref{tab:03} is displayed. It can be noticed that the number of
+iterations per second of FMGRES is constant whereas it decreases with TSIRM with
+both preconditioners. This can be explained by the fact that when the number of
+cores increases the time for the least-squares minimization step also increases but, generally,
+when the number of cores increases, the number of iterations to reach the
+threshold also increases, and, in that case, TSIRM is more efficient to reduce
+the number of iterations. So, the overall benefit of using TSIRM is interesting.
+
+
+
+
+
+
+\begin{table*}[htbp]
+\begin{center}
+\begin{tabular}{|r|r|r|r|r|r|r|r|r|}
+\hline
+
+ nb. cores & threshold & \multicolumn{2}{c|}{FGMRES} & \multicolumn{2}{c|}{TSIRM CGLS} & \multicolumn{2}{c|}{TSIRM LSQR} & best gain \\
+\cline{3-8}
+ & & Time & \# Iter. & Time & \# Iter. & Time & \# Iter. & \\\hline \hline
+ 2,048 & 8e-5 & 108.88 & 16,560 & 23.06 & 3,630 & 22.79 & 3,630 & 4.77 \\
+ 2,048 & 6e-5 & 194.01 & 30,270 & 35.50 & 5,430 & 27.74 & 4,350 & 6.99 \\
+ 4,096 & 7e-5 & 160.59 & 22,530 & 35.15 & 5,130 & 29.21 & 4,350 & 5.49 \\
+ 4,096 & 6e-5 & 249.27 & 35,520 & 52.13 & 7,950 & 39.24 & 5,790 & 6.35 \\
+ 8,192 & 6e-5 & 149.54 & 17,280 & 28.68 & 3,810 & 29.05 & 3,990 & 5.21 \\
+ 8,192 & 5e-5 & 785.04 & 109,590 & 76.07 & 10,470 & 69.42 & 9,030 & 11.30 \\
+ 16,384 & 4e-5 & 718.61 & 86,400 & 98.98 & 10,830 & 131.86 & 14,790 & 7.26 \\
+\hline
+
+\end{tabular}
+\caption{Comparison of FGMRES and TSIRM with FGMRES algorithms for ex54 of Petsc (both with the MG preconditioner) with 25,000 components per core on Curie (restart=30, s=12), time is expressed in seconds.}
+\label{tab:04}
+\end{center}
+\end{table*}
+
+
+In Table~\ref{tab:04}, some experiments with example ex54 on the Curie
+architecture are reported. For this application, we fixed $\alpha=0.6$. As it
+can be seen in that Table, the size of the problem has a strong influence on the
+number of iterations to reach the convergence. That is why we have preferred to
+change the threshold. If we set it to $1e-3$ as with the previous application,
+only one iteration is necessray to reach the convergence. So Table~\ref{tab:04}
+shows the results of differents executions with differents number of cores and
+differents thresholds. As with the previous example, we can observe that TSIRM
+is faster than FGMRES. The ratio greatly depends on the number of iterations for
+FMGRES to reach the threshold. The greater the number of iterations to reach the
+convergence is, the better the ratio between our algorithm and FMGRES is. This
+experiment is also a weak scaling with approximately $25,000$ components per
+core. It can also be observed that the difference between CGLS and LSQR is not
+significant. Both can be good but it seems not possible to know in advance which
+one will be the best.
+
+Table~\ref{tab:05} show a strong scaling experiment with the exemple ex54 on the
+Curie architecture. So in this case, the number of unknownws is fixed to
+$204,919,225$ and the number of cores ranges from $512$ to $8192$ with the power
+of two. The threshold is fixed to $5e-5$ and only the $mg$ preconditioner has
+been tested. Here again we can see that TSIRM is faster that FGMRES. Efficiecy
+of each algorithms is reported. It can be noticed that FGMRES is more efficient
+than TSIRM except with $8,192$ cores and that its efficiency is greater that one
+whereas the efficiency of TSIRM is lower than one. Nevertheless, the ratio of
+TSIRM with any version of the least-squares method is always faster. With
+$8,192$ cores when the number of iterations is far more important for FGMRES, we
+can see that it is only slightly more important for TSIRM.
+
+In Figure~\ref{fig:02} we report the number of iterations per second for
+experiments reported in Table~\ref{tab:05}. This Figure highlights that the
+number of iterations per seconds is more of less the same for FGMRES and TSIRM
+with a little advantage for FGMRES. It can be explained by the fact that, as we
+have previously explained, that the iterations of the least-sqaure steps are not
+taken into account with TSIRM.
+
+\begin{table*}[htbp]
+\begin{center}
+\begin{tabular}{|r|r|r|r|r|r|r|r|r|r|r|}
+\hline
+
+ nb. cores & \multicolumn{2}{c|}{FGMRES} & \multicolumn{2}{c|}{TSIRM CGLS} & \multicolumn{2}{c|}{TSIRM LSQR} & best gain & \multicolumn{3}{c|}{efficiency} \\
+\cline{2-7} \cline{9-11}
+ & Time & \# Iter. & Time & \# Iter. & Time & \# Iter. & & FGMRES & TS CGLS & TS LSQR\\\hline \hline
+ 512 & 3,969.69 & 33,120 & 709.57 & 5,790 & 622.76 & 5,070 & 6.37 & 1 & 1 & 1 \\
+ 1024 & 1,530.06 & 25,860 & 290.95 & 4,830 & 307.71 & 5,070 & 5.25 & 1.30 & 1.21 & 1.01 \\
+ 2048 & 919.62 & 31,470 & 237.52 & 8,040 & 194.22 & 6,510 & 4.73 & 1.08 & .75 & .80\\
+ 4096 & 405.60 & 28,380 & 111.67 & 7,590 & 91.72 & 6,510 & 4.42 & 1.22 & .79 & .84 \\
+ 8192 & 785.04 & 109,590 & 76.07 & 10,470 & 69.42 & 9,030 & 11.30 & .32 & .58 & .56 \\
+
+\hline
+
+\end{tabular}
+\caption{Comparison of FGMRES and TSIRM with FGMRES for ex54 of Petsc (both with the MG preconditioner) with 204,919,225 components on Curie with different number of cores (restart=30, s=12, threshold 5e-5), time is expressed in seconds.}
+\label{tab:05}
+\end{center}
+\end{table*}
+
+\begin{figure}[htbp]
+\centering
+ \includegraphics[width=0.45\textwidth]{nb_iter_sec_ex54_curie}
+\caption{Number of iterations per second with ex54 and the same parameters than in Table~\ref{tab:05} (strong scaling)}
+\label{fig:02}
+\end{figure}
+
+
+Concerning the experiments some other remarks are interesting.
+\begin{itemize}
+\item We can tested other examples of PETSc (ex29, ex45, ex49). For all these
+ examples, we also obtained similar gain between GMRES and TSIRM but those
+ examples are not scalable with many cores. In general, we had some problems
+ with more than $4,096$ cores.
+\item We have tested many iterative solvers available in PETSc. In fast, it is
+ possible to use most of them with TSIRM. From our point of view, the condition
+ to use a solver inside TSIRM is that the solver must have a restart
+ feature. More precisely, the solver must support to be stoped and restarted
+ without decrease its converge. That is why with GMRES we stop it when it is
+ naturraly restarted (i.e. with $m$ the restart parameter). The Conjugate
+ Gradient (CG) and all its variants do not have ``restarted'' version in PETSc,
+ so they are not efficient. They will converge with TSIRM but not quickly
+ because if we compare a normal CG with a CG for which we stop it each 16
+ iterations for example, the normal CG will be for more efficient. Some
+ restarted CG or CG variant versions exist and may be interested to study in
+ future works.
+\end{itemize}