principle of our approach is to build an external iteration over the Krylov
method and to save the current residual frequently (for example, for each
restart of GMRES). Then after a given number of outer iterations, a minimization
-step is applied on the matrix composed of the save residuals in order to compute
-a better solution and make a new iteration if necessary. We prove that our
-method has the same convergence property than the inner method used. Some
+step is applied on the matrix composed of the saved residuals in order to
+compute a better solution and make a new iteration if necessary. We prove that
+our method has the same convergence property as the inner method used. Some
experiments using up to 16,384 cores show that compared to GMRES our algorithm
can be around 7 times faster.
\end{abstract}
4,096 & 7e-5 & 160.59 & 22,530 & 35.15 & 5,130 & 29.21 & 4,350 & 5.49 \\
4,096 & 6e-5 & 249.27 & 35,520 & 52.13 & 7,950 & 39.24 & 5,790 & 6.35 \\
8,192 & 6e-5 & 149.54 & 17,280 & 28.68 & 3,810 & 29.05 & 3,990 & 5.21 \\
- 8,192 & 5e-5 & 792.11 & 109,590 & 76.83 & 10,470 & 65.20 & 9,030 & 12.14 \\
+ 8,192 & 5e-5 & 785.04 & 109,590 & 76.07 & 10,470 & 69.42 & 9,030 & 11.30 \\
16,384 & 4e-5 & 718.61 & 86,400 & 98.98 & 10,830 & 131.86 & 14,790 & 7.26 \\
\hline
\begin{table*}
\begin{center}
-\begin{tabular}{|r|r|r|r|r|r|r|r|r|r|}
+\begin{tabular}{|r|r|r|r|r|r|r|r|r|r|r|}
\hline
- nb. cores & \multicolumn{2}{c|}{GMRES} & \multicolumn{2}{c|}{TSARM CGLS} & \multicolumn{2}{c|}{TSARM LSQR} & \multicolumn{3}{c|}{efficiency} \\
-\cline{2-10}
- & Time & \# Iter. & Time & \# Iter. & Time & \# Iter. & GMRES & TS CGLS & TS LSQR\\\hline \hline
- 512 & 3,969.69 & 33,120 & 709.57 & 5,790 & 622.76 & 5,070 & 1 & 1 & 1 \\
- 1024 & 1,530.06 & 25,860 & 290.95 & 4,830 & 307.71 & 5,070 & 1.30 & 1.21 & 1.01 \\
- 2048 & 919.62 & 31,470 & 237.52 & 8,040 & 194.22 & 6,510 & 1.08 & .75 & .80\\
- 4096 & 405.60 & 28,380 & 111.67 & 7,590 & 91.72 & 6,510 & 1.22 & .79 & .84 \\
- 8192 & 785.04 & 109,590 & 76.07 & 10,470 & 69.42 & 9,030 & .32 & .58 & .56 \\
+ nb. cores & \multicolumn{2}{c|}{GMRES} & \multicolumn{2}{c|}{TSARM CGLS} & \multicolumn{2}{c|}{TSARM LSQR} & best gain & \multicolumn{3}{c|}{efficiency} \\
+\cline{2-7} \cline{9-11}
+ & Time & \# Iter. & Time & \# Iter. & Time & \# Iter. & & GMRES & TS CGLS & TS LSQR\\\hline \hline
+ 512 & 3,969.69 & 33,120 & 709.57 & 5,790 & 622.76 & 5,070 & 6.37 & 1 & 1 & 1 \\
+ 1,024 & 1,530.06 & 25,860 & 290.95 & 4,830 & 307.71 & 5,070 & 5.25 & 1.30 & 1.21 & 1.01 \\
+ 2,048 & 919.62 & 31,470 & 237.52 & 8,040 & 194.22 & 6,510 & 4.73 & 1.08 & .75 & .80\\
+ 4,096 & 405.60 & 28,380 & 111.67 & 7,590 & 91.72 & 6,510 & 4.42 & 1.22 & .79 & .84 \\
+ 8,192 & 785.04 & 109,590 & 76.07 & 10,470 & 69.42 & 9,030 & 11.30 & .32 & .58 & .56 \\
\hline