\makeatletter\r
\def\theequation{\arabic{equation}}\r
\r
-%\JOURNALNAME{\TEN{\it Int. J. System Control and Information\r
-%Processing,\r
-%Vol. \theVOL, No. \theISSUE, \thePUBYEAR\hfill\thepage}}%\r
+\JOURNALNAME{\TEN{\it International Journal of High Performance Computing and Networking}}\r
%\r
%\def\BottomCatch{%\r
%\vskip -10pt\r
\r
\setcounter{page}{1}\r
\r
-\LRH{F. Wang et~al.}\r
+\LRH{R. Couturier, L. Ziane Khodja and C. Guyeux}\r
\r
-\RRH{Metadata Based Management and Sharing of Distributed Biomedical\r
-Data}\r
+\RRH{TSIRM: A Two-Stage Iteration with least-squares Residual Minimization algorithm}\r
\r
\VOL{x}\r
\r
\r
\BottomCatch\r
\r
-\PUBYEAR{2012}\r
+\PUBYEAR{2015}\r
\r
\subtitle{}\r
\r
\r
\r
\begin{abstract}\r
-In this article, a two-stage iterative algorithm is proposed to improve the\r
+In this paper, a two-stage iterative algorithm is proposed to improve the\r
convergence of Krylov based iterative methods, typically those of GMRES\r
-variants. The principle of the proposed approach is to build an external\r
-iteration over the Krylov method, and to frequently store its current residual\r
+variants. The principle of the proposed approach is to build an external\r
+iteration over the Krylov method, and to frequently store its current residual\r
(at each GMRES restart for instance). After a given number of outer iterations,\r
a least-squares minimization step is applied on the matrix composed by the saved\r
-residuals, in order to compute a better solution and to make new iterations if\r
-required. It is proven that the proposal has the same convergence properties\r
-than the inner embedded method itself. Experiments using up to 16,394 cores\r
-also show that the proposed algorithm runs around 5 or 7 times faster than\r
-GMRES.\r
+residuals, in order to compute a better solution and to make new iterations if\r
+required. It is proven that the proposal has the same convergence properties\r
+as the inner embedded method itself.\r
+%%NEW\r
+Several experiments have been performed\r
+with the PETSc solver on linear and nonlinear problems. With up to 16,394\r
+cores and different preconditioners, they show good speedups compared to\r
+GMRES.\r
+%%ENDNEW\r
\end{abstract}\r
\r
+\r
+\r
\KEYWORD{Iterative Krylov methods; sparse linear and non linear systems; two stage iteration; least-squares residual minimization; PETSc.}\r
\r
%\REF{to this paper should be made as follows: Rodr\'{\i}guez\r
%Semantics and Ontologies}, Vol. x, No. x, pp.xxx\textendash xxx.}\r
\r
\begin{bio}\r
-Manuel Pedro Rodr\'iguez Bol\'ivar received his PhD in Accounting at\r
-the University of Granada. He is a Lecturer at the Department of\r
-Accounting and Finance, University of Granada. His research\r
-interests include issues related to conceptual frameworks of\r
-accounting, diffusion of financial information on Internet, Balanced\r
-Scorecard applications and environmental accounting. He is author of\r
-a great deal of research studies published at national and\r
-international journals, conference proceedings as well as book\r
-chapters, one of which has been edited by Kluwer Academic\r
-Publishers.\vs{9}\r
-\r
-\noindent Bel\'en Sen\'es Garc\'ia received her PhD in Accounting at\r
-the University of Granada. She is a Lecturer at the Department of\r
-Accounting and Finance, University of Granada. Her research\r
-interests are related to cultural, institutional and historic\r
-accounting and in environmental accounting. She has published\r
-research papers at national and international journals, conference\r
-proceedings as well as chapters of books.\vs{8}\r
-\r
-\noindent Both authors have published a book about environmental\r
-accounting edited by the Institute of Accounting and Auditing,\r
-Ministry of Economic Affairs, in Spain in October 2003.\r
+Raphaël Couturier ....\r
+\r
+\noindent Lilia Ziane Khodja ...\r
+\r
+\noindent Christophe Guyeux ...\r
\end{bio}\r
\r
\r
%%NEW\r
\begin{table*}[htbp]\r
\begin{center}\r
-\begin{tabular}{|r|r|r|r|r|r|r|} \r
+\begin{tabular}{|r|r|r|r|r|r|r|r|} \r
\hline\r
\r
- nb. cores & \multicolumn{2}{c|}{FGMRES/ASM} & \multicolumn{2}{c|}{TSIRM CGLS/ASM} & \multicolumn{2}{c|}{FGMRES/HYPRE} \\ \r
-\cline{2-7}\r
- & Time & \# Iter. & Time & \# Iter. & Time & \# Iter. \\\hline \hline\r
- 512 & 5.54 & 685 & 2.5 & 570 & 128.9 & 9 \\\r
- 2048 & 14.95 & 1,560 & 4.32 & 746 & 335.7 & 9 \\\r
- 4096 & 25.13 & 2,369 & 5.61 & 859 & >1000 & -- \\\r
- 8192 & 44.35 & 3,197 & 7.6 & 1083 & >1000 & -- \\\r
+ nb. cores & \multicolumn{2}{c|}{FGMRES/ASM} & \multicolumn{2}{c|}{TSIRM CGLS/ASM} & gain& \multicolumn{2}{c|}{FGMRES/HYPRE} \\ \r
+\cline{2-5} \cline{7-8}\r
+ & Time & \# Iter. & Time & \# Iter. & & Time & \# Iter. \\\hline \hline\r
+ 512 & 5.54 & 685 & 2.5 & 570 & 2.21 & 128.9 & 9 \\\r
+ 2048 & 14.95 & 1,560 & 4.32 & 746 & 3.48 & 335.7 & 9 \\\r
+   4096          & 25.13     & 2,369   & 5.61   & 859       & 4.48 & $>$1000  & -- \\\r
+   8192          & 44.35     & 3,197   & 7.6    & 1,083      & 5.84 & $>$1000  & -- \\\r
\r
\hline\r
\r
\label{fig:03}\r
\end{figure}\r
\r
+\r
+\r
+\begin{table*}[htbp]\r
+\begin{center}\r
+\begin{tabular}{|r|r|r|r|r|r|} \r
+\hline\r
+\r
+ nb. cores & \multicolumn{2}{c|}{FGMRES/BJAC} & \multicolumn{2}{c|}{TSIRM CGLS/BJAC} & gain \\ \r
+\cline{2-5}\r
+ & Time & \# Iter. & Time & \# Iter. & \\\hline \hline\r
+ 1024 & 667.92 & 48,732 & 81.65 & 5,087 & 8.18 \\\r
+ 2048 & 966.87 & 77,177 & 90.34 & 5,716 & 10.70\\\r
+ 4096 & 1,742.31 & 124,411 & 119.21 & 6,905 & 14.61\\\r
+ 8192 & 2,739.21 & 187,626 & 168.9 & 9,000 & 16.22\\\r
+\r
+\hline\r
+\r
+\end{tabular}\r
+\caption{Comparison of FGMRES and TSIRM for ex20 of PETSc/SNES with a Block Jacobi preconditioner and 100,000 components per core on Curie ($\epsilon_{tsirm}=10^{-10}$, $max\_iter_{kryl}=30$, $s=12$, $max\_iter_{ls}=15$, $\epsilon_{ls}=10^{-40}$). Times are expressed in seconds.}\r
+\label{tab:07}\r
+\end{center}\r
+\end{table*}\r
+\r
+\begin{table*}[htbp]\r
+\begin{center}\r
+\begin{tabular}{|r|r|r|r|r|r|} \r
+\hline\r
+\r
+ nb. cores & \multicolumn{2}{c|}{FGMRES/BJAC} & \multicolumn{2}{c|}{TSIRM CGLS/BJAC} & gain \\ \r
+\cline{2-5}\r
+ & Time & \# Iter. & Time & \# Iter. & \\\hline \hline\r
+ 1024 & 159.52 & 11,584 & 26.34 & 1,563 & 6.06 \\\r
+ 2048 & 226.24 & 16,459 & 37.23 & 2,248 & 6.08\\\r
+ 4096 & 391.21 & 27,794 & 50.93 & 2,911 & 7.69\\\r
+ 8192 & 543.23 & 37,770 & 79.21 & 4,324 & 6.86 \\\r
+\r
+\hline\r
+\r
+\end{tabular}\r
+\caption{Comparison of FGMRES and TSIRM for ex14 of PETSc/SNES with a Block Jacobi preconditioner and 100,000 components per core on Curie ($\epsilon_{tsirm}=10^{-10}$, $max\_iter_{kryl}=30$, $s=12$, $max\_iter_{ls}=15$, $\epsilon_{ls}=10^{-40}$). Times are expressed in seconds.}\r
+\label{tab:08}\r
+\end{center}\r
+\end{table*}\r
+\r
+\r
%%ENDNEW\r
\r
%%%*********************************************************\r