\begin{figure}[htbp]
\centering
\includegraphics[width=0.8\textwidth]{strong_scaling_150x150x150}
-\caption{Strong scaling with 3 clusters of cores to solve a 3D Poisson problem of size $150^3$ components}
+\caption{Strong scaling with 3 clusters of 4 cores to solve a 3D Poisson problem of size $150^3$ components}
\label{fig:001}
\end{figure}
\begin{tabular}{c}
\includegraphics[width=0.8\textwidth]{weak_scaling_280k} \\ \includegraphics[width=0.8\textwidth]{weak_scaling_280K}\\
\end{tabular}
-\caption{Weak scaling with 3 clusters of cores to solve a 3D Poisson problem with approximately 280K components per core}
+\caption{Weak scaling with 3 clusters of 4 cores to solve a 3D Poisson problem with approximately 280K components per core}
\label{fig:002}
\end{figure}
,----
4. In Section 3. it is better if the paper can explain the intuition of multi-splitting. Currently it is more like "Here is what I did" presentation but "why do we do this" is left for the reader to guess.
`----
-The iterative algorithms suffer from the scalability problem on large computing platforms due to the large amount of communications and synchronizations. In this context, the multisplitting methods are well-known to be more adapted to large-scale clusters of processors. The main principle of the multisplitting methods is to split the large problem to solve in different blocks in such a way each block can be solved by a processor or a set of processors and thus to minimize by this way the synchronizations over the large cluster. However these methods suffer from slow convergence. In fact, the larger the number of splitting is, the larger the spectral radius is, thereby slowing the convergence of the multisplitting algorithm.
+Iterative algorithms suffer from scalability problems on large computing platforms because of the large amount of communication and synchronization they require. In this context, multisplitting methods are well known to be better suited to large-scale clusters of processors. Their main principle is to split the large problem into blocks in such a way that each block can be solved by a processor or a set of processors, which minimizes the synchronizations over the large cluster. However, these methods suffer from slow convergence: the larger the number of splittings, the larger the spectral radius, which slows the convergence of the multisplitting algorithm.
We have used the well-known GMRES method to solve each block locally, in parallel, on a set of processors. In addition, we have implemented the outer iteration as a Krylov subspace iteration minimizing some error function, which accelerates the global convergence of the multisplitting algorithm.
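To make this intuition concrete, the principle can be sketched in a generic multisplitting notation. The symbols $L$, $M_\ell$, $N_\ell$, $E_\ell$, $H$, $S$, $s$ and $\alpha$ are assumptions introduced for this illustration and are not necessarily those of the paper. For the linear system $Ax=b$, each of the $L$ blocks is associated with a splitting $A = M_\ell - N_\ell$, where $M_\ell$ is nonsingular, and with a nonnegative diagonal weighting matrix $E_\ell$ such that $\sum_{\ell=1}^{L} E_\ell = I$. One outer iteration then reads
\[
x^{k+1} = \sum_{\ell=1}^{L} E_\ell \, M_\ell^{-1} \left( N_\ell x^{k} + b \right).
\]
The $L$ local systems involving $M_\ell$ are independent of each other, so each one can be solved by its own set of processors, and data are exchanged only once per outer iteration. The error $e^{k} = x^{k} - x$ satisfies
\[
e^{k+1} = H e^{k}, \qquad H = \sum_{\ell=1}^{L} E_\ell \, M_\ell^{-1} N_\ell ,
\]
so the method converges if and only if $\rho(H) < 1$, and the closer $\rho(H)$ is to $1$, the slower the convergence. As one possible form of the outer acceleration, assuming that the error function is the Euclidean norm of the residual, the last $s$ iterates can be gathered in a matrix $S = [x^{1}, \ldots, x^{s}]$ and combined into
\[
\tilde{x} = S\alpha, \qquad \mbox{where } \alpha \mbox{ minimizes } \| b - A S \alpha \|_2 ,
\]
which is a small least-squares problem with only $s$ unknowns.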
,----
The paper should be rewritten to clearly explain what is being compared. It seems as if the method in [9] is not included in the comparison.
`----
-Section 4 is rewritten in order to explain our choice to compare our Krylov multisplitting method with only the GMRES method. We have added in the paper some experimental results obtained on a small cluster which clearly show that our method is more efficient than GMRES and block Jacobi multisplitting methods.
+Section 4 has been rewritten to explain our choice of comparing our Krylov multisplitting method only with the GMRES method. We have also added to the paper some experimental results obtained on a small cluster, which clearly show that our method is more efficient than both the GMRES and the block Jacobi multisplitting methods.
,----
Was the method of reference [9] implemented by the authors of [9]? How did they do against GMRES?
`----
-Authors of [9] have not implemented the method of reference [9]. They have mainly focused on the convergence analysis of various forms of the algorithm [9] and presented results of numerical examples on a sequential computer.
+As explained in the paper, the authors of [9] have not implemented the method of reference [9]. They have mainly focused on the convergence analysis of various forms of the algorithm in [9] and presented results of numerical examples run on a sequential computer.