-\item ``The theoretical part of the paper devoted to GMRES method should be eliminated, since it is a well-known topic and the contributions of the paper are mainly related to the sparse matrix-vector product.'' \\ \\ Thank you for your comment. We have reduced the theoretical part devoted to the GMRES method.
+\medskip
+In this paper, we aim to investigate the parallelization of the GMRES method on a GPU cluster. We have compared different versions of the parallel GMRES algorithm on a cluster of GPUs (with and without the optimizations). We could certainly optimize the CPU version as well, but this is beyond the objectives of this paper.
+
+\item ``There is no comparison with proposals of other authors.''
+
+\medskip
+In the literature, there are a few GMRES implementations on multi-GPU platforms but, to the best of our knowledge, none on a GPU cluster, which involves the distributed-memory constraint.
+
+\item ``The only comparisons is the speedup with regard to the CPU version of the algorithm carried out by the authors. The GMRES algorithm it is not analyzed, since the paper focuses mainly on the sparse matrix-vector product.''
+
+\medskip
+As we previously mentioned, we have not only compared the CPU and GPU versions but also the different GPU versions with one another (with/\linebreak[0]without optimizations). The GMRES algorithm has already been analyzed in many papers (we gave some references). In this paper we have focused on its implementation on a GPU cluster and on how to improve the communication between the computing nodes.
+
+\item ``Preconditioning and its influence in the communication should be perhaps most interesting and should be deeply considered, as it limits substantially the performance of GMRES.''
+
+\medskip
+Indeed, if we use a preconditioning technique, it will affect both the CPU and the GPU solvers. With left preconditioning, the initial matrix-vector product is unchanged. In this case, the preconditioning process does not change the communication cost on a cluster of processors; it only reduces the number of iterations required for convergence. It would be interesting to study which preconditioning algorithms are best suited to GPU clusters, but this is beyond the scope of this paper.
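+As a brief sketch of this point (using a generic preconditioner $M$, not a specific method from the paper): with left preconditioning the system $Ax=b$ becomes
+
```latex
\[
  M^{-1} A x = M^{-1} b ,
\]
% Each GMRES iteration still computes the product A v with the original
% matrix A, so the sparse matrix-vector communication pattern on the
% cluster is unchanged; only a local solve with M is added per iteration.
```
+
+so the distributed sparse matrix-vector product, and hence the inter-node communication, is the same as in the unpreconditioned case.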
+
+\item ``The theoretical part of the paper devoted to GMRES method should be eliminated, since it is a well-known topic and the contributions of the paper are mainly related to the sparse matrix-vector product.''
+
+\medskip
+Thank you for your comment. We have reduced the theoretical part devoted to the GMRES method.