+\item ``What would happen if the algorithm would have been also optimized for the cluster of CPUs (eg using AVX instructions , or using hybrid MPI + OpenMP programming, etc)?''
+
+\medskip
+In this paper, we investigate the parallelization of the GMRES method on a GPU cluster. We have compared different versions of the parallel GMRES algorithm on a cluster of GPUs (with and without the optimizations). The CPU version could certainly be optimized as well, but this is beyond the scope of this paper.
+
+\item ``There is no comparison with proposals of other authors.''
+
+\medskip
+In the literature, there are a few GMRES implementations on multi-GPU systems but, to the best of our knowledge, none on a GPU cluster, which involves the distributed memory constraint.
+
+\item ``The only comparisons is the speedup with regard to the CPU version of the algorithm carried out by the authors. The GMRES algorithm it is not analyzed, since the paper focuses mainly on the sparse matrix-vector product.''
+
+\medskip
+As mentioned previously, we have compared not only the CPU and GPU versions but also the different GPU versions with one another (with/\linebreak[0]without optimizations). The GMRES algorithm has already been analyzed in many papers (we gave some references). In this paper, we have focused on its implementation on a GPU cluster and on how to improve the communication between the computing nodes.
+
+\item ``Preconditioning and its influence in the communication should be perhaps most interesting and should be deeply considered, as it limits substantially the performance of GMRES.''
+
+\medskip