-\item ``It will be nice if the authors can emphasize the part of their experiences/optimizations that are generally applicable to other parallel algorithms.'' \\ \\ Thank you for your comment. {\bf je ne comprends là}In fact, the most problem affecting the performances and the scalability of the linear solvers is the communication on parallel computers. In future work, we plan to study other linear system solvers.
+\item ``It will be nice if the authors can emphasize the part of their experiences/optimizations that are generally applicable to other parallel algorithms.'' \\ \\ Thank you for your comment. We have added a paragraph on this point on page {\bf TO BE SPECIFIED: roughly, this would be a ``lessons learned'' part; it remains to be decided whether it should appear as such}.
+
+Parallel linear system solving can be easy to optimize when the linear system has a regular structure. This is the case for many applications, but not for many others. When the matrix does not have a regular structure, the amount of communication is not the same for every processor. Another important parameter is the bandwidth of the matrix, which has a huge influence on the amount of communication. In this work, we have generated different kinds of matrices in order to analyze different difficulties. For the most difficult situation, with the largest possible bandwidth and with communications between all processors, we propose to use two heuristics. Unfortunately, there is no fast method that optimizes the communication in every situation. For nonlinear systems of equations, there are different algorithms, and one of them consists in linearizing the system. In this case, a linear system needs to be solved at each step. The main advantage is that the matrix is the same at each step of the nonlinear solve, so the partitioning method, which is a time-consuming step, is performed only once.
+
+Another very important issue, which is perhaps too often overlooked, is that the influence of the communications is greater on a cluster of GPUs than on a cluster of CPUs. There are two reasons for this. First, on a cluster of GPUs, the CPU/GPU data transfers slow down the communications between two GPUs that are not on the same machine. Second, with GPUs, the ratio of the computation time to the communication time decreases, since the computation time is reduced. The impact of the communications between GPUs can therefore be a very important issue that limits the scalability of a parallel algorithm.
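+
+As a rough illustration of these two reasons (the symbols below are introduced only for this sketch and do not appear in the paper), let $T_{\mathrm{comp}} > 0$ and $T_{\mathrm{comm}} > 0$ denote the computation and communication times per iteration on a cluster of CPUs, let $s > 1$ be the assumed speedup of the computation obtained with GPUs, and let $T_{\mathrm{xfer}} \geq 0$ be the additional CPU/GPU transfer time. Under these assumptions,
+\[
+\frac{T_{\mathrm{comp}}/s}{T_{\mathrm{comm}} + T_{\mathrm{xfer}}} \;<\; \frac{T_{\mathrm{comp}}}{T_{\mathrm{comm}}},
+\]
+so the share of the communications in the total time is larger on a cluster of GPUs, even when the network itself is unchanged.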
+