From: couturie <couturie@extinction>
Date: Thu, 23 Jan 2014 00:15:59 +0000 (+0100)
Subject: new
X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/GMRES_For_Journal.git/commitdiff_plain/d844520a8ca222e251c7cffebd7ba9a6d79d2bd8?ds=sidebyside

new
---

diff --git a/Revision.tex b/Revision.tex
index 9e6d49a..b38f253 100644
--- a/Revision.tex
+++ b/Revision.tex
@@ -37,7 +37,12 @@ In fact if we use preconditioning techniques, they will influence both the CPU a
 
 \item ``That is to say, it is unclear that how much a researcher working on GPU computing but not specialized in parallel linear solver can appreciate the paper.'' \\ \\ The GMRES method is one of the widely used iterative method for solving large and sparse linear systems. As we mentioned in the paper, the techniques and optimizations that we have used for GMRES method may be applied and adapted to other iterative methods on GPU clusters.
 
-\item ``It will be nice if the authors can emphasize the part of their experiences/optimizations that are generally applicable to other parallel algorithms.'' \\ \\ Thank you for your comment. {\bf je ne comprends lÃ }In fact, the most problem affecting the performances and the scalability of the linear solvers is the communication on parallel computers. In future work, we plan to study other linear system solvers. 
+\item ``It will be nice if the authors can emphasize the part of their experiences/optimizations that are generally applicable to other parallel algorithms.'' \\ \\ Thank you for your comment. In fact, we have added a paragraph on that in page {\bf A PRECISER, en gros ca serait une partie lesson learns, Ã  voir si faut la faire apparaitre en tant que telle}. 
+
+In fact, parallel linear system solving can be easy to optimized when the linear system is regular. This is the case for many applications. But for many other ones, this is not the case. When the matrix has not a regular structure, the amount of communication between processors is not the same. Another important parameter is the size bandwidth which has a huge influence on the amount of communications. In this work, we have generated matrices different kinds of matrices in order to analyze different difficulties. With the largest bandwidth as possible and with communications between all processors  which  is the most difficult situations, we propose to use two heuristics. Unfortunatly, there is no fast method that optimize the communication in any situation. For non linear systems of equations, there are differents algorithms but one of them consists in linearizing the systems. In this case, a linear system need to be solved. The big interest is that the matrix is the same at each step of the non linear system solving, so the partitioning method which is a time consuming step is performed once only.
+
+Another very important issue, that maybe too many people ignore, is that on a cluster of GPUs the influence of the communications is greater than on clusters of CPUs. There are two reasons for this. The first one comes from the fact that with a cluster of GPUs, the CPU/GPU communications slow down communications between two GPUs not on the same machines. The second one is due to the fact that with GPUs the ratio of the computation time over the communication time decreases since the computation time are reduced. So the impact of the communications between GPUs might be a very important issue that can limit the scalability of an parallel algorithm. 
+
 
 \item ``Follow up on point 1). The experiment section can be enhanced as well. The numbers presented are very specific to the input matrix workload, which is generated by the authors. So it is unclear how much other researchers can benefit from it. It will be nice to focus on more detailed measuring and metrics, i.e., how to evaluate if your algorithm/optimization has maximally exploited the system capacity based on the CPU/GPU power and bandwidth available? Or is your algorithm as presented is the optimal at all?'' \\ \\ The sparse matrices that we have found in the literature are very small for our experiments and they don't allow to exploit the computing power of a GPU cluster. This is why we used a generator of large sparse matrices based on the real-world matrices of the Davis collection of the Florida university.
 \end{enumerate}