modif revision

diff --git a/GMRES_Journal.tex b/GMRES_Journal.tex

index f5f05e40c36b578540135a19da8edf536f3898ed..2966d4f2d3d615943fdccada53a5903ba97ee5b8 100644
--- a/GMRES_Journal.tex
+++ b/GMRES_Journal.tex
@@ -802,7 +802,7 @@ having five bands on a cluster of 24 CPU cores vs. a cluster of 12 GPUs.}
  \end{center}
  \end{table}
  
-Table~\ref{tab:09} shows in the second, third and fourth columns the total communication volume on a cluster of 12 GPUs by using row-by-row partitioning or hypergraph partitioning and compressed format. The total communication volume defines the total number of the vector elements exchanged between the 12 GPUs. From these columns we can see that the two heuristics, compressed format for the vectors and the hypergraph partitioning, minimize the number of vector elements to be exchanged over the GPU cluster. The compressed format allows the GPUs to exchange the needed vector elements witout any communication overheads. The hypergraph partitioning allows to split the large sparse matrices so as to minimize  data dependencies between the GPU computing nodes. However, we can notice in the fourth column that the hypergraph partitioning takes longer than the computation times. As we have mentioned before, the hypergraph partitioning method is less efficient in terms of memory consumption and partitioning time than its graph counterpart. So for the applications which often use the same sparse matrices, we can perform the hypergraph partitioning only once and, then, we save the traces in files to be reused several times. Therefore, this allows us to avoid the partitioning of the sparse matrices at each resolution of the linear systems.
+Table~\ref{tab:09} shows in the second, third and fourth columns the total communication volume on a cluster of 12 GPUs by using row-by-row partitioning or hypergraph partitioning and compressed format. The total communication volume defines the total number of the vector elements exchanged between the 12 GPUs. From these columns we can see that the two heuristics, compressed format for the vectors and the hypergraph partitioning, minimize the number of vector elements to be exchanged over the GPU cluster. The compressed format allows the GPUs to exchange the needed vector elements without any communication overhead. The hypergraph partitioning allows the large sparse matrices to be split so as to minimize data dependencies between the GPU computing nodes. However, we can notice in the fifth column that the hypergraph partitioning takes longer than the computation times. As we have mentioned before, the hypergraph partitioning method is less efficient in terms of memory consumption and partitioning time than its graph counterpart. So for the applications which often use the same sparse matrices, we can perform the hypergraph partitioning only once and, then, we save the traces in files to be reused several times. Therefore, this allows us to avoid the partitioning of the sparse matrices at each resolution of the linear systems.
  
  \begin{table}
  \begin{center}