diff --git a/BookGPU/Chapters/chapter15/ch15.tex b/BookGPU/Chapters/chapter15/ch15.tex

index 7e25220e38354c0472ab23827a8bf24f8ecbb005..7860441dee773c0c5483a01ed444ef457eca36ae 100644
--- a/BookGPU/Chapters/chapter15/ch15.tex
+++ b/BookGPU/Chapters/chapter15/ch15.tex
@@ -670,7 +670,7 @@ Fig.~\ref{offdiagonal} for an off-diagonal sector.
    These copies, along with possible scalings or transpositions, are
    implemented as CUDA kernels which can be applied to two
    matrices of any size starting at any offset. 
-  Memory accesses are coalesced\index{coalesced memory accesses} \cite{CUDA_ProgGuide} in order to
+  Memory accesses are coalesced\index{GPU!coalesced memory accesses} \cite{CUDA_ProgGuide} in order to
    provide the best performance for such memory-bound kernels.
  \item[Step 2] (``Local copies''):~data are copied from
    local $R$-matrices to temporary arrays ($U$, $V$) and to $\Re^{O}$.
@@ -917,7 +917,7 @@ one C2050 (Fermi) GPU, located at
   UPMC (Universit\'e Pierre et Marie Curie, Paris, France). 
  As a remark, the execution times measured on the C2050 would be the same 
  on the C2070 and on  the C2075, the only difference between these GPUs 
-being their memory size and their TDP (Thermal Design Power)\index{TDP (Thermal Design Power)}. 
+being their memory size and their TDP (Thermal Design Power)\index{TDP (thermal design power)}. 
  We emphasize that the execution times correspond to the
  complete propagation for all six energies of the large case (see
  Table~\ref{data-sets}), that is to say to the complete execution of
@@ -1093,7 +1093,9 @@ in order to enable concurrent executions among the required kernels.
    & Speedup & - & \multicolumn{2}{c|}{1.13} & \multicolumn{2}{c|}{1.17}  \\  
    \hline
  \end{tabular}
-\caption{\label{t:perfs_V6} Performance results with multiple
+\caption[Performance results with multiple
+  concurrent energies 
+  on one C2070 GPU.]{\label{t:perfs_V6} Performance results with multiple
    concurrent energies 
    on one C2070 GPU. GPU initialization times are not considered here. }
  \end{center}