diff --git a/BookGPU/Chapters/chapter15/ch15.tex b/BookGPU/Chapters/chapter15/ch15.tex
index 7e25220e38354c0472ab23827a8bf24f8ecbb005..7860441dee773c0c5483a01ed444ef457eca36ae 100644
--- a/BookGPU/Chapters/chapter15/ch15.tex
+++ b/BookGPU/Chapters/chapter15/ch15.tex
@@ -670,7 +670,7 @@ Fig.~\ref{offdiagonal} for an off-diagonal sector.
    These copies, along with possible scalings or transpositions, are
    implemented as CUDA kernels which can be applied to two
    matrices of any size starting at any offset. 
-  Memory accesses are coalesced\index{coalesced memory accesses} \cite{CUDA_ProgGuide} in order to
+  Memory accesses are coalesced\index{GPU!coalesced memory accesses} \cite{CUDA_ProgGuide} in order to
    provide the best performance for such memory-bound kernels.
  \item[Step 2] (``Local copies''):~data are copied from
    local $R$-matrices to temporary arrays ($U$, $V$) and to $\Re^{O}$.
@@ -917,7 +917,7 @@ one C2050 (Fermi) GPU, located at
   UPMC (Universit\'e Pierre et Marie Curie, Paris, France). 
  As a remark, the execution times measured on the C2050 would be the same 
  on the C2070 and on  the C2075, the only difference between these GPUs 
-being their memory size and their TDP (Thermal Design Power)\index{TDP (Thermal Design Power)}. 
+being their memory size and their TDP (Thermal Design Power)\index{TDP (thermal design power)}. 
  We emphasize that the execution times correspond to the
  complete propagation for all six energies of the large case (see
  Table~\ref{data-sets}), that is to say to the complete execution of
@@ -1093,7 +1093,9 @@ in order to enable concurrent executions among the required kernels.
    & Speedup & - & \multicolumn{2}{c|}{1.13} & \multicolumn{2}{c|}{1.17}  \\  
    \hline
  \end{tabular}
-\caption{\label{t:perfs_V6} Performance results with multiple
+\caption[Performance results with multiple
+  concurrent energies 
+  on one C2070 GPU.]{\label{t:perfs_V6} Performance results with multiple
    concurrent energies 
    on one C2070 GPU. GPU initialization times are not considered here. }
  \end{center}