X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/book_gpu.git/blobdiff_plain/cacf1137755cd5d5eba063f453055e49a25d579f..e9bec782df9908a1d94ab5629dbe7710293019bd:/BookGPU/Chapters/chapter15/ch15.tex?ds=inline

diff --git a/BookGPU/Chapters/chapter15/ch15.tex b/BookGPU/Chapters/chapter15/ch15.tex
index 7e25220..7860441 100644
--- a/BookGPU/Chapters/chapter15/ch15.tex
+++ b/BookGPU/Chapters/chapter15/ch15.tex
@@ -670,7 +670,7 @@ Fig.~\ref{offdiagonal} for an off-diagonal sector.
 These copies, along with possible scalings or transpositions, are
 implemented as CUDA kernels which can be applied to two
 matrices of any size starting at any offset.
- Memory accesses are coalesced\index{coalesced memory accesses} \cite{CUDA_ProgGuide} in order to
+ Memory accesses are coalesced\index{GPU!coalesced memory accesses} \cite{CUDA_ProgGuide} in order to
 provide the best performance for such memory-bound kernels.
 \item[Step 2] (``Local copies''):~data are copied from local $R$-matrices
 to temporary arrays ($U$, $V$) and to $\Re^{O}$.
@@ -917,7 +917,7 @@ one C2050 (Fermi) GPU, located
 at UPMC (Universit\'e Pierre et Marie Curie, Paris, France).
 As a remark, the execution times measured on the C2050 would be the same
 on the C2070 and on the C2075, the only difference between these GPUs
-being their memory size and their TDP (Thermal Design Power)\index{TDP (Thermal Design Power)}.
+being their memory size and their TDP (Thermal Design Power)\index{TDP (thermal design power)}.
 We emphasize that the execution times correspond to the
 complete propagation for all six energies of the large case (see
 Table~\ref{data-sets}), that is to say to the complete execution of
@@ -1093,7 +1093,9 @@ in order to enable concurrent executions among the required kernels.
 & Speedup & - & \multicolumn{2}{c|}{1.13} & \multicolumn{2}{c|}{1.17} \\
 \hline
 \end{tabular}
-\caption{\label{t:perfs_V6} Performance results with multiple
+\caption[Performance results with multiple
+ concurrent energies
+ on one C2070 GPU.]{\label{t:perfs_V6} Performance results with multiple
  concurrent energies
  on one C2070 GPU. GPU initialization times are not considered here. }
 \end{center}