X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/book_gpu.git/blobdiff_plain/1874c46934f4ba7e8c2013d3829f65309456d292..0f26b548f3029a96eafd6667024aabe2b36e464b:/BookGPU/Chapters/chapter4/ch4.tex?ds=inline

diff --git a/BookGPU/Chapters/chapter4/ch4.tex b/BookGPU/Chapters/chapter4/ch4.tex
index 805de25..90612c9 100644
--- a/BookGPU/Chapters/chapter4/ch4.tex
+++ b/BookGPU/Chapters/chapter4/ch4.tex
@@ -20,7 +20,7 @@ to $I$ as an $H\times L$ pixel gray-level image and to $I(x,y)$ as the gray-leve
 value of each pixel of coordinates $(x,y)$.
 
 
-
+\clearpage
 \section{Definition}
 Within a digital image $I$, the convolution operation is performed between
 image $I$ and convolution mask \emph{h} (To avoid confusion with other
@@ -240,7 +240,7 @@ However, our technique requires writing one kernel per mask size, which can be s
 \lstinputlisting[label={lst:convoGene8x8pL3},caption=CUDA kernel achieving a $3\times 3$ convolution operation with the mask in symbol memory and direct data fetches in texture memory]{Chapters/chapter4/code/convoGene8x8pL3.cu}
 
 \subsection{Using shared memory to store prefetched data\index{prefetching}.}
-\index{memory~hierarchy!shared~memory}
+\index{memory hierarchy!shared memory}
 A more convenient way of coding a convolution kernel is to use shared memory to perform a prefetching stage of the whole halo before computing the convolution sums.
 This proves to be quite efficient and more versatile, but it obviously generates some overhead because
 \begin{itemize}