value of each pixel of coordinates $(x,y)$.
-
+\clearpage
\section{Definition}
Given a digital image $I$, the convolution operation is performed between
$I$ and a convolution mask \emph{h} (To avoid confusion with other
\lstinputlisting[label={lst:convoGene8x8pL3},caption=CUDA kernel achieving a $3\times 3$ convolution operation with the mask in symbol memory and direct data fetches in texture memory]{Chapters/chapter4/code/convoGene8x8pL3.cu}
\subsection{Using shared memory to store prefetched data\index{prefetching}}
- \index{memory~hierarchy!shared~memory}
+ \index{memory hierarchy!shared memory}
A more convenient way of coding a convolution kernel is to use shared memory to perform a prefetching stage of the whole halo before computing the convolution sums.
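
The prefetching stage can be sketched as follows. This is only a minimal illustration under stated assumptions, not the kernel used in this chapter: the names \texttt{TILE}, \texttt{R}, \texttt{mask}, and \texttt{convoShared3x3} are hypothetical, pixels are stored as \texttt{float}, each thread computes a single pixel, the block is launched with exactly \texttt{TILE}$\times$\texttt{TILE} threads, image borders are handled by clamping coordinates, and the mask is applied without flipping.
\begin{lstlisting}
#include <cuda_runtime.h>

#define TILE 16   // assumed block size: TILE x TILE threads
#define R    1    // mask radius: 1 for a 3x3 mask

// Hypothetical mask storage in constant memory, filled from the host
// with cudaMemcpyToSymbol() before the kernel is launched.
__constant__ float mask[(2*R + 1)*(2*R + 1)];

__global__ void convoShared3x3(const float *in, float *out, int w, int h)
{
    // Region of interest of the block plus its halo of width R
    __shared__ float tile[TILE + 2*R][TILE + 2*R];

    int x = blockIdx.x*TILE + threadIdx.x;
    int y = blockIdx.y*TILE + threadIdx.y;

    // Prefetching stage: the threads of the block cooperatively load
    // the (TILE+2R) x (TILE+2R) tile; out-of-image coordinates are
    // clamped to the nearest valid pixel (an assumption of this sketch).
    for (int ty = threadIdx.y; ty < TILE + 2*R; ty += TILE)
        for (int tx = threadIdx.x; tx < TILE + 2*R; tx += TILE) {
            int gx = min(max((int)(blockIdx.x*TILE) + tx - R, 0), w - 1);
            int gy = min(max((int)(blockIdx.y*TILE) + ty - R, 0), h - 1);
            tile[ty][tx] = in[gy*w + gx];
        }

    // All loads must be complete before any thread reads the tile
    __syncthreads();

    // Convolution sum computed from shared memory only
    if (x < w && y < h) {
        float sum = 0.0f;
        for (int j = -R; j <= R; j++)
            for (int i = -R; i <= R; i++)
                sum += mask[(j + R)*(2*R + 1) + (i + R)]
                     * tile[threadIdx.y + R + j][threadIdx.x + R + i];
        out[y*w + x] = sum;
    }
}
\end{lstlisting}
Note that the barrier is placed outside the bounds test, so every thread of the block reaches it even when the image dimensions are not multiples of \texttt{TILE}.
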
This proves to be quite efficient and more versatile, but it generates some overhead because:
\begin{itemize}