\section{Overview}
In this chapter, after dealing with GPU median filter implementations,
-we propose to explore how convolutions\index{Convolution} can be implemented on modern
+we propose to explore how convolutions\index{convolution} can be implemented on modern
GPUs. Widely used in digital image processing filters, the \emph{convolution
operation} basically consists of taking the sum of products of elements
from two 2D functions, letting one of the two functions move over
convolutions of the techniques applied to median filters in the
previous chapter, as a reminder: texture memory used with incoming
data, pinned memory with output data, optimized use of registers
-while processing data and multiple output per thread\index{Multiple output per thread}.
+while processing data and multiple output per thread\index{multiple output per thread}.
One significant difference lies in the fact
that the median filter uses only one parameter, the size of the window mask,
which can be hard-coded, while a convolution mask requires referring to several parameters; hard-coding
\lstinputlisting[label={lst:convoGene8x8pL3},caption=CUDA kernel achieving a $3\times 3$ convolution operation with the mask in symbol memory and direct data fetches in texture memory]{Chapters/chapter4/code/convoGene8x8pL3.cu}
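To make the role of the mask storage concrete, the following minimal sketch shows a $3\times 3$ sum of products with the mask held in constant (symbol) memory. It is not the kernel of Listing~\ref{lst:convoGene8x8pL3}: as simplifying assumptions, plain global memory loads stand in for the texture fetches, each thread produces a single output pixel, border pixels are simply copied, and the names \texttt{d\_mask}, \texttt{convolution3x3} and \texttt{convolve3x3} are hypothetical. Error checking of the CUDA calls is omitted for brevity.

\begin{lstlisting}
#include <cuda_runtime.h>
#include <vector>

// 3x3 mask stored in constant (symbol) memory; d_mask is a hypothetical name.
__constant__ float d_mask[9];

// One thread per output pixel. Border pixels are skipped, so they keep
// whatever value the output buffer already holds.
// Assumption: global loads replace the texture fetches used in the chapter.
// The mask is applied as stored (no flip): store it flipped if a strict
// mathematical convolution is required.
__global__ void convolution3x3(const float *in, float *out,
                               int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < 1 || y < 1 || x >= width - 1 || y >= height - 1) return;

    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += d_mask[(dy + 1) * 3 + (dx + 1)]
                 * in[(y + dy) * width + (x + dx)];
    out[y * width + x] = sum;
}

// Host helper: uploads image and mask, runs the kernel, returns the result.
std::vector<float> convolve3x3(const std::vector<float> &img,
                               int width, int height, const float mask[9])
{
    std::vector<float> result(img.size());
    size_t bytes = img.size() * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, img.data(), bytes, cudaMemcpyHostToDevice);
    // Pre-fill the output with the input so that borders are copied.
    cudaMemcpy(d_out, img.data(), bytes, cudaMemcpyHostToDevice);
    // Copy the mask into constant memory.
    cudaMemcpyToSymbol(d_mask, mask, 9 * sizeof(float));

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    convolution3x3<<<grid, block>>>(d_in, d_out, width, height);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
\end{lstlisting}

Because every thread of a warp reads the same mask coefficient at the same time, constant memory can broadcast it, which is the usual motivation for keeping small masks there.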
-\subsection{Using shared memory to store prefetched data\index{Prefetching}.}
+\subsection{Using shared memory to store prefetched data\index{prefetching}.}
\index{memory~hierarchy!shared~memory}
A more convenient way of coding a convolution kernel is to use shared memory to perform a prefetching stage of the whole halo before computing the convolution sums.
This proves to be quite efficient and more versatile, but it obviously generates some overhead because
\label{tab:cpyToArray}
\end{table}
\lstinputlisting[label={lst:convoSepSh},caption=data copy between the calls to 1D convolution kernels achieving a 2D separable convolution operation]{Chapters/chapter4/code/convoSepSh.cu}
-\lstinputlisting[label={lst:convoSepShV},caption=CUDA kernel achieving a horizontal 1D convolution operation after a preloading \index{Prefetching} of data into shared memory]{Chapters/chapter4/code/convoSepShV.cu}
+\lstinputlisting[label={lst:convoSepShV},caption=CUDA kernel achieving a horizontal 1D convolution operation after a preloading \index{prefetching} of data into shared memory]{Chapters/chapter4/code/convoSepShV.cu}
\lstinputlisting[label={lst:convoSepShH},caption=CUDA kernel achieving a vertical 1D convolution operation after a preloading of data into shared memory]{Chapters/chapter4/code/convoSepShH.cu}
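The preloading stage shared by the two 1D kernels above can be illustrated by the following simplified sketch of a convolution along image rows. It is not the code of Listings~\ref{lst:convoSepShV} and~\ref{lst:convoSepShH}: the constants \texttt{RADIUS} and \texttt{BLOCK}, the symbol \texttt{d\_mask1d} and the function names are hypothetical, pixels outside the image are assumed to be zero, each thread computes a single output pixel, and error checking is omitted.

\begin{lstlisting}
#include <cuda_runtime.h>
#include <vector>

#define RADIUS 2     // hypothetical mask radius (mask length 2*RADIUS+1)
#define BLOCK  128   // hypothetical block size; must equal blockDim.x

// 1D mask stored in constant memory.
__constant__ float d_mask1d[2 * RADIUS + 1];

// Each block first copies its row segment, plus a halo of RADIUS pixels on
// each side, into shared memory; each thread then computes one output pixel
// from the shared tile only. Out-of-image pixels are read as zero.
__global__ void rowConvolutionShared(const float *in, float *out, int width)
{
    __shared__ float tile[BLOCK + 2 * RADIUS];
    int y  = blockIdx.y;           // one image row per grid row
    int x0 = blockIdx.x * BLOCK;   // first output column of this block

    // Prefetching stage: threads cooperatively load BLOCK + 2*RADIUS values.
    for (int i = threadIdx.x; i < BLOCK + 2 * RADIUS; i += BLOCK) {
        int gx = x0 + i - RADIUS;
        tile[i] = (gx >= 0 && gx < width) ? in[y * width + gx] : 0.0f;
    }
    __syncthreads();               // the tile must be complete before use

    int x = x0 + threadIdx.x;
    if (x >= width) return;
    float sum = 0.0f;
    for (int k = -RADIUS; k <= RADIUS; ++k)
        sum += d_mask1d[k + RADIUS] * tile[threadIdx.x + RADIUS + k];
    out[y * width + x] = sum;
}

// Host helper: uploads data and mask, runs the kernel, returns the result.
std::vector<float> rowConvolve(const std::vector<float> &img,
                               int width, int height, const float *mask)
{
    std::vector<float> result(img.size());
    size_t bytes = img.size() * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, img.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_mask1d, mask, (2 * RADIUS + 1) * sizeof(float));

    dim3 grid((width + BLOCK - 1) / BLOCK, height);
    rowConvolutionShared<<<grid, BLOCK>>>(d_in, d_out, width);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
\end{lstlisting}

In this sketch each input pixel is read from global memory once per block and then reused up to $2\times\mathtt{RADIUS}+1$ times from shared memory; the price is the extra loading loop and the synchronization barrier.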
\section{Conclusion}