\label{chapter1}
\section{Introduction}\label{ch1:intro}
-
+``test" "test" ``test''
This chapter introduces the architecture of the Graphics Processing Unit (GPU)
and the concepts needed to understand how GPUs work and how they can be used to
speed up the execution of some algorithms. First of all, this chapter gives a brief history of
has been replaced by OpenCL, BrookGPU by Stanford University~\cite{ch1:Buck:2004:BGS}.
Another environment, OpenACC, is based on pragmas (directives inserted into the
code to help the compiler generate efficient code). For a
-comparison with OpenCL, interested readers may refer to~\cite{ch1:CMR:12}.
+comparison with OpenCL, interested readers may refer to~\cite{ch1:Dongarra}.
\section{Architecture of current GPUs}
-The architecture \index{architecture of a GPU} of current GPUs is constantly
+The architecture \index{GPU!architecture of a} of current GPUs is constantly
evolving. Nevertheless, some trends remain constant throughout this evolution.
The processing units composing a GPU are far simpler than those of a traditional CPU, and
it is much easier to integrate many computing units inside a GPU card than to do
threads are different from traditional CPU threads. In
Chapter~\ref{chapter2}, some examples of GPU programming will explain the
details of the GPU threads. Threads are gathered into groups of 32
-threads, called warps. These warps are important when designing an algorithm
+threads, called ``warps''. These warps are important when designing an algorithm
for a GPU.
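
As a small worked example of how threads map onto warps, assume (following
CUDA's convention, which is not spelled out here) that the threads launched
together are numbered linearly $t = 0, \dots, B-1$ and are assigned to warps in
consecutive order. The symbols $t$, $B$, $w$, and $\ell$ are introduced only for
this illustration. The warp index $w$ of a thread and its position $\ell$ inside
that warp would then be
\[
  w = \left\lfloor \frac{t}{32} \right\rfloor, \qquad \ell = t \bmod 32,
\]
and $B$ threads occupy $\lceil B/32 \rceil$ warps. For instance, with $B = 100$,
thread $t = 70$ falls in warp $w = 2$ at position $\ell = 6$, and
$\lceil 100/32 \rceil = 4$ warps are needed, the last of which holds only
$100 - 96 = 4$ threads. Under this assumption, choosing $B$ as a multiple of 32
avoids partially filled warps.
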
\section{Memory hierarchy}
-The memory hierarchy of GPUs\index{memory~hierarchy} is different from that of CPUs. In practice, there are registers\index{memory~hierarchy!registers}, local
-memory\index{memory~hierarchy!local~memory}, shared
-memory\index{memory~hierarchy!shared~memory}, cache
-memory\index{memory~hierarchy!cache~memory}, and global
-memory\index{memory~hierarchy!global~memory}.
+The memory hierarchy of GPUs\index{memory hierarchy} is different from that of CPUs. In practice, there are registers\index{memory hierarchy!registers}, local
+memory\index{memory hierarchy!local memory}, shared
+memory\index{memory hierarchy!shared memory}, cache
+memory\index{memory hierarchy!cache memory}, and global
+memory\index{memory hierarchy!global memory}.
As previously mentioned, each thread can access its own registers. It is