proposing an alternative to CUDA that is multiplatform and portable. This
is a great advantage since it is even possible to execute OpenCL programs on
traditional CPUs. The main drawback is that it is less close to the hardware
-and consequently it sometimes provides less efficient programs. Moreover, CUDA
+and, consequently, it sometimes produces less efficient programs. Moreover, CUDA
benefits from more mature compilation and optimization procedures. Other less
known environments have been proposed, but most of them have been discontinued,
-such FireStream by ATI which is not maintained anymore and has been replaced by
+such as FireStream by ATI, which is no longer maintained and has been replaced by
OpenCL, and BrookGPU by Stanford University~\cite{ch1:Buck:2004:BGS}. Another
environment, based on pragmas (directives inserted into the code to
help the compiler generate efficient code), is called OpenACC. For a
used very frequently, then threads can access it for their computation. Threads
can obviously change the content of this shared memory either with computation
or by loading other data and they can store its content in the global memory. So
-shared memory can be seen as a cache memory which is manageable manually. This
-obviously requires an effort from the programmer.
+shared memory can be seen as a manually managed cache memory. This
+obviously requires effort from the programmer.
On recent cards, the programmer may decide how much cache memory and
shared memory is allocated to a kernel. The cache memory is an L1 cache which is