X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/book_gpu.git/blobdiff_plain/fd7d6f1c6c149f79839200277703c9ef950150f8..b1fd489e34a8d46d286a0d271c38cbfb442f511f:/BookGPU/Chapters/chapter1/ch1.tex diff --git a/BookGPU/Chapters/chapter1/ch1.tex b/BookGPU/Chapters/chapter1/ch1.tex index 2fef3a4..bfa57c8 100755 --- a/BookGPU/Chapters/chapter1/ch1.tex +++ b/BookGPU/Chapters/chapter1/ch1.tex @@ -59,10 +59,10 @@ have been proposed. The other well-known alternative is OpenCL which aims at proposing an alternative to CUDA and which is multiplatform and portable. This is a great advantage since it is even possible to execute OpenCL programs on traditional CPUs. The main drawback is that it is less close to the hardware -and consequently it sometimes provides less efficient programs. Moreover, CUDA +and, consequently, it sometimes provides less efficient programs. Moreover, CUDA benefits from more mature compilation and optimization procedures. Other less known environments have been proposed, but most of them have been discontinued, -such FireStream by ATI which is not maintained anymore and has been replaced by +such as FireStream by ATI, which is not maintained anymore and has been replaced by OpenCL and BrookGPU by Stanford University~\cite{ch1:Buck:2004:BGS}. Another environment based on pragma (insertion of pragma directives inside the code to help the compiler to generate efficient code) is called OpenACC. For a @@ -267,8 +267,8 @@ to fill the shared memory at the start of the kernel with global data that are used very frequently, then threads can access it for their computation. Threads can obviously change the content of this shared memory either with computation or by loading other data and they can store its content in the global memory. So -shared memory can be seen as a cache memory which is manageable manually. This -obviously requires an effort from the programmer. +shared memory can be seen as a cache memory, which is manually managed. This +obviously requires effort from the programmer. On recent cards, the programmer may decide what amount of cache memory and shared memory is attributed to a kernel. The cache memory is an L1 cache which is