From: couturie
Date: Sat, 6 Oct 2012 18:22:05 +0000 (+0200)
Subject: suite
X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/book_gpu.git/commitdiff_plain/0dee9cd4943e9ee5a8d83e90a8f42842e9cfae15

suite
---

diff --git a/BookGPU/Chapters/chapter1/ch1.tex b/BookGPU/Chapters/chapter1/ch1.tex
index 42b018c..cf7a8b5 100755
--- a/BookGPU/Chapters/chapter1/ch1.tex
+++ b/BookGPU/Chapters/chapter1/ch1.tex
@@ -74,7 +74,7 @@ comparison with OpenCL, interested readers may refer to~\cite{ch1:CMR:12}.
 
 \section{Architecture of current GPUs}
 
-Architecture \index{Architecture of a GPU} of current GPUs is constantly
+Architecture \index{architecture of a GPU} of current GPUs is constantly
 evolving. Nevertheless some trends remains true through this
 evolution. Processing units composing a GPU are far more simpler than a
 traditional CPU but it is much easier to integrate many computing units inside a
@@ -231,12 +231,12 @@ will explicit that.
 
 \section{Memory hierarchy}
 
-The memory hierarchy of GPUs\index{Memory~hierarchy} is different from the CPUs
-one. In practice, there are registers\index{Memory~hierarchy!registers}, local
-memory\index{Memory~hierarchy!local~memory}, shared
-memory\index{Memory~hierarchy!shared~memory}, cache
-memory\index{Memory~hierarchy!cache~memory} and global
-memory\index{Memory~hierarchy!global~memory}.
+The memory hierarchy of GPUs\index{memory~hierarchy} is different from that of
+CPUs. In practice, there are registers\index{memory~hierarchy!registers}, local
+memory\index{memory~hierarchy!local~memory}, shared
+memory\index{memory~hierarchy!shared~memory}, cache
+memory\index{memory~hierarchy!cache~memory} and global
+memory\index{memory~hierarchy!global~memory}.
 
 
 As previously mentioned each thread can access its own registers. It is
diff --git a/BookGPU/Chapters/chapter2/ch2.tex b/BookGPU/Chapters/chapter2/ch2.tex
index bff6d44..804afc2 100755
--- a/BookGPU/Chapters/chapter2/ch2.tex
+++ b/BookGPU/Chapters/chapter2/ch2.tex
@@ -41,17 +41,17 @@ parameter is set to \texttt{cudaMemcpyHostToDevice}. The first
 parameter of the function is the destination array, the second is the source
 array and the third is the number of elements to copy (exprimed in bytes).
 
-Now the GPU contains the data needed to perform the addition. In sequential such
-addition is achieved out with a loop on all the elements. With a GPU, it is
-possible to perform the addition of all elements of the arrays in parallel (if
-the number of blocks and threads per blocks is sufficient). In
+Now that the GPU contains the data needed to perform the addition, it can be
+carried out. In sequential code, such an addition is achieved with a loop over
+all the elements. With a GPU, all elements of the arrays can be added in
+parallel (if the number of blocks and threads per block is sufficient). In
 Listing\ref{ch2:lst:ex1} at the beginning, a simple kernel, called
 \texttt{addition} is defined to compute in parallel the summation of the
-two arrays. With CUDA, a kernel starts with the keyword \texttt{\_\_global\_\_}
-which indicates that this kernel can be call from the C code. The first
-instruction in this kernel is used to computed the \texttt{tid} which
-representes the thread index. This thread index is computed according to the
-values of the block index (it is a variable of CUDA
+two arrays. With CUDA, a kernel starts with the keyword \texttt{\_\_global\_\_}\index{CUDA~keywords!\_\_global\_\_}
+which indicates that this kernel can be called from the C code. The first
+instruction in this kernel is used to compute the variable \texttt{tid}, which
+represents the thread index. This thread index\index{thread index} is computed
+according to the value of the block index (it is a CUDA variable
 called \texttt{blockIdx}\index{CUDA~keywords!blockIdx}). Blocks of threads can be
 decomposed into 1 dimension, 2 dimensions or 3 dimensions. According to the
 dimension of data manipulated, the appropriate dimension can be useful. In our
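
To make the memory spaces enumerated in the chapter 1 hunk concrete, here is a minimal CUDA sketch, not taken from the book: automatic variables live in registers, __shared__ arrays live in per-block shared memory, and pointer arguments refer to global memory. The kernel name, the 256-thread block size and the parameter names are illustrative assumptions.

    #include <cuda_runtime.h>

    // Assumes a launch with blocks of exactly 256 threads.
    __global__ void scale(float *g_data, float factor)
    {
        __shared__ float tile[256];                     // shared memory, one copy per block
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // i is held in a register
        tile[threadIdx.x] = g_data[i];                  // load from global memory
        __syncthreads();                                // wait for the whole block
        g_data[i] = tile[threadIdx.x] * factor;         // store back to global memory
    }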
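
The cudaMemcpy call discussed at the top of the chapter 2 hunk can be sketched on the host side as follows. The names h_a and d_a and the element count N are assumptions rather than the book's listing; note that the third argument of cudaMemcpy is a size expressed in bytes.

    #include <cuda_runtime.h>

    int main(void)
    {
        const int N = 1024;                    // illustrative element count
        int h_a[N];                            // host array; a real program would fill it
        int *d_a;
        cudaMalloc((void **)&d_a, N * sizeof(int));  // allocate the device array
        // Destination first, then source, then a byte count, then the direction.
        cudaMemcpy(d_a, h_a, N * sizeof(int), cudaMemcpyHostToDevice);
        cudaFree(d_a);
        return 0;
    }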
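
The addition kernel described in the second chapter 2 hunk could look like the sketch below. The book's actual listing (ch2:lst:ex1) is not reproduced in this commit, so the parameter names, the bounds check and the usual 1D formula for tid are assumptions in the style the text suggests.

    // __global__ marks the kernel as callable from host C code; the first
    // instruction derives tid, the thread index, from the CUDA variables
    // blockIdx, blockDim and threadIdx.
    __global__ void addition(int size, int *d_c, int *d_a, int *d_b)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;  // 1D thread index
        if (tid < size)                                   // guard threads past the end
            d_c[tid] = d_a[tid] + d_b[tid];
    }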