suite

diff --git a/BookGPU/Chapters/chapter2/ch2.tex b/BookGPU/Chapters/chapter2/ch2.tex

index 0640708ecb3fed538952560cf313bc8c5a5e6223..804afc2933c1b9b924babaf32a2fff7c06d3ee41 100755 (executable)
--- a/BookGPU/Chapters/chapter2/ch2.tex
+++ b/BookGPU/Chapters/chapter2/ch2.tex
@@ -41,18 +41,18 @@ parameter is set to  \texttt{cudaMemcpyHostToDevice}. The first parameter of the
  function is the destination array, the  second is the source array and the third
  is the size of the data to copy (expressed in bytes).
  
-Now the GPU contains the data needed to perform the addition. In sequential such
-addition is  achieved out with a  loop on all the  elements.  With a  GPU, it is
-possible to perform  the addition of all elements of the  arrays in parallel (if
-the   number  of   blocks   and   threads  per   blocks   is  sufficient).    In
+Now the GPU contains the data needed to perform the addition. In sequential
+code, such an addition is carried out with a loop over all the elements. With a GPU, it
+is possible to perform the addition of all elements of the arrays in parallel
+(if the number of blocks and threads per block is sufficient). In
  Listing~\ref{ch2:lst:ex1} at the beginning, a simple kernel
  called \texttt{addition} is defined to compute in parallel the sum of the
-two arrays. With CUDA, a  kernel starts with the keyword \texttt{\_\_global\_\_}
-which  indicates that  this  kernel  can be  call  from the  C  code. The  first
-instruction  in  this  kernel  is   used  to  computed  the  \texttt{tid}  which
-representes the  thread index.  This thread  index is computed  according to the
-values    of    the    block    index    (it   is    a    variable    of    CUDA
-called  \texttt{blockIdx\index{CUDA~keywords!blockIdx}}). Blocks of  threads can
+two arrays. With CUDA, a kernel starts with the keyword \texttt{\_\_global\_\_}\index{CUDA~keywords!\_\_global\_\_}
+which indicates that this kernel can be called from the C code. The first
+instruction in this kernel computes the variable \texttt{tid}, which
+represents the thread index. This thread index\index{thread index} is computed
+according to the values of the block index (a CUDA variable
+called \texttt{blockIdx}\index{CUDA~keywords!blockIdx}). Blocks of threads can
  be organized in 1, 2, or 3 dimensions. Depending on the dimensionality of the
  data being processed, the matching number of dimensions can be convenient. In our
  example, only one dimension is used. Then, using the notation \texttt{.x}, we can