diff --git a/BookGPU/Chapters/chapter2/ch2.tex b/BookGPU/Chapters/chapter2/ch2.tex

index 0640708ecb3fed538952560cf313bc8c5a5e6223..804afc2933c1b9b924babaf32a2fff7c06d3ee41 100755 (executable)
--- a/BookGPU/Chapters/chapter2/ch2.tex
+++ b/BookGPU/Chapters/chapter2/ch2.tex
@@ -41,18 +41,18 @@ parameter is set to  \texttt{cudaMemcpyHostToDevice}. The first parameter of the
  function is the destination array, the second is the source array and the third
  is the size of the data to copy (expressed in bytes).
  
-Now the GPU contains the data needed to perform the addition. In sequential such
-addition is  achieved out with a  loop on all the  elements.  With a  GPU, it is
-possible to perform  the addition of all elements of the  arrays in parallel (if
-the   number  of   blocks   and   threads  per   blocks   is  sufficient).    In
+Now the GPU contains the data needed to perform the addition. In sequential
+code, such an addition is carried out with a loop over all the elements. With a
+GPU, it is possible to perform the addition of all elements of the arrays in
+parallel (if the number of blocks and threads per block is sufficient). In
  Listing~\ref{ch2:lst:ex1}, at the beginning, a simple kernel
  called \texttt{addition} is defined to compute in parallel the summation of the
-two arrays. With CUDA, a  kernel starts with the keyword \texttt{\_\_global\_\_}
-which  indicates that  this  kernel  can be  call  from the  C  code. The  first
-instruction  in  this  kernel  is   used  to  computed  the  \texttt{tid}  which
-representes the  thread index.  This thread  index is computed  according to the
-values    of    the    block    index    (it   is    a    variable    of    CUDA
-called  \texttt{blockIdx\index{CUDA~keywords!blockIdx}}). Blocks of  threads can
+two arrays. With CUDA, a kernel starts with the keyword \texttt{\_\_global\_\_}\index{CUDA~keywords!\_\_global\_\_}
+which indicates that this kernel can be called from the C code. The first
+instruction in this kernel is used to compute the variable \texttt{tid}, which
+represents the thread index. This thread index\index{thread index} is computed
+according to the value of the block index (it is a CUDA variable
+called \texttt{blockIdx}\index{CUDA~keywords!blockIdx}). Blocks of threads can
  be decomposed into 1, 2 or 3 dimensions. Depending on the dimension of the
  data manipulated, the appropriate dimension can be chosen. In our
  example, only one dimension is used. Then, using the notation \texttt{.x}, we can