-A kernel is a function which contains a block a instruction that are executed by
-threads of a GPU. When the problem considered is a 2 dimensions or 3 dimensions
-problem, it is possible to group thread blocks into grid. In practice, the
-number of thread blocks and the size of thread block is given in parameter to
-each kernel. Figure~\ref{ch1:fig:scalability} illustrates an example of a
-kernel composed of 8 thread blocks. Then this kernel is executed on a small
-device containing only 2 SMs. So in in this case, blocks are executed 2 by 2 in
-any order. If the kernel is executed on a larger CUDA device containing 4 SMs,
-blocks are executed 4 by 4 simultaneously. The execution times should be
+A kernel is a function which contains a block of instructions that are executed
+by the threads of a GPU. When the problem considered is two- or
+three-dimensional, it is possible to group thread blocks into a grid. In
+practice, the number of thread blocks and the size of each thread block are given
+as parameters to each kernel. Figure~\ref{ch1:fig:scalability} illustrates an
+example of a kernel composed of 8 thread blocks. When this kernel is executed on
+a small device containing only 2 SMs, the blocks are executed 2 by 2, in any
+order. If the kernel is executed on a larger CUDA device containing
+4 SMs, blocks are executed 4 by 4 simultaneously. The execution times should be