-On most powerful GPU cards, called Fermi, multiprocessors are called streaming
-multiprocessors (SM). Each SM contains 32 cores and is able to perform 32
-floating point or integer operations on 32bits numbers per clock or 16 floating
-point on 64bits number per clock. SMs have their own registers, execution
-pipelines and caches. On Fermi architecture, there are 64Kb shared memory + L1
-cache and 32,536 32bits registers per SM. More precisely the programmer can
-decide what amount of shared memory and L1 cache SM can use. The constaint is
-that the sum of both amounts is less or equal to 64Kb.
-
-Threads are used to benefit from the important number of cores of a GPU. Those
-threads are different from traditional threads for CPU. In
-chapter~\ref{chapter2}, some examples of GPU programming will explicit the
-details of the GPU threads. However, threads are gathered into blocks of 32
-threads, called ``warp''. Those warps are important when designing an algorithm
+On the most powerful GPU cards, based on the Fermi architecture, multiprocessors
+are called streaming multiprocessors (SMs). Each SM contains 32 cores and can
+perform 32 floating-point or integer operations on 32-bit numbers per clock, or
+16 floating-point operations on 64-bit numbers per clock. SMs have their own
+registers, execution pipelines and caches. On the Fermi architecture, each SM
+has 64 KB of shared memory plus L1 cache and 32,768 32-bit registers. More
+precisely, the programmer can decide how much shared memory and L1 cache an SM
+uses; the constraint is that the sum of both amounts is at most 64 KB.
+
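+The shared memory versus L1 split described above is selected per kernel
+through the CUDA runtime. As a minimal sketch (the kernel name is
+hypothetical; \texttt{cudaFuncSetCacheConfig} is the real runtime call), on
+Fermi the two usual splits are 48 KB shared / 16 KB L1 or the reverse:
+\begin{verbatim}
+// Hypothetical kernel used only to illustrate the call.
+__global__ void myKernel(float *data);
+
+// Prefer 48 KB of shared memory and 16 KB of L1 cache for this kernel.
+cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferShared);
+
+// Or prefer 48 KB of L1 cache and 16 KB of shared memory.
+cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferL1);
+\end{verbatim}
+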
+Threads are used to benefit from the large number of cores of a GPU. These
+threads are different from traditional CPU threads. In
+Chapter~\ref{chapter2}, some examples of GPU programming will explain the
+details of GPU threads. Threads are executed in groups of 32
+threads, called ``warps''. These warps are important when designing an algorithm