Like any parallel code, a GPU parallel implementation first requires to determine the sequential code and the data-parallel operations of a algorithm. In fact, all the operations that are easy to execute in parallel must be made by the GPU to accelerate the execution, like the steps 3 and 4. On the other hand, all the sequential operations and the operations that have data dependencies between CUDA threads or recursive computations must be executed by only one CUDA thread or a CPU thread (the steps 1 and 2).\LZK{La méthode est déjà mal présentée, dans ce cas c'est encore plus difficile de comprendre que représentent ces différentes étapes!} Initially, we specify the organization of parallel threads by specifying the dimension of the grid \verb+Dimgrid+, the number of blocks per grid \verb+DimBlock+ and the number of threads per block.
Like any parallel code, a GPU parallel implementation first requires to determine the sequential code and the data-parallel operations of a algorithm. In fact, all the operations that are easy to execute in parallel must be made by the GPU to accelerate the execution, like the steps 3 and 4. On the other hand, all the sequential operations and the operations that have data dependencies between CUDA threads or recursive computations must be executed by only one CUDA thread or a CPU thread (the steps 1 and 2).\LZK{La méthode est déjà mal présentée, dans ce cas c'est encore plus difficile de comprendre que représentent ces différentes étapes!} Initially, we specify the organization of parallel threads by specifying the dimension of the grid \verb+Dimgrid+, the number of blocks per grid \verb+DimBlock+ and the number of threads per block.