%We noticed that the code is executed by a large number of GPU threads organized as grid of to dimension (Number of block per grid (NbBlock), number of threads per block(Nbthread)), the Nbthread is fixed initially, the NbBlock is computed as fallow:
%$ NbBlocks= \frac{N+Nbthreads-1}{Nbthreads} where N: the number of root$
%We noticed that the code is executed by a large number of GPU threads organized as grid of to dimension (Number of block per grid (NbBlock), number of threads per block(Nbthread)), the Nbthread is fixed initially, the NbBlock is computed as fallow:
%$ NbBlocks= \frac{N+Nbthreads-1}{Nbthreads} where N: the number of root$