From 254d68992f882593d5924b2cd3e97de2fa251051 Mon Sep 17 00:00:00 2001
From: zianekhodja
Date: Sat, 16 Jan 2016 16:47:58 +0100
Subject: [PATCH] new

---
 paper.tex | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/paper.tex b/paper.tex
index 41c5960..00671de 100644
--- a/paper.tex
+++ b/paper.tex
@@ -736,9 +736,7 @@ investigated. \LZK{Repetition! The same text is already written as the intro in
 Like any parallel code, a GPU parallel implementation first requires determining the sequential parts and the data-parallel operations of an algorithm. All the operations that are easy to execute in parallel must be performed by the GPU to accelerate the execution, such as steps 3 and 4. On the other hand, all the sequential operations, and the operations that have data dependencies between CUDA threads or recursive computations, must be executed by a single CUDA thread or by a CPU thread (steps 1 and 2).\LZK{The method is already poorly presented; in that case it is even harder to understand what these different steps represent!}

 First, we specify the organization of the parallel threads: the dimension of the grid \verb+Dimgrid+, the number of blocks per grid \verb+DimBlock+, and the number of threads per block.
-The code is organized kernels which are part of code that are run on
-GPU devices. For step 3, there are two kernels, the first named
-\textit{save} is used to save vector $Z^{K-1}$ and the second one is
+The code is organized as kernels, i.e. parts of the code that run on GPU devices. For step 3, there are two kernels: the first one, named \textit{save}, is used to save the vector $Z^{K-1}$, and the second one is
 named \textit{update} and is used to update the $Z^{K}$ vector. For step 4, a kernel tests the convergence of the method.
 In order to compute the function H, we have two possibilities: either to use the
@@ -757,6 +755,7 @@ comes in particular from the fact that it is very difficult to debug CUDA
 running threads like threads on a CPU host. In the following paragraph,
 Algorithm~\ref{alg1-cuda} shows the GPU parallel implementation of the
 Ehrlich-Aberth method.
+\LZK{It is better to explain the implementation with reference to the sequential algorithm than to talk about the different steps.}

 \begin{algorithm}[htpb]
 \label{alg1-cuda}
-- 
2.39.5