-each processor needs to compute its offset and its local size. Then
-processors need to allocate memory on their GPU (line 5). At the
-beginning of each iteration, a processor starts by transfering the
-whole vector Z from the CPU to the GPU (line 7). Then only the local
-part of $Z^{prev}$ is saved (line 8). After that, a processor is able
-to compute its own roots (line 9). Next, the local error can be
-computed (ligne 10) and the global error (line 11). Then the local
-roots are transfered from the GPU memory to the CPU memory (line 12)
-before being exchanged between all processors (linge 13) in order to
-give to all processors the last version of the roots. If the
-convergence is not statisfied, an new iteration is executed.
+each processor needs to compute its offset and its local
+size. Each processor then allocates memory on its GPU and copies
+its data to the GPU (line 5). At the beginning of each iteration, a
+processor starts by transferring the whole vector Z from the CPU to the
+GPU (line 7). Only the local part of $Z^{prev}$ is saved (line
+8). After that, a processor is able to compute an updated version of
+its own roots (line 9) with the EA method. The local error is computed
+(line 10), and the global error is obtained using $MPI\_Reduce$ (line 11). Then
+the local roots are transferred from the GPU memory to the CPU memory
+(line 12) before being exchanged between all processors (line 13) in
+order to give all processors the latest version of the roots (with
+the MPI\_Alltoall routine). If convergence is not satisfied, a
+new iteration is executed.