-approximations even although the latest values of other $z_i((k))$
-have not received from the other processors, in difference with
-the synchronous version where it would wait.
-in~\cite{Raphaelall01}proposed two methods of parallelization for
-architecture with shared memory and distributed memory,it able to
-compute the root of polynomial degree 10000 on 430 s with only 8
-pc and 2 communications per iteration. Compare to the sequential
-it take 3300 s to obtain the same results.
-
-After this few works discuses this problem until the apparition of
-the Compute Unified Device Architecture (CUDA)~\cite{CUDA10},a
+approximations even though the latest values of other $z_i((k))$
+have not been received from the other processors, in contrast with synchronous algorithms where it would wait those values before making a new iteration.
+Couturier et al. ~\cite{Raphaelall01} proposed two methods of parallelisation for
+a shared memory architecture and for distributed memory one. They were able to
+compute the roots of polynomials of degree 10000 in 430 seconds with only 8
+personal computers and 2 communications per iteration. Comparing to the sequential implementation
+where it takes up to 3300 seconds to obtain the same results, the authors show an interesting speedup, indeed.
+
+Very few works had been since this last work until the appearing of
+the Compute Unified Device Architecture (CUDA)~\cite{CUDA10}, a