Many authors have dealt with the parallelization of
simultaneous methods, i.e. that find all the zeros simultaneously.
-Freeman~\cite{Freeman89} implemeted and compared DK, EA and another method of the fourth order proposed
-by Farmer and Loizou~\cite{Loizon83}, on a 8- processor linear
+Freeman~\cite{Freeman89} implemented and compared DK, EA and another method of the fourth order proposed
+by Farmer and Loizou~\cite{Loizon83}, on a 8-processor linear
chain, for polynomials of degree up to 8. The third method often
-diverges, but the first two methods have speed-up 5.5
-(speed-up=(Time on one processor)/(Time on p processors)). Later,
+diverges, but the first two methods have speed-up equal to 5.5. Later,
Freeman and Bane~\cite{Freemanall90} considered asynchronous
algorithms, in which each processor continues to update its
approximations even though the latest values of other $z_i((k))$
have not been received from the other processors, in contrast with synchronous algorithms where it would wait those values before making a new iteration.
-Couturier and al~\cite{Raphaelall01} proposed two methods of parallelization for
+Couturier and al.~\cite{Raphaelall01} proposed two methods of parallelization for
a shared memory architecture and for distributed memory one. They were able to
-compute the roots of polynomials of degree 10000 in 430 seconds with only 8
+compute the roots of sparse polynomials of degree 10000 in 430 seconds with only 8
personal computers and 2 communications per iteration. Comparing to the sequential implementation
-where it takes up to 3300 seconds to obtain the same results, the authors show an interesting speedup, indeed.
+where it takes up to 3300 seconds to obtain the same results, the authors show an interesting speedup.
-Very few works had been since this last work until the appearing of
+Very few works had been performed since this last work until the appearing of
the Compute Unified Device Architecture (CUDA)~\cite{CUDA10}, a
parallel computing platform and a programming model invented by
NVIDIA. The computing power of GPUs (Graphics Processing Unit) has exceeded that of CPUs. However, CUDA adopts a totally new computing architecture to use the
Ghidouche and al~\cite{Kahinall14} proposed an implementation of the
Durand-Kerner method on GPU. Their main
-result showed that a parallel CUDA implementation is 10 times as fast as
-the sequential implementation on a single CPU for high degree
-polynomials of about 48000. To our knowledge, it is the first time such high degree polynomials are numerically solved.
-
-
-In this paper, we focus on the implementation of the Ehrlich-Aberth method for
-high degree polynomials on GPU. The paper is organized as fellows. Initially, we recall the Ehrlich-Aberth method in Section \ref{sec1}. Improvements for the Ehrlich-Aberth method are proposed in Section \ref{sec2}. Related work to the implementation of simultaneous methods using a parallel approach is presented in Section \ref{secStateofArt}.
-In Section \ref{sec5} we propose a parallel implementation of the Ehrlich-Aberth method on GPU and discuss it. Section \ref{sec6} presents and investigates our implementation and experimental study results. Finally, Section\ref{sec7} 6 concludes this paper and gives some hints for future research directions in this topic.
+result showed that a parallel CUDA implementation is about 10 times faster than
+the sequential implementation on a single CPU for sparse
+polynomials of degree 48000.
+
+
+In this paper, we focus on the implementation of the Ehrlich-Aberth
+method for high degree polynomials on GPU. We propose an adaptation of
+the exponential logarithm in order to be able to solve sparse and full
+polynomial of degree up to $1,000,000$. The paper is organized as
+follows. Initially, we recall the Ehrlich-Aberth method in Section
+\ref{sec1}. Improvements for the Ehrlich-Aberth method are proposed in
+Section \ref{sec2}. Related work to the implementation of simultaneous
+methods using a parallel approach is presented in Section
+\ref{secStateofArt}. In Section \ref{sec5} we propose a parallel
+implementation of the Ehrlich-Aberth method on GPU and discuss
+it. Section \ref{sec6} presents and investigates our implementation
+and experimental study results. Finally, Section\ref{sec7} 6 concludes
+this paper and gives some hints for future research directions in this
+topic.
\section{The Sequential Aberth method}
\label{sec1}