\begin{abstract}
Polynomials are mathematical algebraic structures that play a great
-role in science and engineering. Finding roots of high degree
+role in science and engineering. Finding the roots of high degree
polynomials is computationally demanding. In this paper, we present
the results of a parallel implementation of the Ehrlich-Aberth
algorithm for the root finding problem for high degree polynomials on
\linenumbers
-\section{The problem of finding roots of a polynomial}
-Polynomials are mathematical algebraic structures used in science and engineering to capture physical phenomenons and to express any outcome in the form of a function of some unknown variables. Formally speaking, a polynomial $p(x)$ of degree \textit{n} having $n$ coefficients in the complex plane \textit{C} is :
+\section{The problem of finding the roots of a polynomial}
+Polynomials are mathematical algebraic structures used in science and engineering to capture physical phenomena and to express any outcome in the form of a function of some unknown variables. Formally speaking, a polynomial $p(x)$ of degree \textit{n} having $n$ coefficients in the complex plane \textit{C} is :
%%\begin{center}
\begin{equation}
{\Large p(x)=\sum_{i=0}^{n}{a_{i}x^{i}}}.
\end{equation}
%%\end{center}
-The root finding problem consists in finding the values of all the $n$ values of the variable $x$ for which \textit{p(x)} is nullified. Such values are called zeroes of $p$. If zeros are $\alpha_{i},\textit{i=1,...,n}$ the $p(x)$ can be written as :
+The root finding problem consists in finding the values of all the $n$ values of the variable $x$ for which \textit{p(x)} is nullified. Such values are called zeros of $p$. If zeros are $\alpha_{i},\textit{i=1,...,n}$ the $p(x)$ can be written as :
\begin{equation}
{\Large p(x)=a_{n}\prod_{i=1}^{n}(x-\alpha_{i}), a_{0} a_{n}\neq 0}.
\end{equation}
$g(x)= f(x)-x$.
\end{center}
-Often it is not be possible to solve such nonlinear equation
-root-finding problems analytically. When this occurs we turn to
+It is often impossible to solve such nonlinear equation
+root-finding problems analytically. When this occurs, we turn to
numerical methods to approximate the solution.
Generally speaking, algorithms for solving problems can be divided into
two main groups: direct methods and iterative methods.
-\\
-Direct methods exist only for $n \leq 4$, solved in closed form by G. Cardano
-in the mid-16th century. However, N. H. Abel in the early 19th
-century showed that polynomials of degree five or more could not
+
+Direct methods only exist for $n \leq 4$, solved in closed form
+by G. Cardano in the mid-16th century. However, N. H. Abel in the early 19th
+century proved that polynomials of degree five or more could not
be solved by direct methods. Since then, mathematicians have
focussed on numerical (iterative) methods such as the famous
-Newton method, the Bernoulli method of the 18th, and the Graeffe method.
+Newton method, the Bernoulli method of the 18th century, and the Graeffe method.
Later on, with the advent of electronic computers, other methods have
been developed such as the Jenkins-Traub method, the Larkin method,
-the Muller method, and several methods for simultaneous
+the Muller method, and several other methods for the simultaneous
approximation of all the roots, starting with the Durand-Kerner (DK)
method:
%%\begin{center}
the Ehrlich-Aberth method (EA) has a cubic order of convergence for simple roots whereas the Durand-Kerner has a quadratic order of convergence.
-Iterative methods raise several problem when implemented e.g.
-specific sizes of numbers must be used to deal with this
-difficulty. Moreover, the convergence time of iterative methods
+Moreover, the convergence times of iterative methods
drastically increases like the degrees of high polynomials. It is expected that the
-parallelization of these algorithms will improve the convergence
-time.
+parallelization of these algorithms will reduce the execution times.
Many authors have dealt with the parallelization of
simultaneous methods, i.e. that find all the zeros simultaneously.
Freeman~\cite{Freeman89} implemented and compared DK, EA and another method of the fourth order proposed
-by Farmer and Loizou~\cite{Loizou83}, on a 8-processor linear
-chain, for polynomials of degree up to 8. The third method often
-diverges, but the first two methods have speed-up equal to 5.5. Later,
+by Farmer and Loizou~\cite{Loizou83}, on an 8-processor linear
+chain, for polynomials of degree 8. The third method often
+diverges, but the first two methods have speed-ups equal to 5.5. Later,
Freeman and Bane~\cite{Freemanall90} considered asynchronous
algorithms, in which each processor continues to update its
-approximations even though the latest values of other $z_i^{k}$
-have not been received from the other processors, in contrast with synchronous algorithms where it would wait those values before making a new iteration.
+approximations even though the latest values of other roots
+have not yet been received from the other processors. In contrast,
+synchronous algorithms wait the computation of all roots at a given
+iterations before making a new one.
Couturier and al.~\cite{Raphaelall01} proposed two methods of parallelization for
a shared memory architecture and for distributed memory one. They were able to
compute the roots of sparse polynomials of degree 10,000 in 430 seconds with only 8
-personal computers and 2 communications per iteration. Comparing to the sequential implementation
-where it takes up to 3,300 seconds to obtain the same results, the authors show an interesting speedup.
+personal computers and 2 communications per iteration. Compared to sequential implementations
+where it takes up to 3,300 seconds to obtain the same results, the
+authors' work experiment show an interesting speedup.
-Very few works had been performed since this last work until the appearing of
+Few works have been conducted after those works until the appearance of
the Compute Unified Device Architecture (CUDA)~\cite{CUDA10}, a
parallel computing platform and a programming model invented by
NVIDIA. The computing power of GPUs (Graphics Processing Unit) has exceeded that of CPUs. However, CUDA adopts a totally new computing architecture to use the
\section{Ehrlich-Aberth method}
\label{sec1}
-A cubically convergent iteration method for finding zeros of
-polynomials was proposed by O. Aberth~\cite{Aberth73}. The Ehrlich-Aberth method contain 4 main steps, presented in the following.
+A cubically convergent iteration method to find zeros of
+polynomials was proposed by O. Aberth~\cite{Aberth73}. The
+Ehrlich-Aberth method contains 4 main steps, presented in what
+follows.
+
%The Aberth method is a purely algebraic derivation.
%To illustrate the derivation, we let $w_{i}(z)$ be the product of linear factors