parallelization of these algorithms will improve the convergence
time.
-Many authors have dealt with parallelisation of
+Many authors have dealt with the parallelisation of
simultaneous methods, i.e. that find all the zeros simultaneously.
Freeman~\cite{Freeman89} implemeted and compared DK, EA and another method of the fourth order proposed
by Farmer and Loizou~\cite{Loizon83}, on a 8- processor linear
computation time per iteration, as reported
in~\cite{Benall68,Jana06,Janall99,Riceall06}.
-There are many
-schemes for simultaneous approximations of all roots of a given
+There are many schemes for the simultaneous approximation of all roots of a given
polynomial. Several works on different methods and issues of root
-finding have been reported in~\cite{Azad07, Gemignani07, Kalantari08, Skachek08, Zhancall08, Zhuall08}. However, Durand-Kerner and Ehrlich methods are the most practical choices among
+finding have been reported in~\cite{Azad07, Gemignani07, Kalantari08, Skachek08, Zhancall08, Zhuall08}. However, Durand-Kerner and Ehrlisch-Aberth methods are the most practical choices among
them~\cite{Bini04}. These two methods have been extensively
-studied for parallelization due to their following advantages. The
-computation involved in these methods has some inherent
+studied for parallelization due to their intrinsics, i.e. the
+computations involved in both methods has some inherent
parallelism that can be suitably exploited by SIMD machines.
Moreover, they have fast rate of convergence (quadratic for the
-Durand-Kerner method and cubic for the Ehrlich). Various parallel
+Durand-Kerner and cubic for the Ehrlisch-Aberth). Various parallel
algorithms reported for these methods can be found
in~\cite{Cosnard90, Freeman89,Freemanall90,,Jana99,Janall99}.
Freeman and Bane~\cite{Freemanall90} presented two parallel
algorithms on a local memory MIMD computer with the compute-to
communication time ratio O(n). However, their algorithms require
each processor to communicate its current approximation to all
-other processors at the end of each iteration. Therefore they
+other processors at the end of each iteration (synchronous). Therefore they
cause a high degree of memory conflict. Recently the author
in~\cite{Mirankar71} proposed two versions of parallel algorithm
-for the Durand-Kerner method, and Aberth method on model of
+for the Durand-Kerner method, and Ehrlisch-Aberth method on a model of
Optoelectronic Transpose Interconnection System (OTIS).The
algorithms are mapped on an OTIS-2D torus using N processors. This
-solution need N processors to compute N roots, that it is not
-practical (is not suitable to compute large polynomial's degrees).
-Until then, the related works are not able to compute the root of
-the large polynomial's degrees (higher then 1000) and with small
-time.
-
- Finding polynomial roots rapidly and accurately it is our
-objective, with the apparition of the CUDA(Compute Unified Device
-Architecture), finding the roots of polynomials becomes rewarding
-and very interesting, CUDA adopts a totally new computing
-architecture to use the hardware resources provided by GPU in
-order to offer a stronger computing ability to the massive data
-computing. In~\cite{Kahinall14} we proposed the first implantation
-of the root finding polynomials method on GPU (Graphics Processing
-Unit),which is the Durand-Kerner method. The main result prove
-that a parallel implementation is 10 times as fast as the
+solution needs N processors to compute N roots, which is not
+practical for solving polynomials with large degrees.
+Until very recently, the literature doen not mention implementations able to compute the roots of
+large degree polynomials (higher then 1000) and within small or at least tractable times. Finding polynomial roots rapidly and accurately is the main objective of our work.
+With the advent of CUDA (Compute Unified Device
+Architecture), finding the roots of polynomials receives a new attention because of the new possibilities to solve higher degree polynomials in less time.
+In~\cite{Kahinall14} we already proposed the first implementation
+of a root finding method on GPUs, that of the Durand-Kerner method. The main result showed
+that a parallel CUDA implementation is 10 times as fast as the
sequential implementation on a single CPU for high degree
-polynomials that is greater than about 48000. Indeed, in this
-paper we present a parallel implementation of Aberth method on
-GPU, more details are discussed in the following of this paper.
+polynomials of 48000. In this paper we present a parallel implementation of Ehlisch-Aberth method on
+GPUs, which details are discussed in the sequel.
\section {A parallel implementation of Aberth method}