correct

[kahina_paper1.git] / paper.tex
diff --git a/paper.tex b/paper.tex

index 96e01fd054a410f2d09c9958b0154fcd26892f22..1a307883c303d75f16d1073e57c98899c90aefd6 100644 (file)
--- a/paper.tex
+++ b/paper.tex
@@ -82,7 +82,7 @@
  
  \begin{abstract}
  Polynomials are mathematical algebraic structures that play a great
-role in science and engineering. Finding roots of high degree
+role in science and engineering. Finding the roots of high degree
  polynomials is computationally demanding. In this paper, we present
  the results of a parallel implementation of the Ehrlich-Aberth
  algorithm for the root finding problem for high degree polynomials on
@@ -101,15 +101,15 @@ Polynomial root finding, Iterative methods, Ehrlich-Aberth, Durand-Kerner, GPU
  
  \linenumbers
  
-\section{The problem of finding roots of a polynomial}
-Polynomials are mathematical algebraic structures used in science and engineering to capture physical phenomenons and to express any outcome in the form of a function of some unknown variables. Formally speaking,  a polynomial $p(x)$ of degree \textit{n} having $n$ coefficients in the complex plane \textit{C} is :
+\section{The problem of finding the roots of a polynomial}
+Polynomials are mathematical algebraic structures used in science and engineering to capture physical phenomena and to express any outcome in the form of a function of some unknown variables. Formally speaking,  a polynomial $p(x)$ of degree \textit{n} having $n$ coefficients in the complex plane \textit{C} is :
  %%\begin{center}
  \begin{equation}
       {\Large p(x)=\sum_{i=0}^{n}{a_{i}x^{i}}}.
  \end{equation}
  %%\end{center}
  
-The root finding problem consists in finding the values of all the $n$ values of the variable $x$ for which \textit{p(x)} is nullified. Such values are called zeroes of $p$. If zeros are $\alpha_{i},\textit{i=1,...,n}$ the $p(x)$ can be written as :
+The root finding problem consists in finding the values of all the $n$ values of the variable $x$ for which \textit{p(x)} is nullified. Such values are called zeros of $p$. If zeros are $\alpha_{i},\textit{i=1,...,n}$ the $p(x)$ can be written as :
  \begin{equation}
       {\Large p(x)=a_{n}\prod_{i=1}^{n}(x-\alpha_{i}), a_{0} a_{n}\neq 0}.
  \end{equation}
@@ -127,22 +127,22 @@ root-finding problem into a fixed-point problem by setting :
  $g(x)= f(x)-x$.
  \end{center}
  
-Often it is not be possible to solve such nonlinear equation
-root-finding problems analytically. When this occurs we turn to
+It is often impossible to solve such nonlinear equation
+root-finding problems analytically. When this occurs, we turn to
  numerical methods to approximate the solution. 
  Generally speaking, algorithms for solving problems can be divided into
  two main groups: direct methods and iterative methods.
-\\
-Direct methods exist only for $n \leq 4$, solved in closed form by G. Cardano
-in the mid-16th century. However, N. H. Abel in the early 19th
-century showed that polynomials of degree five or more could not
+
+Direct methods only exist for $n \leq 4$, solved in closed form
+by G. Cardano in the mid-16th century. However, N. H. Abel in the early 19th
+century proved that polynomials of degree five or more could not
  be solved by  direct methods. Since then, mathematicians have
  focussed on numerical (iterative) methods such as the famous
-Newton method, the Bernoulli method of the 18th, and the Graeffe method.
+Newton method, the Bernoulli method of the 18th century, and the Graeffe method.
  
  Later on, with the advent of electronic computers, other methods have
  been developed such as the Jenkins-Traub method, the Larkin method,
-the Muller method, and several methods for simultaneous
+the Muller method, and several other methods for the simultaneous
  approximation of all the roots, starting with the Durand-Kerner (DK)
  method:
  %%\begin{center}
@@ -176,30 +176,30 @@ Aberth, Ehrlich and Farmer-Loizou~\cite{Loizou83} have proved that
  the Ehrlich-Aberth method (EA) has a cubic order of convergence for simple roots whereas the Durand-Kerner has a quadratic order of convergence.
  
  
-Iterative methods raise several problem when implemented e.g.
-specific sizes of numbers must be used to deal with this
-difficulty. Moreover, the convergence time of iterative methods
+Moreover, the convergence times of iterative methods
  drastically increases like the degrees of high polynomials. It is expected that the
-parallelization of these algorithms will improve the convergence
-time.
+parallelization of these algorithms will reduce the execution times.
  
  Many authors have dealt with the parallelization of
  simultaneous methods, i.e. that find all the zeros simultaneously. 
  Freeman~\cite{Freeman89} implemented and compared DK, EA and another method of the fourth order proposed
-by Farmer and Loizou~\cite{Loizou83}, on a 8-processor linear
-chain, for polynomials of degree up to 8. The third method often
-diverges, but the first two methods have speed-up equal to 5.5. Later,
+by Farmer and Loizou~\cite{Loizou83}, on an 8-processor linear
+chain, for polynomials of degree 8. The third method often
+diverges, but the first two methods have speed-ups equal to 5.5. Later,
  Freeman and Bane~\cite{Freemanall90}  considered asynchronous
  algorithms, in which each processor continues to update its
-approximations even though the latest values of other $z_i^{k}$
-have not been received from the other processors, in contrast with synchronous algorithms where it would wait those values before making a new iteration.
+approximations even though the latest values of other roots
+have not yet been received from the other processors.  In contrast,
+synchronous algorithms   wait the computation of all roots at a given
+iterations  before making a new one.
  Couturier and al.~\cite{Raphaelall01} proposed two methods of parallelization for
  a shared memory architecture and for distributed memory one. They were able to
  compute the roots of sparse polynomials of degree 10,000 in 430 seconds with only 8
-personal computers and 2 communications per iteration. Comparing to the sequential implementation
-where it takes up to 3,300 seconds to obtain the same results, the authors show an interesting speedup.
+personal computers and 2 communications per iteration. Compared to sequential implementations
+where it takes up to 3,300 seconds to obtain the same results, the
+authors' work experiment show an interesting speedup.
  
-Very few works had been performed since this last work until the appearing of
+Few works have been conducted after those works until the appearance of
  the Compute Unified Device Architecture (CUDA)~\cite{CUDA10}, a
  parallel computing platform and a programming model invented by
  NVIDIA. The computing power of GPUs (Graphics Processing Unit) has exceeded that of CPUs. However, CUDA adopts a totally new computing architecture to use the
@@ -231,8 +231,11 @@ topic.
  
  \section{Ehrlich-Aberth method}
  \label{sec1}
-A cubically convergent iteration method for finding zeros of
-polynomials was proposed by O. Aberth~\cite{Aberth73}. The Ehrlich-Aberth method contain 4 main steps, presented in the following. 
+A cubically convergent iteration method to find zeros of
+polynomials was proposed by O. Aberth~\cite{Aberth73}. The
+Ehrlich-Aberth method contains 4 main steps, presented in what
+follows.
+
  %The Aberth method is a purely algebraic derivation. 
  %To illustrate the derivation, we let $w_{i}(z)$ be the product of linear factors 
  
@@ -345,14 +348,12 @@ propose to use the logarithm and the exponential of a complex in order to comput
  Using the logarithm (eq.~\ref{deflncomplex}) and the exponential (eq.~\ref{defexpcomplex}) operators, we can replace any multiplications and divisions with additions and subtractions. Consequently, computations
  manipulate lower absolute values and the roots for large polynomial's degrees can be looked for successfully~\cite{Karimall98}.
  
-Applying this solution for the Ehrlich-Aberth method we obtain the
-iteration function with exponential and logarithm:
+Applying this solution for the iteration function Eq.~\ref{Eq:Hi} of Ehrlich-Aberth method, we obtain the iteration function with exponential and logarithm:
  %%$$ \exp \bigl(  \ln(p(z)_{k})-ln(\ln(p(z)_{k}^{'}))- \ln(1- \exp(\ln(p(z)_{k})-ln(\ln(p(z)_{k}^{'})+\ln\sum_{i\neq j}^{n}\frac{1}{z_{k}-z_{j}})$$
  \begin{equation}
  \label{Log_H2}
  EA.EL: z^{k+1}_{i}=z_{i}^{k}-\exp \left(\ln \left(
-p(z_{i}^{k})\right)-\ln\left(p'(z^{k}_{i})\right)- \ln
-\left(1-Q(z^{k}_{i})\right)\right),
+p(z_{i}^{k})\right)-\ln\left(p'(z^{k}_{i})\right)- \ln\left(1-Q(z^{k}_{i})\right)\right),
  \end{equation}
  
  where:
@@ -366,7 +367,7 @@ Q(z^{k}_{i})=\exp\left( \ln (p(z^{k}_{i}))-\ln(p'(z^{k}_{i}))+\ln \left(
  This solution is applied when the root except the circle unit, represented by the radius $R$ evaluated in C language as :
  \begin{equation}
  \label{R.EL}
-R = exp(log(DBL_MAX)/(2*n) );
+R = exp(log(DBL\_MAX)/(2*n) );
  \end{equation}
  
  
@@ -409,7 +410,7 @@ other processors at the end of each iteration (synchronous). Therefore they
  cause a high degree of memory conflict. Recently the author
  in~\cite{Mirankar71} proposed two versions of parallel algorithm
  for the Durand-Kerner method, and Ehrlich-Aberth method on a model of
-Optoelectronic Transpose Interconnection System (OTIS).The
+Optoelectronic Transpose Interconnection System (OTIS). The
  algorithms are mapped on an OTIS-2D torus using $N$ processors. This
  solution needs $N$ processors to compute $N$ roots, which is not
  practical for solving polynomials with large degrees.
@@ -542,7 +543,7 @@ polynomials of 48,000.
  \subsection{Parallel implementation with CUDA }
  
  In order to implement the Ehrlich-Aberth method in CUDA, it is
-possible to use the Jacobi scheme or the Gauss Seidel one.  With the
+possible to use the Jacobi scheme or the Gauss-Seidel one.  With the
  Jacobi iteration, at iteration $k+1$ we need all the previous values
  $z^{k}_{i}$ to compute the new values $z^{k+1}_{i}$, that is :
  
@@ -560,7 +561,7 @@ With the Gauss-Seidel iteration, we have:
  \begin{equation}
  \label{eq:Aberth-H-GS}
  EAGS: z^{k+1}_{i}=z_{i}^{k}-\frac{\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}}
-{1-\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}(\sum^{i-1}_{j=1}\frac{1}{z^{k}_{i}-z^{k+1}_{j}}+\sum_{j=1,j\neq i}^{j=n}{\frac{1}{(z_{i}^{k}-z_{j}^{k})}})}, i=1,. . . .,n
+{1-\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}(\sum^{i-1}_{j=1}\frac{1}{z^{k}_{i}-z^{k+1}_{j}}+\sum_{j=i+1}^{j=n}{\frac{1}{(z_{i}^{k}-z_{j}^{k})}})}, i=1,. . . .,n
  \end{equation}
  
  Using Eq.~\ref{eq:Aberth-H-GS} to update the vector solution
@@ -582,7 +583,7 @@ quickly because, just as any Jacobi algorithm (for solving linear systems of equ
  
  %In CUDA programming, all the instructions of the  \verb=for= loop are executed by the GPU as a kernel. A kernel is a function written in CUDA and defined by the  \verb=__global__= qualifier added before a usual \verb=C= function, which instructs the compiler to generate appropriate code to pass it to the CUDA runtime in order to be executed on the GPU. 
  
-Algorithm~\ref{alg2-cuda} shows sketch of the Ehrlich-Aberth algorithm using CUDA.
+Algorithm~\ref{alg2-cuda} shows sketch of the Ehrlich-Aberth method using CUDA.
  
  \begin{enumerate}
  \begin{algorithm}[H]
@@ -590,7 +591,7 @@ Algorithm~\ref{alg2-cuda} shows sketch of the Ehrlich-Aberth algorithm using CUD
  %\LinesNumbered
  \caption{CUDA Algorithm to find roots with the Ehrlich-Aberth method}
  
-\KwIn{$Z^{0}$ (Initial root's vector), $\varepsilon$ (error tolerance
+\KwIn{$Z^{0}$ (Initial root's vector), $\varepsilon$ (Error tolerance
    threshold), P (Polynomial to solve), Pu (Derivative of P), $n$ (Polynomial's degrees), $\Delta z_{max}$ (Maximum value of stop condition)}
  
  \KwOut {$Z$ (Solution root's vector), $ZPrec$ (Previous solution root's vector)}