petit commit

[kahina_paper2.git] / paper.tex
diff --git a/paper.tex b/paper.tex

index d3016464c09b779da0c0b270061d1e2408719a3c..52ab448fa8338b6bcf14132da7668556fc6c7868 100644 (file)
--- a/paper.tex
+++ b/paper.tex
@@ -463,7 +463,7 @@ Most of the numerical methods that deal with the polynomial root-finding problem
  %the Ehrlich-Aberth method (EA) has a cubic order of convergence for simple roots whereas the Durand-Kerner has a quadratic order of %convergence.
  
  The main problem of the simultaneous methods is that the necessary time needed for the convergence increases with the increasing of the polynomial's degree. Many authors have treated the problem of implementing  simultaneous methods in parallel. Freeman~\cite{Freeman89} implemented and compared Durand-Kerner method, Ehrlich-Aberth method and another method of the fourth order of convergence proposed by Farmer and Loizou~\cite{Loizou83} on a 8-processor linear chain, for polynomials of degree up-to 8. The method of Farmer and Loizou~\cite{Loizou83} often diverges, but the first two methods (Durand-Kerner and Ehrlich-Aberth methods) have a speed-up equals to 5.5. Later, Freeman and Bane~\cite{Freemanall90} considered asynchronous algorithms in which each processor continues to update its approximations even though the latest values of other approximations $z^{k}_{i}$ have not been received from the other processors, in contrast with synchronous algorithms where it would wait those values before making a new iteration. Couturier et al.~\cite{Raphaelall01} proposed two methods of parallelization for a shared memory architecture with OpenMP and for a distributed memory one with MPI. They are able to compute the roots of sparse polynomials of degree 10,000 in 116 seconds with OpenMP and 135 seconds with MPI only by using 8 personal computers and 2 communications per iteration. The authors showed an interesting speedup comparing to the sequential implementation which takes up-to 3,300 seconds to obtain same results.
  %the Ehrlich-Aberth method (EA) has a cubic order of convergence for simple roots whereas the Durand-Kerner has a quadratic order of %convergence.
  
  The main problem of the simultaneous methods is that the necessary time needed for the convergence increases with the increasing of the polynomial's degree. Many authors have treated the problem of implementing  simultaneous methods in parallel. Freeman~\cite{Freeman89} implemented and compared Durand-Kerner method, Ehrlich-Aberth method and another method of the fourth order of convergence proposed by Farmer and Loizou~\cite{Loizou83} on a 8-processor linear chain, for polynomials of degree up-to 8. The method of Farmer and Loizou~\cite{Loizou83} often diverges, but the first two methods (Durand-Kerner and Ehrlich-Aberth methods) have a speed-up equals to 5.5. Later, Freeman and Bane~\cite{Freemanall90} considered asynchronous algorithms in which each processor continues to update its approximations even though the latest values of other approximations $z^{k}_{i}$ have not been received from the other processors, in contrast with synchronous algorithms where it would wait those values before making a new iteration. Couturier et al.~\cite{Raphaelall01} proposed two methods of parallelization for a shared memory architecture with OpenMP and for a distributed memory one with MPI. They are able to compute the roots of sparse polynomials of degree 10,000 in 116 seconds with OpenMP and 135 seconds with MPI only by using 8 personal computers and 2 communications per iteration. The authors showed an interesting speedup comparing to the sequential implementation which takes up-to 3,300 seconds to obtain same results.
-\LZK{``only by using 8 personal computers and 2 communications per iteration''. Pour MPI? et Pour OpenMP}
+\LZK{``only by using 8 personal computers and 2 communications per iteration''. Pour MPI? et Pour OpenMP?}
  
  Very few work had been performed since then until the appearing of the Compute Unified Device Architecture (CUDA)~\cite{CUDA15}, a parallel computing platform and a programming model invented by NVIDIA. The computing power of GPUs (Graphics Processing Units) has exceeded that of traditional processors CPUs. However, CUDA adopts a totally new computing architecture to use the hardware resources provided by the GPU in order to offer a stronger computing ability to the massive data computing. Ghidouche et al.~\cite{Kahinall14} proposed an implementation of the Durand-Kerner method on a single GPU. Their main results showed that a parallel CUDA implementation is about 10 times faster than the sequential implementation on a single CPU for sparse polynomials of degree 48,000.
  
  
  Very few work had been performed since then until the appearing of the Compute Unified Device Architecture (CUDA)~\cite{CUDA15}, a parallel computing platform and a programming model invented by NVIDIA. The computing power of GPUs (Graphics Processing Units) has exceeded that of traditional processors CPUs. However, CUDA adopts a totally new computing architecture to use the hardware resources provided by the GPU in order to offer a stronger computing ability to the massive data computing. Ghidouche et al.~\cite{Kahinall14} proposed an implementation of the Durand-Kerner method on a single GPU. Their main results showed that a parallel CUDA implementation is about 10 times faster than the sequential implementation on a single CPU for sparse polynomials of degree 48,000.
  
@@ -524,7 +524,7 @@ CUDA (Compute Unified Device Architecture) is a parallel computing architecture
  \section{The Ehrlich-Aberth algorithm on a GPU}
  \label{sec3}
  
  \section{The Ehrlich-Aberth algorithm on a GPU}
  \label{sec3}
  
-\subsection{The EA method}
+\subsection{The Ehrlich-Aberth method}
  %A cubically convergent iteration method to find zeros of
  %polynomials was proposed by O. Aberth~\cite{Aberth73}. The
  %Ehrlich-Aberth (EA is short) method contains 4 main steps, presented in what
  %A cubically convergent iteration method to find zeros of
  %polynomials was proposed by O. Aberth~\cite{Aberth73}. The
  %Ehrlich-Aberth (EA is short) method contains 4 main steps, presented in what
@@ -616,13 +616,18 @@ CUDA (Compute Unified Device Architecture) is a parallel computing architecture
  %\label{fig:03}
  %\end{figure}
  
  %\label{fig:03}
  %\end{figure}
  
-the Ehrlich-Aberth method is an iterative  method, contain 4 steps, start from the initial approximations of all the roots of the polynomial,the second step initialize the solution vector $Z$ using the Guggenheimer method to assure the distinction of the initial vector roots, than in step 3 we apply the the iterative function based on the Newton's method and Weiestrass operator~\cite{,}, wich will make it possible to converge to the roots solution, provided that all the root are different.
+%the Ehrlich-Aberth method is an iterative  method, contain 4 steps, start from the initial approximations of all the roots of the polynomial,the second step initialize the solution vector $Z$ using the Guggenheimer method to assure the distinction of the initial vector roots, than in step 3 we apply the the iterative function based on the Newton's method and Weiestrass operator~\cite{,}, wich will make it possible to converge to the roots solution, provided that all the root are different.
  
  
+The Ehrlich-Aberth method is a simultaneous method~\cite{} using the following iteration
  \begin{equation}
  \label{Eq:EA1}
  \begin{equation}
  \label{Eq:EA1}
-EA: z^{k+1}_{i}=z_{i}^{k}-\frac{\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}}
-{1-\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}\sum_{j=1,j\neq i}^{j=n}{\frac{1}{(z_{i}^{k}-z_{j}^{k})}}}, i=1,. . . .,n
+z^{k+1}_i=z_i^k-\frac{\frac{p(z_i^k)}{p'(z_i^k)}}{1-\frac{p(z_i^k)}{p'(z_i^k)}\displaystyle\sum\limits_{\substack{j=1 \\ j\neq i}}^{j=n}{\frac{1}{(z_i^k-z_j^k)}}}, i=1,\ldots,n,
  \end{equation}
  \end{equation}
+to find the roots $Z$
+
+ contain 4 steps, start from the initial approximations of all the roots of the polynomial,the second step initialize the solution vector $Z$ using the Guggenheimer method to assure the distinction of the initial vector roots, than in step 3 we apply the the iterative function based on the Newton's method and Weiestrass operator~\cite{,}, wich will make it possible to converge to the roots solution, provided that all the root are different.
+
+
  
   At the end of each application of the iterative function, a stop condition is verified consists in stopping the iterative process when the whole of the modules of the roots are lower than a fixed value $\xi$ 
  
  
   At the end of each application of the iterative function, a stop condition is verified consists in stopping the iterative process when the whole of the modules of the roots are lower than a fixed value $\xi$