X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/kahina_paper1.git/blobdiff_plain/84bdc1ad86104770f6df03822fbbd4284fa7a7b1..572aa90d1d2b8a3c1220ab9ad1e1f4c4477583f2:/paper.tex diff --git a/paper.tex b/paper.tex index b3e3076..d9c3324 100644 --- a/paper.tex +++ b/paper.tex @@ -185,22 +185,21 @@ time. Many authors have dealt with the parallelization of simultaneous methods, i.e. that find all the zeros simultaneously. -Freeman~\cite{Freeman89} implemeted and compared DK, EA and another method of the fourth order proposed -by Farmer and Loizou~\cite{Loizon83}, on a 8- processor linear +Freeman~\cite{Freeman89} implemented and compared DK, EA and another method of the fourth order proposed +by Farmer and Loizou~\cite{Loizon83}, on a 8-processor linear chain, for polynomials of degree up to 8. The third method often -diverges, but the first two methods have speed-up 5.5 -(speed-up=(Time on one processor)/(Time on p processors)). Later, +diverges, but the first two methods have speed-up equal to 5.5. Later, Freeman and Bane~\cite{Freemanall90} considered asynchronous algorithms, in which each processor continues to update its approximations even though the latest values of other $z_i((k))$ have not been received from the other processors, in contrast with synchronous algorithms where it would wait those values before making a new iteration. -Couturier and al~\cite{Raphaelall01} proposed two methods of parallelization for +Couturier and al.~\cite{Raphaelall01} proposed two methods of parallelization for a shared memory architecture and for distributed memory one. They were able to -compute the roots of polynomials of degree 10000 in 430 seconds with only 8 +compute the roots of sparse polynomials of degree 10000 in 430 seconds with only 8 personal computers and 2 communications per iteration. Comparing to the sequential implementation -where it takes up to 3300 seconds to obtain the same results, the authors show an interesting speedup, indeed. +where it takes up to 3300 seconds to obtain the same results, the authors show an interesting speedup. -Very few works had been since this last work until the appearing of +Very few works had been performed since this last work until the appearing of the Compute Unified Device Architecture (CUDA)~\cite{CUDA10}, a parallel computing platform and a programming model invented by NVIDIA. The computing power of GPUs (Graphics Processing Unit) has exceeded that of CPUs. However, CUDA adopts a totally new computing architecture to use the @@ -210,14 +209,25 @@ computing ability to the massive data computing. Ghidouche and al~\cite{Kahinall14} proposed an implementation of the Durand-Kerner method on GPU. Their main -result showed that a parallel CUDA implementation is 10 times as fast as -the sequential implementation on a single CPU for high degree -polynomials of about 48000. To our knowledge, it is the first time such high degree polynomials are numerically solved. - - -In this paper, we focus on the implementation of the Ehrlich-Aberth method for -high degree polynomials on GPU. The paper is organized as fellows. Initially, we recall the Ehrlich-Aberth method in Section \ref{sec1}. Improvements for the Ehrlich-Aberth method are proposed in Section \ref{sec2}. Related work to the implementation of simultaneous methods using a parallel approach is presented in Section \ref{secStateofArt}. -In Section \ref{sec5} we propose a parallel implementation of the Ehrlich-Aberth method on GPU and discuss it. Section \ref{sec6} presents and investigates our implementation and experimental study results. Finally, Section\ref{sec7} 6 concludes this paper and gives some hints for future research directions in this topic. +result showed that a parallel CUDA implementation is about 10 times faster than +the sequential implementation on a single CPU for sparse +polynomials of degree 48000. + + +In this paper, we focus on the implementation of the Ehrlich-Aberth +method for high degree polynomials on GPU. We propose an adaptation of +the exponential logarithm in order to be able to solve sparse and full +polynomial of degree up to $1,000,000$. The paper is organized as +follows. Initially, we recall the Ehrlich-Aberth method in Section +\ref{sec1}. Improvements for the Ehrlich-Aberth method are proposed in +Section \ref{sec2}. Related work to the implementation of simultaneous +methods using a parallel approach is presented in Section +\ref{secStateofArt}. In Section \ref{sec5} we propose a parallel +implementation of the Ehrlich-Aberth method on GPU and discuss +it. Section \ref{sec6} presents and investigates our implementation +and experimental study results. Finally, Section\ref{sec7} 6 concludes +this paper and gives some hints for future research directions in this +topic. \section{The Sequential Aberth method} \label{sec1}