X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/kahina_paper1.git/blobdiff_plain/065c8445483c2244976fa7af7a1dfc81f6e0096e..e8348fd796778f1fa87ca325cdb696aabc999afb:/paper.tex

diff --git a/paper.tex b/paper.tex
index 73a2b47..a6e1cf8 100644
--- a/paper.tex
+++ b/paper.tex
@@ -1,10 +1,9 @@
 \documentclass[review]{elsarticle}
-\usepackage{lineno,hyperref}
-\usepackage[utf8]{inputenc}
+\usepackage{lineno,hyperref}
+%%\usepackage[utf8]{inputenc}
 %%\usepackage[T1]{fontenc}
 %%\usepackage[french]{babel}
-
 \usepackage{amsmath,amsfonts,amssymb}
 \usepackage[ruled,vlined]{algorithm2e}
 \usepackage{array,multirow,makecell}
@@ -53,12 +52,12 @@
 
 \begin{frontmatter}
 
-\title{A parallel root finding polynomial on GPU}
+\title{Parallel polynomial root finding using GPU}
 
 %% Group authors per affiliation:
-%\author{Elsevier\fnref{myfootnote}}
-%\address{Radarweg 29, Amsterdam}
-%\fntext[myfootnote]{Since 1880.}
+%%\author{Elsevier\fnref{myfootnote}}
+%%\address{Radarweg 29, Amsterdam}
+%%\fntext[myfootnote]{Since 1880.}
 
 %% or include affiliations in footnotes:
 \author[mymainaddress]{Ghidouche Kahina\corref{mycorrespondingauthor}}
@@ -66,7 +65,7 @@
 \cortext[mycorrespondingauthor]{Corresponding author}
 \ead{kahina.ghidouche@gmail.com}
 
-\author[mysecondaryaddress]{Couturier Raphaël\corref{mycorrespondingauthor}}
+\author[mysecondaryaddress]{Couturier Raphael\corref{mycorrespondingauthor}}
 %%\cortext[mycorrespondingauthor]{Corresponding author}
 \ead{raphael.couturier@univ-fcomte.fr}
 
@@ -74,10 +73,8 @@
 %%\cortext[mycorrespondingauthor]{Corresponding author}
 \ead{ar.sider@univ-bejaia.dz}
 
-\address[mymainaddress]{Department of informatics, University of
-  Béjaia, Algeria}
-\address[mysecondaryaddress]{FEMTO-ST Institute, University of
-  Bourgogne Franche-Comte }
+\address[mymainaddress]{Department of Informatics, University of Bejaia, Algeria}
+\address[mysecondaryaddress]{FEMTO-ST Institute, University of Franche-Comt\'e}
 
 \begin{abstract}
 In this article we present a parallel implementation
@@ -94,130 +91,129 @@ root finding of polynomials, high degree, iterative methods, Durant-Kerner, GPU,
 \linenumbers
 
-\section{Root finding problem}
-We consider a polynomial of degree \textit{n} having coefficients
-in the complex \textit{C} and zeros $\alpha_{i},\textit{i=1,...,n}$. 
+\section{The problem of finding the roots of a polynomial}
+Polynomials are algebraic structures used throughout mathematics to model physical phenomena, expressing an outcome as a function of some unknown variable. Formally speaking, a polynomial $p(x)$ of degree $n$ with coefficients in the complex plane \textit{C} and zeros $\alpha_{i}, i=1,\ldots,n$, is written as
 %%\begin{center}
 \begin{equation}
-     {\Large p(x)=\sum{a_{i}x^{i}}=a_{n}\prod(x-\alpha_{i}),a_{0} a_{n}\neq 0}
+     {\Large p(x)=\sum_{i=0}^{n}{a_{i}x^{i}}=a_{n}\prod_{i=1}^{n}(x-\alpha_{i}), \quad a_{0}a_{n}\neq 0}.
 \end{equation}
 %%\end{center}
 
-  The root finding problem consist to find
-all n root of \textit{p(x)}. the problem of finding a root is
-equivalent to the problem of finding a fixed-point. To see this
-consider the fixed-point problem of finding the n-dimensional
-vector x such that
+The root finding problem consists in finding all the $n$ values of the variable $x$ for which $p(x)$ vanishes. Such values are called zeros of $p$. The problem of finding a root is equivalent to that of solving a fixed-point problem. To see this, consider the fixed-point problem of finding the $n$-dimensional
+vector $x$ such that
 \begin{center}
-$x=g(x). $
+$x=g(x)$
 \end{center}
-Where $g : C^{n}\longrightarrow C^{n}$. Note that we can easily
+where $g : C^{n}\longrightarrow C^{n}$. Note that we can easily
 rewrite this fixed-point problem as a root-finding problem by
-setting $f (x) = x-g(x)$ and likewise we can recast the
+setting $f(x) = x-g(x)$ and likewise we can recast the
 root-finding problem into a fixed-point problem by setting
 \begin{center}
-$g(x)= f(x)-x$
+$g(x)= x-f(x)$.
 \end{center}
-Often it will not be possible to solve such nonlinear equation
+
+Often it is not possible to solve such nonlinear equation
 root-finding problems analytically. When this occurs we turn to
-numerical methods to approximate the solution. Generally speaking,
-algorithms for solving problems numerically can be divided into
+numerical methods to approximate the solution.
+Generally speaking, algorithms for solving such problems numerically can be divided into
 two main groups: direct methods and iterative methods.\\
- Direct methods exist only for $n \leq 4$,solved in closed form by G. Cardano
+Direct methods exist only for $n \leq 4$; these cases were solved in closed form by G. Cardano
 in the mid-16th century. However, N.H. Abel in the early 19th
 century showed that polynomials of degree five or more could not
-be solved by directs methods. Since then researchers have
-concentrated on numerical (iterative) methods such as the famous
+be solved by direct methods. Since then, mathematicians have
+focused on numerical (iterative) methods such as the famous
 Newton's method, Bernoulli's method of the 18th century, and Graeffe's method.
-With the advent of electronic computers, different methods has
-been developed such as the Jenkins-Traub method, Larkin s method,
+
+Later on, with the advent of electronic computers, other methods have
+been developed, such as the Jenkins-Traub method, Larkin's method,
 Muller's method, and several methods for simultaneous
-approximation of all the roots, starting with the Durand-Kerner
-method:
+approximation of all the roots, starting with the Durand-Kerner (DK)
+method:
 %%\begin{center}
 \begin{equation}
 Z_{i}=Z_{i}-\frac{P(Z_{i})}{\prod_{j\neq i}(Z_{i}-Z_{j})}
 \end{equation}
 %%\end{center}
-This formula is mentioned for the first time from
+This formula is mentioned for the first time by
 Weierstrass~\cite{Weierstrass03} as part of the fundamental theorem
-of Algebra and is rediscovered from Ilieff~\cite{Ilie50},
+of Algebra and is rediscovered by Ilieff~\cite{Ilie50},
 Docev~\cite{Docev62}, Durand~\cite{Durand60},
-Kerner~\cite{Kerner66}. Another method discovered from
+Kerner~\cite{Kerner66}. Another method discovered by
 Borsch-Supan~\cite{ Borch-Supan63} and also described and brought
-in the following form from Ehrlich~\cite{Ehrlich67} and
-Aberth~\cite{Aberth73}.
+in the following form by Ehrlich~\cite{Ehrlich67} and
+Aberth~\cite{Aberth73} uses a different iteration formula, given as follows:
 %%\begin{center}
 \begin{equation}
- Z_{i}=Z_{i}-\frac{1}{{\frac {P'(Z_{i})} {P(Z_{i})}}-{\sum_{i\neq j}(z_{i}-z_{j})}}
+ Z_{i}=Z_{i}-\frac{1}{{\frac {P'(Z_{i})} {P(Z_{i})}}-{\sum_{j\neq i}\frac{1}{Z_{i}-Z_{j}}}}.
 \end{equation}
 %%\end{center}
 
 Aberth, Ehrlich and Farmer-Loizou~\cite{Loizon83} have proved that
-the above method has cubic order of convergence for simple roots.
+the Ehrlich-Aberth method (EA) has a cubic order of convergence for simple roots whereas the Durand-Kerner method has a quadratic order of convergence.
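+
+As an illustration only (this sketch is not part of the original paper), one
+Ehrlich-Aberth correction of a single approximation $Z_{i}$ can be written in a
+few lines of C, assuming C99 complex arithmetic and assuming that $P(Z_{i})$
+and $P'(Z_{i})$ have been evaluated elsewhere (e.g. by Horner's scheme):
+\begin{verbatim}
+#include <complex.h>
+
+/* One Ehrlich-Aberth step for root i: returns the corrected Z_i.
+   p_zi and dp_zi hold the values P(Z_i) and P'(Z_i). */
+double complex ea_step(int i, int n, const double complex z[],
+                       double complex p_zi, double complex dp_zi)
+{
+    double complex sum = 0;
+    for (int j = 0; j < n; j++)
+        if (j != i)
+            sum += 1.0 / (z[i] - z[j]);   /* sum_{j != i} 1/(Z_i - Z_j) */
+    return z[i] - 1.0 / (dp_zi / p_zi - sum);
+}
+\end{verbatim}
+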
 Iterative methods raise several problems when implemented, e.g.
 specific sizes of numbers must be used to deal with this
-difficulty.Moreover,the convergence time of iterative methods
-drastically increase like the degrees of high polynomials. The
+difficulty. Moreover, the convergence time of iterative methods
+increases drastically with the degree of the polynomial. It is
 expected that the parallelization of these algorithms will
 improve the convergence time.
 
-Many authors have treated the problem of parallelization of
-simultaneous methods. Freeman~\cite{Freeman89} has tested the DK
-method, EA method and another method of the fourth order proposed
-from Farmer and Loizou~\cite{Loizon83},on a 8- processor linear
-chain, for polynomial of degree up to 8. The third method often
+Many authors have dealt with the parallelisation of
+simultaneous methods, i.e. methods that find all the zeros simultaneously.
+Freeman~\cite{Freeman89} implemented and compared DK, EA and another fourth-order method proposed
+by Farmer and Loizou~\cite{Loizon83} on an 8-processor linear
+chain, for polynomials of degree up to 8. The third method often
 diverges, but the first two methods have a speed-up of 5.5
-(speed-up=(Time on one processor)/(Time on p processors)). Later
-Freeman and Bane~\cite{Freemanall90} consider asynchronous
+(speed-up=(Time on one processor)/(Time on p processors)). Later,
+Freeman and Bane~\cite{Freemanall90} considered asynchronous
 algorithms, in which each processor continues to update its
-approximations even although the latest values of other $z_i((k))$
-have not received from the other processors, in difference with
-the synchronous version where it would wait.
-in~\cite{Raphaelall01}proposed two methods of parallelization for
-architecture with shared memory and distributed memory,it able to
-compute the root of polynomial degree 10000 on 430 s with only 8
-pc and 2 communications per iteration. Compare to the sequential
-it take 3300 s to obtain the same results.
-
-After this few works discuses this problem until the apparition of
-the Compute Unified Device Architecture (CUDA)~\cite{CUDA10},a
+approximations even though the latest values of other $z_i^{(k)}$
+have not been received from the other processors, in contrast with synchronous algorithms where each processor waits for those values before starting a new iteration.
+Couturier et al.~\cite{Raphaelall01} proposed two methods of parallelisation, one for
+a shared memory architecture and one for a distributed memory architecture. They were able to
+compute the roots of polynomials of degree 10000 in 430 seconds with only 8
+personal computers and 2 communications per iteration. Compared to the sequential implementation,
+which takes up to 3300 seconds to obtain the same results, this is a significant speedup.
+
+Very few works followed until the appearance of
+the Compute Unified Device Architecture (CUDA)~\cite{CUDA10}, a
 parallel computing platform and a programming model invented by
-NVIDIA. The computing ability of GPU has exceeded the counterpart
-of CPU. It is a waste of resource to be just a graphics card for
-GPU. CUDA adopts a totally new computing architecture to use the
+NVIDIA. The computing power of GPUs (Graphics Processing Units) has exceeded that
+of CPUs. However, CUDA adopts a totally new computing architecture to use the
 hardware resources provided by GPU in order to offer a stronger
 computing ability to the massive data computing.
 
-Indeed,~\cite{Kahinall14}proposed the implementation of the
-Durand-Kerner method on GPU (Graphics Processing Unit). The main
-result prove that a parallel implementation is 10 times as fast as
+Ghidouche et al.~\cite{Kahinall14} proposed an implementation of the
+Durand-Kerner method on GPU. Their main
+result showed that a parallel CUDA implementation is 10 times as fast as
 the sequential implementation on a single CPU for high degree
-polynomials that is greater than about 48000.
-\paragraph{}
-The mean part of our work is to implement the Aberth method for the problem root finding for
-high degree polynomials on GPU architecture (Graphics Processing Unit). Initially we present the Aberth method in section 1. Amelioration of Aberth method was proposed in section 2. A related works for the implementation of simultaneous methods in a parallel computer was discuss in section 3. Section 4 we propose a parallel implementation of Aberth method on GPU. Section 5, we present our result and discuss it. Finally, in Section 6, we present our conclusions and future research directions.
+polynomials of degree around 48000. To our knowledge, this was the first time such high degree polynomials were numerically solved.
+
+
+In this paper, we focus on the implementation of the Aberth method for
+high degree polynomials on GPU. The paper is organised as follows. First, we recall the Aberth method in Section~\ref{sec1}. Improvements to the Aberth method are proposed in Section~\ref{sec2}. Related work on the implementation of simultaneous methods using a parallel approach is presented in Section~\ref{secStateofArt}.
+In Section 4 we propose a parallel implementation of the Aberth method on GPU and discuss it. Section 5 presents and analyses our experimental results. Finally, Section 6 concludes this paper and gives some hints for future research directions on this topic.
 
-\section{Aberth method}
+\section{The Sequential Aberth method}
+\label{sec1}
 A cubically convergent iteration method for finding zeros of
 polynomials was proposed by O. Aberth~\cite{Aberth73}. The Aberth
-method is a purely algebraic derivation.To illustrate the
-derivation, we let $w_{i}(z)$ be the product of linear factor
+method has a purely algebraic derivation. To illustrate the
+derivation, we let $w_{i}(z)$ be the product of linear factors
 
 \begin{equation}
 w_{i}(z)=\prod_{j=1,j \neq i}^{n} (z-x_{j})
 \end{equation}
 
-And rational function $R_{i}(z)$ be the correction term of
-Weistrass method~\cite{Weierstrass03}:
+and let the rational function $R_{i}(z)$ be the correction term of the
+Weierstrass method~\cite{Weierstrass03}:
 
 \begin{equation}
-R_{i}(z)=\frac{p(z)}{w_{i}(z)} , i=1,2,...,n
+R_{i}(z)=\frac{p(z)}{w_{i}(z)}, \quad i=1,2,\ldots,n.
 \end{equation}
 
 Differentiating the rational function $R_{i}(z)$ and applying the
@@ -227,29 +223,28 @@ Newton method, we have:
 \begin{equation}
 \frac{R_{i}(z)}{R_{i}^{'}(z)}= \frac{p(z)}{p^{'}(z)-p(z)\frac{w_{i}^{'}(z)}{w_{i}(z)}}= \frac{p(z)}{p^{'}(z)-p(z) \sum _{j=1,j \neq i}^{n}\frac{1}{z-x_{j}}}, \quad i=1,2,\ldots,n
 \end{equation}
 
-Substituting $x_{j}$ for z we obtain the Aberth iteration method
+Substituting $x_{j}$ for $z$ we obtain the Aberth iteration method.
 
-Let present the means stages of Aberth method.
+In the following we present the main stages of the Aberth method.
 \subsection{Polynomials Initialization}
-  The initialization of polynomial P(z) with complex coefficients
-  are given by:
+The initialization of a polynomial $p(z)$ is done by setting each of its complex coefficients $a_{i}$:
 
 \begin{equation}
 p(z)=\sum_{i=0}^{n}{a_{i}z^{n-i}}, \quad a_{n} \neq 0, a_{0}=1, a_{i}\in C
 \end{equation}
 
-\subsection{Vector $Z^{(0)}$ Initialization}
+\subsection{Vector $z^{(0)}$ Initialization}
 
-The choice of the initial points $z^{(0)}_{i}, i = 1, . . . , n.$
-from which starting the iteration  (2) or (3), is rather delicate
-since the number of steps needed by the iterative method to reach
+As for any iterative method, we need to choose $n$ initial guess points $z^{(0)}_{i}, i = 1, \ldots, n$.
+The initial guess is very important since the number of steps needed by the iterative method to reach
 a given approximation strongly depends on it.
-In~\cite{Aberth73}the Aberth iteration is started by selecting n
-equispaced points on a circle of center 0 and radius r, where r is
-an upper bound to the moduli of the zeros. After,~\cite{Bini96}
-performs this choice by selecting complex numbers along different
+In~\cite{Aberth73} the Aberth iteration is started by selecting $n$
+equi-spaced points on a circle of center 0 and radius $r$, where $r$ is
+an upper bound to the moduli of the zeros. Later, Bini et al.~\cite{Bini96}
+refined this choice by selecting complex numbers along different
 circles, relying on the result of~\cite{Ostrowski41}.
 
 \begin{equation}
@@ -265,19 +260,19 @@ v_{i}=\frac{|\frac{a_{n}}{a_{i}}|^{\frac{1}{n-i}}}{2}.
 \end{equation}
 
 \subsection{Iterative Function $H_{i}$}
-The operator used with Aberth method is corresponding to the
+The operator used by the Aberth method corresponds to the
 following equation which will enable the convergence towards
 polynomial solutions, provided all the roots are distinct.
 
 \begin{equation}
-H_{i}(z)=z_{i}-\frac{1}{\frac{P^{'}(z_{i})}{P(z_{i})}-\sum_{j\neq
+H_{i}(z)=z_{i}-\frac{1}{\frac{p^{'}(z_{i})}{p(z_{i})}-\sum_{j\neq
 i}{\frac{1}{z_{i}-z_{j}}}}
 \end{equation}
 
-\subsection{Convergence condition}
-Determines the success of the termination. It consists in stopping
-the iterative function  $H_{i}(z)$ when the root are stable, the method
-converge sufficiently:
+\subsection{Convergence Condition}
+The convergence condition determines the termination of the algorithm. It consists in stopping
+the iterative function $H_{i}(z)$ when the roots are sufficiently stable. We consider that the method
+has converged sufficiently when:
 
 \begin{equation}
 \forall i \in
@@ -285,33 +280,34 @@ converge sufficiently:
 \end{equation}
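+
+For illustration only (this sketch does not appear in the paper), the stages
+above can be combined into a small sequential routine. We assume C99 complex
+arithmetic, coefficients stored from the leading one $a_{0}$ down to $a_{n}$,
+a known upper bound \texttt{r} on the moduli of the roots, and, since the exact
+stopping criterion is not reproduced here, a test on the relative displacement
+of each root with threshold \texttt{eps}:
+\begin{verbatim}
+#include <complex.h>
+#include <math.h>
+
+/* Horner evaluation of a degree-d polynomial, c[0] being the
+   leading coefficient. */
+static double complex horner(const double complex c[], int d,
+                             double complex z)
+{
+    double complex v = c[0];
+    for (int k = 1; k <= d; k++) v = v * z + c[k];
+    return v;
+}
+
+/* a: coefficients of p (degree n), da: coefficients of p' (degree n-1),
+   z: the n approximations, updated in place. */
+void aberth_sequential(const double complex a[], const double complex da[],
+                       int n, double complex z[], double r, double eps)
+{
+    for (int i = 0; i < n; i++)              /* initial guesses on a circle */
+        z[i] = r * cexp(I * (2.0 * M_PI * i / n));
+
+    double delta = 1.0 + eps;
+    while (delta > eps) {                    /* convergence test            */
+        delta = 0.0;
+        for (int i = 0; i < n; i++) {        /* iterative function H_i      */
+            double complex sum = 0;
+            for (int j = 0; j < n; j++)
+                if (j != i) sum += 1.0 / (z[i] - z[j]);
+            double complex corr =
+                1.0 / (horner(da, n - 1, z[i]) / horner(a, n, z[i]) - sum);
+            z[i] -= corr;
+            double rel = cabs(corr) / cabs(z[i]);
+            if (rel > delta) delta = rel;
+        }
+    }
+}
+\end{verbatim}
+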
-\section{Amelioration of Aberth method }
-The Aberth method implementation suffer of overflow problems. This
+\section{Improving the Ehrlich-Aberth Method}
+\label{sec2}
+The Ehrlich-Aberth method implementation suffers from overflow problems. This
 situation occurs, for instance, in the case where a polynomial
-having positive coefficients and large degree is computed at a
-point $\xi$ where $|\xi| > 1$. Indeed the limited number in the
-mantissa of floating takings the computation of P(z) wrong when z
-is large. for example $(10^{50}) +1+ (- 10^{50})$ will give result
-0 instead of 1 in reality. Consequently we can not compute the roots
-for large polynomial's degree. This problem was discuss in
+having positive coefficients and a large degree is computed at a
+point $\xi$ where $|\xi| > 1$, with $|x|$ denoting the modulus of a complex number $x$. Indeed, the limited number of digits in the
+mantissa of floating point representations makes the computation of $p(z)$ wrong when $z$
+is large. For example $(10^{50}) +1+ (- 10^{50})$ will give the wrong result
+of $0$ instead of $1$. Consequently, we can not compute the roots
+for large degrees. This problem was first discussed in
 ~\cite{Karimall98} for the Durand-Kerner method, where the authors
-propose to use the logarithm and the exponential of a complex:
+propose to use the logarithm and the exponential of a complex number in order to compute powers with high exponents.
 
 \begin{equation}
+\label{deflncomplex}
 \forall(x,y)\in R^{*2}; \ln (x+i.y)=\ln(x^{2}+y^{2})/2+i.\arcsin(y/\sqrt{x^{2}+y^{2}})_{\left] -\pi, \pi\right[ }
 \end{equation}
 %%\begin{equation}
 \begin{align}
+\label{defexpcomplex}
 \forall(x,y)\in R^{*2}; \exp(x+i.y) & = \exp(x).\exp(i.y)\\
-                                    & =\exp(x).\cos(y)+i.\exp(x).\sin(y)
+                                    & =\exp(x).\cos(y)+i.\exp(x).\sin(y)
 \end{align}
 %%\end{equation}
 
-The application of logarithm can replace any multiplications and
-divisions with additions and subtractions. Consequently, it
-manipulates lower absolute values and can be compute the roots for
-large polynomial's degree exceed~\cite{Karimall98}.
+Using the logarithm (eq.~\ref{deflncomplex}) and the exponential (eq.~\ref{defexpcomplex}) operators, we can replace any multiplications and divisions with additions and subtractions. Consequently, computations
+manipulate lower absolute values and the roots of polynomials of large degree can be computed successfully~\cite{Karimall98}.
 
 Applying this solution for the Aberth method we obtain the
 iteration function with logarithm:
@@ -319,70 +315,65 @@ iteration function with logarithm:
 \begin{equation}
 H_{i}(z)=z_{i}^{k}-\exp \left(\ln \left(
 p(z_{k})\right)-\ln\left(p^{'}(z_{k})\right)- \ln
-\left(1-Q(z_{k})\right)\right)
+\left(1-Q(z_{k})\right)\right),
 \end{equation}
-Where:
+
+where:
 \begin{equation}
 Q(z_{k})=\exp\left( \ln (p(z_{k}))-\ln(p^{'}(z_{k}))+\ln \left(
-\sum_{k\neq j}^{n}\frac{1}{z_{k}-z_{j}}\right)\right)
+\sum_{j\neq k}^{n}\frac{1}{z_{k}-z_{j}}\right)\right).
 \end{equation}
 
-This solution is applying when it is necessary
+This solution is applied when it is necessary, that is, when an overflow is likely to occur, typically for polynomials of high degree evaluated at points whose modulus is greater than 1.
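+
+As a rough illustration (not the authors' implementation), the logarithmic form
+of $H_{i}$ and $Q$ above can be written with the C99 complex functions
+\texttt{clog} and \texttt{cexp}. Here \texttt{lp} and \texttt{lpp} are assumed
+to hold $\ln(p(z_{i}))$ and $\ln(p^{'}(z_{i}))$; in a full implementation they
+would themselves be accumulated in logarithmic form during the evaluation of
+$p$, which is precisely what avoids the overflow discussed above:
+\begin{verbatim}
+#include <complex.h>
+
+double complex ea_step_log(int i, int n, const double complex z[],
+                           double complex lp, double complex lpp)
+{
+    double complex sum = 0;
+    for (int j = 0; j < n; j++)
+        if (j != i) sum += 1.0 / (z[i] - z[j]);
+
+    /* Q = exp( ln p - ln p' + ln(sum) ) */
+    double complex Q = cexp(lp - lpp + clog(sum));
+    /* H = z_i - exp( ln p - ln p' - ln(1 - Q) ) */
+    return z[i] - cexp(lp - lpp - clog(1.0 - Q));
+}
+\end{verbatim}
+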
 \section{The implementation of simultaneous methods in a parallel computer}
- The main problem of the simultaneous methods is that the necessary
-time needed for the convergence is increased with the increasing
-of the degree of the polynomial. The parallelization of these
-algorithms will improve the convergence time. Researchers usually
-adopt one of the two following approaches to parallelize root
-finding algorithms. One approach is to reduce the total number of
-iterations as implemented by Miranker
+\label{secStateofArt}
+The main problem of simultaneous methods is that the time
+needed for convergence increases with
+the degree of the polynomial. The parallelisation of these
+algorithms is expected to improve the convergence time.
+Authors usually adopt one of the two following approaches to parallelize root
+finding algorithms. The first approach aims at reducing the total number of
+iterations, as done by Miranker
 ~\cite{Mirankar68,Mirankar71}, Schedler~\cite{Schedler72} and
-Winogard~\cite{Winogard72}. Another approach is to reduce the
+Winogard~\cite{Winogard72}. The second approach aims at reducing the
 computation time per iteration, as reported
-in~\cite{Benall68,Jana06,Janall99,Riceall06}. There are many
-schemes for simultaneous approximations of all roots of a given
+in~\cite{Benall68,Jana06,Janall99,Riceall06}.
+
+There are many schemes for the simultaneous approximation of all roots of a given
 polynomial. Several works on different methods and issues of root
-finding have been reported in~\cite{Azad07, Gemignani07, Kalantari08, Skachek08, Zhancall08, Zhuall08}. However, Durand-Kerner and Ehrlich methods are the most practical choices among
+finding have been reported in~\cite{Azad07, Gemignani07, Kalantari08, Skachek08, Zhancall08, Zhuall08}. However, the Durand-Kerner and Ehrlich-Aberth methods are the most practical choices among
 them~\cite{Bini04}. These two methods have been extensively
-studied for parallelization due to their following advantages. The
-computation involved in these methods has some inherent
+studied for parallelization due to their intrinsic properties: the
+computations involved in both methods have some inherent
 parallelism that can be suitably exploited by SIMD machines.
 Moreover, they have a fast rate of convergence (quadratic for the
-Durand-Kerner method and cubic for the Ehrlich). Various parallel
+Durand-Kerner and cubic for the Ehrlich-Aberth). Various parallel
 algorithms reported for these methods can be found
 in~\cite{Cosnard90, Freeman89, Freemanall90, Jana99, Janall99}.
 Freeman and Bane~\cite{Freemanall90} presented two parallel
 algorithms on a local memory MIMD computer with the compute-to-communication
 time ratio O(n). However, their algorithms require
 each processor to communicate its current approximation to all
-other processors at the end of each iteration. Therefore they
+other processors at the end of each iteration (synchronously). Therefore they
 cause a high degree of memory conflict. Recently the author
 in~\cite{Mirankar71} proposed two versions of parallel algorithm
-for the Durand-Kerner method, and Aberth method on model of
+for the Durand-Kerner method and the Ehrlich-Aberth method on a model of
 Optoelectronic Transpose Interconnection System (OTIS). The
 algorithms are mapped on an OTIS-2D torus using N processors. This
-solution need N processors to compute N roots, that it is not
-practical (is not suitable to compute large polynomial's degrees).
-Until then, the related works are not able to compute the root of
-the large polynomial's degrees (higher then 1000) and with small
-time.
-
- Finding polynomial roots rapidly and accurately it is our
-objective, with the apparition of the CUDA(Compute Unified Device
-Architecture), finding the roots of polynomials becomes rewarding
-and very interesting, CUDA adopts a totally new computing
-architecture to use the hardware resources provided by GPU in
-order to offer a stronger computing ability to the massive data
-computing. In~\cite{Kahinall14} we proposed the first implantation
-of the root finding polynomials method on GPU (Graphics Processing
-Unit),which is the Durand-Kerner method. The main result prove
-that a parallel implementation is 10 times as fast as the
+solution needs N processors to compute N roots, which is not
+practical for solving polynomials with large degrees.
+Until very recently, the literature did not mention implementations able to compute the roots of
+large degree polynomials (higher than 1000) within reasonable, or at least tractable, times. Finding polynomial roots rapidly and accurately is the main objective of our work.
+
+With the advent of CUDA (Compute Unified Device
+Architecture), finding the roots of polynomials has received renewed attention because of the new possibilities to solve higher degree polynomials in less time.
+In~\cite{Kahinall14} we already proposed the first implementation
+of a root finding method on GPUs, that of the Durand-Kerner method. The main result showed
+that a parallel CUDA implementation is 10 times as fast as the
 sequential implementation on a single CPU for high degree
-polynomials that is greater than about 48000. Indeed, in this
-paper we present a parallel implementation of Aberth method on
-GPU, more details are discussed in the following of this paper.
+polynomials of degree around 48000. In this paper we present a parallel implementation of the Ehrlich-Aberth method on
+GPUs, whose details are discussed in the sequel.
 
 
 \section {A parallel implementation of Aberth method}