From: Kahina Guidouche
Date: Sun, 6 Sep 2015 16:50:24 +0000 (+0200)
Subject: this is the commit 06.09.15
X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/kahina_paper1.git/commitdiff_plain/944acbd1d3f1134e456324577c906ede08485b2f?ds=inline;hp=-c

this is the commit 06.09.15
---
944acbd1d3f1134e456324577c906ede08485b2f
diff --git a/Root.tex b/Root.tex
index 2ee2c28..5134838 100644
--- a/Root.tex
+++ b/Root.tex
@@ -11,8 +11,9 @@ we consider a polynomial of degree \textit{n} having coefficients in the complex
 \textit{C} and zeros $\alpha_{i}, i=1,\dots,n$. \\
 \begin{center}
-   {\Large$p(x)=\sum{a_{i}x^{i}}=a_{n}\prod(x-\alpha_{i}),
-a_{0}a_{n}\neq0,$}
+\begin{equation}
+p(x)=\sum_{i=0}^{n}{a_{i}x^{i}}=a_{n}\prod_{i=1}^{n}(x-\alpha_{i}),\quad a_{0}a_{n}\neq 0
+\end{equation}
 \end{center}
 the root-finding problem consists in finding
@@ -48,24 +49,26 @@ Muller's method, and several
 methods for the simultaneous approximation of all the
 roots, starting with the Durand-Kerner method:
 \begin{center}
-
-$ Z_{i}=Z_{i}-\frac{P(Z_{i})}{\prod_{i\neq j}(z_{i}-z_{j})} $
+\begin{equation}
+z_{i}=z_{i}-\frac{P(z_{i})}{\prod_{j\neq i}(z_{i}-z_{j})}
+\end{equation}
 \end{center}
-This formula is mentioned for the first time from Weiestrass [12]
-as part of the fundamental theorem of Algebra and is rediscovered
-from Ilieff~\cite{Ilief50} [2], Docev [3], Durand [4], Kerner [5].
-Another method discovered from Borsch-Supan [6] and also described
-and brought in the following form from Ehrlich [7] and
+This formula was first mentioned by
+Weierstrass~\cite{Weierstrass03} as part of the fundamental theorem
+of algebra and was rediscovered by Ilieff~\cite{Ilie50},
+Docev~\cite{Docev62}, Durand~\cite{Durand60} and
+Kerner~\cite{Kerner66}. Another method was discovered by
+Borsch-Supan~\cite{Borch-Supan63}, and was later described and
+brought into the following form by Ehrlich~\cite{Ehrlich67} and
 Aberth~\cite{Aberth73}.
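As an editor's illustration (not part of the original patch), the Durand-Kerner update above can be sketched in a few lines of plain Python; the cubic test polynomial, the starting points $(0.4+0.9i)^k$ and the iteration count are illustrative choices, not taken from the paper:

```python
def durand_kerner(coeffs, iterations=100):
    """Approximate all roots of p(x) = sum(coeffs[k] * x**k) simultaneously.

    coeffs: complex coefficients a_0 ... a_n with a_0 * a_n != 0.
    """
    n = len(coeffs) - 1
    p = lambda x: sum(c * x**k for k, c in enumerate(coeffs))
    # Classic starting guesses: powers of a complex number that is
    # neither real nor a root of unity.
    z = [(0.4 + 0.9j) ** k for k in range(n)]
    for _ in range(iterations):
        for i in range(n):
            q = coeffs[-1]  # a_n, since p(x) = a_n * prod(x - alpha_i)
            for j in range(n):
                if j != i:
                    q *= z[i] - z[j]
            z[i] = z[i] - p(z[i]) / q
    return z

# p(x) = (x-1)(x-2)(x-3) = x^3 - 6x^2 + 11x - 6
print(sorted(r.real for r in durand_kerner([-6, 11, -6, 1])))
# close to [1.0, 2.0, 3.0]
```

Note that the denominator carries the leading coefficient $a_n$, since $p(x)=a_n\prod(x-\alpha_i)$; for a monic polynomial it reduces to the product of differences shown in the formula above.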
\begin{center}
-
-$ Z_{i}=Z_{i}-\frac{1}{{\frac {P'(Z_{i})} {P(Z_{i})}}-{\sum_{i\neq j}(z_{i}-z_{j})}} $
+\begin{equation}
+z_{i}=z_{i}-\frac{1}{\frac{P'(z_{i})}{P(z_{i})}-\sum_{j\neq i}\frac{1}{z_{i}-z_{j}}}
+\end{equation}
\end{center}
-Aberth, Ehrlich and Farmer-Loizou [10] have proved that the above
-method has cubic order of convergence for simple roots.
+Aberth, Ehrlich and Farmer-Loizou~\cite{Loizon83} have proved that
+the above method has cubic order of convergence for simple roots.

Iterative methods raise several problems when implemented, e.g.
@@ -76,37 +79,38 @@ parallelization of these algorithms will improve the convergence
time.

Many authors have treated the problem of parallelization of
-simultaneous methods. Freeman [13] has tested the DK method, EA
-method and another method of the fourth order proposed from Farmer
-and Loizou [10],on a 8- processor linear chain, for polynomial of
-degree up to 8. The third method often diverges, but the first two
-methods have speed-up 5.5 (speed-up=(Time on one processor)/(Time
-on p processors)). Later Freeman and Bane [14] consider
-asynchronous algorithms, in which each processor continues to
-update its approximations even although the latest values of other
-$z_i((k))$ have not received from the other processors, in
-difference with the synchronous version where it would wait. in
-[15]proposed two methods of parallelization for architecture with
-shared memory and distributed memory,it able to compute the root
-of polynomial degree 10000 on 430 s with only 8 pc and 2
-communications per iteration. Compare to the sequential it take
-3300 s to obtain the same results.
+simultaneous methods. Freeman~\cite{Freeman89} tested the DK
+method, the EA method and another fourth-order method proposed by
+Farmer and Loizou~\cite{Loizon83} on an 8-processor linear
+chain, for polynomials of degree up to 8.
+The third method often diverges, but the first two methods achieve
+a speed-up of 5.5 (speed-up = (time on one processor)/(time on p
+processors)). Later, Freeman and Bane~\cite{Freemanall90}
+considered asynchronous algorithms, in which each processor
+continues to update its approximations even though the latest
+values of the other $z_i^{(k)}$ have not yet been received from the
+other processors, unlike the synchronous version, which waits for
+them. In~\cite{Raphaelall01}, two methods of parallelization were
+proposed, for shared-memory and for distributed-memory
+architectures; they were able to compute the roots of a polynomial
+of degree 10000 in 430 s with only 8 PCs and 2 communications per
+iteration, whereas the sequential version takes 3300 s to obtain
+the same results.

After this, few works discussed this problem until the appearance of
-the Compute Unified Device Architecture (CUDA) [19],a parallel
-computing platform and a programming model invented by NVIDIA. the
-computing ability of GPU has exceeded the counterpart of CPU. It
-is a waste of resource to be just a graphics card for GPU. CUDA
-adopts a totally new computing architecture to use the hardware
-resources provided by GPU in order to offer a stronger computing
-ability to the massive data computing.
-
-
-Indeed [16]proposed the implementation of the Durand-Kerner method
-on GPU (Graphics Processing Unit). The main result prove that a
-parallel implementation is 10 times as fast as the sequential
-implementation on a single CPU for high degree polynomials that is
-greater than about 48000.
+the Compute Unified Device Architecture (CUDA)~\cite{CUDA10}, a
+parallel computing platform and programming model invented by
+NVIDIA. The computing ability of the GPU has exceeded that of the
+CPU, and it would be a waste of resources to use the GPU only as a
+graphics card. CUDA adopts a totally new computing architecture to
+use the hardware resources provided by the GPU in order to offer
+stronger computing ability for massive data computing.
+
+
+Indeed,~\cite{Kahinall14} proposed an implementation of the
+Durand-Kerner method on GPU (Graphics Processing Unit). The main
+result proves that the parallel implementation is 10 times as fast
+as the sequential implementation on a single CPU for polynomials of
+high degree, greater than about 48000.

\paragraph{} The main part of our work is to implement the Aberth
method on GPU and compare it with the Durand-Kerner
implementation.................To be continued..................

\section{Aberth method and difficulties}
A cubically convergent iteration method for finding zeros of
-polynomials was proposed by O.Aberth[?].The Aberth method is a
-purely algebraic derivation.To illustrate the derivation, we let
-$w_{i}(z)$ be the product of linear factor $ w_{i}(z)=\prod_{j=1,j
-\neq i}^{n} (z-x_{j})$
+polynomials was proposed by O. Aberth~\cite{Aberth73}. The Aberth
+method has a purely algebraic derivation. To illustrate the
+derivation, we let $w_{i}(z)$ be the product of linear factors
+$w_{i}(z)=\prod_{j=1,j\neq i}^{n}(z-x_{j})$
and the rational function $R_{i}(z)$ be the correction term of the
-Weistrass method (?)
-$$R_{i}(z)=\dfrac{p(z)}{w_{i}(Z)} , i=1,2,...,n. $$
+Weierstrass method~\cite{Weierstrass03}:
+
+\begin{equation}
+R_{i}(z)=\dfrac{p(z)}{w_{i}(z)}, \quad i=1,2,\dots,n.
+\end{equation}

Differentiating the rational function $R_{i}(z)$ and applying the
Newton method, we have
-$$\dfrac{R_{i}(z)}{R_{i}^{'}(z)}= \dfrac{p(z)}{p^{'}(z)-p(z)\dfrac{w_{i}(z)}{w_{i}^{'}(z)}}= \dfrac{p(z)}{p^{'}(z)-p(z) \sum _{j=1,j \neq i}^{n}\dfrac{1}{z-x_{i}}}, i=1,2,...,n $$
+
+\begin{equation}
+\dfrac{R_{i}(z)}{R_{i}^{'}(z)}=
+\dfrac{p(z)}{p^{'}(z)-p(z)\dfrac{w_{i}^{'}(z)}{w_{i}(z)}}=
+\dfrac{p(z)}{p^{'}(z)-p(z)\sum_{j=1,j\neq i}^{n}\dfrac{1}{z-x_{j}}},
+\quad i=1,2,\dots,n
+\end{equation}
+
Substituting $x_{i}$ for $z$, we obtain the Aberth iteration method.
Let us present the main stages of Aberth's method.
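As an editor's sketch (not part of the original patch), the Aberth iteration just derived can be written in plain Python, together with a relative-change stopping test of the kind the paper uses as its convergence condition; the test polynomial, starting points and tolerance are illustrative assumptions:

```python
def aberth(coeffs, xi=1e-12, max_iter=500):
    """Aberth iteration for all roots of p(x) = sum(coeffs[k] * x**k),
    stopped when every relative update |dz|/|z| falls below xi."""
    n = len(coeffs) - 1
    p = lambda x: sum(c * x**k for k, c in enumerate(coeffs))
    dp = lambda x: sum(k * c * x**(k - 1) for k, c in enumerate(coeffs) if k > 0)
    # Distinct, non-real starting guesses.
    z = [(0.4 + 0.9j) ** k for k in range(1, n + 1)]
    for _ in range(max_iter):
        stable = True
        for i in range(n):
            pz = p(z[i])
            if pz == 0:          # z[i] already sits exactly on a root
                continue
            s = sum(1.0 / (z[i] - z[j]) for j in range(n) if j != i)
            dz = 1.0 / (dp(z[i]) / pz - s)
            z[i] -= dz
            if abs(dz) > xi * abs(z[i]):
                stable = False
        if stable:
            break
    return z

# Same cubic as before: roots 1, 2, 3.
print(sorted(r.real for r in aberth([-6, 11, -6, 1])))
# close to [1.0, 2.0, 3.0]
```

Updating the approximations in place (Gauss-Seidel style) is one of the two usual variants; the formula in the text corresponds equally to the Jacobi variant, where all updates use the previous sweep's values.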
@@ -134,7 +148,10 @@ Let us present the main stages of Aberth's method.

\subsection{Polynomials Initialization}
The initialization of the polynomial $p(z)$ with complex
coefficients is given by:
-   $$ p(z)=\sum{a_{i}z^{n-i}}. where a_{n} \neq 0,a_{0}=1, a_{i}\subset C $$
+
+\begin{equation}
+p(z)=\sum_{i=0}^{n}{a_{i}z^{n-i}}, \quad \text{where } a_{n}\neq 0,\ a_{0}=1,\ a_{i}\in C
+\end{equation}

\subsection{Vector $Z^{(0)}$ Initialization}

@@ -142,26 +159,39 @@ Let us present the main stages of Aberth's method.

The choice of the initial points $z^{(0)}_{i}, i = 1,\dots,n$,
from which the iteration (2) or (3) is started, is rather delicate,
since the number of steps needed by the iterative method to reach
-a given approximation strongly depends on it. In [1] the Aberth
-iteration is started by selecting n equispaced points on a circle
-of center 0 and radius r, where r is an upper bound to the moduli
-of the zeros. After[18] performs this choice by selecting complex
-numbers along different circles and relies on the result of [19].
-
-$$\sigma_{0}=\frac{u+v}{2};u=\frac{\sum_{i=1}^{n}u_{i}}{n.max_{i=1}^{n}u_{i}}; v=\frac{\sum_{i=0}^{n-1}v_{i}}{n.min_{i=0}^{n-1}v_{i}};u_{i}=2.|a_{i}|^{\frac{1}{i}}; v_{i}=\frac{|\frac{a_{n}}{a_{i}}|^{\frac{1}{n-i}}}{2} $$
+a given approximation strongly depends on it.
+In~\cite{Aberth73}, the Aberth iteration is started by selecting
+$n$ equispaced points on a circle of center 0 and radius $r$, where
+$r$ is an upper bound on the moduli of the zeros.
+Later,~\cite{Bini96} performed this choice by selecting complex
+numbers along different circles, relying on the result
+of~\cite{Ostrowski41}.
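The circle-based initialization just described can be sketched as follows (editor's illustration in plain Python; the radius $r = 1 + \max_i |a_i/a_n|$ is Cauchy's classical upper bound on the moduli of the zeros, used here as one possible choice of $r$, and the angular offset is an assumption made to keep the points off the real axis):

```python
import cmath

def initial_points(coeffs):
    """n equispaced starting points on a circle of center 0 and radius r,
    where r = 1 + max|a_i / a_n| (Cauchy's bound) majorizes the moduli
    of the zeros of p(x) = sum(coeffs[k] * x**k)."""
    n = len(coeffs) - 1
    r = 1.0 + max(abs(c / coeffs[-1]) for c in coeffs[:-1])
    # Small angular offset: keeps the points off the real axis, where
    # polynomials with real coefficients often have roots.
    return [r * cmath.exp(1j * (2.0 * cmath.pi * k / n + cmath.pi / (2.0 * n)))
            for k in range(n)]

pts = initial_points([-6, 11, -6, 1])
print([round(abs(z), 6) for z in pts])  # three points, all of modulus 12.0
```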
+
+\begin{equation}
+\sigma_{0}=\frac{u+v}{2};\quad
+u=\frac{\sum_{i=1}^{n}u_{i}}{n\cdot\max_{i=1}^{n}u_{i}};\quad
+v=\frac{\sum_{i=0}^{n-1}v_{i}}{n\cdot\min_{i=0}^{n-1}v_{i}};\quad
+u_{i}=2\cdot|a_{i}|^{\frac{1}{i}};\quad
+v_{i}=\frac{1}{2}\left|\frac{a_{n}}{a_{i}}\right|^{\frac{1}{n-i}}
+\end{equation}

\subsection{Iterative Function $H_{i}$}
The operator used by Aberth's method corresponds to the following
equation, which enables convergence towards the polynomial's
solutions, provided all the roots are distinct.
-$$ H_{i}(z)=z_{i}-\frac{1}{\frac{P^{'}(z_{i})}{P(z_{i})}-\sum_{j\neq i}{\frac{1}{z_{i}-z_{j}}}} $$
+\begin{equation}
+H_{i}(z)=z_{i}-\frac{1}{\frac{P^{'}(z_{i})}{P(z_{i})}-\sum_{j\neq i}\frac{1}{z_{i}-z_{j}}}
+\end{equation}

\subsection{Convergence condition}
The convergence condition determines the success of the
termination. It consists in stopping the iterative function
$H_{i}(z)$ when the approximations are stable, i.e. when the method
has converged sufficiently:
-$$ \forall i \in [1,n]; \frac{z_{i}^{(k)}-z_{i}^{(k-1)}}{z_{i}^{(k)}}< \xi$$
+
+\begin{equation}
+\forall i \in [1,n];\quad
+\frac{\left|z_{i}^{(k)}-z_{i}^{(k-1)}\right|}{\left|z_{i}^{(k)}\right|}<\xi
+\end{equation}
+

\section{Difficulties and amelioration}
The Aberth method implementation suffers from overflow problems. This
@@ -171,40 +201,169 @@ point $\xi$ where $|\xi| > 1$. Indeed, the limited number of bits in the
mantissa of floating-point numbers makes the computation of $P(z)$
wrong when $z$ is large. For example, $(10^{50})+1+(-10^{50})$ will
give the result 0 instead of 1. Consequently, we cannot compute the roots
-for large polynomial's degree. This problem was discuss in [17]
-for the Durand-Kerner method, the authors propose to use the
-logratihm and the exponential of a complex:
-
-$$ \forall(x,y)\in R^{*2}; \ln (x+i.y)=\ln(x^{2}+y^{2}) 2+i.\arcsin(y\sqrt{x^{2}+y^{2}})_{\left] -\pi, \pi\right[ } $$
-$$ \forall(x,y)\in R^{*2}; \exp(x+i.y)= \exp(x).\exp(i.y)$$
-$$ =\exp(x).\cos(y)+i.\exp(x).\sin(y)$$
-
+of polynomials of large degree.
This problem was discussed
+in~\cite{Karimall98} for the Durand-Kerner method; the authors
+propose to use the logarithm and the exponential of a complex
+number:
+
+\begin{equation}
+\forall(x,y)\in R^{*2};\quad
+\ln(x+iy)=\frac{\ln(x^{2}+y^{2})}{2}
++i\cdot\arcsin\left(\frac{y}{\sqrt{x^{2}+y^{2}}}\right)_{\left]-\pi,\pi\right[}
+\end{equation}
+\begin{align}
+\forall(x,y)\in R^{*2};\quad \exp(x+iy)&=\exp(x)\cdot\exp(iy)\nonumber\\
+&=\exp(x)\cdot\cos(y)+i\cdot\exp(x)\cdot\sin(y)
+\end{align}

The application of the logarithm can replace any multiplications
and divisions by additions and subtractions; consequently it
manipulates lower absolute values, and the roots can be computed
for
-large polynomial's degree exceed 1000[17].
+polynomials of degree exceeding 1000~\cite{Karimall98}.

Applying this solution to the Aberth method, we obtain the
iteration function with logarithms:

+\begin{equation}
+H_{i}(z)=z_{i}^{k}-\exp\left(\ln\left(p(z_{k})\right)-\ln\left(p^{'}(z_{k})\right)-\ln\left(1-Q(z_{k})\right)\right)
+\end{equation}
+where:
-$$ H_{i}(z)=z_{i}^{k}-\exp \left(\ln \left( p(z_{k})\right)-\ln\left(p(z_{k}^{'})\right)- \ln\left(1- \exp\left( \ln (p(z_{k}))-\ln(p(z_{k}^{'}))+\ln \left( \sum_{k\neq j}^{n}\frac{1}{z_{k}-z_{j}}\right)\right) \right) \right)$$
-
+\begin{equation}
+Q(z_{k})=\exp\left(\ln(p(z_{k}))-\ln(p^{'}(z_{k}))+\ln\left(\sum_{j\neq k}^{n}\frac{1}{z_{k}-z_{j}}\right)\right)
+\end{equation}
This solution is applied only when it is necessary.

\section{The implementation of simultaneous methods in a parallel computer}
+
+The main problem of the simultaneous methods is that the time
+needed for convergence increases with the degree of the
+polynomial. The parallelization of these algorithms will improve
+the convergence time.
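The point of the logarithm/exponential rewriting described above can be demonstrated in a few lines (editor's illustration in plain Python; `cmath.log(w)` returns $\ln|w| + i\,\arg(w)$, so it plays the role of the complex logarithm in the equations; the factor count and moduli are illustrative):

```python
import cmath, math

def log_abs_product(factors):
    """ln|prod(factors)| computed as a sum of logs: safe from overflow even
    when the product modulus far exceeds the largest representable float."""
    return sum(cmath.log(f).real for f in factors)  # cmath.log(w) = ln|w| + i*arg(w)

# 500 factors of modulus 10^3: the true product modulus is 10^1500.
factors = [(1e3 + 0j)] * 500

# The naive product overflows to inf...
naive = 1.0
for f in factors:
    naive *= abs(f)
print(naive)  # inf

# ...while the sum of logarithms stays perfectly representable.
print(log_abs_product(factors) / math.log(10.0))  # close to 1500.0
```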
Researchers usually
+adopt one of the two following approaches to parallelize
+root-finding algorithms. One approach is to reduce the total
+number of iterations, as implemented by
+Miranker~\cite{Mirankar68,Mirankar71}, Schedler~\cite{Schedler72}
+and Winograd~\cite{Winogard72}. Another approach is to reduce the
+computation time per iteration, as reported
+in~\cite{Benall68,Jana06,Janall99,Riceall06}. There are many
+schemes for the simultaneous approximation of all roots of a given
+polynomial. Several works on different methods and issues of root
+finding have been reported
+in~\cite{Azad07,Gemignani07,Kalantari08,Skachek08,Zhancall08,Zhuall08}.
+However, the Durand-Kerner and Ehrlich methods are the most
+practical choices among them~\cite{Bini04}. These two methods have
+been extensively studied for parallelization due to the following
+advantages. The computation involved in these methods has some
+inherent parallelism that can be suitably exploited by SIMD
+machines. Moreover, they have a fast rate of convergence (quadratic
+for the Durand-Kerner method and cubic for the Ehrlich method).
+Various parallel algorithms reported for these methods can be found
+in~\cite{Cosnard90,Freeman89,Freemanall90,Jana99,Janall99}. Freeman
+and Bane~\cite{Freemanall90} presented two parallel algorithms on a
+local-memory MIMD computer with a compute-to-communication time
+ratio of O(n). However, their algorithms require each processor to
+communicate its current approximation to all other processors at
+the end of each iteration. Therefore they cause a high degree of
+memory conflict. The author in~\cite{Mirankar71} proposed two
+versions of a parallel algorithm for the Durand-Kerner method and
+the Aberth method on a model of the Optoelectronic Transpose
+Interconnection System (OTIS). The algorithms are mapped on an
+OTIS-2D torus using N processors.
This
+solution needs N processors to compute N roots, which is not
+practical (it is not suitable for polynomials of large degree).
+Until now, the related works have not been able to compute the
+roots of polynomials of large degree (higher than 1000) in a short
+time.
-
+
+Finding polynomial roots rapidly and accurately is our
+objective. With the advent of CUDA (Compute Unified Device
+Architecture), finding the roots of polynomials becomes rewarding
+and very interesting: CUDA adopts a totally new computing
+architecture to use the hardware resources provided by the GPU in
+order to offer stronger computing ability for massive data
+computing. In~\cite{Kahinall14} we proposed the first
+implementation of a polynomial root-finding method on GPU (Graphics
+Processing Unit), namely the Durand-Kerner method. The main result
+proves that the parallel implementation is 10 times as fast as the
+sequential implementation on a single CPU for polynomials of high
+degree, greater than about 48000. Indeed, in this paper we present
+a parallel implementation of Aberth's method on GPU; more details
+are discussed in the remainder of this paper.
+
+\section{A parallel implementation of Aberth's method}
+\subsection{Background on the GPU architecture}
+A GPU is viewed as an accelerator for data-parallel and
+arithmetic-intensive computations. It draws its computing power
+from the parallel nature of its hardware and software
+architectures. A GPU is composed of hundreds of Streaming
+Processors (SPs) organized in several blocks called Streaming
+Multiprocessors (SMs). It also has a memory hierarchy.
It has a
+private read-write local memory per SP, fast shared memory and
+read-only constant and texture caches per SM, and a read-write
+global memory shared by all its SPs~\cite{NVIDIA10}.
+
+On a CPU equipped with a GPU, all the data-parallel and intensive
+functions of an application running on the CPU are off-loaded onto
+the GPU in order to accelerate their computations. A
+data-parallel function is executed on a GPU as a kernel by
+thousands or even millions of parallel threads, grouped together
+as a grid of thread blocks. Therefore, each SM of the GPU executes
+one or more thread blocks in SIMD fashion (Single Instruction,
+Multiple Data), and in turn each SP of a GPU SM runs one or more
+threads within a block in SIMT fashion (Single Instruction,
+Multiple Threads). Indeed, at any given clock cycle, the threads
+execute the same instruction of a kernel, but each of them operates
+on different data.
+GPUs only work on data filled in their global memories, and the
+final results of their kernel executions must be communicated to
+their CPUs. Hence, the data must be transferred in and out of the
+GPU. However, the speed of memory copy between the GPU and the CPU
+is slower than the memory bandwidths of the GPU memories and, thus,
+it dramatically affects the performance of GPU computations.
+Accordingly, it is necessary to limit data transfers between the
+GPU and its CPU during the computations.
+\subsection{Background on the CUDA Programming Model}
+
+The CUDA programming model is similar in style to a single-program
+multiple-data (SPMD) software model. The GPU is treated as a
+coprocessor that executes data-parallel kernel functions. CUDA
+provides three key abstractions: a hierarchy of thread groups,
+shared memories, and barrier synchronization. Threads have a
+three-level hierarchy. A grid is a set of thread blocks that
+execute a kernel function. Each grid consists of blocks of threads.
+Each block is composed of hundreds of threads.
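For a one-root-per-thread mapping, the grid/block hierarchy described above reduces to simple index arithmetic; a host-side sketch (editor's illustration in plain Python mirroring CUDA's `blockIdx.x * blockDim.x + threadIdx.x`; the block size of 256 threads is an illustrative choice, not from the paper):

```python
def launch_config(n_roots, threads_per_block=256):
    """Number of blocks needed so that every root gets one thread (round up)."""
    blocks = (n_roots + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block

def global_index(block_idx, block_dim, thread_idx):
    """CUDA's usual flat mapping: i = blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx

blocks, tpb = launch_config(10000)
print(blocks, tpb)  # 40 256
# 40 * 256 = 10240 threads cover 10000 roots; threads with i >= 10000 return early.
```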
Threads within one block
+can share data using shared memory and can be synchronized at a
+barrier. All threads within a block are executed concurrently on a
+multithreaded architecture. The programmer specifies the number of
+threads per block and the number of blocks per grid. A thread in
+the CUDA programming language is much lighter weight than a thread
+in traditional operating systems. A thread in CUDA typically
+processes one data element at a time. The CUDA programming model
+has two shared read-write memory spaces: the shared memory space
+and the global memory space. The shared memory is local to a block,
+and the global memory space is accessible by all blocks. CUDA also
+provides two read-only memory spaces, the constant space and the
+texture space, which reside in external DRAM and are accessed via
+read-only caches.
+
+\subsection{A parallel implementation of Aberth's method}
+\subsection{A CUDA implementation of Aberth's method}
+\subsection{A GPU implementation of Aberth's method}
+\subsubsection{The steps to parallelize}
+\subsubsection{The corresponding kernel}
+\subsubsection{Comparison between the sequential algorithm and the GPU algorithm}
\bibliographystyle{plain}
\bibliography{biblio}
%% \begin{thebibliography}{2}
%% \bibitem [1] {1} O. Aberth, Iteration Methods for Finding
-
%% all Zeros of a Polynomial Simultaneously, Math. Comput. 27, 122
%% (1973) 339–344.
@@ -245,17 +404,9 @@ this solution is applying when it is necessary
%% jede ganze rationale function einer veranderlichen dagestellt
%% werden kann als ein product aus linearen functionen derselben
%% veranderlichen, Ges. Werke 3, 251-269.
-%% \bibitem [13] {13} Freeman, T. L. (1989), Calculating polynomial zeros on a
-%% local memory parallel computer, Parallel Computing 12, 351-358.
-%% \bibitem [14] {14} Freeman, T. L., Brankin, R. K. (1990), Asynchronous
-%% polynomial zero-finding algorithms, Parallel Computing 17,
-%% 673-681.
-%% \bibitem [15] {15} Raphaël,C. François,S.
(2001), Extraction de racines dans des -%% polynômes creux de degré élevé. RSRCP (Réseaux et Systèmes -%% Répartis, Calculateurs Parallèles), Numéro thématique : -%% Algorithmes itératifs parallèles et distribués, 13(1):67--81. +%% %% \bibitem [16]{16} Kahina, G. Raphaël, C. Abderrahmane, S. A %% parallel implementation of the Durand-Kerner algorithm for @@ -272,5 +423,67 @@ this solution is applying when it is necessary %%200. %% \bibitem [19] {19} A. Ostrowski, On a Theorem by J.L. Walsh Concerning the Moduli of Roots of Algebraic Equations, %%Bull. A.M.S., 47 (1941) 742–746. + +%%\bibitem [20] {20} Mirankar WL (1968) Parallel methods for +%%approximating the roots of a function. IBM Res Dev 297– 301 30. +%%\bibitem [21] {21} Mirankar WL (1971) A survey of parallelism in +%%numerical analysis. SIAM Rev 524–547 + +%ù\bibitem [22] {22}Bini DA, Gemignani L (2004) Inverse power and +%%Durand–Kerner iterations for univariate polynomial root-finding. +%%Comput Math Appl 47:447–459 + +%%\bibitem [23] {23}Ben-Or M, Feig E, Kozzen D, Tiwary P (1968) A fast parallel +%algorithm for determining all roots of a polynomial with real +%%roots. In: Proc of ACM, pp 340–349 + +%%\bibitem [24] {24}Zhanc X, Wan M, Yi Z (2008) A constrained learning algorithm for +%%finding multiple real roots of polynomial. In: Proc of the 2008 +%%intl symposium on computational intelligence and design, pp 38–41 + +%%\bibitem [25] {25}Kalantari B (2008) Polynomial root finding and polynomiography. +%%World Scientific, New Jersey + +%%\bibitem [27] {27} Gemignani L (2007) Structured matrix methods for polynomial root +%%finding. In: Proc of the 2007 Intl symposium on symbolic and +%%algebraic computation, pp 175–180 Skachek V, Roth RM (2008) + +%%\bibitem [28] {28}Probabilistic algorithm for finding roots of linearized +%%polynomials. Design, codes and cryptography. Kluwer, Norwell + +%%\bibitem [29] {29}Schedler GS (1967) Parallel numerical methods for the solution of +%%equations. 
Commun ACM 286– 290 Ben-Or M, Feig E, Kozzen D, Tiwary + +%%\bibitem [30] {30}P (1968) A fast parallel algorithm for determining all roots of a +%%polynomial with real roots. In: Proc of ACM, pp 340–349 + +%%\bibitem [31] {31}Rice TA, Jamieson LH (1989) A highly parallel algorithm for root +%%extraction. IEEE Trans Comp 38(3):443–449 20. Jana PK (2006) + +%%\bibitem [32] {32}Winogard S (1972) Parallel iteration methods in complexity of +%%computer communications. Plenum, New York + +%ù\bibitem [33] {33} Cosnard M, Fraigniaud P (1990) Finding the roots of a polynomial +%%on an MIMD multicomputer. Parallel Comput 15:75–85 + +%%\bibitem [41] {41} Jana PK (1999) Finding polynomial zeroes on a Multi-mesh of trees +%%(MMT). In: Proc of the 2nd int conference on information +%%technology, Bhubaneswar, December 20–22, pp 202–206 + +%%\bibitem [42] {42}Zhu W, Zeng Z, Lin D (2008) An adaptive algorithm finding +%%multiple roots of polynomials. Lect Notes Comput Sci 5262:674–681 + + + +%%\bibitem [43] {43}Polynomial interpolation and polynomial root finding on OTIS-Mesh. +%%Parallel Comput 32:301–312 + +%%\bibitem [44] {44}Jana PK, Sinha BP, Datta Gupta R (1999) Efficient parallel +%%algorithms for finding polynomial zeroes. In: Proc of the 6th int +%%conference on advance computing, CDAC, Pune University Campus, +%%India, December 14–16, pp 189–196 + + + %% \end{thebibliography} \end{document}