From: Kahina Guidouche <guidouche@slayer.iut-bm.univ-fcomte.fr>
Date: Sun, 6 Sep 2015 16:50:24 +0000 (+0200)
Subject: this is the commit 06.09.15
X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/kahina_paper1.git/commitdiff_plain/944acbd1d3f1134e456324577c906ede08485b2f?ds=sidebyside

this is the commit 06.09.15
---

diff --git a/Root.tex b/Root.tex
index 2ee2c28..5134838 100644
--- a/Root.tex
+++ b/Root.tex
@@ -11,8 +11,9 @@ we consider a polynomial of degree \textit{n} having coefficients
 in the complex \textit{C} and zeros $\alpha
 _{i},\textit{i=1,...,n}$. \\
 \begin{center}
-     {\Large$p(x)=\sum{a_{i}x^{i}}=a_{n}\prod(x-\alpha_{i}),
-a_{0}a_{n}\neq0,$}
+\begin{equation}
+     p(x)=\sum_{i=0}^{n}{a_{i}x^{i}}=a_{n}\prod_{i=1}^{n}(x-\alpha_{i}),\quad a_{0} a_{n}\neq 0
+\end{equation}
 \end{center}
 
  the root finding problem consist to find
@@ -48,24 +49,26 @@ Muller s method, and several methods for simultaneous
 approximation of all the roots, starting with the Durand-Kerner
 method:
 \begin{center}
-
-$  Z_{i}=Z_{i}-\frac{P(Z_{i})}{\prod_{i\neq j}(z_{i}-z_{j})} $
+\begin{equation} z_{i}=z_{i}-\frac{P(z_{i})}{\prod_{j\neq i}(z_{i}-z_{j})}
+\end{equation}
 \end{center}
 
-This formula is mentioned for the first time from Weiestrass [12]
-as part of the fundamental theorem of Algebra and is rediscovered
-from Ilieff~\cite{Ilief50} [2], Docev [3], Durand [4], Kerner [5].
-Another method discovered from Borsch-Supan [6] and also described
-and brought in the following form from Ehrlich [7] and
+This formula was first mentioned by
+Weierstrass~\cite{Weierstrass03} as part of the fundamental theorem
+of algebra and was rediscovered by Ilieff~\cite{Ilie50},
+Docev~\cite{Docev62}, Durand~\cite{Durand60} and
+Kerner~\cite{Kerner66}. Another method, discovered by
+Borsch-Supan~\cite{Borch-Supan63}, was also described and brought
+into the following form by Ehrlich~\cite{Ehrlich67} and
 Aberth~\cite{Aberth73}.
 \begin{center}
-
-$  Z_{i}=Z_{i}-\frac{1}{{\frac {P'(Z_{i})}
-{P(Z_{i})}}-{\sum_{i\neq j}(z_{i}-z_{j})}} $
+\begin{equation}
+ z_{i}=z_{i}-\frac{1}{{\frac {P'(z_{i})} {P(z_{i})}}-{\sum_{j\neq i}\frac{1}{z_{i}-z_{j}}}}
+\end{equation}
 \end{center}
 
-Aberth, Ehrlich and Farmer-Loizou [10] have proved that the above
-method has cubic order of convergence for simple roots.
+Aberth, Ehrlich and Farmer-Loizou~\cite{Loizon83} have proved that
+the above method has cubic order of convergence for simple roots.
 
 
 Iterative methods raise several problem when implemented e.g.
@@ -76,37 +79,38 @@ parallelization of these algorithms will improve the convergence
 time.
 
 Many authors have treated the problem of parallelization of
-simultaneous methods. Freeman [13] has tested the DK method, EA
-method and another method of the fourth order proposed from Farmer
-and Loizou [10],on a 8- processor linear chain, for polynomial of
-degree up to 8. The third method often diverges, but the first two
-methods have speed-up 5.5 (speed-up=(Time on one processor)/(Time
-on p processors)). Later Freeman and Bane [14] consider
-asynchronous algorithms, in which each processor continues to
-update its approximations even although the latest values of other
-$z_i((k))$ have not received from the other processors, in
-difference with the synchronous version where it would wait. in
-[15]proposed two methods of parallelization for architecture with
-shared memory and distributed memory,it able to compute the root
-of polynomial degree  10000 on 430 s with only 8 pc and 2
-communications per iteration. Compare to the sequential it take
-3300 s to obtain the same results.
+simultaneous methods. Freeman~\cite{Freeman89} tested the DK
+method, the EA method and another fourth-order method proposed
+by Farmer and Loizou~\cite{Loizon83} on an 8-processor linear
+chain, for polynomials of degree up to 8. The third method often
+diverges, but the first two methods achieve a speed-up of 5.5
+(speed-up = (time on one processor)/(time on $p$ processors)). Later,
+Freeman and Bane~\cite{Freemanall90} considered asynchronous
+algorithms, in which each processor continues to update its
+approximations even though the latest values of the other $z_{i}^{(k)}$
+have not yet been received from the other processors, in contrast to
+the synchronous version, where it would wait.
+The authors of~\cite{Raphaelall01} proposed two parallelization
+methods, for shared-memory and distributed-memory architectures,
+able to compute the roots of a polynomial of degree 10000 in 430 s
+with only 8 PCs and 2 communications per iteration, whereas the
+sequential implementation takes 3300 s to obtain the same results.
 
 After this few works discuses this problem until the apparition of
-the Compute Unified Device Architecture (CUDA) [19],a parallel
-computing platform and a programming model invented by NVIDIA. the
-computing ability of GPU has exceeded the counterpart of CPU. It
-is a waste of resource to be just a graphics card for GPU.  CUDA
-adopts a totally new computing architecture to use the hardware
-resources provided by GPU in order to offer a stronger computing
-ability to the massive data computing.
-
-
-Indeed [16]proposed the implementation of the Durand-Kerner method
-on GPU (Graphics Processing Unit). The main result prove that a
-parallel implementation is 10 times as fast as the sequential
-implementation on a single CPU for high degree polynomials that is
-greater than about 48000.
+the Compute Unified Device Architecture (CUDA)~\cite{CUDA10}, a
+parallel computing platform and programming model invented by
+NVIDIA. The computing power of GPUs has exceeded that of CPUs,
+so using a GPU only as a graphics card wastes resources.
+CUDA adopts a new computing architecture that exposes the
+hardware resources provided by the GPU in order to offer greater
+computing power for massive data computations.
+
+
+The authors of~\cite{Kahinall14} proposed an implementation of the
+Durand-Kerner method on GPU (Graphics Processing Unit). The main
+result shows that a parallel implementation is 10 times as fast as
+the sequential implementation on a single CPU for high-degree
+polynomials, that is, of degree greater than about 48000.
 \paragraph{}
 The mean part of our work is to implement the Aberth method on GPU
 and compare it with the Durand Kerner
@@ -115,18 +119,28 @@ implementation.................To be continued..................
 
 \section{Aberth method and difficulties}
 A cubically convergent iteration method for finding zeros of
-polynomials was proposed by O.Aberth[?].The Aberth method is a
-purely algebraic derivation.To illustrate the derivation, we let
-$w_{i}(z)$ be the product of linear factor $ w_{i}(z)=\prod_{j=1,j
-\neq i}^{n} (z-x_{j})$
+polynomials was proposed by O.~Aberth~\cite{Aberth73}. The
+derivation of the Aberth method is purely algebraic. To illustrate
+it, we let $w_{i}(z)$ be the product of linear factors $
+w_{i}(z)=\prod_{j=1,j \neq i}^{n} (z-x_{j})$
 
 and rational function $R_{i}(z)$ be the correction term of
-Weistrass method (?)
-$$R_{i}(z)=\dfrac{p(z)}{w_{i}(Z)} , i=1,2,...,n. $$
+Weierstrass method~\cite{Weierstrass03}:
+
+\begin{equation}
+R_{i}(z)=\dfrac{p(z)}{w_{i}(z)} , i=1,2,...,n.
+\end{equation}
 
 Differentiating the rational function $R_{i}(z)$ and applying the
 Newton method, we have
-$$\dfrac{R_{i}(z)}{R_{i}^{'}(z)}= \dfrac{p(z)}{p^{'}(z)-p(z)\dfrac{w_{i}(z)}{w_{i}^{'}(z)}}= \dfrac{p(z)}{p^{'}(z)-p(z) \sum _{j=1,j \neq i}^{n}\dfrac{1}{z-x_{i}}}, i=1,2,...,n $$
+
+\begin{equation}
+\dfrac{R_{i}(z)}{R_{i}^{'}(z)}=
+\dfrac{p(z)}{p^{'}(z)-p(z)\dfrac{w_{i}^{'}(z)}{w_{i}(z)}}=
+\dfrac{p(z)}{p^{'}(z)-p(z) \sum _{j=1,j \neq
+i}^{n}\dfrac{1}{z-x_{j}}}, i=1,2,...,n
+\end{equation}
+
 Substituting $x_{j}$ for z we obtain the Aberth iteration method
 
 Let present the means stages of Aberth's method.
@@ -134,7 +148,10 @@ Let present the means stages of Aberth's method.
 \subsection{Polynomials Initialization}
  The initialization of polynomial P(z) with complex coefficients
  are given by:
- $$ p(z)=\sum{a_{i}z^{n-i}}. where a_{n} \neq 0,a_{0}=1, a_{i}\subset C $$
+
+\begin{equation}
+  p(z)=\sum_{i=0}^{n}{a_{i}z^{n-i}}, \quad \mbox{where } a_{n} \neq 0,\ a_{0}=1,\ a_{i}\in C
+\end{equation}
 
 
 \subsection{Vector $Z^{0)}$ Initialization}
@@ -142,26 +159,39 @@ Let present the means stages of Aberth's method.
 The choice of the initial points $z^{(0)}_{i} , i = 1, . . . , n,$
 from which starting the iteration  (2) or (3), is rather delicate
 since the number of steps needed by the iterative method to reach
-a given approximation strongly depends on it. In [1] the Aberth
-iteration is started by selecting n equispaced points on a circle
-of center 0 and radius r, where r is an upper bound to the moduli
-of the zeros. After[18]  performs this choice by selecting complex
-numbers along different circles and relies on the result of [19].
-
-$$\sigma_{0}=\frac{u+v}{2};u=\frac{\sum_{i=1}^{n}u_{i}}{n.max_{i=1}^{n}u_{i}}; v=\frac{\sum_{i=0}^{n-1}v_{i}}{n.min_{i=0}^{n-1}v_{i}};u_{i}=2.|a_{i}|^{\frac{1}{i}}; v_{i}=\frac{|\frac{a_{n}}{a_{i}}|^{\frac{1}{n-i}}}{2} $$
+a given approximation strongly depends on it.
+In~\cite{Aberth73}, the Aberth iteration is started by selecting $n$
+equispaced points on a circle of center 0 and radius $r$, where $r$ is
+an upper bound on the moduli of the zeros. Later, the work in~\cite{Bini96}
+makes this choice by selecting complex numbers along different
+circles and relies on the result of~\cite{Ostrowski41}.
+
+\begin{equation}
+\sigma_{0}=\frac{u+v}{2};\quad u=\frac{\sum_{i=1}^{n}u_{i}}{n\cdot\max_{1\leq i\leq n}u_{i}};\quad
+v=\frac{\sum_{i=0}^{n-1}v_{i}}{n\cdot\min_{0\leq i\leq n-1}v_{i}};\quad u_{i}=2\cdot|a_{i}|^{\frac{1}{i}};\quad
+v_{i}=\frac{\left|\frac{a_{n}}{a_{i}}\right|^{\frac{1}{n-i}}}{2}
+\end{equation}
 
 \subsection{Iterative Function Hi}
 The operator used with Aberth's method is corresponding to the
 following equation which will enable the convergence towards
 polynomial solutions, provided all the roots are distinct.
 
-$$ H_{i}(z)=z_{i}-\frac{1}{\frac{P^{'}(z_{i})}{P(z_{i})}-\sum_{j\neq i}{\frac{1}{z_{i}-z_{j}}}} $$
+\begin{equation}
+H_{i}(z)=z_{i}-\frac{1}{\frac{P^{'}(z_{i})}{P(z_{i})}-\sum_{j\neq
+i}{\frac{1}{z_{i}-z_{j}}}}
+\end{equation}
 
 \subsection{Convergence condition}
 determines the success of the termination. It consists in stopping
 the iterative function $H_{i}(z)$ when the are stable,the method
 converge sufficiently:
-$$  \forall i \in [1,n]; \frac{z_{i}^{(k)}-z_{i}^{(k-1)}}{z_{i}^{(k)}}< \xi$$
+
+\begin{equation}
+\forall i \in
+[1,n];\left|\frac{z_{i}^{(k)}-z_{i}^{(k-1)}}{z_{i}^{(k)}}\right|<\xi
+\end{equation}
+
 
 \section{Difficulties and amelioration}
 the Aberth method implementation suffer of overflow problems. This
@@ -171,40 +201,169 @@ point $\xi$ where $|\xi| > 1$.Indeed the limited number in the
 mantissa of floating takings the computation of P(z) wrong when z
 is large. for example $(10^{50}) +1+ (- 10_{50})$ will give result
 0 instead of 1 in reality.consequently we can't compute the roots
-for large polynomial's degree. This problem was discuss in [17]
-for the Durand-Kerner method, the authors propose to use the
-logratihm and the exponential of a complex:
-
-$$ \forall(x,y)\in R^{*2}; \ln (x+i.y)=\ln(x^{2}+y^{2}) 2+i.\arcsin(y\sqrt{x^{2}+y^{2}})_{\left] -\pi, \pi\right[ } $$
-$$ \forall(x,y)\in R^{*2}; \exp(x+i.y)= \exp(x).\exp(i.y)$$
-$$                                       =\exp(x).\cos(y)+i.\exp(x).\sin(y)$$
-
+for polynomials of large degree. This problem was discussed
+in~\cite{Karimall98} for the Durand-Kerner method; the authors
+propose to use the logarithm and the exponential of a complex number:
+
+\begin{equation}
+ \forall(x,y)\in R^{*2}; \ln (x+i.y)=\frac{\ln(x^{2}+y^{2})}{2}+
+i.\arcsin\left(\frac{y}{\sqrt{x^{2}+y^{2}}}\right)_{\left] -\pi, \pi\right[ }
+\end{equation}
+%%\begin{equation}
+\begin{align}
+ \forall(x,y)\in R^{*2}; \exp(x+i.y)&= \exp(x).\exp(i.y)\\
+                                    &=\exp(x).\cos(y)+i.\exp(x).\sin(y)
+\end{align}
+%%\end{equation}
 
 The application of logarithm can replace any multiplications and
 divisions with additions and subtractions; consequently it
 manipulates lower absolute values and can be compute the roots for
-large polynomial's degree exceed 1000[17].
+polynomials of degree exceeding 1000~\cite{Karimall98}.
 
 Applying this solution for the Aberth method we obtain the
 iteration function with logarithm:
 %%$$ \exp \bigl(  \ln(p(z)_{k})-ln(\ln(p(z)_{k}^{'}))- \ln(1- \exp(\ln(p(z)_{k})-ln(\ln(p(z)_{k}^{'})+\ln\sum_{i\neq j}^{n}\frac{1}{z_{k}-z_{j}})$$
+\begin{equation}
+H_{i}(z)=z_{i}-\exp \left(\ln \left(
+p(z_{i})\right)-\ln\left(p^{'}(z_{i})\right)- \ln
+\left(1-Q(z_{i})\right)\right)
+\end{equation}
+where:
 
-$$ H_{i}(z)=z_{i}^{k}-\exp \left(\ln \left( p(z_{k})\right)-\ln\left(p(z_{k}^{'})\right)- \ln\left(1- \exp\left( \ln (p(z_{k}))-\ln(p(z_{k}^{'}))+\ln \left( \sum_{k\neq j}^{n}\frac{1}{z_{k}-z_{j}}\right)\right) \right) \right)$$
-
+\begin{equation}
+Q(z_{i})=\exp\left( \ln (p(z_{i}))-\ln(p^{'}(z_{i}))+\ln \left(
+\sum_{j\neq i}^{n}\frac{1}{z_{i}-z_{j}}\right)\right)
+\end{equation}
 
 
 this solution is applying when it is necessary
 
 \section{The implementation of simultaneous methods in a parallel computer}
+    The main problem of the simultaneous methods is that the time
+needed for convergence increases with the degree of the
+polynomial. The parallelization of these algorithms improves
+the convergence time. Researchers usually
+adopt one of the two following approaches to parallelize root
+finding algorithms. One approach is to reduce the total number of
+iterations, as implemented by
+Miranker~\cite{Mirankar68,Mirankar71}, Schedler~\cite{Schedler72} and
+Winograd~\cite{Winogard72}. Another approach is to reduce the
+computation time per iteration, as reported
+in~\cite{Benall68,Jana06,Janall99,Riceall06}. There are many
+schemes for the simultaneous approximation of all roots of a given
+polynomial. Several works on different methods and issues of root
+finding have been reported in~\cite{Azad07,Gemignani07,Kalantari08,
+Skachek08,Zhancall08,Zhuall08}. However, the Durand-Kerner and
+Ehrlich methods are the most practical choices among
+them~\cite{Bini04}. These two methods have been extensively
+studied for parallelization because of the following advantages. The
+computation involved in these methods has some inherent
+parallelism that can be suitably exploited by SIMD machines.
+Moreover, they have a fast rate of convergence (quadratic for the
+Durand-Kerner method and cubic for the Ehrlich method). Various parallel
+algorithms reported for these methods can be found
+in~\cite{Cosnard90,Freeman89,Freemanall90,Jana99,Janall99}.
+Freeman and Bane~\cite{Freemanall90} presented two parallel
+algorithms on a local-memory MIMD computer with a
+compute-to-communication time ratio of $O(n)$. However, their algorithms require
+each processor to communicate its current approximation to all
+other processors at the end of each iteration. Therefore, they
+cause a high degree of memory conflict. Recently, the author
+of~\cite{Mirankar71} proposed two versions of a parallel algorithm
+for the Durand-Kerner method and the Aberth method on a model of the
+Optoelectronic Transpose Interconnection System (OTIS). The
+algorithms are mapped onto an OTIS-2D torus using $N$ processors. This
+solution needs $N$ processors to compute $N$ roots, which is not
+practical for polynomials of large degree.
+So far, the related works have not been able to compute the roots of
+polynomials of large degree (higher than 1000) in a short
+time.
 
-
+    Our objective is to find polynomial roots rapidly and
+accurately. With the advent of CUDA (Compute Unified Device
+Architecture), finding the roots of polynomials becomes rewarding
+and very interesting: CUDA adopts a new computing
+architecture that uses the hardware resources provided by the GPU in
+order to offer greater computing power for massive data
+computations. In~\cite{Kahinall14} we proposed the first implementation
+of a polynomial root-finding method on GPU (Graphics Processing
+Unit), namely the Durand-Kerner method. The main result shows
+that a parallel implementation is 10 times as fast as the
+sequential implementation on a single CPU for high-degree
+polynomials, that is, of degree greater than about 48000. In this
+paper we present a parallel implementation of Aberth's method on
+GPU; more details are given in the remainder of this paper.
+
+\section {A parallel implementation of Aberth's method}
+\subsection{Background on the GPU architecture}
+A GPU is viewed as an accelerator for the data-parallel and
+intensive arithmetic computations. It draws its computing power
+from the parallel nature of its hardware and software
+architectures. A GPU is composed of hundreds of Streaming
+Processors (SPs) organized in several blocks called Streaming
+Multiprocessors (SMs). It also has a memory hierarchy. It has a
+private read-write local memory per SP, fast shared memory and
+read-only constant and texture caches per SM and a read-write
+global memory shared by all its SPs~\cite{NVIDIA10}.
+
+    On a CPU equipped with a GPU, all the data-parallel and intensive
+functions of an application running on the CPU are off-loaded onto
+the GPU in order to accelerate their computations. A similar
+data-parallel function is executed on a GPU as a kernel by
+thousands or even millions of parallel threads, grouped together
+as a grid of thread blocks. Therefore, each SM of the GPU executes
+one or more thread blocks in SIMD fashion (Single  Instruction,
+Multiple Data) and in turn each SP of a GPU SM runs one or more
+threads within a block in SIMT fashion (Single Instruction,
+Multiple threads). Indeed at any given clock cycle, the threads
+execute the same instruction of a kernel, but each of them
+operates on different data.
+ GPUs only work on data stored in their
+global memories, and the final results of their kernel executions
+must be communicated to their CPUs. Hence, the data must be
+transferred in and out of the GPU. However, the speed of memory
+copies between the GPU and the CPU is lower than the memory
+bandwidth of the GPU memories and, thus, it dramatically affects
+the performance of GPU computations. Accordingly, it is necessary
+to limit data transfers between the GPU and its CPU during the
+computations.
+\subsection{Background on the CUDA Programming Model}
+
+The CUDA programming model is similar in style to a single-program
+multiple-data (SPMD) software model. The GPU is treated as a
+coprocessor that executes data-parallel kernel functions. CUDA
+provides three key abstractions, a hierarchy of thread groups,
+shared memories, and barrier synchronization. Threads have a three
+level hierarchy. A grid is a set of thread blocks that execute a
+kernel function. Each grid consists of blocks of threads. Each
+block is composed of hundreds of threads. Threads within one block
+can share data using shared memory and can be synchronized at a
+barrier. All threads within a block are executed concurrently on a
+multithreaded architecture. The programmer specifies the number of
+threads per block, and the number of blocks per grid. A thread in
+the CUDA programming language is much lighter weight than a thread
+in traditional operating systems. A thread in CUDA typically
+processes one data element at a time. The CUDA programming model
+has two shared read-write memory spaces, the shared memory space
+and the global memory space. The shared memory is local to a block
+and the global memory space is accessible by all blocks. CUDA also
+provides two read-only memory spaces, the constant space and the
+texture space, which reside in external DRAM, and are accessed via
+read-only caches.
+
+\subsection{A parallel implementation of the Aberth's method }
+\subsection{A CUDA implementation of the Aberth's method }
+\subsection{A GPU implementation of the Aberth's method }
+\subsubsection{the  step to parallelize}
+\subsubsection{the kernel corresponding }
+\subsubsection{Comparison between sequential algorithm and GPU algorithm }
 \bibliographystyle{plain}
 \bibliography{biblio}
 %% \begin{thebibliography}{2}
 
 %% \bibitem [1] {1} O. Aberth, Iteration Methods for Finding
 
-
 %% all Zeros of a Polynomial Simultaneously, Math. Comput. 27, 122
 %% (1973) 339Â344.
 
@@ -245,17 +404,9 @@ this solution is applying when it is necessary
 %% jede ganze rationale function einer veranderlichen dagestellt
 %% werden kann als ein product aus linearen functionen derselben
 %% veranderlichen, Ges. Werke 3, 251-269.
-%% \bibitem [13] {13} Freeman, T. L. (1989), Calculating polynomial zeros on a
-%% local memory parallel computer, Parallel Computing 12, 351-358.
 
-%% \bibitem [14] {14} Freeman, T. L., Brankin, R. K. (1990), Asynchronous
-%% polynomial zero-finding algorithms, Parallel Computing 17,
-%% 673-681.
 
-%% \bibitem [15] {15} RaphaÃ«l,C. FranÃ§ois,S. (2001), Extraction de racines dans des
-%% polynÃ´mes creux de degrÃ© Ã©levÃ©. RSRCP (RÃ©seaux et SystÃ¨mes
-%% RÃ©partis, Calculateurs ParallÃ¨les), NumÃ©ro thÃ©matique :
-%% Algorithmes itÃ©ratifs parallÃ¨les et distribuÃ©s, 13(1):67--81.
+%%
 
 %% \bibitem [16]{16} Kahina, G. RaphaÃ«l, C. Abderrahmane, S. A
 %% parallel implementation of the Durand-Kerner algorithm for
@@ -272,5 +423,67 @@ this solution is applying when it is necessary
 %%200.
 %% \bibitem [19] {19} A. Ostrowski, On a Theorem by J.L. Walsh Concerning the Moduli of Roots of Algebraic Equations,
 %%Bull. A.M.S., 47 (1941) 742746.
+
+%%\bibitem [20] {20} Mirankar WL (1968) Parallel methods for
+%%approximating the roots of a function. IBM Res Dev 297 301 30.
+%%\bibitem [21] {21} Mirankar WL (1971) A survey of parallelism in
+%%numerical analysis. SIAM Rev 524547
+
+%%\bibitem [22] {22}Bini DA, Gemignani L (2004) Inverse power and
+%%DurandKerner iterations for univariate polynomial root-finding.
+%%Comput Math Appl 47:447459
+
+%%\bibitem [23] {23}Ben-Or M, Feig E, Kozzen D, Tiwary P (1968) A fast parallel
+%algorithm for determining all roots of a polynomial with real
+%%roots. In: Proc of ACM, pp 340349
+
+%%\bibitem [24] {24}Zhanc X, Wan M, Yi Z (2008) A constrained learning algorithm for
+%%finding multiple real roots of polynomial. In: Proc of the 2008
+%%intl symposium on computational intelligence and design, pp 3841
+
+%%\bibitem [25] {25}Kalantari B (2008) Polynomial root finding and polynomiography.
+%%World Scientific, New Jersey
+
+%%\bibitem [27] {27} Gemignani L (2007) Structured matrix methods for polynomial root
+%%finding. In: Proc of the 2007 Intl symposium on symbolic and
+%%algebraic computation, pp 175180 Skachek V, Roth RM (2008)
+
+%%\bibitem [28] {28}Probabilistic algorithm for finding roots of linearized
+%%polynomials. Design, codes and cryptography. Kluwer, Norwell
+
+%%\bibitem [29] {29}Schedler GS (1967) Parallel numerical methods for the solution of
+%%equations. Commun ACM 286 290 Ben-Or M, Feig E, Kozzen D, Tiwary
+
+%%\bibitem [30] {30}P (1968) A fast parallel algorithm for determining all roots of a
+%%polynomial with real roots. In: Proc of ACM, pp 340349
+
+%%\bibitem [31] {31}Rice TA, Jamieson LH (1989) A highly parallel algorithm for root
+%%extraction. IEEE Trans Comp 38(3):443449 20. Jana PK (2006)
+
+%%\bibitem [32] {32}Winogard S (1972) Parallel iteration methods in complexity of
+%%computer communications. Plenum, New York
+
+%%\bibitem [33] {33} Cosnard M, Fraigniaud P (1990) Finding the roots of a polynomial
+%%on an MIMD multicomputer. Parallel Comput 15:7585
+
+%%\bibitem [41] {41} Jana PK (1999) Finding polynomial zeroes on a Multi-mesh of trees
+%%(MMT). In: Proc of the 2nd int conference on information
+%%technology, Bhubaneswar, December 2022, pp 202206
+
+%%\bibitem [42] {42}Zhu W, Zeng Z, Lin D (2008) An adaptive algorithm finding
+%%multiple roots of polynomials. Lect Notes Comput Sci 5262:674681
+
+
+
+%%\bibitem [43] {43}Polynomial interpolation and polynomial root finding on OTIS-Mesh.
+%%Parallel Comput 32:301312
+
+%%\bibitem [44] {44}Jana PK, Sinha BP, Datta Gupta R (1999) Efficient parallel
+%%algorithms for finding polynomial zeroes. In: Proc of the 6th int
+%%conference on advance computing, CDAC, Pune University Campus,
+%%India, December 1416, pp 189196
+
+
+
 %% \end{thebibliography}
 \end{document}