X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/kahina_paper1.git/blobdiff_plain/abc776be5595380b3d192f54e5a7b84da33b02ec..5c06f09b51f4a75b98bd366ffc527c17cfea74a4:/paper.tex?ds=inline

diff --git a/paper.tex b/paper.tex
index 788db9a..493bb39 100644
--- a/paper.tex
+++ b/paper.tex
@@ -300,7 +300,7 @@ Here we give a second form of the iterative function used by Ehrlich-Aberth meth
 \begin{equation}
 \label{Eq:Hi}
 EA2: z^{k+1}_{i}=z_{i}^{k}-\frac{\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}}
-{1-\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}\sum_{j=1,j\neq i}^{j=n}{\frac{1}{(z_{i}^{k}-z_{j}^{k})}}}, i=0,. . . .,n
+{1-\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}\sum_{j=1,j\neq i}^{j=n}{\frac{1}{(z_{i}^{k}-z_{j}^{k})}}}, i=1,. . . .,n
 \end{equation}
 It can be noticed that this equation is equivalent to Eq.~\ref{Eq:EA},
 but we prefer the latter one because we can use it to improve the
@@ -385,7 +385,7 @@ Authors usually adopt one of the two following approaches to
 parallelize root finding algorithms. The first approach aims at
 reducing the total number of iterations as by Miranker
 ~\cite{Mirankar68,Mirankar71}, Schedler~\cite{Schedler72} and
-Winogard~\cite{Winogard72}. The second approach aims at reducing the
+Winograd~\cite{Winogard72}. The second approach aims at reducing the
 computation time per iteration, as reported
 in~\cite{Benall68,Jana06,Janall99,Riceall06}.
 
@@ -409,8 +409,8 @@ cause a high degree of memory conflict. Recently the author
 in~\cite{Mirankar71} proposed two versions of parallel algorithm for
 the Durand-Kerner method, and Ehrlich-Aberth method on a model of
 Optoelectronic Transpose Interconnection System (OTIS).The
-algorithms are mapped on an OTIS-2D torus using N processors. This
-solution needs N processors to compute N roots, which is not
+algorithms are mapped on an OTIS-2D torus using $N$ processors. This
+solution needs $N$ processors to compute $N$ roots, which is not
 practical for solving polynomials with large degrees.
 %Until very recently, the literature did not mention implementations
 %able to compute the roots of large degree polynomials (higher then
@@ -423,7 +423,7 @@ In~\cite{Kahinall14} we already proposed the first implementation of a
 root finding method on GPUs, that of the Durand-Kerner method. The
 main result showed that a parallel CUDA implementation is 10 times as
 fast as the sequential implementation on a single CPU for high degree
-polynomials of 48000.
+polynomials of 48,000.
 %In this paper we present a parallel implementation of Ehrlich-Aberth
 %method on GPUs for sparse and full polynomials with high degree (up
 %to $1,000,000$).
@@ -543,18 +543,25 @@ polynomials of 48000.
 In order to implement the Ehrlich-Aberth method in CUDA, it is
 possible to use the Jacobi scheme or the Gauss Seidel one. With the
 Jacobi iteration, at iteration $k+1$ we need all the previous values
-$z^{(k)}_{i}$ to compute the new values $z^{(k+1)}_{i}$, that is :
+$z^{k}_{i}$ to compute the new values $z^{k+1}_{i}$, that is :
 \begin{equation}
-EAJ: z^{k+1}_{i}=\frac{p(z^{k}_{i})}{p'(z^{k}_{i})-p(z^{k}_{i})\sum^{n}_{j=1 j\neq i}\frac{1}{z^{k}_{i}-z^{k}_{j}}}, i=1,...,n.
+EAJ: z^{k+1}_{i}=z_{i}^{k}-\frac{\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}}
+{1-\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}\sum_{j=1,j\neq i}^{j=n}{\frac{1}{(z_{i}^{k}-z_{j}^{k})}}}, i=1,. . . .,n.
 \end{equation}
 With the Gauss-Seidel iteration, we have:
+%\begin{equation}
+%\label{eq:Aberth-H-GS}
+%EAGS: z^{k+1}_{i}=\frac{p(z^{k}_{i})}{p'(z^{k}_{i})-p(z^{k}_{i})(\sum^{i-1}_{j=1}\frac{1}{z^{k}_{i}-z^{k+1}_{j}}+\sum^{n}_{j=i+1}\frac{1}{z^{k}_{i}-z^{k}_{j}})}, i=1,...,n.
+%\end{equation}
+
 \begin{equation}
 \label{eq:Aberth-H-GS}
-EAGS: z^{k+1}_{i}=\frac{p(z^{k}_{i})}{p'(z^{k}_{i})-p(z^{k}_{i})(\sum^{i-1}_{j=1}\frac{1}{z^{k}_{i}-z^{k+1}_{j}}+\sum^{n}_{j=i+1}\frac{1}{z^{k}_{i}-z^{k}_{j}})}, i=1,...,n.
+EAGS: z^{k+1}_{i}=z_{i}^{k}-\frac{\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}}
+{1-\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}(\sum^{i-1}_{j=1}\frac{1}{z^{k}_{i}-z^{k+1}_{j}}+\sum_{j=1,j\neq i}^{j=n}{\frac{1}{(z_{i}^{k}-z_{j}^{k})}})}, i=1,. . . .,n
 \end{equation}
-%%Here a finiched my revision %%
+
 Using Eq.~\ref{eq:Aberth-H-GS} to update the vector solution \textit{Z}, we expect the Gauss-Seidel iteration to converge more quickly because, just as any Jacobi algorithm (for solving linear systems of equations), it uses the most fresh computed roots $z^{k+1}_{i}$.
@@ -843,7 +850,8 @@ numerical applications on GPU.
 In future works, we plan to investigate the possibility of using
 several multiple GPUs simultaneously, either with multi-GPU machine or
-with cluster of GPUs.
+with cluster of GPUs. It may also be interesting to study the
+implementation of other root finding polynomial methods on GPU.
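
The difference between the EAJ and EAGS updates in the hunk at line 543 can be illustrated with a small sequential sketch. This is not the paper's CUDA kernel: it is plain Python, and the names `horner`, `ea_sweep` and `solve`, the starting circle, the angular offset and the tolerance are all assumptions made for illustration only. The Gauss-Seidel branch assumes that the sum is meant to use the already updated values $z^{k+1}_{j}$ for $j<i$ and the old values $z^{k}_{j}$ for $j>i$, as written in the commented-out form of EAGS.

```python
import cmath


def horner(coeffs, z):
    """Return (p(z), p'(z)); coeffs are given in descending degree order."""
    p, dp = 0j, 0j
    for c in coeffs:
        dp = dp * z + p
        p = p * z + c
    return p, dp


def ea_sweep(coeffs, z, gauss_seidel):
    """One Ehrlich-Aberth sweep over all roots; returns (new_z, max |step|)."""
    seen = list(z)   # values z_j that the sum is allowed to read
    new_z = list(z)
    max_step = 0.0
    for i, zi in enumerate(z):          # zi is always the old value z_i^k
        p, dp = horner(coeffs, zi)
        s = sum(1 / (zi - seen[j]) for j in range(len(z)) if j != i)
        # p / (p' - p*s) equals (p/p') / (1 - (p/p')*s), the form used in EAJ
        step = p / (dp - p * s)
        new_z[i] = zi - step
        if gauss_seidel:
            seen[i] = new_z[i]          # later roots read z_i^{k+1}
        max_step = max(max_step, abs(step))
    return new_z, max_step


def solve(coeffs, gauss_seidel, tol=1e-12, max_iter=200):
    """Iterate sweeps until the largest correction drops below tol."""
    n = len(coeffs) - 1
    # Assumed starting points: a circle of Cauchy-bound radius, rotated by
    # 0.4 rad so the guesses are not symmetric about the real axis.
    radius = 1 + max(abs(c / coeffs[0]) for c in coeffs[1:])
    z = [radius * cmath.exp(1j * (2 * cmath.pi * i / n + 0.4)) for i in range(n)]
    for it in range(1, max_iter + 1):
        z, max_step = ea_sweep(coeffs, z, gauss_seidel)
        if max_step < tol:
            break
    return z, it
```

Calling `solve([1, -6, 11, -6], gauss_seidel=False)` and the same with `gauss_seidel=True` runs both schemes on $z^3-6z^2+11z-6$, whose roots are 1, 2 and 3; comparing the returned iteration counts gives a rough feel for the convergence claim in the text. In the Jacobi mode every update depends only on the previous vector, which is what makes it map naturally onto one thread per root, whereas the Gauss-Seidel mode introduces an ordering between updates.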