diff --git a/paper.tex b/paper.tex

index 788db9ac129fd34557039e46f743d4b9f398221c..493bb3996d87560220f76b26a00b3ee6446416cc 100644 (file)
--- a/paper.tex
+++ b/paper.tex
@@ -300,7 +300,7 @@ Here we give a second form of the iterative function used by Ehrlich-Aberth meth
  \begin{equation}
  \label{Eq:Hi}
  EA2: z^{k+1}_{i}=z_{i}^{k}-\frac{\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}}
-{1-\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}\sum_{j=1,j\neq i}^{j=n}{\frac{1}{(z_{i}^{k}-z_{j}^{k})}}}, i=0,. . . .,n
+{1-\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}\sum_{j=1,j\neq i}^{j=n}{\frac{1}{(z_{i}^{k}-z_{j}^{k})}}}, i=1,. . . .,n
  \end{equation}
  It can be noticed that this equation is equivalent to Eq.~\ref{Eq:EA},
  but we prefer the latter one because we can use it to improve the
@@ -385,7 +385,7 @@ Authors usually adopt one of the two following approaches to parallelize root
  finding algorithms. The first approach aims at reducing the total number of
  iterations as by Miranker
  ~\cite{Mirankar68,Mirankar71}, Schedler~\cite{Schedler72} and
-Winogard~\cite{Winogard72}. The second approach aims at reducing the
+Winograd~\cite{Winogard72}. The second approach aims at reducing the
  computation time per iteration, as reported
  in~\cite{Benall68,Jana06,Janall99,Riceall06}. 
  
@@ -409,8 +409,8 @@ cause a high degree of memory conflict. Recently the author
  in~\cite{Mirankar71} proposed two versions of parallel algorithm
  for the Durand-Kerner method, and Ehrlich-Aberth method on a model of
  Optoelectronic Transpose Interconnection System (OTIS). The
-algorithms are mapped on an OTIS-2D torus using N processors. This
-solution needs N processors to compute N roots, which is not
+algorithms are mapped on an OTIS-2D torus using $N$ processors. This
+solution needs $N$ processors to compute $N$ roots, which is not
  practical for solving polynomials with large degrees.
  %Until very recently, the literature did not mention implementations
  %able to compute the roots of large degree polynomials (higher then
@@ -423,7 +423,7 @@ In~\cite{Kahinall14} we already proposed the first implementation
  of a root finding method on GPUs, that of the Durand-Kerner method. The main result showed
  that a parallel CUDA implementation is 10 times as fast as the
  sequential implementation on a single CPU for high degree
-polynomials of 48000.
+polynomials of 48,000.
  %In this paper we present a parallel implementation of Ehrlich-Aberth
  %method on GPUs for sparse and full polynomials with high degree (up
  %to $1,000,000$).
@@ -543,18 +543,25 @@ polynomials of 48000.
  In order to implement the Ehrlich-Aberth method in CUDA, it is
  possible to use the Jacobi scheme or the Gauss-Seidel one.  With the
  Jacobi iteration, at iteration $k+1$ we need all the previous values
-$z^{(k)}_{i}$ to compute the new values $z^{(k+1)}_{i}$, that is :
+$z^{k}_{i}$ to compute the new values $z^{k+1}_{i}$, that is :
  
  \begin{equation}
-EAJ: z^{k+1}_{i}=\frac{p(z^{k}_{i})}{p'(z^{k}_{i})-p(z^{k}_{i})\sum^{n}_{j=1 j\neq i}\frac{1}{z^{k}_{i}-z^{k}_{j}}}, i=1,...,n.
+EAJ: z^{k+1}_{i}=z_{i}^{k}-\frac{\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}}
+{1-\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}\sum_{j=1,j\neq i}^{j=n}{\frac{1}{(z_{i}^{k}-z_{j}^{k})}}}, i=1,. . . .,n.
  \end{equation}
  
  With the Gauss-Seidel iteration, we have:
+%\begin{equation}
+%\label{eq:Aberth-H-GS}
+%EAGS: z^{k+1}_{i}=\frac{p(z^{k}_{i})}{p'(z^{k}_{i})-p(z^{k}_{i})(\sum^{i-1}_{j=1}\frac{1}{z^{k}_{i}-z^{k+1}_{j}}+\sum^{n}_{j=i+1}\frac{1}{z^{k}_{i}-z^{k}_{j}})}, i=1,...,n.
+%\end{equation}
+
  \begin{equation}
  \label{eq:Aberth-H-GS}
-EAGS: z^{k+1}_{i}=\frac{p(z^{k}_{i})}{p'(z^{k}_{i})-p(z^{k}_{i})(\sum^{i-1}_{j=1}\frac{1}{z^{k}_{i}-z^{k+1}_{j}}+\sum^{n}_{j=i+1}\frac{1}{z^{k}_{i}-z^{k}_{j}})}, i=1,...,n.
+EAGS: z^{k+1}_{i}=z_{i}^{k}-\frac{\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}}
+{1-\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}(\sum^{i-1}_{j=1}\frac{1}{z^{k}_{i}-z^{k+1}_{j}}+\sum^{n}_{j=i+1}\frac{1}{z^{k}_{i}-z^{k}_{j}})}, i=1,. . . .,n
  \end{equation}
-%%Here a finiched my revision %%
+
  Using Eq.~\ref{eq:Aberth-H-GS} to update the vector solution
  \textit{Z}, we expect the Gauss-Seidel iteration to converge more
  quickly because, just as any Gauss-Seidel algorithm (for solving linear systems of equations), it uses the most recently computed roots $z^{k+1}_{i}$.
@@ -843,7 +850,8 @@ numerical applications on GPU.
  
  In future work, we plan to investigate the possibility of using
  multiple GPUs simultaneously, either with a multi-GPU machine or
-with cluster of GPUs.
+with a cluster of GPUs. It may also be interesting to study the
+implementation of other polynomial root-finding methods on GPUs.
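
Reviewer note on the hunk at line 543: the difference between the Jacobi (EAJ) and Gauss-Seidel (EAGS) updates can be checked on a small polynomial with the sequential sketch below. It is not the paper's CUDA implementation. The helper names (`horner`, `ea_sweep`, `solve`), the initial guesses on a circle, the tolerance and the iteration cap are all assumptions made for illustration.

```python
import cmath


def horner(coeffs, x):
    """Evaluate p(x) and p'(x) together; coeffs are ordered highest degree first."""
    p, dp = 0j, 0j
    for c in coeffs:
        dp = dp * x + p
        p = p * x + c
    return p, dp


def ea_sweep(coeffs, z, gauss_seidel=False):
    """One Ehrlich-Aberth sweep over all roots (hypothetical helper)."""
    old = list(z)  # iterate k
    new = list(z)  # becomes iterate k+1
    for i in range(len(z)):
        p, dp = horner(coeffs, old[i])
        newton = p / dp  # assumes p'(z_i) != 0
        # Gauss-Seidel reads the already updated roots for j < i;
        # Jacobi reads only values from iterate k.
        others = new if gauss_seidel else old
        s = sum(1.0 / (old[i] - others[j]) for j in range(len(z)) if j != i)
        new[i] = old[i] - newton / (1.0 - newton * s)
    return new


def solve(coeffs, gauss_seidel=False, tol=1e-12, max_iter=200):
    """Iterate until the largest root update is below tol (assumed stopping rule)."""
    n = len(coeffs) - 1
    # Assumed starting points: a circle whose radius is a Cauchy-type bound,
    # rotated by 0.4 rad so that no initial guess lies on the real axis.
    radius = 1.0 + max(abs(c / coeffs[0]) for c in coeffs[1:])
    z = [radius * cmath.exp(1j * (2 * cmath.pi * k / n + 0.4)) for k in range(n)]
    for it in range(1, max_iter + 1):
        z_next = ea_sweep(coeffs, z, gauss_seidel)
        delta = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if delta < tol:
            return z, it
    return z, max_iter


if __name__ == "__main__":
    # p(x) = x^3 - 6x^2 + 11x - 6 has roots 1, 2 and 3.
    for gs in (False, True):
        roots, iters = solve([1, -6, 11, -6], gauss_seidel=gs)
        name = "Gauss-Seidel" if gs else "Jacobi"
        print(name, iters, sorted(round(r.real, 6) for r in roots))
```

The only difference between the two schemes is which vector the sum reads from, which is also why the Jacobi form maps more directly onto one GPU thread per root: no thread depends on a value written by another thread during the same sweep.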