MAJ de la section III: EA method

author Kahina <kahina@kahina-VPCEH3K1E.(none)>

Wed, 6 Jan 2016 03:08:30 +0000 (04:08 +0100)

committer Kahina <kahina@kahina-VPCEH3K1E.(none)>

Wed, 6 Jan 2016 03:13:40 +0000 (04:13 +0100)
author Kahina <kahina@kahina-VPCEH3K1E.(none)>
Wed, 6 Jan 2016 03:08:30 +0000 (04:08 +0100)
committer Kahina <kahina@kahina-VPCEH3K1E.(none)>
Wed, 6 Jan 2016 03:13:40 +0000 (04:13 +0100)
diff --git a/paper.tex b/paper.tex

index e4ae26a375d472222e979982da797116c451bc5f..41bdf69f57eaa29dda38df583b9cf727e28d1161 100644 (file)
--- a/paper.tex
+++ b/paper.tex
@@ -526,10 +526,10 @@ CUDA (Compute Unified Device Architecture) is a parallel computing architecture
  \label{sec3}
  
  \subsection{The EA method}
-A cubically convergent iteration method to find zeros of
-polynomials was proposed by O. Aberth~\cite{Aberth73}. The
-Ehrlich-Aberth (EA is short) method contains 4 main steps, presented in what
-follows.
+%A cubically convergent iteration method to find zeros of
+%polynomials was proposed by O. Aberth~\cite{Aberth73}. The
+%Ehrlich-Aberth (EA is short) method contains 4 main steps, presented in what
+%follows.
  
  %The Aberth method is a purely algebraic derivation. 
  %To illustrate the derivation, we let $w_{i}(z)$ be the product of linear factors 
@@ -555,59 +555,59 @@ follows.
  %Substituting $x_{j}$ for $z_{j}$ we obtain the Aberth iteration method.% 
  
  
-\subsubsection{Polynomials Initialization}
-The initialization of a polynomial $p(z)$ is done by setting each of the $n$ complex coefficients $a_{i}$:
+%\subsubsection{Polynomials Initialization}
+%The initialization of a polynomial $p(z)$ is done by setting each of the $n$ complex coefficients %$a_{i}$:
  
-\begin{equation}
-\label{eq:SimplePolynome}
-  p(z)=\sum{a_{i}z^{n-i}} , a_{n} \neq 0,a_{0}=1, a_{i}\subset C
-\end{equation}
+%\begin{equation}
+%\label{eq:SimplePolynome}
+%  p(z)=\sum{a_{i}z^{n-i}} , a_{n} \neq 0,a_{0}=1, a_{i}\subset C
+%\end{equation}
  
  
-\subsubsection{Vector $Z^{(0)}$ Initialization}
-\label{sec:vec_initialization}
-As for any iterative method, we need to choose $n$ initial guess points $z^{0}_{i}, i = 1, . . . , n.$
-The initial guess is very important since the number of steps needed by the iterative method to reach
-a given approximation strongly depends on it.
-In~\cite{Aberth73} the Ehrlich-Aberth iteration is started by selecting $n$
-equi-distant points on a circle of center 0 and radius r, where r is
-an upper bound to the moduli of the zeros. Later, Bini and al.~\cite{Bini96}
-performed this choice by selecting complex numbers along different
-circles which relies on the result of~\cite{Ostrowski41}.
+%\subsubsection{Vector $Z^{(0)}$ Initialization}
+%\label{sec:vec_initialization}
+%As for any iterative method, we need to choose $n$ initial guess points $z^{0}_{i}, i = 1, . . . , %n.$
+%The initial guess is very important since the number of steps needed by the iterative method to %reach
+%a given approximation strongly depends on it.
+%In~\cite{Aberth73} the Ehrlich-Aberth iteration is started by selecting $n$
+%equi-distant points on a circle of center 0 and radius r, where r is
+%an upper bound to the moduli of the zeros. Later, Bini and al.~\cite{Bini96}
+%performed this choice by selecting complex numbers along different
+%circles which relies on the result of~\cite{Ostrowski41}.
  
-\begin{equation}
-\label{eq:radiusR}
+%\begin{equation}
+%\label{eq:radiusR}
  %%\begin{align}
-\sigma_{0}=\frac{u+v}{2};u=\frac{\sum_{i=1}^{n}u_{i}}{n.max_{i=1}^{n}u_{i}};
-v=\frac{\sum_{i=0}^{n-1}v_{i}}{n.min_{i=0}^{n-1}v_{i}};\\
+%\sigma_{0}=\frac{u+v}{2};u=\frac{\sum_{i=1}^{n}u_{i}}{n.max_{i=1}^{n}u_{i}};
+%v=\frac{\sum_{i=0}^{n-1}v_{i}}{n.min_{i=0}^{n-1}v_{i}};\\
  %%\end{align}
-\end{equation}
-Where:
-\begin{equation}
-u_{i}=2.|a_{i}|^{\frac{1}{i}};
-v_{i}=\frac{|\frac{a_{n}}{a_{i}}|^{\frac{1}{n-i}}}{2}.
-\end{equation}
+%\end{equation}
+%Where:
+%\begin{equation}
+%u_{i}=2.|a_{i}|^{\frac{1}{i}};
+%v_{i}=\frac{|\frac{a_{n}}{a_{i}}|^{\frac{1}{n-i}}}{2}.
+%\end{equation}
  
-\subsubsection{Iterative Function}
-The operator used by the Aberth method  corresponds to the
-equation~\ref{Eq:EA1}, it enables the convergence towards
-the polynomials zeros, provided all the roots are distinct.
+%\subsubsection{Iterative Function}
+%The operator used by the Aberth method  corresponds to the
+%equation~\ref{Eq:EA1}, it enables the convergence towards
+%the polynomials zeros, provided all the roots are distinct.
  
  %Here we give a second form of the iterative function used by the Ehrlich-Aberth method: 
  
-\begin{equation}
-\label{Eq:EA1}
-EA: z^{k+1}_{i}=z_{i}^{k}-\frac{\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}}
-{1-\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}\sum_{j=1,j\neq i}^{j=n}{\frac{1}{(z_{i}^{k}-z_{j}^{k})}}}, i=1,. . . .,n
-\end{equation}
+%\begin{equation}
+%\label{Eq:EA1}
+%EA: z^{k+1}_{i}=z_{i}^{k}-\frac{\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}}
+%{1-\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}\sum_{j=1,j\neq i}^{j=n}{\frac{1}{(z_{i}^{k}-z_{j}^{k})}}}, %i=1,. . . .,n
+%\end{equation}
  
-\subsubsection{Convergence Condition}
-The convergence condition determines the termination of the algorithm. It consists in stopping the iterative function  when the roots are sufficiently stable. We consider that the method converges sufficiently when:
+%\subsubsection{Convergence Condition}
+%The convergence condition determines the termination of the algorithm. It consists in stopping the %iterative function  when the roots are sufficiently stable. We consider that the method converges %sufficiently when:
  
-\begin{equation}
-\label{eq:Aberth-Conv-Cond}
-\forall i \in [1,n];\vert\frac{z_{i}^{k}-z_{i}^{k-1}}{z_{i}^{k}}\vert<\xi
-\end{equation}
+%\begin{equation}
+%\label{eq:Aberth-Conv-Cond}
+%\forall i \in [1,n];\vert\frac{z_{i}^{k}-z_{i}^{k-1}}{z_{i}^{k}}\vert<\xi
+%\end{equation}
  
  
  %\begin{figure}[htbp]
@@ -617,10 +617,20 @@ The convergence condition determines the termination of the algorithm. It consis
  %\label{fig:03}
  %\end{figure}
  
-%the Ehrlich-Aberth method is an iterative  method, contain 4 steps, start from the initial approximations of all the
-%roots of the polynomial,the second step initialize the solution vector $Z$ using the Guggenheimer method to assure the distinction of the initial vector roots, than in step 3 we apply the the iterative function based on the Newton's method and Weiestrass operator[...,...], wich will make it possible to converge to the roots solution, provided that all the root are different. At the end of each application of the iterative function, a stop condition is verified consists in stopping the iterative process when the whole of the modules of the roots
-%are lower than a fixed value $ε$ 
+the Ehrlich-Aberth method is an iterative  method, contain 4 steps, start from the initial approximations of all the roots of the polynomial,the second step initialize the solution vector $Z$ using the Guggenheimer method to assure the distinction of the initial vector roots, than in step 3 we apply the the iterative function based on the Newton's method and Weiestrass operator~\cite{,}, wich will make it possible to converge to the roots solution, provided that all the root are different.
+
+\begin{equation}
+\label{Eq:EA1}
+EA: z^{k+1}_{i}=z_{i}^{k}-\frac{\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}}
+{1-\frac{p(z_{i}^{k})}{p'(z_{i}^{k})}\sum_{j=1,j\neq i}^{j=n}{\frac{1}{(z_{i}^{k}-z_{j}^{k})}}}, i=1,. . . .,n
+\end{equation}
+
+ At the end of each application of the iterative function, a stop condition is verified consists in stopping the iterative process when the whole of the modules of the roots are lower than a fixed value $\xi$ 
  
+\begin{equation}
+\label{eq:Aberth-Conv-Cond}
+\forall i \in [1,n];\vert\frac{z_{i}^{k}-z_{i}^{k-1}}{z_{i}^{k}}\vert<\xi
+\end{equation}
  
  \subsection{EA parallel implementation on CUDA}
  We introduced three paradigms of parallel programming. Our objective consists in implementing a root finding polynomial algorithm on multiple GPUs. To this end, it is primordial to know how to manage CUDA contexts of different GPUs. A direct method for controlling the various GPUs is to use as many threads or processes as GPU devices. We can choose the GPU index based on the identifier of OpenMP thread or the rank of the MPI process. Both approaches will be investigated.
@@ -845,7 +855,7 @@ The experiments shows the execution time of the EA algorithm, on a single GPU an
  
  Results with full polynomials show very important savings in execution time. For a polynomial of degree 1,4 million, the CUDA-OpenMP approach with 4 GPUs solves it 4 times as fast as single GPU, thus achieving a quasi-linear speedup.
  
-\subsection{Evaluting the Multi-GPU (CUDA-MPI) approach}
+\subsection{Evaluating the Multi-GPU (CUDA-MPI) approach}
  In this part we perform a set of experiments to compare Multi-GPU (CUDA MPI) approach with single GPU, for solving full and sparse polynomials of degrees ranging from 100,000 to 1,400,000.
  
  \subsubsection{Execution times in seconds of the Ehrlich-Aberth method for solving sparse polynomials on GPUs using distributed memory paradigm with MPI}
@@ -860,6 +870,7 @@ In this part we perform a set of experiments to compare Multi-GPU (CUDA MPI) app
  This figure shows 4 curves of execution time of EA algorithm, a curve with single GPU, 3 curves with multiple GPUs (2, 3, 4). We can clearly see that the curve with single GPU is above the other curves, which shows consumption in execution time compared to the Multi-GPU. We can see also that the CUDA-MPI approach reduces the execution time by a factor of 100 for polynomials of degree more than 1,000,000 whereas a single GPU is of the scale 1000.
  %%SIDER : Je n'ai pas reformuler car je n'ai pas compris la phrase, merci de l'ecrire ici en fran\cais.
  \\cette figure montre 4 courbes de temps d'exécution pour l'algorithme EA, une courbe avec un seul GPU, 3 courbes pour multiple GPUs(2, 3, 4), on peut constaté clairement que la courbe à un seul GPU est au-dessus des autres courbes, vue sa consomation en temps d'exècution. On peut voir aussi qu'avec l'approche Multi-GPU (CUDA-MPI) reduit le temps d'exècution jusqu'à l'echelle 100 pour le polynômes qui dépasse 1,000,000 tandis que Single GPU est de l'echelle 1000.
+
  \subsubsection{Execution times in seconds of the Ehrlich-Aberth method for solving full polynomials on GPUs using distributed memory paradigm with MPI}
  
  \begin{figure}[htbp]
@@ -876,42 +887,59 @@ This is due to the use of MPI parallel paradigm that divides the problem computa
  
  
  
+\subsection{Comparative  between (CUDA-OpenMP) approach and (CUDA-MPI) approach}
+In this part we present some experiment comparing the two Multi-GPU approach (OpenMP versus MPI) for solving sparse polynomial, full polynomials than we compare the execution time of the Ehrlich-Aberth method for solving sparse and full polynomials on GPUs with MPI and with OpenMP.
  
+\subsubsection{Comparison between MPI and OpenMP versions of the Ehrlich-Aberth method for solving sparse polynomials on GPUs}
+In this experiment we chose three polynomials of different size like (200K, 800K, 1,4M). We compare their execution time according to the number of the GPUs.
  \begin{figure}[htbp]
  \centering
   \includegraphics[angle=-90,width=0.5\textwidth]{Sparse}
-\caption{Comparaison between MPI and OpenMP versions of the Ehrlich-Aberth method for solving sparse plynomials on GPUs.}
+\caption{Comparison between MPI and OpenMP versions of the Ehrlich-Aberth method for solving sparse polynomials on GPUs.}
  \label{fig:05}
  \end{figure}
+in figure ~\ref{fig:05} we have two curves: MPI curve and OpenMP curve for each polynomials size. We can see that the results are similar between OpenMP curves and MPI curves for the polynomials size (200K, 1,4M), but there is a slight different between MPI curve and OpenMP curve for the polynomial of size 800K. ...
  
+\subsubsection{Comparison between MPI and OpenMP versions of the Ehrlich-Aberth method for solving full polynomials on GPUs}
  \begin{figure}[htbp]
  \centering
   \includegraphics[angle=-90,width=0.5\textwidth]{Full}
-\caption{Comparaison between MPI and OpenMP versions of the Ehrlich-Aberth method for solving full polynomials on GPUs.}
+\caption{Comparison between MPI and OpenMP versions of the Ehrlich-Aberth method for solving full polynomials on GPUs.}
  \label{fig:06}
  \end{figure}
+in figure ~\ref{fig:06}, we can see that the two paradigm MPI and OpenMP give the same result for solving full polynomials with EA algorithm.
+% size (200k,800K, 1,4M) are very similar for solving full polynomials with the EA algorithm.
  
+\subsubsection{Comparison of execution times of the Ehrlich-Aberth method for solving sparse and full polynomials on GPUs with distributed memory paradigm using MPI}
+in this experiment we compare the execution time of EA algorithm according to the number of the GPU for solving sparse and full polynomials on Multi-GPU using MPI. We chose three sparse and full polynomials of different size like (200K, 800K, 1,4M). 
  \begin{figure}[htbp]
  \centering
   \includegraphics[angle=-90,width=0.5\textwidth]{MPI}
-\caption{Comparaison of execution times of the Ehrlich-Aberth method for solving sparse and full polynomials on GPUs with distributed memory paradigm using MPI.}
+\caption{Comparison of execution times of the Ehrlich-Aberth method for solving sparse and full polynomials on GPUs with distributed memory paradigm using MPI.}
  \label{fig:07}
  \end{figure}
+in figure ~\ref{fig:07} we can see that (CUDA-MPI) can solve sparse and full polynomials of high degrees, the execution time with sparse polynomial are very low comparing to full polynomials. with sparse polynomials the number of monomial are reduce, consequently the number of operation are reduce than the execution time decrease. 
+
+\subsubsection{Comparison of execution times of the Ehrlich-Aberth method for solving sparse and full polynomials on GPUs with shared memory paradigm using OpenMP}
  
  \begin{figure}[htbp]
  \centering
   \includegraphics[angle=-90,width=0.5\textwidth]{OMP}
-\caption{Comparaison of execution times of the Ehrlich-Aberth method for solving sparse and full polynomials on GPUs with shared memory paradigm using OpenMP.}
+\caption{Comparison of execution times of the Ehrlich-Aberth method for solving sparse and full polynomials on GPUs with shared memory paradigm using OpenMP.}
  \label{fig:08}
  \end{figure}
  
+in figure ~\ref{fig:08}
+\subsection{Scalability of the EA method on Multi-GPU to solve very high polynomials degrees}
+ This experiment we report the execution time according to the degrees polynomials ranging from 1,000,000 to 5,000,000 for both approaches (cuda-OpenMP) and (CUDA-MPI) with sparse and full  polynomials. 
  \begin{figure}[htbp]
  \centering
   \includegraphics[angle=-90,width=0.5\textwidth]{big}
-\caption{Execution times in seconds of the Ehrlich-Aberth method for solving full polynomials of high degrees on 4 GPUs.}
+ \caption{Execution times in seconds of the Ehrlich-Aberth method for solving full polynomials of high degrees on 4 GPUs.}
  \label{fig:09}
  \end{figure}
-
+in figure ~\ref{fig:09} we can see that both (cuda-OpenMP) and (CUDA-MPI) approaches are scalable can solve very high polynomials degrees. with full polynomial the both approaches give very interesting ans similar results for polynomials of 5,000,000 degrees we not reach 30,000 s 
+%for sparse and full polynomials
  % An example of a floating figure using the graphicx package.
  % Note that \label must occur AFTER (or within) \caption.
  % For figures, \caption should occur after the \includegraphics.
@@ -1010,7 +1038,7 @@ This is due to the use of MPI parallel paradigm that divides the problem computa
  
  
  \section{Conclusion}
-\label{sec5}
+\label{sec6}
  In this paper, we have presented a parallel implementation of Ehrlich-Aberth algorithm for solving full and sparse polynomials, on single GPU with CUDA and on multiple GPUs using two parallel paradigms : shared memory with OpenMP and distributed memory with MPI. These architectures were addressed by a CUDA-OpenMP approach and CUDA-MPI approach, respectively. 
  The experiments show that, using parallel programming model like (OpenMP, MPI), we can efficiently manage multiple graphics cards to work together to solve the same problem and accelerate the parallel execution with 4 GPUs and solve a polynomial of degree 1,000,000, four times faster than on single GPU, that is a quasi-linear speedup.
author	Kahina <kahina@kahina-VPCEH3K1E.(none)>
	Wed, 6 Jan 2016 03:08:30 +0000 (04:08 +0100)
committer	Kahina <kahina@kahina-VPCEH3K1E.(none)>
	Wed, 6 Jan 2016 03:13:40 +0000 (04:13 +0100)