From 96a949ea31a5f5a9aa66c6fbdb87faa473ad9ee5 Mon Sep 17 00:00:00 2001 From: couturie Date: Tue, 3 Nov 2015 15:52:25 -0500 Subject: [PATCH] suite --- paper.tex | 58 ++++++++++++++++++++++++++++--------------------------- 1 file changed, 30 insertions(+), 28 deletions(-) diff --git a/paper.tex b/paper.tex index d2016d1..9d26e36 100644 --- a/paper.tex +++ b/paper.tex @@ -627,17 +627,18 @@ The last kernel checks the convergence of the roots after each update of $Z^{(k)}$, according to formula Eq.~\ref{eq:Aberth-Conv-Cond}. We used the functions of the CUBLAS Library (CUDA Basic Linear Algebra Subroutines) to implement this kernel. The kernel terminates its computations when all the roots have -converged. Many important remarks should be noticed. First, as blocks -of threads are scheduled automatically by the GPU, we have absolutely -no control on the order of the blocks. Consequently, our algorithm is -executed more or less in an asynchronous iterations way, where blocks -of roots are updated in a non deterministic way. As the Durand-Kerner -method has been proved to convergence with asynchronous iterations, we -think it is similar with the Ehrlich-Aberth method, but we did not try -to prove this in that paper. Another consequence of that, is that -several executions of our algorithm with the same polynomials do no -give necessarily the same result with the same number of iterations -(even if the variation is not very significant). +converged. It should be noticed that, as blocks of threads are +scheduled automatically by the GPU, we have absolutely no control over +the order of the blocks. Consequently, our algorithm is executed more +or less in an asynchronous iteration model, where blocks of roots are +updated in a non-deterministic way. As the Durand-Kerner method has +been proved to converge with asynchronous iterations, we expect the +same to hold for the Ehrlich-Aberth method, but we do not try to prove +this in this paper. 
Another consequence is that several +executions of our algorithm with the same polynomial do not +necessarily give the same result (although the roots have the same +accuracy) or the same number of iterations (even if the variation is +not very significant). @@ -647,14 +648,14 @@ give necessarily the same result with the same number of iterations \section{Experimental study} \label{sec6} %\subsection{Definition of the used polynomials } -We study two categories of polynomials : the sparse polynomials and the full polynomials. -\paragraph{A sparse polynomial}:is a polynomial for which only some coefficients are not null. We use in the following polynomial for which the roots are distributed on 2 distinct circles : +We study two categories of polynomials: sparse polynomials and full polynomials.\\ +{\it A sparse polynomial} is a polynomial for which only some +coefficients are nonzero. In this paper, we consider sparse polynomials for which the roots are distributed on two distinct circles: \begin{equation} \forall \alpha_{1} \alpha_{2} \in C,\forall n_{1},n_{2} \in N^{*}; P(z)= (z^{n_{1}}-\alpha_{1})(z^{n_{2}}-\alpha_{2}) -\end{equation} - - -\paragraph{A full polynomial}:is in contrast, a polynomial for which all the coefficients are not null. the second form used to obtain a full polynomial is: +\end{equation}\noindent +{\it A full polynomial} is, in contrast, a polynomial for which +all the coefficients are nonzero. A full polynomial is defined by: %%\begin{equation} %%\forall \alpha_{i} \in C,\forall n_{i}\in N^{*}; P(z)= \sum^{n}_{i=1}(z^{n^{i}}.a_{i}) %%\end{equation} @@ -662,23 +663,24 @@ We study two categories of polynomials : the sparse polynomials and the full pol \begin{equation} {\Large \forall a_{i} \in C, i\in N; p(x)=\sum^{n}_{i=0} a_{i}.x^{i}} \end{equation} -With this form, we can have until \textit{n} non zero terms whereas the sparse ones have just two non zero terms. 
+%With this form, we can have until \textit{n} non zero terms whereas the sparse ones have just two non zero terms. %\subsection{The study condition} -The our experiences results concern two parameters which are -the polynomial degree and the execution time of our program -to converge on the solution. The polynomial degree allows us -to validate that our algorithm is powerful with high degree -polynomials. The execution time remains the -element-key which justifies our work of parallelization. - For our tests we used a CPU Intel(R) Xeon(R) CPU -E5620@2.40GHz and a GPU K40 (with 6 Go of ram). +%Two parameters are studied are +%the polynomial degree and the execution time of our program +%to converge on the solution. The polynomial degree allows us +%to validate that our algorithm is powerful with high degree +%polynomials. The execution time remains the +%element-key which justifies our work of parallelization. +For our tests, an Intel(R) Xeon(R) CPU +E5620 @ 2.40GHz and a Tesla K40 GPU (with 6 GB of RAM) are used. %\subsection{Comparative study} -In this section, we discuss the performance Ehrlich-Aberth method of root finding polynomials implemented on CPUs and on GPUs. +%First, performances of the Ehrlich-Aberth method of root finding polynomials +%implemented on CPUs and on GPUs are studied. -We performed a set of experiments on the sequential and the parallel algorithms, for both sparse and full polynomials and different sizes. We took into account the execution time, the polynomial size and the number of threads per block performed by sum or each experiment on CPUs and on GPUs. +We performed a set of experiments on the sequential and the parallel algorithms, for both sparse and full polynomials of different sizes. We took into account the execution times, the polynomial size and the number of threads per block used for each experiment on CPUs and on GPUs. 
All experimental results are obtained in double precision, with the convergence tolerance of the methods set to $10^{-7}$, since we are mainly interested in comparing the performance behaviors of the Ehrlich-Aberth and Durand-Kerner methods on CPUs versus on GPUs. The initialization values of the vector solution of the Ehrlich-Aberth method are given in Section~2.2. \subsection{The execution time in seconds of the Ehrlich-Aberth algorithm on CPU with OpenMP (1 core, 4 cores) vs. on a Tesla GPU} -- 2.39.5