From e8280a0a25ef036dc912b6552f8d49a4b1083d44 Mon Sep 17 00:00:00 2001
From: couturie
Date: Wed, 4 Nov 2015 10:16:06 -0500
Subject: [PATCH 1/1] new

---
 paper.tex | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/paper.tex b/paper.tex
index f75c983..6a5bd46 100644
--- a/paper.tex
+++ b/paper.tex
@@ -740,9 +740,18 @@ and the GPU implementation.
 
 %We notice that the convergence precision is a round $10^{-7}$ for the both implementation on CPU and GPU. Consequently, we can conclude that Ehrlich-Aberth on GPU are faster and accurately then CPU implementation.
 
-\subsection{Influence of the number of threads on the execution times of different polynomials (sparse and full)}
-To optimize the performances of an algorithm on a GPU, it is necessary to maximize the use of cores GPU (maximize the number of threads executed in parallel) and to optimize the use of the various memoirs GPU. In fact, it is interesting to see the influence of the number of threads per block on the execution time of Ehrlich-Aberth algorithm.
-For that, we notice that the maximum number of threads per block for the Nvidia Tesla K40 GPU is 1024, so we varied the number of threads per block from 8 to 1024. We took into account the execution time for both sparse and full of 10 different polynomials of size 50,000 and 10 different polynomials of size 500,000 degrees.
+\subsection{Influence of the number of threads on the execution times
+ of different polynomials (sparse and full)}
+
+To optimize the performances of an algorithm on a GPU, it is necessary
+to maximize the use of the GPU cores. In fact, it is interesting to
+see the influence of the number of threads per block on the execution
+time of Ehrlich-Aberth algorithm. For that, we notice that the
+maximum number of threads per block for the Nvidia Tesla K40 GPU is
+1024. So the number of threads per block ranges from 8 to 1024. We
+took into account the execution time for both sparse and full of 10
+different polynomials of size 50,000 and 10 different polynomials of
+size 500,000 degrees.
 
 \begin{figure}[htbp]
 \centering
@@ -751,11 +760,17 @@ For that, we notice that the maximum number of threads per block for the Nvidia
 \label{fig:02}
 \end{figure}
 
-The figure 2 show that, the best execution time for both sparse and full polynomial are given when the threads number varies between 64 and 256 threads per bloc. We notice that with small polynomials the best number of threads per block is 64, Whereas, the large polynomials the best number of threads per block is 256. However,In the following experiments we specify that the number of thread by block is 256.
+Figure~\ref{fig:02} shows that, the best execution time for both
+sparse and full polynomial are given when the threads number varies
+between 64 and 256 threads per block. We notice that with small
+polynomials the best number of threads per block is 64, whereas the
+large polynomials the best number of threads per block is
+256. However, in the following experiments we specify that the number
+of threads per block is 256.
 
-\subsection{The impact of exp-log solution to compute very high degrees of polynomial}
+\subsection{Influence of exponential-logarithm solution to compute very high degrees polynomials}
 
-In this experiment we report the performance of log.exp solution describe in ~\ref{sec2} to compute very high degrees polynomials.
+In this experiment we report the performance of exp.log solution described in ~\ref{sec2} to compute very high degrees polynomials.
 \begin{figure}[htbp]
 \centering
 \includegraphics[width=0.8\textwidth]{figures/sparse_full_explog}
-- 
2.39.5