From: Kahina
Date: Thu, 5 Nov 2015 12:05:02 +0000 (+0100)
Subject: new
X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/kahina_paper1.git/commitdiff_plain/10844d40f853b6fad58b5f366618e3b2aec1066c?ds=inline

new
---

diff --git a/paper.tex b/paper.tex
index 493bb39..8f03506 100644
--- a/paper.tex
+++ b/paper.tex
@@ -363,11 +363,13 @@ Q(z^{k}_{i})=\exp\left( \ln (p(z^{k}_{i}))-\ln(p'(z^{k}_{i}))+\ln \left(
 \sum_{i\neq j}^{n}\frac{1}{z^{k}_{i}-z^{k}_{j}}\right)\right)i=1,...,n,
 \end{equation}
 
-This solution is applied when the root except the circle unit, represented by the radius $R$ evaluated in C language as:
-
+This solution is applied when the root except the circle unit, represented by the radius $R$ evaluated in C language as :
+\label{R.EL}
+\begin{center}
 \begin{verbatim}
 R = exp(log(DBL_MAX)/(2*n) );
-\end{verbatim}
+\end{verbatim} 
+\end{center}
 
 %\begin{equation}
 
@@ -581,8 +583,9 @@ quickly because, just as any Jacobi algorithm (for solving linear systems of equ
 
 %In CUDA programming, all the instructions of the \verb=for= loop are executed by the GPU as a kernel. A kernel is a function written in CUDA and defined by the \verb=__global__= qualifier added before a usual \verb=C= function, which instructs the compiler to generate appropriate code to pass it to the CUDA runtime in order to be executed on the GPU.
 
-Algorithm~\ref{alg2-cuda} shows a sketch of the Ehrlich-Aberth algorithm using CUDA.
+Algorithm~\ref{alg2-cuda} shows steps of the Ehrlich-Aberth algorithm using CUDA.
 
+\begin{enumerate}
 \begin{algorithm}[H]
 \label{alg2-cuda}
 %\LinesNumbered
@@ -595,21 +598,22 @@ Algorithm~\ref{alg2-cuda} shows a sketch of the Ehrlich-Aberth algorithm using C
 
 \BlankLine
 
-Initialization of the of P\;
-Initialization of the of Pu\;
-Initialization of the solution vector $Z^{0}$\;
-Allocate and copy initial data to the GPU global memory\;
-k=0\;
+\item Initialization of the of P\;
+\item Initialization of the of Pu\;
+\item Initialization of the solution vector $Z^{0}$\;
+\item Allocate and copy initial data to the GPU global memory\;
+\item k=0\;
 \While {$\Delta z_{max} > \epsilon$}{
- Let $\Delta z_{max}=0$\;
-$ kernel\_save(ZPrec,Z)$\;
-k=k+1\;
-$ kernel\_update(Z,P,Pu)$\;
-$kernel\_testConverge(\Delta z_{max},Z,ZPrec)$\;
+\item Let $\Delta z_{max}=0$\;
+\item $ kernel\_save(ZPrec,Z)$\;
+\item k=k+1\;
+\item $ kernel\_update(Z,P,Pu)$\;
+\item $kernel\_testConverge(\Delta z_{max},Z,ZPrec)$\;
 
 }
-Copy results from GPU memory to CPU memory\;
+\item Copy results from GPU memory to CPU memory\;
 \end{algorithm}
+\end{enumerate}
 ~\\
 
 After the initialization step, all data of the root finding problem
@@ -622,8 +626,8 @@ polynomial's root found at the previous time-step in GPU memory, in
 order to check the convergence of the roots after each iteration (line
 8, Algorithm~\ref{alg2-cuda}).
 
-The second kernel executes the iterative function $H$ and updates
-Z, according to Algorithm~\ref{alg3-update}. We notice that the
+The second kernel executes the iterative function and updates
+$Z$, according to Algorithm~\ref{alg3-update}. We notice that the
 update kernel is called in two forms, according to the value
 \emph{R} which determines the radius beyond which we apply the
 exponential logarithm algorithm.
@@ -644,7 +648,7 @@ The first form executes formula the EA function Eq.~\ref{Eq:Hi} if the modulus
 of the current complex is less than the a certain value called the radius i.e.
 ($ |z^{k}_{i}|<= R$), else the kernel executes the EA.EL
 function Eq.~\ref{Log_H2}
-(with Eq.~\ref{deflncomplex}, Eq.~\ref{defexpcomplex}). The radius $R$ is evaluated as :
+(with Eq.~\ref{deflncomplex}, Eq.~\ref{defexpcomplex}). The radius $R$ is evaluated as in ~\ref{R.EL} :
 
 $$R = \exp( \log(DBL\_MAX) / (2*n) )$$
 where $DBL\_MAX$ stands for the maximum representable double value.