X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/prng_gpu.git/blobdiff_plain/016edf5f7ca01fdf537b4abcb6de87d75c604510..a595bc795f31e05fc7fcc8415e1549bdda84b076:/prng_gpu.tex diff --git a/prng_gpu.tex b/prng_gpu.tex index 57526f2..730b620 100644 --- a/prng_gpu.tex +++ b/prng_gpu.tex @@ -8,6 +8,7 @@ \usepackage{moreverb} \usepackage{commath} \usepackage{algorithm2e} +\usepackage{listings} \usepackage[standard]{ntheorem} % Pour mathds : les ensembles IR, IN, etc. @@ -246,7 +247,7 @@ return $x$\; \end{algorithm} \begin{algorithm}[h!] -\SetAlgoLined +%\SetAlgoLined %%RAPH: cette ligne provoque une erreur chez moi \KwIn{the internal configuration $z$ (a 32-bit word)} \KwOut{$y$ (a 32-bit word)} $z\leftarrow{z\oplus{(z\ll13)}}$\; @@ -668,52 +669,121 @@ Indeed this result is weaker than the theorem establishing the chaos for the fin On parle du séquentiel avec des nombres 64 bits\\ -Faire le lien avec le paragraphe précédent (je considère que la stratégie s'appelle $S^i$\\ In order to implement efficiently a PRNG based on chaotic iterations it is possible to improve previous works [ref]. One solution consists in considering -that the strategy used $S^i$ contains all the bits for which the negation is +that the strategy used contains all the bits for which the negation is achieved out. Then instead of applying the negation on these bits we can simply -apply the xor operator between the current number and the strategy $S^i$. In +apply the xor operator between the current number and the strategy. In order to obtain the strategy we also use a classical PRNG. -\begin{figure}[htbp] -\begin{center} -\fbox{ -\begin{minipage}{14cm} -unsigned int CIprng() \{\\ - static unsigned int x = 123123123;\\ - unsigned long t1 = xorshift();\\ - unsigned long t2 = xor128();\\ - unsigned long t3 = xorwow();\\ - x = x\textasciicircum (unsigned int)t1;\\ - x = x\textasciicircum (unsigned int)(t2$>>$32);\\ - x = x\textasciicircum (unsigned int)(t3$>>$32);\\ - x = x\textasciicircum (unsigned int)t2;\\ - x = x\textasciicircum (unsigned int)(t1$>>$32);\\ - x = x\textasciicircum (unsigned int)t3;\\ - return x;\\ -\} -\end{minipage} +%% \begin{figure}[htbp] +%% \begin{center} +%% \fbox{ +%% \begin{minipage}{14cm} +%% unsigned int CIprng() \{\\ +%% static unsigned int x = 123123123;\\ +%% unsigned long t1 = xorshift();\\ +%% unsigned long t2 = xor128();\\ +%% unsigned long t3 = xorwow();\\ +%% x = x\textasciicircum (unsigned int)t1;\\ +%% x = x\textasciicircum (unsigned int)(t2$>>$32);\\ +%% x = x\textasciicircum (unsigned int)(t3$>>$32);\\ +%% x = x\textasciicircum (unsigned int)t2;\\ +%% x = x\textasciicircum (unsigned int)(t1$>>$32);\\ +%% x = x\textasciicircum (unsigned int)t3;\\ +%% return x;\\ +%% \} +%% \end{minipage} +%% } +%% \end{center} +%% \caption{sequential Chaotic Iteration PRNG} +%% \label{algo:seqCIprng} +%% \end{figure} + + + +\lstset{language=C,caption={C code of the sequential chaotic iterations based PRNG},label=algo:seqCIprng} +\begin{lstlisting} +unsigned int CIprng() { + static unsigned int x = 123123123; + unsigned long t1 = xorshift(); + unsigned long t2 = xor128(); + unsigned long t3 = xorwow(); + x = x^(unsigned int)t1; + x = x^(unsigned int)(t2>>32); + x = x^(unsigned int)(t3>>32); + x = x^(unsigned int)t2; + x = x^(unsigned int)(t1>>32); + x = x^(unsigned int)t3; + return x; } -\end{center} -\caption{sequential Chaotic Iteration PRNG} -\label{algo:seqCIprng} -\end{figure} +\end{lstlisting} + + + -In Figure~\ref{algo:seqCIprng} a sequential version of our chaotic iterations -based PRNG is presented. This version uses three classical 64 bits PRNG: the -\texttt{xorshift}, the \texttt{xor128} and the \texttt{xorwow}. These three -PRNGs are presented in~\cite{Marsaglia2003}. As each PRNG used works with -64-bits and as our PRNG works with 32 bits, the use of \texttt{(unsigned int)} -selects the 32 least significant bits whereas \texttt{(unsigned int)(t3$>>$32)} -selects the 32 most significants bits of the variable \texttt{t}. This version -sucesses the BigCrush of the TestU01 battery [P. L’ecuyer and + +In listing~\ref{algo:seqCIprng} a sequential version of our chaotic iterations +based PRNG is presented. This version uses three classical 64-bits PRNG: the +\texttt{xorshift}, the \texttt{xor128} and the \texttt{xorwow}. In the +following, we call them xor-like PRNGSs. These three PRNGs are presented +in~\cite{Marsaglia2003}. As each xor-like PRNG used works with 64-bits and as our PRNG +works with 32-bits, the use of \texttt{(unsigned int)} selects the 32 least +significant bits whereas \texttt{(unsigned int)(t3$>>$32)} selects the 32 most +significants bits of the variable \texttt{t}. So to produce a random number +realizes 6 xor operations with 6 32-bits numbers produced by 3 64-bits PRNG. +This version successes the BigCrush of the TestU01 battery [P. L’ecuyer and R. Simard. Testu01]. \section{Efficient prng based on chaotic iterations on GPU} -On parle du passage du sequentiel au GPU +In order to benefit from computing power of GPU, a program needs to define +independent blocks of threads which can be computed simultaneously. In general, +the larger the number of threads is, the more local memory is used and the less +branching instructions are used (if, while, ...), the better performance is +obtained on GPU. So with algorithm \ref{algo:seqCIprng} presented in the +previous section, it is possible to build a similar program which computes PRNG +on GPU. The principe consists in assigning the computation of a PRNG as in +sequential to each thread of the GPU. Of course, it is essential that the three +xor-like PRNGs used for our computation have different parameters. So we chose +them randomly with another PRNG. As the initialisation is performed by the CPU, +we have chosen to use the ISAAC PRNG to initalize all the parameters for the GPU +version of our PRNG. The implementation of the three xor-like PRNGs is +straightforward as soon as their parameters have been allocated in the GPU +memory. Each xor-like PRNGs used works with an internal number $x$ which keeps +the last generated random numbers. Other internal variables are also used by the +xor-like PRNGs. More precisely, the implementation of the xor128, the xorshift +and the xorwow respectively require 4, 5 and 6 unsigned long as internal +variables. + +\begin{algorithm} + +\KwIn{InternalVarXorLikeArray: array with internal variables of the 3 xor-like PRNGs in global memory\; +NumThreads: Number of threads\;} +\KwOut{NewNb: array containing random numbers in global memory} +\If{threadId is concerned} { + retrieve data from InternalVarXorLikeArray[threadId] in local variables\; + \For{i=1 to n} { + compute a new PRNG as in Listing\ref{algo:seqCIprng}\; + store the new PRNG in NewNb[NumThreads*threadId+i]\; + } + store internal variables in InternalVarXorLikeArray[threadId]\; +} + +\caption{main kernel for the chaotic iterations based PRNG GPU version} +\label{algo:gpu_kernel} +\end{algorithm} + +According to the available memory in the GPU and the number of threads used +simultenaously, the number of random numbers that a thread can generate inside a +kernel is limited, i.e. the variable \texttt{n} in +algorithm~\ref{algo:gpu_kernel}. For example, if $100,000$ threads are used and +if $n=100$\footnote{in fact, we need to add the initial seed (a 32-bits number)} +then the memory required to store internals variables of xor-like +PRNGs\footnote{we multiply this number by $2$ in order to count 32-bits numbers} +and random number of our PRNG is equals to $100,000\times ((4+5+6)\times +2+(1+100))=1,310,000$ 32-bits numbers, i.e. about $52$Mb. \section{Experiments}