From: couturie Date: Sat, 26 Nov 2011 20:46:01 +0000 (+0100) Subject: new X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/prng_gpu.git/commitdiff_plain/f8b9c6eb41cd165ad2c65d1a4571a41fdcd26ff5?ds=sidebyside;hp=--cc new --- f8b9c6eb41cd165ad2c65d1a4571a41fdcd26ff5 diff --git a/mabase.bib b/mabase.bib index 8162128..16ad9f6 100644 --- a/mabase.bib +++ b/mabase.bib @@ -4278,3 +4278,13 @@ booktitle = "Proceedings of the {ACM}/{SIGDA} 17th International Note = {Version 4.0}, } + + +@Article{BBS, + author = {Lenore Blum and Manuel Blum and Michael Shub}, + title = {A Simple Unpredictable Pseudo-Random Number Generator}, + journal = {SIAM Journal on Computing}, + year = {1986}, + volume = {15}, + pages = {364--383}, +} diff --git a/prng_gpu.tex b/prng_gpu.tex index d2e5e05..1129a07 100644 --- a/prng_gpu.tex +++ b/prng_gpu.tex @@ -900,7 +900,7 @@ tab1, tab2: Arrays containing permutations of size permutation\_size\;} \KwOut{NewNb: array containing random numbers in global memory} \If{threadId is concerned} { - retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory\; + retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory and x\; offset = threadIdx\%permutation\_size\; o1 = threadIdx-offset+tab1[offset]\; o2 = threadIdx-offset+tab2[offset]\; @@ -943,6 +943,24 @@ Thus Algorithm~\ref{algo:gpu_kernel2} is a concrete realization of the general chaotic iterations presented previously, and for this reason, it satisfies the Devaney's formulation of a chaotic behavior. +\section{A cryptographically secure prng for GPU} + +It is possible to build a cryptographically secure prng based on the previous +algorithm (algorithm~\ref{algo:gpu_kernel2}). It simply consists in replacing +the {\it xor-like} algorithm by another cryptographically secure prng. In +practice, we suggest to use the BBS algorithm~\cite{BBS} which takes the form: +$$x_{n+1}=x_n^2~ mod~ M$$ where M is the product of the prime numbers. Those +prime numbers need to be congruent to 3 modulus 4. In practice, this PRNG is +known to be slow and not efficient for the generation of random numbers. For +current GPU cards, the modulus operation is the most time consuming +operation. So in order to obtain quite reasonable performances, it is required +to use only modulus on 32 bits integer numbers. Consequently $x_n^2$ need to be +less than $2^{32}$ and the number $M$ need to be less than $2^{16}$. So in +pratice we can choose prime number around 256 that are congruent to 3 modulus 4. +With 32 bits numbers, only the 4 least significant bits of $x_n$ can be chosen +(the maximum number of undistinguishing is less or equals to +$log_2(log_2(x_n))$). + \section{Experiments} \label{sec:experiments} @@ -955,8 +973,8 @@ cards have 240 cores. In Figure~\ref{fig:time_gpu} we compare the number of random numbers generated per second. The xor-like prng is a xor64 described in~\cite{Marsaglia2003}. In order to obtain the optimal performance we remove the storage of random numbers -in the GPU memory. This step is time consumming and slows down the random number -generation. Moreover, if you are interested by applications that consome random +in the GPU memory. This step is time consuming and slows down the random number +generation. Moreover, if you are interested by applications that consume random numbers directly when they are generated, their storage is completely useless. In this figure we can see that when the number of threads is greater than approximately 30,000 upto 5 millions the number of random numbers generated