From: couturie <couturie@carcariass.(none)>
Date: Sat, 26 Nov 2011 20:46:01 +0000 (+0100)
Subject: new
X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/prng_gpu.git/commitdiff_plain/f8b9c6eb41cd165ad2c65d1a4571a41fdcd26ff5?hp=--cc

new
---

f8b9c6eb41cd165ad2c65d1a4571a41fdcd26ff5
diff --git a/mabase.bib b/mabase.bib
index 8162128..16ad9f6 100644
--- a/mabase.bib
+++ b/mabase.bib
@@ -4278,3 +4278,13 @@ booktitle =	"Proceedings of the {ACM}/{SIGDA} 17th International
  Note = {Version 4.0},
  }
 
+
+
+@Article{BBS,
+	author = 			 {Lenore Blum and Manuel Blum and Michael Shub},
+	title = 			 {A Simple Unpredictable Pseudo-Random Number Generator},
+	journal = 		 {SIAM Journal on Computing},
+	year = 				 {1986},
+	volume = 	 {15},
+	pages = 		 {364--383},
+}
diff --git a/prng_gpu.tex b/prng_gpu.tex
index d2e5e05..1129a07 100644
--- a/prng_gpu.tex
+++ b/prng_gpu.tex
@@ -900,7 +900,7 @@ tab1, tab2: Arrays containing permutations of size permutation\_size\;}
 
 \KwOut{NewNb: array containing random numbers in global memory}
 \If{threadId is concerned} {
-  retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory\;
+  retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory and x\;
   offset = threadIdx\%permutation\_size\;
   o1 = threadIdx-offset+tab1[offset]\;
   o2 = threadIdx-offset+tab2[offset]\;
@@ -943,6 +943,24 @@ Thus Algorithm~\ref{algo:gpu_kernel2} is a concrete realization of the general
 chaotic iterations presented previously, and for this reason, it satisfies the 
 Devaney's formulation of a chaotic behavior.
 
+\section{A cryptographically secure prng for GPU}
+
+It is  possible to build a  cryptographically secure prng based  on the previous
+algorithm (algorithm~\ref{algo:gpu_kernel2}).   It simply consists  in replacing
+the  {\it  xor-like} algorithm  by  another  cryptographically  secure prng.  In
+practice, we suggest  to use the BBS algorithm~\cite{BBS}  which takes the form:
+$$x_{n+1}=x_n^2~ mod~  M$$ where M  is the product  of the prime  numbers. Those
+prime numbers  need to be congruent  to 3 modulus  4. In practice, this  PRNG is
+known to  be slow and  not efficient for  the generation of random  numbers. For
+current  GPU   cards,  the  modulus   operation  is  the  most   time  consuming
+operation. So in  order to obtain quite reasonable  performances, it is required
+to use only modulus on 32  bits integer numbers. Consequently $x_n^2$ need to be
+less than  $2^{32}$ and the  number $M$  need to be  less than $2^{16}$.   So in
+pratice we can choose prime number around 256 that are congruent to 3 modulus 4.
+With 32 bits numbers,  only the 4 least significant bits of  $x_n$ can be chosen
+(the   maximum    number   of   undistinguishing   is   less    or   equals   to
+$log_2(log_2(x_n))$).
+
 \section{Experiments}
 \label{sec:experiments}
 
@@ -955,8 +973,8 @@ cards have 240 cores.
 In Figure~\ref{fig:time_gpu}  we compare the number of  random numbers generated
 per second. The xor-like prng  is a xor64 described in~\cite{Marsaglia2003}.  In
 order to obtain the optimal performance  we remove the storage of random numbers
-in the GPU memory. This step is time consumming and slows down the random number
-generation.  Moreover, if you are interested by applications that consome random
+in the GPU memory. This step is time consuming and slows down the random number
+generation.  Moreover, if you are interested by applications that consume random
 numbers  directly   when  they  are  generated,  their   storage  is  completely
 useless. In this  figure we can see  that when the number of  threads is greater
 than approximately 30,000 upto 5 millions the number of random numbers generated