bbs

[prng_gpu.git] / prng_gpu.tex
diff --git a/prng_gpu.tex b/prng_gpu.tex

index d2e5e0547d736054ce547ef49dd39e7b5f0c1a3c..6e063fd1f07d19932c823816b8961c14b2868580 100644 (file)
--- a/prng_gpu.tex
+++ b/prng_gpu.tex
@@ -900,7 +900,7 @@ tab1, tab2: Arrays containing permutations of size permutation\_size\;}
  
  \KwOut{NewNb: array containing random numbers in global memory}
  \If{threadId is concerned} {
  
  \KwOut{NewNb: array containing random numbers in global memory}
  \If{threadId is concerned} {
-  retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory\;
+  retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory and x\;
    offset = threadIdx\%permutation\_size\;
    o1 = threadIdx-offset+tab1[offset]\;
    o2 = threadIdx-offset+tab2[offset]\;
    offset = threadIdx\%permutation\_size\;
    o1 = threadIdx-offset+tab1[offset]\;
    o2 = threadIdx-offset+tab2[offset]\;
@@ -943,6 +943,27 @@ Thus Algorithm~\ref{algo:gpu_kernel2} is a concrete realization of the general
  chaotic iterations presented previously, and for this reason, it satisfies the 
  Devaney's formulation of a chaotic behavior.
  
  chaotic iterations presented previously, and for this reason, it satisfies the 
  Devaney's formulation of a chaotic behavior.
  
+\section{A cryptographically secure prng for GPU}
+
+It is  possible to build a  cryptographically secure prng based  on the previous
+algorithm (algorithm~\ref{algo:gpu_kernel2}).   It simply consists  in replacing
+the  {\it  xor-like} algorithm  by  another  cryptographically  secure prng.  In
+practice, we suggest  to use the BBS algorithm~\cite{BBS}  which takes the form:
+$$x_{n+1}=x_n^2~ mod~ M$$  where $M$ is the product of  two prime numbers. Those
+prime numbers  need to be congruent  to 3 modulus  4. In practice, this  PRNG is
+known to  be slow and  not efficient for  the generation of random  numbers. For
+current  GPU   cards,  the  modulus   operation  is  the  most   time  consuming
+operation. So in  order to obtain quite reasonable  performances, it is required
+to use only modulus on 32  bits integer numbers. Consequently $x_n^2$ need to be
+less than  $2^{32}$ and the  number $M$  need to be  less than $2^{16}$.   So in
+pratice we can  choose prime numbers around 256 that are  congruent to 3 modulus
+4.  With  32 bits numbers,  only the  4 least significant  bits of $x_n$  can be
+chosen  (the   maximum  number  of   undistinguishing  is  less  or   equals  to
+$log_2(log_2(x_n))$). So  to generate a 32 bits  number, we need to  use 8 times
+the BBS algorithm, with different combinations of $M$ is required.
+
+Currently this PRNG does not succeed to pass all the tests of TestU01.
+
  \section{Experiments}
  \label{sec:experiments}
  
  \section{Experiments}
  \label{sec:experiments}
  
@@ -952,12 +973,14 @@ Intel  Xeon E5530 cadenced  at 2.40  GHz for  our experiments  and we  have used
  another one  equipped with  a less performant  CPU and  a GeForce GTX  280. Both
  cards have 240 cores.
  
  another one  equipped with  a less performant  CPU and  a GeForce GTX  280. Both
  cards have 240 cores.
  
-In Figure~\ref{fig:time_gpu}  we compare the number of  random numbers generated
-per second. The xor-like prng  is a xor64 described in~\cite{Marsaglia2003}.  In
-order to obtain the optimal performance  we remove the storage of random numbers
-in the GPU memory. This step is time consumming and slows down the random number
-generation.  Moreover, if you are interested by applications that consome random
-numbers  directly   when  they  are  generated,  their   storage  is  completely
+In  Figure~\ref{fig:time_xorlike_gpu} we  compare the  number of  random numbers
+generated per second with the xor-like based PRNG. In this figure, the optimized
+version use the {\it xor64} described in~\cite{Marsaglia2003}. The naive version
+use  the three  xor-like  PRNGs described  in Listing~\ref{algo:seqCIprng}.   In
+order to obtain the optimal performance we removed the storage of random numbers
+in the GPU memory. This step is time consuming and slows down the random numbers
+generation.  Moreover, if one is  interested by applications that consume random
+numbers  directly   when  they  are  generated,  their   storage  are  completely
  useless. In this  figure we can see  that when the number of  threads is greater
  than approximately 30,000 upto 5 millions the number of random numbers generated
  per second  is almost constant.  With the  naive version, it is  between 2.5 and
  useless. In this  figure we can see  that when the number of  threads is greater
  than approximately 30,000 upto 5 millions the number of random numbers generated
  per second  is almost constant.  With the  naive version, it is  between 2.5 and
@@ -968,10 +991,10 @@ should be of better quality.
  
  \begin{figure}[htbp]
  \begin{center}
  
  \begin{figure}[htbp]
  \begin{center}
-  \includegraphics[scale=.7]{curve_time_gpu.pdf}
+  \includegraphics[scale=.7]{curve_time_xorlike_gpu.pdf}
  \end{center}
  \end{center}
-\caption{Number of random numbers generated per second}
-\label{fig:time_gpu}
+\caption{Number of random numbers generated per second with the xorlike based PRNG}
+\label{fig:time_xorlike_gpu}
  \end{figure}
  
  
  \end{figure}
  
  
@@ -979,7 +1002,22 @@ In  comparison,   Listing~\ref{algo:seqCIprng}  allows  us   to  generate  about
  138MSample/s with only one core of the Xeon E5530.
  
  
  138MSample/s with only one core of the Xeon E5530.
  
  
+In Figure~\ref{fig:time_bbs_gpu}  we highlight the performance  of the optimized
+BBS based  PRNG on GPU. Performances are  less important. On the  Tesla C1060 we
+obtain approximately 1.8GSample/s and on the GTX 280 about 1.6GSample/s.
+
+\begin{figure}[htbp]
+\begin{center}
+  \includegraphics[scale=.7]{curve_time_bbs_gpu.pdf}
+\end{center}
+\caption{Number of random numbers generated per second with the BBS based PRNG}
+\label{fig:time_bbs_gpu}
+\end{figure}
  
  
+Both  these  experimentations allows  us  to conclude  that  it  is possible  to
+generate a  huge number of pseudo-random  numbers with the  xor-like version and
+about tens  times less with the BBS  based version. The former  version has only
+chaotic properties whereas the latter also has cryptographically properties.
  
  
  %% \section{Cryptanalysis of the Proposed PRNG}
  
  
  %% \section{Cryptanalysis of the Proposed PRNG}
@@ -1634,15 +1672,19 @@ proving that $H$ is not secure, a contradiction.
  
  
  In  this  paper  we have  presented  a  new  class  of  PRNGs based  on  chaotic
  
  
  In  this  paper  we have  presented  a  new  class  of  PRNGs based  on  chaotic
-iterations. We have proven that these PRNGs are chaotic in the sense of Devenay. 
+iterations. We have proven that these PRNGs are chaotic in the sense of Devenay.
+We also propose a PRNG cryptographically secure and its implementation on GPU.
+
+An  efficient implementation  on  GPU based  on  a xor-like  PRNG  allows us  to
+generate   a  huge   number   of  pseudo-random   numbers   per  second   (about
+20Gsample/s). This PRNG succeeds to pass the hardest batteries of TestU01.
+
+In future  work we plan to  extend this work  for parallel PRNG for  clusters or
+grid computing. We also plan to improve  the BBS version in order to succeed all
+the tests of TestU01.
  
  
-An efficient implementation on GPU allows us to generate a huge number of pseudo
-random numbers  per second  (about 20Gsample/s). Our  PRNGs succeed to  pass the
-hardest batteries of test (TestU01).
  
  
-In future  work we plan  to extend our  work in order to  have cryptographically
-secure PRNGs because in some situations this property may be important.
  
  
-\bibliographystyle{plain}
+\bibliographystyle{plain} 
  \bibliography{mabase}
  \end{document}
  \bibliography{mabase}
  \end{document}