new

[prng_gpu.git] / prng_gpu.tex
diff --git a/prng_gpu.tex b/prng_gpu.tex

index 150e4342563c7ffba2d0d8086cd38847d623491c..1129a07e18b89155ea4519ec23c979dbe397df71 100644 (file)
--- a/prng_gpu.tex
+++ b/prng_gpu.tex
@@ -39,14 +39,14 @@ on GPU}
  \begin{document}
  
  \author{Jacques M. Bahi, Rapha\"{e}l Couturier, and Christophe
-Guyeux\thanks{Authors in alphabetic order}}
+Guyeux, Pierre-Cyrille Heam\thanks{Authors in alphabetic order}}
  
  \maketitle
  
  \begin{abstract}
-In this paper we present a new produce pseudo-random numbers generator (PRNG) on
+In this paper we present a new pseudo-random numbers generator (PRNG) on
  graphics processing units  (GPU). This PRNG is based  on chaotic iterations.  it
-is proven  to be chaotic  in the Devany's  formulation. We propose  an efficient
+is proven  to be chaotic  in the Devanay's  formulation. We propose  an efficient
  implementation  for  GPU which  succeeds  to  the  {\it BigCrush},  the  hardest
  batteries of test of TestU01.  Experimentations show that this PRNG can generate
  about 20 billions of random numbers  per second on Tesla C1060 and NVidia GTX280
@@ -59,7 +59,7 @@ cards.
  
  Random  numbers are  used in  many scientific  applications and  simulations. On
  finite  state machines,  as computers,  it is  not possible  to  generate random
-numbers but only pseudo-random numbers. In practice, a good pseudo-random number
+numbers but only pseudo-random numbers. In practice, a good pseudo-random numbers
  generator (PRNG) needs  to verify some features to be used  by scientists. It is
  important  to  be  able  to  generate  pseudo-random  numbers  efficiently,  the
  generation  needs to  be reproducible  and a  PRNG needs  to satisfy  many usual
@@ -83,9 +83,7 @@ also provide  an efficient  PRNG for  GPU respecting based  on IC.  Such devices
  allows us to generated almost 20 billions of random numbers per second.
  
  In order  to establish  that our  PRNGs are chaotic  according to  the Devaney's
-formulation, we  extend what we  have proposed in~\cite{guyeux10}.  Moreover, we
-define a  new distance to measure  the disorder in  the chaos and we  prove some
-interesting properties with this distance.
+formulation, we  extend what we  have proposed in~\cite{guyeux10}.
  
  The rest of this paper  is organised as follows. In Section~\ref{section:related
    works} we  review some GPU implementions  of PRNG.  Section~\ref{section:BASIC
@@ -95,10 +93,7 @@ is   studied.    Section~\ref{sec:efficient    prng}   presents   an   efficient
  implementation of  our chaotic PRNG  on a CPU.   Section~\ref{sec:efficient prng
    gpu}   describes   the  GPU   implementation   of   our   chaotic  PRNG.    In
  Section~\ref{sec:experiments}     some    experimentations     are    presented.
-Section~\ref{sec:de  la  relativité du  désordre}  describes  the relativity  of
-disorder.   In Section~\ref{sec: chaos  order topology}  the proof  that chaotic
-iterations   can  be   described   by   iterations  on   a   real  interval   is
-established. Finally, we give a conclusion and some perspectives.
+ Finally, we give a conclusion and some perspectives.
  
  
  
@@ -134,6 +129,12 @@ efficient one because it provides the fastest number of generated random numbers
  per joule. Concerning the GPU,  authors can generate betweend 11 and 16GSample/s
  with a GTX 280  GPU. The drawback of this work is  that those PRNGs only succeed
  the {\it Crush} test which is easier than the {\it Big Crush} test.
+
+Cuda  has developped  a  library for  the  generation of  random numbers  called
+Curand~\cite{curand11}.        Several       PRNGs        are       implemented:
+Xorwow~\cite{Marsaglia2003} and  some variants of Sobol. Some  tests report that
+the  fastest version provides  15GSample/s on  the new  Fermi C2050  card. Their
+PRNGs fail to succeed the whole tests of TestU01 on only one test.
  \newline
  \newline
  To the best of our knowledge no GPU implementation have been proven to have chaotic properties.
@@ -413,7 +414,7 @@ It takes as input: a function $f$;
  an integer $b$, ensuring that the number of executed iterations is at least $b$
  and at most $2b+1$; and an initial configuration $x^0$.
  It returns the new generated configuration $x$.  Internally, it embeds two
-\textit{XORshift}$(k)$ PRNGs \cite{Marsaglia2003} that returns integers
+\textit{XORshift}$(k)$ PRNGs~\cite{Marsaglia2003} that returns integers
  uniformly distributed
  into $\llbracket 1 ; k \rrbracket$.
  \textit{XORshift} is a category of very fast PRNGs designed by George Marsaglia,
@@ -860,6 +861,9 @@ and  random  number of  our  PRNG  is  equals to  $100,000\times  ((4+5+6)\times
  All the  tests performed  to pass the  BigCrush of TestU01  succeeded. Different
  number of threads, called \texttt{NumThreads} in our algorithm, have been tested
  upto $10$ millions.
+\newline
+\newline
+{\bf QUESTION : on laisse cette remarque, je suis mitigé !!!}
  
  \begin{remark}
  Algorithm~\ref{algo:gpu_kernel}  has  the  advantage to  manipulate  independent
@@ -896,7 +900,7 @@ tab1, tab2: Arrays containing permutations of size permutation\_size\;}
  
  \KwOut{NewNb: array containing random numbers in global memory}
  \If{threadId is concerned} {
-  retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory\;
+  retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory and x\;
    offset = threadIdx\%permutation\_size\;
    o1 = threadIdx-offset+tab1[offset]\;
    o2 = threadIdx-offset+tab2[offset]\;
@@ -939,6 +943,24 @@ Thus Algorithm~\ref{algo:gpu_kernel2} is a concrete realization of the general
  chaotic iterations presented previously, and for this reason, it satisfies the 
  Devaney's formulation of a chaotic behavior.
  
+\section{A cryptographically secure prng for GPU}
+
+It is  possible to build a  cryptographically secure prng based  on the previous
+algorithm (algorithm~\ref{algo:gpu_kernel2}).   It simply consists  in replacing
+the  {\it  xor-like} algorithm  by  another  cryptographically  secure prng.  In
+practice, we suggest  to use the BBS algorithm~\cite{BBS}  which takes the form:
+$$x_{n+1}=x_n^2~ mod~  M$$ where M  is the product  of the prime  numbers. Those
+prime numbers  need to be congruent  to 3 modulus  4. In practice, this  PRNG is
+known to  be slow and  not efficient for  the generation of random  numbers. For
+current  GPU   cards,  the  modulus   operation  is  the  most   time  consuming
+operation. So in  order to obtain quite reasonable  performances, it is required
+to use only modulus on 32  bits integer numbers. Consequently $x_n^2$ need to be
+less than  $2^{32}$ and the  number $M$  need to be  less than $2^{16}$.   So in
+pratice we can choose prime number around 256 that are congruent to 3 modulus 4.
+With 32 bits numbers,  only the 4 least significant bits of  $x_n$ can be chosen
+(the   maximum    number   of   undistinguishing   is   less    or   equals   to
+$log_2(log_2(x_n))$).
+
  \section{Experiments}
  \label{sec:experiments}
  
@@ -951,8 +973,8 @@ cards have 240 cores.
  In Figure~\ref{fig:time_gpu}  we compare the number of  random numbers generated
  per second. The xor-like prng  is a xor64 described in~\cite{Marsaglia2003}.  In
  order to obtain the optimal performance  we remove the storage of random numbers
-in the GPU memory. This step is time consumming and slows down the random number
-generation.  Moreover, if you are interested by applications that consome random
+in the GPU memory. This step is time consuming and slows down the random number
+generation.  Moreover, if you are interested by applications that consume random
  numbers  directly   when  they  are  generated,  their   storage  is  completely
  useless. In this  figure we can see  that when the number of  threads is greater
  than approximately 30,000 upto 5 millions the number of random numbers generated
@@ -978,10 +1000,10 @@ In  comparison,   Listing~\ref{algo:seqCIprng}  allows  us   to  generate  about
  
  
  
-\section{Cryptanalysis of the Proposed PRNG}
+%% \section{Cryptanalysis of the Proposed PRNG}
  
  
-Mettre ici la preuve de PCH
+%% Mettre ici la preuve de PCH
  
  %\section{The relativity of disorder}
  %\label{sec:de la relativité du désordre}
@@ -1513,8 +1535,132 @@ Mettre ici la preuve de PCH
  
  
  
+\section{Security Analysis}
+
+
+
+
+In this section the concatenation of two strings $u$ and $v$ is classically
+denoted by $uv$.
+In a cryptographic context, a pseudo-random generator is a deterministic
+algorithm $G$ transforming strings  into strings and such that, for any
+seed $w$ of length $N$, $G(w)$ (the output of $G$ on the input $w$) has size
+$\ell_G(N)$ with $\ell_G(N)>N$.
+The notion of {\it secure} PRNGs can now be defined as follows. 
+
+\begin{definition}
+A cryptographic PRNG $G$ is secure if for any probabilistic polynomial time
+algorithm $D$, for any positive polynomial $p$, and for all sufficiently
+large $k$'s,
+$$| \mathrm{Pr}[D(G(U_k))=1]-Pr[D(U_{\ell_G(k)}=1]|< \frac{1}{p(N)},$$
+where $U_r$ is the uniform distribution over $\{0,1\}^r$ and the
+probabilities are taken over $U_N$, $U_{\ell_G(N)}$ as well as over the
+internal coin tosses of $D$. 
+\end{definition}
+
+Intuitively, it means that there is no polynomial time algorithm that can
+distinguish a perfect uniform random generator from $G$ with a non
+negligible probability. The interested reader is referred
+to~\cite[chapter~3]{Goldreich} for more information. Note that it is
+quite easily possible to change the function $\ell$ into any polynomial
+function $\ell^\prime$ satisfying $\ell^\prime(N)>N)$~\cite[Chapter 3.3]{Goldreich}.
+
+The generation schema developed in (\ref{equation Oplus}) is based on a
+pseudo-random generator. Let $H$ be a cryptographic PRNG. We may assume,
+without loss of generality, that for any string $S_0$ of size $N$, the size
+of $H(S_0)$ is $kN$, with $k>2$. It means that $\ell_H(N)=kN$. 
+Let $S_1,\ldots,S_k$ be the 
+strings of length $N$ such that $H(S_0)=S_1 \ldots S_k$ ($H(S_0)$ is the concatenation of
+the $S_i$'s). The cryptographic PRNG $X$ defined in (\ref{equation Oplus})
+is the algorithm mapping any string of length $2N$ $x_0S_0$ into the string
+$(x_0\oplus S_0 \oplus S_1)(x_0\oplus S_0 \oplus S_1\oplus S_2)\ldots
+(x_o\bigoplus_{i=0}^{i=k}S_i)$. Particularly one has $\ell_{X}(2N)=kN=\ell_H(N)$. 
+We claim now that if this PRNG is secure,
+then the new one is secure too.
+
+\begin{proposition}
+If $H$ is a secure cryptographic PRNG, then $X$ is a secure cryptographic
+PRNG too.
+\end{proposition}
+
+\begin{proof}
+The proposition is proved by contraposition. Assume that $X$ is not
+secure. By Definition, there exists a polynomial time probabilistic
+algorithm $D$, a positive polynomial $p$, such that for all $k_0$ there exists
+$N\geq \frac{k_0}{2}$ satisfying 
+$$| \mathrm{Pr}[D(X(U_{2N}))=1]-\mathrm{Pr}[D(U_{kN}=1]|\geq \frac{1}{p(2N)}.$$
+We describe a new probabilistic algorithm $D^\prime$ on an input $w$ of size
+$kN$:
+\begin{enumerate}
+\item Decompose $w$ into $w=w_1\ldots w_{k}$, where each $w_i$ has size $N$.
+\item Pick a string $y$ of size $N$ uniformly at random.
+\item Compute $z=(y\oplus w_1)(y\oplus w_1\oplus w_2)\ldots (y
+  \bigoplus_{i=1}^{i=k} w_i).$
+\item Return $D(z)$.
+\end{enumerate}
+
+
+Consider  for each $y\in \mathbb{B}^{kN}$ the function $\varphi_{y}$
+from $\mathbb{B}^{kN}$ into $\mathbb{B}^{kN}$ mapping $w=w_1\ldots w_k$
+(each $w_i$ has length $N$) to 
+$(y\oplus w_1)(y\oplus w_1\oplus w_2)\ldots (y
+  \bigoplus_{i=1}^{i=k_1} w_i).$ By construction, one has for every $w$,
+\begin{equation}\label{PCH-1}
+D^\prime(w)=D(\varphi_y(w)),
+\end{equation}
+where $y$ is randomly generated. 
+Moreover, for each $y$, $\varphi_{y}$ is injective: if 
+$(y\oplus w_1)(y\oplus w_1\oplus w_2)\ldots (y\bigoplus_{i=1}^{i=k_1}
+w_i)=(y\oplus w_1^\prime)(y\oplus w_1^\prime\oplus w_2^\prime)\ldots
+(y\bigoplus_{i=1}^{i=k} w_i^\prime)$, then for every $1\leq j\leq k$,
+$y\bigoplus_{i=1}^{i=j} w_i^\prime=y\bigoplus_{i=1}^{i=j} w_i$. It follows,
+by a direct induction, that $w_i=w_i^\prime$. Furthermore, since $\mathbb{B}^{kN}$
+is finite, each $\varphi_y$ is bijective. Therefore, and using (\ref{PCH-1}),
+one has
+\begin{equation}\label{PCH-2}
+\mathrm{Pr}[D^\prime(U_{kN})=1]=\mathrm{Pr}[D(\varphi_y(U_{kN}))=1]=\mathrm{Pr}[D(U_{kN})=1].
+\end{equation}
+
+Now, using (\ref{PCH-1}) again, one has  for every $x$,
+\begin{equation}\label{PCH-3}
+D^\prime(H(x))=D(\varphi_y(H(x))),
+\end{equation}
+where $y$ is randomly generated. By construction, $\varphi_y(H(x))=X(yx)$,
+thus
+\begin{equation}\label{PCH-3}
+D^\prime(H(x))=D(yx),
+\end{equation}
+where $y$ is randomly generated. 
+It follows that 
+
+\begin{equation}\label{PCH-4}
+\mathrm{Pr}[D^\prime(H(U_{N}))=1]=\mathrm{Pr}[D(U_{2N})=1].
+\end{equation}
+ From (\ref{PCH-2}) and (\ref{PCH-4}), one can deduce that
+there exist a polynomial time probabilistic
+algorithm $D^\prime$, a positive polynomial $p$, such that for all $k_0$ there exists
+$N\geq \frac{k_0}{2}$ satisfying 
+$$| \mathrm{Pr}[D(H(U_{N}))=1]-\mathrm{Pr}[D(U_{kN}=1]|\geq \frac{1}{p(2N)},$$
+proving that $H$ is not secure, a contradiction. 
+\end{proof}
+
+
+
+
  
  \section{Conclusion}
+
+
+In  this  paper  we have  presented  a  new  class  of  PRNGs based  on  chaotic
+iterations. We have proven that these PRNGs are chaotic in the sense of Devenay. 
+
+An efficient implementation on GPU allows us to generate a huge number of pseudo
+random numbers  per second  (about 20Gsample/s). Our  PRNGs succeed to  pass the
+hardest batteries of test (TestU01).
+
+In future  work we plan  to extend our  work in order to  have cryptographically
+secure PRNGs because in some situations this property may be important.
+
  \bibliographystyle{plain}
  \bibliography{mabase}
  \end{document}