Relecture

[prng_gpu.git] / prng_gpu.tex
diff --git a/prng_gpu.tex b/prng_gpu.tex

index 84efafa5443f23f4504a723320ec63e7c2f36b49..dc3d3cb8c84b23b9e2ec02267ca2598466104977 100644 (file)
--- a/prng_gpu.tex
+++ b/prng_gpu.tex
@@ -433,10 +433,10 @@ during implementations (due to the discrete nature of $f$). It is as if
  $\mathds{B}^\mathsf{N}$ represents the memory of the computer whereas $\llbracket 1 ;  \mathsf{N}
  \rrbracket^{\mathds{N}}$ is its input stream (the seeds, for instance, in PRNG, or a physical noise in TRNG).
  
  $\mathds{B}^\mathsf{N}$ represents the memory of the computer whereas $\llbracket 1 ;  \mathsf{N}
  \rrbracket^{\mathds{N}}$ is its input stream (the seeds, for instance, in PRNG, or a physical noise in TRNG).
  
-\section{Application to pseudorandomness}
+\section{Application to Pseudorandomness}
  \label{sec:pseudorandom}
  
  \label{sec:pseudorandom}
  
-\subsection{A First pseudorandom Number Generator}
+\subsection{A First Pseudorandom Number Generator}
  
  We have proposed in~\cite{bgw09:ip} a new family of generators that receives 
  two PRNGs as inputs. These two generators are mixed with chaotic iterations, 
  
  We have proposed in~\cite{bgw09:ip} a new family of generators that receives 
  two PRNGs as inputs. These two generators are mixed with chaotic iterations, 
@@ -807,17 +807,26 @@ have $d((S,E),(\tilde S,E))<\epsilon$.
  \section{Efficient PRNG based on Chaotic Iterations}
  \label{sec:efficient prng}
  
  \section{Efficient PRNG based on Chaotic Iterations}
  \label{sec:efficient prng}
  
-In  order to  implement efficiently  a PRNG  based on  chaotic iterations  it is
-possible to improve  previous works [ref]. One solution  consists in considering
-that the  strategy used contains all the  bits for which the  negation is
-achieved out. Then in order to apply  the negation on these bits we can simply
-apply the  xor operator between  the current number  and the strategy. In
-order to obtain the strategy we also use a classical PRNG.
+Based on the proof presented in the previous section, it is now possible to 
+improve the speed of the generator formerly presented in~\cite{bgw09:ip,guyeux10}. 
+The first idea is to consider
+that the provided strategy is a pseudorandom Boolean vector obtained by a
+given PRNG.
+An iteration of the system is simply the bitwise exclusive or between
+the last computed state and the current strategy.
+Topological properties of disorder exhibited by chaotic 
+iterations can be inherited by the inputted generator, hoping by doing so to 
+obtain some statistical improvements while preserving speed.
  
  
-Here  is an  example with  16-bits numbers  showing how  the bitwise  operations
+
+Let us give an example using 16-bits numbers, to clearly understand how the bitwise xor operations
  are
  are
-applied.  Suppose  that $x$ and the  strategy $S^i$ are defined  in binary mode.
-Then the following table shows the result of $x$ xor $S^i$.
+done.  
+Suppose  that $x$ and the  strategy $S^i$ are given as
+binary vectors.
+Table~\ref{TableExemple} shows the result of $x \oplus S^i$.
+
+\begin{table}
  $$
  \begin{array}{|cc|cccccccccccccccc|}
  \hline
  $$
  \begin{array}{|cc|cccccccccccccccc|}
  \hline
@@ -831,13 +840,13 @@ x \oplus S^i&=&1&1&0&1&1&1&0&0&0&1&1&1&0&1&0&1\\
  \hline
   \end{array}
  $$
  \hline
   \end{array}
  $$
+\caption{Example of an arbitrary round of the proposed generator}
+\label{TableExemple}
+\end{table}
  
  
  
  
  
  
-
-
-\lstset{language=C,caption={C code of the sequential chaotic iterations based
-PRNG},label=algo:seqCIprng}
+\lstset{language=C,caption={C code of the sequential PRNG based on chaotic iterations},label=algo:seqCIprng}
  \begin{lstlisting}
  unsigned int CIprng() {
    static unsigned int x = 123123123;
  \begin{lstlisting}
  unsigned int CIprng() {
    static unsigned int x = 123123123;
@@ -858,52 +867,60 @@ unsigned int CIprng() {
  
  
  
  
  
  
-In listing~\ref{algo:seqCIprng}  a sequential version of  our chaotic iterations
-based PRNG is  presented.  The xor operator is  represented by \textasciicircum.
-This  function uses  three classical  64-bits PRNG:  the  \texttt{xorshift}, the
-\texttt{xor128}  and  the  \texttt{xorwow}.   In  the following,  we  call  them
-xor-like PRNGSs.   These three PRNGs are  presented in~\cite{Marsaglia2003}.  As
-each xor-like PRNG  used works with 64-bits and as our  PRNG works with 32-bits,
-the use of \texttt{(unsigned int)} selects the 32 least significant bits whereas
-\texttt{(unsigned int)(t3$>>$32)}  selects the 32 most significants  bits of the
-variable \texttt{t}.   So to produce a  random number realizes  6 xor operations
-with 6 32-bits  numbers produced by 3 64-bits PRNG.   This version successes the
-BigCrush of the TestU01 battery~\cite{LEcuyerS07}.
+In Listing~\ref{algo:seqCIprng}  a sequential version of  the proposed PRNG based on chaotic iterations
+ is  presented.  The xor operator is  represented by \textasciicircum.
+This  function uses  three classical  64-bits PRNGs, namely the  \texttt{xorshift}, the
+\texttt{xor128},  and  the  \texttt{xorwow}~\cite{Marsaglia2003}.   In  the following,  we  call  them
+``xor-like PRNGs''. 
+As
+each xor-like PRNG  uses 64-bits whereas our proposed generator works with 32-bits,
+we use the command \texttt{(unsigned int)}, that selects the 32 least significant bits of a given integer, and the code
+\texttt{(unsigned int)(t3$>>$32)}  in order to obtain the 32 most significant  bits of \texttt{t}.   
  
  
-\section{Efficient PRNGs based on chaotic iterations on GPU}
+So producing a  pseudorandom number needs  6 xor operations
+with 6 32-bits  numbers that are provided by 3 64-bits PRNGs.   This version successfully passes the
+stringent BigCrush battery of tests~\cite{LEcuyerS07}.
+
+\section{Efficient PRNGs based on Chaotic Iterations on GPU}
  \label{sec:efficient prng gpu}
  
  \label{sec:efficient prng gpu}
  
-In  order to benefit  from computing  power of  GPU, a  program needs  to define
-independent blocks of threads which  can be computed simultaneously. In general,
-the larger the number of threads is,  the more local memory is used and the less
-branching  instructions are  used (if,  while, ...),  the better  performance is
-obtained  on  GPU.  So  with  algorithm  \ref{algo:seqCIprng}  presented in  the
-previous section, it is possible to  build a similar program which computes PRNG
-on   GPU.  In  the   CUDA~\cite{Nvid10}  environment,   threads  have   a  local
-identificator, called \texttt{ThreadIdx} relative to the block containing them.
-
-
-\subsection{Naive version for GPU}
-
-From the CPU version, it is possible  to obtain a quite similar version for GPU.
-The principe consists in assigning the computation of a PRNG as in sequential to
-each thread  of the  GPU.  Of course,  it is  essential that the  three xor-like
-PRNGs  used for  our computation  have different  parameters. So  we  chose them
-randomly with  another PRNG. As the  initialisation is performed by  the CPU, we
-have  chosen  to  use  the  ISAAC  PRNG~\cite{Jenkins96}  to  initalize  all  the
-parameters for  the GPU version  of our PRNG.   The implementation of  the three
-xor-like  PRNGs  is  straightforward  as  soon as  their  parameters  have  been
-allocated in  the GPU memory.  Each xor-like PRNGs  used works with  an internal
-number  $x$  which keeps  the  last  generated  random numbers.  Other  internal
-variables  are   also  used   by  the  xor-like   PRNGs.  More   precisely,  the
-implementation of the  xor128, the xorshift and the  xorwow respectively require
-4, 5 and 6 unsigned long as internal variables.
+In  order to take benefits  from the computing  power of  GPU, a  program needs  to have
+independent blocks of threads that can be computed simultaneously. In general,
+the larger the number of threads is,  the more local memory is used, and the less
+branching  instructions are  used (if,  while, ...),  the better the performances on GPU is.  
+Obviously, having these requirements in mind, it is possible to  build a program similar to 
+the one presented in Algorithm  \ref{algo:seqCIprng}, which computes pseudorandom numbers
+on   GPU.  
+To do so, we must firstly recall that in
+ the   CUDA~\cite{Nvid10}  environment,   threads  have   a  local
+identifier called \texttt{ThreadIdx}, which is relative to the block containing them.
+
+
+\subsection{Naive Version for GPU}
+
+ 
+It is possible to deduce from the CPU version a quite similar version adapted to GPU.
+The simple principle consists to make each thread of the GPU computing the CPU version of our PRNG.  
+Of course,  the  three xor-like
+PRNGs  used in these computations must have different  parameters. 
+In a given thread, these lasts are
+randomly picked from another PRNGs. 
+The  initialization stage is performed by  the CPU.
+To do it, the  ISAAC  PRNG~\cite{Jenkins96} is used to  set  all  the
+parameters embedded into each thread.   
+
+The implementation of  the three
+xor-like  PRNGs  is  straightforward  when  their  parameters  have  been
+allocated in  the GPU memory.  Each xor-like  works with  an internal
+number  $x$  that saves  the  last  generated  pseudorandom number. Additionally,  the
+implementation of the  xor128, the xorshift, and the  xorwow respectively require
+4, 5, and 6 unsigned long as internal variables.
  
  \begin{algorithm}
  
  \KwIn{InternalVarXorLikeArray: array with internal variables of the 3 xor-like
  PRNGs in global memory\;
  
  \begin{algorithm}
  
  \KwIn{InternalVarXorLikeArray: array with internal variables of the 3 xor-like
  PRNGs in global memory\;
-NumThreads: Number of threads\;}
+NumThreads: number of threads\;}
  \KwOut{NewNb: array containing random numbers in global memory}
  \If{threadIdx is concerned by the computation} {
    retrieve data from InternalVarXorLikeArray[threadIdx] in local variables\;
  \KwOut{NewNb: array containing random numbers in global memory}
  \If{threadIdx is concerned by the computation} {
    retrieve data from InternalVarXorLikeArray[threadIdx] in local variables\;
@@ -914,37 +931,34 @@ NumThreads: Number of threads\;}
    store internal variables in InternalVarXorLikeArray[threadIdx]\;
  }
  
    store internal variables in InternalVarXorLikeArray[threadIdx]\;
  }
  
-\caption{main kernel for the chaotic iterations based PRNG GPU naive version}
+\caption{Main kernel of the GPU ``naive'' version of the PRNG based on chaotic iterations}
  \label{algo:gpu_kernel}
  \end{algorithm}
  
  \label{algo:gpu_kernel}
  \end{algorithm}
  
-Algorithm~\ref{algo:gpu_kernel}  presents a naive  implementation of  PRNG using
-GPU.  According  to the available  memory in the  GPU and the number  of threads
+Algorithm~\ref{algo:gpu_kernel}  presents a naive  implementation of the proposed  PRNG on
+GPU.  Due to the available  memory in the  GPU and the number  of threads
  used simultenaously,  the number  of random numbers  that a thread  can generate
  used simultenaously,  the number  of random numbers  that a thread  can generate
-inside   a    kernel   is   limited,   i.e.    the    variable   \texttt{n}   in
-algorithm~\ref{algo:gpu_kernel}. For example, if  $100,000$ threads are used and
-if $n=100$\footnote{in fact, we need to add the initial seed (a 32-bits number)}
-then   the  memory   required   to  store   internals   variables  of   xor-like
+inside   a    kernel   is   limited  (\emph{i.e.},    the    variable   \texttt{n}   in
+algorithm~\ref{algo:gpu_kernel}). For instance, if  $100,000$ threads are used and
+if $n=100$\footnote{in fact, we need to add the initial seed (a 32-bits number)},
+then   the  memory   required   to  store all of the  internals   variables  of both the  xor-like
  PRNGs\footnote{we multiply this number by $2$ in order to count 32-bits numbers}
  PRNGs\footnote{we multiply this number by $2$ in order to count 32-bits numbers}
-and  random  number of  our  PRNG  is  equals to  $100,000\times  ((4+5+6)\times
-2+(1+100))=1,310,000$ 32-bits numbers, i.e. about $52$Mb.
+and  the pseudorandom  numbers generated by  our  PRNG,  is  equal to  $100,000\times  ((4+5+6)\times
+2+(1+100))=1,310,000$ 32-bits numbers, that is, approximately $52$Mb.
  
  
-All the  tests performed  to pass the  BigCrush of TestU01  succeeded. Different
-number of threads, called \texttt{NumThreads} in our algorithm, have been tested
-upto $10$ millions.
-\newline
-\newline
-{\bf QUESTION : on laisse cette remarque, je suis mitigé !!!}
+This generator is able to pass the whole BigCrush battery of tests, for all
+the versions that have been tested depending on their number of threads 
+(called \texttt{NumThreads} in our algorithm, tested until $10$ millions).
  
  \begin{remark}
  
  \begin{remark}
-Algorithm~\ref{algo:gpu_kernel}  has  the  advantage to  manipulate  independent
-PRNGs, so this version is easily usable on a cluster of computer. The only thing
-to ensure is to use a single ISAAC PRNG. For this, a simple solution consists in
-using a master node for the initialization which computes the initial parameters
+The proposed algorithm has  the  advantage to  manipulate  independent
+PRNGs, so this version is easily adaptable on a cluster of computers too. The only thing
+to ensure is to use a single ISAAC PRNG. To achieve this requirement, a simple solution consists in
+using a master node for the initialization. This master node computes the initial parameters
  for all the differents nodes involves in the computation.
  \end{remark}
  
  for all the differents nodes involves in the computation.
  \end{remark}
  
-\subsection{Improved version for GPU}
+\subsection{Improved Version for GPU}
  
  As GPU cards using CUDA have shared memory between threads of the same block, it
  is possible  to use this  feature in order  to simplify the  previous algorithm,
  
  As GPU cards using CUDA have shared memory between threads of the same block, it
  is possible  to use this  feature in order  to simplify the  previous algorithm,
@@ -1717,7 +1731,7 @@ proving that $H$ is not secure, a contradiction.
  
  
  
  
  
  
-\section{A cryptographically secure prng for GPU}
+\section{A Cryptographically Secure PRNG for GPU}
  \label{sec:CSGPU}
  It is  possible to build a  cryptographically secure prng based  on the previous
  algorithm (algorithm~\ref{algo:gpu_kernel2}).   It simply consists  in replacing
  \label{sec:CSGPU}
  It is  possible to build a  cryptographically secure prng based  on the previous
  algorithm (algorithm~\ref{algo:gpu_kernel2}).   It simply consists  in replacing