Ménage

[prng_gpu.git] / prng_gpu.tex
diff --git a/prng_gpu.tex b/prng_gpu.tex

index ccb0c95be7e36e53d03eef96a65fba9d1c184bfd..1c7c9fea5064b7e4686f67312d5ff2056e3b1e95 100644 (file)
--- a/prng_gpu.tex
+++ b/prng_gpu.tex
@@ -34,29 +34,183 @@
  
  \newcommand{\alert}[1]{\begin{color}{blue}\textit{#1}\end{color}}
  
-\title{Efficient Generation of Pseudo-Random Bumbers based on Chaotic Iterations
-on GPU}
+\title{Efficient and Cryptographically Secure Generation of Chaotic Pseudorandom Numbers on GPU}
  \begin{document}
  
-\author{Jacques M. Bahi, Rapha\"{e}l Couturier, and Christophe
-Guyeux\thanks{Authors in alphabetic order}}
-
+\author{Jacques M. Bahi, Rapha\"{e}l Couturier,  Christophe
+Guyeux, and Pierre-Cyrille Heam\thanks{Authors in alphabetic order}}
+   
  \maketitle
  
  \begin{abstract}
-This is the abstract
+In this paper we present a new pseudorandom number generator (PRNG) on
+graphics processing units  (GPU). This PRNG is based  on the so-called chaotic iterations.  It
+is firstly proven  to be chaotic according to the Devaney's  formulation. We thus propose  an efficient
+implementation  for  GPU that successfully passes the   {\it BigCrush} tests, deemed to be the  hardest
+battery of tests in TestU01.  Experiments show that this PRNG can generate
+about 20 billions of random numbers  per second on Tesla C1060 and NVidia GTX280
+cards.
+It is finally established that, under reasonable assumptions, the proposed PRNG can be cryptographically 
+secure.
+
+
  \end{abstract}
  
  \section{Introduction}
  
-Interet des itérations chaotiques pour générer des nombre alea\\
-Interet de générer des nombres alea sur GPU
-\alert{RC, un petit state-of-the-art sur les PRNGs sur GPU ?}
-...
-
+Randomness is of importance in many fields as scientific simulations or cryptography. 
+``Random numbers'' can mainly be generated either by a deterministic and reproducible algorithm
+called a pseudorandom number generator (PRNG), or by a physical non-deterministic 
+process having all the characteristics of a random noise, called a truly random number
+generator (TRNG). 
+In this paper, we focus on reproducible generators, useful for instance in
+Monte-Carlo based simulators or in several cryptographic schemes.
+These domains need PRNGs that are statistically irreproachable. 
+On some fields as in numerical simulations, speed is a strong requirement
+that is usually attained by using parallel architectures. In that case,
+a recurrent problem is that a deflate of the statistical qualities is often
+reported, when the parallelization of a good PRNG is realized.
+This is why ad-hoc PRNGs for each possible architecture must be found to
+achieve both speed and randomness.
+On the other side, speed is not the main requirement in cryptography: the great
+need is to define \emph{secure} generators being able to withstand malicious
+attacks. Roughly speaking, an attacker should not be able in practice to make 
+the distinction between numbers obtained with the secure generator and a true random
+sequence. 
+Finally, a small part of the community working in this domain focus on a
+third requirement, that is to define chaotic generators.
+The main idea is to take benefits from a chaotic dynamical system to obtain a
+generator that is unpredictable, disordered, sensible to its seed, or in other words chaotic.
+Their desire is to map a given chaotic dynamics into a sequence that seems random 
+and unassailable due to chaos.
+However, the chaotic maps used as a pattern are defined in the real line 
+whereas computers deal with finite precision numbers.
+This distortion leads to a deflation of both chaotic properties and speed.
+Furthermore, authors of such chaotic generators often claim their PRNG
+as secure due to their chaos properties, but there is no obvious relation
+between chaos and security as it is understood in cryptography.
+This is why the use of chaos for PRNG still remains marginal and disputable.
+
+The authors' opinion is that topological properties of disorder, as they are
+properly defined in the mathematical theory of chaos, can reinforce the quality
+of a PRNG. But they are not substitutable for security or statistical perfection.
+Indeed, to the authors' point of view, such properties can be useful in the two following situations. On the
+one hand, a post-treatment based on a chaotic dynamical system can be applied
+to a PRNG statistically deflective, in order to improve its statistical 
+properties. Such an improvement can be found, for instance, in~\cite{bgw09:ip,bcgr11:ip}.
+On the other hand, chaos can be added to a fast, statistically perfect PRNG and/or a
+cryptographically secure one, in case where chaos can be of interest,
+\emph{only if these last properties are not lost during
+the proposed post-treatment}. Such an assumption is behind this research work.
+It leads to the attempts to define a 
+family of PRNGs that are chaotic while being fast and statistically perfect,
+or cryptographically secure.
+Let us finish this paragraph by noticing that, in this paper, 
+statistical perfection refers to the ability to pass the whole 
+{\it BigCrush} battery of tests, which is widely considered as the most
+stringent statistical evaluation of a sequence claimed as random.
+This battery can be found into the well-known TestU01 package~\cite{LEcuyerS07}.
+Chaos, for its part, refers to the well-established definition of a
+chaotic dynamical system proposed by Devaney~\cite{Devaney}.
+
+
+In a previous work~\cite{bgw09:ip,guyeux10} we have proposed a post-treatment on PRNGs making them behave
+as a chaotic dynamical system. Such a post-treatment leads to a new category of
+PRNGs. We have shown that proofs of Devaney's chaos can be established for this
+family, and that the sequence obtained after this post-treatment can pass the
+NIST~\cite{Nist10}, DieHARD~\cite{Marsaglia1996}, and TestU01~\cite{LEcuyerS07} batteries of tests, even if the inputted generators
+cannot.
+The proposition of this paper is to improve widely the speed of the formerly
+proposed generator, without any lack of chaos or statistical properties.
+In particular, a version of this PRNG on graphics processing units (GPU)
+is proposed.
+Although GPU was initially designed  to accelerate
+the manipulation of  images, they are nowadays commonly  used in many scientific
+applications. Therefore,  it is important  to be able to  generate pseudorandom
+numbers inside a GPU when a scientific application runs in it. This remark
+motivates our proposal of a chaotic and statistically perfect PRNG for GPU.  
+Such device
+allows us to generated almost 20 billions of pseudorandom numbers per second.
+Last, but not least, we show that the proposed post-treatment preserves the
+cryptographical security of the inputted PRNG, when this last has such a 
+property.
+
+The remainder of this paper  is organized as follows. In Section~\ref{section:related
+  works} we  review some GPU implementations  of PRNGs.  Section~\ref{section:BASIC
+  RECALLS} gives some basic recalls  on the well-known Devaney's formulation of chaos, 
+  and on an iteration process called ``chaotic
+iterations'' on which the post-treatment is based. 
+Proofs of chaos are given in  Section~\ref{sec:pseudorandom}.
+Section~\ref{sec:efficient    prng}   presents   an   efficient
+implementation of  this chaotic PRNG  on a CPU, whereas   Section~\ref{sec:efficient prng
+  gpu}   describes   the  GPU   implementation. 
+Such generators are experimented in 
+Section~\ref{sec:experiments}.
+We show in Section~\ref{sec:security analysis} that, if the inputted
+generator is cryptographically secure, then it is the case too for the
+generator provided by the post-treatment.
+Such a proof leads to the proposition of a cryptographically secure and
+chaotic generator on GPU based on the famous Blum Blum Shum
+in Section~\ref{sec:CSGPU}.
+This research work ends by a conclusion section, in which the contribution is
+summarized and intended future work is presented.
+
+
+
+
+\section{Related works on GPU based PRNGs}
+\label{section:related works}
+
+Numerous research works on defining GPU based PRNGs have yet been proposed  in the
+literature, so that completeness is impossible.
+This is why authors of this document only give reference to the most significant attempts 
+in this domain, from their subjective point of view. 
+The  quantity of pseudorandom numbers generated per second is mentioned here 
+only when the information is given in the related work. 
+A million numbers  per second will be simply written as
+1MSample/s whereas a billion numbers per second is 1GSample/s.
+
+In \cite{Pang:2008:cec}  a PRNG based on  cellular automata is defined
+with no  requirement to an high  precision  integer   arithmetic  or to any bitwise
+operations. Authors can   generate  about
+3.2MSamples/s on a GeForce 7800 GTX GPU, which is quite an old card now.
+However, there is neither a mention of statistical tests nor any proof of
+chaos or cryptography in this document.
+
+In \cite{ZRKB10}, the authors propose  different versions of efficient GPU PRNGs
+based on  Lagged Fibonacci or Hybrid  Taus.  They have  used these
+PRNGs   for  Langevin   simulations   of  biomolecules   fully  implemented   on
+GPU. Performance of  the GPU versions are far better than  those obtained with a
+CPU, and these PRNGs succeed to pass the {\it BigCrush} battery of TestU01. 
+However the evaluations of the proposed PRNGs are only statistical ones.
+
+
+Authors of~\cite{conf/fpga/ThomasHL09}  have studied the  implementation of some
+PRNGs on  different computing architectures: CPU,  field-programmable gate array
+(FPGA), massively parallel  processors, and GPU. This study is of interest, because
+the  performance  of the  same  PRNGs on  different architectures are compared. 
+FPGA appears as  the  fastest  and the most
+efficient architecture, providing the fastest number of generated pseudorandom numbers
+per joule. 
+However, we can notice that authors can ``only'' generate between 11 and 16GSamples/s
+with a GTX 280  GPU, which should be compared with
+the results presented in this document.
+We can remark too that the PRNGs proposed in~\cite{conf/fpga/ThomasHL09} are only
+able to pass the {\it Crush} battery, which is very easy compared to the {\it Big Crush} one.
+
+Lastly, Cuda  has developed  a  library for  the  generation of  pseudorandom numbers  called
+Curand~\cite{curand11}.        Several       PRNGs        are       implemented, among
+other things 
+Xorwow~\cite{Marsaglia2003} and  some variants of Sobol. The  tests reported show that
+their  fastest version provides  15GSamples/s on  the new  Fermi C2050  card. 
+But their PRNGs cannot pass the whole TestU01 battery (only one test is failed).
+\newline
+\newline
+We can finally remark that, to the best of our knowledge, no GPU implementation have been proven to be chaotic, and the cryptographically secure property is surprisingly never regarded.
  
  \section{Basic Recalls}
  \label{section:BASIC RECALLS}
+
  This section is devoted to basic definitions and terminologies in the fields of
  topological chaos and chaotic iterations.
  \subsection{Devaney's Chaotic Dynamical Systems}
@@ -271,17 +425,18 @@ if and only if $\Gamma(f)$ is strongly connected.
  \end{theorem}
  
  This result of chaos has lead us to study the possibility to build a
-pseudo-random number generator (PRNG) based on the chaotic iterations. 
+pseudorandom number generator (PRNG) based on the chaotic iterations. 
  As $G_f$, defined on the domain   $\llbracket 1 ;  \mathsf{N} \rrbracket^{\mathds{N}} 
  \times \mathds{B}^\mathsf{N}$, is build from Boolean networks $f : \mathds{B}^\mathsf{N}
  \rightarrow \mathds{B}^\mathsf{N}$, we can preserve the theoretical properties on $G_f$
  during implementations (due to the discrete nature of $f$). It is as if
  $\mathds{B}^\mathsf{N}$ represents the memory of the computer whereas $\llbracket 1 ;  \mathsf{N}
-\rrbracket^{\mathds{N}}$ is its input stream (the seeds, for instance).
+\rrbracket^{\mathds{N}}$ is its input stream (the seeds, for instance, in PRNG, or a physical noise in TRNG).
  
-\section{Application to Pseudo-Randomness}
+\section{Application to Pseudorandomness}
+\label{sec:pseudorandom}
  
-\subsection{A First Pseudo-Random Number Generator}
+\subsection{A First Pseudorandom Number Generator}
  
  We have proposed in~\cite{bgw09:ip} a new family of generators that receives 
  two PRNGs as inputs. These two generators are mixed with chaotic iterations, 
@@ -326,11 +481,11 @@ return $y$\;
  
  
  This generator is synthesized in Algorithm~\ref{CI Algorithm}.
-It takes as input: a function $f$;
+It takes as input: a Boolean function $f$ satisfying Theorem~\ref{Th:Caractérisation   des   IC   chaotiques};
  an integer $b$, ensuring that the number of executed iterations is at least $b$
  and at most $2b+1$; and an initial configuration $x^0$.
  It returns the new generated configuration $x$.  Internally, it embeds two
-\textit{XORshift}$(k)$ PRNGs \cite{Marsaglia2003} that returns integers
+\textit{XORshift}$(k)$ PRNGs~\cite{Marsaglia2003} that returns integers
  uniformly distributed
  into $\llbracket 1 ; k \rrbracket$.
  \textit{XORshift} is a category of very fast PRNGs designed by George Marsaglia,
@@ -351,7 +506,7 @@ We have proven in \cite{bcgr11:ip} that,
    if and only if $M$ is a double stochastic matrix.
  \end{theorem} 
  
-This former generator as successively passed various batteries of statistical tests, as the NIST tests~\cite{bcgr11:ip}.
+This former generator as successively passed various batteries of statistical tests, as the NIST~\cite{bcgr11:ip}, DieHARD~\cite{Marsaglia1996}, and TestU01~\cite{LEcuyerS07}.
  
  \subsection{Improving the Speed of the Former Generator}
  
@@ -406,7 +561,7 @@ the vectorial negation, leads to a speed improvement. However, proofs
  of chaos obtained in~\cite{bg10:ij} have been established
  only for chaotic iterations of the form presented in Definition 
  \ref{Def:chaotic iterations}. The question is now to determine whether the
-use of more general chaotic iterations to generate pseudo-random numbers 
+use of more general chaotic iterations to generate pseudorandom numbers 
  faster, does not deflate their topological chaos properties.
  
  \subsection{Proofs of Chaos of the General Formulation of the Chaotic Iterations}
@@ -650,18 +805,28 @@ have $d((S,E),(\tilde S,E))<\epsilon$.
  
  
  \section{Efficient PRNG based on Chaotic Iterations}
+\label{sec:efficient prng}
+
+Based on the proof presented in the previous section, it is now possible to 
+improve the speed of the generator formerly presented in~\cite{bgw09:ip,guyeux10}. 
+The first idea is to consider
+that the provided strategy is a pseudorandom Boolean vector obtained by a
+given PRNG.
+An iteration of the system is simply the bitwise exclusive or between
+the last computed state and the current strategy.
+Topological properties of disorder exhibited by chaotic 
+iterations can be inherited by the inputted generator, hoping by doing so to 
+obtain some statistical improvements while preserving speed.
  
-In  order to  implement efficiently  a PRNG  based on  chaotic iterations  it is
-possible to improve  previous works [ref]. One solution  consists in considering
-that the  strategy used contains all the  bits for which the  negation is
-achieved out. Then in order to apply  the negation on these bits we can simply
-apply the  xor operator between  the current number  and the strategy. In
-order to obtain the strategy we also use a classical PRNG.
  
-Here  is an  example with  16-bits numbers  showing how  the bitwise  operations
+Let us give an example using 16-bits numbers, to clearly understand how the bitwise xor operations
  are
-applied.  Suppose  that $x$ and the  strategy $S^i$ are defined  in binary mode.
-Then the following table shows the result of $x$ xor $S^i$.
+done.  
+Suppose  that $x$ and the  strategy $S^i$ are given as
+binary vectors.
+Table~\ref{TableExemple} shows the result of $x \oplus S^i$.
+
+\begin{table}
  $$
  \begin{array}{|cc|cccccccccccccccc|}
  \hline
@@ -675,35 +840,13 @@ x \oplus S^i&=&1&1&0&1&1&1&0&0&0&1&1&1&0&1&0&1\\
  \hline
   \end{array}
  $$
+\caption{Example of an arbitrary round of the proposed generator}
+\label{TableExemple}
+\end{table}
  
-%% \begin{figure}[htbp]
-%% \begin{center}
-%% \fbox{
-%% \begin{minipage}{14cm}
-%% unsigned int CIprng() \{\\
-%%   static unsigned int x = 123123123;\\
-%%   unsigned long t1 = xorshift();\\
-%%   unsigned long t2 = xor128();\\
-%%   unsigned long t3 = xorwow();\\
-%%   x = x\textasciicircum (unsigned int)t1;\\
-%%   x = x\textasciicircum (unsigned int)(t2$>>$32);\\
-%%   x = x\textasciicircum (unsigned int)(t3$>>$32);\\
-%%   x = x\textasciicircum (unsigned int)t2;\\
-%%   x = x\textasciicircum (unsigned int)(t1$>>$32);\\
-%%   x = x\textasciicircum (unsigned int)t3;\\
-%%   return x;\\
-%% \}
-%% \end{minipage}
-%% }
-%% \end{center}
-%% \caption{sequential Chaotic Iteration PRNG}
-%% \label{algo:seqCIprng}
-%% \end{figure}
-
-
-
-\lstset{language=C,caption={C code of the sequential chaotic iterations based
-PRNG},label=algo:seqCIprng}
+
+
+\lstset{language=C,caption={C code of the sequential PRNG based on chaotic iterations},label=algo:seqCIprng}
  \begin{lstlisting}
  unsigned int CIprng() {
    static unsigned int x = 123123123;
@@ -724,52 +867,60 @@ unsigned int CIprng() {
  
  
  
-In listing~\ref{algo:seqCIprng}  a sequential version of  our chaotic iterations
-based   PRNG    is   presented.   The    xor   operator   is    represented   by
-\textasciicircum.  This   function  uses  three  classical   64-bits  PRNG:  the
-\texttt{xorshift},  the   \texttt{xor128}  and  the   \texttt{xorwow}.   In  the
-following,  we call  them  xor-like  PRNGSs.  These  three  PRNGs are  presented
-in~\cite{Marsaglia2003}.  As each  xor-like PRNG used works with  64-bits and as
-our PRNG works  with 32-bits, the use of \texttt{(unsigned  int)} selects the 32
-least significant bits whereas  \texttt{(unsigned int)(t3$>>$32)} selects the 32
-most  significants bits  of the  variable \texttt{t}.   So to  produce  a random
-number realizes  6 xor operations with  6 32-bits numbers produced  by 3 64-bits
-PRNG.  This version successes the  BigCrush of the TestU01 battery [P.  L’ecuyer
-  and R. Simard. Testu01].
+In Listing~\ref{algo:seqCIprng}  a sequential version of  the proposed PRNG based on chaotic iterations
+ is  presented.  The xor operator is  represented by \textasciicircum.
+This  function uses  three classical  64-bits PRNGs, namely the  \texttt{xorshift}, the
+\texttt{xor128},  and  the  \texttt{xorwow}~\cite{Marsaglia2003}.   In  the following,  we  call  them
+``xor-like PRNGs''. 
+As
+each xor-like PRNG  uses 64-bits whereas our proposed generator works with 32-bits,
+we use the command \texttt{(unsigned int)}, that selects the 32 least significant bits of a given integer, and the code
+\texttt{(unsigned int)(t3$>>$32)}  in order to obtain the 32 most significant  bits of \texttt{t}.   
+
+So producing a  pseudorandom number needs  6 xor operations
+with 6 32-bits  numbers that are provided by 3 64-bits PRNGs.   This version successfully passes the
+stringent BigCrush battery of tests~\cite{LEcuyerS07}.
  
-\section{Efficient prng based on chaotic iterations on GPU}
+\section{Efficient PRNGs based on Chaotic Iterations on GPU}
+\label{sec:efficient prng gpu}
  
-In  order to benefit  from computing  power of  GPU, a  program needs  to define
-independent blocks of threads which  can be computed simultaneously. In general,
-the larger the number of threads is,  the more local memory is used and the less
-branching  instructions are  used (if,  while, ...),  the better  performance is
-obtained  on  GPU.  So  with  algorithm  \ref{algo:seqCIprng}  presented in  the
-previous section, it is possible to  build a similar program which computes PRNG
-on  GPU. In  the CUDA  [ref] environment,  threads have  a  local identificator,
-called \texttt{ThreadIdx} relative to the block containing them.
+In  order to take benefits  from the computing  power of  GPU, a  program needs  to have
+independent blocks of threads that can be computed simultaneously. In general,
+the larger the number of threads is,  the more local memory is used, and the less
+branching  instructions are  used (if,  while, ...),  the better the performances on GPU is.  
+Obviously, having these requirements in mind, it is possible to  build a program similar to 
+the one presented in Algorithm  \ref{algo:seqCIprng}, which computes pseudorandom numbers
+on   GPU.  
+To do so, we must firstly recall that in
+ the   CUDA~\cite{Nvid10}  environment,   threads  have   a  local
+identifier called \texttt{ThreadIdx}, which is relative to the block containing them.
  
  
-\subsection{Naive version for GPU}
+\subsection{Naive Version for GPU}
  
-From the CPU version, it is possible  to obtain a quite similar version for GPU.
-The principe consists in assigning the computation of a PRNG as in sequential to
-each thread  of the  GPU.  Of course,  it is  essential that the  three xor-like
-PRNGs  used for  our computation  have different  parameters. So  we  chose them
-randomly with  another PRNG. As the  initialisation is performed by  the CPU, we
-have chosen to use the ISAAC PRNG  [ref] to initalize all the parameters for the
-GPU version  of our  PRNG.  The  implementation of the  three xor-like  PRNGs is
-straightforward  as soon  as their  parameters have  been allocated  in  the GPU
-memory. Each xor-like  PRNGs used works with an internal  number $x$ which keeps
-the last generated random numbers. Other internal variables are also used by the
-xor-like PRNGs. More  precisely, the implementation of the  xor128, the xorshift
-and  the xorwow  respectively  require 4,  5  and 6  unsigned  long as  internal
-variables.
+ 
+It is possible to deduce from the CPU version a quite similar version adapted to GPU.
+The simple principle consists to make each thread of the GPU computing the CPU version of our PRNG.  
+Of course,  the  three xor-like
+PRNGs  used in these computations must have different  parameters. 
+In a given thread, these lasts are
+randomly picked from another PRNGs. 
+The  initialization stage is performed by  the CPU.
+To do it, the  ISAAC  PRNG~\cite{Jenkins96} is used to  set  all  the
+parameters embedded into each thread.   
+
+The implementation of  the three
+xor-like  PRNGs  is  straightforward  when  their  parameters  have  been
+allocated in  the GPU memory.  Each xor-like  works with  an internal
+number  $x$  that saves  the  last  generated  pseudorandom number. Additionally,  the
+implementation of the  xor128, the xorshift, and the  xorwow respectively require
+4, 5, and 6 unsigned long as internal variables.
  
  \begin{algorithm}
  
  \KwIn{InternalVarXorLikeArray: array with internal variables of the 3 xor-like
  PRNGs in global memory\;
-NumThreads: Number of threads\;}
+NumThreads: number of threads\;}
  \KwOut{NewNb: array containing random numbers in global memory}
  \If{threadIdx is concerned by the computation} {
    retrieve data from InternalVarXorLikeArray[threadIdx] in local variables\;
@@ -780,34 +931,34 @@ NumThreads: Number of threads\;}
    store internal variables in InternalVarXorLikeArray[threadIdx]\;
  }
  
-\caption{main kernel for the chaotic iterations based PRNG GPU naive version}
+\caption{Main kernel of the GPU ``naive'' version of the PRNG based on chaotic iterations}
  \label{algo:gpu_kernel}
  \end{algorithm}
  
-Algorithm~\ref{algo:gpu_kernel}  presents a naive  implementation of  PRNG using
-GPU.  According  to the available  memory in the  GPU and the number  of threads
+Algorithm~\ref{algo:gpu_kernel}  presents a naive  implementation of the proposed  PRNG on
+GPU.  Due to the available  memory in the  GPU and the number  of threads
  used simultenaously,  the number  of random numbers  that a thread  can generate
-inside   a    kernel   is   limited,   i.e.    the    variable   \texttt{n}   in
-algorithm~\ref{algo:gpu_kernel}. For example, if  $100,000$ threads are used and
-if $n=100$\footnote{in fact, we need to add the initial seed (a 32-bits number)}
-then   the  memory   required   to  store   internals   variables  of   xor-like
+inside   a    kernel   is   limited  (\emph{i.e.},    the    variable   \texttt{n}   in
+algorithm~\ref{algo:gpu_kernel}). For instance, if  $100,000$ threads are used and
+if $n=100$\footnote{in fact, we need to add the initial seed (a 32-bits number)},
+then   the  memory   required   to  store all of the  internals   variables  of both the  xor-like
  PRNGs\footnote{we multiply this number by $2$ in order to count 32-bits numbers}
-and  random  number of  our  PRNG  is  equals to  $100,000\times  ((4+5+6)\times
-2+(1+100))=1,310,000$ 32-bits numbers, i.e. about $52$Mb.
+and  the pseudorandom  numbers generated by  our  PRNG,  is  equal to  $100,000\times  ((4+5+6)\times
+2+(1+100))=1,310,000$ 32-bits numbers, that is, approximately $52$Mb.
  
-All the  tests performed  to pass the  BigCrush of TestU01  succeeded. Different
-number of threads, called \texttt{NumThreads} in our algorithm, have been tested
-upto $10$ millions.
+This generator is able to pass the whole BigCrush battery of tests, for all
+the versions that have been tested depending on their number of threads 
+(called \texttt{NumThreads} in our algorithm, tested until $10$ millions).
  
  \begin{remark}
-Algorithm~\ref{algo:gpu_kernel}  has  the  advantage to  manipulate  independent
-PRNGs, so this version is easily usable on a cluster of computer. The only thing
-to ensure is to use a single ISAAC PRNG. For this, a simple solution consists in
-using a master node for the initialization which computes the initial parameters
+The proposed algorithm has  the  advantage to  manipulate  independent
+PRNGs, so this version is easily adaptable on a cluster of computers too. The only thing
+to ensure is to use a single ISAAC PRNG. To achieve this requirement, a simple solution consists in
+using a master node for the initialization. This master node computes the initial parameters
  for all the differents nodes involves in the computation.
  \end{remark}
  
-\subsection{Improved version for GPU}
+\subsection{Improved Version for GPU}
  
  As GPU cards using CUDA have shared memory between threads of the same block, it
  is possible  to use this  feature in order  to simplify the  previous algorithm,
@@ -823,7 +974,7 @@ which represent the indexes of the  other threads for which the results are used
  by the  current thread. In  the algorithm, we  consider that a  64-bits xor-like
  PRNG is used, that is why both 32-bits parts are used.
  
-This version also succeed to the BigCrush batteries of tests.
+This version also succeeds to the {\it BigCrush} batteries of tests.
  
  \begin{algorithm}
  
@@ -834,17 +985,15 @@ tab1, tab2: Arrays containing permutations of size permutation\_size\;}
  
  \KwOut{NewNb: array containing random numbers in global memory}
  \If{threadId is concerned} {
-  retrieve data from InternalVarXorLikeArray[threadId] in local variables\;
+  retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory and x\;
    offset = threadIdx\%permutation\_size\;
    o1 = threadIdx-offset+tab1[offset]\;
    o2 = threadIdx-offset+tab2[offset]\;
    \For{i=1 to n} {
      t=xor-like()\;
-    shared\_mem[threadId]=(unsigned int)t\;
-    x = x $\oplus$ (unsigned int) t\;
-    x = x $\oplus$ (unsigned int) (t>>32)\;
-    x = x $\oplus$ shared[o1]\;
-    x = x $\oplus$ shared[o2]\;
+    t=t$\oplus$shmem[o1]$\oplus$shmem[o2]\;
+    shared\_mem[threadId]=t\;
+    x = x $\oplus$ t\;
  
      store the new PRNG in NewNb[NumThreads*threadId+i]\;
    }
@@ -858,9 +1007,9 @@ version}
  
  \subsection{Theoretical Evaluation of the Improved Version}
  
-A run of Algorithm~\ref{algo:gpu_kernel2} consists in four operations having 
+A run of Algorithm~\ref{algo:gpu_kernel2} consists in three operations having 
  the form of Equation~\ref{equation Oplus}, which is equivalent to the iterative
-system of Eq.~\ref{eq:generalIC}. That is, four iterations of the general chaotic
+system of Eq.~\ref{eq:generalIC}. That is, three iterations of the general chaotic
  iterations are realized between two stored values of the PRNG.
  To be certain that we are in the framework of Theorem~\ref{t:chaos des general},
  we must guarantee that this dynamical system iterates on the space 
@@ -880,558 +1029,217 @@ chaotic iterations presented previously, and for this reason, it satisfies the
  Devaney's formulation of a chaotic behavior.
  
  \section{Experiments}
-
-Different experiments have been performed in order to measure the generation
-speed.
-\begin{figure}[t]
+\label{sec:experiments}
+
+Different experiments  have been  performed in order  to measure  the generation
+speed. We have used  a computer equiped with Tesla C1060 NVidia  GPU card and an
+Intel  Xeon E5530 cadenced  at 2.40  GHz for  our experiments  and we  have used
+another one  equipped with  a less performant  CPU and  a GeForce GTX  280. Both
+cards have 240 cores.
+
+In  Figure~\ref{fig:time_xorlike_gpu} we  compare the  number of  random numbers
+generated per second with the xor-like based PRNG. In this figure, the optimized
+version use the {\it xor64} described in~\cite{Marsaglia2003}. The naive version
+use  the three  xor-like  PRNGs described  in Listing~\ref{algo:seqCIprng}.   In
+order to obtain the optimal performance we removed the storage of random numbers
+in the GPU memory. This step is time consuming and slows down the random numbers
+generation.  Moreover, if one is  interested by applications that consume random
+numbers  directly   when  they  are  generated,  their   storage  are  completely
+useless. In this  figure we can see  that when the number of  threads is greater
+than approximately 30,000 upto 5 millions the number of random numbers generated
+per second  is almost constant.  With the  naive version, it is  between 2.5 and
+3GSample/s.   With  the  optimized   version,  it  is  approximately  equals  to
+20GSample/s. Finally  we can remark  that both GPU  cards are quite  similar. In
+practice,  the Tesla C1060  has more  memory than  the GTX  280 and  this memory
+should be of better quality.
+
+\begin{figure}[htbp]
  \begin{center}
-  \includegraphics[scale=.7]{curve_time_gpu.pdf}
+  \includegraphics[scale=.7]{curve_time_xorlike_gpu.pdf}
  \end{center}
-\caption{Number of random numbers generated per second}
-\label{fig:time_naive_gpu}
+\caption{Number of random numbers generated per second with the xorlike based PRNG}
+\label{fig:time_xorlike_gpu}
  \end{figure}
  
  
-First of all we have compared the time to generate X random numbers with both
-the CPU version and the GPU version. 
-
-Faire une courbe du nombre de random en fonction du nombre de threads,
-éventuellement en fonction du nombres de threads par bloc.
-
-
-
-\section{The relativity of disorder}
-\label{sec:de la relativité du désordre}
-
-In the next two sections, we investigate the impact of the choices that have
-lead to the definitions of measures in Sections \ref{sec:chaotic iterations} and \ref{deuxième def}.
-
-\subsection{Impact of the topology's finenesse}
-
-Let us firstly introduce the following notations.
-
-\begin{notation}
-$\mathcal{X}_\tau$ will denote the topological space
-$\left(\mathcal{X},\tau\right)$, whereas $\mathcal{V}_\tau (x)$ will be the set
-of all the neighborhoods of $x$ when considering the topology $\tau$ (or simply
-$\mathcal{V} (x)$, if there is no ambiguity).
-\end{notation}
-
-
-
-\begin{theorem}
-\label{Th:chaos et finesse}
-Let $\mathcal{X}$ a set and $\tau, \tau'$ two topologies on $\mathcal{X}$ s.t.
-$\tau'$ is finer than $\tau$. Let $f:\mathcal{X} \to \mathcal{X}$, continuous
-both for $\tau$ and $\tau'$.
-
-If $(\mathcal{X}_{\tau'},f)$ is chaotic according to Devaney, then
-$(\mathcal{X}_\tau,f)$ is chaotic too.
-\end{theorem}
-
-\begin{proof}
-Let us firstly establish the transitivity of $(\mathcal{X}_\tau,f)$.
-
-Let $\omega_1, \omega_2$ two open sets of $\tau$. Then $\omega_1, \omega_2 \in
-\tau'$, becaus $\tau'$ is finer than $\tau$. As $f$ is $\tau'-$transitive, we
-can deduce that $\exists n \in \mathds{N}, \omega_1 \cap f^{(n)}(\omega_2) =
-\varnothing$. Consequently, $f$ is $\tau-$transitive.
-
-Let us now consider the regularity of $(\mathcal{X}_\tau,f)$, \emph{i.e.}, for
-all $x \in \mathcal{X}$, and for all $\tau-$neighborhood $V$ of $x$, there is a
-periodic point for $f$ into $V$.
-
-Let $x \in \mathcal{X}$ and $V \in \mathcal{V}_\tau (x)$ a $\tau-$neighborhood
-of $x$. By definition, $\exists \omega \in \tau, x \in \omega \subset V$.
-
-But $\tau \subset \tau'$, so $\omega \in \tau'$, and then $V \in
-\mathcal{V}_{\tau'} (x)$. As $(\mathcal{X}_{\tau'},f)$ is regular, there is a
-periodic point for $f$ into $V$, and the regularity of $(\mathcal{X}_\tau,f)$ is
-proven. 
-\end{proof}
-
-\subsection{A given system can always be claimed as chaotic}
-
-Let $f$ an iteration function on $\mathcal{X}$ having at least a fixed point.
-Then this function is chaotic (in a certain way):
-
-\begin{theorem}
-Let $\mathcal{X}$ a nonempty set and $f: \mathcal{X} \to \X$ a function having
-at least a fixed point.
-Then $f$ is $\tau_0-$chaotic, where $\tau_0$ is the trivial (indiscrete)
-topology on $\X$.
-\end{theorem}
-
-
-\begin{proof}
-$f$ is transitive when $\forall \omega, \omega' \in \tau_0 \setminus
-\{\varnothing\}, \exists n \in \mathds{N}, f^{(n)}(\omega) \cap \omega' \neq
-\varnothing$.
-As $\tau_0 = \left\{ \varnothing, \X \right\}$, this is equivalent to look for
-an integer $n$ s.t. $f^{(n)}\left( \X \right) \cap \X \neq \varnothing$. For
-instance, $n=0$ is appropriate.
-
-Let us now consider $x \in \X$ and $V \in \mathcal{V}_{\tau_0} (x)$. Then $V =
-\mathcal{X}$, so $V$ has at least a fixed point for $f$. Consequently $f$ is
-regular, and the result is established.
-\end{proof}
-
-
-
-
-\subsection{A given system can always be claimed as non-chaotic}
-
-\begin{theorem}
-Let $\mathcal{X}$ be a set and $f: \mathcal{X} \to \X$.
-If $\X$ is infinite, then $\left( \X_{\tau_\infty}, f\right)$ is not chaotic
-(for the Devaney's formulation), where $\tau_\infty$ is the discrete topology.
-\end{theorem}
-
-\begin{proof}
-Let us prove it by contradiction, assuming that $\left(\X_{\tau_\infty},
-f\right)$ is both transitive and regular.
+In  comparison,   Listing~\ref{algo:seqCIprng}  allows  us   to  generate  about
+138MSample/s with only one core of the Xeon E5530.
  
-Let $x \in \X$ and $\{x\}$ one of its neighborhood. This neighborhood must
-contain a periodic point for $f$, if we want that $\left(\X_{\tau_\infty},
-f\right)$ is regular. Then $x$ must be a periodic point of $f$.
  
-Let $I_x = \left\{ f^{(n)}(x), n \in \mathds{N}\right\}$. This set is finite
-because  $x$ is periodic, and $\mathcal{X}$ is infinite, then $\exists y \in
-\mathcal{X}, y \notin I_x$.
+In Figure~\ref{fig:time_bbs_gpu}  we highlight the performance  of the optimized
+BBS based  PRNG on GPU. Performances are  less important. On the  Tesla C1060 we
+obtain approximately 1.8GSample/s and on the GTX 280 about 1.6GSample/s.
  
-As $\left(\X_{\tau_\infty}, f\right)$ must be transitive, for all open nonempty
-sets $A$ and $B$, an integer $n$ must satisfy $f^{(n)}(A) \cap B \neq
-\varnothing$. However $\{x\}$ and $\{y\}$ are open sets and $y \notin I_x
-\Rightarrow \forall n, f^{(n)}\left( \{x\} \right) \cap \{y\} = \varnothing$.
-\end{proof}
+\begin{figure}[htbp]
+\begin{center}
+  \includegraphics[scale=.7]{curve_time_bbs_gpu.pdf}
+\end{center}
+\caption{Number of random numbers generated per second with the BBS based PRNG}
+\label{fig:time_bbs_gpu}
+\end{figure}
  
+Both  these  experiments allows  us  to conclude  that  it  is possible  to
+generate a  huge number of pseudorandom  numbers with the  xor-like version and
+about tens  times less with the BBS  based version. The former  version has only
+chaotic properties whereas the latter also has cryptographically properties.
  
  
  
  
  
-\section{Chaos on the order topology}
  
-\subsection{The phase space is an interval of the real line}
  
-\subsubsection{Toward a topological semiconjugacy}
+\section{Security Analysis}
+\label{sec:security analysis}
  
-In what follows, our intention is to establish, by using a topological
-semiconjugacy, that chaotic iterations over $\mathcal{X}$ can be described as
-iterations on a real interval. To do so, we must firstly introduce some
-notations and terminologies. 
  
-Let $\mathcal{S}_\mathsf{N}$ be the set of sequences belonging into $\llbracket
-1; \mathsf{N}\rrbracket$ and $\mathcal{X}_{\mathsf{N}} = \mathcal{S}_\mathsf{N}
-\times \B^\mathsf{N}$.
  
+In this section the concatenation of two strings $u$ and $v$ is classically
+denoted by $uv$.
+In a cryptographic context, a pseudorandom generator is a deterministic
+algorithm $G$ transforming strings  into strings and such that, for any
+seed $w$ of length $N$, $G(w)$ (the output of $G$ on the input $w$) has size
+$\ell_G(N)$ with $\ell_G(N)>N$.
+The notion of {\it secure} PRNGs can now be defined as follows. 
  
  \begin{definition}
-The function $\varphi: \mathcal{S}_{10} \times\mathds{B}^{10} \rightarrow \big[
-0, 2^{10} \big[$ is defined by:
-\begin{equation}
- \begin{array}{cccl}
-\varphi: & \mathcal{X}_{10} = \mathcal{S}_{10} \times\mathds{B}^{10}&
-\longrightarrow & \big[ 0, 2^{10} \big[ \\
- & (S,E) = \left((S^0, S^1, \hdots ); (E_0, \hdots, E_9)\right) & \longmapsto &
-\varphi \left((S,E)\right)
-\end{array}
-\end{equation}
-where $\varphi\left((S,E)\right)$ is the real number:
-\begin{itemize}
-\item whose integral part $e$ is $\displaystyle{\sum_{k=0}^9 2^{9-k} E_k}$, that
-is, the binary digits of $e$ are $E_0 ~ E_1 ~ \hdots ~ E_9$.
-\item whose decimal part $s$ is equal to $s = 0,S^0~ S^1~ S^2~ \hdots =
-\sum_{k=1}^{+\infty} 10^{-k} S^{k-1}.$ 
-\end{itemize}
+A cryptographic PRNG $G$ is secure if for any probabilistic polynomial time
+algorithm $D$, for any positive polynomial $p$, and for all sufficiently
+large $k$'s,
+$$| \mathrm{Pr}[D(G(U_k))=1]-Pr[D(U_{\ell_G(k)}=1]|< \frac{1}{p(N)},$$
+where $U_r$ is the uniform distribution over $\{0,1\}^r$ and the
+probabilities are taken over $U_N$, $U_{\ell_G(N)}$ as well as over the
+internal coin tosses of $D$. 
  \end{definition}
  
+Intuitively, it means that there is no polynomial time algorithm that can
+distinguish a perfect uniform random generator from $G$ with a non
+negligible probability. The interested reader is referred
+to~\cite[chapter~3]{Goldreich} for more information. Note that it is
+quite easily possible to change the function $\ell$ into any polynomial
+function $\ell^\prime$ satisfying $\ell^\prime(N)>N)$~\cite[Chapter 3.3]{Goldreich}.
+
+The generation schema developed in (\ref{equation Oplus}) is based on a
+pseudorandom generator. Let $H$ be a cryptographic PRNG. We may assume,
+without loss of generality, that for any string $S_0$ of size $N$, the size
+of $H(S_0)$ is $kN$, with $k>2$. It means that $\ell_H(N)=kN$. 
+Let $S_1,\ldots,S_k$ be the 
+strings of length $N$ such that $H(S_0)=S_1 \ldots S_k$ ($H(S_0)$ is the concatenation of
+the $S_i$'s). The cryptographic PRNG $X$ defined in (\ref{equation Oplus})
+is the algorithm mapping any string of length $2N$ $x_0S_0$ into the string
+$(x_0\oplus S_0 \oplus S_1)(x_0\oplus S_0 \oplus S_1\oplus S_2)\ldots
+(x_o\bigoplus_{i=0}^{i=k}S_i)$. Particularly one has $\ell_{X}(2N)=kN=\ell_H(N)$. 
+We claim now that if this PRNG is secure,
+then the new one is secure too.
  
+\begin{proposition}
+If $H$ is a secure cryptographic PRNG, then $X$ is a secure cryptographic
+PRNG too.
+\end{proposition}
  
-$\varphi$ realizes the association between a point of $\mathcal{X}_{10}$ and a
-real number into $\big[ 0, 2^{10} \big[$. We must now translate the chaotic
-iterations $\Go$ on this real interval. To do so, two intermediate functions
-over $\big[ 0, 2^{10} \big[$ must be introduced:
-
-
-\begin{definition}
-\label{def:e et s}
-Let $x \in \big[ 0, 2^{10} \big[$ and:
-\begin{itemize}
-\item $e_0, \hdots, e_9$ the binary digits of the integral part of $x$:
-$\displaystyle{\lfloor x \rfloor = \sum_{k=0}^{9} 2^{9-k} e_k}$.
-\item $(s^k)_{k\in \mathds{N}}$ the digits of $x$, where the chosen decimal
-decomposition of $x$ is the one that does not have an infinite number of 9: 
-$\displaystyle{x = \lfloor x \rfloor + \sum_{k=0}^{+\infty} s^k 10^{-k-1}}$.
-\end{itemize}
-$e$ and $s$ are thus defined as follows:
-\begin{equation}
-\begin{array}{cccl}
-e: & \big[ 0, 2^{10} \big[ & \longrightarrow & \mathds{B}^{10} \\
- & x & \longmapsto & (e_0, \hdots, e_9)
-\end{array}
+\begin{proof}
+The proposition is proved by contraposition. Assume that $X$ is not
+secure. By Definition, there exists a polynomial time probabilistic
+algorithm $D$, a positive polynomial $p$, such that for all $k_0$ there exists
+$N\geq \frac{k_0}{2}$ satisfying 
+$$| \mathrm{Pr}[D(X(U_{2N}))=1]-\mathrm{Pr}[D(U_{kN}=1]|\geq \frac{1}{p(2N)}.$$
+We describe a new probabilistic algorithm $D^\prime$ on an input $w$ of size
+$kN$:
+\begin{enumerate}
+\item Decompose $w$ into $w=w_1\ldots w_{k}$, where each $w_i$ has size $N$.
+\item Pick a string $y$ of size $N$ uniformly at random.
+\item Compute $z=(y\oplus w_1)(y\oplus w_1\oplus w_2)\ldots (y
+  \bigoplus_{i=1}^{i=k} w_i).$
+\item Return $D(z)$.
+\end{enumerate}
+
+
+Consider  for each $y\in \mathbb{B}^{kN}$ the function $\varphi_{y}$
+from $\mathbb{B}^{kN}$ into $\mathbb{B}^{kN}$ mapping $w=w_1\ldots w_k$
+(each $w_i$ has length $N$) to 
+$(y\oplus w_1)(y\oplus w_1\oplus w_2)\ldots (y
+  \bigoplus_{i=1}^{i=k_1} w_i).$ By construction, one has for every $w$,
+\begin{equation}\label{PCH-1}
+D^\prime(w)=D(\varphi_y(w)),
  \end{equation}
-and
-\begin{equation}
- \begin{array}{cccc}
-s: & \big[ 0, 2^{10} \big[ & \longrightarrow & \llbracket 0, 9
-\rrbracket^{\mathds{N}} \\
- & x & \longmapsto & (s^k)_{k \in \mathds{N}}
-\end{array}
+where $y$ is randomly generated. 
+Moreover, for each $y$, $\varphi_{y}$ is injective: if 
+$(y\oplus w_1)(y\oplus w_1\oplus w_2)\ldots (y\bigoplus_{i=1}^{i=k_1}
+w_i)=(y\oplus w_1^\prime)(y\oplus w_1^\prime\oplus w_2^\prime)\ldots
+(y\bigoplus_{i=1}^{i=k} w_i^\prime)$, then for every $1\leq j\leq k$,
+$y\bigoplus_{i=1}^{i=j} w_i^\prime=y\bigoplus_{i=1}^{i=j} w_i$. It follows,
+by a direct induction, that $w_i=w_i^\prime$. Furthermore, since $\mathbb{B}^{kN}$
+is finite, each $\varphi_y$ is bijective. Therefore, and using (\ref{PCH-1}),
+one has
+\begin{equation}\label{PCH-2}
+\mathrm{Pr}[D^\prime(U_{kN})=1]=\mathrm{Pr}[D(\varphi_y(U_{kN}))=1]=\mathrm{Pr}[D(U_{kN})=1].
  \end{equation}
-\end{definition}
-
-We are now able to define the function $g$, whose goal is to translate the
-chaotic iterations $\Go$ on an interval of $\mathds{R}$.
  
-\begin{definition}
-$g:\big[ 0, 2^{10} \big[ \longrightarrow \big[ 0, 2^{10} \big[$ is defined by:
-\begin{equation}
-\begin{array}{cccc}
-g: & \big[ 0, 2^{10} \big[ & \longrightarrow & \big[ 0, 2^{10} \big[ \\
- & x & \longmapsto & g(x)
-\end{array}
+Now, using (\ref{PCH-1}) again, one has  for every $x$,
+\begin{equation}\label{PCH-3}
+D^\prime(H(x))=D(\varphi_y(H(x))),
  \end{equation}
-where g(x) is the real number of $\big[ 0, 2^{10} \big[$ defined bellow:
-\begin{itemize}
-\item its integral part has a binary decomposition equal to $e_0', \hdots,
-e_9'$, with:
- \begin{equation}
-e_i' = \left\{
-\begin{array}{ll}
-e(x)_i & \textrm{ if } i \neq s^0\\
-e(x)_i + 1 \textrm{ (mod 2)} & \textrm{ if } i = s^0\\
-\end{array}
-\right.
+where $y$ is randomly generated. By construction, $\varphi_y(H(x))=X(yx)$,
+thus
+\begin{equation}\label{PCH-3}
+D^\prime(H(x))=D(yx),
  \end{equation}
-\item whose decimal part is $s(x)^1, s(x)^2, \hdots$
-\end{itemize}
-\end{definition}
-
-\bigskip
+where $y$ is randomly generated. 
+It follows that 
  
-
-In other words, if $x = \displaystyle{\sum_{k=0}^{9} 2^{9-k} e_k + 
-\sum_{k=0}^{+\infty} s^{k} ~10^{-k-1}}$, then:
-\begin{equation}
-g(x) =
-\displaystyle{\sum_{k=0}^{9} 2^{9-k} (e_k + \delta(k,s^0) \textrm{ (mod 2)}) + 
-\sum_{k=0}^{+\infty} s^{k+1} 10^{-k-1}}. 
+\begin{equation}\label{PCH-4}
+\mathrm{Pr}[D^\prime(H(U_{N}))=1]=\mathrm{Pr}[D(U_{2N})=1].
  \end{equation}
-
-
-\subsubsection{Defining a metric on $\big[ 0, 2^{10} \big[$}
-
-Numerous metrics can be defined on the set $\big[ 0, 2^{10} \big[$, the most
-usual one being the Euclidian distance recalled bellow:
-
-\begin{notation}
-\index{distance!euclidienne}
-$\Delta$ is the Euclidian distance on $\big[ 0, 2^{10} \big[$, that is,
-$\Delta(x,y) = |y-x|^2$.
-\end{notation}
-
-\medskip
-
-This Euclidian distance does not reproduce exactly the notion of proximity
-induced by our first distance $d$ on $\X$. Indeed $d$ is finer than $\Delta$.
-This is the reason why we have to introduce the following metric:
-
-
-
-\begin{definition}
-Let $x,y \in \big[ 0, 2^{10} \big[$.
-$D$ denotes the function from $\big[ 0, 2^{10} \big[^2$ to $\mathds{R}^+$
-defined by: $D(x,y) = D_e\left(e(x),e(y)\right) + D_s\left(s(x),s(y)\right)$,
-where:
-\begin{center}
-$\displaystyle{D_e(E,\check{E}) = \sum_{k=0}^\mathsf{9} \delta (E_k,
-\check{E}_k)}$, ~~and~ $\displaystyle{D_s(S,\check{S}) = \sum_{k = 1}^\infty
-\dfrac{|S^k-\check{S}^k|}{10^k}}$.
-\end{center}
-\end{definition}
-
-\begin{proposition}
-$D$ is a distance on $\big[ 0, 2^{10} \big[$.
-\end{proposition}
-
-\begin{proof}
-The three axioms defining a distance must be checked.
-\begin{itemize}
-\item $D \geqslant 0$, because everything is positive in its definition. If
-$D(x,y)=0$, then $D_e(x,y)=0$, so the integral parts of $x$ and $y$ are equal
-(they have the same binary decomposition). Additionally, $D_s(x,y) = 0$, then
-$\forall k \in \mathds{N}^*, s(x)^k = s(y)^k$. In other words, $x$ and $y$ have
-the same $k-$th decimal digit, $\forall k \in \mathds{N}^*$. And so $x=y$.
-\item $D(x,y)=D(y,x)$.
-\item Finally, the triangular inequality is obtained due to the fact that both
-$\delta$ and $\Delta(x,y)=|x-y|$ satisfy it.
-\end{itemize}
+ From (\ref{PCH-2}) and (\ref{PCH-4}), one can deduce that
+there exist a polynomial time probabilistic
+algorithm $D^\prime$, a positive polynomial $p$, such that for all $k_0$ there exists
+$N\geq \frac{k_0}{2}$ satisfying 
+$$| \mathrm{Pr}[D(H(U_{N}))=1]-\mathrm{Pr}[D(U_{kN}=1]|\geq \frac{1}{p(2N)},$$
+proving that $H$ is not secure, a contradiction. 
  \end{proof}
  
  
-The convergence of sequences according to $D$ is not the same than the usual
-convergence related to the Euclidian metric. For instance, if $x^n \to x$
-according to $D$, then necessarily the integral part of each $x^n$ is equal to
-the integral part of $x$ (at least after a given threshold), and the decimal
-part of $x^n$ corresponds to the one of $x$ ``as far as required''.
-To illustrate this fact, a comparison between $D$ and the Euclidian distance is
-given Figure \ref{fig:comparaison de distances}. These illustrations show that
-$D$ is richer and more refined than the Euclidian distance, and thus is more
-precise.
-
-
-\begin{figure}[t]
-\begin{center}
-  \subfigure[Function $x \to dist(x;1,234) $ on the interval
-$(0;5)$.]{\includegraphics[scale=.35]{DvsEuclidien.pdf}}\quad
-  \subfigure[Function $x \to dist(x;3) $ on the interval
-$(0;5)$.]{\includegraphics[scale=.35]{DvsEuclidien2.pdf}}
-\end{center}
-\caption{Comparison between $D$ (in blue) and the Euclidian distane (in green).}
-\label{fig:comparaison de distances}
-\end{figure}
-
-
-
-
-\subsubsection{The semiconjugacy}
-
-It is now possible to define a topological semiconjugacy between $\mathcal{X}$
-and an interval of $\mathds{R}$:
-
-\begin{theorem}
-Chaotic iterations on the phase space $\mathcal{X}$ are simple iterations on
-$\mathds{R}$, which is illustrated by the semiconjugacy of the diagram bellow:
-\begin{equation*}
-\begin{CD}
-\left(~\mathcal{S}_{10} \times\mathds{B}^{10}, d~\right) @>G_{f_0}>>
-\left(~\mathcal{S}_{10} \times\mathds{B}^{10}, d~\right)\\
-    @V{\varphi}VV                    @VV{\varphi}V\\
-\left( ~\big[ 0, 2^{10} \big[, D~\right)  @>>g> \left(~\big[ 0, 2^{10} \big[,
-D~\right)
-\end{CD}
-\end{equation*}
-\end{theorem}
-
-\begin{proof}
-$\varphi$ has been constructed in order to be continuous and onto.
-\end{proof}
-
-In other words, $\mathcal{X}$ is approximately equal to $\big[ 0, 2^\mathsf{N}
-\big[$.
  
  
+\section{A Cryptographically Secure PRNG for GPU}
+\label{sec:CSGPU}
+It is  possible to build a  cryptographically secure prng based  on the previous
+algorithm (algorithm~\ref{algo:gpu_kernel2}).   It simply consists  in replacing
+the  {\it  xor-like} algorithm  by  another  cryptographically  secure prng.  In
+practice, we suggest  to use the BBS algorithm~\cite{BBS}  which takes the form:
+$$x_{n+1}=x_n^2~ mod~ M$$  where $M$ is the product of  two prime numbers. Those
+prime numbers  need to be congruent  to 3 modulus  4. In practice, this  PRNG is
+known to  be slow and  not efficient for  the generation of random  numbers. For
+current  GPU   cards,  the  modulus   operation  is  the  most   time  consuming
+operation. So in  order to obtain quite reasonable  performances, it is required
+to use only modulus on 32  bits integer numbers. Consequently $x_n^2$ need to be
+less than  $2^{32}$ and the  number $M$  need to be  less than $2^{16}$.   So in
+pratice we can  choose prime numbers around 256 that are  congruent to 3 modulus
+4.  With  32 bits numbers,  only the  4 least significant  bits of $x_n$  can be
+chosen  (the   maximum  number  of   undistinguishing  is  less  or   equals  to
+$log_2(log_2(x_n))$). So  to generate a 32 bits  number, we need to  use 8 times
+the BBS algorithm, with different combinations of $M$ is required.
  
+Currently this PRNG does not succeed to pass all the tests of TestU01.
  
  
-
-\subsection{Study of the chaotic iterations described as a real function}
-
-
-\begin{figure}[t]
-\begin{center}
-  \subfigure[ICs on the interval
-$(0,9;1)$.]{\includegraphics[scale=.35]{ICs09a1.pdf}}\quad
-  \subfigure[ICs on the interval
-$(0,7;1)$.]{\includegraphics[scale=.35]{ICs07a95.pdf}}\\
-  \subfigure[ICs on the interval
-$(0,5;1)$.]{\includegraphics[scale=.35]{ICs05a1.pdf}}\quad
-  \subfigure[ICs on the interval
-$(0;1)$]{\includegraphics[scale=.35]{ICs0a1.pdf}}
-\end{center}
-\caption{Representation of the chaotic iterations.}
-\label{fig:ICs}
-\end{figure}
-
-
-
-
-\begin{figure}[t]
-\begin{center}
-  \subfigure[ICs on the interval
-$(510;514)$.]{\includegraphics[scale=.35]{ICs510a514.pdf}}\quad
-  \subfigure[ICs on the interval
-$(1000;1008)$]{\includegraphics[scale=.35]{ICs1000a1008.pdf}}
-\end{center}
-\caption{ICs on small intervals.}
-\label{fig:ICs2}
-\end{figure}
-
-\begin{figure}[t]
-\begin{center}
-  \subfigure[ICs on the interval
-$(0;16)$.]{\includegraphics[scale=.3]{ICs0a16.pdf}}\quad
-  \subfigure[ICs on the interval 
-$(40;70)$.]{\includegraphics[scale=.45]{ICs40a70.pdf}}\quad
-\end{center}
-\caption{General aspect of the chaotic iterations.}
-\label{fig:ICs3}
-\end{figure}
-
-
-We have written a Python program to represent the chaotic iterations with the
-vectorial negation on the real line $\mathds{R}$. Various representations of
-these CIs are given in Figures \ref{fig:ICs}, \ref{fig:ICs2} and \ref{fig:ICs3}.
-It can be remarked that the function $g$ is a piecewise linear function: it is
-linear on each interval having the form $\left[ \dfrac{n}{10},
-\dfrac{n+1}{10}\right[$, $n \in \llbracket 0;2^{10}\times 10 \rrbracket$ and its
-slope is equal to 10. Let us justify these claims:
-
-\begin{proposition}
-\label{Prop:derivabilite des ICs}
-Chaotic iterations $g$ defined on $\mathds{R}$ have derivatives of all orders on
-$\big[ 0, 2^{10} \big[$, except on the 10241 points in $I$ defined by $\left\{
-\dfrac{n}{10} ~\big/~ n \in \llbracket 0;2^{10}\times 10\rrbracket \right\}$.
-
-Furthermore, on each interval of the form $\left[ \dfrac{n}{10},
-\dfrac{n+1}{10}\right[$, with $n \in \llbracket 0;2^{10}\times 10 \rrbracket$,
-$g$ is a linear function, having a slope equal to 10: $\forall x \notin I,
-g'(x)=10$.
-\end{proposition}
-
-
-\begin{proof}
-Let $I_n = \left[ \dfrac{n}{10}, \dfrac{n+1}{10}\right[$, with $n \in \llbracket
-0;2^{10}\times 10 \rrbracket$. All the points of $I_n$ have the same integral
-prat $e$ and the same decimal part $s^0$: on the set $I_n$,  functions $e(x)$
-and $x \mapsto s(x)^0$ of Definition \ref{def:e et s} only depend on $n$. So all
-the images $g(x)$ of these points $x$:
-\begin{itemize}
-\item Have the same integral part, which is $e$, except probably the bit number
-$s^0$. In other words, this integer has approximately the same binary
-decomposition than $e$, the sole exception being the digit $s^0$ (this number is
-then either $e+2^{10-s^0}$ or $e-2^{10-s^0}$, depending on the parity of $s^0$,
-\emph{i.e.}, it is equal to $e+(-1)^{s^0}\times 2^{10-s^0}$).
-\item A shift to the left has been applied to the decimal part $y$, losing by
-doing so the common first digit $s^0$. In other words, $y$ has been mapped into
-$10\times y - s^0$.
-\end{itemize}
-To sum up, the action of $g$ on the points of $I$ is as follows: first, make a
-multiplication by 10, and second, add the same constant to each term, which is
-$\dfrac{1}{10}\left(e+(-1)^{s^0}\times 2^{10-s^0}\right)-s^0$.
-\end{proof}
-
-\begin{remark}
-Finally, chaotic iterations are elements of the large family of functions that
-are both chaotic and piecewise linear (like the tent map).
-\end{remark}
-
-
-
-\subsection{Comparison of the two metrics on $\big[ 0, 2^\mathsf{N} \big[$}
-
-The two propositions bellow allow to compare our two distances on $\big[ 0,
-2^\mathsf{N} \big[$:
-
-\begin{proposition}
-Id: $\left(~\big[ 0, 2^\mathsf{N} \big[,\Delta~\right) \to \left(~\big[ 0,
-2^\mathsf{N} \big[, D~\right)$ is not continuous. 
-\end{proposition}
-
-\begin{proof}
-The sequence $x^n = 1,999\hdots 999$ constituted by $n$ 9 as decimal part, is
-such that:
-\begin{itemize}
-\item $\Delta (x^n,2) \to 0.$
-\item But $D(x^n,2) \geqslant 1$, then $D(x^n,2)$ does not converge to 0.
-\end{itemize}
-
-The sequential characterization of the continuity concludes the demonstration.
-\end{proof}
-
-
-
-A contrario:
-
-\begin{proposition}
-Id: $\left(~\big[ 0, 2^\mathsf{N} \big[,D~\right) \to \left(~\big[ 0,
-2^\mathsf{N} \big[, \Delta ~\right)$ is a continuous fonction. 
-\end{proposition}
-
-\begin{proof}
-If $D(x^n,x) \to 0$, then $D_e(x^n,x) = 0$ at least for $n$ larger than a given
-threshold, because $D_e$ only returns integers. So, after this threshold, the
-integral parts of all the $x^n$ are equal to the integral part of $x$. 
-
-Additionally, $D_s(x^n, x) \to 0$, then $\forall k \in \mathds{N}^*, \exists N_k
-\in \mathds{N}, n \geqslant N_k \Rightarrow D_s(x^n,x) \leqslant 10^{-k}$. This
-means that for all $k$, an index $N_k$ can be found such that, $\forall n
-\geqslant N_k$, all the $x^n$ have the same $k$ firsts digits, which are the
-digits of $x$. We can deduce the convergence $\Delta(x^n,x) \to 0$, and thus the
-result.
-\end{proof}
-
-The conclusion of these propositions is that the proposed metric is more precise
-than the Euclidian distance, that is:
-
-\begin{corollary}
-$D$ is finer than the Euclidian distance $\Delta$.
-\end{corollary}
-
-This corollary can be reformulated as follows:
-
-\begin{itemize}
-\item The topology produced by $\Delta$ is a subset of the topology produced by
-$D$.
-\item $D$ has more open sets than $\Delta$.
-\item It is harder to converge for the topology $\tau_D$ inherited by $D$, than
-to converge with the one inherited by $\Delta$, which is denoted here by
-$\tau_\Delta$.
-\end{itemize}
-
-
-\subsection{Chaos of the chaotic iterations on $\mathds{R}$}
-\label{chpt:Chaos des itérations chaotiques sur R}
-
-
-
-\subsubsection{Chaos according to Devaney}
-
-We have recalled previously that the chaotic iterations $\left(\Go,
-\mathcal{X}_d\right)$ are chaotic according to the formulation of Devaney. We
-can deduce that they are chaotic on $\mathds{R}$ too, when considering the order
-topology, because:
-\begin{itemize}
-\item $\left(\Go, \mathcal{X}_d\right)$ and $\left(g, \big[ 0, 2^{10}
-\big[_D\right)$ are semiconjugate by $\varphi$,
-\item Then $\left(g, \big[ 0, 2^{10} \big[_D\right)$ is a system chaotic
-according to Devaney, because the semiconjugacy preserve this character.
-\item But the topology generated by $D$ is finer than the topology generated by
-the Euclidian distance $\Delta$ -- which is the order topology.
-\item According to Theorem \ref{Th:chaos et finesse}, we can deduce that the
-chaotic iterations $g$ are indeed chaotic, as defined by Devaney, for the order
-topology on $\mathds{R}$.
-\end{itemize}
-
-This result can be formulated as follows.
-
-\begin{theorem}
-\label{th:IC et topologie de l'ordre}
-The chaotic iterations $g$ on $\mathds{R}$ are chaotic according to the
-Devaney's formulation, when $\mathds{R}$ has his usual topology, which is the
-order topology.
-\end{theorem}
-
-Indeed this result is weaker than the theorem establishing the chaos for the
-finer topology $d$. However the Theorem \ref{th:IC et topologie de l'ordre}
-still remains important. Indeed, we have studied in our previous works a set
-different from the usual set of study ($\mathcal{X}$ instead of $\mathds{R}$),
-in order to be as close as possible from the computer: the properties of
-disorder proved theoretically will then be preserved when computing. However, we
-could wonder whether this change does not lead to a disorder of a lower quality.
-In other words, have we replaced a situation of a good disorder lost when
-computing, to another situation of a disorder preserved but of bad quality.
-Theorem \ref{th:IC et topologie de l'ordre} prove exactly the contrary.
- 
+\section{Conclusion}
  
  
+In  this  paper  we have  presented  a  new  class  of  PRNGs based  on  chaotic
+iterations. We have proven that these PRNGs are chaotic in the sense of Devenay.
+We also propose a PRNG cryptographically secure and its implementation on GPU.
  
+An  efficient implementation  on  GPU based  on  a xor-like  PRNG  allows us  to
+generate   a  huge   number   of  pseudorandom   numbers   per  second   (about
+20Gsample/s). This PRNG succeeds to pass the hardest batteries of TestU01.
  
+In future  work we plan to  extend this work  for parallel PRNG for  clusters or
+grid computing. We also plan to improve  the BBS version in order to succeed all
+the tests of TestU01.
  
  
  
-\section{Conclusion}
-\bibliographystyle{plain}
+\bibliographystyle{plain} 
  \bibliography{mabase}
  \end{document}