X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/prng_gpu.git/blobdiff_plain/9b6557073a4a37c514adbdccffe506f12cede15e..81be58cc120f6a94c5f98c292f65a74fc8df5973:/prng_gpu.tex?ds=inline diff --git a/prng_gpu.tex b/prng_gpu.tex index ccb0c95..1c7c9fe 100644 --- a/prng_gpu.tex +++ b/prng_gpu.tex @@ -34,29 +34,183 @@ \newcommand{\alert}[1]{\begin{color}{blue}\textit{#1}\end{color}} -\title{Efficient Generation of Pseudo-Random Bumbers based on Chaotic Iterations -on GPU} +\title{Efficient and Cryptographically Secure Generation of Chaotic Pseudorandom Numbers on GPU} \begin{document} -\author{Jacques M. Bahi, Rapha\"{e}l Couturier, and Christophe -Guyeux\thanks{Authors in alphabetic order}} - +\author{Jacques M. Bahi, Rapha\"{e}l Couturier, Christophe +Guyeux, and Pierre-Cyrille Heam\thanks{Authors in alphabetic order}} + \maketitle \begin{abstract} -This is the abstract +In this paper we present a new pseudorandom number generator (PRNG) on +graphics processing units (GPU). This PRNG is based on the so-called chaotic iterations. It +is first proven to be chaotic according to Devaney's formulation. We then propose an efficient +implementation for GPU that successfully passes the {\it BigCrush} tests, deemed to be the hardest +battery of tests in TestU01. Experiments show that this PRNG can generate +about 20 billion random numbers per second on Tesla C1060 and NVidia GTX280 +cards. +Finally, it is established that, under reasonable assumptions, the proposed PRNG can be cryptographically +secure. + + \end{abstract} \section{Introduction} -Interest of chaotic iterations for generating random numbers\\ -Interest of generating random numbers on GPU -\alert{RC, a short state of the art on GPU PRNGs?} -... - +Randomness is of importance in many fields, such as scientific simulations or cryptography. 
+``Random numbers'' can mainly be generated either by a deterministic and reproducible algorithm +called a pseudorandom number generator (PRNG), or by a physical non-deterministic +process having all the characteristics of a random noise, called a truly random number +generator (TRNG). +In this paper, we focus on reproducible generators, useful for instance in +Monte-Carlo based simulators or in several cryptographic schemes. +These domains need PRNGs that are statistically irreproachable. +In some fields, such as numerical simulations, speed is a strong requirement +that is usually attained by using parallel architectures. In that case, +a recurrent problem is that a degradation of the statistical qualities is often +reported when a good PRNG is parallelized. +This is why ad-hoc PRNGs for each possible architecture must be found to +achieve both speed and randomness. +On the other hand, speed is not the main requirement in cryptography: the main +need is to define \emph{secure} generators able to withstand malicious +attacks. Roughly speaking, an attacker should not be able in practice to +distinguish between numbers obtained with the secure generator and a true random +sequence. +Finally, a small part of the community working in this domain focuses on a +third requirement, namely to define chaotic generators. +The main idea is to take advantage of a chaotic dynamical system to obtain a +generator that is unpredictable, disordered, sensitive to its seed, or in other words chaotic. +Their desire is to map a given chaotic dynamics into a sequence that seems random +and unassailable due to chaos. +However, the chaotic maps used as a pattern are defined on the real line +whereas computers deal with finite precision numbers. +This distortion leads to a degradation of both chaotic properties and speed. 
+Furthermore, authors of such chaotic generators often claim that their PRNG +is secure due to its chaotic properties, but there is no obvious relation +between chaos and security as it is understood in cryptography. +This is why the use of chaos for PRNGs remains marginal and disputable. + +The authors' opinion is that topological properties of disorder, as they are +properly defined in the mathematical theory of chaos, can reinforce the quality +of a PRNG. However, they are not a substitute for security or statistical perfection. +Indeed, from the authors' point of view, such properties can be useful in the following two situations. On the +one hand, a post-treatment based on a chaotic dynamical system can be applied +to a statistically defective PRNG, in order to improve its statistical +properties. Such an improvement can be found, for instance, in~\cite{bgw09:ip,bcgr11:ip}. +On the other hand, chaos can be added to a fast, statistically perfect PRNG and/or a +cryptographically secure one, in cases where chaos can be of interest, +\emph{only if these latter properties are not lost during +the proposed post-treatment}. Such an assumption is behind this research work. +It leads to the attempt to define a +family of PRNGs that are chaotic while being fast and statistically perfect, +or cryptographically secure. +Let us finish this paragraph by noting that, in this paper, +statistical perfection refers to the ability to pass the whole +{\it BigCrush} battery of tests, which is widely considered the most +stringent statistical evaluation of a sequence claimed to be random. +This battery can be found in the well-known TestU01 package~\cite{LEcuyerS07}. +Chaos, for its part, refers to the well-established definition of a +chaotic dynamical system proposed by Devaney~\cite{Devaney}. + + +In a previous work~\cite{bgw09:ip,guyeux10} we proposed a post-treatment on PRNGs making them behave +as a chaotic dynamical system. 
Such a post-treatment leads to a new category of +PRNGs. We have shown that proofs of Devaney's chaos can be established for this +family, and that the sequence obtained after this post-treatment can pass the +NIST~\cite{Nist10}, DieHARD~\cite{Marsaglia1996}, and TestU01~\cite{LEcuyerS07} batteries of tests, even if the inputted generators +cannot. +The aim of this paper is to greatly improve the speed of the formerly +proposed generator, without any loss of chaos or statistical properties. +In particular, a version of this PRNG on graphics processing units (GPU) +is proposed. +Although GPUs were initially designed to accelerate +the manipulation of images, they are nowadays commonly used in many scientific +applications. Therefore, it is important to be able to generate pseudorandom +numbers inside a GPU when a scientific application runs in it. This remark +motivates our proposal of a chaotic and statistically perfect PRNG for GPU. +Such a device +allows us to generate almost 20 billion pseudorandom numbers per second. +Last, but not least, we show that the proposed post-treatment preserves the +cryptographic security of the inputted PRNG, when the latter has such a +property. + +The remainder of this paper is organized as follows. In Section~\ref{section:related + works} we review some GPU implementations of PRNGs. Section~\ref{section:BASIC + RECALLS} gives some basic reminders on Devaney's well-known formulation of chaos, + and on an iteration process called ``chaotic +iterations'' on which the post-treatment is based. +Proofs of chaos are given in Section~\ref{sec:pseudorandom}. +Section~\ref{sec:efficient prng} presents an efficient +implementation of this chaotic PRNG on a CPU, whereas Section~\ref{sec:efficient prng + gpu} describes the GPU implementation. +These generators are evaluated experimentally in +Section~\ref{sec:experiments}. 
+We show in Section~\ref{sec:security analysis} that, if the inputted +generator is cryptographically secure, then so is the +generator provided by the post-treatment. +Such a proof leads to the proposition of a cryptographically secure and +chaotic generator on GPU based on the famous Blum Blum Shub +in Section~\ref{sec:CSGPU}. +This research work ends with a conclusion section, in which the contribution is +summarized and intended future work is presented. + + + + +\section{Related works on GPU based PRNGs} +\label{section:related works} + +Numerous research works on defining GPU based PRNGs have already been proposed in the +literature, so that an exhaustive review is impossible. +This is why the authors of this document only refer to the most significant attempts +in this domain, from their subjective point of view. +The quantity of pseudorandom numbers generated per second is mentioned here +only when the information is given in the related work. +A million numbers per second will be simply written as +1MSample/s whereas a billion numbers per second is 1GSample/s. + +In \cite{Pang:2008:cec} a PRNG based on cellular automata is defined +that requires neither high-precision integer arithmetic nor any bitwise +operations. The authors can generate about +3.2MSamples/s on a GeForce 7800 GTX GPU, which is quite an old card now. +However, there is neither a mention of statistical tests nor any proof of +chaos or cryptography in this document. + +In \cite{ZRKB10}, the authors propose different versions of efficient GPU PRNGs +based on Lagged Fibonacci or Hybrid Taus. They have used these +PRNGs for Langevin simulations of biomolecules fully implemented on +GPU. The performance of the GPU versions is far better than that obtained with a +CPU, and these PRNGs succeed in passing the {\it BigCrush} battery of TestU01. +However, the evaluations of the proposed PRNGs are only statistical ones. 
+ + +The authors of~\cite{conf/fpga/ThomasHL09} have studied the implementation of some +PRNGs on different computing architectures: CPU, field-programmable gate array +(FPGA), massively parallel processors, and GPU. This study is of interest, because +the performance of the same PRNGs on different architectures is compared. +FPGA appears to be the fastest and the most +efficient architecture, providing the highest number of generated pseudorandom numbers +per joule. +However, we can notice that the authors can ``only'' generate between 11 and 16GSamples/s +with a GTX 280 GPU, which should be compared with +the results presented in this document. +We can also remark that the PRNGs proposed in~\cite{conf/fpga/ThomasHL09} are only +able to pass the {\it Crush} battery, which is much easier than the {\it BigCrush} one. + +Lastly, NVIDIA has developed a CUDA library for the generation of pseudorandom numbers called +Curand~\cite{curand11}. Several PRNGs are implemented, among +others +Xorwow~\cite{Marsaglia2003} and some variants of Sobol. The tests reported show that +their fastest version provides 15GSamples/s on the new Fermi C2050 card. +But their PRNGs cannot pass the whole TestU01 battery (only one test fails). +\newline +\newline +We can finally remark that, to the best of our knowledge, no GPU implementation has been proven to be chaotic, and cryptographic security is, surprisingly, never considered. \section{Basic Recalls} \label{section:BASIC RECALLS} + This section is devoted to basic definitions and terminologies in the fields of topological chaos and chaotic iterations. \subsection{Devaney's Chaotic Dynamical Systems} @@ -271,17 +425,18 @@ if and only if $\Gamma(f)$ is strongly connected. \end{theorem} This result of chaos has led us to study the possibility of building a -pseudo-random number generator (PRNG) based on the chaotic iterations. +pseudorandom number generator (PRNG) based on the chaotic iterations. 
As $G_f$, defined on the domain $\llbracket 1 ; \mathsf{N} \rrbracket^{\mathds{N}} \times \mathds{B}^\mathsf{N}$, is built from Boolean networks $f : \mathds{B}^\mathsf{N} \rightarrow \mathds{B}^\mathsf{N}$, we can preserve the theoretical properties on $G_f$ during implementations (due to the discrete nature of $f$). It is as if $\mathds{B}^\mathsf{N}$ represents the memory of the computer whereas $\llbracket 1 ; \mathsf{N} -\rrbracket^{\mathds{N}}$ is its input stream (the seeds, for instance). +\rrbracket^{\mathds{N}}$ is its input stream (the seeds, for instance, in PRNG, or a physical noise in TRNG). -\section{Application to Pseudo-Randomness} +\section{Application to Pseudorandomness} +\label{sec:pseudorandom} -\subsection{A First Pseudo-Random Number Generator} +\subsection{A First Pseudorandom Number Generator} We have proposed in~\cite{bgw09:ip} a new family of generators that receives two PRNGs as inputs. These two generators are mixed with chaotic iterations, @@ -326,11 +481,11 @@ return $y$\; This generator is summarized in Algorithm~\ref{CI Algorithm}. -It takes as input: a function $f$; +It takes as input: a Boolean function $f$ satisfying Theorem~\ref{Th:Caractérisation des IC chaotiques}; an integer $b$, ensuring that the number of executed iterations is at least $b$ and at most $2b+1$; and an initial configuration $x^0$. It returns the new generated configuration $x$. Internally, it embeds two -\textit{XORshift}$(k)$ PRNGs \cite{Marsaglia2003} that returns integers +\textit{XORshift}$(k)$ PRNGs~\cite{Marsaglia2003} that return integers uniformly distributed in $\llbracket 1 ; k \rrbracket$. \textit{XORshift} is a category of very fast PRNGs designed by George Marsaglia, @@ -351,7 +506,7 @@ We have proven in \cite{bcgr11:ip} that, if and only if $M$ is a doubly stochastic matrix. \end{theorem} -This former generator as successively passed various batteries of statistical tests, as the NIST tests~\cite{bcgr11:ip}. 
+This former generator has successfully passed various batteries of statistical tests, such as the NIST~\cite{bcgr11:ip}, DieHARD~\cite{Marsaglia1996}, and TestU01~\cite{LEcuyerS07}. \subsection{Improving the Speed of the Former Generator} @@ -406,7 +561,7 @@ the vectorial negation, leads to a speed improvement. However, proofs of chaos obtained in~\cite{bg10:ij} have been established only for chaotic iterations of the form presented in Definition \ref{Def:chaotic iterations}. The question is now to determine whether the -use of more general chaotic iterations to generate pseudo-random numbers +use of more general chaotic iterations to generate pseudorandom numbers faster does not degrade their topological chaos properties. \subsection{Proofs of Chaos of the General Formulation of the Chaotic Iterations} @@ -650,18 +805,28 @@ have $d((S,E),(\tilde S,E))<\epsilon$. \section{Efficient PRNG based on Chaotic Iterations} +\label{sec:efficient prng} + +Based on the proof presented in the previous section, it is now possible to +improve the speed of the generator formerly presented in~\cite{bgw09:ip,guyeux10}. +The first idea is to consider +that the provided strategy is a pseudorandom Boolean vector obtained by a +given PRNG. +An iteration of the system is simply the bitwise exclusive or between +the last computed state and the current strategy. +Topological properties of disorder exhibited by chaotic +iterations can be inherited by the inputted generator, in the hope of +obtaining some statistical improvements while preserving speed. -In order to implement efficiently a PRNG based on chaotic iterations it is -possible to improve previous works [ref]. One solution consists in considering -that the strategy used contains all the bits for which the negation is -achieved out. Then in order to apply the negation on these bits we can simply -apply the xor operator between the current number and the strategy. In -order to obtain the strategy we also use a classical PRNG. 
-Here is an example with 16-bits numbers showing how the bitwise operations +Let us give an example using 16-bits numbers, to clearly understand how the bitwise xor operations are -applied. Suppose that $x$ and the strategy $S^i$ are defined in binary mode. -Then the following table shows the result of $x$ xor $S^i$. +done. +Suppose that $x$ and the strategy $S^i$ are given as +binary vectors. +Table~\ref{TableExemple} shows the result of $x \oplus S^i$. + +\begin{table} $$ \begin{array}{|cc|cccccccccccccccc|} \hline @@ -675,35 +840,13 @@ x \oplus S^i&=&1&1&0&1&1&1&0&0&0&1&1&1&0&1&0&1\\ \hline \end{array} $$ +\caption{Example of an arbitrary round of the proposed generator} +\label{TableExemple} +\end{table} -%% \begin{figure}[htbp] -%% \begin{center} -%% \fbox{ -%% \begin{minipage}{14cm} -%% unsigned int CIprng() \{\\ -%% static unsigned int x = 123123123;\\ -%% unsigned long t1 = xorshift();\\ -%% unsigned long t2 = xor128();\\ -%% unsigned long t3 = xorwow();\\ -%% x = x\textasciicircum (unsigned int)t1;\\ -%% x = x\textasciicircum (unsigned int)(t2$>>$32);\\ -%% x = x\textasciicircum (unsigned int)(t3$>>$32);\\ -%% x = x\textasciicircum (unsigned int)t2;\\ -%% x = x\textasciicircum (unsigned int)(t1$>>$32);\\ -%% x = x\textasciicircum (unsigned int)t3;\\ -%% return x;\\ -%% \} -%% \end{minipage} -%% } -%% \end{center} -%% \caption{sequential Chaotic Iteration PRNG} -%% \label{algo:seqCIprng} -%% \end{figure} - - - -\lstset{language=C,caption={C code of the sequential chaotic iterations based -PRNG},label=algo:seqCIprng} + + +\lstset{language=C,caption={C code of the sequential PRNG based on chaotic iterations},label=algo:seqCIprng} \begin{lstlisting} unsigned int CIprng() { static unsigned int x = 123123123; @@ -724,52 +867,60 @@ unsigned int CIprng() { -In listing~\ref{algo:seqCIprng} a sequential version of our chaotic iterations -based PRNG is presented. The xor operator is represented by -\textasciicircum. 
This function uses three classical 64-bits PRNG: the -\texttt{xorshift}, the \texttt{xor128} and the \texttt{xorwow}. In the -following, we call them xor-like PRNGSs. These three PRNGs are presented -in~\cite{Marsaglia2003}. As each xor-like PRNG used works with 64-bits and as -our PRNG works with 32-bits, the use of \texttt{(unsigned int)} selects the 32 -least significant bits whereas \texttt{(unsigned int)(t3$>>$32)} selects the 32 -most significants bits of the variable \texttt{t}. So to produce a random -number realizes 6 xor operations with 6 32-bits numbers produced by 3 64-bits -PRNG. This version successes the BigCrush of the TestU01 battery [P. L’ecuyer - and R. Simard. Testu01]. +In Listing~\ref{algo:seqCIprng} a sequential version of the proposed PRNG based on chaotic iterations + is presented. The xor operator is represented by \textasciicircum. +This function uses three classical 64-bit PRNGs, namely the \texttt{xorshift}, the +\texttt{xor128}, and the \texttt{xorwow}~\cite{Marsaglia2003}. In the following, we call them +``xor-like PRNGs''. +As +each xor-like PRNG works with 64 bits whereas our proposed generator works with 32 bits, +we use the cast \texttt{(unsigned int)}, which selects the 32 least significant bits of a given integer, and the expression +\texttt{(unsigned int)(t3$>>$32)} in order to obtain the 32 most significant bits of \texttt{t3}. + +Thus, producing a pseudorandom number requires 6 xor operations +with six 32-bit numbers that are provided by three 64-bit PRNGs. This version successfully passes the +stringent BigCrush battery of tests~\cite{LEcuyerS07}. -\section{Efficient prng based on chaotic iterations on GPU} +\section{Efficient PRNGs based on Chaotic Iterations on GPU} +\label{sec:efficient prng gpu} -In order to benefit from computing power of GPU, a program needs to define -independent blocks of threads which can be computed simultaneously. +In order to benefit from the computing power of GPU, a program needs to have +independent blocks of threads that can be computed simultaneously. 
In general, -the larger the number of threads is, the more local memory is used and the less -branching instructions are used (if, while, ...), the better performance is -obtained on GPU. So with algorithm \ref{algo:seqCIprng} presented in the -previous section, it is possible to build a similar program which computes PRNG -on GPU. In the CUDA [ref] environment, threads have a local identificator, -called \texttt{ThreadIdx} relative to the block containing them. +the larger the number of threads is, the more local memory is used, and the fewer +branching instructions are used (if, while, ...), the better the performance on GPU is. +With these requirements in mind, it is possible to build a program similar to +the one presented in Listing~\ref{algo:seqCIprng}, which computes pseudorandom numbers +on GPU. +To do so, we must first recall that in + the CUDA~\cite{Nvid10} environment, threads have a local +identifier called \texttt{threadIdx}, which is relative to the block containing them. -\subsection{Naive version for GPU} +\subsection{Naive Version for GPU} -From the CPU version, it is possible to obtain a quite similar version for GPU. -The principe consists in assigning the computation of a PRNG as in sequential to -each thread of the GPU. Of course, it is essential that the three xor-like -PRNGs used for our computation have different parameters. So we chose them -randomly with another PRNG. As the initialisation is performed by the CPU, we -have chosen to use the ISAAC PRNG [ref] to initalize all the parameters for the -GPU version of our PRNG. The implementation of the three xor-like PRNGs is -straightforward as soon as their parameters have been allocated in the GPU -memory. Each xor-like PRNGs used works with an internal number $x$ which keeps -the last generated random numbers. 
Other internal variables are also used by the -xor-like PRNGs. More precisely, the implementation of the xor128, the xorshift -and the xorwow respectively require 4, 5 and 6 unsigned long as internal -variables. + +It is possible to deduce from the CPU version a quite similar version adapted to GPU. +The principle simply consists in making each thread of the GPU compute the CPU version of our PRNG. +Of course, the three xor-like +PRNGs used in these computations must have different parameters. +In a given thread, these parameters are +randomly picked from another PRNG. +The initialization stage is performed by the CPU. +To do so, the ISAAC PRNG~\cite{Jenkins96} is used to set all the +parameters embedded into each thread. + +The implementation of the three +xor-like PRNGs is straightforward when their parameters have been +allocated in the GPU memory. Each xor-like PRNG works with an internal +number $x$ that saves the last generated pseudorandom number. Additionally, the +implementations of the xor128, the xorshift, and the xorwow respectively require +4, 5, and 6 unsigned long integers as internal variables. \begin{algorithm} \KwIn{InternalVarXorLikeArray: array with internal variables of the 3 xor-like PRNGs in global memory\; -NumThreads: Number of threads\;} +NumThreads: number of threads\;} \KwOut{NewNb: array containing random numbers in global memory} \If{threadIdx is concerned by the computation} { retrieve data from InternalVarXorLikeArray[threadIdx] in local variables\; @@ -780,34 +931,34 @@ NumThreads: Number of threads\;} store internal variables in InternalVarXorLikeArray[threadIdx]\; } -\caption{main kernel for the chaotic iterations based PRNG GPU naive version} +\caption{Main kernel of the GPU ``naive'' version of the PRNG based on chaotic iterations} \label{algo:gpu_kernel} \end{algorithm} -Algorithm~\ref{algo:gpu_kernel} presents a naive implementation of PRNG using -GPU. 
According to the available memory in the GPU and the number of threads +Algorithm~\ref{algo:gpu_kernel} presents a naive implementation of the proposed PRNG on +GPU. Due to the available memory in the GPU and the number of threads used simultaneously, the number of random numbers that a thread can generate -inside a kernel is limited, i.e. the variable \texttt{n} in -algorithm~\ref{algo:gpu_kernel}. For example, if $100,000$ threads are used and -if $n=100$\footnote{in fact, we need to add the initial seed (a 32-bits number)} -then the memory required to store internals variables of xor-like +inside a kernel is limited (\emph{i.e.}, the variable \texttt{n} in +Algorithm~\ref{algo:gpu_kernel}). For instance, if $100,000$ threads are used and +if $n=100$\footnote{in fact, we need to add the initial seed (a 32-bit number)}, +then the memory required to store all of the internal variables of both the xor-like PRNGs\footnote{we multiply this number by $2$ in order to count 32-bits numbers} -and random number of our PRNG is equals to $100,000\times ((4+5+6)\times -2+(1+100))=1,310,000$ 32-bits numbers, i.e. about $52$Mb. +and the pseudorandom numbers generated by our PRNG, is equal to $100,000\times ((4+5+6)\times +2+(1+100))=13,100,000$ 32-bit numbers, that is, approximately $52$MB. -All the tests performed to pass the BigCrush of TestU01 succeeded. Different -number of threads, called \texttt{NumThreads} in our algorithm, have been tested -upto $10$ millions. +This generator is able to pass the whole BigCrush battery of tests, for all +the versions that have been tested depending on their number of threads +(called \texttt{NumThreads} in our algorithm, tested up to $10$ million). \begin{remark} -Algorithm~\ref{algo:gpu_kernel} has the advantage to manipulate independent -PRNGs, so this version is easily usable on a cluster of computer. The only thing -to ensure is to use a single ISAAC PRNG. 
For this, a simple solution consists in -using a master node for the initialization which computes the initial parameters +The proposed algorithm has the advantage of manipulating independent +PRNGs, so this version is easily adaptable to a cluster of computers too. The only thing +to ensure is to use a single ISAAC PRNG. To meet this requirement, a simple solution consists in +using a master node for the initialization. This master node computes the initial parameters for all the different nodes involved in the computation. \end{remark} -\subsection{Improved version for GPU} +\subsection{Improved Version for GPU} As GPU cards using CUDA have shared memory between threads of the same block, it is possible to use this feature in order to simplify the previous algorithm, @@ -823,7 +974,7 @@ which represent the indexes of the other threads for which the results are used by the current thread. In the algorithm, we consider that a 64-bit xor-like PRNG is used, which is why both 32-bit parts are used. -This version also succeed to the BigCrush batteries of tests. +This version also passes the {\it BigCrush} battery of tests. 
\begin{algorithm} @@ -834,17 +985,15 @@ tab1, tab2: Arrays containing permutations of size permutation\_size\;} \KwOut{NewNb: array containing random numbers in global memory} \If{threadId is concerned} { - retrieve data from InternalVarXorLikeArray[threadId] in local variables\; + retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory and x\; offset = threadIdx\%permutation\_size\; o1 = threadIdx-offset+tab1[offset]\; o2 = threadIdx-offset+tab2[offset]\; \For{i=1 to n} { t=xor-like()\; - shared\_mem[threadId]=(unsigned int)t\; - x = x $\oplus$ (unsigned int) t\; - x = x $\oplus$ (unsigned int) (t>>32)\; - x = x $\oplus$ shared[o1]\; - x = x $\oplus$ shared[o2]\; + t=t$\oplus$shmem[o1]$\oplus$shmem[o2]\; + shared\_mem[threadId]=t\; + x = x $\oplus$ t\; store the new PRNG in NewNb[NumThreads*threadId+i]\; } @@ -858,9 +1007,9 @@ version} \subsection{Theoretical Evaluation of the Improved Version} -A run of Algorithm~\ref{algo:gpu_kernel2} consists in four operations having +A run of Algorithm~\ref{algo:gpu_kernel2} consists in three operations having the form of Equation~\ref{equation Oplus}, which is equivalent to the iterative -system of Eq.~\ref{eq:generalIC}. That is, four iterations of the general chaotic +system of Eq.~\ref{eq:generalIC}. That is, three iterations of the general chaotic iterations are realized between two stored values of the PRNG. To be certain that we are in the framework of Theorem~\ref{t:chaos des general}, we must guarantee that this dynamical system iterates on the space @@ -880,558 +1029,217 @@ chaotic iterations presented previously, and for this reason, it satisfies the Devaney's formulation of a chaotic behavior. \section{Experiments} - -Different experiments have been performed in order to measure the generation -speed. -\begin{figure}[t] +\label{sec:experiments} + +Different experiments have been performed in order to measure the generation +speed. 
We have used a computer equipped with a Tesla C1060 NVidia GPU card and an +Intel Xeon E5530 clocked at 2.40 GHz for our experiments and we have used +another one equipped with a less powerful CPU and a GeForce GTX 280. Both +cards have 240 cores. + +In Figure~\ref{fig:time_xorlike_gpu} we compare the number of random numbers +generated per second with the xor-like based PRNG. In this figure, the optimized +version uses the {\it xor64} described in~\cite{Marsaglia2003}. The naive version +uses the three xor-like PRNGs described in Listing~\ref{algo:seqCIprng}. In +order to obtain the optimal performance we removed the storage of random numbers +in the GPU memory. This step is time consuming and slows down random number +generation. Moreover, if one is interested in applications that consume random +numbers directly when they are generated, their storage is completely +useless. In this figure we can see that when the number of threads is greater +than approximately 30,000 and up to 5 million, the number of random numbers generated +per second is almost constant. With the naive version, it is between 2.5 and +3GSample/s. With the optimized version, it is approximately equal to +20GSample/s. Finally, we can remark that both GPU cards are quite similar. In +practice, the Tesla C1060 has more memory than the GTX 280 and this memory +should be of better quality. + +\begin{figure}[htbp] \begin{center} - \includegraphics[scale=.7]{curve_time_gpu.pdf} + \includegraphics[scale=.7]{curve_time_xorlike_gpu.pdf} \end{center} -\caption{Number of random numbers generated per second} -\label{fig:time_naive_gpu} +\caption{Number of random numbers generated per second with the xorlike based PRNG} +\label{fig:time_xorlike_gpu} \end{figure} -First of all we have compared the time to generate X random numbers with both -the CPU version and the GPU version. - -Plot a curve of the number of random numbers as a function of the number of threads, -possibly as a function of the number of threads per block. 
- - - -\section{The relativity of disorder} -\label{sec:de la relativité du désordre} - -In the next two sections, we investigate the impact of the choices that have -lead to the definitions of measures in Sections \ref{sec:chaotic iterations} and \ref{deuxième def}. - -\subsection{Impact of the topology's finenesse} - -Let us firstly introduce the following notations. - -\begin{notation} -$\mathcal{X}_\tau$ will denote the topological space -$\left(\mathcal{X},\tau\right)$, whereas $\mathcal{V}_\tau (x)$ will be the set -of all the neighborhoods of $x$ when considering the topology $\tau$ (or simply -$\mathcal{V} (x)$, if there is no ambiguity). -\end{notation} - - - -\begin{theorem} -\label{Th:chaos et finesse} -Let $\mathcal{X}$ a set and $\tau, \tau'$ two topologies on $\mathcal{X}$ s.t. -$\tau'$ is finer than $\tau$. Let $f:\mathcal{X} \to \mathcal{X}$, continuous -both for $\tau$ and $\tau'$. - -If $(\mathcal{X}_{\tau'},f)$ is chaotic according to Devaney, then -$(\mathcal{X}_\tau,f)$ is chaotic too. -\end{theorem} - -\begin{proof} -Let us firstly establish the transitivity of $(\mathcal{X}_\tau,f)$. - -Let $\omega_1, \omega_2$ two open sets of $\tau$. Then $\omega_1, \omega_2 \in -\tau'$, becaus $\tau'$ is finer than $\tau$. As $f$ is $\tau'-$transitive, we -can deduce that $\exists n \in \mathds{N}, \omega_1 \cap f^{(n)}(\omega_2) = -\varnothing$. Consequently, $f$ is $\tau-$transitive. - -Let us now consider the regularity of $(\mathcal{X}_\tau,f)$, \emph{i.e.}, for -all $x \in \mathcal{X}$, and for all $\tau-$neighborhood $V$ of $x$, there is a -periodic point for $f$ into $V$. - -Let $x \in \mathcal{X}$ and $V \in \mathcal{V}_\tau (x)$ a $\tau-$neighborhood -of $x$. By definition, $\exists \omega \in \tau, x \in \omega \subset V$. - -But $\tau \subset \tau'$, so $\omega \in \tau'$, and then $V \in -\mathcal{V}_{\tau'} (x)$. 
As $(\mathcal{X}_{\tau'},f)$ is regular, there is a -periodic point for $f$ into $V$, and the regularity of $(\mathcal{X}_\tau,f)$ is -proven. -\end{proof} - -\subsection{A given system can always be claimed as chaotic} - -Let $f$ an iteration function on $\mathcal{X}$ having at least a fixed point. -Then this function is chaotic (in a certain way): - -\begin{theorem} -Let $\mathcal{X}$ a nonempty set and $f: \mathcal{X} \to \X$ a function having -at least a fixed point. -Then $f$ is $\tau_0-$chaotic, where $\tau_0$ is the trivial (indiscrete) -topology on $\X$. -\end{theorem} - - -\begin{proof} -$f$ is transitive when $\forall \omega, \omega' \in \tau_0 \setminus -\{\varnothing\}, \exists n \in \mathds{N}, f^{(n)}(\omega) \cap \omega' \neq -\varnothing$. -As $\tau_0 = \left\{ \varnothing, \X \right\}$, this is equivalent to look for -an integer $n$ s.t. $f^{(n)}\left( \X \right) \cap \X \neq \varnothing$. For -instance, $n=0$ is appropriate. - -Let us now consider $x \in \X$ and $V \in \mathcal{V}_{\tau_0} (x)$. Then $V = -\mathcal{X}$, so $V$ has at least a fixed point for $f$. Consequently $f$ is -regular, and the result is established. -\end{proof} - - - - -\subsection{A given system can always be claimed as non-chaotic} - -\begin{theorem} -Let $\mathcal{X}$ be a set and $f: \mathcal{X} \to \X$. -If $\X$ is infinite, then $\left( \X_{\tau_\infty}, f\right)$ is not chaotic -(for the Devaney's formulation), where $\tau_\infty$ is the discrete topology. -\end{theorem} - -\begin{proof} -Let us prove it by contradiction, assuming that $\left(\X_{\tau_\infty}, -f\right)$ is both transitive and regular. +In comparison, Listing~\ref{algo:seqCIprng} allows us to generate about +138MSample/s with only one core of the Xeon E5530. -Let $x \in \X$ and $\{x\}$ one of its neighborhood. This neighborhood must -contain a periodic point for $f$, if we want that $\left(\X_{\tau_\infty}, -f\right)$ is regular. Then $x$ must be a periodic point of $f$. 
-Let $I_x = \left\{ f^{(n)}(x), n \in \mathds{N}\right\}$. This set is finite -because $x$ is periodic, and $\mathcal{X}$ is infinite, then $\exists y \in -\mathcal{X}, y \notin I_x$. +In Figure~\ref{fig:time_bbs_gpu} we highlight the performance of the optimized +BBS-based PRNG on GPU. Its throughput is lower than that of the xor-like version. On the Tesla C1060 we +obtain approximately 1.8GSample/s and on the GTX 280 about 1.6GSample/s. -As $\left(\X_{\tau_\infty}, f\right)$ must be transitive, for all open nonempty -sets $A$ and $B$, an integer $n$ must satisfy $f^{(n)}(A) \cap B \neq -\varnothing$. However $\{x\}$ and $\{y\}$ are open sets and $y \notin I_x -\Rightarrow \forall n, f^{(n)}\left( \{x\} \right) \cap \{y\} = \varnothing$. -\end{proof} +\begin{figure}[htbp] +\begin{center} + \includegraphics[scale=.7]{curve_time_bbs_gpu.pdf} +\end{center} +\caption{Number of random numbers generated per second with the BBS-based PRNG} +\label{fig:time_bbs_gpu} +\end{figure} +Both experiments allow us to conclude that it is possible to +generate about 20GSample/s with the xor-like version and +about ten times fewer with the BBS-based version. The former version has only +chaotic properties whereas the latter also has cryptographic properties. -\section{Chaos on the order topology} -\subsection{The phase space is an interval of the real line} -\subsubsection{Toward a topological semiconjugacy} +\section{Security Analysis} +\label{sec:security analysis} -In what follows, our intention is to establish, by using a topological -semiconjugacy, that chaotic iterations over $\mathcal{X}$ can be described as -iterations on a real interval. To do so, we must firstly introduce some -notations and terminologies. -Let $\mathcal{S}_\mathsf{N}$ be the set of sequences belonging into $\llbracket -1; \mathsf{N}\rrbracket$ and $\mathcal{X}_{\mathsf{N}} = \mathcal{S}_\mathsf{N} -\times \B^\mathsf{N}$.
+In this section the concatenation of two strings $u$ and $v$ is classically +denoted by $uv$. +In a cryptographic context, a pseudorandom generator is a deterministic +algorithm $G$ transforming strings into strings and such that, for any +seed $w$ of length $N$, $G(w)$ (the output of $G$ on the input $w$) has size +$\ell_G(N)$ with $\ell_G(N)>N$. +The notion of {\it secure} PRNGs can now be defined as follows. \begin{definition} -The function $\varphi: \mathcal{S}_{10} \times\mathds{B}^{10} \rightarrow \big[ -0, 2^{10} \big[$ is defined by: -\begin{equation} - \begin{array}{cccl} -\varphi: & \mathcal{X}_{10} = \mathcal{S}_{10} \times\mathds{B}^{10}& -\longrightarrow & \big[ 0, 2^{10} \big[ \\ - & (S,E) = \left((S^0, S^1, \hdots ); (E_0, \hdots, E_9)\right) & \longmapsto & -\varphi \left((S,E)\right) -\end{array} -\end{equation} -where $\varphi\left((S,E)\right)$ is the real number: -\begin{itemize} -\item whose integral part $e$ is $\displaystyle{\sum_{k=0}^9 2^{9-k} E_k}$, that -is, the binary digits of $e$ are $E_0 ~ E_1 ~ \hdots ~ E_9$. -\item whose decimal part $s$ is equal to $s = 0,S^0~ S^1~ S^2~ \hdots = -\sum_{k=1}^{+\infty} 10^{-k} S^{k-1}.$ -\end{itemize} +A cryptographic PRNG $G$ is secure if for any probabilistic polynomial time +algorithm $D$, for any positive polynomial $p$, and for all sufficiently +large $N$, +$$| \mathrm{Pr}[D(G(U_N))=1]-\mathrm{Pr}[D(U_{\ell_G(N)})=1]|< \frac{1}{p(N)},$$ +where $U_r$ is the uniform distribution over $\{0,1\}^r$ and the +probabilities are taken over $U_N$, $U_{\ell_G(N)}$ as well as over the +internal coin tosses of $D$. \end{definition} +Intuitively, it means that there is no polynomial time algorithm that can +distinguish a perfect uniform random generator from $G$ with a +non-negligible probability. The interested reader is referred +to~\cite[chapter~3]{Goldreich} for more information.
Note that the function $\ell$ can +easily be changed into any polynomial +function $\ell^\prime$ satisfying $\ell^\prime(N)>N$~\cite[Chapter 3.3]{Goldreich}. + +The generation scheme developed in (\ref{equation Oplus}) is based on a +pseudorandom generator. Let $H$ be a cryptographic PRNG. We may assume, +without loss of generality, that for any string $S_0$ of size $N$, the size +of $H(S_0)$ is $kN$, with $k>2$. It means that $\ell_H(N)=kN$. +Let $S_1,\ldots,S_k$ be the +strings of length $N$ such that $H(S_0)=S_1 \ldots S_k$ ($H(S_0)$ is the concatenation of +the $S_i$'s). The cryptographic PRNG $X$ defined in (\ref{equation Oplus}) +is the algorithm mapping any string $x_0S_0$ of length $2N$ into the string +$(x_0\oplus S_0 \oplus S_1)(x_0\oplus S_0 \oplus S_1\oplus S_2)\ldots +(x_0\bigoplus_{i=0}^{i=k}S_i)$. In particular, one has $\ell_{X}(2N)=kN=\ell_H(N)$. +We now claim that if $H$ is secure, +then $X$ is secure too. +\begin{proposition} +If $H$ is a secure cryptographic PRNG, then $X$ is a secure cryptographic +PRNG too. +\end{proposition} -$\varphi$ realizes the association between a point of $\mathcal{X}_{10}$ and a -real number into $\big[ 0, 2^{10} \big[$. We must now translate the chaotic -iterations $\Go$ on this real interval. To do so, two intermediate functions -over $\big[ 0, 2^{10} \big[$ must be introduced: - - -\begin{definition} -\label{def:e et s} -Let $x \in \big[ 0, 2^{10} \big[$ and: -\begin{itemize} -\item $e_0, \hdots, e_9$ the binary digits of the integral part of $x$: -$\displaystyle{\lfloor x \rfloor = \sum_{k=0}^{9} 2^{9-k} e_k}$. -\item $(s^k)_{k\in \mathds{N}}$ the digits of $x$, where the chosen decimal -decomposition of $x$ is the one that does not have an infinite number of 9: -$\displaystyle{x = \lfloor x \rfloor + \sum_{k=0}^{+\infty} s^k 10^{-k-1}}$.
-\end{itemize} -$e$ and $s$ are thus defined as follows: -\begin{equation} -\begin{array}{cccl} -e: & \big[ 0, 2^{10} \big[ & \longrightarrow & \mathds{B}^{10} \\ - & x & \longmapsto & (e_0, \hdots, e_9) -\end{array} +\begin{proof} +The proposition is proved by contraposition. Assume that $X$ is not +secure. By definition, there exist a polynomial time probabilistic +algorithm $D$ and a positive polynomial $p$ such that for all $k_0$ there exists +$N\geq \frac{k_0}{2}$ satisfying +$$| \mathrm{Pr}[D(X(U_{2N}))=1]-\mathrm{Pr}[D(U_{kN})=1]|\geq \frac{1}{p(2N)}.$$ +We describe a new probabilistic algorithm $D^\prime$ on an input $w$ of size +$kN$: +\begin{enumerate} +\item Decompose $w$ into $w=w_1\ldots w_{k}$, where each $w_i$ has size $N$. +\item Pick a string $y$ of size $N$ uniformly at random. +\item Compute $z=(y\oplus w_1)(y\oplus w_1\oplus w_2)\ldots (y + \bigoplus_{i=1}^{i=k} w_i).$ +\item Return $D(z)$. +\end{enumerate} + + +Consider for each $y\in \mathbb{B}^{N}$ the function $\varphi_{y}$ +from $\mathbb{B}^{kN}$ into $\mathbb{B}^{kN}$ mapping $w=w_1\ldots w_k$ +(each $w_i$ has length $N$) to +$(y\oplus w_1)(y\oplus w_1\oplus w_2)\ldots (y + \bigoplus_{i=1}^{i=k} w_i).$ By construction, one has for every $w$, +\begin{equation}\label{PCH-1} +D^\prime(w)=D(\varphi_y(w)), \end{equation} -and -\begin{equation} - \begin{array}{cccc} -s: & \big[ 0, 2^{10} \big[ & \longrightarrow & \llbracket 0, 9 -\rrbracket^{\mathds{N}} \\ - & x & \longmapsto & (s^k)_{k \in \mathds{N}} -\end{array} +where $y$ is randomly generated. +Moreover, for each $y$, $\varphi_{y}$ is injective: if +$(y\oplus w_1)(y\oplus w_1\oplus w_2)\ldots (y\bigoplus_{i=1}^{i=k} +w_i)=(y\oplus w_1^\prime)(y\oplus w_1^\prime\oplus w_2^\prime)\ldots +(y\bigoplus_{i=1}^{i=k} w_i^\prime)$, then for every $1\leq j\leq k$, +$y\bigoplus_{i=1}^{i=j} w_i^\prime=y\bigoplus_{i=1}^{i=j} w_i$. It follows, +by a direct induction, that $w_i=w_i^\prime$ for every $1\leq i\leq k$.
Furthermore, since $\mathbb{B}^{kN}$ +is finite, each $\varphi_y$ is bijective. Therefore, and using (\ref{PCH-1}), +one has +\begin{equation}\label{PCH-2} +\mathrm{Pr}[D^\prime(U_{kN})=1]=\mathrm{Pr}[D(\varphi_y(U_{kN}))=1]=\mathrm{Pr}[D(U_{kN})=1]. \end{equation} -\end{definition} - -We are now able to define the function $g$, whose goal is to translate the -chaotic iterations $\Go$ on an interval of $\mathds{R}$. -\begin{definition} -$g:\big[ 0, 2^{10} \big[ \longrightarrow \big[ 0, 2^{10} \big[$ is defined by: -\begin{equation} -\begin{array}{cccc} -g: & \big[ 0, 2^{10} \big[ & \longrightarrow & \big[ 0, 2^{10} \big[ \\ - & x & \longmapsto & g(x) -\end{array} +Now, using (\ref{PCH-1}) again, one has for every $x$, +\begin{equation}\label{PCH-3} +D^\prime(H(x))=D(\varphi_y(H(x))), \end{equation} -where g(x) is the real number of $\big[ 0, 2^{10} \big[$ defined bellow: -\begin{itemize} -\item its integral part has a binary decomposition equal to $e_0', \hdots, -e_9'$, with: - \begin{equation} -e_i' = \left\{ -\begin{array}{ll} -e(x)_i & \textrm{ if } i \neq s^0\\ -e(x)_i + 1 \textrm{ (mod 2)} & \textrm{ if } i = s^0\\ -\end{array} -\right. +where $y$ is randomly generated. By construction, $\varphi_y(H(x))=X((y\oplus x)x)$, +thus +\begin{equation}\label{PCH-3} +D^\prime(H(x))=D(X((y\oplus x)x)), \end{equation} -\item whose decimal part is $s(x)^1, s(x)^2, \hdots$ -\end{itemize} -\end{definition} - -\bigskip +where $y$ is randomly generated. Since $y$ is uniformly distributed and +independent of $x$, so is $y\oplus x$. +It follows that - -In other words, if $x = \displaystyle{\sum_{k=0}^{9} 2^{9-k} e_k + -\sum_{k=0}^{+\infty} s^{k} ~10^{-k-1}}$, then: -\begin{equation} -g(x) = -\displaystyle{\sum_{k=0}^{9} 2^{9-k} (e_k + \delta(k,s^0) \textrm{ (mod 2)}) + -\sum_{k=0}^{+\infty} s^{k+1} 10^{-k-1}}. +\begin{equation}\label{PCH-4} +\mathrm{Pr}[D^\prime(H(U_{N}))=1]=\mathrm{Pr}[D(X(U_{2N}))=1].
\end{equation} - - -\subsubsection{Defining a metric on $\big[ 0, 2^{10} \big[$} - -Numerous metrics can be defined on the set $\big[ 0, 2^{10} \big[$, the most -usual one being the Euclidian distance recalled bellow: - -\begin{notation} -\index{distance!euclidienne} -$\Delta$ is the Euclidian distance on $\big[ 0, 2^{10} \big[$, that is, -$\Delta(x,y) = |y-x|^2$. -\end{notation} - -\medskip - -This Euclidian distance does not reproduce exactly the notion of proximity -induced by our first distance $d$ on $\X$. Indeed $d$ is finer than $\Delta$. -This is the reason why we have to introduce the following metric: - - - -\begin{definition} -Let $x,y \in \big[ 0, 2^{10} \big[$. -$D$ denotes the function from $\big[ 0, 2^{10} \big[^2$ to $\mathds{R}^+$ -defined by: $D(x,y) = D_e\left(e(x),e(y)\right) + D_s\left(s(x),s(y)\right)$, -where: -\begin{center} -$\displaystyle{D_e(E,\check{E}) = \sum_{k=0}^\mathsf{9} \delta (E_k, -\check{E}_k)}$, ~~and~ $\displaystyle{D_s(S,\check{S}) = \sum_{k = 1}^\infty -\dfrac{|S^k-\check{S}^k|}{10^k}}$. -\end{center} -\end{definition} - -\begin{proposition} -$D$ is a distance on $\big[ 0, 2^{10} \big[$. -\end{proposition} - -\begin{proof} -The three axioms defining a distance must be checked. -\begin{itemize} -\item $D \geqslant 0$, because everything is positive in its definition. If -$D(x,y)=0$, then $D_e(x,y)=0$, so the integral parts of $x$ and $y$ are equal -(they have the same binary decomposition). Additionally, $D_s(x,y) = 0$, then -$\forall k \in \mathds{N}^*, s(x)^k = s(y)^k$. In other words, $x$ and $y$ have -the same $k-$th decimal digit, $\forall k \in \mathds{N}^*$. And so $x=y$. -\item $D(x,y)=D(y,x)$. -\item Finally, the triangular inequality is obtained due to the fact that both -$\delta$ and $\Delta(x,y)=|x-y|$ satisfy it. 
-\end{itemize} + From (\ref{PCH-2}) and (\ref{PCH-4}), one can deduce that +there exist a polynomial time probabilistic +algorithm $D^\prime$ and a positive polynomial $p$ such that for all $k_0$ there exists +$N\geq \frac{k_0}{2}$ satisfying +$$| \mathrm{Pr}[D^\prime(H(U_{N}))=1]-\mathrm{Pr}[D^\prime(U_{kN})=1]|\geq \frac{1}{p(2N)},$$ +proving that $H$ is not secure, which establishes the contrapositive. \end{proof} -The convergence of sequences according to $D$ is not the same than the usual -convergence related to the Euclidian metric. For instance, if $x^n \to x$ -according to $D$, then necessarily the integral part of each $x^n$ is equal to -the integral part of $x$ (at least after a given threshold), and the decimal -part of $x^n$ corresponds to the one of $x$ ``as far as required''. -To illustrate this fact, a comparison between $D$ and the Euclidian distance is -given Figure \ref{fig:comparaison de distances}. These illustrations show that -$D$ is richer and more refined than the Euclidian distance, and thus is more -precise.
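+
+As an illustration only, the construction of $X$ and the inversion argument
+used for $\varphi_y$ can be followed on a toy instance. The values below are
+arbitrary assumptions chosen for readability: in particular the string taken
+for $H(S_0)$ is hypothetical and is not the output of any actual generator.
+Take $N=3$, $k=3$, $x_0=101$, $S_0=011$, and suppose that $H(S_0)=S_1S_2S_3$
+with $S_1=110$, $S_2=001$ and $S_3=111$. The three output blocks of $X$ are
+$$\begin{array}{rcl}
+x_0\oplus S_0\oplus S_1 & = & 101\oplus 011\oplus 110 = 000,\\
+x_0\oplus S_0\oplus S_1\oplus S_2 & = & 000\oplus 001 = 001,\\
+x_0\oplus S_0\oplus S_1\oplus S_2\oplus S_3 & = & 001\oplus 111 = 110,
+\end{array}$$
+so that $X(x_0S_0)=000\,001\,110$, a string of length $kN=9$.
+Writing $y=x_0\oplus S_0=110$ and $z_1z_2z_3=000\,001\,110$, the blocks of
+$H(S_0)$ are recovered from $y$ and the output alone:
+$$\begin{array}{rcl}
+y\oplus z_1 & = & 110\oplus 000 = 110 = S_1,\\
+z_1\oplus z_2 & = & 000\oplus 001 = 001 = S_2,\\
+z_2\oplus z_3 & = & 001\oplus 110 = 111 = S_3.
+\end{array}$$
+This is the inversion that makes each $\varphi_y$ injective in the proof above.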
- - -\begin{figure}[t] -\begin{center} - \subfigure[Function $x \to dist(x;1,234) $ on the interval -$(0;5)$.]{\includegraphics[scale=.35]{DvsEuclidien.pdf}}\quad - \subfigure[Function $x \to dist(x;3) $ on the interval -$(0;5)$.]{\includegraphics[scale=.35]{DvsEuclidien2.pdf}} -\end{center} -\caption{Comparison between $D$ (in blue) and the Euclidian distane (in green).} -\label{fig:comparaison de distances} -\end{figure} - - - - -\subsubsection{The semiconjugacy} - -It is now possible to define a topological semiconjugacy between $\mathcal{X}$ -and an interval of $\mathds{R}$: - -\begin{theorem} -Chaotic iterations on the phase space $\mathcal{X}$ are simple iterations on -$\mathds{R}$, which is illustrated by the semiconjugacy of the diagram bellow: -\begin{equation*} -\begin{CD} -\left(~\mathcal{S}_{10} \times\mathds{B}^{10}, d~\right) @>G_{f_0}>> -\left(~\mathcal{S}_{10} \times\mathds{B}^{10}, d~\right)\\ - @V{\varphi}VV @VV{\varphi}V\\ -\left( ~\big[ 0, 2^{10} \big[, D~\right) @>>g> \left(~\big[ 0, 2^{10} \big[, -D~\right) -\end{CD} -\end{equation*} -\end{theorem} - -\begin{proof} -$\varphi$ has been constructed in order to be continuous and onto. -\end{proof} - -In other words, $\mathcal{X}$ is approximately equal to $\big[ 0, 2^\mathsf{N} -\big[$. +\section{A Cryptographically Secure PRNG for GPU} +\label{sec:CSGPU} +It is possible to build a cryptographically secure PRNG based on the previous +algorithm (Algorithm~\ref{algo:gpu_kernel2}). It simply consists in replacing +the {\it xor-like} algorithm by a cryptographically secure PRNG. In +practice, we suggest using the BBS algorithm~\cite{BBS}, which takes the form: +$$x_{n+1}=x_n^2 \bmod M$$ where $M$ is the product of two prime numbers. Those +prime numbers need to be congruent to 3 modulo 4. In practice, this PRNG is +known to be slow and not efficient for the generation of random numbers. For +current GPU cards, the modulo operation is the most time-consuming +operation.
So in order to obtain reasonable performance, it is required +to use only modulo operations on 32-bit integers. Consequently $x_n^2$ needs to be +less than $2^{32}$ and the number $M$ needs to be less than $2^{16}$. So in +practice we can choose prime numbers around 256 that are congruent to 3 modulo +4. With 32-bit numbers, only the 4 least significant bits of $x_n$ can be +kept (the maximum number of indistinguishable bits is less than or equal to +$\log_2(\log_2(x_n))$). So to generate a 32-bit number, we need to use +the BBS algorithm 8 times, with different combinations of $M$. +Currently this PRNG does not pass all the tests of TestU01. - -\subsection{Study of the chaotic iterations described as a real function} - - -\begin{figure}[t] -\begin{center} - \subfigure[ICs on the interval -$(0,9;1)$.]{\includegraphics[scale=.35]{ICs09a1.pdf}}\quad - \subfigure[ICs on the interval -$(0,7;1)$.]{\includegraphics[scale=.35]{ICs07a95.pdf}}\\ - \subfigure[ICs on the interval -$(0,5;1)$.]{\includegraphics[scale=.35]{ICs05a1.pdf}}\quad - \subfigure[ICs on the interval -$(0;1)$]{\includegraphics[scale=.35]{ICs0a1.pdf}} -\end{center} -\caption{Representation of the chaotic iterations.} -\label{fig:ICs} -\end{figure} - - - - -\begin{figure}[t] -\begin{center} - \subfigure[ICs on the interval -$(510;514)$.]{\includegraphics[scale=.35]{ICs510a514.pdf}}\quad - \subfigure[ICs on the interval -$(1000;1008)$]{\includegraphics[scale=.35]{ICs1000a1008.pdf}} -\end{center} -\caption{ICs on small intervals.} -\label{fig:ICs2} -\end{figure} - -\begin{figure}[t] -\begin{center} - \subfigure[ICs on the interval -$(0;16)$.]{\includegraphics[scale=.3]{ICs0a16.pdf}}\quad - \subfigure[ICs on the interval -$(40;70)$.]{\includegraphics[scale=.45]{ICs40a70.pdf}}\quad -\end{center} -\caption{General aspect of the chaotic iterations.} -\label{fig:ICs3} -\end{figure} - - -We have written a Python program to represent the chaotic iterations with the -vectorial negation on the real
line $\mathds{R}$. Various representations of -these CIs are given in Figures \ref{fig:ICs}, \ref{fig:ICs2} and \ref{fig:ICs3}. -It can be remarked that the function $g$ is a piecewise linear function: it is -linear on each interval having the form $\left[ \dfrac{n}{10}, -\dfrac{n+1}{10}\right[$, $n \in \llbracket 0;2^{10}\times 10 \rrbracket$ and its -slope is equal to 10. Let us justify these claims: - -\begin{proposition} -\label{Prop:derivabilite des ICs} -Chaotic iterations $g$ defined on $\mathds{R}$ have derivatives of all orders on -$\big[ 0, 2^{10} \big[$, except on the 10241 points in $I$ defined by $\left\{ -\dfrac{n}{10} ~\big/~ n \in \llbracket 0;2^{10}\times 10\rrbracket \right\}$. - -Furthermore, on each interval of the form $\left[ \dfrac{n}{10}, -\dfrac{n+1}{10}\right[$, with $n \in \llbracket 0;2^{10}\times 10 \rrbracket$, -$g$ is a linear function, having a slope equal to 10: $\forall x \notin I, -g'(x)=10$. -\end{proposition} - - -\begin{proof} -Let $I_n = \left[ \dfrac{n}{10}, \dfrac{n+1}{10}\right[$, with $n \in \llbracket -0;2^{10}\times 10 \rrbracket$. All the points of $I_n$ have the same integral -prat $e$ and the same decimal part $s^0$: on the set $I_n$, functions $e(x)$ -and $x \mapsto s(x)^0$ of Definition \ref{def:e et s} only depend on $n$. So all -the images $g(x)$ of these points $x$: -\begin{itemize} -\item Have the same integral part, which is $e$, except probably the bit number -$s^0$. In other words, this integer has approximately the same binary -decomposition than $e$, the sole exception being the digit $s^0$ (this number is -then either $e+2^{10-s^0}$ or $e-2^{10-s^0}$, depending on the parity of $s^0$, -\emph{i.e.}, it is equal to $e+(-1)^{s^0}\times 2^{10-s^0}$). -\item A shift to the left has been applied to the decimal part $y$, losing by -doing so the common first digit $s^0$. In other words, $y$ has been mapped into -$10\times y - s^0$. 
-\end{itemize} -To sum up, the action of $g$ on the points of $I$ is as follows: first, make a -multiplication by 10, and second, add the same constant to each term, which is -$\dfrac{1}{10}\left(e+(-1)^{s^0}\times 2^{10-s^0}\right)-s^0$. -\end{proof} - -\begin{remark} -Finally, chaotic iterations are elements of the large family of functions that -are both chaotic and piecewise linear (like the tent map). -\end{remark} - - - -\subsection{Comparison of the two metrics on $\big[ 0, 2^\mathsf{N} \big[$} - -The two propositions bellow allow to compare our two distances on $\big[ 0, -2^\mathsf{N} \big[$: - -\begin{proposition} -Id: $\left(~\big[ 0, 2^\mathsf{N} \big[,\Delta~\right) \to \left(~\big[ 0, -2^\mathsf{N} \big[, D~\right)$ is not continuous. -\end{proposition} - -\begin{proof} -The sequence $x^n = 1,999\hdots 999$ constituted by $n$ 9 as decimal part, is -such that: -\begin{itemize} -\item $\Delta (x^n,2) \to 0.$ -\item But $D(x^n,2) \geqslant 1$, then $D(x^n,2)$ does not converge to 0. -\end{itemize} - -The sequential characterization of the continuity concludes the demonstration. -\end{proof} - - - -A contrario: - -\begin{proposition} -Id: $\left(~\big[ 0, 2^\mathsf{N} \big[,D~\right) \to \left(~\big[ 0, -2^\mathsf{N} \big[, \Delta ~\right)$ is a continuous fonction. -\end{proposition} - -\begin{proof} -If $D(x^n,x) \to 0$, then $D_e(x^n,x) = 0$ at least for $n$ larger than a given -threshold, because $D_e$ only returns integers. So, after this threshold, the -integral parts of all the $x^n$ are equal to the integral part of $x$. - -Additionally, $D_s(x^n, x) \to 0$, then $\forall k \in \mathds{N}^*, \exists N_k -\in \mathds{N}, n \geqslant N_k \Rightarrow D_s(x^n,x) \leqslant 10^{-k}$. This -means that for all $k$, an index $N_k$ can be found such that, $\forall n -\geqslant N_k$, all the $x^n$ have the same $k$ firsts digits, which are the -digits of $x$. We can deduce the convergence $\Delta(x^n,x) \to 0$, and thus the -result. 
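+
+To make the 32-bit size constraints of the BBS-based version concrete, here
+is a toy iteration. The primes $239$ and $251$ and the seed $x_0=1234$ are
+illustrative assumptions only: they are not the parameters used in the
+experiments, and a modulus this small offers no real security.
+Both primes are congruent to 3 modulo 4 and $M=239\times 251=59989<2^{16}$,
+so every square $x_n^2$ is less than $M^2<2^{32}$ and fits in a 32-bit
+unsigned integer:
+$$\begin{array}{rcl}
+x_1 & = & 1234^2 \bmod 59989 = 1522756 \bmod 59989 = 23031,\\
+x_2 & = & 23031^2 \bmod 59989 = 530426961 \bmod 59989 = 4223.
+\end{array}$$
+Since $x_n<2^{16}$, one has $\log_2(\log_2(x_n))<4$, so 4 bits are kept per
+iteration: $23031=16\times 1439+7$ yields the bits $0111$ and
+$4223=16\times 263+15$ yields $1111$. Eight such iterations provide the 32
+bits of one output number.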
-\end{proof} - -The conclusion of these propositions is that the proposed metric is more precise -than the Euclidian distance, that is: - -\begin{corollary} -$D$ is finer than the Euclidian distance $\Delta$. -\end{corollary} - -This corollary can be reformulated as follows: - -\begin{itemize} -\item The topology produced by $\Delta$ is a subset of the topology produced by -$D$. -\item $D$ has more open sets than $\Delta$. -\item It is harder to converge for the topology $\tau_D$ inherited by $D$, than -to converge with the one inherited by $\Delta$, which is denoted here by -$\tau_\Delta$. -\end{itemize} - - -\subsection{Chaos of the chaotic iterations on $\mathds{R}$} -\label{chpt:Chaos des itérations chaotiques sur R} - - - -\subsubsection{Chaos according to Devaney} - -We have recalled previously that the chaotic iterations $\left(\Go, -\mathcal{X}_d\right)$ are chaotic according to the formulation of Devaney. We -can deduce that they are chaotic on $\mathds{R}$ too, when considering the order -topology, because: -\begin{itemize} -\item $\left(\Go, \mathcal{X}_d\right)$ and $\left(g, \big[ 0, 2^{10} -\big[_D\right)$ are semiconjugate by $\varphi$, -\item Then $\left(g, \big[ 0, 2^{10} \big[_D\right)$ is a system chaotic -according to Devaney, because the semiconjugacy preserve this character. -\item But the topology generated by $D$ is finer than the topology generated by -the Euclidian distance $\Delta$ -- which is the order topology. -\item According to Theorem \ref{Th:chaos et finesse}, we can deduce that the -chaotic iterations $g$ are indeed chaotic, as defined by Devaney, for the order -topology on $\mathds{R}$. -\end{itemize} - -This result can be formulated as follows. - -\begin{theorem} -\label{th:IC et topologie de l'ordre} -The chaotic iterations $g$ on $\mathds{R}$ are chaotic according to the -Devaney's formulation, when $\mathds{R}$ has his usual topology, which is the -order topology. 
-\end{theorem} - -Indeed this result is weaker than the theorem establishing the chaos for the -finer topology $d$. However the Theorem \ref{th:IC et topologie de l'ordre} -still remains important. Indeed, we have studied in our previous works a set -different from the usual set of study ($\mathcal{X}$ instead of $\mathds{R}$), -in order to be as close as possible from the computer: the properties of -disorder proved theoretically will then be preserved when computing. However, we -could wonder whether this change does not lead to a disorder of a lower quality. -In other words, have we replaced a situation of a good disorder lost when -computing, to another situation of a disorder preserved but of bad quality. -Theorem \ref{th:IC et topologie de l'ordre} prove exactly the contrary. - +\section{Conclusion} +In this paper we have presented a new class of PRNGs based on chaotic +iterations. We have proven that these PRNGs are chaotic in the sense of Devaney. +We have also proposed a cryptographically secure PRNG and its implementation on GPU. +An efficient implementation on GPU based on a xor-like PRNG allows us to +generate a huge number of pseudorandom numbers per second (about +20GSample/s). This PRNG passes the hardest batteries of TestU01. +In future work we plan to extend this work to parallel PRNGs for clusters or +grid computing. We also plan to improve the BBS version so that it passes all +the tests of TestU01. -\section{Conclusion} -\bibliographystyle{plain} +\bibliographystyle{plain} \bibliography{mabase} \end{document}