From: guyeux Date: Tue, 29 Nov 2011 14:40:10 +0000 (+0100) Subject: Remoulinage de choucroute X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/prng_gpu.git/commitdiff_plain/0a8d52f6e872c9df6f150a008456bab4557b7d09?ds=inline Remoulinage de choucroute --- diff --git a/prng_gpu.tex b/prng_gpu.tex index 6e063fd..9de2d15 100644 --- a/prng_gpu.tex +++ b/prng_gpu.tex @@ -34,66 +34,126 @@ \newcommand{\alert}[1]{\begin{color}{blue}\textit{#1}\end{color}} -\title{Efficient Generation of Pseudo-Random Numbers based on Chaotic Iterations -on GPU} +\title{Efficient and Cryptographically Secure Generation of Chaotic Pseudorandom Numbers on GPU} \begin{document} -\author{Jacques M. Bahi, Rapha\"{e}l Couturier, and Christophe -Guyeux, Pierre-Cyrille Heam\thanks{Authors in alphabetic order}} - +\author{Jacques M. Bahi, Rapha\"{e}l Couturier, Christophe +Guyeux, and Pierre-Cyrille Heam\thanks{Authors in alphabetic order}} + \maketitle \begin{abstract} -In this paper we present a new pseudo-random numbers generator (PRNG) on -graphics processing units (GPU). This PRNG is based on chaotic iterations. it -is proven to be chaotic in the Devanay's formulation. We propose an efficient -implementation for GPU which succeeds to the {\it BigCrush}, the hardest -batteries of test of TestU01. Experimentations show that this PRNG can generate +In this paper we present a new pseudorandom number generator (PRNG) on +graphics processing units (GPU). This PRNG is based on the so-called chaotic iterations. It +is firstly proven to be chaotic according to the Devaney's formulation. We thus propose an efficient +implementation for GPU that successfully passes the {\it BigCrush} tests, deemed to be the hardest +battery of tests in TestU01. Experiments show that this PRNG can generate about 20 billions of random numbers per second on Tesla C1060 and NVidia GTX280 cards. +It is finally established that, under reasonable assumptions, the proposed PRNG can be cryptographically +secure. \end{abstract} \section{Introduction} -Random numbers are used in many scientific applications and simulations. On -finite state machines, as computers, it is not possible to generate random -numbers but only pseudo-random numbers. In practice, a good pseudo-random numbers -generator (PRNG) needs to verify some features to be used by scientists. It is -important to be able to generate pseudo-random numbers efficiently, the -generation needs to be reproducible and a PRNG needs to satisfy many usual -statistical properties. Finally, from our point a view, it is essential to prove -that a PRNG is chaotic. Concerning the statistical tests, TestU01 is the -best-known public-domain statistical testing package. So we use it for all our -PRNGs, especially the {\it BigCrush} which provides the largest serie of tests. -Concerning the chaotic properties, Devaney~\cite{Devaney} proposed a common -mathematical formulation of chaotic dynamical systems. - -In a previous work~\cite{bgw09:ip} we have proposed a new familly of chaotic -PRNG based on chaotic iterations. We have proven that these PRNGs are -chaotic in the Devaney's sense. In this paper we propose a faster version which -is also proven to be chaotic. - -Although graphics processing units (GPU) was initially designed to accelerate +Randomness is of importance in many fields as scientific simulations or cryptography. +``Random numbers'' can mainly be generated either by a deterministic and reproducible algorithm +called a pseudorandom number generator (PRNG), or by a physical non-deterministic +process having all the characteristics of a random noise, called a truly random number +generator (TRNG). +In this paper, we focus on reproducible generators, useful for instance in +Monte-Carlo based simulators or in several cryptographic schemes. +These domains need PRNGs that are statistically irreproachable. +On some fields as in numerical simulations, speed is a strong requirement +that is usually attained by using parallel architectures. In that case, +a recurrent problem is that a deflate of the statistical qualities is often +reported, when the parallelization of a good PRNG is realized. +This is why ad-hoc PRNGs for each possible architecture must be found to +achieve both speed and randomness. +On the other side, speed is not the main requirement in cryptography: the great +need is to define \emph{secure} generators being able to withstand malicious +attacks. Roughly speaking, an attacker should not be able in practice to make +the distinction between numbers obtained with the secure generator and a true random +sequence. +Finally, a small part of the community working in this domain focus on a +third requirement, that is to define chaotic generators. +The main idea is to take benefits from a chaotic dynamical system to obtain a +generator that is unpredictable, disordered, sensible to its seed, or in other words chaotic. +Their desire is to map a given chaotic dynamics into a sequence that seems random +and unassailable due to chaos. +However, the chaotic maps used as a pattern are defined in the real line +whereas computers deal with finite precision numbers. +This distortion leads to a deflation of both chaotic properties and speed. +Furthermore, authors of such chaotic generators often claim their PRNG +as secure due to their chaos properties, but there is no obvious relation +between chaos and security as it is understood in cryptography. +This is why the use of chaos for PRNG still remains marginal and disputable. + +The authors' opinion is that topological properties of disorder, as they are +properly defined in the mathematical theory of chaos, can reinforce the quality +of a PRNG. But they are not substitutable for security or statistical perfection. +Indeed, to the authors' point of view, such properties can be useful in the two following situations. On the +one hand, a post-treatment based on a chaotic dynamical system can be applied +to a PRNG statistically deflective, in order to improve its statistical +properties. Such an improvement can be found, for instance, in~\cite{bgw09:ip,bcgr11:ip}. +On the other hand, chaos can be added to a fast, statistically perfect PRNG and/or a +cryptographically secure one, in case where chaos can be of interest, +\emph{only if these last properties are not lost during +the proposed post-treatment}. Such an assumption is behind this research work. +It leads to the attempts to define a +family of PRNGs that are chaotic while being fast and statistically perfect, +or cryptographically secure. +Let us finish this paragraph by noticing that, in this paper, +statistical perfection refers to the ability to pass the whole +{\it BigCrush} battery of tests, which is widely considered as the most +stringent statistical evaluation of a sequence claimed as random. +This battery can be found into the well-known TestU01 package. +Chaos, for its part, refers to the well-established definition of a +chaotic dynamical system proposed by Devaney~\cite{Devaney}. + + +In a previous work~\cite{bgw09:ip,guyeux10} we have proposed a post-treatment on PRNGs making them behave +as a chaotic dynamical system. Such a post-treatment leads to a new category of +PRNGs. We have shown that proofs of Devaney's chaos can be established for this +family, and that the sequence obtained after this post-treatment can pass the +NIST, DieHARD, and TestU01 batteries of tests, even if the inputted generators +cannot. +The proposition of this paper is to improve widely the speed of the formerly +proposed generator, without any lack of chaos or statistical properties. +In particular, a version of this PRNG on graphics processing units (GPU) +is proposed. +Although GPU was initially designed to accelerate the manipulation of images, they are nowadays commonly used in many scientific -applications. Therefore, it is important to be able to generate pseudo-random -numbers inside a GPU when a scientific application runs in a GPU. That is why we -also provide an efficient PRNG for GPU respecting based on IC. Such devices -allows us to generated almost 20 billions of random numbers per second. - -In order to establish that our PRNGs are chaotic according to the Devaney's -formulation, we extend what we have proposed in~\cite{guyeux10}. - -The rest of this paper is organised as follows. In Section~\ref{section:related - works} we review some GPU implementions of PRNG. Section~\ref{section:BASIC - RECALLS} gives some basic recalls on Devanay's formation of chaos and chaotic -iterations. In Section~\ref{sec:pseudo-random} the proof of chaos of our PRNGs -is studied. Section~\ref{sec:efficient prng} presents an efficient -implementation of our chaotic PRNG on a CPU. Section~\ref{sec:efficient prng - gpu} describes the GPU implementation of our chaotic PRNG. In -Section~\ref{sec:experiments} some experimentations are presented. - Finally, we give a conclusion and some perspectives. +applications. Therefore, it is important to be able to generate pseudorandom +numbers inside a GPU when a scientific application runs in it. This remark +motivates our proposal of a chaotic and statistically perfect PRNG for GPU. +Such device +allows us to generated almost 20 billions of pseudorandom numbers per second. +Last, but not least, we show that the proposed post-treatment preserves the +cryptographical security of the inputted PRNG, when this last has such a +property. + +The remainder of this paper is organized as follows. In Section~\ref{section:related + works} we review some GPU implementations of PRNGs. Section~\ref{section:BASIC + RECALLS} gives some basic recalls on the well-known Devaney's formulation of chaos, + and on an iteration process called ``chaotic +iterations'' on which the post-treatment is based. +Proofs of chaos are given in Section~\ref{sec:pseudorandom}. +Section~\ref{sec:efficient prng} presents an efficient +implementation of this chaotic PRNG on a CPU, whereas Section~\ref{sec:efficient prng + gpu} describes the GPU implementation. +Such generators are experimented in +Section~\ref{sec:experiments}. +We show in Section~\ref{sec:security analysis} that, if the inputted +generator is cryptographically secure, then it is the case too for the +generator provided by the post-treatment. +Such a proof leads to the proposition of a cryptographically secure and +chaotic generator on GPU based on the famous Blum Blum Shum +in Section~\ref{sec:CSGPU}. +This research work ends by a conclusion section, in which the contribution is +summarized and intended future work is presented. @@ -355,7 +415,7 @@ if and only if $\Gamma(f)$ is strongly connected. \end{theorem} This result of chaos has lead us to study the possibility to build a -pseudo-random number generator (PRNG) based on the chaotic iterations. +pseudorandom number generator (PRNG) based on the chaotic iterations. As $G_f$, defined on the domain $\llbracket 1 ; \mathsf{N} \rrbracket^{\mathds{N}} \times \mathds{B}^\mathsf{N}$, is build from Boolean networks $f : \mathds{B}^\mathsf{N} \rightarrow \mathds{B}^\mathsf{N}$, we can preserve the theoretical properties on $G_f$ @@ -363,9 +423,9 @@ during implementations (due to the discrete nature of $f$). It is as if $\mathds{B}^\mathsf{N}$ represents the memory of the computer whereas $\llbracket 1 ; \mathsf{N} \rrbracket^{\mathds{N}}$ is its input stream (the seeds, for instance). -\section{Application to Pseudo-Randomness} -\label{sec:pseudo-random} -\subsection{A First Pseudo-Random Number Generator} +\section{Application to pseudorandomness} +\label{sec:pseudorandom} +\subsection{A First pseudorandom Number Generator} We have proposed in~\cite{bgw09:ip} a new family of generators that receives two PRNGs as inputs. These two generators are mixed with chaotic iterations, @@ -490,7 +550,7 @@ the vectorial negation, leads to a speed improvement. However, proofs of chaos obtained in~\cite{bg10:ij} have been established only for chaotic iterations of the form presented in Definition \ref{Def:chaotic iterations}. The question is now to determine whether the -use of more general chaotic iterations to generate pseudo-random numbers +use of more general chaotic iterations to generate pseudorandom numbers faster, does not deflate their topological chaos properties. \subsection{Proofs of Chaos of the General Formulation of the Chaotic Iterations} @@ -943,27 +1003,6 @@ Thus Algorithm~\ref{algo:gpu_kernel2} is a concrete realization of the general chaotic iterations presented previously, and for this reason, it satisfies the Devaney's formulation of a chaotic behavior. -\section{A cryptographically secure prng for GPU} - -It is possible to build a cryptographically secure prng based on the previous -algorithm (algorithm~\ref{algo:gpu_kernel2}). It simply consists in replacing -the {\it xor-like} algorithm by another cryptographically secure prng. In -practice, we suggest to use the BBS algorithm~\cite{BBS} which takes the form: -$$x_{n+1}=x_n^2~ mod~ M$$ where $M$ is the product of two prime numbers. Those -prime numbers need to be congruent to 3 modulus 4. In practice, this PRNG is -known to be slow and not efficient for the generation of random numbers. For -current GPU cards, the modulus operation is the most time consuming -operation. So in order to obtain quite reasonable performances, it is required -to use only modulus on 32 bits integer numbers. Consequently $x_n^2$ need to be -less than $2^{32}$ and the number $M$ need to be less than $2^{16}$. So in -pratice we can choose prime numbers around 256 that are congruent to 3 modulus -4. With 32 bits numbers, only the 4 least significant bits of $x_n$ can be -chosen (the maximum number of undistinguishing is less or equals to -$log_2(log_2(x_n))$). So to generate a 32 bits number, we need to use 8 times -the BBS algorithm, with different combinations of $M$ is required. - -Currently this PRNG does not succeed to pass all the tests of TestU01. - \section{Experiments} \label{sec:experiments} @@ -1015,7 +1054,7 @@ obtain approximately 1.8GSample/s and on the GTX 280 about 1.6GSample/s. \end{figure} Both these experimentations allows us to conclude that it is possible to -generate a huge number of pseudo-random numbers with the xor-like version and +generate a huge number of pseudorandom numbers with the xor-like version and about tens times less with the BBS based version. The former version has only chaotic properties whereas the latter also has cryptographically properties. @@ -1556,13 +1595,13 @@ chaotic properties whereas the latter also has cryptographically properties. \section{Security Analysis} - +\label{sec:security analysis} In this section the concatenation of two strings $u$ and $v$ is classically denoted by $uv$. -In a cryptographic context, a pseudo-random generator is a deterministic +In a cryptographic context, a pseudorandom generator is a deterministic algorithm $G$ transforming strings into strings and such that, for any seed $w$ of length $N$, $G(w)$ (the output of $G$ on the input $w$) has size $\ell_G(N)$ with $\ell_G(N)>N$. @@ -1586,7 +1625,7 @@ quite easily possible to change the function $\ell$ into any polynomial function $\ell^\prime$ satisfying $\ell^\prime(N)>N)$~\cite[Chapter 3.3]{Goldreich}. The generation schema developed in (\ref{equation Oplus}) is based on a -pseudo-random generator. Let $H$ be a cryptographic PRNG. We may assume, +pseudorandom generator. Let $H$ be a cryptographic PRNG. We may assume, without loss of generality, that for any string $S_0$ of size $N$, the size of $H(S_0)$ is $kN$, with $k>2$. It means that $\ell_H(N)=kN$. Let $S_1,\ldots,S_k$ be the @@ -1667,6 +1706,27 @@ proving that $H$ is not secure, a contradiction. +\section{A cryptographically secure prng for GPU} +\label{sec:CSGPU} +It is possible to build a cryptographically secure prng based on the previous +algorithm (algorithm~\ref{algo:gpu_kernel2}). It simply consists in replacing +the {\it xor-like} algorithm by another cryptographically secure prng. In +practice, we suggest to use the BBS algorithm~\cite{BBS} which takes the form: +$$x_{n+1}=x_n^2~ mod~ M$$ where $M$ is the product of two prime numbers. Those +prime numbers need to be congruent to 3 modulus 4. In practice, this PRNG is +known to be slow and not efficient for the generation of random numbers. For +current GPU cards, the modulus operation is the most time consuming +operation. So in order to obtain quite reasonable performances, it is required +to use only modulus on 32 bits integer numbers. Consequently $x_n^2$ need to be +less than $2^{32}$ and the number $M$ need to be less than $2^{16}$. So in +pratice we can choose prime numbers around 256 that are congruent to 3 modulus +4. With 32 bits numbers, only the 4 least significant bits of $x_n$ can be +chosen (the maximum number of undistinguishing is less or equals to +$log_2(log_2(x_n))$). So to generate a 32 bits number, we need to use 8 times +the BBS algorithm, with different combinations of $M$ is required. + +Currently this PRNG does not succeed to pass all the tests of TestU01. + \section{Conclusion} @@ -1676,7 +1736,7 @@ iterations. We have proven that these PRNGs are chaotic in the sense of Devenay. We also propose a PRNG cryptographically secure and its implementation on GPU. An efficient implementation on GPU based on a xor-like PRNG allows us to -generate a huge number of pseudo-random numbers per second (about +generate a huge number of pseudorandom numbers per second (about 20Gsample/s). This PRNG succeeds to pass the hardest batteries of TestU01. In future work we plan to extend this work for parallel PRNG for clusters or