X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/prng_gpu.git/blobdiff_plain/47000eb06d8648eaafeadfdda00e1e4ce76e9ee6..228143ea0da3dc28561a873ee227024d21bb1e57:/prng_gpu.tex?ds=sidebyside diff --git a/prng_gpu.tex b/prng_gpu.tex index d2e5e05..84efafa 100644 --- a/prng_gpu.tex +++ b/prng_gpu.tex @@ -34,113 +34,183 @@ \newcommand{\alert}[1]{\begin{color}{blue}\textit{#1}\end{color}} -\title{Efficient Generation of Pseudo-Random Numbers based on Chaotic Iterations -on GPU} +\title{Efficient and Cryptographically Secure Generation of Chaotic Pseudorandom Numbers on GPU} \begin{document} -\author{Jacques M. Bahi, Rapha\"{e}l Couturier, and Christophe -Guyeux, Pierre-Cyrille Heam\thanks{Authors in alphabetic order}} - +\author{Jacques M. Bahi, Rapha\"{e}l Couturier, Christophe +Guyeux, and Pierre-Cyrille Heam\thanks{Authors in alphabetic order}} + \maketitle \begin{abstract} -In this paper we present a new pseudo-random numbers generator (PRNG) on -graphics processing units (GPU). This PRNG is based on chaotic iterations. it -is proven to be chaotic in the Devanay's formulation. We propose an efficient -implementation for GPU which succeeds to the {\it BigCrush}, the hardest -batteries of test of TestU01. Experimentations show that this PRNG can generate +In this paper we present a new pseudorandom number generator (PRNG) on +graphics processing units (GPU). This PRNG is based on the so-called chaotic iterations. It +is firstly proven to be chaotic according to the Devaney's formulation. We thus propose an efficient +implementation for GPU that successfully passes the {\it BigCrush} tests, deemed to be the hardest +battery of tests in TestU01. Experiments show that this PRNG can generate about 20 billions of random numbers per second on Tesla C1060 and NVidia GTX280 cards. +It is finally established that, under reasonable assumptions, the proposed PRNG can be cryptographically +secure. \end{abstract} \section{Introduction} -Random numbers are used in many scientific applications and simulations. On -finite state machines, as computers, it is not possible to generate random -numbers but only pseudo-random numbers. In practice, a good pseudo-random numbers -generator (PRNG) needs to verify some features to be used by scientists. It is -important to be able to generate pseudo-random numbers efficiently, the -generation needs to be reproducible and a PRNG needs to satisfy many usual -statistical properties. Finally, from our point a view, it is essential to prove -that a PRNG is chaotic. Concerning the statistical tests, TestU01 is the -best-known public-domain statistical testing package. So we use it for all our -PRNGs, especially the {\it BigCrush} which provides the largest serie of tests. -Concerning the chaotic properties, Devaney~\cite{Devaney} proposed a common -mathematical formulation of chaotic dynamical systems. - -In a previous work~\cite{bgw09:ip} we have proposed a new familly of chaotic -PRNG based on chaotic iterations. We have proven that these PRNGs are -chaotic in the Devaney's sense. In this paper we propose a faster version which -is also proven to be chaotic. - -Although graphics processing units (GPU) was initially designed to accelerate +Randomness is of importance in many fields as scientific simulations or cryptography. +``Random numbers'' can mainly be generated either by a deterministic and reproducible algorithm +called a pseudorandom number generator (PRNG), or by a physical non-deterministic +process having all the characteristics of a random noise, called a truly random number +generator (TRNG). +In this paper, we focus on reproducible generators, useful for instance in +Monte-Carlo based simulators or in several cryptographic schemes. +These domains need PRNGs that are statistically irreproachable. +On some fields as in numerical simulations, speed is a strong requirement +that is usually attained by using parallel architectures. In that case, +a recurrent problem is that a deflate of the statistical qualities is often +reported, when the parallelization of a good PRNG is realized. +This is why ad-hoc PRNGs for each possible architecture must be found to +achieve both speed and randomness. +On the other side, speed is not the main requirement in cryptography: the great +need is to define \emph{secure} generators being able to withstand malicious +attacks. Roughly speaking, an attacker should not be able in practice to make +the distinction between numbers obtained with the secure generator and a true random +sequence. +Finally, a small part of the community working in this domain focus on a +third requirement, that is to define chaotic generators. +The main idea is to take benefits from a chaotic dynamical system to obtain a +generator that is unpredictable, disordered, sensible to its seed, or in other words chaotic. +Their desire is to map a given chaotic dynamics into a sequence that seems random +and unassailable due to chaos. +However, the chaotic maps used as a pattern are defined in the real line +whereas computers deal with finite precision numbers. +This distortion leads to a deflation of both chaotic properties and speed. +Furthermore, authors of such chaotic generators often claim their PRNG +as secure due to their chaos properties, but there is no obvious relation +between chaos and security as it is understood in cryptography. +This is why the use of chaos for PRNG still remains marginal and disputable. + +The authors' opinion is that topological properties of disorder, as they are +properly defined in the mathematical theory of chaos, can reinforce the quality +of a PRNG. But they are not substitutable for security or statistical perfection. +Indeed, to the authors' point of view, such properties can be useful in the two following situations. On the +one hand, a post-treatment based on a chaotic dynamical system can be applied +to a PRNG statistically deflective, in order to improve its statistical +properties. Such an improvement can be found, for instance, in~\cite{bgw09:ip,bcgr11:ip}. +On the other hand, chaos can be added to a fast, statistically perfect PRNG and/or a +cryptographically secure one, in case where chaos can be of interest, +\emph{only if these last properties are not lost during +the proposed post-treatment}. Such an assumption is behind this research work. +It leads to the attempts to define a +family of PRNGs that are chaotic while being fast and statistically perfect, +or cryptographically secure. +Let us finish this paragraph by noticing that, in this paper, +statistical perfection refers to the ability to pass the whole +{\it BigCrush} battery of tests, which is widely considered as the most +stringent statistical evaluation of a sequence claimed as random. +This battery can be found into the well-known TestU01 package~\cite{LEcuyerS07}. +Chaos, for its part, refers to the well-established definition of a +chaotic dynamical system proposed by Devaney~\cite{Devaney}. + + +In a previous work~\cite{bgw09:ip,guyeux10} we have proposed a post-treatment on PRNGs making them behave +as a chaotic dynamical system. Such a post-treatment leads to a new category of +PRNGs. We have shown that proofs of Devaney's chaos can be established for this +family, and that the sequence obtained after this post-treatment can pass the +NIST~\cite{Nist10}, DieHARD~\cite{Marsaglia1996}, and TestU01~\cite{LEcuyerS07} batteries of tests, even if the inputted generators +cannot. +The proposition of this paper is to improve widely the speed of the formerly +proposed generator, without any lack of chaos or statistical properties. +In particular, a version of this PRNG on graphics processing units (GPU) +is proposed. +Although GPU was initially designed to accelerate the manipulation of images, they are nowadays commonly used in many scientific -applications. Therefore, it is important to be able to generate pseudo-random -numbers inside a GPU when a scientific application runs in a GPU. That is why we -also provide an efficient PRNG for GPU respecting based on IC. Such devices -allows us to generated almost 20 billions of random numbers per second. - -In order to establish that our PRNGs are chaotic according to the Devaney's -formulation, we extend what we have proposed in~\cite{guyeux10}. - -The rest of this paper is organised as follows. In Section~\ref{section:related - works} we review some GPU implementions of PRNG. Section~\ref{section:BASIC - RECALLS} gives some basic recalls on Devanay's formation of chaos and chaotic -iterations. In Section~\ref{sec:pseudo-random} the proof of chaos of our PRNGs -is studied. Section~\ref{sec:efficient prng} presents an efficient -implementation of our chaotic PRNG on a CPU. Section~\ref{sec:efficient prng - gpu} describes the GPU implementation of our chaotic PRNG. In -Section~\ref{sec:experiments} some experimentations are presented. - Finally, we give a conclusion and some perspectives. +applications. Therefore, it is important to be able to generate pseudorandom +numbers inside a GPU when a scientific application runs in it. This remark +motivates our proposal of a chaotic and statistically perfect PRNG for GPU. +Such device +allows us to generated almost 20 billions of pseudorandom numbers per second. +Last, but not least, we show that the proposed post-treatment preserves the +cryptographical security of the inputted PRNG, when this last has such a +property. + +The remainder of this paper is organized as follows. In Section~\ref{section:related + works} we review some GPU implementations of PRNGs. Section~\ref{section:BASIC + RECALLS} gives some basic recalls on the well-known Devaney's formulation of chaos, + and on an iteration process called ``chaotic +iterations'' on which the post-treatment is based. +Proofs of chaos are given in Section~\ref{sec:pseudorandom}. +Section~\ref{sec:efficient prng} presents an efficient +implementation of this chaotic PRNG on a CPU, whereas Section~\ref{sec:efficient prng + gpu} describes the GPU implementation. +Such generators are experimented in +Section~\ref{sec:experiments}. +We show in Section~\ref{sec:security analysis} that, if the inputted +generator is cryptographically secure, then it is the case too for the +generator provided by the post-treatment. +Such a proof leads to the proposition of a cryptographically secure and +chaotic generator on GPU based on the famous Blum Blum Shum +in Section~\ref{sec:CSGPU}. +This research work ends by a conclusion section, in which the contribution is +summarized and intended future work is presented. \section{Related works on GPU based PRNGs} \label{section:related works} -In the litterature many authors have work on defining GPU based PRNGs. We do not -want to be exhaustive and we just give the most significant works from our point -of view. When authors mention the number of random numbers generated per second -we mention it. We consider that a million numbers per second corresponds to -1MSample/s and than a billion numbers per second corresponds to 1GSample/s. - -In \cite{Pang:2008:cec}, the authors define a PRNG based on cellular automata -which does not require high precision integer arithmetics nor bitwise -operations. There is no mention of statistical tests nor proof that this PRNG is -chaotic. Concerning the speed of generation, they can generate about -3.2MSample/s on a GeForce 7800 GTX GPU (which is quite old now). + +Numerous research works on defining GPU based PRNGs have yet been proposed in the +literature, so that completeness is impossible. +This is why authors of this document only give reference to the most significant attempts +in this domain, from their subjective point of view. +The quantity of pseudorandom numbers generated per second is mentioned here +only when the information is given in the related work. +A million numbers per second will be simply written as +1MSample/s whereas a billion numbers per second is 1GSample/s. + +In \cite{Pang:2008:cec} a PRNG based on cellular automata is defined +with no requirement to an high precision integer arithmetic or to any bitwise +operations. Authors can generate about +3.2MSamples/s on a GeForce 7800 GTX GPU, which is quite an old card now. +However, there is neither a mention of statistical tests nor any proof of +chaos or cryptography in this document. In \cite{ZRKB10}, the authors propose different versions of efficient GPU PRNGs -based on Lagged Fibonacci, Hybrid Taus or Hybrid Taus. They have used these +based on Lagged Fibonacci or Hybrid Taus. They have used these PRNGs for Langevin simulations of biomolecules fully implemented on GPU. Performance of the GPU versions are far better than those obtained with a -CPU and these PRNGs succeed to pass the {\it BigCrush} test of TestU01. There is -no mention that their PRNGs have chaos mathematical properties. +CPU, and these PRNGs succeed to pass the {\it BigCrush} battery of TestU01. +However the evaluations of the proposed PRNGs are only statistical ones. Authors of~\cite{conf/fpga/ThomasHL09} have studied the implementation of some -PRNGs on diferrent computing architectures: CPU, field-programmable gate array -(FPGA), GPU and massively parallel processor. This study is interesting because -it shows the performance of the same PRNGs on different architeture. For -example, the FPGA is globally the fastest architecture and it is also the -efficient one because it provides the fastest number of generated random numbers -per joule. Concerning the GPU, authors can generate betweend 11 and 16GSample/s -with a GTX 280 GPU. The drawback of this work is that those PRNGs only succeed -the {\it Crush} test which is easier than the {\it Big Crush} test. - -Cuda has developped a library for the generation of random numbers called -Curand~\cite{curand11}. Several PRNGs are implemented: -Xorwow~\cite{Marsaglia2003} and some variants of Sobol. Some tests report that -the fastest version provides 15GSample/s on the new Fermi C2050 card. Their -PRNGs fail to succeed the whole tests of TestU01 on only one test. +PRNGs on different computing architectures: CPU, field-programmable gate array +(FPGA), massively parallel processors, and GPU. This study is of interest, because +the performance of the same PRNGs on different architectures are compared. +FPGA appears as the fastest and the most +efficient architecture, providing the fastest number of generated pseudorandom numbers +per joule. +However, we can notice that authors can ``only'' generate between 11 and 16GSamples/s +with a GTX 280 GPU, which should be compared with +the results presented in this document. +We can remark too that the PRNGs proposed in~\cite{conf/fpga/ThomasHL09} are only +able to pass the {\it Crush} battery, which is very easy compared to the {\it Big Crush} one. + +Lastly, Cuda has developed a library for the generation of pseudorandom numbers called +Curand~\cite{curand11}. Several PRNGs are implemented, among +other things +Xorwow~\cite{Marsaglia2003} and some variants of Sobol. The tests reported show that +their fastest version provides 15GSamples/s on the new Fermi C2050 card. +But their PRNGs cannot pass the whole TestU01 battery (only one test is failed). \newline \newline -To the best of our knowledge no GPU implementation have been proven to have chaotic properties. +We can finally remark that, to the best of our knowledge, no GPU implementation have been proven to be chaotic, and the cryptographically secure property is surprisingly never regarded. \section{Basic Recalls} \label{section:BASIC RECALLS} + This section is devoted to basic definitions and terminologies in the fields of topological chaos and chaotic iterations. \subsection{Devaney's Chaotic Dynamical Systems} @@ -355,17 +425,18 @@ if and only if $\Gamma(f)$ is strongly connected. \end{theorem} This result of chaos has lead us to study the possibility to build a -pseudo-random number generator (PRNG) based on the chaotic iterations. +pseudorandom number generator (PRNG) based on the chaotic iterations. As $G_f$, defined on the domain $\llbracket 1 ; \mathsf{N} \rrbracket^{\mathds{N}} \times \mathds{B}^\mathsf{N}$, is build from Boolean networks $f : \mathds{B}^\mathsf{N} \rightarrow \mathds{B}^\mathsf{N}$, we can preserve the theoretical properties on $G_f$ during implementations (due to the discrete nature of $f$). It is as if $\mathds{B}^\mathsf{N}$ represents the memory of the computer whereas $\llbracket 1 ; \mathsf{N} -\rrbracket^{\mathds{N}}$ is its input stream (the seeds, for instance). +\rrbracket^{\mathds{N}}$ is its input stream (the seeds, for instance, in PRNG, or a physical noise in TRNG). + +\section{Application to pseudorandomness} +\label{sec:pseudorandom} -\section{Application to Pseudo-Randomness} -\label{sec:pseudo-random} -\subsection{A First Pseudo-Random Number Generator} +\subsection{A First pseudorandom Number Generator} We have proposed in~\cite{bgw09:ip} a new family of generators that receives two PRNGs as inputs. These two generators are mixed with chaotic iterations, @@ -410,7 +481,7 @@ return $y$\; This generator is synthesized in Algorithm~\ref{CI Algorithm}. -It takes as input: a function $f$; +It takes as input: a Boolean function $f$ satisfying Theorem~\ref{Th:Caractérisation des IC chaotiques}; an integer $b$, ensuring that the number of executed iterations is at least $b$ and at most $2b+1$; and an initial configuration $x^0$. It returns the new generated configuration $x$. Internally, it embeds two @@ -435,7 +506,7 @@ We have proven in \cite{bcgr11:ip} that, if and only if $M$ is a double stochastic matrix. \end{theorem} -This former generator as successively passed various batteries of statistical tests, as the NIST tests~\cite{bcgr11:ip}. +This former generator as successively passed various batteries of statistical tests, as the NIST~\cite{bcgr11:ip}, DieHARD~\cite{Marsaglia1996}, and TestU01~\cite{LEcuyerS07}. \subsection{Improving the Speed of the Former Generator} @@ -490,7 +561,7 @@ the vectorial negation, leads to a speed improvement. However, proofs of chaos obtained in~\cite{bg10:ij} have been established only for chaotic iterations of the form presented in Definition \ref{Def:chaotic iterations}. The question is now to determine whether the -use of more general chaotic iterations to generate pseudo-random numbers +use of more general chaotic iterations to generate pseudorandom numbers faster, does not deflate their topological chaos properties. \subsection{Proofs of Chaos of the General Formulation of the Chaotic Iterations} @@ -900,7 +971,7 @@ tab1, tab2: Arrays containing permutations of size permutation\_size\;} \KwOut{NewNb: array containing random numbers in global memory} \If{threadId is concerned} { - retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory\; + retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory and x\; offset = threadIdx\%permutation\_size\; o1 = threadIdx-offset+tab1[offset]\; o2 = threadIdx-offset+tab2[offset]\; @@ -952,12 +1023,14 @@ Intel Xeon E5530 cadenced at 2.40 GHz for our experiments and we have used another one equipped with a less performant CPU and a GeForce GTX 280. Both cards have 240 cores. -In Figure~\ref{fig:time_gpu} we compare the number of random numbers generated -per second. The xor-like prng is a xor64 described in~\cite{Marsaglia2003}. In -order to obtain the optimal performance we remove the storage of random numbers -in the GPU memory. This step is time consumming and slows down the random number -generation. Moreover, if you are interested by applications that consome random -numbers directly when they are generated, their storage is completely +In Figure~\ref{fig:time_xorlike_gpu} we compare the number of random numbers +generated per second with the xor-like based PRNG. In this figure, the optimized +version use the {\it xor64} described in~\cite{Marsaglia2003}. The naive version +use the three xor-like PRNGs described in Listing~\ref{algo:seqCIprng}. In +order to obtain the optimal performance we removed the storage of random numbers +in the GPU memory. This step is time consuming and slows down the random numbers +generation. Moreover, if one is interested by applications that consume random +numbers directly when they are generated, their storage are completely useless. In this figure we can see that when the number of threads is greater than approximately 30,000 upto 5 millions the number of random numbers generated per second is almost constant. With the naive version, it is between 2.5 and @@ -968,10 +1041,10 @@ should be of better quality. \begin{figure}[htbp] \begin{center} - \includegraphics[scale=.7]{curve_time_gpu.pdf} + \includegraphics[scale=.7]{curve_time_xorlike_gpu.pdf} \end{center} -\caption{Number of random numbers generated per second} -\label{fig:time_gpu} +\caption{Number of random numbers generated per second with the xorlike based PRNG} +\label{fig:time_xorlike_gpu} \end{figure} @@ -979,7 +1052,22 @@ In comparison, Listing~\ref{algo:seqCIprng} allows us to generate about 138MSample/s with only one core of the Xeon E5530. +In Figure~\ref{fig:time_bbs_gpu} we highlight the performance of the optimized +BBS based PRNG on GPU. Performances are less important. On the Tesla C1060 we +obtain approximately 1.8GSample/s and on the GTX 280 about 1.6GSample/s. + +\begin{figure}[htbp] +\begin{center} + \includegraphics[scale=.7]{curve_time_bbs_gpu.pdf} +\end{center} +\caption{Number of random numbers generated per second with the BBS based PRNG} +\label{fig:time_bbs_gpu} +\end{figure} +Both these experimentations allows us to conclude that it is possible to +generate a huge number of pseudorandom numbers with the xor-like version and +about tens times less with the BBS based version. The former version has only +chaotic properties whereas the latter also has cryptographically properties. %% \section{Cryptanalysis of the Proposed PRNG} @@ -1518,13 +1606,13 @@ In comparison, Listing~\ref{algo:seqCIprng} allows us to generate about \section{Security Analysis} - +\label{sec:security analysis} In this section the concatenation of two strings $u$ and $v$ is classically denoted by $uv$. -In a cryptographic context, a pseudo-random generator is a deterministic +In a cryptographic context, a pseudorandom generator is a deterministic algorithm $G$ transforming strings into strings and such that, for any seed $w$ of length $N$, $G(w)$ (the output of $G$ on the input $w$) has size $\ell_G(N)$ with $\ell_G(N)>N$. @@ -1548,7 +1636,7 @@ quite easily possible to change the function $\ell$ into any polynomial function $\ell^\prime$ satisfying $\ell^\prime(N)>N)$~\cite[Chapter 3.3]{Goldreich}. The generation schema developed in (\ref{equation Oplus}) is based on a -pseudo-random generator. Let $H$ be a cryptographic PRNG. We may assume, +pseudorandom generator. Let $H$ be a cryptographic PRNG. We may assume, without loss of generality, that for any string $S_0$ of size $N$, the size of $H(S_0)$ is $kN$, with $k>2$. It means that $\ell_H(N)=kN$. Let $S_1,\ldots,S_k$ be the @@ -1629,20 +1717,45 @@ proving that $H$ is not secure, a contradiction. +\section{A cryptographically secure prng for GPU} +\label{sec:CSGPU} +It is possible to build a cryptographically secure prng based on the previous +algorithm (algorithm~\ref{algo:gpu_kernel2}). It simply consists in replacing +the {\it xor-like} algorithm by another cryptographically secure prng. In +practice, we suggest to use the BBS algorithm~\cite{BBS} which takes the form: +$$x_{n+1}=x_n^2~ mod~ M$$ where $M$ is the product of two prime numbers. Those +prime numbers need to be congruent to 3 modulus 4. In practice, this PRNG is +known to be slow and not efficient for the generation of random numbers. For +current GPU cards, the modulus operation is the most time consuming +operation. So in order to obtain quite reasonable performances, it is required +to use only modulus on 32 bits integer numbers. Consequently $x_n^2$ need to be +less than $2^{32}$ and the number $M$ need to be less than $2^{16}$. So in +pratice we can choose prime numbers around 256 that are congruent to 3 modulus +4. With 32 bits numbers, only the 4 least significant bits of $x_n$ can be +chosen (the maximum number of undistinguishing is less or equals to +$log_2(log_2(x_n))$). So to generate a 32 bits number, we need to use 8 times +the BBS algorithm, with different combinations of $M$ is required. + +Currently this PRNG does not succeed to pass all the tests of TestU01. + \section{Conclusion} In this paper we have presented a new class of PRNGs based on chaotic -iterations. We have proven that these PRNGs are chaotic in the sense of Devenay. +iterations. We have proven that these PRNGs are chaotic in the sense of Devenay. +We also propose a PRNG cryptographically secure and its implementation on GPU. + +An efficient implementation on GPU based on a xor-like PRNG allows us to +generate a huge number of pseudorandom numbers per second (about +20Gsample/s). This PRNG succeeds to pass the hardest batteries of TestU01. + +In future work we plan to extend this work for parallel PRNG for clusters or +grid computing. We also plan to improve the BBS version in order to succeed all +the tests of TestU01. -An efficient implementation on GPU allows us to generate a huge number of pseudo -random numbers per second (about 20Gsample/s). Our PRNGs succeed to pass the -hardest batteries of test (TestU01). -In future work we plan to extend our work in order to have cryptographically -secure PRNGs because in some situations this property may be important. -\bibliographystyle{plain} +\bibliographystyle{plain} \bibliography{mabase} \end{document}