X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/prng_gpu.git/blobdiff_plain/02b8e8e685923bf2b12d529361c3435c0291337d..5868f4ee274d7fc021616da109c53b54c99529a0:/prng_gpu.tex diff --git a/prng_gpu.tex b/prng_gpu.tex index 150e434..e5b80cc 100644 --- a/prng_gpu.tex +++ b/prng_gpu.tex @@ -39,14 +39,14 @@ on GPU} \begin{document} \author{Jacques M. Bahi, Rapha\"{e}l Couturier, and Christophe -Guyeux\thanks{Authors in alphabetic order}} +Guyeux, Pierre-Cyrille Heam\thanks{Authors in alphabetic order}} \maketitle \begin{abstract} -In this paper we present a new produce pseudo-random numbers generator (PRNG) on +In this paper we present a new pseudo-random numbers generator (PRNG) on graphics processing units (GPU). This PRNG is based on chaotic iterations. it -is proven to be chaotic in the Devany's formulation. We propose an efficient +is proven to be chaotic in the Devanay's formulation. We propose an efficient implementation for GPU which succeeds to the {\it BigCrush}, the hardest batteries of test of TestU01. Experimentations show that this PRNG can generate about 20 billions of random numbers per second on Tesla C1060 and NVidia GTX280 @@ -59,7 +59,7 @@ cards. Random numbers are used in many scientific applications and simulations. On finite state machines, as computers, it is not possible to generate random -numbers but only pseudo-random numbers. In practice, a good pseudo-random number +numbers but only pseudo-random numbers. In practice, a good pseudo-random numbers generator (PRNG) needs to verify some features to be used by scientists. It is important to be able to generate pseudo-random numbers efficiently, the generation needs to be reproducible and a PRNG needs to satisfy many usual @@ -83,9 +83,7 @@ also provide an efficient PRNG for GPU respecting based on IC. Such devices allows us to generated almost 20 billions of random numbers per second. In order to establish that our PRNGs are chaotic according to the Devaney's -formulation, we extend what we have proposed in~\cite{guyeux10}. Moreover, we -define a new distance to measure the disorder in the chaos and we prove some -interesting properties with this distance. +formulation, we extend what we have proposed in~\cite{guyeux10}. The rest of this paper is organised as follows. In Section~\ref{section:related works} we review some GPU implementions of PRNG. Section~\ref{section:BASIC @@ -95,10 +93,7 @@ is studied. Section~\ref{sec:efficient prng} presents an efficient implementation of our chaotic PRNG on a CPU. Section~\ref{sec:efficient prng gpu} describes the GPU implementation of our chaotic PRNG. In Section~\ref{sec:experiments} some experimentations are presented. -Section~\ref{sec:de la relativité du désordre} describes the relativity of -disorder. In Section~\ref{sec: chaos order topology} the proof that chaotic -iterations can be described by iterations on a real interval is -established. Finally, we give a conclusion and some perspectives. + Finally, we give a conclusion and some perspectives. @@ -134,6 +129,12 @@ efficient one because it provides the fastest number of generated random numbers per joule. Concerning the GPU, authors can generate betweend 11 and 16GSample/s with a GTX 280 GPU. The drawback of this work is that those PRNGs only succeed the {\it Crush} test which is easier than the {\it Big Crush} test. + +Cuda has developped a library for the generation of random numbers called +Curand~\cite{curand11}. Several PRNGs are implemented: +Xorwow~\cite{Marsaglia2003} and some variants of Sobol. Some tests report that +the fastest version provides 15GSample/s on the new Fermi C2050 card. Their +PRNGs fail to succeed the whole tests of TestU01 on only one test. \newline \newline To the best of our knowledge no GPU implementation have been proven to have chaotic properties. @@ -413,7 +414,7 @@ It takes as input: a function $f$; an integer $b$, ensuring that the number of executed iterations is at least $b$ and at most $2b+1$; and an initial configuration $x^0$. It returns the new generated configuration $x$. Internally, it embeds two -\textit{XORshift}$(k)$ PRNGs \cite{Marsaglia2003} that returns integers +\textit{XORshift}$(k)$ PRNGs~\cite{Marsaglia2003} that returns integers uniformly distributed into $\llbracket 1 ; k \rrbracket$. \textit{XORshift} is a category of very fast PRNGs designed by George Marsaglia, @@ -860,6 +861,9 @@ and random number of our PRNG is equals to $100,000\times ((4+5+6)\times All the tests performed to pass the BigCrush of TestU01 succeeded. Different number of threads, called \texttt{NumThreads} in our algorithm, have been tested upto $10$ millions. +\newline +\newline +{\bf QUESTION : on laisse cette remarque, je suis mitigé !!!} \begin{remark} Algorithm~\ref{algo:gpu_kernel} has the advantage to manipulate independent @@ -896,7 +900,7 @@ tab1, tab2: Arrays containing permutations of size permutation\_size\;} \KwOut{NewNb: array containing random numbers in global memory} \If{threadId is concerned} { - retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory\; + retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory and x\; offset = threadIdx\%permutation\_size\; o1 = threadIdx-offset+tab1[offset]\; o2 = threadIdx-offset+tab2[offset]\; @@ -939,6 +943,25 @@ Thus Algorithm~\ref{algo:gpu_kernel2} is a concrete realization of the general chaotic iterations presented previously, and for this reason, it satisfies the Devaney's formulation of a chaotic behavior. +\section{A cryptographically secure prng for GPU} + +It is possible to build a cryptographically secure prng based on the previous +algorithm (algorithm~\ref{algo:gpu_kernel2}). It simply consists in replacing +the {\it xor-like} algorithm by another cryptographically secure prng. In +practice, we suggest to use the BBS algorithm~\cite{BBS} which takes the form: +$$x_{n+1}=x_n^2~ mod~ M$$ where $M$ is the product of two prime numbers. Those +prime numbers need to be congruent to 3 modulus 4. In practice, this PRNG is +known to be slow and not efficient for the generation of random numbers. For +current GPU cards, the modulus operation is the most time consuming +operation. So in order to obtain quite reasonable performances, it is required +to use only modulus on 32 bits integer numbers. Consequently $x_n^2$ need to be +less than $2^{32}$ and the number $M$ need to be less than $2^{16}$. So in +pratice we can choose prime numbers around 256 that are congruent to 3 modulus +4. With 32 bits numbers, only the 4 least significant bits of $x_n$ can be +chosen (the maximum number of undistinguishing is less or equals to +$log_2(log_2(x_n))$). So to generate a 32 bits number, we need to use 8 times +the BBS algorithm, with different combinations of $M$ is required. + \section{Experiments} \label{sec:experiments} @@ -951,8 +974,8 @@ cards have 240 cores. In Figure~\ref{fig:time_gpu} we compare the number of random numbers generated per second. The xor-like prng is a xor64 described in~\cite{Marsaglia2003}. In order to obtain the optimal performance we remove the storage of random numbers -in the GPU memory. This step is time consumming and slows down the random number -generation. Moreover, if you are interested by applications that consome random +in the GPU memory. This step is time consuming and slows down the random number +generation. Moreover, if you are interested by applications that consume random numbers directly when they are generated, their storage is completely useless. In this figure we can see that when the number of threads is greater than approximately 30,000 upto 5 millions the number of random numbers generated @@ -964,10 +987,10 @@ should be of better quality. \begin{figure}[htbp] \begin{center} - \includegraphics[scale=.7]{curve_time_gpu.pdf} + \includegraphics[scale=.7]{curve_time_xorlike_gpu.pdf} \end{center} -\caption{Number of random numbers generated per second} -\label{fig:time_gpu} +\caption{Number of random numbers generated per second with the xorlike based prng} +\label{fig:time_xorlike_gpu} \end{figure} @@ -977,11 +1000,20 @@ In comparison, Listing~\ref{algo:seqCIprng} allows us to generate about +\begin{figure}[htbp] +\begin{center} + \includegraphics[scale=.7]{curve_time_bbs_gpu.pdf} +\end{center} +\caption{Number of random numbers generated per second with the bbs based prng} +\label{fig:time_bbs_gpu} +\end{figure} + + -\section{Cryptanalysis of the Proposed PRNG} +%% \section{Cryptanalysis of the Proposed PRNG} -Mettre ici la preuve de PCH +%% Mettre ici la preuve de PCH %\section{The relativity of disorder} %\label{sec:de la relativité du désordre} @@ -1513,8 +1545,132 @@ Mettre ici la preuve de PCH +\section{Security Analysis} + + + + +In this section the concatenation of two strings $u$ and $v$ is classically +denoted by $uv$. +In a cryptographic context, a pseudo-random generator is a deterministic +algorithm $G$ transforming strings into strings and such that, for any +seed $w$ of length $N$, $G(w)$ (the output of $G$ on the input $w$) has size +$\ell_G(N)$ with $\ell_G(N)>N$. +The notion of {\it secure} PRNGs can now be defined as follows. + +\begin{definition} +A cryptographic PRNG $G$ is secure if for any probabilistic polynomial time +algorithm $D$, for any positive polynomial $p$, and for all sufficiently +large $k$'s, +$$| \mathrm{Pr}[D(G(U_k))=1]-Pr[D(U_{\ell_G(k)}=1]|< \frac{1}{p(N)},$$ +where $U_r$ is the uniform distribution over $\{0,1\}^r$ and the +probabilities are taken over $U_N$, $U_{\ell_G(N)}$ as well as over the +internal coin tosses of $D$. +\end{definition} + +Intuitively, it means that there is no polynomial time algorithm that can +distinguish a perfect uniform random generator from $G$ with a non +negligible probability. The interested reader is referred +to~\cite[chapter~3]{Goldreich} for more information. Note that it is +quite easily possible to change the function $\ell$ into any polynomial +function $\ell^\prime$ satisfying $\ell^\prime(N)>N)$~\cite[Chapter 3.3]{Goldreich}. + +The generation schema developed in (\ref{equation Oplus}) is based on a +pseudo-random generator. Let $H$ be a cryptographic PRNG. We may assume, +without loss of generality, that for any string $S_0$ of size $N$, the size +of $H(S_0)$ is $kN$, with $k>2$. It means that $\ell_H(N)=kN$. +Let $S_1,\ldots,S_k$ be the +strings of length $N$ such that $H(S_0)=S_1 \ldots S_k$ ($H(S_0)$ is the concatenation of +the $S_i$'s). The cryptographic PRNG $X$ defined in (\ref{equation Oplus}) +is the algorithm mapping any string of length $2N$ $x_0S_0$ into the string +$(x_0\oplus S_0 \oplus S_1)(x_0\oplus S_0 \oplus S_1\oplus S_2)\ldots +(x_o\bigoplus_{i=0}^{i=k}S_i)$. Particularly one has $\ell_{X}(2N)=kN=\ell_H(N)$. +We claim now that if this PRNG is secure, +then the new one is secure too. + +\begin{proposition} +If $H$ is a secure cryptographic PRNG, then $X$ is a secure cryptographic +PRNG too. +\end{proposition} + +\begin{proof} +The proposition is proved by contraposition. Assume that $X$ is not +secure. By Definition, there exists a polynomial time probabilistic +algorithm $D$, a positive polynomial $p$, such that for all $k_0$ there exists +$N\geq \frac{k_0}{2}$ satisfying +$$| \mathrm{Pr}[D(X(U_{2N}))=1]-\mathrm{Pr}[D(U_{kN}=1]|\geq \frac{1}{p(2N)}.$$ +We describe a new probabilistic algorithm $D^\prime$ on an input $w$ of size +$kN$: +\begin{enumerate} +\item Decompose $w$ into $w=w_1\ldots w_{k}$, where each $w_i$ has size $N$. +\item Pick a string $y$ of size $N$ uniformly at random. +\item Compute $z=(y\oplus w_1)(y\oplus w_1\oplus w_2)\ldots (y + \bigoplus_{i=1}^{i=k} w_i).$ +\item Return $D(z)$. +\end{enumerate} + + +Consider for each $y\in \mathbb{B}^{kN}$ the function $\varphi_{y}$ +from $\mathbb{B}^{kN}$ into $\mathbb{B}^{kN}$ mapping $w=w_1\ldots w_k$ +(each $w_i$ has length $N$) to +$(y\oplus w_1)(y\oplus w_1\oplus w_2)\ldots (y + \bigoplus_{i=1}^{i=k_1} w_i).$ By construction, one has for every $w$, +\begin{equation}\label{PCH-1} +D^\prime(w)=D(\varphi_y(w)), +\end{equation} +where $y$ is randomly generated. +Moreover, for each $y$, $\varphi_{y}$ is injective: if +$(y\oplus w_1)(y\oplus w_1\oplus w_2)\ldots (y\bigoplus_{i=1}^{i=k_1} +w_i)=(y\oplus w_1^\prime)(y\oplus w_1^\prime\oplus w_2^\prime)\ldots +(y\bigoplus_{i=1}^{i=k} w_i^\prime)$, then for every $1\leq j\leq k$, +$y\bigoplus_{i=1}^{i=j} w_i^\prime=y\bigoplus_{i=1}^{i=j} w_i$. It follows, +by a direct induction, that $w_i=w_i^\prime$. Furthermore, since $\mathbb{B}^{kN}$ +is finite, each $\varphi_y$ is bijective. Therefore, and using (\ref{PCH-1}), +one has +\begin{equation}\label{PCH-2} +\mathrm{Pr}[D^\prime(U_{kN})=1]=\mathrm{Pr}[D(\varphi_y(U_{kN}))=1]=\mathrm{Pr}[D(U_{kN})=1]. +\end{equation} + +Now, using (\ref{PCH-1}) again, one has for every $x$, +\begin{equation}\label{PCH-3} +D^\prime(H(x))=D(\varphi_y(H(x))), +\end{equation} +where $y$ is randomly generated. By construction, $\varphi_y(H(x))=X(yx)$, +thus +\begin{equation}\label{PCH-3} +D^\prime(H(x))=D(yx), +\end{equation} +where $y$ is randomly generated. +It follows that + +\begin{equation}\label{PCH-4} +\mathrm{Pr}[D^\prime(H(U_{N}))=1]=\mathrm{Pr}[D(U_{2N})=1]. +\end{equation} + From (\ref{PCH-2}) and (\ref{PCH-4}), one can deduce that +there exist a polynomial time probabilistic +algorithm $D^\prime$, a positive polynomial $p$, such that for all $k_0$ there exists +$N\geq \frac{k_0}{2}$ satisfying +$$| \mathrm{Pr}[D(H(U_{N}))=1]-\mathrm{Pr}[D(U_{kN}=1]|\geq \frac{1}{p(2N)},$$ +proving that $H$ is not secure, a contradiction. +\end{proof} + + + + \section{Conclusion} + + +In this paper we have presented a new class of PRNGs based on chaotic +iterations. We have proven that these PRNGs are chaotic in the sense of Devenay. + +An efficient implementation on GPU allows us to generate a huge number of pseudo +random numbers per second (about 20Gsample/s). Our PRNGs succeed to pass the +hardest batteries of test (TestU01). + +In future work we plan to extend our work in order to have cryptographically +secure PRNGs because in some situations this property may be important. + \bibliographystyle{plain} \bibliography{mabase} \end{document}