From: guyeux Date: Sat, 10 Dec 2011 13:15:13 +0000 (+0100) Subject: Fin de mes reclectures X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/prng_gpu.git/commitdiff_plain/ec3fcbcdbf09ed6be33553cbb41c938cf74d3a74 Fin de mes reclectures --- diff --git a/prng_gpu.tex b/prng_gpu.tex index ff2d42a..f21f19e 100644 --- a/prng_gpu.tex +++ b/prng_gpu.tex @@ -50,8 +50,9 @@ implementation for GPU that successfully passes the {\it BigCrush} tests, de battery of tests in TestU01. Experiments show that this PRNG can generate about 20 billions of random numbers per second on Tesla C1060 and NVidia GTX280 cards. -It is finally established that, under reasonable assumptions, the proposed PRNG can be cryptographically +It is then established that, under reasonable assumptions, the proposed PRNG can be cryptographically secure. +A chaotic version of the Blum-Goldwasser asymmetric key encryption scheme is finally proposed. \end{abstract} @@ -80,7 +81,7 @@ sequence. Finally, a small part of the community working in this domain focus on a third requirement, that is to define chaotic generators. The main idea is to take benefits from a chaotic dynamical system to obtain a -generator that is unpredictable, disordered, sensible to its seed, or in other words chaotic. +generator that is unpredictable, disordered, sensible to its seed, or in other word chaotic. Their desire is to map a given chaotic dynamics into a sequence that seems random and unassailable due to chaos. However, the chaotic maps used as a pattern are defined in the real line @@ -131,19 +132,21 @@ numbers inside a GPU when a scientific application runs in it. This remark motivates our proposal of a chaotic and statistically perfect PRNG for GPU. Such device allows us to generated almost 20 billions of pseudorandom numbers per second. -Last, but not least, we show that the proposed post-treatment preserves the +Furthermore, we show that the proposed post-treatment preserves the cryptographical security of the inputted PRNG, when this last has such a property. +Last, but not least, we propose a rewritten of the Blum-Goldwasser asymmetric +key encryption protocol by using the proposed method. The remainder of this paper is organized as follows. In Section~\ref{section:related works} we review some GPU implementations of PRNGs. Section~\ref{section:BASIC RECALLS} gives some basic recalls on the well-known Devaney's formulation of chaos, and on an iteration process called ``chaotic iterations'' on which the post-treatment is based. -Proofs of chaos are given in Section~\ref{sec:pseudorandom}. +The proposed PRNG and its proof of chaos are given in Section~\ref{sec:pseudorandom}. Section~\ref{sec:efficient PRNG} presents an efficient implementation of this chaotic PRNG on a CPU, whereas Section~\ref{sec:efficient PRNG - gpu} describes the GPU implementation. + gpu} describes and evaluates theoretically the GPU implementation. Such generators are experimented in Section~\ref{sec:experiments}. We show in Section~\ref{sec:security analysis} that, if the inputted @@ -151,7 +154,8 @@ generator is cryptographically secure, then it is the case too for the generator provided by the post-treatment. Such a proof leads to the proposition of a cryptographically secure and chaotic generator on GPU based on the famous Blum Blum Shum -in Section~\ref{sec:CSGPU}. +in Section~\ref{sec:CSGPU}, and to an improvement of the +Blum-Goldwasser protocol in Sect.~\ref{Blum-Goldwasser}. This research work ends by a conclusion section, in which the contribution is summarized and intended future work is presented. @@ -192,7 +196,7 @@ the performance of the same PRNGs on different architectures are compared. FPGA appears as the fastest and the most efficient architecture, providing the fastest number of generated pseudorandom numbers per joule. -However, we can notice that authors can ``only'' generate between 11 and 16GSamples/s +However, we notice that authors can ``only'' generate between 11 and 16GSamples/s with a GTX 280 GPU, which should be compared with the results presented in this document. We can remark too that the PRNGs proposed in~\cite{conf/fpga/ThomasHL09} are only @@ -414,8 +418,7 @@ The relation between $\Gamma(f)$ and $G_f$ is clear: there exists a path from $x$ to $x'$ in $\Gamma(f)$ if and only if there exists a strategy $s$ such that the parallel iteration of $G_f$ from the initial point $(s,x)$ reaches the point $x'$. - -We have finally proven in \cite{bcgr11:ip} that, +We have then proven in \cite{bcgr11:ip} that, \begin{theorem} @@ -424,14 +427,33 @@ Let $f:\mathds{B}^\mathsf{N}\to\mathds{B}^\mathsf{N}$. $G_f$ is chaotic (accord if and only if $\Gamma(f)$ is strongly connected. \end{theorem} -This result of chaos has lead us to study the possibility to build a +Finally, we have established in \cite{bcgr11:ip} that, +\begin{theorem} + Let $f: \mathds{B}^{n} \rightarrow \mathds{B}^{n}$, $\Gamma(f)$ its + iteration graph, $\check{M}$ its adjacency + matrix and $M$ + a $n\times n$ matrix defined by + $ + M_{ij} = \frac{1}{n}\check{M}_{ij}$ %\textrm{ + if $i \neq j$ and + $M_{ii} = 1 - \frac{1}{n} \sum\limits_{j=1, j\neq i}^n \check{M}_{ij}$ otherwise. + + If $\Gamma(f)$ is strongly connected, then + the output of the PRNG detailed in Algorithm~\ref{CI Algorithm} follows + a law that tends to the uniform distribution + if and only if $M$ is a double stochastic matrix. +\end{theorem} + + +These results of chaos and uniform distribution have lead us to study the possibility to build a pseudorandom number generator (PRNG) based on the chaotic iterations. As $G_f$, defined on the domain $\llbracket 1 ; \mathsf{N} \rrbracket^{\mathds{N}} \times \mathds{B}^\mathsf{N}$, is build from Boolean networks $f : \mathds{B}^\mathsf{N} \rightarrow \mathds{B}^\mathsf{N}$, we can preserve the theoretical properties on $G_f$ -during implementations (due to the discrete nature of $f$). It is as if +during implementations (due to the discrete nature of $f$). Indeed, it is as if $\mathds{B}^\mathsf{N}$ represents the memory of the computer whereas $\llbracket 1 ; \mathsf{N} \rrbracket^{\mathds{N}}$ is its input stream (the seeds, for instance, in PRNG, or a physical noise in TRNG). +Let us finally remark that the vectorial negation satisfies the hypotheses of the two theorems above. \section{Application to Pseudorandomness} \label{sec:pseudorandom} @@ -494,19 +516,7 @@ with a bit shifted version of it. This PRNG, which has a period of $2^{32}-1=4.29\times10^9$, is summed up in Algorithm~\ref{XORshift}. It is used in our PRNG to compute the strategy length and the strategy elements. - -We have proven in \cite{bcgr11:ip} that, -\begin{theorem} - Let $f: \mathds{B}^{n} \rightarrow \mathds{B}^{n}$, $\Gamma(f)$ its - iteration graph, $\check{M}$ its adjacency - matrix and $M$ a $n\times n$ matrix defined as in the previous lemma. - If $\Gamma(f)$ is strongly connected, then - the output of the PRNG detailed in Algorithm~\ref{CI Algorithm} follows - a law that tends to the uniform distribution - if and only if $M$ is a double stochastic matrix. -\end{theorem} - -This former generator as successively passed various batteries of statistical tests, as the NIST~\cite{bcgr11:ip}, DieHARD~\cite{Marsaglia1996}, and TestU01~\cite{LEcuyerS07}. +This former generator has successively passed various batteries of statistical tests, as the NIST~\cite{bcgr11:ip}, DieHARD~\cite{Marsaglia1996}, and TestU01~\cite{LEcuyerS07} ones. \subsection{Improving the Speed of the Former Generator} @@ -890,12 +900,12 @@ simultaneously. In general, the larger the number of threads is, the more local memory is used, and the less branching instructions are used (if, while, ...), the better the performances on GPU is. Obviously, having these requirements in mind, it is possible to build -a program similar to the one presented in Algorithm +a program similar to the one presented in Listing \ref{algo:seqCIPRNG}, which computes pseudorandom numbers on GPU. To do so, we must firstly recall that in the CUDA~\cite{Nvid10} environment, threads have a local identifier called \texttt{ThreadIdx}, which is relative to the block containing -them. With CUDA parts of the code which are executed by the GPU are +them. Furthermore, in CUDA, parts of the code that are executed by the GPU are called {\it kernels}. @@ -1032,8 +1042,7 @@ last $t$ respects the requirement. Furthermore, it is possible to prove by an immediate mathematical induction that, as the initial $x$ is uniformly distributed (it is provided by a cryptographically secure PRNG), the two other stored values shmem[o1] and shmem[o2] are uniformly distributed too, -(this can be stated by an immediate mathematical -induction), and thus the next $x$ is finally uniformly distributed. +(this is the induction hypothesis), and thus the next $x$ is finally uniformly distributed. Thus Algorithm~\ref{algo:gpu_kernel2} is a concrete realization of the general chaotic iterations presented previously, and for this reason, it satisfies the @@ -1051,7 +1060,7 @@ All the cards have 240 cores. In Figure~\ref{fig:time_xorlike_gpu} we compare the quantity of pseudorandom numbers -generated per second with various xor-like based PRNG. In this figure, the optimized +generated per second with various xor-like based PRNGs. In this figure, the optimized versions use the {\it xor64} described in~\cite{Marsaglia2003}, whereas the naive versions embed the three xor-like PRNGs described in Listing~\ref{algo:seqCIPRNG}. In order to obtain the optimal performances, the storage of pseudorandom numbers @@ -1304,9 +1313,9 @@ tab: 2D Arrays containing 16 combinations (in first dimension) of size combinat t|=BBS1(bbs1)\&7\; t<<=BBS7(bbs7)\&3\; t|=BBS2(bbs2)\&7\; - t=t $\wedge$ shmem[o1] $\wedge$ shmem[o2]\; + t=t \^{ } shmem[o1] \^{ } shmem[o2]\; shared\_mem[threadId]=t\; - x = x $\wedge$ t\; + x = x \^{ } t\; store the new PRNG in NewNb[NumThreads*threadId+i]\; } @@ -1330,7 +1339,7 @@ positions of $t$. Let us remark that to initialize $t$ is not a necessity as we fill it 4 bits by 4 bits, until having obtained 32 bits. The two last new shifts are realized in order to enlarge -the small periods of the BBS used here, to introduce a variability. +the small periods of the BBS used here, to introduce a kind of variability. In these operations, we make twice a left shift of $t$ of \emph{at most} 3 bits and we put \emph{exactly} the 3 last bits from a BBS into the 3 last bits of $t$, leading possibly to a loss of a few @@ -1338,7 +1347,7 @@ bits of $t$. It should be noticed that this generator has another time the form $x^{n+1} = x^n \oplus S^n$, where $S^n$ is referred in this algorithm as $t$: each iteration of this -PRNG ends with $x = x \wedge t;$. This $S^n$ is only constituted +PRNG ends with $x = x \wedge t$. This $S^n$ is only constituted by secure bits produced by the BBS generator, and thus, due to Proposition~\ref{cryptopreuve}, the resulted PRNG is cryptographically secure @@ -1346,7 +1355,7 @@ secure \subsection{Toward a Cryptographically Secure and Chaotic Asymmetric Cryptosystem} - +\label{Blum-Goldwasser} We finish this research work by giving some thoughts about the use of the proposed PRNG in an asymmetric cryptosystem. This first approach will be further investigated in a future work. @@ -1374,7 +1383,7 @@ Suppose Bob wishes to send a string $m=(m_0, \dots, m_{L-1})$ of $L$ bits to Ali \item $i=0$. \item While $i \leqslant L-1$: \begin{itemize} -\item Set $b_i$ equal to the least-significant\footnote{BBS can securely output up to $\mathsf{N} = \lfloor log(log(N)) \rfloor$ of the least-significant bits of $x_i$ during each round.} bit of $x_i$, +\item Set $b_i$ equal to the least-significant\footnote{As signaled previously, BBS can securely output up to $\mathsf{N} = \lfloor log(log(N)) \rfloor$ of the least-significant bits of $x_i$ during each round.} bit of $x_i$, \item $i=i+1$, \item $x_i = (x_{i-1})^2~mod~N.$ \end{itemize} @@ -1415,16 +1424,25 @@ the inheritance of all the properties presented in this paper. \section{Conclusion} -In this paper we have presented a new class of PRNGs based on chaotic -iterations. We have proven that these PRNGs are chaotic in the sense of Devaney. -We also propose a PRNG cryptographically secure and its implementation on GPU. - -An efficient implementation on GPU based on a xor-like PRNG allows us to -generate a huge number of pseudorandom numbers per second (about -20Gsamples/s). This PRNG succeeds to pass the hardest batteries of TestU01. - -In future work we plan to extend this work for parallel PRNG for clusters or -grid computing. +In this paper, a formerly proposed PRNG based on chaotic iterations +has been generalized to improve its speed. It has been proven to be +chaotic according to Devaney. +Efficient implementations on GPU using xor-like PRNGs as input generators +shown that a very large quantity of pseudorandom numbers can be generated per second (about +20Gsamples/s), and that these proposed PRNGs succeed to pass the hardest battery in TestU01, +namely the BigCrush. +Furthermore, we have shown that when the inputted generator is cryptographically +secure, then it is the case too for the PRNG we propose, thus leading to +the possibility to develop fast and secure PRNGs using the GPU architecture. +Thoughts about an improvement of the Blum-Goldwasser cryptosystem, using the +proposed method, has been finally proposed. + +In future work we plan to extend these researches, building a parallel PRNG for clusters or +grid computing. Topological properties of the various proposed generators will be investigated, +and the use of other categories of PRNGs as input will be studied too. The improvement +of Blum-Goldwasser will be deepened. Finally, we +will try to enlarge the quantity of pseudorandom numbers generated per second either +in a simulation context or in a cryptographic one.