X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/prng_gpu.git/blobdiff_plain/5ed99022e098863388088d7f23852378a56cdb5b..1ad3649fd6e60ffa7c238cf99b577c7cce7d7b26:/prng_gpu.tex diff --git a/prng_gpu.tex b/prng_gpu.tex index 792d8ce..f357476 100644 --- a/prng_gpu.tex +++ b/prng_gpu.tex @@ -1,4 +1,5 @@ -\documentclass{article} +%\documentclass{article} +\documentclass[10pt,journal,letterpaper,compsoc]{IEEEtran} \usepackage[utf8]{inputenc} \usepackage[T1]{fontenc} \usepackage{fullpage} @@ -7,8 +8,15 @@ \usepackage{amscd} \usepackage{moreverb} \usepackage{commath} +\usepackage[ruled,vlined]{algorithm2e} +\usepackage{listings} \usepackage[standard]{ntheorem} - +\usepackage{algorithmic} +\usepackage{slashbox} +\usepackage{ctable} +\usepackage{cite} +\usepackage{tabularx} +\usepackage{multirow} % Pour mathds : les ensembles IR, IN, etc. \usepackage{dsfont} @@ -18,8 +26,11 @@ \usepackage{graphicx} % Pour faire des sous-figures dans les figures \usepackage{subfigure} +\usepackage{xr-hyper} +\usepackage{hyperref} +\externaldocument[A-]{supplementary} + -\usepackage{color} \newtheorem{notation}{Notation} @@ -32,399 +43,2075 @@ \newcommand{\alert}[1]{\begin{color}{blue}\textit{#1}\end{color}} -\title{Efficient generation of pseudo random numbers based on chaotic iterations on GPU} -\begin{document} -\maketitle - -\begin{abstract} -This is the abstract -\end{abstract} - -\section{Introduction} -Interet des itérations chaotiques pour générer des nombre alea\\ -Interet de générer des nombres alea sur GPU -... -\section{Chaotic iterations} - -Présentation des itérations chaotiques +\title{Efficient and Cryptographically Secure Generation of Chaotic Pseudorandom Numbers on GPU} +\begin{document} +\author{Jacques M. 
Bahi, Rapha\"{e}l Couturier, Christophe +Guyeux, and Pierre-Cyrille Héam\thanks{Authors in alphabetical order}} + +\IEEEcompsoctitleabstractindextext{ +\begin{abstract} +In this paper we present a new pseudorandom number generator (PRNG) on +graphics processing units (GPU). This PRNG is based on the so-called chaotic iterations and +it is thus chaotic according to Devaney's formulation. We propose an efficient +implementation for GPU that successfully passes the {\it BigCrush} tests, deemed to be the hardest +battery of tests in TestU01. Experiments show that this PRNG can generate +about 20 billion random numbers per second on Tesla C1060 and NVidia GTX280 +cards. +It is then established that, under reasonable assumptions, the proposed PRNG can be cryptographically +secure. +A chaotic version of the Blum-Goldwasser asymmetric key encryption scheme is finally proposed. -\section{The relativity of disorder} -\label{sec:de la relativité du désordre} +\end{abstract} +} -\subsection{Impact of the topology's finenesse} +\maketitle -Let us firstly introduce the following notations. +\IEEEdisplaynotcompsoctitleabstractindextext +\IEEEpeerreviewmaketitle -\begin{notation} -$\mathcal{X}_\tau$ will denote the topological space $\left(\mathcal{X},\tau\right)$, whereas $\mathcal{V}_\tau (x)$ will be the set of all the neighborhoods of $x$ when considering the topology $\tau$ (or simply $\mathcal{V} (x)$, if there is no ambiguity). -\end{notation} +\section{Introduction} +Randomness is of importance in many fields such as scientific simulations or cryptography. +``Random numbers'' can mainly be generated either by a deterministic and reproducible algorithm +called a pseudorandom number generator (PRNG), or by a physical non-deterministic +process having all the characteristics of a random noise, called a truly random number +generator (TRNG). 
+In this paper, we focus on reproducible generators, useful for instance in +Monte-Carlo based simulators or in several cryptographic schemes. +These domains need PRNGs that are statistically irreproachable. +In some fields, such as numerical simulations, speed is a strong requirement +that is usually attained by using parallel architectures. In that case, +a recurrent problem is that a degradation of the statistical quality is often +reported when a good PRNG is parallelized. +This is why ad-hoc PRNGs for each possible architecture must be found to +achieve both speed and randomness. +On the other hand, speed is not the main requirement in cryptography: the great +need is to define \emph{secure} generators able to withstand malicious +attacks. Roughly speaking, an attacker should not be able in practice to make +the distinction between numbers obtained with the secure generator and a true random +sequence. Equivalently, he or she should not be +able (in practice) to predict the next bit of the generator, given all the +binary digits that have already been released. ``Being able in practice'' refers here +to the possibility of achieving this attack in polynomial time, and to the exponential growth +of the difficulty of this challenge when the size of the parameters of the PRNG increases. + + +Finally, a small part of the community working in this domain focuses on a +third requirement, that is to define chaotic generators. +The main idea is to take advantage of a chaotic dynamical system to obtain a +generator that is unpredictable, disordered, sensitive to its seed, or in other words chaotic. +Their desire is to map a given chaotic dynamics into a sequence that seems random +and unassailable due to chaos. +However, the chaotic maps used as a pattern are defined on the real line +whereas computers deal with finite precision numbers. 
+This distortion leads to a degradation of both chaotic properties and speed. +Furthermore, authors of such chaotic generators often claim that their PRNG +is secure because of its chaotic properties, but there is no obvious relation +between chaos and security as it is understood in cryptography. +This is why the use of chaos for PRNGs still remains marginal and disputable. + +The authors' opinion is that topological properties of disorder, as they are +properly defined in the mathematical theory of chaos, can reinforce the quality +of a PRNG. However, they are no substitute for security or statistical perfection. +Indeed, to the authors' mind, such properties can be useful in the two following situations. On the +one hand, a post-treatment based on a chaotic dynamical system can be applied +to a statistically defective PRNG, in order to improve its statistical +properties. Such an improvement can be found, for instance, in~\cite{bgw09:ip,bcgr11:ip}. +On the other hand, chaos can be added to a fast, statistically perfect PRNG and/or a +cryptographically secure one, in cases where chaos can be of interest, +\emph{only if these last properties are not lost during +the proposed post-treatment}. Such an assumption is behind this research work. +It leads to the attempt to define a +family of PRNGs that are chaotic while being fast and statistically perfect, +or cryptographically secure. +Let us finish this paragraph by noticing that, in this paper, +statistical perfection refers to the ability to pass the whole +{\it BigCrush} battery of tests, which is widely considered as the most +stringent statistical evaluation of a sequence claimed as random. +This battery can be found in the well-known TestU01 package~\cite{LEcuyerS07}. +More precisely, each time we performed a test on a PRNG, we ran it +twice in order to observe if all $p$-values are inside [0.01, 0.99]. 
In +fact, we observed that a few $p$-values (fewer than ten) are sometimes +outside this interval but inside [0.001, 0.999], so a +second run allows us to confirm that the out-of-range values do not come from +the same test. With this approach all our PRNGs pass the {\it + BigCrush} successfully and all $p$-values are at least once inside +[0.01, 0.99]. +Chaos, for its part, refers to the well-established definition of a +chaotic dynamical system proposed by Devaney~\cite{Devaney}. + +In a previous work~\cite{bgw09:ip,guyeux10} we have proposed a post-treatment on PRNGs making them behave +as a chaotic dynamical system. Such a post-treatment leads to a new category of +PRNGs. We have shown that proofs of Devaney's chaos can be established for this +family, and that the sequence obtained after this post-treatment can pass the +NIST~\cite{Nist10}, DieHARD~\cite{Marsaglia1996}, and TestU01~\cite{LEcuyerS07} batteries of tests, even if the input generators +cannot. +The aim of this paper is to greatly improve the speed of the formerly +proposed generator, without any loss of chaos or statistical properties. +In particular, a version of this PRNG on graphics processing units (GPU) +is proposed. +Although GPUs were initially designed to accelerate +the manipulation of images, they are nowadays commonly used in many scientific +applications. Therefore, it is important to be able to generate pseudorandom +numbers inside a GPU when a scientific application runs in it. This remark +motivates our proposal of a chaotic and statistically perfect PRNG for GPU. +Such a device +allows us to generate almost 20 billion pseudorandom numbers per second. +Furthermore, we show that the proposed post-treatment preserves the +cryptographic security of the input PRNG, when the latter has such a +property. +Last, but not least, we propose a rewriting of the Blum-Goldwasser asymmetric +key encryption protocol by using the proposed method. 
+ + +{\bf Main contributions.} In this paper a new PRNG using chaotic iterations +is defined. From a theoretical point of view, it is proven that it has fine +topological chaotic properties and that it is cryptographically secure (when +the initial PRNG is also cryptographically secure). From a practical point of +view, experiments point out a very good statistical behavior. An optimized +original implementation of this PRNG is also proposed and evaluated. +Pseudorandom numbers are generated at a rate of 20GSamples/s, which is faster +than in~\cite{conf/fpga/ThomasHL09,Marsaglia2003} (and with a better +statistical behavior). Experiments are also provided using BBS as the initial +random generator. In that case, the generation speed is significantly lower. +Note that an original qualitative comparison between topological chaotic +properties and statistical tests is also proposed. + + + + +The remainder of this paper is organized as follows. In Section~\ref{section:related + works} we review some GPU implementations of PRNGs. Section~\ref{section:BASIC + RECALLS} recalls Devaney's well-known formulation of chaos, + and an iteration process called ``chaotic +iterations'' on which the post-treatment is based. +The proposed PRNG and its proof of chaos are given in Section~\ref{sec:pseudorandom}. +Section~\ref{sec:efficient PRNG} %{The generation of pseudorandom sequence} %illustrates the statistical +%improvement related to the chaotic iteration based post-treatment, for +%our previously released PRNGs and + contains a new efficient +implementation on CPU. + Section~\ref{sec:efficient PRNG + gpu} describes and evaluates theoretically the GPU implementation. +These generators are evaluated experimentally in +Section~\ref{sec:experiments}. +We show in Section~\ref{sec:security analysis} that, if the input +generator is cryptographically secure, then so is the +generator provided by the post-treatment. 
+%A practical +%security evaluation is also outlined in Section~\ref{sec:Practicak evaluation}. +Such a proof leads to the proposal of a cryptographically secure and +chaotic generator on GPU based on the famous Blum Blum Shub generator +in Section~\ref{sec:CSGPU} and to an improvement of the +Blum-Goldwasser protocol in Section~\ref{Blum-Goldwasser}. +This research work ends with a conclusion section, in which the contribution is +summarized and intended future work is presented. + + + + +\section{Related work on GPU based PRNGs} +\label{section:related works} + +Numerous research works on defining GPU based PRNGs have already been proposed in the +literature, so that an exhaustive survey is impossible. +This is why the authors of this document only refer to the most significant attempts +in this domain, from their subjective point of view. +The quantity of pseudorandom numbers generated per second is mentioned here +only when the information is given in the related work. +A million numbers per second will be simply written as +1MSample/s whereas a billion numbers per second is 1GSample/s. + +In \cite{Pang:2008:cec} a PRNG based on cellular automata is defined +that requires neither high-precision integer arithmetic nor any bitwise +operations. The authors can generate about +3.2MSamples/s on a GeForce 7800 GTX GPU, which is quite an old card now. +However, there is neither a mention of statistical tests nor any proof of +chaos or of cryptographic security in this document. + +In \cite{ZRKB10}, the authors propose different versions of efficient GPU PRNGs +based on Lagged Fibonacci or Hybrid Taus. They have used these +PRNGs for Langevin simulations of biomolecules fully implemented on +GPU. Performances of the GPU versions are far better than those obtained with a +CPU, and these PRNGs succeed in passing the {\it BigCrush} battery of TestU01. +However, the evaluations of the proposed PRNGs are only statistical ones. 
+ + +The authors of~\cite{conf/fpga/ThomasHL09} have studied the implementation of some +PRNGs on different computing architectures: CPU, field-programmable gate array +(FPGA), massively parallel processors, and GPU. This study is of interest, because +the performances of the same PRNGs on different architectures are compared. +FPGA appears as the fastest and the most +efficient architecture, providing the largest number of generated pseudorandom numbers +per joule. +However, we notice that the authors can ``only'' generate between 11 and 16GSamples/s +with a GTX 280 GPU, which should be compared with +the results presented in this document. +We can remark too that the PRNGs proposed in~\cite{conf/fpga/ThomasHL09} are only +able to pass the {\it Crush} battery, which is far easier than the {\it BigCrush} one. + +Lastly, NVIDIA has developed a CUDA library for the generation of pseudorandom numbers called +Curand~\cite{curand11}. Several PRNGs are implemented, among +others +Xorwow~\cite{Marsaglia2003} and some variants of Sobol. The tests reported show that +their fastest version provides 15GSamples/s on the new Fermi C2050 card. +But their PRNGs cannot pass the whole TestU01 battery (only one test is failed). +\newline +\newline +We can finally remark that, to the best of our knowledge, no GPU implementation has been proven to be chaotic, and the property of cryptographic security has surprisingly never been considered. + +\section{Basic Recalls} +\label{section:BASIC RECALLS} + +This section is devoted to basic definitions and terminologies in the fields of +topological chaos and chaotic iterations. We assume the reader is familiar +with basic notions of topology (see for instance~\cite{Devaney}). + + +\subsection{Devaney's Chaotic Dynamical Systems} +\label{subsec:Devaney} +In the sequel $S^{n}$ denotes the $n^{th}$ term of a sequence $S$ and $V_{i}$ +denotes the $i^{th}$ component of a vector $V$. 
$f^{k}=f\circ ...\circ f$ +denotes the $k^{th}$ composition of a function $f$. Finally, the following +notation is used: $\llbracket1;N\rrbracket=\{1,2,\hdots,N\}$. + + +Consider a topological space $(\mathcal{X},\tau)$ and a continuous function $f : +\mathcal{X} \rightarrow \mathcal{X}$. -\section{Chaos on the order topology} +\begin{definition} +The function $f$ is said to be \emph{topologically transitive} if, for any pair of nonempty open sets +$U,V \subset \mathcal{X}$, there exists $k>0$ such that $f^k(U) \cap V \neq +\varnothing$. +\end{definition} -\subsection{The phase space is an interval of the real line} +\begin{definition} +An element $x$ is a \emph{periodic point} for $f$ of period $n\in \mathds{N}^*$ +if $f^{n}(x)=x$.% The set of periodic points of $f$ is denoted $Per(f).$ +\end{definition} -\subsubsection{Toward a topological semiconjugacy} +\begin{definition} +$f$ is said to be \emph{regular} on $(\mathcal{X}, \tau)$ if the set of periodic +points for $f$ is dense in $\mathcal{X}$: for any point $x$ in $\mathcal{X}$, +any neighborhood of $x$ contains at least one periodic point (not +necessarily of the same period). +\end{definition} -In what follows, our intention is to establish, by using a topological semiconjugacy, that chaotic iterations over $\mathcal{X}$ can be described as iterations on a real interval. To do so, we must firstly introduce some notations and terminologies. -Let $\mathcal{S}_\mathsf{N}$ be the set of sequences belonging into $\llbracket 1; \mathsf{N}\rrbracket$ and $\mathcal{X}_{\mathsf{N}} = \mathcal{S}_\mathsf{N} \times \B^\mathsf{N}$. +\begin{definition}[Devaney's formulation of chaos~\cite{Devaney}] +The function $f$ is said to be \emph{chaotic} on $(\mathcal{X},\tau)$ if $f$ is regular and +topologically transitive. 
+\end{definition} +The chaos property is strongly linked to the notion of ``sensitivity'', defined +on a metric space $(\mathcal{X},d)$ by: \begin{definition} -The function $\varphi: \mathcal{S}_{10} \times\mathds{B}^{10} \rightarrow \big[ 0, 2^{10} \big[$ is defined by: -$$ -\begin{array}{cccl} -\varphi: & \mathcal{X}_{10} = \mathcal{S}_{10} \times\mathds{B}^{10}& \longrightarrow & \big[ 0, 2^{10} \big[ \\ - & (S,E) = \left((S^0, S^1, \hdots ); (E_0, \hdots, E_9)\right) & \longmapsto & \varphi \left((S,E)\right) -\end{array} -$$ -\noindent where $\varphi\left((S,E)\right)$ is the real number: -\begin{itemize} -\item whose integral part $e$ is $\displaystyle{\sum_{k=0}^9 2^{9-k} E_k}$, that is, the binary digits of $e$ are $E_0 ~ E_1 ~ \hdots ~ E_9$. -\item whose decimal part $s$ is equal to $s = 0,S^0~ S^1~ S^2~ \hdots = \sum_{k=1}^{+\infty} 10^{-k} S^{k-1}.$ -\end{itemize} +\label{sensitivity} The function $f$ has \emph{sensitive dependence on initial conditions} +if there exists $\delta >0$ such that, for any $x\in \mathcal{X}$ and any +neighborhood $V$ of $x$, there exist $y\in V$ and $n > 0$ such that +$d\left(f^{n}(x), f^{n}(y)\right) >\delta $. + +The constant $\delta$ is called the \emph{constant of sensitivity} of $f$. \end{definition} +Indeed, Banks \emph{et al.} have proven in~\cite{Banks92} that when $f$ is +chaotic and $(\mathcal{X}, d)$ is a metric space, then $f$ has the property of +sensitive dependence on initial conditions (this property was formerly an +element of the definition of chaos). To sum up, quoting Devaney +in~\cite{Devaney}, a chaotic dynamical system ``is unpredictable because of the +sensitive dependence on initial conditions. It cannot be broken down or +simplified into two subsystems which do not interact because of topological +transitivity. And in the midst of this random behavior, we nevertheless have an +element of regularity''. Fundamentally different behaviors are consequently +possible and occur in an unpredictable way. 
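+ +As an illustration only (a standard textbook example, given here as a sketch and not part of the construction studied in this paper), consider the doubling map $D(x)=2x \bmod 1$ on the circle $\mathds{R}/\mathds{Z}$, endowed with the distance $d(x,y)=\min\left(|x-y|,1-|x-y|\right)$. +Its periodic points are the rationals $p/(2^{n}-1)$, which are dense, and $D^{k}$ maps any arc of length at least $2^{-k}$ onto the whole circle, so that $D$ is regular and topologically transitive. +Moreover, as long as $d(x,y)\leqslant 1/4$, +\begin{equation*} +d\left(D(x),D(y)\right)=2\,d(x,y), +\end{equation*} +so the distance between two distinct nearby points doubles at each iteration until it exceeds $1/4$: $\delta = 1/4$ is thus a constant of sensitivity of $D$. 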
-$\varphi$ realizes the association between a point of $\mathcal{X}_{10}$ and a real number into $\big[ 0, 2^{10} \big[$. We must now translate the chaotic iterations $\Go$ on this real interval. To do so, two intermediate functions over $\big[ 0, 2^{10} \big[$ must be introduced: +\subsection{Chaotic Iterations} +\label{sec:chaotic iterations} -\begin{definition} -\label{def:e et s} -Let $x \in \big[ 0, 2^{10} \big[$ and: -\begin{itemize} -\item $e_0, \hdots, e_9$ the binary digits of the integral part of $x$: $\displaystyle{\lfloor x \rfloor = \sum_{k=0}^{9} 2^{9-k} e_k}$. -\item $(s^k)_{k\in \mathds{N}}$ the digits of $x$, where the chosen decimal decomposition of $x$ is the one that does not have an infinite number of 9: -$\displaystyle{x = \lfloor x \rfloor + \sum_{k=0}^{+\infty} s^k 10^{-k-1}}$. -\end{itemize} -$e$ and $s$ are thus defined as follows: -$$ -\begin{array}{cccl} -e: & \big[ 0, 2^{10} \big[ & \longrightarrow & \mathds{B}^{10} \\ - & x & \longmapsto & (e_0, \hdots, e_9) -\end{array} -$$ -\noindent and -$$ -\begin{array}{cccl} -s: & \big[ 0, 2^{10} \big[ & \longrightarrow & \llbracket 0, 9 \rrbracket^{\mathds{N}} \\ - & x & \longmapsto & (s^k)_{k \in \mathds{N}} -\end{array} -$$ -\end{definition} -We are now able to define the function $g$, whose goal is to translate the chaotic iterations $\Go$ on an interval of $\mathds{R}$. +Let us consider a \emph{system} with a finite number $\mathsf{N} \in +\mathds{N}^*$ of elements (or \emph{cells}), so that each cell has a +Boolean \emph{state}. Having $\mathsf{N}$ Boolean values for these + cells leads to the definition of a particular \emph{state of the +system}. A sequence whose elements belong to $\llbracket 1;\mathsf{N} +\rrbracket $ is called a \emph{strategy}. 
The set of all strategies is +denoted by $\llbracket 1, \mathsf{N} \rrbracket^\mathds{N}.$ \begin{definition} -$g:\big[ 0, 2^{10} \big[ \longrightarrow \big[ 0, 2^{10} \big[$ is defined by: -$$ -\begin{array}{cccl} -g: & \big[ 0, 2^{10} \big[ & \longrightarrow & \big[ 0, 2^{10} \big[ \\ -& \\ - & x & \longmapsto & g(x) -\end{array} -$$ -\noindent where g(x) is the real number of $\big[ 0, 2^{10} \big[$ defined bellow: -\begin{itemize} -\item its integral part has a binary decomposition equal to $e_0', \hdots, e_9'$, with: -$$ -e_i' = \left\{ +\label{Def:chaotic iterations} +The set $\mathds{B}$ denoting $\{0,1\}$, let +$f:\mathds{B}^{\mathsf{N}}\longrightarrow \mathds{B}^{\mathsf{N}}$ be +a function and $S\in \llbracket 1, \mathsf{N} \rrbracket^\mathds{N}$ be a ``strategy''. The so-called +\emph{chaotic iterations} are defined by $x^0\in +\mathds{B}^{\mathsf{N}}$ and +\begin{equation} +\forall n\in \mathds{N}^{\ast }, \forall i\in +\llbracket1;\mathsf{N}\rrbracket ,x_i^n=\left\{ \begin{array}{ll} -e(x)_i & \textrm{ if } i \neq s^0\\ -e(x)_i + 1 \textrm{ (mod 2)} & \textrm{ if } i = s^0\\ -\end{array} -\right. -$$ -\item whose decimal part is $s(x)^1, s(x)^2, \hdots$ -\end{itemize} + x_i^{n-1} & \text{ if }S^n\neq i \\ + \left(f(x^{n-1})\right)_{S^n} & \text{ if }S^n=i. +\end{array}\right. +\end{equation} \end{definition} -\bigskip +In other words, at the $n^{th}$ iteration, only the $S^{n}-$th cell is +\textquotedblleft iterated\textquotedblright . Note that in a more +general formulation, $S^n$ can be a subset of components and +$\left(f(x^{n-1})\right)_{S^{n}}$ can be replaced by +$\left(f(x^{k})\right)_{S^{n}}$, where $k0$. \medskip +%% \begin{itemize} +%% \item If $\varepsilon \geqslant 1$, we see that the distance +%% between $\left( G_{f}(S^n,E^n)\right) $ and $\left( G_{f}(S,E)\right) $ is +%% strictly less than 1 after the $max(n_{0},n_{1})^{th}$ term (same state). 
+%% \medskip +%% \item If $\varepsilon <1$, then $\exists k\in \mathds{N},10^{-k}\geqslant +%% \varepsilon > 10^{-(k+1)}$. But $d_{s}(S^n,S)$ converges to 0, so +%% \begin{equation*} +%% \exists n_{2}\in \mathds{N},\forall n\geqslant +%% n_{2},d_{s}(S^n,S)<10^{-(k+2)}, +%% \end{equation*}% +%% thus after $n_{2}$, the $k+2$ first terms of $S^n$ and $S$ are equal. +%% \end{itemize} +%% \noindent As a consequence, the $k+1$ first entries of the strategies of $% +%% G_{f}(S^n,E^n)$ and $G_{f}(S,E)$ are the same ($G_{f}$ is a shift of strategies) and due to the definition of $d_{s}$, the floating part of +%% the distance between $(S^n,E^n)$ and $(S,E)$ is strictly less than $% +%% 10^{-(k+1)}\leqslant \varepsilon $. + +%% In conclusion, +%% %%RAPH : ici j'ai rajouté une ligne +%% %%TOF : ici j'ai rajouté un commentaire +%% %%TOF : ici aussi +%% $ +%% \forall \varepsilon >0,$ $\exists N_{0}=max(n_{0},n_{1},n_{2})\in \mathds{N} +%% ,$ $\forall n\geqslant N_{0},$ +%% $ d\left( G_{f}(S^n,E^n);G_{f}(S,E)\right) +%% \leqslant \varepsilon . +%% $ +%% $G_{f}$ is consequently continuous. +%% \end{proof} + + +%% It is now possible to study the topological behavior of the general chaotic +%% iterations. We will prove that, + +%% \begin{theorem} +%% \label{t:chaos des general} +%% The general chaotic iterations defined on Equation~\ref{general CIs} satisfy +%% the Devaney's property of chaos. +%% \end{theorem} + +%% Let us firstly prove the following lemma. + +%% \begin{lemma}[Strong transitivity] +%% \label{strongTrans} +%% For all couples $X,Y \in \mathcal{X}$ and any neighborhood $V$ of $X$, we can +%% find $n \in \mathds{N}^*$ and $X' \in V$ such that $G^n(X')=Y$. +%% \end{lemma} + +%% \begin{proof} +%% Let $X=(S,E)$, $\varepsilon>0$, and $k_0 = \lfloor log_{10}(\varepsilon)+1 \rfloor$. +%% Any point $X'=(S',E')$ such that $E'=E$ and $\forall k \leqslant k_0, S'^k=S^k$, +%% are in the open ball $\mathcal{B}\left(X,\varepsilon\right)$. 
Let us define +%% $\check{X} = \left(\check{S},\check{E}\right)$, where $\check{X}= G^{k_0}(X)$. +%% We denote by $s\subset \llbracket 1; \mathsf{N} \rrbracket$ the set of coordinates +%% that are different between $\check{E}$ and the state of $Y$. Thus each point $X'$ of +%% the form $(S',E')$ where $E'=E$ and $S'$ starts with +%% $(S^0, S^1, \hdots, S^{k_0},s,\hdots)$, verifies the following properties: +%% \begin{itemize} +%% \item $X'$ is in $\mathcal{B}\left(X,\varepsilon\right)$, +%% \item the state of $G_f^{k_0+1}(X')$ is the state of $Y$. +%% \end{itemize} +%% Finally the point $\left(\left(S^0, S^1, \hdots, S^{k_0},s,s^0, s^1, \hdots\right); E\right)$, +%% where $(s^0,s^1, \hdots)$ is the strategy of $Y$, satisfies the properties +%% claimed in the lemma. +%% \end{proof} + +%% We can now prove the Theorem~\ref{t:chaos des general}. + +%% \begin{proof}[Theorem~\ref{t:chaos des general}] +%% Firstly, strong transitivity implies transitivity. + +%% Let $(S,E) \in\mathcal{X}$ and $\varepsilon >0$. To +%% prove that $G_f$ is regular, it is sufficient to prove that +%% there exists a strategy $\tilde S$ such that the distance between +%% $(\tilde S,E)$ and $(S,E)$ is less than $\varepsilon$, and such that +%% $(\tilde S,E)$ is a periodic point. + +%% Let $t_1=\lfloor-\log_{10}(\varepsilon)\rfloor$, and let $E'$ be the +%% configuration that we obtain from $(S,E)$ after $t_1$ iterations of +%% $G_f$. As $G_f$ is strongly transitive, there exists a strategy $S'$ +%% and $t_2\in\mathds{N}$ such +%% that $E$ is reached from $(S',E')$ after $t_2$ iterations of $G_f$. + +%% Consider the strategy $\tilde S$ that alternates the first $t_1$ terms +%% of $S$ and the first $t_2$ terms of $S'$: +%% %%RAPH : j'ai coupé la ligne en 2 +%% $$\tilde +%% S=(S_0,\dots,S_{t_1-1},S'_0,\dots,S'_{t_2-1},S_0,$$$$\dots,S_{t_1-1},S'_0,\dots,S'_{t_2-1},S_0,\dots).$$ It +%% is clear that $(\tilde S,E)$ is obtained from $(\tilde S,E)$ after +%% $t_1+t_2$ iterations of $G_f$. 
So $(\tilde S,E)$ is a periodic +%% point. Since $\tilde S_t=S_t$ for $t0$ et $\liminf_{n \rightarrow +\infty} d(f^{(n)}(x), f^{(n)}(y))=0$, meaning that their orbits always oscillate as the iterations pass. When a system is compact and contains an uncountable set of such points, it is claimed as chaotic according +%% to Li-Yorke~\cite{Li75,Ruette2001}. A similar property is regarded in the following NIST test~\cite{Nist10}. +%% \begin{itemize} +%% \item \textbf{Runs Test}. To determine whether the number of runs of ones and zeros of various lengths is as expected for a random sequence. In particular, this test determines whether the oscillation between such zeros and ones is too fast or too slow. +%% \end{itemize} +%% \item \textbf{Topological entropy}. The desire to formulate an equivalency of the thermodynamics entropy +%% has emerged both in the topological and statistical fields. Once again, a similar objective has led to two different +%% rewritting of an entropy based disorder: the famous Shannon definition of entropy is approximated in the statistical approach, +%% whereas topological entropy is defined as follows: +%% $x,y \in \mathcal{X}$ are $\varepsilon-$\emph{separated in time $n$} if there exists $k \leqslant n$ such that $d\left(f^{(k)}(x),f^{(k)}(y)\right)>\varepsilon$. Then $(n,\varepsilon)-$separated sets are sets of points that are all $\varepsilon-$separated in time $n$, which +%% leads to the definition of $s_n(\varepsilon,Y)$, being the maximal cardinality of all $(n,\varepsilon)-$separated sets. Using these notations, +%% the topological entropy is defined as follows: $$h_{top}(\mathcal{X},f) = \displaystyle{\lim_{\varepsilon \rightarrow 0} \Big[ \limsup_{n \rightarrow +\infty} \dfrac{1}{n} \log s_n(\varepsilon,\mathcal{X})\Big]}.$$ +%% This value measures the average exponential growth of the number of distinguishable orbit segments. 
+%% In this sense, it measures the complexity of the topological dynamical system, whereas +%% the Shannon approach comes to mind when defining the following test~\cite{Nist10}: +%% \begin{itemize} +%% \item \textbf{Approximate Entropy Test}. Compare the frequency of the overlapping blocks of two consecutive/adjacent lengths ($m$ and $m+1$) against the expected result for a random sequence. +%% \end{itemize} + +%% \item \textbf{Non-linearity, complexity}. Finally, let us remark that non-linearity and complexity are +%% not only sought in general to obtain chaos, but they are also required for randomness, as illustrated by the two tests below~\cite{Nist10}. +%% \begin{itemize} +%% \item \textbf{Binary Matrix Rank Test}. Check for linear dependence among fixed length substrings of the original sequence. +%% \item \textbf{Linear Complexity Test}. Determine whether or not the sequence is complex enough to be considered random. +%% \end{itemize} +%% \end{itemize} + + +%% We have proven in our previous works~\cite{guyeux12:bc} that chaotic iterations satisfying Theorem~\ref{Th:Caractérisation des IC chaotiques} are, among other +%% things, strongly transitive, topologically mixing, chaotic as defined by Li and Yorke, +%% and that they have a topological entropy and an exponent of Lyapunov both equal to $ln(\mathsf{N})$, +%% where $\mathsf{N}$ is the size of the iterated vector. +%% These topological properties make that we are ground to believe that a generator based on chaotic +%% iterations will probably be able to pass all the existing statistical batteries for pseudorandomness like +%% the NIST one. The following subsections, in which we prove that defective generators have their +%% statistical properties improved by chaotic iterations, show that such an assumption is true. + +%% \subsection{Details of some Existing Generators} + +%% The list of defective PRNGs we will use +%% as inputs for the statistical tests to come is introduced here. 
+ +%% Firstly, the simple linear congruency generators (LCGs) will be used. +%% They are defined by the following recurrence: +%% \begin{equation} +%% x^n = (ax^{n-1} + c)~mod~m, +%% \label{LCG} +%% \end{equation} +%% where $a$, $c$, and $x^0$ must be, among other things, non-negative and inferior to +%% $m$~\cite{LEcuyerS07}. In what follows, 2LCGs and 3LCGs refer to two (resp. three) +%% combinations of such LCGs. For further details, see~\cite{bfg12a:ip,combined_lcg}. + +%% Secondly, the multiple recursive generators (MRGs) which will be used, +%% are based on a linear recurrence of order +%% $k$, modulo $m$~\cite{LEcuyerS07}: +%% \begin{equation} +%% x^n = (a^1x^{n-1}+~...~+a^kx^{n-k})~mod~m . +%% \label{MRG} +%% \end{equation} +%% The combination of two MRGs (referred as 2MRGs) is also used in these experiments. + +%% Generators based on linear recurrences with carry will be regarded too. +%% This family of generators includes the add-with-carry (AWC) generator, based on the recurrence: +%% \begin{equation} +%% \label{AWC} +%% \begin{array}{l} +%% x^n = (x^{n-r} + x^{n-s} + c^{n-1})~mod~m, \\ +%% c^n= (x^{n-r} + x^{n-s} + c^{n-1}) / m, \end{array}\end{equation} +%% the SWB generator, having the recurrence: +%% \begin{equation} +%% \label{SWB} +%% \begin{array}{l} +%% x^n = (x^{n-r} - x^{n-s} - c^{n-1})~mod~m, \\ +%% c^n=\left\{ +%% \begin{array}{l} +%% 1 ~~~~~\text{if}~ (x^{i-r} - x^{i-s} - c^{i-1})<0\\ +%% 0 ~~~~~\text{else},\end{array} \right. \end{array}\end{equation} +%% and the SWC generator, which is based on the following recurrence: +%% \begin{equation} +%% \label{SWC} +%% \begin{array}{l} +%% x^n = (a^1x^{n-1} \oplus ~...~ \oplus a^rx^{n-r} \oplus c^{n-1}) ~ mod ~ 2^w, \\ +%% c^n = (a^1x^{n-1} \oplus ~...~ \oplus a^rx^{n-r} \oplus c^{n-1}) ~ / ~ 2^w. \end{array}\end{equation} + +%% Then the generalized feedback shift register (GFSR) generator has been implemented, that is: +%% \begin{equation} +%% x^n = x^{n-r} \oplus x^{n-k} . 
+%% \label{GFSR} +%% \end{equation} + + +%% Finally, the nonlinear inversive (INV) generator~\cite{LEcuyerS07} has been studied, which is: + +%% \begin{equation} +%% \label{INV} +%% \begin{array}{l} +%% x^n=\left\{ +%% \begin{array}{ll} +%% (a^1 + a^2 / z^{n-1})~mod~m & \text{if}~ z^{n-1} \neq 0 \\ +%% a^1 & \text{if}~ z^{n-1} = 0 .\end{array} \right. \end{array}\end{equation} + + + +%% \begin{table} +%% \renewcommand{\arraystretch}{1.3} +%% \caption{TestU01 Statistical Test Failures} +%% \label{TestU011} +%% \centering +%% \begin{tabular}{lccccc} +%% \toprule +%% Test name &Tests& Logistic & XORshift & ISAAC\\ +%% Rabbit & 38 &21 &14 &0 \\ +%% Alphabit & 17 &16 &9 &0 \\ +%% Pseudo DieHARD &126 &0 &2 &0 \\ +%% FIPS\_140\_2 &16 &0 &0 &0 \\ +%% SmallCrush &15 &4 &5 &0 \\ +%% Crush &144 &95 &57 &0 \\ +%% Big Crush &160 &125 &55 &0 \\ \hline +%% Failures & &261 &146 &0 \\ +%% \bottomrule +%% \end{tabular} +%% \end{table} + + + +%% \begin{table} +%% \renewcommand{\arraystretch}{1.3} +%% \caption{TestU01 Statistical Test Failures for Old CI algorithms ($\mathsf{N}=4$)} +%% \label{TestU01 for Old CI} +%% \centering +%% \begin{tabular}{lcccc} +%% \toprule +%% \multirow{3}*{Test name} & \multicolumn{4}{c}{Old CI}\\ +%% &Logistic& XORshift& ISAAC&ISAAC \\ +%% &+& +& + & + \\ +%% &Logistic& XORshift& XORshift&ISAAC \\ \cmidrule(r){2-5} +%% Rabbit &7 &2 &0 &0 \\ +%% Alphabit & 3 &0 &0 &0 \\ +%% DieHARD &0 &0 &0 &0 \\ +%% FIPS\_140\_2 &0 &0 &0 &0 \\ +%% SmallCrush &2 &0 &0 &0 \\ +%% Crush &47 &4 &0 &0 \\ +%% Big Crush &79 &3 &0 &0 \\ \hline +%% Failures &138 &9 &0 &0 \\ +%% \bottomrule +%% \end{tabular} +%% \end{table} + + + + + +%% \subsection{Statistical tests} +%% \label{Security analysis} + +%% Three batteries of tests are reputed and regularly used +%% to evaluate the statistical properties of newly designed pseudorandom +%% number generators. 
These batteries are named DieHard~\cite{Marsaglia1996}, +%% the NIST suite~\cite{ANDREW2008}, and the most stringent one called +%% TestU01~\cite{LEcuyerS07}, which encompasses the two other batteries. + + + +%% \label{Results and discussion} +%% \begin{table*} +%% \renewcommand{\arraystretch}{1.3} +%% \caption{NIST and DieHARD tests suite passing rates for PRNGs without CI} +%% \label{NIST and DieHARD tests suite passing rate the for PRNGs without CI} +%% \centering +%% \begin{tabular}{|l||c|c|c|c|c|c|c|c|c|c|} +%% \hline\hline +%% Types of PRNGs & \multicolumn{2}{c|}{Linear PRNGs} & \multicolumn{4}{c|}{Lagged PRNGs} & \multicolumn{1}{c|}{ICG PRNGs} & \multicolumn{3}{c|}{Mixed PRNGs}\\ \hline +%% \backslashbox{\textbf{$Tests$}} {\textbf{$PRNG$}} & LCG& MRG& AWC & SWB & SWC & GFSR & INV & LCG2& LCG3& MRG2 \\ \hline +%% NIST & 11/15 & 14/15 &\textbf{15/15} & \textbf{15/15} & 14/15 & 14/15 & 14/15 & 14/15& 14/15& 14/15 \\ \hline +%% DieHARD & 16/18 & 16/18 & 15/18 & 16/18 & \textbf{18/18} & 16/18 & 16/18 & 16/18& 16/18& 16/18\\ \hline +%% \end{tabular} +%% \end{table*} + +%% Table~\ref{NIST and DieHARD tests suite passing rate the for PRNGs without CI} shows the +%% results on the two first batteries recalled above, indicating that all the PRNGs presented +%% in the previous section +%% cannot pass all these tests. In other words, the statistical quality of these PRNGs cannot +%% fulfill the up-to-date standards presented previously. We have shown in~\cite{bfg12a:ip} that the use of chaotic +%% iterations can solve this issue. +%% %More precisely, to +%% %illustrate the effects of chaotic iterations on these defective PRNGs, experiments have been divided in three parts~\cite{bfg12a:ip}: +%% %\begin{enumerate} +%% % \item \textbf{Single CIPRNG}: The PRNGs involved in CI computing are of the same category. +%% % \item \textbf{Mixed CIPRNG}: Two different types of PRNGs are mixed during the chaotic iterations process. 
+%% % \item \textbf{Multiple CIPRNG}: The generator is obtained by repeating the composition of the iteration function as follows: $x^0\in \mathds{B}^{\mathsf{N}}$, and $\forall n\in \mathds{N}^{\ast },\forall i\in \llbracket1;\mathsf{N}\rrbracket, x_i^n=$ +%% %\begin{equation} +%% %\begin{array}{l} +%% %\left\{ +%% %\begin{array}{l} +%% %x_i^{n-1}~~~~~\text{if}~S^n\neq i \\ +%% %\forall j\in \llbracket1;\mathsf{m}\rrbracket,f^m(x^{n-1})_{S^{nm+j}}~\text{if}~S^{nm+j}=i.\end{array} \right. \end{array} +%% %\end{equation} +%% %$m$ is called the \emph{functional power}. +%% %\end{enumerate} +%% % +%% The obtained results are reproduced in Table +%% \ref{NIST and DieHARD tests suite passing rate the for single CIPRNGs}. +%% The scores written in boldface indicate that all the tests have been passed successfully, whereas an +%% asterisk ``*'' means that the considered passing rate has been improved. +%% The improvements are obvious for both the ``Old CI'' and the ``New CI'' generators. +%% Concerning the ``Xor CI PRNG'', the score is less spectacular. Because of a large speed improvement, the statistics +%% are not as good as for the two other versions of these CIPRNGs. +%% However 8 tests have been improved (with no deflation for the other results). 
+ + +%% \begin{table*} +%% \renewcommand{\arraystretch}{1.3} +%% \caption{NIST and DieHARD tests suite passing rates for PRNGs with CI} +%% \label{NIST and DieHARD tests suite passing rate the for single CIPRNGs} +%% \centering +%% \begin{tabular}{|l||c|c|c|c|c|c|c|c|c|c|c|c|} +%% \hline +%% Types of PRNGs & \multicolumn{2}{c|}{Linear PRNGs} & \multicolumn{4}{c|}{Lagged PRNGs} & \multicolumn{1}{c|}{ICG PRNGs} & \multicolumn{3}{c|}{Mixed PRNGs}\\ \hline +%% \backslashbox{\textbf{$Tests$}} {\textbf{$Single~CIPRNG$}} & LCG & MRG & AWC & SWB & SWC & GFSR & INV& LCG2 & LCG3& MRG2 \\ \hline\hline +%% Old CIPRNG\\ \hline \hline +%% NIST & \textbf{15/15} * & \textbf{15/15} * & \textbf{15/15} & \textbf{15/15} & \textbf{15/15} * & \textbf{15/15} * & \textbf{15/15} *& \textbf{15/15} * & \textbf{15/15} * & \textbf{15/15} \\ \hline +%% DieHARD & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} & \textbf{18/18} * & \textbf{18/18} *& \textbf{18/18} * & \textbf{18/18} *& \textbf{18/18} * \\ \hline +%% New CIPRNG\\ \hline \hline +%% NIST & \textbf{15/15} * & \textbf{15/15} * & \textbf{15/15} & \textbf{15/15} & \textbf{15/15} * & \textbf{15/15} * & \textbf{15/15} *& \textbf{15/15} * & \textbf{15/15} * & \textbf{15/15} \\ \hline +%% DieHARD & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} *& \textbf{18/18} *\\ \hline +%% Xor CIPRNG\\ \hline\hline +%% NIST & 14/15*& \textbf{15/15} * & \textbf{15/15} & \textbf{15/15} & 14/15 & \textbf{15/15} * & 14/15& \textbf{15/15} * & \textbf{15/15} *& \textbf{15/15} \\ \hline +%% DieHARD & 16/18 & 16/18 & 17/18* & \textbf{18/18} * & \textbf{18/18} & \textbf{18/18} * & 16/18 & 16/18 & 16/18& 16/18\\ \hline +%% \end{tabular} +%% \end{table*} + + +%% We have then investigated in~\cite{bfg12a:ip} if it were possible to improve +%% the statistical behavior of the Xor CI version by combining more 
than one +%% $\oplus$ operation. Results are summarized in Table~\ref{threshold}, illustrating +%% the progressive increasing effects of chaotic iterations, when giving time to chaos to get settled in. +%% Thus rapid and perfect PRNGs, regarding the NIST and DieHARD batteries, can be obtained +%% using chaotic iterations on defective generators. + +%% \begin{table*} +%% \renewcommand{\arraystretch}{1.3} +%% \caption{Number of $\oplus$ operations to pass the whole NIST and DieHARD batteries} +%% \label{threshold} +%% \centering +%% \begin{tabular}{|l||c|c|c|c|c|c|c|c|} +%% \hline +%% Inputted $PRNG$ & LCG & MRG & SWC & GFSR & INV& LCG2 & LCG3 & MRG2 \\ \hline\hline +%% Threshold value $m$& 19 & 7 & 2& 1 & 11& 9& 3& 4\\ \hline\hline +%% \end{tabular} +%% \end{table*} + +%% Finally, the TestU01 battery has been launched on three well-known generators +%% (a logistic map, a simple XORshift, and the cryptographically secure ISAAC, +%% see Table~\ref{TestU011}). These results can be compared with +%% Table~\ref{TestU01 for Old CI}, which gives the scores obtained by the +%% Old CI PRNG that has received these generators. +%% The obvious improvement speaks for itself, and together with the other +%% results recalled in this section, it reinforces the opinion that a strong +%% correlation between topological properties and statistical behavior exists. + + +%% The next subsection will now give a concrete original implementation of the Xor CI PRNG, the +%% fastest generator in the chaotic iteration based family. In the remainder, +%% this generator will be simply referred to as CIPRNG, or ``the proposed PRNG'', if this statement does not +%% raise ambiguity. + + +\section{First Efficient Implementation of a PRNG based on Chaotic Iterations} +\label{sec:efficient PRNG} +% +%Based on the proof presented in the previous section, it is now possible to +%improve the speed of the generator formerly presented in~\cite{bgw09:ip,guyeux10}. 
+%The first idea is to consider +%that the provided strategy is a pseudorandom Boolean vector obtained by a +%given PRNG. +%An iteration of the system is simply the bitwise exclusive or between +%the last computed state and the current strategy. +%Topological properties of disorder exhibited by chaotic +%iterations can be inherited by the inputted generator, we hope by doing so to +%obtain some statistical improvements while preserving speed. +% +%%RAPH : j'ai viré tout ca +%% Let us give an example using 16-bits numbers, to clearly understand how the bitwise xor operations +%% are +%% done. +%% Suppose that $x$ and the strategy $S^i$ are given as +%% binary vectors. +%% Table~\ref{TableExemple} shows the result of $x \oplus S^i$. + +%% \begin{table} +%% \begin{scriptsize} +%% $$ +%% \begin{array}{|cc|cccccccccccccccc|} +%% \hline +%% x &=&1&0&1&1&1&0&1&0&1&0&0&1&0&0&1&0\\ +%% \hline +%% S^i &=&0&1&1&0&0&1&1&0&1&1&1&0&0&1&1&1\\ +%% \hline +%% x \oplus S^i&=&1&1&0&1&1&1&0&0&0&1&1&1&0&1&0&1\\ +%% \hline + +%% \hline +%% \end{array} +%% $$ +%% \end{scriptsize} +%% \caption{Example of an arbitrary round of the proposed generator} +%% \label{TableExemple} +%% \end{table} + + + + +\lstset{language=C,caption={C code of the sequential PRNG based on chaotic iterations},label={algo:seqCIPRNG}} +\begin{small} +\begin{lstlisting} + +unsigned int CIPRNG() { + static unsigned int x = 123123123; + unsigned long t1 = xorshift(); + unsigned long t2 = xor128(); + unsigned long t3 = xorwow(); + x = x^(unsigned int)t1; + x = x^(unsigned int)(t2>>32); + x = x^(unsigned int)(t3>>32); + x = x^(unsigned int)t2; + x = x^(unsigned int)(t1>>32); + x = x^(unsigned int)t3; + return x; +} +\end{lstlisting} +\end{small} -The convergence of sequences according to $D$ is not the same than the usual convergence related to the Euclidian metric. 
For instance, if $x^n \to x$ according to $D$, then necessarily the integral part of each $x^n$ is equal to the integral part of $x$ (at least after a given threshold), and the decimal part of $x^n$ corresponds to the one of $x$ ``as far as required''. -To illustrate this fact, a comparison between $D$ and the Euclidian distance is given Figure \ref{fig:comparaison de distances}. These illustrations show that $D$ is richer and more refined than the Euclidian distance, and thus is more precise. -\begin{figure}[t] -\begin{center} - \subfigure[Function $x \to dist(x;1,234) $ on the interval $(0;5)$.]{\includegraphics[scale=.35]{DvsEuclidien.pdf}}\quad - \subfigure[Function $x \to dist(x;3) $ on the interval $(0;5)$.]{\includegraphics[scale=.35]{DvsEuclidien2.pdf}} -\end{center} -\caption{Comparison between $D$ (in blue) and the Euclidian distane (in green).} -\label{fig:comparaison de distances} -\end{figure} +In Listing~\ref{algo:seqCIPRNG} a sequential version of the proposed PRNG based +on chaotic iterations is presented. The xor operator is represented by +\textasciicircum. This function uses three classical 64-bit PRNGs, namely the +\texttt{xorshift}, the \texttt{xor128}, and the +\texttt{xorwow}~\cite{Marsaglia2003}. In the following, we call them ``xor-like +PRNGs''. As each xor-like PRNG works on 64 bits whereas our proposed generator +works on 32 bits, we use the cast \texttt{(unsigned int)}, which selects the +32 least significant bits of a given integer, and the code \texttt{(unsigned + int)(t$>>$32)} in order to obtain the 32 most significant bits of \texttt{t}. +Thus producing a pseudorandom number needs six xor operations with six 32-bit numbers +that are provided by three 64-bit PRNGs. This version successfully passes the +stringent BigCrush battery of tests~\cite{LEcuyerS07}. +At this point, we thus +have defined an efficient and statistically unbiased generator.
Its speed is +directly related to the use of linear operations, but for the same reason, +this fast generator cannot be proven secure. -\subsubsection{The semiconjugacy} +\section{Efficient PRNGs based on Chaotic Iterations on GPU} +\label{sec:efficient PRNG gpu} -It is now possible to define a topological semiconjugacy between $\mathcal{X}$ and an interval of $\mathds{R}$: +In order to benefit from the computing power of GPU, a program +needs to have independent blocks of threads that can be computed +simultaneously. In general, the larger the number of threads, the +more local memory is used, and the fewer branching instructions +(if, while, ...) are used, the better the performance on GPU. +Obviously, having these requirements in mind, it is possible to build +a program similar to the one presented in Listing +\ref{algo:seqCIPRNG}, which computes pseudorandom numbers on GPU. To +do so, we must first recall that in the CUDA~\cite{Nvid10} +environment, threads have a local identifier called +\texttt{threadIdx}, which is relative to the block containing +them. Furthermore, in CUDA, parts of the code that are executed by the GPU are +called {\it kernels}. -\begin{theorem} -Chaotic iterations on the phase space $\mathcal{X}$ are simple iterations on $\mathds{R}$, which is illustrated by the semiconjugacy of the diagram bellow: -\begin{equation*} -\begin{CD} -\left(~\mathcal{S}_{10} \times\mathds{B}^{10}, d~\right) @>G_{f_0}>> \left(~\mathcal{S}_{10} \times\mathds{B}^{10}, d~\right)\\ - @V{\varphi}VV @VV{\varphi}V\\ -\left( ~\big[ 0, 2^{10} \big[, D~\right) @>>g> \left(~\big[ 0, 2^{10} \big[, D~\right) -\end{CD} -\end{equation*} -\end{theorem} -\begin{proof} -$\varphi$ has been constructed in order to be continuous and onto. -\end{proof} +\subsection{Naive Version for GPU} -In other words, $\mathcal{X}$ is approximately equal to $\big[ 0, 2^\mathsf{N} \big[$. + +It is possible to deduce from the CPU version a quite similar version adapted to GPU.
+The simple principle consists in making each thread of the GPU compute the CPU version of our PRNG. +Of course, the three xor-like +PRNGs used in these computations must have different parameters. +In a given thread, these parameters are +randomly picked from another PRNG. +The initialization stage is performed by the CPU. +To do so, the ISAAC PRNG~\cite{Jenkins96} is used to set all the +parameters embedded into each thread. + +The implementation of the three +xor-like PRNGs is straightforward when their parameters have been +allocated in the GPU memory. Each xor-like PRNG works with an internal +number $x$ that saves the last generated pseudorandom number. Additionally, the +implementation of the xor128, the xorshift, and the xorwow requires, respectively, +4, 5, and 6 unsigned longs as internal variables. + + +\begin{algorithm} +\begin{small} +\KwIn{InternalVarXorLikeArray: array with internal variables of the 3 xor-like +PRNGs in global memory\; +NumThreads: number of threads\;} +\KwOut{NewNb: array containing random numbers in global memory} +\If{threadIdx is concerned by the computation} { + retrieve data from InternalVarXorLikeArray[threadIdx] in local variables\; + \For{i=1 to n} { + compute a new pseudorandom number as in Listing~\ref{algo:seqCIPRNG}\; + store the new pseudorandom number in NewNb[NumThreads*threadIdx+i]\; + } + store internal variables in InternalVarXorLikeArray[threadIdx]\; +} +\end{small} +\caption{Main kernel of the GPU ``naive'' version of the PRNG based on chaotic iterations} +\label{algo:gpu_kernel} +\end{algorithm} +Algorithm~\ref{algo:gpu_kernel} presents a naive implementation of the proposed PRNG on +GPU. Due to the available memory in the GPU and the number of threads +used simultaneously, the number of random numbers that a thread can generate +inside a kernel is limited (\emph{i.e.}, the variable \texttt{n} in +Algorithm~\ref{algo:gpu_kernel}).
For instance, if $100,000$ threads are used and +if $n=100$\footnote{in fact, we need to add the initial seed (a 32-bit number)}, +then the memory required to store all of the internal variables of the xor-like +PRNGs\footnote{we multiply this number by $2$ in order to count 32-bit numbers} +and the pseudorandom numbers generated by our PRNG, is equal to $100,000\times ((4+5+6)\times +2+(1+100))=13,100,000$ 32-bit numbers, that is, approximately $52$~MB. +This generator is able to pass the whole BigCrush battery of tests, for all +the versions that have been tested depending on their number of threads +(called \texttt{NumThreads} in our algorithm, tested up to $5$ million). +\begin{remark} +The proposed algorithm has the advantage of manipulating independent +PRNGs, so this version is easily adaptable on a cluster of computers too. The only thing +to ensure is to use a single ISAAC PRNG. To meet this requirement, a simple solution consists in +using a master node for the initialization. This master node computes the initial parameters +for all the different nodes involved in the computation. +\end{remark} -\subsection{Study of the chaotic iterations described as a real function} +\subsection{Improved Version for GPU} + +As GPU cards using CUDA have shared memory between threads of the same block, it +is possible to use this feature in order to simplify the previous algorithm, +i.e., to use fewer than 3 xor-like PRNGs. The solution consists in computing only +one xor-like PRNG per thread, saving it into the shared memory, and then using the results +of some other threads in the same block of threads. In order to define which +thread uses the result of which other one, we can use a combination array that +contains the indexes of all threads and for which a combination has been +performed. + +In Algorithm~\ref{algo:gpu_kernel2}, two combination arrays are used. The +variable \texttt{offset} is computed using the value of +\texttt{combination\_size}.
Then we can compute \texttt{o1} and \texttt{o2} +representing the indexes of the other threads whose results are used by the +current one. In this algorithm, we consider that a 32-bit xor-like PRNG has +been chosen. In practice, we use the xor128 proposed in~\cite{Marsaglia2003} in +which unsigned longs (64 bits) have been replaced by unsigned integers (32 +bits). + +This version can also pass the whole {\it BigCrush} battery of tests. + +\begin{algorithm} +\begin{small} +\KwIn{InternalVarXorLikeArray: array with internal variables of one xor-like PRNG +in global memory\; +NumThreads: Number of threads\; +array\_comb1, array\_comb2: Arrays containing combinations of size combination\_size\;} + +\KwOut{NewNb: array containing random numbers in global memory} +\If{threadId is concerned} { + retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory and x\; + offset = threadIdx\%combination\_size\; + o1 = threadIdx-offset+array\_comb1[offset]\; + o2 = threadIdx-offset+array\_comb2[offset]\; + \For{i=1 to n} { + t=xor-like()\; + t=t\textasciicircum shmem[o1]\textasciicircum shmem[o2]\; + shmem[threadId]=t\; + x = x\textasciicircum t\; + + store the new pseudorandom number in NewNb[NumThreads*threadId+i]\; + } + store internal variables in InternalVarXorLikeArray[threadId]\; +} +\end{small} +\caption{Main kernel for the chaotic iterations based PRNG GPU efficient +version\label{IR}} +\label{algo:gpu_kernel2} +\end{algorithm} + +\subsection{Chaos Evaluation of the Improved Version} + +A run of Algorithm~\ref{algo:gpu_kernel2} consists in an operation ($x=x\oplus t$) having +the form of Equation~\ref{equation Oplus}, which is equivalent to the iterative +system of Eq.~\ref{eq:generalIC}.
That is, an iteration of the general chaotic +iterations is realized between the last stored value $x$ of the thread and a strategy $t$ +(obtained by a bitwise exclusive or between a value provided by a xor-like() call +and two values previously obtained by two other threads). +To be certain that we are in the framework of Theorem~\ref{t:chaos des general}, +we must guarantee that this dynamical system iterates on the space +$\mathcal{X} = \mathcal{P}\left(\llbracket 1, \mathsf{N} \rrbracket\right)^\mathds{N}\times\mathds{B}^\mathsf{N}$. +The left term $x$ obviously belongs to $\mathds{B}^ \mathsf{N}$. +To prevent any flaw in the chaotic properties, we must check that the right +term (the last $t$), corresponding to the strategies, can possibly be equal to any +integer of $\llbracket 1, \mathsf{N} \rrbracket$. + +Such a result is obvious, as for the xor-like(), all the +integers belonging to its interval of definition can occur at each iteration, and thus the +last $t$ respects the requirement. Furthermore, it is possible to +prove by an immediate mathematical induction that, as the initial $x$ +is uniformly distributed (it is provided by a cryptographically secure PRNG), +the two other stored values shmem[o1] and shmem[o2] are uniformly distributed too, +(this is the induction hypothesis), and thus the next $x$ is finally uniformly distributed. + +Thus Algorithm~\ref{algo:gpu_kernel2} is a concrete realization of the general +chaotic iterations presented previously, and for this reason, it satisfies +Devaney's formulation of chaotic behavior. +\section{Experiments} +\label{sec:experiments} + +Different experiments have been performed in order to measure the generation +speed. We have used a first computer equipped with a Tesla C1060 NVidia GPU card +and an +Intel Xeon E5530 clocked at 2.40 GHz, and +a second computer equipped with a smaller CPU and a GeForce GTX 280. +Both +cards have 240 cores.
+ +In Figure~\ref{fig:time_xorlike_gpu} we compare the quantity of pseudorandom numbers +generated per second with various xor-like based PRNGs. In this figure, the optimized +versions use the {\it xor64} described in~\cite{Marsaglia2003}, whereas the naive versions +embed the three xor-like PRNGs described in Listing~\ref{algo:seqCIPRNG}. In +order to obtain optimal performance, the storage of pseudorandom numbers +into the GPU memory has been removed. This step is time consuming and slows down number +generation. Moreover this storage is completely +useless for applications that consume the pseudorandom +numbers directly after generation. We can see that when the number of threads is greater +than approximately 30,000 and lower than 5 million, the number of pseudorandom numbers generated +per second is almost constant. With the naive version, this value ranges from 2.5 to +3~GSamples/s. With the optimized version, it is approximately equal to +20~GSamples/s. Finally we can remark that both GPU cards are quite similar, but in +practice, the Tesla C1060 has more memory than the GTX 280, and this memory +should be of better quality. +As a comparison, Listing~\ref{algo:seqCIPRNG} leads to the generation of about +138~MSamples/s when using one core of the Xeon E5530.
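+
+These figures can be put in perspective with a rough ratio, computed here only from the rounded throughputs quoted above (it is therefore an order of magnitude, not an additional measurement):
+\begin{equation*}
+\frac{20\times 10^{9}}{138\times 10^{6}}\approx 145
+\qquad\mbox{and}\qquad
+\frac{2.5\times 10^{9}}{138\times 10^{6}}\approx 18,
+\end{equation*}
+that is, the optimized GPU version produces roughly two orders of magnitude more samples per second than one core of the Xeon E5530, and the naive GPU version about 18 times more at the lower end of its range.
+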
-\begin{figure}[t] +\begin{figure}[htbp] \begin{center} - \subfigure[ICs on the interval $(0,9;1)$.]{\includegraphics[scale=.35]{ICs09a1.pdf}}\quad - \subfigure[ICs on the interval $(0,7;1)$.]{\includegraphics[scale=.35]{ICs07a95.pdf}}\\ - \subfigure[ICs on the interval $(0,5;1)$.]{\includegraphics[scale=.35]{ICs05a1.pdf}}\quad - \subfigure[ICs on the interval $(0;1)$]{\includegraphics[scale=.35]{ICs0a1.pdf}} + \includegraphics[width=\columnwidth]{curve_time_xorlike_gpu.pdf} \end{center} -\caption{Representation of the chaotic iterations.} -\label{fig:ICs} +\caption{Quantity of pseudorandom numbers generated per second with the xorlike-based PRNG} +\label{fig:time_xorlike_gpu} \end{figure} -\begin{figure}[t] -\begin{center} - \subfigure[ICs on the interval $(510;514)$.]{\includegraphics[scale=.35]{ICs510a514.pdf}}\quad - \subfigure[ICs on the interval $(1000;1008)$]{\includegraphics[scale=.35]{ICs1000a1008.pdf}} -\end{center} -\caption{ICs on small intervals.} -\label{fig:ICs2} -\end{figure} -\begin{figure}[t] +In Figure~\ref{fig:time_bbs_gpu} we highlight the performances of the optimized +BBS-based PRNG on GPU. On the Tesla C1060 we obtain approximately 700MSample/s +and on the GTX 280 about 670MSample/s, which is obviously slower than the +xorlike-based PRNG on GPU. However, we will show in the next sections that this +new PRNG has a strong level of security, which is necessarily paid by a speed +reduction. 
+ +\begin{figure}[htbp] \begin{center} - \subfigure[ICs on the interval $(0;16)$.]{\includegraphics[scale=.3]{ICs0a16.pdf}}\quad - \subfigure[ICs on the interval $(40;70)$.]{\includegraphics[scale=.45]{ICs40a70.pdf}}\quad + \includegraphics[width=\columnwidth]{curve_time_bbs_gpu.pdf} \end{center} -\caption{General aspect of the chaotic iterations.} -\label{fig:ICs3} +\caption{Quantity of pseudorandom numbers generated per second using the BBS-based PRNG} +\label{fig:time_bbs_gpu} \end{figure} +All these experiments allow us to conclude that it is possible to +generate a very large quantity of statistically perfect pseudorandom numbers with the xor-like version. +To a certain extent, it is also the case with the secure BBS-based version, the speed reduction being +explained by the fact that the former version has ``only'' +chaotic properties and statistical perfection, whereas the latter is also cryptographically secure, +as is shown in the next sections. -We have written a Python program to represent the chaotic iterations with the vectorial negation on the real line $\mathds{R}$. Various representations of these CIs are given in Figures \ref{fig:ICs}, \ref{fig:ICs2} and \ref{fig:ICs3}. It can be remarked that the function $g$ is a piecewise linear function: it is linear on each interval having the form $\left[ \dfrac{n}{10}, \dfrac{n+1}{10}\right[$, $n \in \llbracket 0;2^{10}\times 10 \rrbracket$ and its slope is equal to 10. Let us justify these claims: -\begin{proposition} -\label{Prop:derivabilite des ICs} -Chaotic iterations $g$ defined on $\mathds{R}$ have derivatives of all orders on $\big[ 0, 2^{10} \big[$, except on the 10241 points in $I$ defined by $\left\{ \dfrac{n}{10} ~\big/~ n \in \llbracket 0;2^{10}\times 10\rrbracket \right\}$.
-Furthermore, on each interval of the form $\left[ \dfrac{n}{10}, \dfrac{n+1}{10}\right[$, with $n \in \llbracket 0;2^{10}\times 10 \rrbracket$, $g$ is a linear function, having a slope equal to 10: $\forall x \notin I, g'(x)=10$. -\end{proposition} -\begin{proof} -Let $I_n = \left[ \dfrac{n}{10}, \dfrac{n+1}{10}\right[$, with $n \in \llbracket 0;2^{10}\times 10 \rrbracket$. All the points of $I_n$ have the same integral prat $e$ and the same decimal part $s^0$: on the set $I_n$, functions $e(x)$ and $x \mapsto s(x)^0$ of Definition \ref{def:e et s} only depend on $n$. So all the images $g(x)$ of these points $x$: -\begin{itemize} -\item Have the same integral part, which is $e$, except probably the bit number $s^0$. In other words, this integer has approximately the same binary decomposition than $e$, the sole exception being the digit $s^0$ (this number is then either $e+2^{10-s^0}$ or $e-2^{10-s^0}$, depending on the parity of $s^0$, \emph{i.e.}, it is equal to $e+(-1)^{s^0}\times 2^{10-s^0}$). -\item A shift to the left has been applied to the decimal part $y$, losing by doing so the common first digit $s^0$. In other words, $y$ has been mapped into $10\times y - s^0$. -\end{itemize} -To sum up, the action of $g$ on the points of $I$ is as follows: first, make a multiplication by 10, and second, add the same constant to each term, which is $\dfrac{1}{10}\left(e+(-1)^{s^0}\times 2^{10-s^0}\right)-s^0$. -\end{proof} -\begin{remark} -Finally, chaotic iterations are elements of the large family of functions that are both chaotic and piecewise linear (like the tent map). -\end{remark} +\section{Security Analysis} -\subsection{Comparison of the two metrics on $\big[ 0, 2^\mathsf{N} \big[$} +This section is dedicated to the security analysis of the + proposed PRNGs.%, both from a theoretical and from a practical point of view. 
-The two propositions bellow allow to compare our two distances on $\big[ 0, 2^\mathsf{N} \big[$: +%\subsection{Theoretical Proof of Security} +\label{sec:security analysis} -\begin{proposition} -Id: $\left(~\big[ 0, 2^\mathsf{N} \big[,\Delta~\right) \to \left(~\big[ 0, 2^\mathsf{N} \big[, D~\right)$ is not continuous. -\end{proposition} - -\begin{proof} -The sequence $x^n = 1,999\hdots 999$ constituted by $n$ 9 as decimal part, is such that: -\begin{itemize} -\item $\Delta (x^n,2) \to 0.$ -\item But $D(x^n,2) \geqslant 1$, then $D(x^n,2)$ does not converge to 0. -\end{itemize} - -The sequential characterization of the continuity concludes the demonstration. -\end{proof} +The standard definition + of {\it indistinguishability} used here is the classical one as defined for + instance in~\cite[chapter~3]{Goldreich}. + This property shows that predicting the future results of the PRNG + cannot be done in a reasonable time compared to the generation time. It is important to emphasize that this + is a relative notion between breaking time and the sizes of the + keys/seeds. Of course, if small keys or seeds are chosen, the system can + be broken in practice. But it also means that if the keys/seeds are large + enough, the system is secured. +As a complement, an example of a concrete practical evaluation of security +is outlined in Annex~\ref{A-sec:Practicak evaluation}. +In this section the concatenation of two strings $u$ and $v$ is classically +denoted by $uv$. +In a cryptographic context, a pseudorandom generator is a deterministic +algorithm $G$ transforming strings into strings and such that, for any +seed $s$ of length $m$, $G(s)$ (the output of $G$ on the input $s$) has size +$\ell_G(m)$ with $\ell_G(m)>m$. +The notion of {\it secure} PRNGs can now be defined as follows. 
+ +\begin{definition} +A cryptographic PRNG $G$ is secure if for any probabilistic polynomial time +algorithm $D$, for any positive polynomial $p$, and for all sufficiently +large $m$'s, +$$| \mathrm{Pr}[D(G(U_m))=1]-\mathrm{Pr}[D(U_{\ell_G(m)})=1]|< \frac{1}{p(m)},$$ +where $U_r$ is the uniform distribution over $\{0,1\}^r$ and the +probabilities are taken over $U_m$, $U_{\ell_G(m)}$ as well as over the +internal coin tosses of $D$. +\end{definition} -A contrario: +Intuitively, it means that there is no polynomial time algorithm that can +distinguish a perfect uniform random generator from $G$ with a non-negligible +probability. An equivalent formulation of this well-known security property +states that it is impossible \emph{in practice} to predict the next bit of the +generator, knowing all the previously produced ones. The interested reader is +referred to~\cite[chapter~3]{Goldreich} for more information. Note that it is +quite easily possible to change the function $\ell$ into any polynomial function +$\ell^\prime$ satisfying $\ell^\prime(m)>m$~\cite[Chapter 3.3]{Goldreich}. + +The generation schema developed in (\ref{equation Oplus}) is based on a +pseudorandom generator. Let $H$ be a cryptographic PRNG. We may assume, +without loss of generality, that for any string $S_0$ of size $N$, the size +of $H(S_0)$ is $kN$, with $k>2$. It means that $\ell_H(N)=kN$. +Let $S_1,\ldots,S_k$ be the +strings of length $N$ such that $H(S_0)=S_1 \ldots S_k$ ($H(S_0)$ is the concatenation of +the $S_i$'s). The cryptographic PRNG $X$ defined in (\ref{equation Oplus}) +is the algorithm mapping any string $x_0S_0$ of length $2N$ into the string +$(x_0\oplus S_0 \oplus S_1)(x_0\oplus S_0 \oplus S_1\oplus S_2)\ldots +(x_0\bigoplus_{i=0}^{i=k}S_i)$. One in particular has $\ell_{X}(2N)=kN=\ell_H(N)$. +We claim now that if $H$ is secure, +then $X$ is secure too.
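+
+As an illustration of this construction (a worked instance only, with $k=3$ chosen arbitrarily among the admissible values $k>2$), suppose that $H(S_0)=S_1S_2S_3$. The output of $X$ on the input $x_0S_0$ then consists of the three blocks
+\begin{equation*}
+\begin{array}{l}
+b_1 = x_0\oplus S_0 \oplus S_1,\\
+b_2 = b_1 \oplus S_2,\\
+b_3 = b_2 \oplus S_3,
+\end{array}
+\end{equation*}
+where the names $b_1,b_2,b_3$ are introduced here for readability only. Each block is obtained from the previous one by a single xor with the next $N$-bit block of $H(S_0)$, so that an input of $2N$ bits yields $3N$ output bits, in agreement with $\ell_X(2N)=kN$.
+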
\begin{proposition} -Id: $\left(~\big[ 0, 2^\mathsf{N} \big[,D~\right) \to \left(~\big[ 0, 2^\mathsf{N} \big[, \Delta ~\right)$ is a continuous fonction. +\label{cryptopreuve} +If $H$ is a secure cryptographic PRNG, then $X$ is a secure cryptographic +PRNG too. \end{proposition} \begin{proof} -If $D(x^n,x) \to 0$, then $D_e(x^n,x) = 0$ at least for $n$ larger than a given threshold, because $D_e$ only returns integers. So, after this threshold, the integral parts of all the $x^n$ are equal to the integral part of $x$. - -Additionally, $D_s(x^n, x) \to 0$, then $\forall k \in \mathds{N}^*, \exists N_k \in \mathds{N}, n \geqslant N_k \Rightarrow D_s(x^n,x) \leqslant 10^{-k}$. This means that for all $k$, an index $N_k$ can be found such that, $\forall n \geqslant N_k$, all the $x^n$ have the same $k$ firsts digits, which are the digits of $x$. We can deduce the convergence $\Delta(x^n,x) \to 0$, and thus the result. +The proposition is proven by contraposition. Assume that $X$ is not +secure. By definition, there exist a polynomial time probabilistic +algorithm $D$ and a positive polynomial $p$ such that for all $k_0$ there exists +$N\geq \frac{k_0}{2}$ satisfying +$$| \mathrm{Pr}[D(X(U_{2N}))=1]-\mathrm{Pr}[D(U_{kN})=1]|\geq \frac{1}{p(2N)}.$$ +We describe a new probabilistic algorithm $D^\prime$ on an input $w$ of size +$kN$: +\begin{enumerate} +\item Decompose $w$ into $w=w_1\ldots w_{k}$, where each $w_i$ has size $N$. +\item Pick a string $y$ of size $N$ uniformly at random. +\item Compute $z=(y\oplus w_1)(y\oplus w_1\oplus w_2)\ldots (y + \bigoplus_{i=1}^{i=k} w_i).$ +\item Return $D(z)$.
+\end{enumerate} + + +Consider for each $y\in \mathbb{B}^{N}$ the function $\varphi_{y}$ +from $\mathbb{B}^{kN}$ into $\mathbb{B}^{kN}$ mapping $w=w_1\ldots w_k$ +(each $w_i$ has length $N$) to +$(y\oplus w_1)(y\oplus w_1\oplus w_2)\ldots (y + \bigoplus_{i=1}^{i=k} w_i).$ By construction, one has for every $w$, +\begin{equation}\label{PCH-1} +D^\prime(w)=D(\varphi_y(w)), +\end{equation} +where $y$ is randomly generated. +Moreover, for each $y$, $\varphi_{y}$ is injective: if +$(y\oplus w_1)(y\oplus w_1\oplus w_2)\ldots (y\bigoplus_{i=1}^{i=k} +w_i)=(y\oplus w_1^\prime)(y\oplus w_1^\prime\oplus w_2^\prime)\ldots +(y\bigoplus_{i=1}^{i=k} w_i^\prime)$, then for every $1\leq j\leq k$, +$y\bigoplus_{i=1}^{i=j} w_i^\prime=y\bigoplus_{i=1}^{i=j} w_i$. It follows, +by a direct induction, that $w_i=w_i^\prime$ for every $i$. Furthermore, since $\mathbb{B}^{kN}$ +is finite, each $\varphi_y$ is bijective. Therefore, using (\ref{PCH-1}), +one has +$\mathrm{Pr}[D^\prime(U_{kN})=1]=\mathrm{Pr}[D(\varphi_y(U_{kN}))=1]$ and, +consequently, +\begin{equation}\label{PCH-2} +\mathrm{Pr}[D^\prime(U_{kN})=1]=\mathrm{Pr}[D(U_{kN})=1]. +\end{equation} + +Now, using (\ref{PCH-1}) again, one has for every $x$, +\begin{equation}\label{PCH-3} +D^\prime(H(x))=D(\varphi_y(H(x))), +\end{equation} +where $y$ is randomly generated. By construction, $\varphi_y(H(x))=X(yx)$, +thus +\begin{equation}%\label{PCH-3} %%RAPH: I removed this label, which already exists 3 lines above +D^\prime(H(x))=D(X(yx)), +\end{equation} +where $y$ is randomly generated. +It follows that + +\begin{equation}\label{PCH-4} +\mathrm{Pr}[D^\prime(H(U_{N}))=1]=\mathrm{Pr}[D(X(U_{2N}))=1].
+\end{equation} + From (\ref{PCH-2}) and (\ref{PCH-4}), one can deduce that +there exist a polynomial time probabilistic +algorithm $D^\prime$ and a positive polynomial $p$ such that for all $k_0$ there exists +$N\geq \frac{k_0}{2}$ satisfying +$$| \mathrm{Pr}[D^\prime(H(U_{N}))=1]-\mathrm{Pr}[D^\prime(U_{kN})=1]|\geq \frac{1}{p(2N)},$$ +which proves that $H$ is not secure and concludes the proof. \end{proof} -The conclusion of these propositions is that the proposed metric is more precise than the Euclidian distance, that is: -\begin{corollary} -$D$ is finer than the Euclidian distance $\Delta$. -\end{corollary} -This corollary can be reformulated as follows: +%\subsection{Practical Security Evaluation} +%\label{sec:Practicak evaluation} +%This subsection is given in Section +An example of a practical security evaluation is outlined in +Annex~\ref{A-sec:Practicak evaluation}. +%%RAF moved to the annex + +%% Pseudorandom generators based on Eq.~\eqref{equation Oplus} are thus cryptographically secure when +%% they are XORed with an already cryptographically +%% secure PRNG. But, as stated previously, +%% such a property does not mean that, whatever the +%% key size, no attacker can predict the next bit +%% knowing all the previously released ones. +%% However, given a key size, it is possible to +%% measure in practice the minimum duration needed +%% for an attacker to break a cryptographically +%% secure PRNG, if we know the power of his/her +%% machines. Such a concrete security evaluation +%% is related to the $(T,\varepsilon)-$security +%% notion, which is recalled and evaluated in what +%% follows, for the sake of completeness. + +%% Let us firstly recall that, +%% \begin{definition} +%% Let $\mathcal{D} : \mathds{B}^M \longrightarrow \mathds{B}$ be a probabilistic algorithm that runs +%% in time $T$. +%% Let $\varepsilon > 0$.
+%% $\mathcal{D}$ is called a $(T,\varepsilon)-$distinguishing attack on pseudorandom +%% generator $G$ if + +%% \begin{flushleft} +%% $\left| Pr[\mathcal{D}(G(k)) = 1 \mid k \in_R \{0,1\}^\ell ]\right.$ +%% \end{flushleft} + +%% \begin{flushright} +%% $ - \left. Pr[\mathcal{D}(s) = 1 \mid s \in_R \mathds{B}^M ]\right| \geqslant \varepsilon,$ +%% \end{flushright} + +%% \noindent where the probability is taken over the internal coin flips of $\mathcal{D}$, and the notation +%% ``$\in_R$'' indicates the process of selecting an element at random and uniformly over the +%% corresponding set. +%% \end{definition} + +%% Let us recall that the running time of a probabilistic algorithm is defined to be the +%% maximum of the expected number of steps needed to produce an output, maximized +%% over all inputs; the expected number is averaged over all coin flips made by the algorithm~\cite{Knuth97}. +%% We are now able to define the notion of cryptographically secure PRNGs: + +%% \begin{definition} +%% A pseudorandom generator is $(T,\varepsilon)-$secure if there exists no $(T,\varepsilon)-$distinguishing attack on this pseudorandom generator. +%% \end{definition} + + + + + + + +%% Suppose now that the PRNG of Eq.~\eqref{equation Oplus} will work during +%% $M=100$ time units, and that during this period, +%% an attacker can realize $10^{12}$ clock cycles. +%% We thus wonder whether, during the PRNG's +%% lifetime, the attacker can distinguish this +%% sequence from a truly random one, with a probability +%% greater than $\varepsilon = 0.2$. +%% We consider that $N$ has 900 bits. + +%% Predicting the next generated bit knowing all the +%% previously released ones by Eq.~\eqref{equation Oplus} is obviously equivalent to predicting the +%% next bit in the BBS generator, which +%% is cryptographically secure. 
More precisely, it +%% is $(T,\varepsilon)-$secure: no +%% $(T,\varepsilon)-$distinguishing attack can be +%% successfully realized on this PRNG, if~\cite{Fischlin} +%% \begin{equation} +%% T \leqslant \dfrac{L(N)}{6 N (log_2(N))\varepsilon^{-2}M^2}-2^7 N \varepsilon^{-2} M^2 log_2 (8 N \varepsilon^{-1}M) +%% \label{mesureConcrete} +%% \end{equation} +%% where $M$ is the length of the output ($M=100$ in +%% our example), and $L(N)$ is equal to +%% $$ +%% 2.8\times 10^{-3} exp \left(1.9229 \times (N ~ln~ 2)^\frac{1}{3} \times (ln(N~ln~ 2))^\frac{2}{3}\right) +%% $$ +%% is the number of clock cycles to factor a $N-$bit +%% integer. + + + + +%% A direct numerical application shows that this attacker +%% cannot achieve its $(10^{12},0.2)$ distinguishing +%% attack in that context. + + + +\section{Cryptographic Applications} + +\subsection{A Cryptographically Secure PRNG for GPU} +\label{sec:CSGPU} + +It is possible to build a cryptographically secure PRNG based on the previous +algorithm (Algorithm~\ref{algo:gpu_kernel2}). Due to Proposition~\ref{cryptopreuve}, +it simply consists in replacing +the {\it xor-like} PRNG by a cryptographically secure one. +We have chosen the Blum Blum Shub generator~\cite{BBS} (usually denoted by BBS) having the form: +$$x_{n+1}=x_n^2~ mod~ M$$ where $M$ is the product of two prime numbers (these +prime numbers need to be congruent to 3 modulo 4). BBS is known to be +very slow and only usable for cryptographic applications. + + +The modulus operation is the most time-consuming operation for current +GPU cards. So, in order to obtain reasonable performance, the modulus +must only be applied to 32-bit integers. Consequently +$x_n^2$ needs to be less than $2^{32}$, and thus the number $M$ must be +less than $2^{16}$. So in practice we can choose prime numbers around +256 that are congruent to 3 modulo 4.
With 32-bit numbers, only the +4 least significant bits of $x_n$ can be chosen (the maximum number of +indistinguishable bits is less than or equal to +$\log_2(\log_2(M))$). In other words, to generate a 32-bit number, we need to run +the BBS algorithm 8 times, with possibly different combinations of $M$. This +approach is not sufficient to pass all the tests of TestU01, +as small values of $M$ for the BBS lead to + small periods. So, in order to add randomness, we have proceeded with +the following modifications. \begin{itemize} -\item The topology produced by $\Delta$ is a subset of the topology produced by $D$. -\item $D$ has more open sets than $\Delta$. -\item It is harder to converge for the topology $\tau_D$ inherited by $D$, than to converge with the one inherited by $\Delta$, which is denoted here by $\tau_\Delta$. +\item +Firstly, we define 16 arrangement arrays instead of 2 (as described in +Algorithm \ref{algo:gpu_kernel2}), but only 2 of them are used at each call of +the PRNG kernels. In practice, the selection of the combination +arrays to be used is different for all the threads. It is determined +by using the last three bits of two internal variables used by BBS. +%This approach adds more randomness. +In Algorithm~\ref{algo:bbs_gpu}, +the character \& denotes the bitwise AND. Thus using \&7 with a number +gives its last 3 bits, providing a number between 0 and 7. +\item +Secondly, after the generation of the 8 BBS numbers for each thread, we +have a 32-bit number whose period is possibly quite small. So, +to add randomness, we generate 4 more BBS numbers to +shift the 32-bit numbers and add up to 6 new bits. This improvement is +described in Algorithm~\ref{algo:bbs_gpu}. In practice, the last 2 bits +of the first new BBS number are used to make a left shift of at most +3 bits. The last 3 bits of the second new BBS number are added to the +strategy whatever the value of the first left shift.
The third and the +fourth new BBS numbers are used similarly to apply a new left shift +and add 3 new bits. +\item +Finally, as we use 8 BBS numbers for each thread, the storage of these +numbers at the end of the kernel is performed using a rotation. So, the +internal variable for BBS number 1 is stored in place 2, the internal +variable for BBS number 2 is stored in place 3, ..., and finally, the internal +variable for BBS number 8 is stored in place 1. \end{itemize} +\begin{algorithm} +\begin{small} +\KwIn{InternalVarBBSArray: array with internal variables of the 8 BBS +in global memory\; +NumThreads: number of threads\; +array\_comb: 2D array containing 16 combinations (in first dimension) of size combination\_size (in second dimension)\; +array\_shift[4]=\{0,1,3,7\}\; +} -\subsection{Chaos of the chaotic iterations on $\mathds{R}$} -\label{chpt:Chaos des itérations chaotiques sur R} - - - -\subsubsection{Chaos according to Devaney} - -We have recalled previously that the chaotic iterations $\left(\Go, \mathcal{X}_d\right)$ are chaotic according to the formulation of Devaney. We can deduce that they are chaotic on $\mathds{R}$ too, when considering the order topology, because: +\KwOut{NewNb: array containing random numbers in global memory} +\If{threadId is concerned} { + retrieve data from InternalVarBBSArray[threadId] in local variables including shared memory and x\; + we consider that bbs1 ...
bbs8 represent the internal states of the 8 BBS numbers\; + offset = threadIdx\%combination\_size\; + o1 = threadIdx-offset+array\_comb[bbs1\&7][offset]\; + o2 = threadIdx-offset+array\_comb[8+(bbs2\&7)][offset]\; + \For{i=1 to n} { + t$<<$=4\; + t|=BBS1(bbs1)\&15\; + ...\; + t$<<$=4\; + t|=BBS8(bbs8)\&15\; + \tcp{two new shifts} + shift=BBS3(bbs3)\&3\; + t$<<$=shift\; + t|=BBS1(bbs1)\&array\_shift[shift]\; + shift=BBS7(bbs7)\&3\; + t$<<$=shift\; + t|=BBS2(bbs2)\&array\_shift[shift]\; + t=t\textasciicircum shmem[o1]\textasciicircum shmem[o2]\; + shared\_mem[threadId]=t\; + x = x\textasciicircum t\; + + store the new random number in NewNb[NumThreads*threadId+i]\; + } + store internal variables in InternalVarBBSArray[threadId] using a rotation\; +} +\end{small} +\caption{main kernel for the BBS-based PRNG on GPU} +\label{algo:bbs_gpu} +\end{algorithm} + +In Algorithm~\ref{algo:bbs_gpu}, $n$ is the quantity of random numbers that +a thread has to generate. The operation t$<<$=4 performs a left shift of 4 bits +on the variable $t$ and stores the result in $t$, and $BBS1(bbs1)\&15$ selects +the last four bits of the result of $BBS1$. Thus an operation of the form +$t<<=4; t|=BBS1(bbs1)\&15$ performs in $t$ a left shift of 4 bits, and then +puts the last 4 bits of $BBS1(bbs1)$ in the last four positions of $t$. Let us +remark that the initialization of $t$ is not necessary, as we fill it 4 bits by 4 +bits, until 32 bits have been obtained. The last two new shifts are performed in +order to enlarge the small periods of the BBS used here, to introduce a kind of +variability. In these operations, we twice make a left shift of $t$ by \emph{at + most} 3 bits, represented by \texttt{shift} in the algorithm, and we put +\emph{exactly} the \texttt{shift} last bits from a BBS into the \texttt{shift} +last bits of $t$.
For this, an array named \texttt{array\_shift} is used: it contains the +correspondence between the shift and the mask made of \texttt{shift} bits set to 1, +which is used in the \texttt{and} operation. For example, with a left shift of 0, +we make an \texttt{and} operation with 0, and with a left shift of 3, we make an \texttt{and} +operation with 7 (represented by 111 in binary). + +It should be noticed that this generator has once more the form $x^{n+1} = x^n \oplus S^n$, +where $S^n$ is referred to in this algorithm as $t$: each iteration of this +PRNG ends with \texttt{x = x\textasciicircum t}, that is, $x = x \oplus t$. This $S^n$ is only constituted +by secure bits produced by the BBS generator, and thus, due to +Proposition~\ref{cryptopreuve}, the resulting PRNG is +cryptographically secure. + +As stated before, even if the proposed PRNG is cryptographically +secure, it does not mean that such a generator +can be used as described here when attacks are +expected. The problem is to determine the minimum +time required for an attacker, with a given +computational power, to predict with a probability +greater than 0.5 the $n+1$th bit, knowing the $n$ +previous ones. The proposed GPU generator will be +useful in a security context, at least in some +situations where a secret protected by a pseudorandom +keystream is rapidly obsolete, if this time to +predict the next bit is large enough when compared +to both the generation and transmission times. +It is true that the prime numbers used in the last +section are very small compared to up-to-date +security recommendations. However, the attacker does not have +access to each BBS, but only to the output produced +by Algorithm~\ref{algo:bbs_gpu}, which is far +more complicated than a simple BBS.
Indeed, to +determine if this cryptographically secure PRNG +on GPU can be useful in a security context with the +proposed parameters, or if it is only a very fast +and statistically perfect generator on GPU, its +$(T,\varepsilon)-$security must be determined, and +a formulation similar to Annex~\ref{A-sec:Practicak evaluation} %.Eq.\eqref{mesureConcrete} +must be established. The authors +hope to achieve this difficult task in future +work. + + +\subsection{Toward a Cryptographically Secure and Chaotic Asymmetric Cryptosystem} +\label{Blum-Goldwasser} +We finish this research work by giving some thoughts about the use of +the proposed PRNG in an asymmetric cryptosystem. +This first approach will be further investigated in future work. + +\subsubsection{Recall of the Blum-Goldwasser Probabilistic Cryptosystem} + +The Blum-Goldwasser cryptosystem is a cryptographically secure asymmetric key encryption algorithm +proposed in 1984~\cite{Blum:1985:EPP:19478.19501}. The encryption algorithm +implements an XOR-based stream cipher using the BBS PRNG, in order to generate +the keystream. Decryption is done by recovering the initial seed from +the final state of the BBS generator and the secret key, thus leading to the + reconstruction of the keystream. + +The key generation consists in generating two prime numbers $(p,q)$, +randomly and independently of each other, that are + congruent to 3 mod 4, and in computing the modulus $N=pq$. +The public key is $N$, whereas the secret key is the factorization $(p,q)$. + + +Suppose Bob wishes to send a string $m=(m_0, \dots, m_{L-1})$ of $L$ bits to Alice: +\begin{enumerate} +\item Bob picks an integer $r$ randomly in the interval $\llbracket 1,N\rrbracket$ and computes $x_0 = r^2~mod~N$. +\item He uses the BBS to generate the keystream of $L$ pseudorandom bits $(b_0, \dots, b_{L-1})$, as follows.
For $i=0$ to $L-1$, \begin{itemize} -\item $\left(\Go, \mathcal{X}_d\right)$ and $\left(g, \big[ 0, 2^{10} \big[_D\right)$ are semiconjugate by $\varphi$, -\item Then $\left(g, \big[ 0, 2^{10} \big[_D\right)$ is a system chaotic according to Devaney, because the semiconjugacy preserve this character. -\item But the topology generated by $D$ is finer than the topology generated by the Euclidian distance $\Delta$ -- which is the order topology. -\item According to Theorem \ref{Th:chaos et finesse}, we can deduce that the chaotic iterations $g$ are indeed chaotic, as defined by Devaney, for the order topology on $\mathds{R}$. +\item $i=0$. +\item While $i \leqslant L-1$: +\begin{itemize} +\item Set $b_i$ equal to the least-significant\footnote{As signaled previously, BBS can securely output up to $\mathsf{N} = \lfloor log(log(N)) \rfloor$ of the least-significant bits of $x_i$ during each round.} bit of $x_i$, +\item $i=i+1$, +\item $x_i = (x_{i-1})^2~mod~N.$ \end{itemize} - -This result can be formulated as follows. - -\begin{theorem} -\label{th:IC et topologie de l'ordre} -The chaotic iterations $g$ on $\mathds{R}$ are chaotic according to the Devaney's formulation, when $\mathds{R}$ has his usual topology, which is the order topology. -\end{theorem} - -Indeed this result is weaker than the theorem establishing the chaos for the finer topology $d$. However the Theorem \ref{th:IC et topologie de l'ordre} still remains important. Indeed, we have studied in our previous works a set different from the usual set of study ($\mathcal{X}$ instead of $\mathds{R}$), in order to be as close as possible from the computer: the properties of disorder proved theoretically will then be preserved when computing. However, we could wonder whether this change does not lead to a disorder of a lower quality. In other words, have we replaced a situation of a good disorder lost when computing, to another situation of a disorder preserved but of bad quality. 
Theorem \ref{th:IC et topologie de l'ordre} prove exactly the contrary. - +\end{itemize} +\item The ciphertext is computed by XORing the plaintext bits $m$ with the keystream: $ c = (c_0, \dots, c_{L-1}) = m \oplus b$. This ciphertext is $[c, y]$, where $y=x_{0}^{2^{L}}~mod~N.$ +\end{enumerate} +When Alice receives $\left[(c_0, \dots, c_{L-1}), y\right]$, she can recover $m$ as follows: +\begin{enumerate} +\item Using the secret key $(p,q)$, she computes $r_p = y^{((p+1)/4)^{L}}~mod~p$ and $r_q = y^{((q+1)/4)^{L}}~mod~q$. +\item The initial seed can be obtained using the following procedure: $x_0=q(q^{-1}~{mod}~p)r_p + p(p^{-1}~{mod}~q)r_q~{mod}~N$. +\item She recomputes the bit-vector $b$ by using BBS and $x_0$. +\item Alice finally computes the plaintext by XORing the keystream with the ciphertext: $ m = c \oplus b$. +\end{enumerate} -\section{Efficient prng based on chaotic iterations} -On parle du séquentiel avec des nombres 64 bits\\ +\subsubsection{Proposal of a new Asymmetric Cryptosystem Adapted from Blum-Goldwasser} -Faire le lien avec le paragraphe précédent (je considère que la stratégie s'appelle $S^i$\\ +We propose to adapt the Blum-Goldwasser protocol as follows. +Let $\mathsf{N} = \lfloor log(log(N)) \rfloor$ be the number of bits that can +be obtained securely with the BBS generator using the public key $N$ of Alice. +Alice will pick randomly $S^0$ in $\llbracket 0, 2^{\mathsf{N}-1}\rrbracket$ too, and +her new public key will be $(S^0, N)$. -In order to implement efficiently a PRNG based on chaotic iterations it is -possible to improve previous works [ref]. One solution consists in considering -that the strategy used $S^i$ contains all the bits for which the negation is -achieved out. Then instead of applying the negation on these bits we can simply -apply the xor operator between the current number and the strategy $S^i$. In -order to obtain the strategy we also use a classical PRNG. 
+To encrypt his message, Bob will compute +%%RAPH : ici, j'ai mis un simple $ +%\begin{equation} +$c = \left(m_0 \oplus (b_0 \oplus S^0), m_1 \oplus (b_0 \oplus b_1 \oplus S^0), \hdots, \right.$ +$ \left. m_{L-1} \oplus (b_0 \oplus b_1 \hdots \oplus b_{L-1} \oplus S^0) \right)$ +%%\end{equation} +instead of $\left(m_0 \oplus b_0, m_1 \oplus b_1, \hdots, m_{L-1} \oplus b_{L-1} \right)$. -\begin{figure}[htbp] -\begin{center} -\fbox{ -\begin{minipage}{14cm} -unsigned int CIprng() \{\\ - static unsigned int x = 123123123;\\ - unsigned long t1 = xorshift();\\ - unsigned long t2 = xor128();\\ - unsigned long t3 = xorwow();\\ - x = x\textasciicircum (unsigned int)t1;\\ - x = x\textasciicircum (unsigned int)(t2$>>$32);\\ - x = x\textasciicircum (unsigned int)(t3$>>$32);\\ - x = x\textasciicircum (unsigned int)t2;\\ - x = x\textasciicircum (unsigned int)(t1$>>$32);\\ - x = x\textasciicircum (unsigned int)t3;\\ - return x;\\ -\} -\end{minipage} -} -\end{center} -\caption{sequential Chaotic Iteration PRNG} -\label{algo:seqCIprng} -\end{figure} +The same decryption stage as in Blum-Goldwasser leads to the sequence +$\left(m_0 \oplus S^0, m_1 \oplus S^0, \hdots, m_{L-1} \oplus S^0 \right)$. +Thus, with a simple use of $S^0$, Alice can obtain the plaintext. +By doing so, the proposed generator is used in place of BBS, leading to +the inheritance of all the properties presented in this paper. -In Figure~\ref{algo:seqCIprng} a sequential version of our chaotic iterations -based PRNG is presented. This version uses three classical 64 bits PRNG: the -\texttt{xorshift}, the \texttt{xor128} and the \texttt{xorwow}. These three -PRNGs are presented in~\cite{Marsaglia2003}. As each PRNG used works with -64-bits and as our PRNG works with 32 bits, the use of \texttt{(unsigned int)} -selects the 32 least significant bits whereas \texttt{(unsigned int)(t3$>>$32)} -selects the 32 most significants bits of the variable \texttt{t}. 
This version -sucesses the BigCrush of the TestU01 battery [P. L’ecuyer and - R. Simard. Testu01]. +\section{Conclusion} -\section{Efficient prng based on chaotic iterations on GPU} -On parle du passage du sequentiel au GPU +In this paper, a formerly proposed PRNG based on chaotic iterations +has been generalized to improve its speed. It has been proven to be +chaotic according to Devaney. +Efficient implementations on GPU using xor-like PRNGs as input generators +have shown that a very large quantity of pseudorandom numbers can be generated per second (about +20Gsamples/s), and that these proposed PRNGs pass the hardest battery in TestU01, +namely BigCrush. +Furthermore, we have shown that when the input generator is cryptographically +secure, so is the PRNG we propose, thus leading to +the possibility of developing fast and secure PRNGs using the GPU architecture. +An improvement of the Blum-Goldwasser cryptosystem, making it +behave chaotically, has finally been proposed. -\section{Experiments} +In future work we plan to extend this research, building a parallel PRNG for clusters or +grid computing. Topological properties of the various proposed generators will be investigated, +and the use of other categories of PRNGs as input will be studied too. The improvement +of Blum-Goldwasser will be studied in more depth. Finally, we +will try to increase the quantity of pseudorandom numbers generated per second either +in a simulation context or in a cryptographic one. -On passe le BigCrush\\ -On donne des temps de générations sur GPU/CPU\\ -On donne des temps de générations de nombre sur GPU puis on rappatrie sur CPU / CPU ? bof bof, on verra -\section{Conclusion} -\bibliographystyle{plain} +\bibliographystyle{plain} \bibliography{mabase} \end{document}