$\mathds{B}^\mathsf{N}$ represents the memory of the computer whereas $\llbracket 1 ; \mathsf{N}
\rrbracket^{\mathds{N}}$ is its input stream (the seeds, for instance, in PRNG, or a physical noise in TRNG).
-\section{Application to pseudorandomness}
+\section{Application to Pseudorandomness}
\label{sec:pseudorandom}
-\subsection{A First pseudorandom Number Generator}
+\subsection{A First Pseudorandom Number Generator}
We have proposed in~\cite{bgw09:ip} a new family of generators that receives
two PRNGs as inputs. These two generators are mixed with chaotic iterations,
\section{Efficient PRNG based on Chaotic Iterations}
\label{sec:efficient prng}
-In order to implement efficiently a PRNG based on chaotic iterations it is
-possible to improve previous works [ref]. One solution consists in considering
-that the strategy used contains all the bits for which the negation is
-achieved out. Then in order to apply the negation on these bits we can simply
-apply the xor operator between the current number and the strategy. In
-order to obtain the strategy we also use a classical PRNG.
+Based on the proof presented in the previous section, it is now possible to
+improve the speed of the generator formerly presented in~\cite{bgw09:ip,guyeux10}.
+The first idea is to consider
+that the provided strategy is a pseudorandom Boolean vector obtained by a
+given PRNG.
+An iteration of the system is simply the bitwise exclusive or between
+the last computed state and the current strategy.
+Topological properties of disorder exhibited by chaotic
+iterations can be inherited by the inputted generator, hoping by doing so to
+obtain some statistical improvements while preserving speed.
-Here is an example with 16-bits numbers showing how the bitwise operations
+
+Let us give an example using 16-bits numbers, to clearly understand how the bitwise xor operations
are
-applied. Suppose that $x$ and the strategy $S^i$ are defined in binary mode.
-Then the following table shows the result of $x$ xor $S^i$.
+done.
+Suppose that $x$ and the strategy $S^i$ are given as
+binary vectors.
+Table~\ref{TableExemple} shows the result of $x \oplus S^i$.
+
+\begin{table}
$$
\begin{array}{|cc|cccccccccccccccc|}
\hline
\hline
\end{array}
$$
+\caption{Example of an arbitrary round of the proposed generator}
+\label{TableExemple}
+\end{table}
-
-
-\lstset{language=C,caption={C code of the sequential chaotic iterations based
-PRNG},label=algo:seqCIprng}
+\lstset{language=C,caption={C code of the sequential PRNG based on chaotic iterations},label=algo:seqCIprng}
\begin{lstlisting}
unsigned int CIprng() {
static unsigned int x = 123123123;
-In listing~\ref{algo:seqCIprng} a sequential version of our chaotic iterations
-based PRNG is presented. The xor operator is represented by \textasciicircum.
-This function uses three classical 64-bits PRNG: the \texttt{xorshift}, the
-\texttt{xor128} and the \texttt{xorwow}. In the following, we call them
-xor-like PRNGSs. These three PRNGs are presented in~\cite{Marsaglia2003}. As
-each xor-like PRNG used works with 64-bits and as our PRNG works with 32-bits,
-the use of \texttt{(unsigned int)} selects the 32 least significant bits whereas
-\texttt{(unsigned int)(t3$>>$32)} selects the 32 most significants bits of the
-variable \texttt{t}. So to produce a random number realizes 6 xor operations
-with 6 32-bits numbers produced by 3 64-bits PRNG. This version successes the
-BigCrush of the TestU01 battery~\cite{LEcuyerS07}.
+In Listing~\ref{algo:seqCIprng} a sequential version of the proposed PRNG based on chaotic iterations
+ is presented. The xor operator is represented by \textasciicircum.
+This function uses three classical 64-bits PRNGs, namely the \texttt{xorshift}, the
+\texttt{xor128}, and the \texttt{xorwow}~\cite{Marsaglia2003}. In the following, we call them
+``xor-like PRNGs''.
+As
+each xor-like PRNG uses 64-bits whereas our proposed generator works with 32-bits,
+we use the command \texttt{(unsigned int)}, that selects the 32 least significant bits of a given integer, and the code
+\texttt{(unsigned int)(t3$>>$32)} in order to obtain the 32 most significant bits of \texttt{t}.
-\section{Efficient PRNGs based on chaotic iterations on GPU}
+So producing a pseudorandom number needs 6 xor operations
+with 6 32-bits numbers that are provided by 3 64-bits PRNGs. This version successfully passes the
+stringent BigCrush battery of tests~\cite{LEcuyerS07}.
+
+\section{Efficient PRNGs based on Chaotic Iterations on GPU}
\label{sec:efficient prng gpu}
-In order to benefit from computing power of GPU, a program needs to define
-independent blocks of threads which can be computed simultaneously. In general,
-the larger the number of threads is, the more local memory is used and the less
-branching instructions are used (if, while, ...), the better performance is
-obtained on GPU. So with algorithm \ref{algo:seqCIprng} presented in the
-previous section, it is possible to build a similar program which computes PRNG
-on GPU. In the CUDA~\cite{Nvid10} environment, threads have a local
-identificator, called \texttt{ThreadIdx} relative to the block containing them.
-
-
-\subsection{Naive version for GPU}
-
-From the CPU version, it is possible to obtain a quite similar version for GPU.
-The principe consists in assigning the computation of a PRNG as in sequential to
-each thread of the GPU. Of course, it is essential that the three xor-like
-PRNGs used for our computation have different parameters. So we chose them
-randomly with another PRNG. As the initialisation is performed by the CPU, we
-have chosen to use the ISAAC PRNG~\cite{Jenkins96} to initalize all the
-parameters for the GPU version of our PRNG. The implementation of the three
-xor-like PRNGs is straightforward as soon as their parameters have been
-allocated in the GPU memory. Each xor-like PRNGs used works with an internal
-number $x$ which keeps the last generated random numbers. Other internal
-variables are also used by the xor-like PRNGs. More precisely, the
-implementation of the xor128, the xorshift and the xorwow respectively require
-4, 5 and 6 unsigned long as internal variables.
+In order to take benefits from the computing power of GPU, a program needs to have
+independent blocks of threads that can be computed simultaneously. In general,
+the larger the number of threads is, the more local memory is used, and the less
+branching instructions are used (if, while, ...), the better the performances on GPU is.
+Obviously, having these requirements in mind, it is possible to build a program similar to
+the one presented in Algorithm \ref{algo:seqCIprng}, which computes pseudorandom numbers
+on GPU.
+To do so, we must firstly recall that in
+ the CUDA~\cite{Nvid10} environment, threads have a local
+identifier called \texttt{ThreadIdx}, which is relative to the block containing them.
+
+
+\subsection{Naive Version for GPU}
+
+
+It is possible to deduce from the CPU version a quite similar version adapted to GPU.
+The simple principle consists to make each thread of the GPU computing the CPU version of our PRNG.
+Of course, the three xor-like
+PRNGs used in these computations must have different parameters.
+In a given thread, these lasts are
+randomly picked from another PRNGs.
+The initialization stage is performed by the CPU.
+To do it, the ISAAC PRNG~\cite{Jenkins96} is used to set all the
+parameters embedded into each thread.
+
+The implementation of the three
+xor-like PRNGs is straightforward when their parameters have been
+allocated in the GPU memory. Each xor-like works with an internal
+number $x$ that saves the last generated pseudorandom number. Additionally, the
+implementation of the xor128, the xorshift, and the xorwow respectively require
+4, 5, and 6 unsigned long as internal variables.
\begin{algorithm}
\KwIn{InternalVarXorLikeArray: array with internal variables of the 3 xor-like
PRNGs in global memory\;
-NumThreads: Number of threads\;}
+NumThreads: number of threads\;}
\KwOut{NewNb: array containing random numbers in global memory}
\If{threadIdx is concerned by the computation} {
retrieve data from InternalVarXorLikeArray[threadIdx] in local variables\;
store internal variables in InternalVarXorLikeArray[threadIdx]\;
}
-\caption{main kernel for the chaotic iterations based PRNG GPU naive version}
+\caption{Main kernel of the GPU ``naive'' version of the PRNG based on chaotic iterations}
\label{algo:gpu_kernel}
\end{algorithm}
-Algorithm~\ref{algo:gpu_kernel} presents a naive implementation of PRNG using
-GPU. According to the available memory in the GPU and the number of threads
+Algorithm~\ref{algo:gpu_kernel} presents a naive implementation of the proposed PRNG on
+GPU. Due to the available memory in the GPU and the number of threads
used simultenaously, the number of random numbers that a thread can generate
-inside a kernel is limited, i.e. the variable \texttt{n} in
-algorithm~\ref{algo:gpu_kernel}. For example, if $100,000$ threads are used and
-if $n=100$\footnote{in fact, we need to add the initial seed (a 32-bits number)}
-then the memory required to store internals variables of xor-like
+inside a kernel is limited (\emph{i.e.}, the variable \texttt{n} in
+algorithm~\ref{algo:gpu_kernel}). For instance, if $100,000$ threads are used and
+if $n=100$\footnote{in fact, we need to add the initial seed (a 32-bits number)},
+then the memory required to store all of the internals variables of both the xor-like
PRNGs\footnote{we multiply this number by $2$ in order to count 32-bits numbers}
-and random number of our PRNG is equals to $100,000\times ((4+5+6)\times
-2+(1+100))=1,310,000$ 32-bits numbers, i.e. about $52$Mb.
+and the pseudorandom numbers generated by our PRNG, is equal to $100,000\times ((4+5+6)\times
+2+(1+100))=1,310,000$ 32-bits numbers, that is, approximately $52$Mb.
-All the tests performed to pass the BigCrush of TestU01 succeeded. Different
-number of threads, called \texttt{NumThreads} in our algorithm, have been tested
-upto $10$ millions.
-\newline
-\newline
-{\bf QUESTION : on laisse cette remarque, je suis mitigé !!!}
+This generator is able to pass the whole BigCrush battery of tests, for all
+the versions that have been tested depending on their number of threads
+(called \texttt{NumThreads} in our algorithm, tested until $10$ millions).
\begin{remark}
-Algorithm~\ref{algo:gpu_kernel} has the advantage to manipulate independent
-PRNGs, so this version is easily usable on a cluster of computer. The only thing
-to ensure is to use a single ISAAC PRNG. For this, a simple solution consists in
-using a master node for the initialization which computes the initial parameters
+The proposed algorithm has the advantage to manipulate independent
+PRNGs, so this version is easily adaptable on a cluster of computers too. The only thing
+to ensure is to use a single ISAAC PRNG. To achieve this requirement, a simple solution consists in
+using a master node for the initialization. This master node computes the initial parameters
for all the differents nodes involves in the computation.
\end{remark}
-\subsection{Improved version for GPU}
+\subsection{Improved Version for GPU}
As GPU cards using CUDA have shared memory between threads of the same block, it
is possible to use this feature in order to simplify the previous algorithm,
-\section{A cryptographically secure prng for GPU}
+\section{A Cryptographically Secure PRNG for GPU}
\label{sec:CSGPU}
It is possible to build a cryptographically secure prng based on the previous
algorithm (algorithm~\ref{algo:gpu_kernel2}). It simply consists in replacing