+Let us now explain why we have reasonable ground to believe that chaos
+can improve statistical properties.
+We will show in this section that chaotic properties as defined in the
+mathematical theory of chaos are related to some statistical tests that can be found
+in the NIST battery. Furthermore, we will check that, when mixing defective PRNGs with
+chaotic iterations, the new generator presents better statistical properties
+(this section summarizes and extends the work of~\cite{bfg12a:ip}).
+
+
+
+\subsection{Qualitative relations between topological properties and statistical tests}
+
+
+There are various relations between topological properties that describe an unpredictable behavior for a discrete
+dynamical system on the one
+hand, and statistical tests to check the randomness of a numerical sequence
+on the other hand. These two mathematical disciplines follow a similar
+objective in case of a recurrent sequence (to characterize an intrinsically complicated behavior for a
+recurrent sequence), with two different but complementary approaches.
+It is true that the following illustrative links give only qualitative arguments,
+and proofs should be provided later to make such arguments irrefutable. However
+they give a first understanding of the reason why we think that chaotic properties should tend
+to improve the statistical quality of PRNGs.
+%
+Let us now list some of these relations between topological properties defined in the mathematical
+theory of chaos and tests embedded into the NIST battery. %Such relations need to be further
+%investigated, but they presently give a first illustration of a trend to search similar properties in the
+%two following fields: mathematical chaos and statistics.
+
+
+\begin{itemize}
+ \item \textbf{Regularity}. As stated in Section~\ref{subsec:Devaney}, a chaotic dynamical system must
+have an element of regularity. Depending on the chosen definition of chaos, this element can be the existence of
+a dense orbit, the density of periodic points, etc. The key idea is that a dynamical system with no periodicity
+is not as chaotic as a system having periodic orbits: in the first situation, we can predict something and gain a
+knowledge about the behavior of the system, that is, it never enters into a loop. A similar importance for periodicity is emphasized in
+the two following NIST tests~\cite{Nist10}:
+ \begin{itemize}
+ \item \textbf{Non-overlapping Template Matching Test}. Detect generators that produce too many occurrences of a given non-periodic (aperiodic) pattern.
+ \item \textbf{Discrete Fourier Transform (Spectral) Test}. Detect periodic features (i.e., repetitive patterns that are close one to another) in the tested sequence that would indicate a deviation from the assumption of randomness.
+ \end{itemize}
+
+\item \textbf{Transitivity}. This topological property previously introduced states that the dynamical system is intrinsically complicated: it cannot be simplified into
+two subsystems that do not interact, as we can find in any neighborhood of any point another point whose orbit visits the whole phase space.
+This focus on the places visited by the orbits of the dynamical system takes various nonequivalent formulations in the mathematical theory
+of chaos, namely: transitivity, strong transitivity, total transitivity, topological mixing, and so on~\cite{bg10:ij}. A similar attention
+is brought on the states visited during a random walk in the two tests below~\cite{Nist10}:
+ \begin{itemize}
+ \item \textbf{Random Excursions Variant Test}. Detect deviations from the expected number of visits to various states in the random walk.
+ \item \textbf{Random Excursions Test}. Determine if the number of visits to a particular state within a cycle deviates from what one would expect for a random sequence.
+ \end{itemize}
+
+\item \textbf{Chaos according to Li and Yorke}. Two points of the phase space $(x,y)$ define a couple of Li-Yorke when $\limsup_{n \rightarrow +\infty} d(f^{(n)}(x), f^{(n)}(y))>0$ et $\liminf_{n \rightarrow +\infty} d(f^{(n)}(x), f^{(n)}(y))=0$, meaning that their orbits always oscillate as the iterations pass. When a system is compact and contains an uncountable set of such points, it is claimed as chaotic according
+to Li-Yorke~\cite{Li75,Ruette2001}. A similar property is regarded in the following NIST test~\cite{Nist10}.
+ \begin{itemize}
+ \item \textbf{Runs Test}. To determine whether the number of runs of ones and zeros of various lengths is as expected for a random sequence. In particular, this test determines whether the oscillation between such zeros and ones is too fast or too slow.
+ \end{itemize}
+ \item \textbf{Topological entropy}. The desire to formulate an equivalency of the thermodynamics entropy
+has emerged both in the topological and statistical fields. Once again, a similar objective has led to two different
+rewritting of an entropy based disorder: the famous Shannon definition of entropy is approximated in the statistical approach,
+whereas topological entropy is defined as follows:
+$x,y \in \mathcal{X}$ are $\varepsilon-$\emph{separated in time $n$} if there exists $k \leqslant n$ such that $d\left(f^{(k)}(x),f^{(k)}(y)\right)>\varepsilon$. Then $(n,\varepsilon)-$separated sets are sets of points that are all $\varepsilon-$separated in time $n$, which
+leads to the definition of $s_n(\varepsilon,Y)$, being the maximal cardinality of all $(n,\varepsilon)-$separated sets. Using these notations,
+the topological entropy is defined as follows: $$h_{top}(\mathcal{X},f) = \displaystyle{\lim_{\varepsilon \rightarrow 0} \Big[ \limsup_{n \rightarrow +\infty} \dfrac{1}{n} \log s_n(\varepsilon,\mathcal{X})\Big]}.$$
+This value measures the average exponential growth of the number of distinguishable orbit segments.
+In this sense, it measures the complexity of the topological dynamical system, whereas
+the Shannon approach comes to mind when defining the following test~\cite{Nist10}:
+ \begin{itemize}
+\item \textbf{Approximate Entropy Test}. Compare the frequency of the overlapping blocks of two consecutive/adjacent lengths ($m$ and $m+1$) against the expected result for a random sequence.
+ \end{itemize}
+
+ \item \textbf{Non-linearity, complexity}. Finally, let us remark that non-linearity and complexity are
+not only sought in general to obtain chaos, but they are also required for randomness, as illustrated by the two tests below~\cite{Nist10}.
+ \begin{itemize}
+\item \textbf{Binary Matrix Rank Test}. Check for linear dependence among fixed length substrings of the original sequence.
+\item \textbf{Linear Complexity Test}. Determine whether or not the sequence is complex enough to be considered random.
+ \end{itemize}
+\end{itemize}
+
+
+We have proven in our previous works~\cite{guyeux12:bc} that chaotic iterations satisfying Theorem~\ref{Th:Caractérisation des IC chaotiques} are, among other
+things, strongly transitive, topologically mixing, chaotic as defined by Li and Yorke,
+and that they have a topological entropy and an exponent of Lyapunov both equal to $ln(\mathsf{N})$,
+where $\mathsf{N}$ is the size of the iterated vector.
+These topological properties make that we are ground to believe that a generator based on chaotic
+iterations will probably be able to pass all the existing statistical batteries for pseudorandomness like
+the NIST one. The following subsections, in which we prove that defective generators have their
+statistical properties improved by chaotic iterations, show that such an assumption is true.
+
+\subsection{Details of some Existing Generators}
+
+The list of defective PRNGs we will use
+as inputs for the statistical tests to come is introduced here.
+
+Firstly, the simple linear congruency generators (LCGs) will be used.
+They are defined by the following recurrence:
+\begin{equation}
+x^n = (ax^{n-1} + c)~mod~m,
+\label{LCG}
+\end{equation}
+where $a$, $c$, and $x^0$ must be, among other things, non-negative and inferior to
+$m$~\cite{LEcuyerS07}. In what follows, 2LCGs and 3LCGs refer to two (resp. three)
+combinations of such LCGs. For further details, see~\cite{bfg12a:ip,combined_lcg}.
+
+Secondly, the multiple recursive generators (MRGs) which will be used,
+are based on a linear recurrence of order
+$k$, modulo $m$~\cite{LEcuyerS07}:
+\begin{equation}
+x^n = (a^1x^{n-1}+~...~+a^kx^{n-k})~mod~m .
+\label{MRG}
+\end{equation}
+The combination of two MRGs (referred as 2MRGs) is also used in these experiments.
+
+Generators based on linear recurrences with carry will be regarded too.
+This family of generators includes the add-with-carry (AWC) generator, based on the recurrence:
+\begin{equation}
+\label{AWC}
+\begin{array}{l}
+x^n = (x^{n-r} + x^{n-s} + c^{n-1})~mod~m, \\
+c^n= (x^{n-r} + x^{n-s} + c^{n-1}) / m, \end{array}\end{equation}
+the SWB generator, having the recurrence:
+\begin{equation}
+\label{SWB}
+\begin{array}{l}
+x^n = (x^{n-r} - x^{n-s} - c^{n-1})~mod~m, \\
+c^n=\left\{
+\begin{array}{l}
+1 ~~~~~\text{if}~ (x^{i-r} - x^{i-s} - c^{i-1})<0\\
+0 ~~~~~\text{else},\end{array} \right. \end{array}\end{equation}
+and the SWC generator, which is based on the following recurrence:
+\begin{equation}
+\label{SWC}
+\begin{array}{l}
+x^n = (a^1x^{n-1} \oplus ~...~ \oplus a^rx^{n-r} \oplus c^{n-1}) ~ mod ~ 2^w, \\
+c^n = (a^1x^{n-1} \oplus ~...~ \oplus a^rx^{n-r} \oplus c^{n-1}) ~ / ~ 2^w. \end{array}\end{equation}
+
+Then the generalized feedback shift register (GFSR) generator has been implemented, that is:
+\begin{equation}
+x^n = x^{n-r} \oplus x^{n-k} .
+\label{GFSR}
+\end{equation}
+
+
+Finally, the nonlinear inversive (INV) generator~\cite{LEcuyerS07} has been studied, which is:
+
+\begin{equation}
+\label{INV}
+\begin{array}{l}
+x^n=\left\{
+\begin{array}{ll}
+(a^1 + a^2 / z^{n-1})~mod~m & \text{if}~ z^{n-1} \neq 0 \\
+a^1 & \text{if}~ z^{n-1} = 0 .\end{array} \right. \end{array}\end{equation}
+
+
+
+\begin{table}
+%\renewcommand{\arraystretch}{1}
+\caption{TestU01 Statistical Test Failures}
+\label{TestU011}
+\centering
+ \begin{tabular}{lccccc}
+ \toprule
+Test name &Tests& Logistic & XORshift & ISAAC\\
+Rabbit & 38 &21 &14 &0 \\
+Alphabit & 17 &16 &9 &0 \\
+Pseudo DieHARD &126 &0 &2 &0 \\
+FIPS\_140\_2 &16 &0 &0 &0 \\
+SmallCrush &15 &4 &5 &0 \\
+Crush &144 &95 &57 &0 \\
+Big Crush &160 &125 &55 &0 \\ \hline
+Failures & &261 &146 &0 \\
+\bottomrule
+ \end{tabular}
+\end{table}
+
+
+
+\begin{table}
+%\renewcommand{\arraystretch}{1}
+\caption{TestU01 Statistical Test Failures for Old CI algorithms ($\mathsf{N}=4$)}
+\label{TestU01 for Old CI}
+\centering
+ \begin{tabular}{lcccc}
+ \toprule
+\multirow{3}*{Test name} & \multicolumn{4}{c}{Old CI}\\
+&Logistic& XORshift& ISAAC&ISAAC \\
+&+& +& + & + \\
+&Logistic& XORshift& XORshift&ISAAC \\ \cmidrule(r){2-5}
+Rabbit &7 &2 &0 &0 \\
+Alphabit & 3 &0 &0 &0 \\
+DieHARD &0 &0 &0 &0 \\
+FIPS\_140\_2 &0 &0 &0 &0 \\
+SmallCrush &2 &0 &0 &0 \\
+Crush &47 &4 &0 &0 \\
+Big Crush &79 &3 &0 &0 \\ \hline
+Failures &138 &9 &0 &0 \\
+\bottomrule
+ \end{tabular}
+\end{table}
+
+
+
+
+
+\subsection{Statistical tests}
+\label{Security analysis}
+
+Three batteries of tests are reputed and regularly used
+to evaluate the statistical properties of newly designed pseudorandom
+number generators. These batteries are named DieHard~\cite{Marsaglia1996},
+the NIST suite~\cite{ANDREW2008}, and the most stringent one called
+TestU01~\cite{LEcuyerS07}, which encompasses the two other batteries.
+
+
+
+\label{Results and discussion}
+\begin{table*}
+%\renewcommand{\arraystretch}{1}
+\caption{NIST and DieHARD tests suite passing rates for PRNGs without CI}
+\label{NIST and DieHARD tests suite passing rate the for PRNGs without CI}
+\centering
+ \begin{tabular}{|l||c|c|c|c|c|c|c|c|c|c|}
+ \hline\hline
+Types of PRNGs & \multicolumn{2}{c|}{Linear PRNGs} & \multicolumn{4}{c|}{Lagged PRNGs} & \multicolumn{1}{c|}{ICG PRNGs} & \multicolumn{3}{c|}{Mixed PRNGs}\\ \hline
+\backslashbox{\textbf{$Tests$}} {\textbf{$PRNG$}} & LCG& MRG& AWC & SWB & SWC & GFSR & INV & LCG2& LCG3& MRG2 \\ \hline
+NIST & 11/15 & 14/15 &\textbf{15/15} & \textbf{15/15} & 14/15 & 14/15 & 14/15 & 14/15& 14/15& 14/15 \\ \hline
+DieHARD & 16/18 & 16/18 & 15/18 & 16/18 & \textbf{18/18} & 16/18 & 16/18 & 16/18& 16/18& 16/18\\ \hline
+\end{tabular}
+\end{table*}
+
+Table~\ref{NIST and DieHARD tests suite passing rate the for PRNGs without CI} shows the
+results on the two first batteries recalled above, indicating that all the PRNGs presented
+in the previous section
+cannot pass all these tests. In other words, the statistical quality of these PRNGs cannot
+fulfill the up-to-date standards presented previously. We have shown in~\cite{bfg12a:ip} that the use of chaotic
+iterations can solve this issue.
+%More precisely, to
+%illustrate the effects of chaotic iterations on these defective PRNGs, experiments have been divided in three parts~\cite{bfg12a:ip}:
+%\begin{enumerate}
+% \item \textbf{Single CIPRNG}: The PRNGs involved in CI computing are of the same category.
+% \item \textbf{Mixed CIPRNG}: Two different types of PRNGs are mixed during the chaotic iterations process.
+% \item \textbf{Multiple CIPRNG}: The generator is obtained by repeating the composition of the iteration function as follows: $x^0\in \mathds{B}^{\mathsf{N}}$, and $\forall n\in \mathds{N}^{\ast },\forall i\in \llbracket1;\mathsf{N}\rrbracket, x_i^n=$
+%\begin{equation}
+%\begin{array}{l}
+%\left\{
+%\begin{array}{l}
+%x_i^{n-1}~~~~~\text{if}~S^n\neq i \\
+%\forall j\in \llbracket1;\mathsf{m}\rrbracket,f^m(x^{n-1})_{S^{nm+j}}~\text{if}~S^{nm+j}=i.\end{array} \right. \end{array}
+%\end{equation}
+%$m$ is called the \emph{functional power}.
+%\end{enumerate}
+%
+The obtained results are reproduced in Table
+\ref{NIST and DieHARD tests suite passing rate the for single CIPRNGs}.
+The scores written in boldface indicate that all the tests have been passed successfully, whereas an
+asterisk ``*'' means that the considered passing rate has been improved.
+The improvements are obvious for both the ``Old CI'' and the ``New CI'' generators.
+Concerning the ``Xor CI PRNG'', the score is less spectacular. Because of a large speed improvement, the statistics
+ are not as good as for the two other versions of these CIPRNGs.
+However 8 tests have been improved (with no deflation for the other results).
+
+
+\begin{table*}
+%\renewcommand{\arraystretch}{1.3}
+\caption{NIST and DieHARD tests suite passing rates for PRNGs with CI}
+\label{NIST and DieHARD tests suite passing rate the for single CIPRNGs}
+\centering
+ \begin{tabular}{|l||c|c|c|c|c|c|c|c|c|c|c|c|}
+ \hline
+Types of PRNGs & \multicolumn{2}{c|}{Linear PRNGs} & \multicolumn{4}{c|}{Lagged PRNGs} & \multicolumn{1}{c|}{ICG PRNGs} & \multicolumn{3}{c|}{Mixed PRNGs}\\ \hline
+\backslashbox{\textbf{$Tests$}} {\textbf{$Single~CIPRNG$}} & LCG & MRG & AWC & SWB & SWC & GFSR & INV& LCG2 & LCG3& MRG2 \\ \hline\hline
+Old CIPRNG\\ \hline \hline
+NIST & \textbf{15/15} * & \textbf{15/15} * & \textbf{15/15} & \textbf{15/15} & \textbf{15/15} * & \textbf{15/15} * & \textbf{15/15} *& \textbf{15/15} * & \textbf{15/15} * & \textbf{15/15} \\ \hline
+DieHARD & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} & \textbf{18/18} * & \textbf{18/18} *& \textbf{18/18} * & \textbf{18/18} *& \textbf{18/18} * \\ \hline
+New CIPRNG\\ \hline \hline
+NIST & \textbf{15/15} * & \textbf{15/15} * & \textbf{15/15} & \textbf{15/15} & \textbf{15/15} * & \textbf{15/15} * & \textbf{15/15} *& \textbf{15/15} * & \textbf{15/15} * & \textbf{15/15} \\ \hline
+DieHARD & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} * & \textbf{18/18} *& \textbf{18/18} *\\ \hline
+Xor CIPRNG\\ \hline\hline
+NIST & 14/15*& \textbf{15/15} * & \textbf{15/15} & \textbf{15/15} & 14/15 & \textbf{15/15} * & 14/15& \textbf{15/15} * & \textbf{15/15} *& \textbf{15/15} \\ \hline
+DieHARD & 16/18 & 16/18 & 17/18* & \textbf{18/18} * & \textbf{18/18} & \textbf{18/18} * & 16/18 & 16/18 & 16/18& 16/18\\ \hline
+\end{tabular}
+\end{table*}
+
+
+We have then investigated in~\cite{bfg12a:ip} if it were possible to improve
+the statistical behavior of the Xor CI version by combining more than one
+$\oplus$ operation. Results are summarized in Table~\ref{threshold}, illustrating
+the progressive increasing effects of chaotic iterations, when giving time to chaos to get settled in.
+Thus rapid and perfect PRNGs, regarding the NIST and DieHARD batteries, can be obtained
+using chaotic iterations on defective generators.
+
+\begin{table*}
+%\renewcommand{\arraystretch}{1.3}
+\caption{Number of $\oplus$ operations to pass the whole NIST and DieHARD batteries}
+\label{threshold}
+\centering
+ \begin{tabular}{|l||c|c|c|c|c|c|c|c|}
+ \hline
+Inputted $PRNG$ & LCG & MRG & SWC & GFSR & INV& LCG2 & LCG3 & MRG2 \\ \hline\hline
+Threshold value $m$& 19 & 7 & 2& 1 & 11& 9& 3& 4\\ \hline\hline
+\end{tabular}
+\end{table*}
+
+Finally, the TestU01 battery has been launched on three well-known generators
+(a logistic map, a simple XORshift, and the cryptographically secure ISAAC,
+see Table~\ref{TestU011}). These results can be compared with
+Table~\ref{TestU01 for Old CI}, which gives the scores obtained by the
+Old CI PRNG that has received these generators.
+The obvious improvement speaks for itself, and together with the other
+results recalled in this section, it reinforces the opinion that a strong
+correlation between topological properties and statistical behavior exists.
+
+
+The next subsection will now give a concrete original implementation of the Xor CI PRNG, the
+fastest generator in the chaotic iteration based family. In the remainder,
+this generator will be simply referred to as CIPRNG, or ``the proposed PRNG'', if this statement does not
+raise ambiguity.
+
+
+\subsection{First Efficient Implementation of a PRNG based on Chaotic Iterations}
+\label{sec:efficient PRNG}
+%
+%Based on the proof presented in the previous section, it is now possible to
+%improve the speed of the generator formerly presented in~\cite{bgw09:ip,guyeux10}.
+%The first idea is to consider
+%that the provided strategy is a pseudorandom Boolean vector obtained by a
+%given PRNG.
+%An iteration of the system is simply the bitwise exclusive or between
+%the last computed state and the current strategy.
+%Topological properties of disorder exhibited by chaotic
+%iterations can be inherited by the inputted generator, we hope by doing so to
+%obtain some statistical improvements while preserving speed.
+%
+%%RAPH : j'ai viré tout ca
+%% Let us give an example using 16-bits numbers, to clearly understand how the bitwise xor operations
+%% are
+%% done.
+%% Suppose that $x$ and the strategy $S^i$ are given as
+%% binary vectors.
+%% Table~\ref{TableExemple} shows the result of $x \oplus S^i$.
+
+%% \begin{table}
+%% \begin{scriptsize}
+%% $$
+%% \begin{array}{|cc|cccccccccccccccc|}
+%% \hline
+%% x &=&1&0&1&1&1&0&1&0&1&0&0&1&0&0&1&0\\
+%% \hline
+%% S^i &=&0&1&1&0&0&1&1&0&1&1&1&0&0&1&1&1\\
+%% \hline
+%% x \oplus S^i&=&1&1&0&1&1&1&0&0&0&1&1&1&0&1&0&1\\
+%% \hline
+
+%% \hline
+%% \end{array}
+%% $$
+%% \end{scriptsize}
+%% \caption{Example of an arbitrary round of the proposed generator}
+%% \label{TableExemple}
+%% \end{table}
+
+
+
+
+\lstset{language=C,caption={C code of the sequential PRNG based on chaotic iterations},label={algo:seqCIPRNG}}
+\begin{small}
+\begin{lstlisting}
+
+unsigned int CIPRNG() {
+ static unsigned int x = 123123123;
+ unsigned long t1 = xorshift();
+ unsigned long t2 = xor128();
+ unsigned long t3 = xorwow();
+ x = x^(unsigned int)t1;
+ x = x^(unsigned int)(t2>>32);
+ x = x^(unsigned int)(t3>>32);
+ x = x^(unsigned int)t2;
+ x = x^(unsigned int)(t1>>32);
+ x = x^(unsigned int)t3;
+ return x;
+}
+\end{lstlisting}
+\end{small}
+
+
+
+In Listing~\ref{algo:seqCIPRNG} a sequential version of the proposed PRNG based
+on chaotic iterations is presented. The xor operator is represented by
+\textasciicircum. This function uses three classical 64-bits PRNGs, namely the
+\texttt{xorshift}, the \texttt{xor128}, and the
+\texttt{xorwow}~\cite{Marsaglia2003}. In the following, we call them ``xor-like
+PRNGs''. As each xor-like PRNG uses 64-bits whereas our proposed generator
+works with 32-bits, we use the command \texttt{(unsigned int)}, that selects the
+32 least significant bits of a given integer, and the code \texttt{(unsigned
+ int)(t$>>$32)} in order to obtain the 32 most significant bits of \texttt{t}.
+
+Thus producing a pseudorandom number needs 6 xor operations with 6 32-bits numbers
+that are provided by 3 64-bits PRNGs. This version successfully passes the
+stringent BigCrush battery of tests~\cite{LEcuyerS07}.
+At this point, we thus
+have defined an efficient and statistically unbiased generator. Its speed is
+directly related to the use of linear operations, but for the same reason,
+this fast generator cannot be proven as secure.
+
+
+
+\section{Efficient PRNGs based on Chaotic Iterations on GPU}
+\label{sec:efficient PRNG gpu}
+
+In order to take benefits from the computing power of GPU, a program
+needs to have independent blocks of threads that can be computed
+simultaneously. In general, the larger the number of threads is, the
+more local memory is used, and the less branching instructions are
+used (if, while, ...), the better the performances on GPU is.
+Obviously, having these requirements in mind, it is possible to build
+a program similar to the one presented in Listing
+\ref{algo:seqCIPRNG}, which computes pseudorandom numbers on GPU. To
+do so, we must firstly recall that in the CUDA~\cite{Nvid10}
+environment, threads have a local identifier called
+\texttt{ThreadIdx}, which is relative to the block containing
+them. Furthermore, in CUDA, parts of the code that are executed by the GPU, are
+called {\it kernels}.
+
+
+\subsection{Naive Version for GPU}
+
+
+It is possible to deduce from the CPU version a quite similar version adapted to GPU.
+The simple principle consists in making each thread of the GPU computing the CPU version of our PRNG.
+Of course, the three xor-like
+PRNGs used in these computations must have different parameters.
+In a given thread, these parameters are
+randomly picked from another PRNGs.
+The initialization stage is performed by the CPU.
+To do it, the ISAAC PRNG~\cite{Jenkins96} is used to set all the
+parameters embedded into each thread.
+
+The implementation of the three
+xor-like PRNGs is straightforward when their parameters have been
+allocated in the GPU memory. Each xor-like works with an internal
+number $x$ that saves the last generated pseudorandom number. Additionally, the
+implementation of the xor128, the xorshift, and the xorwow respectively require
+4, 5, and 6 unsigned long as internal variables.
+
+
+\begin{algorithm}
+\begin{small}
+\KwIn{InternalVarXorLikeArray: array with internal variables of the 3 xor-like
+PRNGs in global memory\;
+NumThreads: number of threads\;}
+\KwOut{NewNb: array containing random numbers in global memory}
+\If{threadIdx is concerned by the computation} {
+ retrieve data from InternalVarXorLikeArray[threadIdx] in local variables\;
+ \For{i=1 to n} {
+ compute a new PRNG as in Listing\ref{algo:seqCIPRNG}\;
+ store the new PRNG in NewNb[NumThreads*threadIdx+i]\;
+ }
+ store internal variables in InternalVarXorLikeArray[threadIdx]\;
+}
+\end{small}
+\caption{Main kernel of the GPU ``naive'' version of the PRNG based on chaotic iterations}
+\label{algo:gpu_kernel}
+\end{algorithm}
+
+
+
+Algorithm~\ref{algo:gpu_kernel} presents a naive implementation of the proposed PRNG on
+GPU. Due to the available memory in the GPU and the number of threads
+used simultaneously, the number of random numbers that a thread can generate
+inside a kernel is limited (\emph{i.e.}, the variable \texttt{n} in
+algorithm~\ref{algo:gpu_kernel}). For instance, if $100,000$ threads are used and
+if $n=100$\footnote{in fact, we need to add the initial seed (a 32-bits number)},
+then the memory required to store all of the internals variables of both the xor-like
+PRNGs\footnote{we multiply this number by $2$ in order to count 32-bits numbers}
+and the pseudorandom numbers generated by our PRNG, is equal to $100,000\times ((4+5+6)\times
+2+(1+100))=1,310,000$ 32-bits numbers, that is, approximately $52$Mb.
+
+This generator is able to pass the whole BigCrush battery of tests, for all
+the versions that have been tested depending on their number of threads
+(called \texttt{NumThreads} in our algorithm, tested up to $5$ million).
+
+\begin{remark}
+The proposed algorithm has the advantage of manipulating independent
+PRNGs, so this version is easily adaptable on a cluster of computers too. The only thing
+to ensure is to use a single ISAAC PRNG. To achieve this requirement, a simple solution consists in
+using a master node for the initialization. This master node computes the initial parameters
+for all the different nodes involved in the computation.
+\end{remark}
+
+\subsection{Improved Version for GPU}
+
+As GPU cards using CUDA have shared memory between threads of the same block, it
+is possible to use this feature in order to simplify the previous algorithm,
+i.e., to use less than 3 xor-like PRNGs. The solution consists in computing only
+one xor-like PRNG by thread, saving it into the shared memory, and then to use the results
+of some other threads in the same block of threads. In order to define which
+thread uses the result of which other one, we can use a combination array that
+contains the indexes of all threads and for which a combination has been
+performed.
+
+In Algorithm~\ref{algo:gpu_kernel2}, two combination arrays are used. The
+variable \texttt{offset} is computed using the value of
+\texttt{combination\_size}. Then we can compute \texttt{o1} and \texttt{o2}
+representing the indexes of the other threads whose results are used by the
+current one. In this algorithm, we consider that a 32-bits xor-like PRNG has
+been chosen. In practice, we use the xor128 proposed in~\cite{Marsaglia2003} in
+which unsigned longs (64 bits) have been replaced by unsigned integers (32
+bits).
+
+This version can also pass the whole {\it BigCrush} battery of tests.
+
+\begin{algorithm}
+\begin{small}
+\KwIn{InternalVarXorLikeArray: array with internal variables of 1 xor-like PRNGs
+in global memory\;
+NumThreads: Number of threads\;
+array\_comb1, array\_comb2: Arrays containing combinations of size combination\_size\;}
+
+\KwOut{NewNb: array containing random numbers in global memory}
+\If{threadId is concerned} {
+ retrieve data from InternalVarXorLikeArray[threadId] in local variables including shared memory and x\;
+ offset = threadIdx\%combination\_size\;
+ o1 = threadIdx-offset+array\_comb1[offset]\;
+ o2 = threadIdx-offset+array\_comb2[offset]\;
+ \For{i=1 to n} {
+ t=xor-like()\;
+ t=t\textasciicircum shmem[o1]\textasciicircum shmem[o2]\;
+ shared\_mem[threadId]=t\;
+ x = x\textasciicircum t\;
+
+ store the new PRNG in NewNb[NumThreads*threadId+i]\;
+ }
+ store internal variables in InternalVarXorLikeArray[threadId]\;
+}
+\end{small}
+\caption{Main kernel for the chaotic iterations based PRNG GPU efficient
+version\label{IR}}
+\label{algo:gpu_kernel2}
+\end{algorithm}
+
+\subsection{Chaos Evaluation of the Improved Version}
+
+A run of Algorithm~\ref{algo:gpu_kernel2} consists in an operation ($x=x\oplus t$) having
+the form of Equation~\ref{equation Oplus}, which is equivalent to the iterative
+system of Eq.~\ref{eq:generalIC}. That is, an iteration of the general chaotic
+iterations is realized between the last stored value $x$ of the thread and a strategy $t$
+(obtained by a bitwise exclusive or between a value provided by a xor-like() call
+and two values previously obtained by two other threads).
+To be certain that we are in the framework of Theorem~\ref{t:chaos des general},
+we must guarantee that this dynamical system iterates on the space
+$\mathcal{X} = \mathcal{P}\left(\llbracket 1, \mathsf{N} \rrbracket\right)^\mathds{N}\times\mathds{B}^\mathsf{N}$.
+The left term $x$ obviously belongs to $\mathds{B}^ \mathsf{N}$.
+To prevent from any flaws of chaotic properties, we must check that the right
+term (the last $t$), corresponding to the strategies, can possibly be equal to any
+integer of $\llbracket 1, \mathsf{N} \rrbracket$.
+
+Such a result is obvious, as for the xor-like(), all the
+integers belonging into its interval of definition can occur at each iteration, and thus the
+last $t$ respects the requirement. Furthermore, it is possible to
+prove by an immediate mathematical induction that, as the initial $x$
+is uniformly distributed (it is provided by a cryptographically secure PRNG),
+the two other stored values shmem[o1] and shmem[o2] are uniformly distributed too,
+(this is the induction hypothesis), and thus the next $x$ is finally uniformly distributed.
+
+Thus Algorithm~\ref{algo:gpu_kernel2} is a concrete realization of the general
+chaotic iterations presented previously, and for this reason, it satisfies the
+Devaney's formulation of a chaotic behavior.
+
+\section{Experiments}
+\label{sec:experiments}
+
+Different experiments have been performed in order to measure the generation
+speed. We have used a first computer equipped with a Tesla C1060 NVidia GPU card
+and an
+Intel Xeon E5530 cadenced at 2.40 GHz, and
+a second computer equipped with a smaller CPU and a GeForce GTX 280.
+All the
+cards have 240 cores.
+
+In Figure~\ref{fig:time_xorlike_gpu} we compare the quantity of pseudorandom numbers
+generated per second with various xor-like based PRNGs. In this figure, the optimized
+versions use the {\it xor64} described in~\cite{Marsaglia2003}, whereas the naive versions
+embed the three xor-like PRNGs described in Listing~\ref{algo:seqCIPRNG}. In
+order to obtain the optimal performances, the storage of pseudorandom numbers
+into the GPU memory has been removed. This step is time consuming and slows down the numbers
+generation. Moreover this storage is completely
+useless, in case of applications that consume the pseudorandom
+numbers directly after generation. We can see that when the number of threads is greater
+than approximately 30,000 and lower than 5 million, the number of pseudorandom numbers generated
+per second is almost constant. With the naive version, this value ranges from 2.5 to
+3GSamples/s. With the optimized version, it is approximately equal to
+20GSamples/s. Finally we can remark that both GPU cards are quite similar, but in
+practice, the Tesla C1060 has more memory than the GTX 280, and this memory
+should be of better quality.
+As a comparison, Listing~\ref{algo:seqCIPRNG} leads to the generation of about
+138MSample/s when using one core of the Xeon E5530.
+
+\begin{figure}[htbp]