algorithm (algorithm~\ref{algo:gpu_kernel2}). It simply consists in replacing
the {\it xor-like} algorithm by another cryptographically secure prng. In
practice, we suggest to use the BBS algorithm~\cite{BBS} which takes the form:
-$$x_{n+1}=x_n^2~ mod~ M$$ where M is the product of the prime numbers. Those
+$$x_{n+1}=x_n^2~ mod~ M$$ where $M$ is the product of two prime numbers. Those
prime numbers need to be congruent to 3 modulus 4. In practice, this PRNG is
known to be slow and not efficient for the generation of random numbers. For
current GPU cards, the modulus operation is the most time consuming
operation. So in order to obtain quite reasonable performances, it is required
to use only modulus on 32 bits integer numbers. Consequently $x_n^2$ need to be
less than $2^{32}$ and the number $M$ need to be less than $2^{16}$. So in
-pratice we can choose prime number around 256 that are congruent to 3 modulus 4.
-With 32 bits numbers, only the 4 least significant bits of $x_n$ can be chosen
-(the maximum number of undistinguishing is less or equals to
-$log_2(log_2(x_n))$).
+pratice we can choose prime numbers around 256 that are congruent to 3 modulus
+4. With 32 bits numbers, only the 4 least significant bits of $x_n$ can be
+chosen (the maximum number of undistinguishing is less or equals to
+$log_2(log_2(x_n))$). So to generate a 32 bits number, we need to use 8 times
+the BBS algorithm, with different combinations of $M$ is required.
+
+Currently this PRNG does not succeed to pass all the tests of TestU01.
\section{Experiments}
\label{sec:experiments}
another one equipped with a less performant CPU and a GeForce GTX 280. Both
cards have 240 cores.
-In Figure~\ref{fig:time_gpu} we compare the number of random numbers generated
-per second. The xor-like prng is a xor64 described in~\cite{Marsaglia2003}. In
-order to obtain the optimal performance we remove the storage of random numbers
-in the GPU memory. This step is time consuming and slows down the random number
-generation. Moreover, if you are interested by applications that consume random
-numbers directly when they are generated, their storage is completely
+In Figure~\ref{fig:time_xorlike_gpu} we compare the number of random numbers
+generated per second with the xor-like based PRNG. In this figure, the optimized
+version use the {\it xor64} described in~\cite{Marsaglia2003}. The naive version
+use the three xor-like PRNGs described in Listing~\ref{algo:seqCIprng}. In
+order to obtain the optimal performance we removed the storage of random numbers
+in the GPU memory. This step is time consuming and slows down the random numbers
+generation. Moreover, if one is interested by applications that consume random
+numbers directly when they are generated, their storage are completely
useless. In this figure we can see that when the number of threads is greater
than approximately 30,000 upto 5 millions the number of random numbers generated
per second is almost constant. With the naive version, it is between 2.5 and
\begin{figure}[htbp]
\begin{center}
- \includegraphics[scale=.7]{curve_time_gpu.pdf}
+ \includegraphics[scale=.7]{curve_time_xorlike_gpu.pdf}
\end{center}
-\caption{Number of random numbers generated per second}
-\label{fig:time_gpu}
+\caption{Number of random numbers generated per second with the xorlike based PRNG}
+\label{fig:time_xorlike_gpu}
\end{figure}
138MSample/s with only one core of the Xeon E5530.
+In Figure~\ref{fig:time_bbs_gpu} we highlight the performance of the optimized
+BBS based PRNG on GPU. Performances are less important. On the Tesla C1060 we
+obtain approximately 1.8GSample/s and on the GTX 280 about 1.6GSample/s.
+
+\begin{figure}[htbp]
+\begin{center}
+ \includegraphics[scale=.7]{curve_time_bbs_gpu.pdf}
+\end{center}
+\caption{Number of random numbers generated per second with the BBS based PRNG}
+\label{fig:time_bbs_gpu}
+\end{figure}
+Both these experimentations allows us to conclude that it is possible to
+generate a huge number of pseudo-random numbers with the xor-like version and
+about tens times less with the BBS based version. The former version has only
+chaotic properties whereas the latter also has cryptographically properties.
%% \section{Cryptanalysis of the Proposed PRNG}
In this paper we have presented a new class of PRNGs based on chaotic
-iterations. We have proven that these PRNGs are chaotic in the sense of Devenay.
+iterations. We have proven that these PRNGs are chaotic in the sense of Devenay.
+We also propose a PRNG cryptographically secure and its implementation on GPU.
+
+An efficient implementation on GPU based on a xor-like PRNG allows us to
+generate a huge number of pseudo-random numbers per second (about
+20Gsample/s). This PRNG succeeds to pass the hardest batteries of TestU01.
+
+In future work we plan to extend this work for parallel PRNG for clusters or
+grid computing. We also plan to improve the BBS version in order to succeed all
+the tests of TestU01.
-An efficient implementation on GPU allows us to generate a huge number of pseudo
-random numbers per second (about 20Gsample/s). Our PRNGs succeed to pass the
-hardest batteries of test (TestU01).
-In future work we plan to extend our work in order to have cryptographically
-secure PRNGs because in some situations this property may be important.
-\bibliographystyle{plain}
+\bibliographystyle{plain}
\bibliography{mabase}
\end{document}