\usepackage{moreverb}
\usepackage{commath}
\usepackage{algorithm2e}
+\usepackage{listings}
\usepackage[standard]{ntheorem}
% Pour mathds : les ensembles IR, IN, etc.
\end{algorithm}
\begin{algorithm}[h!]
-\SetAlgoLined
+%\SetAlgoLined %%RAPH: cette ligne provoque une erreur chez moi
\KwIn{the internal configuration $z$ (a 32-bit word)}
\KwOut{$y$ (a 32-bit word)}
$z\leftarrow{z\oplus{(z\ll13)}}$\;
On parle du séquentiel avec des nombres 64 bits\\
-Faire le lien avec le paragraphe précédent (je considère que la stratégie s'appelle $S^i$\\
In order to implement efficiently a PRNG based on chaotic iterations it is
possible to improve previous works [ref]. One solution consists in considering
-that the strategy used $S^i$ contains all the bits for which the negation is
+that the strategy used contains all the bits for which the negation is
achieved out. Then instead of applying the negation on these bits we can simply
-apply the xor operator between the current number and the strategy $S^i$. In
+apply the xor operator between the current number and the strategy. In
order to obtain the strategy we also use a classical PRNG.
-\begin{figure}[htbp]
-\begin{center}
-\fbox{
-\begin{minipage}{14cm}
-unsigned int CIprng() \{\\
- static unsigned int x = 123123123;\\
- unsigned long t1 = xorshift();\\
- unsigned long t2 = xor128();\\
- unsigned long t3 = xorwow();\\
- x = x\textasciicircum (unsigned int)t1;\\
- x = x\textasciicircum (unsigned int)(t2$>>$32);\\
- x = x\textasciicircum (unsigned int)(t3$>>$32);\\
- x = x\textasciicircum (unsigned int)t2;\\
- x = x\textasciicircum (unsigned int)(t1$>>$32);\\
- x = x\textasciicircum (unsigned int)t3;\\
- return x;\\
-\}
-\end{minipage}
+%% \begin{figure}[htbp]
+%% \begin{center}
+%% \fbox{
+%% \begin{minipage}{14cm}
+%% unsigned int CIprng() \{\\
+%% static unsigned int x = 123123123;\\
+%% unsigned long t1 = xorshift();\\
+%% unsigned long t2 = xor128();\\
+%% unsigned long t3 = xorwow();\\
+%% x = x\textasciicircum (unsigned int)t1;\\
+%% x = x\textasciicircum (unsigned int)(t2$>>$32);\\
+%% x = x\textasciicircum (unsigned int)(t3$>>$32);\\
+%% x = x\textasciicircum (unsigned int)t2;\\
+%% x = x\textasciicircum (unsigned int)(t1$>>$32);\\
+%% x = x\textasciicircum (unsigned int)t3;\\
+%% return x;\\
+%% \}
+%% \end{minipage}
+%% }
+%% \end{center}
+%% \caption{sequential Chaotic Iteration PRNG}
+%% \label{algo:seqCIprng}
+%% \end{figure}
+
+
+
+\lstset{language=C,caption={C code of the sequential chaotic iterations based PRNG},label=algo:seqCIprng}
+\begin{lstlisting}
+unsigned int CIprng() {
+ static unsigned int x = 123123123;
+ unsigned long t1 = xorshift();
+ unsigned long t2 = xor128();
+ unsigned long t3 = xorwow();
+ x = x^(unsigned int)t1;
+ x = x^(unsigned int)(t2>>32);
+ x = x^(unsigned int)(t3>>32);
+ x = x^(unsigned int)t2;
+ x = x^(unsigned int)(t1>>32);
+ x = x^(unsigned int)t3;
+ return x;
}
-\end{center}
-\caption{sequential Chaotic Iteration PRNG}
-\label{algo:seqCIprng}
-\end{figure}
+\end{lstlisting}
+
+
+
-In Figure~\ref{algo:seqCIprng} a sequential version of our chaotic iterations
-based PRNG is presented. This version uses three classical 64 bits PRNG: the
-\texttt{xorshift}, the \texttt{xor128} and the \texttt{xorwow}. These three
-PRNGs are presented in~\cite{Marsaglia2003}. As each PRNG used works with
-64-bits and as our PRNG works with 32 bits, the use of \texttt{(unsigned int)}
-selects the 32 least significant bits whereas \texttt{(unsigned int)(t3$>>$32)}
-selects the 32 most significants bits of the variable \texttt{t}. This version
-sucesses the BigCrush of the TestU01 battery [P. L’ecuyer and
+
+In listing~\ref{algo:seqCIprng} a sequential version of our chaotic iterations
+based PRNG is presented. This version uses three classical 64-bits PRNG: the
+\texttt{xorshift}, the \texttt{xor128} and the \texttt{xorwow}. In the
+following, we call them xor-like PRNGSs. These three PRNGs are presented
+in~\cite{Marsaglia2003}. As each xor-like PRNG used works with 64-bits and as our PRNG
+works with 32-bits, the use of \texttt{(unsigned int)} selects the 32 least
+significant bits whereas \texttt{(unsigned int)(t3$>>$32)} selects the 32 most
+significants bits of the variable \texttt{t}. So to produce a random number
+realizes 6 xor operations with 6 32-bits numbers produced by 3 64-bits PRNG.
+This version successes the BigCrush of the TestU01 battery [P. L’ecuyer and
R. Simard. Testu01].
\section{Efficient prng based on chaotic iterations on GPU}
-On parle du passage du sequentiel au GPU
+In order to benefit from computing power of GPU, a program needs to define
+independent blocks of threads which can be computed simultaneously. In general,
+the larger the number of threads is, the more local memory is used and the less
+branching instructions are used (if, while, ...), the better performance is
+obtained on GPU. So with algorithm \ref{algo:seqCIprng} presented in the
+previous section, it is possible to build a similar program which computes PRNG
+on GPU. The principe consists in assigning the computation of a PRNG as in
+sequential to each thread of the GPU. Of course, it is essential that the three
+xor-like PRNGs used for our computation have different parameters. So we chose
+them randomly with another PRNG. As the initialisation is performed by the CPU,
+we have chosen to use the ISAAC PRNG to initalize all the parameters for the GPU
+version of our PRNG. The implementation of the three xor-like PRNGs is
+straightforward as soon as their parameters have been allocated in the GPU
+memory. Each xor-like PRNGs used works with an internal number $x$ which keeps
+the last generated random numbers. Other internal variables are also used by the
+xor-like PRNGs. More precisely, the implementation of the xor128, the xorshift
+and the xorwow respectively require 4, 5 and 6 unsigned long as internal
+variables.
+
+\begin{algorithm}
+
+\KwIn{InternalVarXorLikeArray: array with internal variables of the 3 xor-like PRNGs in global memory\;
+NumThreads: Number of threads\;}
+\KwOut{NewNb: array containing random numbers in global memory}
+\If{threadId is concerned} {
+ retrieve data from InternalVarXorLikeArray[threadId] in local variables\;
+ \For{i=1 to n} {
+ compute a new PRNG as in Listing\ref{algo:seqCIprng}\;
+ store the new PRNG in NewNb[NumThreads*threadId+i]\;
+ }
+ store internal variables in InternalVarXorLikeArray[threadId]\;
+}
+
+\caption{main kernel for the chaotic iterations based PRNG GPU version}
+\label{algo:gpu_kernel}
+\end{algorithm}
+
+According to the available memory in the GPU and the number of threads used
+simultenaously, the number of random numbers that a thread can generate inside a
+kernel is limited, i.e. the variable \texttt{n} in
+algorithm~\ref{algo:gpu_kernel}. For example, if $100,000$ threads are used and
+if $n=100$\footnote{in fact, we need to add the initial seed (a 32-bits number)}
+then the memory required to store internals variables of xor-like
+PRNGs\footnote{we multiply this number by $2$ in order to count 32-bits numbers}
+and random number of our PRNG is equals to $100,000\times ((4+5+6)\times
+2+(1+100))=1,310,000$ 32-bits numbers, i.e. about $52$Mb.
\section{Experiments}