+It is possible to build a cryptographically secure PRNG based on the previous
+algorithm (Algorithm~\ref{algo:gpu_kernel2}). Due to Proposition~\ref{cryptopreuve},
+it simply consists in replacing
+the {\it xor-like} PRNG by a cryptographically secure one.
+We have chosen the Blum Blum Shum generator~\cite{BBS} (usually denoted by BBS) having the form:
+$$x_{n+1}=x_n^2~ mod~ M$$ where $M$ is the product of two prime numbers (these
+prime numbers need to be congruent to 3 modulus 4). BBS is known to be
+very slow and only usable for cryptographic applications.
+
+
+The modulus operation is the most time consuming operation for current
+GPU cards. So in order to obtain quite reasonable performances, it is
+required to use only modulus on 32 bits integer numbers. Consequently
+$x_n^2$ need to be lesser than $2^{32}$, and thus the number $M$ must be
+lesser than $2^{16}$. So in practice we can choose prime numbers around
+256 that are congruent to 3 modulus 4. With 32 bits numbers, only the
+4 least significant bits of $x_n$ can be chosen (the maximum number of
+indistinguishable bits is lesser than or equals to
+$log_2(log_2(M))$). In other words, to generate a 32 bits number, we need to use
+8 times the BBS algorithm with possibly different combinations of $M$. This
+approach is not sufficient to be able to pass all the TestU01,
+as small values of $M$ for the BBS lead to
+ small periods. So, in order to add randomness we proceed with
+the followings modifications.
+\begin{itemize}
+\item
+Firstly, we define 16 arrangement arrays instead of 2 (as described in
+Algorithm \ref{algo:gpu_kernel2}), but only 2 of them are used at each call of
+the PRNG kernels. In practice, the selection of combinations
+arrays to be used is different for all the threads. It is determined
+by using the three last bits of two internal variables used by BBS.
+%This approach adds more randomness.
+In Algorithm~\ref{algo:bbs_gpu},
+character \& is for the bitwise AND. Thus using \&7 with a number
+gives the last 3 bits, providing so a number between 0 and 7.
+\item
+Secondly, after the generation of the 8 BBS numbers for each thread, we
+have a 32 bits number whose period is possibly quite small. So
+to add randomness, we generate 4 more BBS numbers to
+shift the 32 bits numbers, and add up to 6 new bits. This improvement is
+described in Algorithm~\ref{algo:bbs_gpu}. In practice, the last 2 bits
+of the first new BBS number are used to make a left shift of at most
+3 bits. The last 3 bits of the second new BBS number are add to the
+strategy whatever the value of the first left shift. The third and the
+fourth new BBS numbers are used similarly to apply a new left shift
+and add 3 new bits.
+\item
+Finally, as we use 8 BBS numbers for each thread, the storage of these
+numbers at the end of the kernel is performed using a rotation. So,
+internal variable for BBS number 1 is stored in place 2, internal
+variable for BBS number 2 is stored in place 3, ..., and finally, internal
+variable for BBS number 8 is stored in place 1.
+\end{itemize}
+
+\begin{algorithm}
+
+\KwIn{InternalVarBBSArray: array with internal variables of the 8 BBS
+in global memory\;
+NumThreads: Number of threads\;
+tab: 2D Arrays containing 16 combinations (in first dimension) of size combination\_size (in second dimension)\;}
+
+\KwOut{NewNb: array containing random numbers in global memory}
+\If{threadId is concerned} {
+ retrieve data from InternalVarBBSArray[threadId] in local variables including shared memory and x\;
+ we consider that bbs1 ... bbs8 represent the internal states of the 8 BBS numbers\;
+ offset = threadIdx\%combination\_size\;
+ o1 = threadIdx-offset+tab[bbs1\&7][offset]\;
+ o2 = threadIdx-offset+tab[8+bbs2\&7][offset]\;
+ \For{i=1 to n} {
+ t<<=4\;
+ t|=BBS1(bbs1)\&15\;
+ ...\;
+ t<<=4\;
+ t|=BBS8(bbs8)\&15\;
+ //two new shifts\;
+ t<<=BBS3(bbs3)\&3\;
+ t|=BBS1(bbs1)\&7\;
+ t<<=BBS7(bbs7)\&3\;
+ t|=BBS2(bbs2)\&7\;
+ t=t $\wedge$ shmem[o1] $\wedge$ shmem[o2]\;
+ shared\_mem[threadId]=t\;
+ x = x $\wedge$ t\;
+
+ store the new PRNG in NewNb[NumThreads*threadId+i]\;
+ }
+ store internal variables in InternalVarXorLikeArray[threadId] using a rotation\;
+}
+
+\caption{main kernel for the BBS based PRNG GPU}
+\label{algo:bbs_gpu}
+\end{algorithm}
+
+In Algorithm~\ref{algo:bbs_gpu}, $n$ is for the quantity
+of random numbers that a thread has to generate.
+The operation t<<=4 performs a left shift of 4 bits
+on the variable $t$ and stores the result in $t$, and
+$BBS1(bbs1)\&15$ selects
+the last four bits of the result of $BBS1$.
+Thus an operation of the form $t<<=4; t|=BBS1(bbs1)\&15\;$
+realizes in $t$ a left shift of 4 bits, and then puts
+the 4 last bits of $BBS1(bbs1)$ in the four last
+positions of $t$.
+Let us remark that to initialize $t$ is not a necessity as we
+fill it 4 bits by 4 bits, until having obtained 32 bits.
+The two last new shifts are realized in order to enlarge
+the small periods of the BBS used here, to introduce a variability.
+In these operations, we make twice a left shift of $t$ of \emph{at most}
+3 bits and we put \emph{exactly} the 3 last bits from a BBS into
+the 3 last bits of $t$, leading possibly to a loss of a few
+bits of $t$.
+
+It should be noticed that this generator has another time the form $x^{n+1} = x^n \oplus S^n$,
+where $S^n$ is referred in this algorithm as $t$: each iteration of this
+PRNG ends with $x = x \wedge t;$. This $S^n$ is only constituted
+by secure bits produced by the BBS generator, and thus, due to
+Proposition~\ref{cryptopreuve}, the resulted PRNG is cryptographically
+secure
+
+
+
+\subsection{Toward a Cryptographically Secure and Chaotic Asymmetric Cryptosystem}
+
+We finish this research work by giving some thoughts about the use of
+the proposed PRNG in an asymmetric cryptosystem.
+This first approach will be further investigated in a future work.
+
+\subsubsection{Recalls of the Blum-Goldwasser Probabilistic Cryptosystem}
+
+The Blum-Goldwasser cryptosystem is a cryptographically secure asymmetric key encryption algorithm
+proposed in 1984~\cite{Blum:1985:EPP:19478.19501}. The encryption algorithm
+implements a XOR-based stream cipher using the BBS PRNG, in order to generate
+the keystream. Decryption is done by obtaining the initial seed thanks to
+the final state of the BBS generator and the secret key, thus leading to the
+ reconstruction of the keystream.
+
+The key generation consists in generating two prime numbers $(p,q)$,
+randomly and independently of each other, that are
+ congruent to 3 mod 4, and to compute the modulus $N=pq$.
+The public key is $N$, whereas the secret key is the factorization $(p,q)$.
+
+
+Suppose Bob wishes to send a string $m=(m_0, \dots, m_{L-1})$ of $L$ bits to Alice:
+\begin{enumerate}
+\item Bob picks an integer $r$ randomly in the interval $\llbracket 1,N\rrbracket$ and computes $x_0 = r^2~mod~N$.
+\item He uses the BBS to generate the keystream of $L$ pseudorandom bits $(b_0, \dots, b_{L-1})$, as follows. For $i=0$ to $L-1$,
+\begin{itemize}
+\item $i=0$.
+\item While $i \leqslant L-1$:
+\begin{itemize}
+\item Set $b_i$ equal to the least-significant\footnote{BBS can securely output up to $\mathsf{N} = \lfloor log(log(N)) \rfloor$ of the least-significant bits of $x_i$ during each round.} bit of $x_i$,
+\item $i=i+1$,
+\item $x_i = (x_{i-1})^2~mod~N.$
+\end{itemize}
+\end{itemize}
+\item The ciphertext is computed by XORing the plaintext bits $m$ with the keystream: $ c = (c_0, \dots, c_{L-1}) = m \oplus b$. This ciphertext is $[c, y]$, where $y=x_{0}^{2^{L}}~mod~N.$
+\end{enumerate}
+
+
+When Alice receives $\left[(c_0, \dots, c_{L-1}), y\right]$, she can recover $m$ as follows:
+\begin{enumerate}
+\item Using the secret key $(p,q)$, she computes $r_p = y^{((p+1)/4)^{L}}~mod~p$ and $r_q = y^{((q+1)/4)^{L}}~mod~q$.
+\item The initial seed can be obtained using the following procedure: $x_0=q(q^{-1}~{mod}~p)r_p + p(p^{-1}~{mod}~q)r_q~{mod}~N$.
+\item She recomputes the bit-vector $b$ by using BBS and $x_0$.
+\item Alice computes finally the plaintext by XORing the keystream with the ciphertext: $ m = c \oplus b$.
+\end{enumerate}
+
+
+\subsubsection{Proposal of a new Asymmetric Cryptosystem Adapted from Blum-Goldwasser}
+
+We propose to adapt the Blum-Goldwasser protocol as follows.
+Let $\mathsf{N} = \lfloor log(log(N)) \rfloor$ be the number of bits that can
+be obtained securely with the BBS generator using the public key $N$ of Alice.
+Alice will pick randomly $S^0$ in $\llbracket 0, 2^{\mathsf{N}-1}\rrbracket$ too, and
+her new public key will be $(S^0, N)$.
+
+To encrypt his message, Bob will compute
+\begin{equation}
+c = \left(m_0 \oplus (b_0 \oplus S^0), m_1 \oplus (b_0 \oplus b_1 \oplus S^0), \hdots, m_{L-1} \oplus (b_0 \oplus b_1 \hdots \oplus b_{L-1} \oplus S^0) \right)
+\end{equation}
+instead of $\left(m_0 \oplus b_0, m_1 \oplus b_1, \hdots, m_{L-1} \oplus b_{L-1} \right)$.
+
+The same decryption stage as in Blum-Goldwasser leads to the sequence
+$\left(m_0 \oplus S^0, m_1 \oplus S^0, \hdots, m_{L-1} \oplus S^0 \right)$.
+Thus, with a simple use of $S^0$, Alice can obtained the plaintext.
+By doing so, the proposed generator is used in place of BBS, leading to
+the inheritance of all the properties presented in this paper.