+According to the available memory in the GPU and the number of threads used
+simultenaously, the number of random numbers that a thread can generate inside a
+kernel is limited, i.e. the variable \texttt{n} in
+algorithm~\ref{algo:gpu_kernel}. For example, if $100,000$ threads are used and
+if $n=100$\footnote{in fact, we need to add the initial seed (a 32-bits number)}
+then the memory required to store internals variables of xor-like
+PRNGs\footnote{we multiply this number by $2$ in order to count 32-bits numbers}
+and random number of our PRNG is equals to $100,000\times ((4+5+6)\times
+2+(1+100))=1,310,000$ 32-bits numbers, i.e. about $52$Mb.
+
+All the tests performed to pass the BigCrush of TestU01 succeeded. Different
+number of threads have been tested upto $10$ millions.
+
+\begin{remark}
+Algorithm~\ref{algo:gpu_kernel} has the advantage to manipulate independent
+PRNGs, so this version is easily usable on a cluster of computer. The only thing
+to ensure is to use a single ISAAC PRNG. For this, a simple solution consists in
+using a master node for the initialization which computes the initial parameters
+for all the differents nodes involves in the computation.
+\end{remark}
+
+\subsection{Version more suited to GPU}
+
+As GPU offers shared memory mechanism between threads of the same block, it is
+possible to use this in order to simplify the previous algorithm, i.e. using
+less than 3 xor-like PRNGs. The solution consists in
+
+ threads of the same block compute a random
+number and uses other random numbers of
+