-per second. In order to obtain the optimal performance we remove the storage of
-random numbers in the GPU memory. This step is time consumming and slows down
-the random number generation. Moreover, if you are interested by applications
-that consome random numbers directly when they are generated, their storage is
-completely useless. In this figure we can see that when the number of threads is
-greater than approximately 30,000 upto 5 millions the number of random numbers
-generated per second is almost constant. With the naive version, it is between
-2.5 and 3GSample/s. With the optimized version, it is approximately equals to
+per second. The xor-like PRNG is the xor64 generator described
+in~\cite{Marsaglia2003}. In order to obtain optimal performance, we do not store
+the random numbers in GPU memory, since this step is time consuming and slows
+down the random number generation. Moreover, for applications that consume
+random numbers as soon as they are generated, storing them is completely
+useless. In this figure, we can see that when the number of threads is between
+approximately 30,000 and 5 million, the number of random numbers generated per
+second is almost constant. With the naive version, it is between 2.5 and
+3~GSample/s. With the optimized version, it is approximately equal to