diff --git a/prng_gpu.tex b/prng_gpu.tex

index bc5b3e55f5d7896f4374abcec6c8f56b0b4f3ee8..23fc7769d9be3949932bfdcb8a2fb8856d42abe6 100644
--- a/prng_gpu.tex
+++ b/prng_gpu.tex
@@ -895,7 +895,7 @@ which represent the indexes of the  other threads for which the results are used
  by the  current thread. In  the algorithm, we  consider that a  64-bit xor-like
  PRNG is used, which is why both 32-bit parts are used.
  
-This version also succeed to the BigCrush batteries of tests.
+This version also passes the {\it BigCrush} battery of tests.
  
  \begin{algorithm}
  
@@ -956,18 +956,22 @@ Devaney's formulation of a chaotic behavior.
  
  Different experiments  have been  performed in order  to measure  the generation
  speed. We have used  a computer equipped with a Tesla C1060 NVIDIA  GPU card and an
-Intel Xeon E5530 cadenced at 2.40 GHz for our experiments.
+Intel  Xeon E5530 clocked  at 2.40  GHz for  our experiments,  and we  have used
+another one  equipped with  a less powerful  CPU and  a GeForce GTX  280. Both
+cards have 240 cores.
  
  In Figure~\ref{fig:time_gpu}  we compare the number of  random numbers generated
-per second.   In order  to obtain the  optimal number  we remove the  storage of
+per second.  In order to obtain the optimal performance we remove the storage of
  random numbers  in the GPU memory. This  step is time consuming  and slows down
  the random number  generation.  Moreover, if you are  interested in applications
-that consome  random number directly when  they are generated,  their storage is
+that consume random  numbers directly when they are  generated, their storage is
  completely useless. In this figure we can see that when the number of threads is
  between approximately  30,000 and 5 million, the  number of random numbers
  generated per second is almost constant.   With the naive version, it is between
-2.5  and  3GSample/s.   With the  optimized  version,  it  is almost  equals  to
-20GSample/s.
+2.5 and 3GSample/s.   With the optimized version, it  is approximately equal to
+20GSample/s. Finally,  we can remark  that both GPU  cards are quite  similar. In
+practice,  the Tesla C1060  has more  memory than  the GTX  280 and  this memory
+should be of better quality.
  
  \begin{figure}[htbp]
  \begin{center}
@@ -978,11 +982,10 @@ generated per second is almost constant.   With the naive version, it is between
  \end{figure}
  
  
-First of all we have compared the time to generate X random numbers with both
-the CPU version and the GPU version. 
+In  comparison,   Listing~\ref{algo:seqCIprng}  allows  us   to  generate  about
+138MSample/s with only one core of the Xeon E5530.
+
  
  
-Plot a curve of the number of random numbers as a function of the number of threads,
-possibly as a function of the number of threads per block.