last

diff --git a/BookGPU/Chapters/chapter6/PartieSync.tex b/BookGPU/Chapters/chapter6/PartieSync.tex

index bc08557db0fb454e51939cb5e11e2e8603137479..ce215650b12551d15131ccdea204209de12cb70e 100755 (executable)
--- a/BookGPU/Chapters/chapter6/PartieSync.tex
+++ b/BookGPU/Chapters/chapter6/PartieSync.tex
@@ -97,7 +97,7 @@ parallel programming schemes on a GPU cluster:
  Using CUDA\index{CUDA}, GPU kernel executions are nonblocking, and GPU/CPU data
  transfers\index{CUDA!data transfer}
  are blocking or nonblocking operations. All GPU kernel executions and CPU/GPU
-data transfers are associated to "streams,"\index{CUDA!stream} and all operations on a same stream
+data transfers are associated to ``streams'',\index{CUDA!stream} and all operations on a same stream
  are serialized. When transferring data from the CPU to the GPU, then running GPU
  computations, and finally transferring results from the GPU to the CPU, there is
  a natural synchronization and serialization if these operations are achieved on
@@ -210,7 +210,7 @@ achieved serially and not overlapped.
  
  When CPU/GPU data transfers are not negligible compared to GPU computations, it
  can be interesting to overlap internode CPU computations with a \emph{GPU
-  sequence}\index{GPU sequence} including CPU/GPU data transfers and GPU computations (see
+  sequence}\index{GPU!sequence} including CPU/GPU data transfers and GPU computations (see
  \Fig{fig:ch6p1overlapseqsequence}). Algorithmic issues of this approach are basic,
  but their implementation requires explicit CPU multithreading and
  synchronization, and CPU data buffer duplication. We need to implement two
@@ -367,7 +367,7 @@ of the code.
  
  \Lst{algo:ch6p1overlapstreamsequence} introduces the generic MPI+OpenMP+CUDA
  code,  explicitly overlapping MPI communications with
-streamed GPU sequences\index{GPU sequence!streamed}.
+streamed GPU sequences\index{GPU!streamed sequence}.
  
  %\begin{algorithm}
  %  \caption{Generic scheme explicitly overlapping MPI communications with streamed sequences of CUDA
@@ -489,7 +489,7 @@ working on  independent subsets of  data.  \Lst{algo:ch6p1overlapstreamsequence}
  is not so generic as \Lst{algo:ch6p1overlapseqsequence}.
  
  
-\subsection{Interleaved communications-transfers-computations\\overlapping}
+\subsection{Interleaved communications-transfers-computations overlapping}
  
  Many algorithms do not support splitting data transfers and kernel calls, and
  cannot exploit CUDA streams, for example, when each GPU thread requires access to