When CPU/GPU data transfers are not negligible compared to GPU computations, it
can be worthwhile to overlap internode CPU communications with a \emph{GPU
sequence}\index{GPU!sequence} including CPU/GPU data transfers and GPU computations (see
\Fig{fig:ch6p1overlapseqsequence}). The algorithm behind this approach is simple,
but its implementation requires explicit CPU multithreading and
synchronization, as well as duplicated CPU data buffers. We need to implement two
CPU threads: one running the internode MPI communications and the other running
the GPU sequence.
\Lst{algo:ch6p1overlapstreamsequence} introduces the generic MPI+OpenMP+CUDA
code, explicitly overlapping MPI communications with
streamed GPU sequences\index{GPU!streamed sequence}.
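
The two-thread, double-buffered organization described above can be sketched in plain C. This is only a minimal sketch under explicit assumptions, not the code of \Lst{algo:ch6p1overlapstreamsequence}: the MPI exchange and the GPU sequence are replaced by hypothetical CPU stand-ins (\texttt{fake\_mpi\_exchange} and \texttt{fake\_gpu\_sequence}), and \texttt{run\_overlapped} and \texttt{BUF\_LEN} are illustrative names. At each step, one OpenMP section fills one buffer with the data of the next step while the other section processes the data received at the previous step; the implicit barrier at the end of the \texttt{parallel sections} construct synchronizes the two threads before the buffers swap roles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer length, for illustration only. */
#define BUF_LEN 8

/* Hypothetical stand-in for the internode MPI exchange (e.g. an
   MPI_Sendrecv): it fills recv_buf with the data "received" for `step`. */
static void fake_mpi_exchange(double *recv_buf, size_t n, int step)
{
    for (size_t i = 0; i < n; i++)
        recv_buf[i] = (double)(step + 1);
}

/* Hypothetical stand-in for the GPU sequence (CPU->GPU transfer, kernel,
   GPU->CPU transfer): here it simply sums the buffer on the CPU. */
static double fake_gpu_sequence(const double *buf, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Runs nsteps steps. At step s, one thread receives the data of step s
   into comm_buf while the other processes the data of step s-1 held in
   gpu_buf; the two buffers swap roles between steps. */
double run_overlapped(int nsteps)
{
    double buf_a[BUF_LEN], buf_b[BUF_LEN];  /* duplicated CPU buffers */
    double *comm_buf = buf_a;  /* written by the communication thread */
    double *gpu_buf  = buf_b;  /* read by the GPU-sequence thread */
    double total = 0.0;

    assert(nsteps >= 1);

    /* Prologue: the first exchange has nothing to overlap with. */
    fake_mpi_exchange(comm_buf, BUF_LEN, 0);

    for (int s = 1; s <= nsteps; s++) {
        /* Swap roles: the data just received becomes the GPU input. */
        double *tmp = comm_buf;
        comm_buf = gpu_buf;
        gpu_buf = tmp;

        double partial = 0.0;
        #pragma omp parallel sections num_threads(2)
        {
            #pragma omp section
            {   /* communication thread: fetch data for the next step */
                if (s < nsteps)
                    fake_mpi_exchange(comm_buf, BUF_LEN, s);
            }
            #pragma omp section
            {   /* GPU-sequence thread: process the current data */
                partial = fake_gpu_sequence(gpu_buf, BUF_LEN);
            }
        }   /* implicit barrier: both sections finish before the next swap */
        total += partial;
    }
    return total;
}
```

For instance, \texttt{run\_overlapped(3)} processes three steps of eight values each and returns $8 \times (1+2+3) = 48$. Compiled without OpenMP support, the pragmas are ignored and the two sections simply run one after the other, giving the same result without any overlap. In a real MPI+OpenMP+CUDA code, MPI would be initialized with \texttt{MPI\_Init\_thread}; because the thread that executes a given OpenMP section is not fixed, requesting at least \texttt{MPI\_THREAD\_SERIALIZED} rather than \texttt{MPI\_THREAD\_FUNNELED} is the safer choice for this pattern.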
%\begin{algorithm}
% \caption{Generic scheme explicitly overlapping MPI communications with streamed sequences of CUDA