diff --git a/BookGPU/Chapters/chapter12/ch12.tex b/BookGPU/Chapters/chapter12/ch12.tex

index 0269fd28a3957ccd5faaf8dec9ee15c92eee3703..4fe0eb97b97f30755ed469d87a43ee7b2a94bcbd 100755
--- a/BookGPU/Chapters/chapter12/ch12.tex
+++ b/BookGPU/Chapters/chapter12/ch12.tex
@@ -457,8 +457,8 @@ nodes\index{neighboring node} over the GPU cluster must exchange between them th
  elements necessary to compute this multiplication. First, each computing node determines, in its
  local subvector, the vector elements needed by other nodes. Then, the neighboring nodes exchange
  between them these shared vector elements. The data exchanges are implemented by using the MPI
-point-to-point communication routines: blocking\index{MPI subroutines!blocking} sends with \verb+MPI_Send()+
-and nonblocking\index{MPI subroutines!nonblocking} receives with \verb+MPI_Irecv()+. Figure~\ref{ch12:fig:02}
+point-to-point communication routines: blocking\index{MPI!blocking} sends with \verb+MPI_Send()+
+and nonblocking\index{MPI!nonblocking} receives with \verb+MPI_Irecv()+. Figure~\ref{ch12:fig:02}
  shows an example of data exchanges between \textit{Node 1} and its neighbors \textit{Node 0}, \textit{Node 2},
  and \textit{Node 3}. In this example, the iterate matrix $A$ split between these four computing
  nodes is that presented in Figure~\ref{ch12:fig:01}.
@@ -491,7 +491,7 @@ cluster. Consequently, the vector elements to be exchanged must be copied from t
  and vice versa before and after the synchronization operation between CPUs. We have used the CUBLAS\index{CUBLAS}
  communication subroutines to perform the data transfers between a CPU core and its GPU: \verb+cublasGetVector()+
  and \verb+cublasSetVector()+. Finally, in addition to the data exchanges, GPU nodes perform reduction operations
-to compute in parallel the dot products and Euclidean norms. This is implemented by using the MPI global communication\index{MPI subroutines!global}
+to compute in parallel the dot products and Euclidean norms. This is implemented by using the MPI global communication\index{MPI!global}
  \verb+MPI_Allreduce()+.
  
  
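Note on the second hunk's context: the reduction it describes (dot products and Euclidean norms computed in parallel across GPU nodes) can be written out as below. This is only a sketch under assumed notation. The symbols $P$ (number of nodes), $p$ (node index), $n_p$ (local subvector length), and $x^{(p)}$, $y^{(p)}$ (local subvectors) are illustrative and do not appear in the changed lines of ch12.tex.

```latex
% Sketch only: P, p, n_p, x^{(p)} and y^{(p)} are assumed notation, not taken from ch12.tex.
% Node p holds the local subvectors x^{(p)} and y^{(p)}, each of length n_p.
\[
  \langle x, y \rangle
    = \sum_{p=0}^{P-1}
      \underbrace{\sum_{i=1}^{n_p} x^{(p)}_i \, y^{(p)}_i}_{\mbox{\scriptsize local partial sum on node } p},
  \qquad
  \| x \|_2 = \sqrt{\langle x, x \rangle}.
\]
```

Under this reading, each node evaluates its inner sum locally, and the outer sum over $p$ is the part a global reduction such as \verb+MPI_Allreduce()+ would perform (presumably with a sum operation), so that every node ends up with the same scalar.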