[book_gpu.git] / BookGPU / Chapters / chapter6 / PartieAsync.tex
diff --git a/BookGPU/Chapters/chapter6/PartieAsync.tex b/BookGPU/Chapters/chapter6/PartieAsync.tex
index 3365b41039ab07ff2b0b8232e4f438ceb9a04d64..1623f8c721bbdaa824950c04b7257ccc65ad74e3 100644
--- a/BookGPU/Chapters/chapter6/PartieAsync.tex
+++ b/BookGPU/Chapters/chapter6/PartieAsync.tex
@@ -6,7 +6,7 @@ In the previous section, we have seen how to efficiently implement overlap of
  computations (CPU and GPU) with communications (GPU transfers and internode
  communications).  However, we have previously shown that for some parallel
  iterative algorithms, it is sometimes even more efficient to use an asynchronous
-scheme of iterations\index{iterations!asynchronous} \cite{HPCS2002,ParCo05,Para10}.  In that case, the nodes do
+scheme of iterations\index{asynchronous iterations} \cite{HPCS2002,ParCo05,Para10}.  In that case, the nodes do
  not wait for each other but they perform their iterations using the last
  external data they have received from the other nodes, even if this
  data was produced \emph{before} the previous iteration on the other nodes.
@@ -139,7 +139,7 @@ communication libraries such as MPI are not systematically performed in parallel
  the computations~\cite{ChVCV13,Hoefler08a}.  So, the logical and classical way
  to implement such an overlap is to use three threads: one for
  computing, one for sending, and one for receiving. Moreover, since
-the communication is performed by threads, blocking synchronous communications\index{MPI!communication!blocking}\index{MPI!communication!synchronous}
+the communication is performed by threads, blocking synchronous communications\index{MPI!blocking}\index{MPI!synchronous}
  can be used without deteriorating the overall performance.
  
  In this basic version, the termination\index{termination} of the global process is performed
@@ -648,7 +648,7 @@ while(!Finished){
        case tagState: // Management of local state messages
         // Actual reception of the message
         MPI_Recv(&recvdState, 1, MPI_CHAR, status.MPI_SOURCE, tagState, MPI_COMM_WORLD, &status); 
-       // Updates of numbers of stabilized nodes and received state msgs 
+       // Updates of numbers of stabilized nodes and recvd state msgs 
         nbOtherCVs += recvdState;
         nbStateMsg++;
         // Unlocking of the computing thread when states of all other 
@@ -887,7 +887,7 @@ the CPU may vary depending on the application. For example, when processing data
  streams (pipelines), pre-processing of the next data item and/or post-processing
  of the previous result can be done on the CPU while the GPU is processing the current
  data item.  In other cases, the CPU can perform \emph{auxiliary}
-computations\index{computation!auxiliary}
+computations\index{computation auxiliary}
  that are not absolutely required to obtain the result but that may accelerate
  the entire iterative process.  Another possibility would be to distribute the
  main computations between the GPU and CPU. However, this