X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/book_gpu.git/blobdiff_plain/11f93a2e8880680f6b192298e5ce0697d2596a31..ecd01808b5702d940bd77107a2bf829d3832179b:/BookGPU/Chapters/chapter6/PartieAsync.tex?ds=sidebyside diff --git a/BookGPU/Chapters/chapter6/PartieAsync.tex b/BookGPU/Chapters/chapter6/PartieAsync.tex index 3365b41..3f4a539 100644 --- a/BookGPU/Chapters/chapter6/PartieAsync.tex +++ b/BookGPU/Chapters/chapter6/PartieAsync.tex @@ -6,7 +6,7 @@ In the previous section, we have seen how to efficiently implement overlap of computations (CPU and GPU) with communications (GPU transfers and internode communications). However, we have previously shown that for some parallel iterative algorithms, it is sometimes even more efficient to use an asynchronous -scheme of iterations\index{iterations!asynchronous} \cite{HPCS2002,ParCo05,Para10}. In that case, the nodes do +scheme of iterations\index{asynchronous iterations} \cite{HPCS2002,ParCo05,Para10}. In that case, the nodes do not wait for each other but they perform their iterations using the last external data they have received from the other nodes, even if this data was produced \emph{before} the previous iteration on the other nodes. @@ -139,7 +139,7 @@ communication libraries such as MPI are not systematically performed in parallel the computations~\cite{ChVCV13,Hoefler08a}. So, the logical and classical way to implement such an overlap is to use three threads: one for computing, one for sending, and one for receiving. Moreover, since -the communication is performed by threads, blocking synchronous communications\index{MPI!communication!blocking}\index{MPI!communication!synchronous} +the communication is performed by threads, blocking synchronous communications\index{MPI!blocking}\index{MPI!synchronous} can be used without deteriorating the overall performance. In this basic version, the termination\index{termination} of the global process is performed @@ -887,7 +887,7 @@ the CPU may vary depending on the application. For example, when processing data streams (pipelines), pre-processing of the next data item and/or post-processing of the previous result can be done on the CPU while the GPU is processing the current data item. In other cases, the CPU can perform \emph{auxiliary} -computations\index{computation!auxiliary} +computations\index{computation auxiliary} that are not absolutely required to obtain the result but that may accelerate the entire iterative process. Another possibility would be to distribute the main computations between the GPU and CPU. However, this