X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/book_gpu.git/blobdiff_plain/1874c46934f4ba7e8c2013d3829f65309456d292..063fd4437e9bfbefc2f6ed6c932744bb20514751:/BookGPU/Chapters/chapter6/PartieAsync.tex?ds=sidebyside

diff --git a/BookGPU/Chapters/chapter6/PartieAsync.tex b/BookGPU/Chapters/chapter6/PartieAsync.tex
index 0253c9c..1617ffb 100644
--- a/BookGPU/Chapters/chapter6/PartieAsync.tex
+++ b/BookGPU/Chapters/chapter6/PartieAsync.tex
@@ -6,7 +6,7 @@ In the previous section, we have seen how to efficiently implement overlap of
 computations (CPU and GPU) with communications (GPU transfers and internode
 communications). However, we have previously shown that for some parallel
 iterative algorithms, it is sometimes even more efficient to use an asynchronous
-scheme of iterations\index{iterations asynchronous} \cite{HPCS2002,ParCo05,Para10}. In that case, the nodes do
+scheme of iterations\index{asynchronous iterations} \cite{HPCS2002,ParCo05,Para10}. In that case, the nodes do
 not wait for each other but they perform their iterations using the last
 external data they have received from the other nodes, even if this
 data was produced \emph{before} the previous iteration on the other nodes.
@@ -139,7 +139,7 @@ communication libraries such as MPI are not systematically performed in parallel
 the computations~\cite{ChVCV13,Hoefler08a}. So, the logical and classical way
 to implement such an overlap is to use three threads: one for
 computing, one for sending, and one for receiving. Moreover, since
-the communication is performed by threads, blocking synchronous communications\index{MPI!communication!blocking}\index{MPI!communication!synchronous}
+the communication is performed by threads, blocking synchronous communications\index{MPI!blocking}\index{MPI!synchronous}
 can be used without deteriorating the overall performance.

 In this basic version, the termination\index{termination} of the global process is performed
@@ -621,7 +621,7 @@ execution.
 They are similar to the mechanism used for managing the end messages at the end
 of the entire process. Line~23 directly updates the number of other nodes that
 are in local convergence by adding the received state of the source node. This is possible due to the encoding that is used to
-represent the local convergence (1) and the non convergence (0).
+represent the local convergence (1) and the nonconvergence (0).

 %\begin{algorithm}[H]
 % \caption{Reception function in the synchronized scheme.}
@@ -648,7 +648,7 @@ while(!Finished){
 case tagState: // Management of local state messages
 // Actual reception of the message
 MPI_Recv(&recvdState, 1, MPI_CHAR, status.MPI_SOURCE, tagState, MPI_COMM_WORLD, &status);
-// Updates of numbers of stabilized nodes and received state msgs
+// Updates of numbers of stabilized nodes and recvd state msgs
 nbOtherCVs += recvdState;
 nbStateMsg++;
 // Unlocking of the computing thread when states of all other
@@ -1174,7 +1174,7 @@ account in the main computations when it is relevant.
 So, the Newton process should be accelerated a little bit.

 We compare the performance obtained with overlapped Jacobian updatings and
-non overlapped ones for several problem sizes (see~\Fig{fig:ch6p2aux}).
+nonoverlapped ones for several problem sizes (see~\Fig{fig:ch6p2aux}).
 \begin{figure}[h]
 \centering
 \includegraphics[width=.75\columnwidth]{Chapters/chapter6/curves/recouvs.pdf}