computations (CPU and GPU) with communications (GPU transfers and internode
communications). However, we have previously shown that for some parallel
iterative algorithms, it is sometimes even more efficient to use an asynchronous
-scheme of iterations\index{iterations asynchronous} \cite{HPCS2002,ParCo05,Para10}. In that case, the nodes do
+scheme of iterations\index{asynchronous iterations} \cite{HPCS2002,ParCo05,Para10}. In that case, the nodes do
not wait for each other but perform their iterations using the latest
external data they have received from the other nodes, even if this
data was produced \emph{before} the previous iteration on those nodes.
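As an illustration, the following sketch shows one possible way to obtain this
behavior (it is not the code of our implementation; the names
\texttt{localUpdate}, \texttt{localConvergence}, \texttt{tagData}, and the
buffer layout are assumptions): the messages that have already arrived are
drained with \texttt{MPI\_Iprobe}, and the local update is then performed with
whatever versions of the external data are currently available.
\begin{verbatim}
#include <mpi.h>

enum { SIZE = 1024, tagData = 1 };

/* Hypothetical local relaxation step and local convergence test (stubs);
   the detection of the GLOBAL termination is a separate issue, dealt
   with further below. */
static void localUpdate(double *local, const double *neighborData)
{ (void)local; (void)neighborData; /* real computation goes here */ }
static int  localConvergence(const double *local)
{ (void)local; return 1; /* real convergence detection goes here */ }

void asyncIterations(double *local, double *neighborData,
                     int nbNeighbors, const int *neighbors)
{
  MPI_Status status;
  int flag, n;

  while (!localConvergence(local)) {
    /* Drain every data message already arrived: a newer version from a
       neighbor simply overwrites the older one; nothing blocks here. */
    do {
      MPI_Iprobe(MPI_ANY_SOURCE, tagData, MPI_COMM_WORLD, &flag, &status);
      if (flag) {
        /* Map the sender rank to its neighbor index (every sender of
           tagData is assumed to be a neighbor). */
        for (n = 0; n < nbNeighbors && neighbors[n] != status.MPI_SOURCE; n++)
          ;
        MPI_Recv(&neighborData[n * SIZE], SIZE, MPI_DOUBLE,
                 status.MPI_SOURCE, tagData, MPI_COMM_WORLD, &status);
      }
    } while (flag);

    /* Local iteration with possibly outdated neighbor data; sending the
       updated local data to the neighbors is handled elsewhere (below,
       it is delegated to a dedicated thread). */
    localUpdate(local, neighborData);
  }
}
\end{verbatim}
In the implementation described below, the receptions are not polled inside
the computing loop but are delegated to a dedicated thread.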
the computations~\cite{ChVCV13,Hoefler08a}. So, the logical and classical way
to implement such an overlap is to use three threads: one for
computing, one for sending, and one for receiving. Moreover, since
-the communication is performed by threads, blocking synchronous communications\index{MPI!communication!blocking}\index{MPI!communication!synchronous}
+the communication is performed by threads, blocking synchronous communications\index{MPI!blocking}\index{MPI!synchronous}
can be used without degrading the overall performance.
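The skeleton below sketches this organization under simplifying assumptions
(a ring exchange, a fixed number of iterations, and names such as
\texttt{sendingThread} and \texttt{receivingThread} that are ours, not those
of the actual code). Since several threads issue MPI calls, the library must
be initialized with \texttt{MPI\_Init\_thread} at the
\texttt{MPI\_THREAD\_MULTIPLE} level.
\begin{verbatim}
#include <mpi.h>
#include <pthread.h>

enum { SIZE = 1024, NB_ITERS = 10, tagData = 1 };

static double localData[SIZE];   /* produced by the computing thread */
static double recvBuffer[SIZE];  /* filled by the receiving thread   */

static void *sendingThread(void *arg)
{
  int dest = *(int *)arg, iter;
  for (iter = 0; iter < NB_ITERS; iter++)
    /* Blocking synchronous send: only this thread waits on it. */
    MPI_Ssend(localData, SIZE, MPI_DOUBLE, dest, tagData, MPI_COMM_WORLD);
  return NULL;
}

static void *receivingThread(void *arg)
{
  int src = *(int *)arg, iter;
  MPI_Status status;
  for (iter = 0; iter < NB_ITERS; iter++)
    MPI_Recv(recvBuffer, SIZE, MPI_DOUBLE, src, tagData,
             MPI_COMM_WORLD, &status);
  return NULL;
}

int main(int argc, char **argv)
{
  int provided, rank, size, dest, src, iter;
  pthread_t sender, receiver;

  /* Several threads issue MPI calls, hence MPI_THREAD_MULTIPLE. */
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  if (provided < MPI_THREAD_MULTIPLE)
    MPI_Abort(MPI_COMM_WORLD, 1);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  dest = (rank + 1) % size;         /* arbitrary ring topology for the sketch */
  src  = (rank + size - 1) % size;

  pthread_create(&sender,   NULL, sendingThread,   &dest);
  pthread_create(&receiver, NULL, receivingThread, &src);

  /* The main thread plays the role of the computing thread; the proper
     synchronization on the shared buffers is omitted for brevity. */
  for (iter = 0; iter < NB_ITERS; iter++) {
    /* ... one local iteration using localData and recvBuffer ... */
  }

  pthread_join(sender, NULL);
  pthread_join(receiver, NULL);
  MPI_Finalize();
  return 0;
}
\end{verbatim}
The blocking calls only block the communication thread that issues them, so
the computing thread is never delayed by the communications.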
In this basic version, the termination\index{termination} of the global process is performed
case tagState: // Management of local state messages
// Actual reception of the message
MPI_Recv(&recvdState, 1, MPI_CHAR, status.MPI_SOURCE, tagState, MPI_COMM_WORLD, &status);
- // Updates of numbers of stabilized nodes and received state msgs
+ // Updates of numbers of stabilized nodes and recvd state msgs
nbOtherCVs += recvdState;
nbStateMsg++;
// Unlocking of the computing thread when states of all other