the computations~\cite{ChVCV13,Hoefler08a}. So, the logical and classical way
to implement such an overlap is to use three threads: one for
computing, one for sending, and one for receiving. Moreover, since
-the communication is performed by threads, blocking synchronous communications\index{MPI!communication!blocking}\index{MPI!communication!synchronous}
+the communication is performed by threads, blocking synchronous communications\index{MPI!blocking}\index{MPI!synchronous}
can be used without deteriorating the overall performance.
In this basic version, the termination\index{termination} of the global process is performed