computations (CPU and GPU) with communications (GPU transfers and internode
communications). However, we have previously shown that for some parallel
iterative algorithms, it is sometimes even more efficient to use an asynchronous
scheme of iterations\index{iterations!asynchronous} \cite{HPCS2002,ParCo05,Para10}. In that case, the nodes do
not wait for each other but they perform their iterations using the last
external data they have received from the other nodes, even if this
data was produced \emph{before} the previous iteration on the other nodes.
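As an illustration (not taken from the cited works), the tolerance of such an asynchronous scheme to outdated data can be sketched with a toy fixed-point solver in which each "node" updates its own component using a possibly stale value published by the other node; the coefficients and the delay pattern below are hypothetical:

```python
# Hypothetical sketch of an asynchronous iteration scheme: two "nodes"
# each update one component of the contracting fixed-point system
#   x = 0.5*y + 1,   y = 0.3*x + 2
# Each node only sees the other's value from `delay` iterations ago
# (stale data), yet the iteration still converges because the mapping
# is a contraction.

def async_fixed_point(delay=3, iterations=200):
    xs, ys = [0.0], [0.0]                  # histories of published values
    for k in range(iterations):
        stale_y = ys[max(0, k - delay)]    # node 1 reads a possibly old y
        stale_x = xs[max(0, k - delay)]    # node 2 reads a possibly old x
        xs.append(0.5 * stale_y + 1.0)
        ys.append(0.3 * stale_x + 2.0)
    return xs[-1], ys[-1]

# Exact fixed point of the system: x = 2/0.85, y = 0.3*x + 2
x, y = async_fixed_point()
print(x, y)
```

Because the mapping is a contraction, convergence holds for any bounded delay; this mirrors the property that makes asynchronous iterations attractive when waiting for up-to-date data from other nodes would be more expensive than iterating on stale data.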
In applications that process data streams (pipelines), pre-processing of the next data item and/or post-processing
of the previous result can be done on the CPU while the GPU is processing the current
data item. In other cases, the CPU can perform \emph{auxiliary}
computations\index{computation!auxiliary}
that are not absolutely required to obtain the result but that may accelerate
the entire iterative process. Another possibility would be to distribute the
main computations between the GPU and CPU. However, this