%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\Sec}[1]{Section~\ref{#1}}
\newcommand{\Fig}[1]{Figure~\ref{#1}}
\newcommand{\Alg}[1]{Algorithm~\ref{#1}}
\newcommand{\Lst}[1]{Listing~\ref{#1}}
\newcommand{\Tab}[1]{Table~\ref{#1}}
\newcommand{\Equ}[1]{(\ref{#1})}
\def\Reals{\mathbb{R}}
%\newenvironment{Algo}{\vspace{-1em}\begin{center}\begin{minipage}[h]{0.95\columnwidth}\begin{shaded}\begin{tabbing}%
% \hspace{3mm}\=\hspace{3mm}\=\hspace{3mm}\=\hspace{3mm}\=\hspace{3mm}\=\hspace{3mm}\=\hspace{3mm}\= \kill} %
% { \end{tabbing}\vspace{-1em}\end{shaded}\end{minipage}\end{center}\vspace{-1em}}
\lstnewenvironment{Listing}[2]{\lstset{
%% basicstyle=\scriptsize\ttfamily,%
%% breaklines=true, breakatwhitespace=true, language=C, keywordstyle=\color{black},%
%% prebreak = \raisebox{0ex}[0ex][0ex]{\ensuremath{\hookleftarrow}},%
%% commentstyle=\textit, numbersep=1em, numberstyle=\tiny, numbers=left,%
%% numberblanklines=false, mathescape, escapechar=@,
escapechar=@, label=#1, caption={#2}}
\def\affect{$\leftarrow$ }
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapterauthor{Sylvain Contassot-Vivier}{Université Lorraine, Loria UMR 7503 \& AlGorille INRIA Project Team, Nancy, France.}
\chapterauthor{Stephane Vialle}{SUPELEC, UMI GT-CNRS 2958 \& AlGorille INRIA Project Team, Metz, France.}
\chapterauthor{Jens Gustedt}{INRIA Nancy--Grand Est, AlGorille INRIA Project Team, Strasbourg, France.}
\chapter{Development methodologies for GPU and cluster of GPUs}
\input{Chapters/chapter6/Intro}
% Part 1: synchronous CUDA - MPI with overlap
\input{Chapters/chapter6/PartieSync}
% Part 2: asynchronous CUDA - MPI with overlap
\input{Chapters/chapter6/PartieAsync}
% Part 6: prospective analysis
\input{Chapters/chapter6/PartieORWL}
\input{Chapters/chapter6/Conclu}
\item[AIAC] Asynchronous Iterations and Asynchronous Communications.
\item[Asynchronous iterations] iterative process where each element is updated
without waiting for the latest updates of the other elements.
\item[Auxiliary computations] optional computations performed in parallel with the
main computations and used to complete them or speed them up.
\item[BSP parallel scheme] Bulk Synchronous Parallel, a parallel model that uses
a repeated pattern (superstep) composed of computation, communication, and a barrier.
\item[GPU stream] serialized data transfers and computations performed on the same
\item[Message loss/miss] refers to a message that is either not
sent, or sent but not received (possible with unreliable communication protocols).
\item[Message stamping] inclusion of a specific value in messages with the same tag to
distinguish them (a kind of secondary tag).
\item[ORWL] Ordered Read-Write Locks, a programming tool proposing a unified
\item[Page-locked data] data that are locked in physical host memory (never paged out) to ensure fast accesses.
\item[Residual] difference between results of consecutive iterations in an
\item[Streamed GPU sequence] GPU transfers and computations performed
simultaneously via distinct GPU streams.
\putbib[Chapters/chapter6/biblio6]
%%% ispell-dictionary: "american"
%%% TeX-master: "../../BookGPU.tex"