diff --git a/BookGPU/Chapters/chapter5/ch5.tex b/BookGPU/Chapters/chapter5/ch5.tex
index 0a043525937c624788773aabc6c8f6efccbc35e4..3a4942dc50fc37e0553dbda4bd9cacf5f79b07a0 100644
--- a/BookGPU/Chapters/chapter5/ch5.tex
+++ b/BookGPU/Chapters/chapter5/ch5.tex
@@ -450,6 +450,8 @@ If grid ghost layers are updated whenever information from adjacent subdomains i
  
  Distributed performance for the finite difference stencil operation is illustrated in Figure \ref{ch5:fig:multigpu}. The timings include the compute time for the finite difference approximation and the time for updating ghost layers via message passing. It is obvious from Figure \ref{ch5:fig:multigpu:a} that communication overhead dominates for the smallest problem sizes, where the non distributed grid (1 GPU) is fastest. However, communication overhead does not grow as rapidly as computation times, due to the surface-to-volume ratio. Therefore message passing becomes less influential for large problems, where reasonable performance speedups are obtained. Figure \ref{ch5:fig:multigpu:b} demonstrates how the computational performance on multi-GPU systems can be significantly improved for various stencil sizes. With this simple domain decomposition technique, developers are able to implement applications based on heterogeneous distributed computing, without explicitly dealing with message passing and it is still possible to provide user specific implementations of the topology class for customized grid updates.
  
+\clearpage
+
  % TODO: Should we put in the DD algebra?
  
  \begin{figure}[!htb]
@@ -633,7 +635,7 @@ from the Danish Research Council for Technology and Production Sciences. A speci
  %\cite{ch5:Vandevoorde2002}
  %\cite{ch5:Bell2011}
  %\cite{ch5:mooreslaw1965}
-
+\clearpage
  \putbib[Chapters/chapter5/biblio5]
  
  % Reset lst label and caption