new

[book_gpu.git] / BookGPU / Chapters / chapter5 / ch5.tex
diff --git a/BookGPU/Chapters/chapter5/ch5.tex b/BookGPU/Chapters/chapter5/ch5.tex

index f78057785a08cb6a0a12bc5123e03ecf6675fce7..49cc9c2c30e6c11bceb6e580b9919995e11efea2 100644 (file)
--- a/BookGPU/Chapters/chapter5/ch5.tex
+++ b/BookGPU/Chapters/chapter5/ch5.tex
@@ -144,7 +144,7 @@ c_{20} & c_{21} & c_{22} \\
  Matrix components precompute these compact stencil coefficients and provides member functions that computes the finite difference approximation of input vectors. Unit scaled coefficients (assuming grid spacing is one) are computed and stored to be accessible via both CPU and GPU memory. On the GPU, the constant memory space is used for faster memory access~\cite{ch5:cudaguide}. In order to apply a stencil on a non unit-spaced grid, with grid space $\Delta x$, the scale factor $1/(\Delta x)^q$ will have to be multiplied by the finite difference sum, i.e., $(c_{00}u_0 + c_{01}u_1 + c_{02}u_2)/(\Delta x)^q \approx u^{(q)}_0$ as in the first row of \eqref{ch5:eq:stencilmatrix}.
  
  Setting up a two-dimensional grid of size $N_x \times N_y$ in the unit square and computing the first derivative hereof is illustrated in Listing~\ref{ch5:lst:stencil}. The grid is a vector component, derived from the vector class. It is by default treated as a device object and memory is automatically allocated on the device to fit the grid size. The finite difference approximation as in \eqref{ch5:eq:fdstencil}, is performed via a CUDA kernel behind the scenes during the calls to \texttt{mult} and \texttt{diff\_x}, utilizing the memory hierarchy as the CUDA guidelines prescribe~\cite{ch5:cudaguide,ch5:cudapractice}. To increase developer productivity, kernel launch configurations have default settings, based on CUDA guidelines, principles, and experiences from performance testings, such that the user does not have to explicitly specify them. For problem-specific finite difference approximations, where the built-in stencil operators are insufficient, a pointer to the coefficient matrix \eqref{ch5:eq:stencilcoeffs} can be accessed as demonstrated in Listing \ref{ch5:lst:stencil} and passed to customized kernels.
  Matrix components precompute these compact stencil coefficients and provides member functions that computes the finite difference approximation of input vectors. Unit scaled coefficients (assuming grid spacing is one) are computed and stored to be accessible via both CPU and GPU memory. On the GPU, the constant memory space is used for faster memory access~\cite{ch5:cudaguide}. In order to apply a stencil on a non unit-spaced grid, with grid space $\Delta x$, the scale factor $1/(\Delta x)^q$ will have to be multiplied by the finite difference sum, i.e., $(c_{00}u_0 + c_{01}u_1 + c_{02}u_2)/(\Delta x)^q \approx u^{(q)}_0$ as in the first row of \eqref{ch5:eq:stencilmatrix}.
  
  Setting up a two-dimensional grid of size $N_x \times N_y$ in the unit square and computing the first derivative hereof is illustrated in Listing~\ref{ch5:lst:stencil}. The grid is a vector component, derived from the vector class. It is by default treated as a device object and memory is automatically allocated on the device to fit the grid size. The finite difference approximation as in \eqref{ch5:eq:fdstencil}, is performed via a CUDA kernel behind the scenes during the calls to \texttt{mult} and \texttt{diff\_x}, utilizing the memory hierarchy as the CUDA guidelines prescribe~\cite{ch5:cudaguide,ch5:cudapractice}. To increase developer productivity, kernel launch configurations have default settings, based on CUDA guidelines, principles, and experiences from performance testings, such that the user does not have to explicitly specify them. For problem-specific finite difference approximations, where the built-in stencil operators are insufficient, a pointer to the coefficient matrix \eqref{ch5:eq:stencilcoeffs} can be accessed as demonstrated in Listing \ref{ch5:lst:stencil} and passed to customized kernels.
-
+\pagebreak
  \lstinputlisting[label=ch5:lst:stencil,caption={two-dimensional finite difference stencil example: computing the first derivative using five points ($\alpha=\beta=2$) per dimension, a total nine-point stencil}]{Chapters/chapter5/code/ex2.cu}
  
  In the following sections we demonstrate how to go from an initial value problem (IVP) or a boundary value problem (BVP) to a working application solver by combining existing library components along with new custom-tailored components. We also demonstrate how to apply spatial and temporal domain decomposition strategies that can make existing solvers take advantage of systems equipped with multiple GPUs. Next section demonstrates how to rapidly assemble a PDE solver using library components.
  \lstinputlisting[label=ch5:lst:stencil,caption={two-dimensional finite difference stencil example: computing the first derivative using five points ($\alpha=\beta=2$) per dimension, a total nine-point stencil}]{Chapters/chapter5/code/ex2.cu}
  
  In the following sections we demonstrate how to go from an initial value problem (IVP) or a boundary value problem (BVP) to a working application solver by combining existing library components along with new custom-tailored components. We also demonstrate how to apply spatial and temporal domain decomposition strategies that can make existing solvers take advantage of systems equipped with multiple GPUs. Next section demonstrates how to rapidly assemble a PDE solver using library components.
@@ -188,7 +188,7 @@ We use a Method of Lines (MoL)\index{method of lines} approach to solve \eqref{c
  \begin{align}\label{ch5:eq:discreteheateq}
  \frac{\partial u}{\partial t} = \mymat{A}\myvec{u}, \qquad \mymat{A} \in \mathbb{R}^{N\times N}, \quad \myvec{u} \in \mathbb{R}^{N},
  \end{align}
  \begin{align}\label{ch5:eq:discreteheateq}
  \frac{\partial u}{\partial t} = \mymat{A}\myvec{u}, \qquad \mymat{A} \in \mathbb{R}^{N\times N}, \quad \myvec{u} \in \mathbb{R}^{N},
  \end{align}
-where $\mymat{A}$ is the sparse finite difference matrix and $N$ is the number of unknowns in the discrete system. The temporal derivative is now free to be approximated by any suitable choice of a time-integration method\index{time integration}. The most simple integration scheme would be the first-order accurate explicit forward Euler method\index{forward Euler},
+where $\mymat{A}$ is the sparse finite difference matrix and $N$ is the number of unknowns in the discrete system. The temporal derivative is now free to be approximated by any suitable choice of a time-integration method\index{time integration}. The most simple integration scheme would be the first-order accurate explicit forward Euler method\index{Euler!forward Euler},
  \begin{align}\label{ch5:eq:forwardeuler}
  \myvec{u}^{n+1} = \myvec{u}^n + \delta t\,\mymat{A}\myvec{u}^n,
  \end{align}
  \begin{align}\label{ch5:eq:forwardeuler}
  \myvec{u}^{n+1} = \myvec{u}^n + \delta t\,\mymat{A}\myvec{u}^n,
  \end{align}