X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/book_gpu.git/blobdiff_plain/32eca9a71cb97b720b022d9fa6f8e753368a2243..30d1d29de747eecaa732000749e6aaf46ed2a5a3:/BookGPU/Chapters/chapter7/ch7.tex diff --git a/BookGPU/Chapters/chapter7/ch7.tex b/BookGPU/Chapters/chapter7/ch7.tex index 6ca2080..a44cbb5 100644 --- a/BookGPU/Chapters/chapter7/ch7.tex +++ b/BookGPU/Chapters/chapter7/ch7.tex @@ -1,8 +1,8 @@ -\chapterauthor{Allan P. Engsig-Karup}{Technical University of Denmark} -\chapterauthor{Stefan L. Glimberg}{Technical University of Denmark} -\chapterauthor{Allan S. Nielsen}{Technical University of Denmark} -\chapterauthor{Ole Lindberg}{Technical University of Denmark} +\chapterauthor{Allan P. Engsig-Karup, Stefan L. Glimberg, Allan S. Nielsen and Ole Lindberg}{Technical University of Denmark} +%\chapterauthor{Stefan L. Glimberg}{Technical University of Denmark} +%\chapterauthor{Allan S. Nielsen}{Technical University of Denmark} +%\chapterauthor{Ole Lindberg}{Technical University of Denmark} \chapter{Fast hydrodynamics on heterogenous many-core hardware} \label{ch7} @@ -470,10 +470,10 @@ The CUDA-based numerical wave model has been developed based on all the numerica For the unified potential flow model the user will need to provide implementations of the following components; the right hand side operator for the semi-discrete free surface variables \eqref{ch7:FSorigin}, the matrix-vector operator for the discretized $\sigma$-transformed Laplace equation \eqref{ch7:TransformedLaplace}, a smoother for the multigrid relaxation step, and the potential flow solver itself, that reads initial data and advance the solution in time. In order to make the library as generic as possible, all components are template-based, which makes it possible to assemble the PDE solver by combining type definitions in the preamble of the application. An excerpt of the potential flow assembling is given in listing \ref{ch7:lst:solversetup}. -\lstset{label=ch7:lst:solversetup,caption={Generic assembling of the potential flow solver for fully nonlinear free surface water waves.} +\lstset{caption={Generic assembling of the potential flow solver for fully nonlinear free surface water waves.} %,basicstyle=\scriptsize } -\begin{lstlisting} +\begin{lstlisting}[label=ch7:lst:solversetup] // Basics typedef double value_type; typedef gpulab::grid vector_type; @@ -505,10 +505,10 @@ typedef free_surface::potential_flow_solver_3d potential_f Hereafter, the potential flow solver is aware of all component types that should be used to solve the entire PDE system, and it will be easy for developers to exchange parts at later times. The \texttt{laplace\_sigma\_stencil\_3d} class implements both the matrix-vector and right hand side operator. The flexible-order finite difference kernel for the matrix-free matrix-vector product for the two-dimensional Laplace problem is presented in listing \ref{ch7:lst:fd2d}. Library macros and reusable kernel routines are used throughout the implementations to enhance developer productivity and hide hardware specific details. This kernel can be used both for matrix-vector products for the original system and for the preconditioning. -\lstset{label=ch7:lst:fd2d,caption={CUDA kernel implementation for the two dimensional finite difference approximation to the transformed Laplace equation.} +\lstset{caption={CUDA kernel implementation for the two dimensional finite difference approximation to the transformed Laplace equation.} %,basicstyle=\scriptsize\ttfamily } -\begin{lstlisting} +\begin{lstlisting}[label=ch7:lst:fd2d] template __global__ void laplace_sigma_transformed( value_type* out , value_type const* p @@ -596,9 +596,9 @@ __global__ void laplace_sigma_transformed( In a similar template-based approach, the kernel for the right hand side operator of the two dimensional problem is implemented and listed in Listing \ref{ch7:lst:rhs2d}. The kernel computes the right hand side updates for both surface variables, $\eta$ and $\tilde{\phi}$, and applies an embedded penalty forcing \eqref{ch7:eq:penalty}, for all nodes within generation or absorption zones. The penalty forcing functions are computed based on linear or non-linear wave theory in a separate device function. -\lstset{label=ch7:lst:rhs2d,caption={CUDA kernel implementation for the 2D right hand side.}%,basicstyle=\scriptsize\ttfamily +\lstset{caption={CUDA kernel implementation for the 2D right hand side.}%,basicstyle=\scriptsize\ttfamily } -\begin{lstlisting} +\begin{lstlisting}[label=ch7:lst:rhs2d] template __global__ void rhs(value_type const* p , value_type const* p_surf , value_type const* eta , value_type* dp_surf_dt @@ -820,14 +820,14 @@ Ideally, the ratio $\mathcal{C}_\mathcal{G}/\mathcal{C}_\mathcal{F}$ is small an \begin{figure}[!htb] \begin{center} \setlength\figureheight{0.35\textwidth} - \setlength\figurewidth{0.37\textwidth} + \setlength\figurewidth{0.35\textwidth} \subfigure[Performance scaling]{ % {\small\input{Chapters/chapter7/figures/PararealScaletestGTX590.tikz}} - \includegraphics[width=0.5\textwidth]{Chapters/chapter7/figures/PararealScaletestGTX590_conv.pdf} + \includegraphics[width=0.47\textwidth]{Chapters/chapter7/figures/PararealScaletestGTX590_conv.pdf} } \subfigure[Speedup]{ % {\small\input{Chapters/chapter7/figures/PararealSpeedupGTX590.tikz}} - \includegraphics[width=0.5\textwidth]{Chapters/chapter7/figures/PararealSpeedupGTX590_conv.pdf} + \includegraphics[width=0.47\textwidth]{Chapters/chapter7/figures/PararealSpeedupGTX590_conv.pdf} } \end{center} \caption[Parareal absolute timings and parareal speedup.]{(a) Parareal absolute timings for an increasingly number of water waves traveling one wave length, each wave resolution is ($33\times 9$). (b) Parareal speedup for two to sixteen compute nodes compared to the purely sequential single GPU solver. Notice how insensitive the parareal scheme is to the size of the problem solved. Test environment 2.}\label{ch7:fig:DDPA_SPEEDUP}