From: couturie
Date: Sat, 21 Sep 2013 06:37:08 +0000 (+0200)
Subject: new
X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/book_gpu.git/commitdiff_plain/c76d2115cc2c37904fb4173b7b1e0288a6646c9c?ds=inline;hp=8fc4c8914177b38f8042870c31065ea619b900ec

new
---

diff --git a/BookGPU/Chapters/chapter10/ch10.tex b/BookGPU/Chapters/chapter10/ch10.tex
index 37a8ba1..dffecb5 100644
--- a/BookGPU/Chapters/chapter10/ch10.tex
+++ b/BookGPU/Chapters/chapter10/ch10.tex
@@ -297,7 +297,7 @@ and $\bs\beta = \mbf{N^T} \mbf{(B^{-1})^T} \mbf{d}$
 \subsubsection*{Choice of the leaving variable}
 The stability and robustness of the algorithm depend considerably on the choice of the leaving variable. With respect to this, the \textit{expand} method~\cite{GILL1989} proves to be very useful in the sense that it helps to avoid cycles and reduces the risk of encountering numerical instabilities. This method consists of two steps of complexity $\mathcal{O}(m)$. In the first step, a small perturbation is applied to the bounds of the variables to prevent stalling of the objective value, thus avoiding cycles. These perturbed bounds are then used to determine the greatest gain on the entering variable imposed by the most constraining basic variable. The second step uses the original bounds to define the basic variable offering the gain closest to that of the first step. This variable is then selected to leave the basis.
-\pagebreak
+\clearpage
 \section{Branch-and-bound\index{branch-and-bound} algorithm}
 \label{chXXX:sec:bnb}
 \subsection{Integer linear programming\index{integer linear programming}}
@@ -428,7 +428,7 @@ Various types or families of cutting-planes exist. Those that are most applied a
 The cutting-planes generated must be carefully selected in order to avoid a huge increase in the problem size.
 They are selected according to three criteria: their efficiency, their orthogonality with respect to other cutting-planes, and their parallelism with respect to the objective function. Cutting-planes having the most impact on the problem are then selected, while the others are dropped.
-
+\clearpage
 \section{CUDA considerations}
 \label{chXXX:sec:cuda}
 The most expensive operations in the simplex algorithm are linear algebra functions.
@@ -549,7 +549,7 @@ The bulk of the data representing the problem is stored on the GPU.
 Only variables required for decision-making operations are updated on the CPU.
 The communications arising from the aforementioned scheme are illustrated in Figure~\ref{chXXX:fig:diagSTD}.
 The amount of data exchanged at each iteration is independent of the problem size, ensuring that this implementation scales well as the problem size increases.
-
+\clearpage
 \begin{figure}[!h]
 \centering
 \includegraphics[width=90mm]{Chapters/chapter10/figures/DiagSTD_cap.pdf}
@@ -937,7 +937,7 @@ nesm.mps & 663 & 2923 & 13988 & 4.4\\ \hline
 maros.mps & 847 & 1443 & 10006 & 1.7\\ \hline
 perold.mps & 626 & 1376 & 6026 & 1.0\\ \hline
 \end{tabular}
-\caption{NETLIB problems solved in the range of 1 to 4 seconds}
+\caption{NETLIB problems solved in the range of 1 to 4 seconds.}
 \label{chXXX:tab:medium}
 \end{center}
 \end{table}
@@ -950,7 +950,7 @@ perold.mps & 626 & 1376 & 6026 & 1.0\\ \hline
 \end{figure}
 Finally, the largest problems, which are also the slowest to solve, are given in Table~\ref{chXXX:tab:big}. A new tendency can be observed in Figure~\ref{chXXX:fig:SlowSolve}. The \textit{Revised-sparse} method is the fastest on most of the problems. Performance remains close between the two best methods on problems having a C-to-R ratio close to 2, such as bnl2.mps, pilot.mps, or greenbeb.mps. However, when this ratio is greater, the \textit{Revised-sparse} method can be nearly twice as fast as the standard method. This is noticeable on 80bau3b.mps (4.5) and fit2p.mps (4.3).
 Although the C-to-R ratio of d6cube.mps (14.9) exceeds the ones previously mentioned, the \textit{Revised-sparse} method does not show impressive performance, probably due to the small number of rows and the density of this problem, which prevent it from fully benefiting from the lower complexity of sparse operations.
-
+\clearpage
 \begin{table}[!h]
 \begin{center}
 \begin{tabular}{|l|r|r|r|r|}
diff --git a/BookGPU/Chapters/chapter7/biblio7.bib b/BookGPU/Chapters/chapter7/biblio7.bib
index 53c555c..3ca2131 100644
--- a/BookGPU/Chapters/chapter7/biblio7.bib
+++ b/BookGPU/Chapters/chapter7/biblio7.bib
@@ -47,7 +47,7 @@ PAGES = "351--370"
 @ARTICLE{ch7:EHBW08,
 AUTHOR = "Engsig-Karup, A. P. and Hesthaven, J. S. and Bingham, H. B. and Warburton, T.",
-TITLE = "{DG-FEM} solution for nonlinear wave-structure interaction using Boussinesq-type equations",
+TITLE = "{DG-FEM} solution for nonlinear wave-structure interaction using {B}oussinesq-type equations",
 JOURNAL = CE,
 YEAR = "2008",
 VOLUME = "55",
@@ -74,7 +74,7 @@ PAGES = "2100-2118"
 @ARTICLE{ch7:EE13,
 AUTHOR = "Eskilsson, C. and Engsig-Karup, A. P.",
-TITLE = "On devising Boussinesq-type models with bounded eigenspectra: One horizontal dimension",
+TITLE = "On devising {B}oussinesq-type models with bounded eigenspectra: One horizontal dimension",
 YEAR = "2013",
 JOURNAL = "Submitted to Journal of Computational Physics",
 }
@@ -158,7 +158,7 @@ PAGES = "211--228"
 @ARTICLE{ch7:MBS03,
 AUTHOR = "Madsen, P. A. and Bingham, H. B. and Sch{\"a}ffer, H. A.",
-TITLE = "Boussinesq-type formulations for fully nonlinear and extremely dispersive water waves: derivation and analysis",
+TITLE = "{B}oussinesq-type formulations for fully nonlinear and extremely dispersive water waves: derivation and analysis",
 JOURNAL = RSL,
 YEAR = "2003",
 VOLUME = "459",
@@ -167,7 +167,7 @@ PAGES = "1075--1104"
 @ARTICLE{ch7:MBL02,
 AUTHOR = "Madsen, P. A. and Bingham, H. B. and Liu, H.",
-TITLE = "A new Boussinesq method for fully nonlinear waves from shallow to deep water",
+TITLE = "A new {B}oussinesq method for fully nonlinear waves from shallow to deep water",
 JOURNAL = JFM,
 YEAR = "2002",
 VOLUME = "462",
@@ -176,7 +176,7 @@ PAGES = "1--30"
 @ARTICLE{ch7:AMS99,
 AUTHOR = "Agnon, Y. and Madsen, P. A. and Sch{\"a}ffer",
-TITLE = "{A new approach to high-order Boussinesq models.}",
+TITLE = "{A new approach to high-order {B}oussinesq models.}",
 JOURNAL = JFM,
 YEAR = "{1999}",
 VOLUME = "{399}",
@@ -235,7 +235,7 @@ PAGES = "{319--333}"
 @article {ch7:ChazelEtAl2010,
 AUTHOR = {Chazel, F. and Benoit, M. and Ern, A. and Piperno, S.},
- TITLE = {A double-layer Boussinesq-type model for highly nonlinear and dispersive waves},
+ TITLE = {A double-layer {B}oussinesq-type model for highly nonlinear and dispersive waves},
 JOURNAL = {Proc. Roy. Soc. London Ser. A},
 FJOURNAL = {Proceedings of the Royal Society. London. Series A. Mathematical, Physical and Engineering Sciences},
@@ -390,7 +390,7 @@ keywords = "Exaflop"
 @PHDTHESIS{ch7:ENG06,
 AUTHOR = "Engsig-Karup, A. P.",
-TITLE = "Unstructured Nodal {DG-FEM} solution of high-order Boussinesq-type equations",
+TITLE = "Unstructured Nodal {DG-FEM} solution of high-order {B}oussinesq-type equations",
 SCHOOL = "Department of Mechanical Engineering, Technical University of Denmark",
 YEAR = "2006",
 howpublished = "http://orbit.dtu.dk/en/publications/id(52502078-1608-4799-8e55-41f60bb92db6).html"
@@ -427,7 +427,7 @@ year = "1978"
 @ARTICLE{ch7:MS98,
 AUTHOR = "Madsen, P. A. and Sch{\"{a}}ffer, H. A.",
-TITLE = "Higher order Boussinesq-type equations for surface gravity waves--derivation and analysis.",
+TITLE = "Higher order {B}oussinesq-type equations for surface gravity waves--derivation and analysis.",
 JOURNAL = "Advances in Coastal and Ocean Engineering",
 VOLUME = "356",
 YEAR = "1998",
@@ -540,7 +540,7 @@ YEAR = "2001"
 }
 @article{ch7:Bingham2009467,
-title = "Velocity potential formulations of highly accurate Boussinesq-type models",
+title = "Velocity potential formulations of highly accurate {B}oussinesq-type models",
 journal = "Coastal Engineering",
 volume = "56",
 number = "4",
diff --git a/BookGPU/Chapters/chapter7/ch7.tex b/BookGPU/Chapters/chapter7/ch7.tex
index 53cbae2..6fd9c62 100644
--- a/BookGPU/Chapters/chapter7/ch7.tex
+++ b/BookGPU/Chapters/chapter7/ch7.tex
@@ -37,7 +37,7 @@ A key problem is that improvements in performance require porting legacy codes\f
 %However, increasing amounts of applications are utilizing accelerators to parts of their code to gain speedups albeit with less dramatic improvements of performance as one can potentially find by adapting most, if not all of the application to modern hardware.
 In this work, we explore some of these trends by developing, by bottom-up design, a water-wave model which can be utilized in maritime engineering and with the intended use on affordable office desktops as well as on more expensive modern compute clusters for engineering analysis purposes.
-
+\clearpage
 \section{On modeling paradigms for highly nonlinear and dispersive water waves}
 \label{ch7:sec:modernwavemodellingparadigms}
@@ -200,7 +200,7 @@ From a numerical point of view, an efficient and scalable discretization strateg
 We present scalability and performance tests based on the same two test environments outlined in Chapter \ref{ch5}, Section \ref{ch5:sec:testenvironments}, plus a fourth test environment based on the most recent hardware generation:
 \begin{description}
-\item[Test environment 4.] Desktop with dual-socket Sandy Bridge Intel Xeon E5-2670 (2.60 GHz) processors, 64GB RAM, 2x Nvidia Tesla K20 GPUs.
+\item[Test environment 4.] Desktop with dual-socket Sandy Bridge Intel Xeon E5-2670 (2.60 GHz) processors, 64GB RAM, 2x NVIDIA Tesla K20 GPUs.
 \end{description}
 Performance results can be used to predict actual runtimes as described in \cite{ch7:EngsigKarupEtAl2011}, e.g., to estimate whether a real-time constraint for a given problem size can be met.
@@ -761,7 +761,7 @@ Elaborating on this example, we examine the propagation of a regular stream func
 \label{ch7:filter}
 \mathcal{F}u(x_i) = \sum_{n=-\alpha}^{\alpha} c_n u(x_{i+n}),
 \end{align}
-where $c_n\in\mathbb{R}$ are the stencil coefficients and $\alpha\in\mathbb{Z}_+$ is the stencil half-width. An active filter can for example be based on employing a Savitzky-Golay smoothening filter \cite{ch7:PT90}, e.g., the mild 7-point SG(6,10) filter, and applying it after every 10th time step to each of the collocation nodes defining the free surface variables. The same procedure can be used for stabilization of nonlinear simulations to remove high-frequency "saw-tooth" instabilities as shown in \cite{ch7:EBL08}. This filtering technique can also remove high-frequency noise resulting from round-off errors in computations that would otherwise potentially pollute the computational results and in the worst case leave them useless. The effect of this type of filtering on the numerical efficiency of the model is insignificant.
+where $c_n\in\mathbb{R}$ are the stencil coefficients and $\alpha\in\mathbb{Z}_+$ is the stencil half-width. An active filter can for example be based on employing a Savitzky-Golay smoothening filter \cite{ch7:PT90}, e.g., the mild 7-point SG(6,10) filter, and applying it after every 10th time step to each of the collocation nodes defining the free surface variables. The same procedure can be used for stabilization of nonlinear simulations to remove high-frequency ``saw-tooth'' instabilities as shown in \cite{ch7:EBL08}. This filtering technique can also remove high-frequency noise resulting from round-off errors in computations that would otherwise potentially pollute the computational results and in the worst case leave them useless. The effect of this type of filtering on the numerical efficiency of the model is insignificant.
 Results from numerical experiments are presented in Figure \ref{ch7:filtering}, and most of the errors can be attributed to phase errors resulting from differences between the exact and numerical phase speeds. In numerical experiments, we find that while results computed in double-precision are not significantly affected by accumulation of round-off errors, the single-precision results are. In Figures \ref{ch7:filtering} (a) and (b), a direct solver based on sparse Gaussian elimination within MATLAB\footnote{\url{http://www.mathworks.com}.} is used to solve the linear system at every stage and a comparison is made between single- and unfiltered double-precision calculations. It is shown in Figure \ref{ch7:filtering} (a) that without a filter, the single-precision calculations result in ``blow-up'', after which the solver fails just before 50 wave periods of calculation time. However, in Figure \ref{ch7:filtering} (b) it is demonstrated that invoking a smoothening filter, cf. \eqref{ch7:filter}, stabilizes the accumulation of round-off errors and the calculations continue, albeit with reduced accuracy compared to the double-precision results. Thus, it is confirmed that such a filter can be used to control and suppress high-frequency oscillations that result from accumulation of round-off errors. In contrast, replacing the direct solver with an iterative PDC method using the GPU-accelerated wave model appears to be much more attractive upon inspection of Figures \ref{ch7:filtering} (c) and (d).
 The single-precision results are found to be stable with and {\em without} the filter-based strategy for this problem. The calculations show that single-precision math leads to slightly faster error accumulation for this choice of resolution; however, the differences in error level remain small during long time integration. This highlights that the fault tolerance of the iterative PDC method contributes to the robustness of the calculations.
@@ -846,8 +846,8 @@ We have performed a scalability study for parareal using 2D nonlinear stream fun
 Performance results for the Whalin test case are also shown in Figure \ref{ch7:fig:whalinparareal}. There is a natural limitation to how much we can increase $R$ (the ratio between the complexity of the fine and coarse propagators) because of stability issues with the coarse propagator. In this test case we simulate over $t\in[0,1]$~s, using up to $32$ GPUs. For low $R$ and only two GPUs, there is no speedup gain, but for the configuration with eight or more GPUs and $R\geq6$, we are able to get more than a $2\times$ speedup. Though these hyperbolic systems are not optimal for performance tuning using the parareal method, the results still confirm that reasonable speedups are in fact possible on heterogeneous systems.
 \begin{figure}[!htb]
- \setlength\figureheight{0.3\textwidth}
- \setlength\figurewidth{0.32\textwidth}
+ \setlength\figureheight{0.29\textwidth}
+ \setlength\figurewidth{0.29\textwidth}
 % \begin{center}
 \subfigure[Speedup]{
 {\small\input{Chapters/chapter7/figures/WhalinPararealSpeedup.tikz}}