% BookGPU/Chapters/chapter16/exp.tex

\section{Numerical examples}
\label{sec:exp}

The presented algorithm has been prototyped, and numerical experiments
have been carried out on a server with an Intel Xeon quad-core
CPU running at 2.0~GHz and 24~GBytes of memory. The GPU card
mounted on this server is NVIDIA's Tesla~C2070 (Fermi), which contains
448 cores (14 MPs $\times$ 32 cores per MP) running at 1.30~GHz and
has 4~GBytes of on-board memory. Some initial results have been published
in~\cite{LiuTan1:DATE'12}.

The envelope-following method with the proposed Gear-2
sensitivity matrix computation is added to an open-source
SPICE\index{SPICE} simulator implemented in C~\cite{ngspice}.
Our envelope-following program follows
the algorithm described in~\cite{Kato:COMPEL'06}.
To compare computation times, the Newton update equation is solved
with three different methods: direct LU, GMRES with an explicitly formed matrix,
and GMRES with implicit matrix-vector multiplication (matrix-free).
Moreover, the matrix-free method is also implemented in the same SPICE
simulator using the CUDA C programming interface, as described in
Section~\ref{sec:gpu}.
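
To make the difference between the explicit and matrix-free variants
concrete, recall a standard property of GMRES. The notation below is
generic and is an assumption made only for illustration, not the
notation used elsewhere in this chapter: write the Newton update
equation as $A\,\Delta x = b$ and let $r_0 = b - A\,\Delta x_0$ be the
initial residual. The $m$-th GMRES iterate is sought in a shifted
Krylov subspace,
\[
  \Delta x_m \in \Delta x_0 + \mathcal{K}_m(A, r_0),
  \qquad
  \mathcal{K}_m(A, r_0) =
  \mathrm{span}\{\, r_0,\; A r_0,\; \ldots,\; A^{m-1} r_0 \,\},
\]
so the solver accesses $A$ only through products of the form $A v$.
A routine that returns $A v$ for a given vector $v$ is therefore
sufficient, and the entries of $A$ never need to be formed or stored.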

\begin{figure}
  \centering
  \resizebox{.8\textwidth}{!}{\input{./Chapters/chapter16/figures/resonant_flyback.pdf_t}}
  \caption{Diagram of a zero-voltage quasi-resonant flyback converter.}
  \label{fig:flyback}
\end{figure}


\begin{figure}%[hbfp]
  \centering
  \resizebox{.6\textwidth}{!}{\input{./Chapters/chapter16/figures/pgMesh.pdf_t}}
  \caption{Illustration of power/ground network model.}
  \label{fig:pg}
\end{figure}

\begin{figure}
\centering
\subfigure[The whole plot]{
  \includegraphics[width=.6\textwidth]{./Chapters/chapter16/figures/flyback_wave_emb.eps}
  \label{fig:flybackWhole}
}
\subfigure[Detail of one EF simulation period]{
  \includegraphics[width=.6\textwidth]{./Chapters/chapter16/figures/flyback_zoomin_emb.eps}
  \label{fig:flybackZoom}
}
\caption{Flyback converter solution calculated by envelope-following.
The red curve is the traditional SPICE simulation result, and
the black curve is the envelope-following output with simulation points
marked.}
\label{fig:flyback_wave}
\end{figure}


\begin{figure}%[hbfp]
  \centering
  \includegraphics[width=.6\textwidth]{./Chapters/chapter16/figures/buck_wave_emb.eps}
  \caption{Buck converter solution calculated by envelope-following.}
  \label{fig:buck_wave}
\end{figure}

We use several integrated on-chip converters as simulation examples
to measure running time and speedup. They include a Buck converter,
a quasi-resonant flyback converter (shown in Fig.~\ref{fig:flyback}),
and two boost converters.
Each converter is directly integrated with an on-chip power grid network,
since converter performance should be studied together with its load.
This setup also lets us observe the waveforms at different nodes in the power
grid (see Fig.~\ref{fig:pg} for a simplified power grid structure).

Fig.~\ref{fig:flyback_wave}
and Fig.~\ref{fig:buck_wave}
show the waveforms at the output nodes of the resonant flyback converter
and the Buck converter, respectively.
Note that on the envelope curve, the darker
dots in separated segments mark the cycles in which actual simulation
points were calculated, and the segments without dots are the
envelope jumps where no simulation was performed.
As the figures show, the proposed Gear-2 envelope-following method
produces an envelope that matches the original waveform well.

\begin{table}
\centering
\caption{CPU and GPU time comparisons (in seconds) for solving the Newton update equation
  with the proposed Gear-2 sensitivity. The last column is the speedup of
  GPU over CPU implicit GMRES.
}
\vspace{0.1in}
\label{table:circuit}
{%\normalsize
\begin{tabular}{@{}c|c|c|c|c|c|c@{}}
\hline\hline
Circuit & Nodes & Direct & Explicit & \multicolumn{3}{c}{Implicit GMRES}\\ \cline{5-7}
         &       & LU   & GMRES     & CPU  & GPU & Speedup \\
\hline
Buck       &  910   & 423.8  & 420.3 &  36.8  &  3.9 & 9.4$\times$  \\
Flyback    &  941   & 462.4  & 459.6 &  64.5  &  7.4 & 8.7$\times$  \\
Boost-1    &  976   & 695.1  & 687.7 &  73.2  &  6.2 & 11.8$\times$ \\
Boost-2    & 1093   & 729.5  & 720.8 &  71.0  &  8.5 & 9.9$\times$ \\
\hline\hline
\end{tabular}
}
\end{table}

To compare the running time spent in solving the
Newton update equation, Table~\ref{table:circuit} lists the time
taken by the direct method, explicit GMRES, matrix-free GMRES on the CPU,
and matrix-free GMRES on the GPU. All methods carry out the Gear-2 based
envelope-following method, but they differ in how the sensitivity
is handled and how the equation is solved.
Whenever the sensitivity matrix is explicitly formed,
as in the direct method and explicit GMRES,
the cost is much higher than that of the implicit methods.
When the matrix-free technique is applied to generate matrix-vector
products implicitly, the computation cost is greatly reduced:
for the same example, implicit GMRES is roughly one order
of magnitude faster than explicit GMRES. Furthermore, our GPU parallel
implementation of implicit GMRES makes this method even faster,
with a further speedup of roughly 10$\times$.
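
As a worked check of these ratios, consider the Buck row of
Table~\ref{table:circuit}. The symbols $t_{\mathrm{exp}}$,
$t_{\mathrm{cpu}}$, and $t_{\mathrm{gpu}}$ are introduced here only for
illustration and denote the explicit GMRES, CPU implicit GMRES, and GPU
implicit GMRES times, respectively:
\[
  \frac{t_{\mathrm{exp}}}{t_{\mathrm{cpu}}} = \frac{420.3}{36.8} \approx 11.4,
  \qquad
  \frac{t_{\mathrm{cpu}}}{t_{\mathrm{gpu}}} = \frac{36.8}{3.9} \approx 9.4,
  \qquad
  \frac{t_{\mathrm{exp}}}{t_{\mathrm{gpu}}} = \frac{420.3}{3.9} \approx 108.
\]
The first ratio is the gain from avoiding the explicit sensitivity
matrix, the second is the additional gain from the GPU, and their
product is the overall reduction of about two orders of magnitude for
this circuit.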