BookGPU/Chapters/chapter16/bdf.tex

   1 \subsection{Gear-2 based sensitivity calculation}
   2 \label{sec:gear}
   3
   4 The Gear-2 integration method is a backward differentiation
   5 formula (BDF), which approximates the derivative of a function
   6 using information from past few steps.  Gear-2 method has been
   7 shown to be more suitable for many practical problems such as
   8 stiff problems where circuit behavior is affected by different time constants
   9 (fast ones with large poles and slow ones with small poles)~\cite{Vlach:Book'94}.
  10 Switching power converters and RF systems are typically stiff systems
  11 as waveforms changing in two different time scales are involved.
  12
  13 % Although on one time step, Gear-2
  14 % involves more computation than backward-Euler and trapezoidal method,
  15 % it has better convergence and stability, and usually allows larger
  16 % time step.  As a result, fewer time points are needed for a given
  17 % transient sweep range, and the simulation is speeded up with the same
  18 % accuracy
  19
  20 Given the DAE~(\ref{eq:dae}), at the $i$-th time step,
  21 Gear-2 integration\index{Gear-2 integration}
  22 approximates the derivative $\ud q(x(t))/\ud t$
  23 by a two-step backward finite difference,
  24 \[
  25    \frac{\ud q(x(t_i))}{\ud t}
  26    %\ud q(x(t_i)) / \ud t
  27    = \frac{1}{h_i}
  28      \left[ q(x(t_{i})) - \alpha_1^i q(x(t_{i-1})) - \alpha_2^i q(x(t_{i-2}))\right],
  29 \]
  30 where the coefficients $\alpha$'s are chosen to satisfy Gear's
  31 backward differentiation formula~\cite{Gear:book'71}.
  32 For uniformly discretised time steps,
  33 the index $i$ in $h_i$, $\alpha_1^i$ and $\alpha_2^i$ can be omitted.
  34
  35 For the first step $t_1$,
  36 only the initial condition at $t_0$ is available.
  37 Therefore backward-Euler is used, i.e.,
  38 \begin{equation}\label{eq:BE}
  39    \tfrac{1}{h_1}[q(x_1)-q(x_0)] + f(x_1) = b_1.
  40 \end{equation}
  41 Since $x_0$ directly affects the solution of $x_1$,
  42 the sensitivity matrix\index{sensitivity matrix} up to now is
  43 \begin{equation}\label{eq:sens1}
  44    \frac{\ud x_1}{\ud x_0}
  45    = \left[\frac{1}{h_1}C_1+G_1\right]^{-1} \frac{C_0}{h_1}
  46    = J_1^{-1} \frac{C_0}{h_1}.
  47 \end{equation}
  48 %For the sake of simplicity,
  49 Let $J_i$ denote $(1/h_i)C_i + G_i$
  50 in the remaining part of this chapter.
  51 In addition, the LU\index{LU} factorizations of $J_i$ are already calculated
  52 since they are used to solve the circuit state at
  53 each time step before we calculate the sensitivity.
  54
  55 Next, for the solution at $t_2$,
  56 with the previous two steps information available,
  57 Gear-2 integration can be applied,
  58 \begin{equation}\label{eq:Gear_t2}
  59    \tfrac{1}{h_2}[q(x_2) - \alpha_1 q(x_1) - \alpha_2 q(x_0)] + f(x_2) = b_2.
  60 \end{equation}
  61 In view of the sensitivity of $x_2$ with respect to the changes of $x_0$,
  62 \eqref{eq:Gear_t2} indicates that $x_2$'s perturbation can be
  63 traced back to $x_0$ along two paths:
  64 $x_2$ is directly affected by $x_0$,
  65 and $x_2$ is also affected indirectly by $x_0$ via $x_1$.
  66 That is,
  67 \[ %begin{equation}
  68   \frac{\ud x_2}{\ud x_0} = \frac{\partial x_2}{\partial x_1}\frac{\ud x_1}{\ud x_0}
  69                           + \frac{\partial x_2}{\partial x_0},
  70 \] %end{equation}
  71 where the two partial derivatives are
  72 \[
  73    \frac{\partial x_2}{\partial x_1} = J_2^{-1} \frac{\alpha_1}{h_2} C_1, \qquad
  74    \frac{\partial x_2}{\partial x_0} = J_2^{-1} \frac{\alpha_2}{h_2} C_0,
  75 \]
  76 and $\ud x_1/\ud x_0$ %, the sensitivity of $x_1$ with respect to $x_0$,
  77 is calculated previously in \eqref{eq:sens1}.
  78 Thus, the sensitivity matrix deduced from \eqref{eq:Gear_t2} is
  79 \begin{equation}\label{eq:sens2}
  80    \frac{\ud x_2}{\ud x_0}
  81    = J_2^{-1} \frac{\alpha_1}{h_2} C_1 \cdot J_1^{-1} \frac{C_0}{h_1}
  82    + J_2^{-1} \frac{\alpha_2}{h_2} C_0.
  83 \end{equation}
  84
  85 Likewise, for the third time step $t_3$,
  86 apply the Gear-2 integration formula to DAE~(\ref{eq:dae}),
  87 \begin{equation}\label{eq:Gear_t3}
  88    \tfrac{1}{h_3} [q(x_3) - \alpha_1 q(x_2) - \alpha_2 q(x_1)] + f(x_3) = b_3,
  89 \end{equation}
  90 and the chain rule for sensitivity is
  91 \[ %begin{equation}\label{eq:sens3}
  92 % \begin{split}
  93 %    \frac{\ud x_3}{\ud x_0} &= \frac{\partial x_3}{\partial x_2}\frac{\ud x_2}{\ud x_0}
  94 %                             + \frac{\partial x_3}{\partial x_1}\frac{\ud x_1}{\ud x_0} \\
  95 %    &=  J_3^{-1} \frac{\alpha_1}{h_3} C_2 \frac{\ud x_2}{\ud x_0}
  96 %      + J_3^{-1} \frac{\alpha_2}{h_3} C_1 \frac{\ud x_1}{\ud x_0}.
  97 % \end{split}
  98    \frac{\ud x_3}{\ud x_0} \!=\! \frac{\partial x_3}{\partial x_2}\frac{\ud x_2}{\ud x_0}
  99                             + \frac{\partial x_3}{\partial x_1}\frac{\ud x_1}{\ud x_0}
 100    \!=\!  J_3^{-1} \!\left[ \frac{\alpha_1}{h_3} C_2 \frac{\ud x_2}{\ud x_0}
 101                    \!+\!\frac{\alpha_2}{h_3} C_1 \frac{\ud x_1}{\ud x_0}\right],
 102 \] %end{equation}
 103 where both $\ud x_2/\ud x_0$ and $\ud x_1/\ud x_0$
 104 are ready from the previous two time steps, i.e., Eqs.~(\ref{eq:sens2}) and~(\ref{eq:sens1}).
 105 Follow this chain rule in the aforementioned recursive style,
 106 the sensitivity matrix for Gear-2 integration is computed
 107 along the remaining time steps up to the $M$-th step,
 108 \begin{equation}\label{eq:sensM}
 109 % \begin{split}
 110 %    J = \frac{\ud x_M}{\ud x_0}
 111 %      & =  J_M^{-1} \frac{\alpha_1}{h_M} C_{M-1} \frac{\ud x_{M-1}}{\ud x_0}  \\
 112 %      &\quad + J_M^{-1} \frac{\alpha_2}{h_M} C_{M-2} \frac{\ud x_{M-2}}{\ud x_0}.
 113 % \end{split}
 114 J\! =\! \frac{\ud x_M}{\ud x_0}
 115  \! =\!  J_M^{-1} \! \left[ \frac{\alpha_1}{h_M} C_{M-1} \frac{\ud x_{M-1}}{\ud x_0}
 116      \! + \! \frac{\alpha_2}{h_M} C_{M-2} \frac{\ud x_{M-2}}{\ud x_0}\right].
 117 \end{equation}
 118
 119 \begin{algorithm}
 120 \caption{The matrix-free\index{matrix-free} method for
 121  Krylov subspace\index{Krylov subspace} construction.}
 122 \label{alg:mf_Gear}
 123   \KwIn{ current Krylov subspace basis vector $v$,
 124            time step lengths $h_i$,
 125            saved $C_i$ matrices and LU\index{LU} factors of $J_i$, $i=0,\ldots,M$}
 126   \KwOut{ matrix-vector product $r$, such that $r = A v$}
 127   solve $J_1 p_2 = h_1^{-1} C_0 v$ for $p_2$\;
 128   solve $J_2 p_1 = h_2^{-1} \left( \alpha_1 C_1 p_2 + \alpha_2 C_0 v \right)$ for $p_1$\;
 129   \For{$i=3 \ldots M$}{
 130      solve
 131        $J_i p_0 = \alpha_1 h_i^{-1} C_{i-1} p_1 + \alpha_2 h_i^{-1} C_{i-2} p_2$
 132        for $p_0$ \label{line:mf_Gear_loop}\;
 133      $p_2 = p_1$\;
 134      $p_1 = p_0$\;
 135   }
 136    $r = (k-1) p_0 - k v$ \label{line:shift}\;
 137 \end{algorithm}
 138
 139
 140 %% \begin{algorithm}
 141 %% \caption{The matrix-free\index{matrix-free} method for
 142 %%  Krylov subspace\index{Krylov subspace} construction.}
 143 %% \label{alg:mf_Gear}
 144 %% \begin{algorithmic}[1]
 145 %%   \REQUIRE current Krylov subspace basis vector $v$,
 146 %%            time step lengths $h_i$,
 147 %%            saved $C_i$ matrices and LU\index{LU} factors of $J_i$, $i=0,\ldots,M$
 148 %%   \ENSURE matrix-vector product $r$, such that $r = A v$
 149 %%   \STATE solve $J_1 p_2 = h_1^{-1} C_0 v$ for $p_2$
 150 %%   \STATE solve $J_2 p_1 = h_2^{-1} \left( \alpha_1 C_1 p_2 + \alpha_2 C_0 v \right)$ for $p_1$
 151 %%   \FOR{$i=3 \ldots M$}
 152 %%     \STATE solve
 153 %%        $J_i p_0 = \alpha_1 h_i^{-1} C_{i-1} p_1 + \alpha_2 h_i^{-1} C_{i-2} p_2$
 154 %%        for $p_0$ \label{line:mf_Gear_loop}
 155 %%     \STATE $p_2 = p_1$
 156 %%     \STATE $p_1 = p_0$
 157 %%   \ENDFOR
 158 %%   \STATE $r = (k-1) p_0 - k v$ \label{line:shift}
 159 %% \end{algorithmic}
 160 %% \end{algorithm}
 161
 162 %It is easy to judge that, as $J$ is dense, explicitly
 163 %computing $J$ is expensive and impractical for large problems.
 164 %In addition, with a dense matrix,
 165 %both direct or iterative solvers will be inefficient.
 166 %Fortunately, one iterative solver, GMRES,
 167 %only requires $J x$ (actually $Ax$),
 168 %instead of an explicit $J$, to fulfill the solving process.
 169 %This virtue greatly saves computation cost.
 170 Note that as the matrix-free\index{matrix-free} GMRES method is applied,
 171 which only requires matrix-vector multiplication and no explicit $J$ is required,
 172 as explained in Algorithm~\ref{alg:mf_Gear}.
 173 % Next section will explain how to use this technique for
 174 % our proposed Gear-2 envelope-following.
 175
 176 With matrix-free method, the matrix-vector multiplication
 177 in Line~\ref{line:mvp} of Algorithm~\ref{alg:GMRES}
 178 is replaced by the iteration shown in Algorithm~\ref{alg:mf_Gear},
 179 which calculates the multiplication product of the Gear-2 sensitivity we
 180 encounter in envelope-following and a basis vector in the Krylov
 181 subspace.
 182 Note that the scaling and shift of $J$ in $A=(k-1)J-kI$,
 183 as described in Eq.~\eqref{eq:A},
 184 is taken into consideration by Line~\ref{line:shift}
 185 of Algorithm~\ref{alg:mf_Gear}.
 186
 187 For the matrix-free generation of new basis vectors, it is
 188 straightforward to apply CUBLAS and CUSPARSE routines, and some
 189 customized GPU kernel functions to implement
 190 Algorithm~\ref{alg:mf_Gear} with the stored LU matrices of $J_i$ and
 191 $C_i$ mentioned earlier. For example in Line~\ref{line:mf_Gear_loop},
 192 as the iteration index $i$ traverses all the $M$ time points'
 193 matrix information,
 194 sparse matrix-vector multiplication \verb|csrmv|
 195 is first called to calculate $C_{i-1} p_1$ and $C_{i-2} p_2$.
 196 And after the two resulted vectors are combined by \verb|axpy| of CUBLAS,
 197 $p_0$ is solved for by \verb|trsv|, since as we noted before,
 198 $J_i$ is already in LU factorization form when transient simulation
 199 at step $i$ finished.