1 \subsection{Gear-2 based sensitivity calculation}
4 The Gear-2 integration method is a backward differentiation
5 formula (BDF), which approximates the derivative of a function
6 using information from past few steps. Gear-2 method has been
7 shown to be more suitable for many practical problems such as
8 stiff problems where circuit behavior is affected by different time constants
9 (fast ones with large poles and slow ones with small poles)~\cite{Vlach:Book'94}.
10 Switching power converters and RF systems are typically stiff systems
11 as waveforms changing in two different time scales are involved.
13 % Although on one time step, Gear-2
14 % involves more computation than backward-Euler and trapezoidal method,
15 % it has better convergence and stability, and usually allows larger
16 % time step. As a result, fewer time points are needed for a given
17 % transient sweep range, and the simulation is speeded up with the same
20 Given the DAE~(\ref{eq:dae}), at the $i$-th time step,
21 Gear-2 integration\index{Gear-2 integration}
22 approximates the derivative $\ud q(x(t))/\ud t$
23 by a two-step backward finite difference,
25 \frac{\ud q(x(t_i))}{\ud t}
26 %\ud q(x(t_i)) / \ud t
28 \left[ q(x(t_{i})) - \alpha_1^i q(x(t_{i-1})) - \alpha_2^i q(x(t_{i-2}))\right],
30 where the coefficients $\alpha$'s are chosen to satisfy Gear's
31 backward differentiation formula~\cite{Gear:book'71}.
32 For uniformly discretised time steps,
33 the index $i$ in $h_i$, $\alpha_1^i$ and $\alpha_2^i$ can be omitted.
35 For the first step $t_1$,
36 only the initial condition at $t_0$ is available.
37 Therefore backward Euler is used, i.e.,
38 \begin{equation}\label{eq:BE}
39 \tfrac{1}{h_1}[q(x_1)-q(x_0)] + f(x_1) = b_1.
41 Since $x_0$ directly affects the solution of $x_1$,
42 the sensitivity matrix\index{sensitivity matrix} up to now is
43 \begin{equation}\label{eq:sens1}
44 \frac{\ud x_1}{\ud x_0}
45 = \left[\frac{1}{h_1}C_1+G_1\right]^{-1} \frac{C_0}{h_1}
46 = J_1^{-1} \frac{C_0}{h_1}.
48 %For the sake of simplicity,
49 Let $J_i$ denote $(1/h_i)C_i + G_i$
50 in the remaining part of this chapter.
51 In addition, the LU\index{LU} factorizations of $J_i$ are already calculated
52 since they are used to solve the circuit state at
53 each time step before we calculate the sensitivity.
55 Next, for the solution at $t_2$,
56 with the previous two steps information available,
57 Gear-2 integration can be applied,
58 \begin{equation}\label{eq:Gear_t2}
59 \tfrac{1}{h_2}[q(x_2) - \alpha_1 q(x_1) - \alpha_2 q(x_0)] + f(x_2) = b_2.
61 In view of the sensitivity of $x_2$ with respect to the changes of $x_0$,
62 \eqref{eq:Gear_t2} indicates that $x_2$'s perturbation can be
63 traced back to $x_0$ along two paths:
64 $x_2$ is directly affected by $x_0$,
65 and $x_2$ is also affected indirectly by $x_0$ via $x_1$.
68 \frac{\ud x_2}{\ud x_0} = \frac{\partial x_2}{\partial x_1}\frac{\ud x_1}{\ud x_0}
69 + \frac{\partial x_2}{\partial x_0},
71 where the two partial derivatives are
73 \frac{\partial x_2}{\partial x_1} = J_2^{-1} \frac{\alpha_1}{h_2} C_1, \qquad
74 \frac{\partial x_2}{\partial x_0} = J_2^{-1} \frac{\alpha_2}{h_2} C_0,
76 and $\ud x_1/\ud x_0$ %, the sensitivity of $x_1$ with respect to $x_0$,
77 is calculated previously in \eqref{eq:sens1}.
78 Thus, the sensitivity matrix deduced from \eqref{eq:Gear_t2} is
79 \begin{equation}\label{eq:sens2}
80 \frac{\ud x_2}{\ud x_0}
81 = J_2^{-1} \frac{\alpha_1}{h_2} C_1 \cdot J_1^{-1} \frac{C_0}{h_1}
82 + J_2^{-1} \frac{\alpha_2}{h_2} C_0.
85 Likewise, for the third time step $t_3$,
86 apply the Gear-2 integration formula to DAE~(\ref{eq:dae}),
87 \begin{equation}\label{eq:Gear_t3}
88 \tfrac{1}{h_3} [q(x_3) - \alpha_1 q(x_2) - \alpha_2 q(x_1)] + f(x_3) = b_3,
90 and the chain rule for sensitivity is
91 \[ %begin{equation}\label{eq:sens3}
93 % \frac{\ud x_3}{\ud x_0} &= \frac{\partial x_3}{\partial x_2}\frac{\ud x_2}{\ud x_0}
94 % + \frac{\partial x_3}{\partial x_1}\frac{\ud x_1}{\ud x_0} \\
95 % &= J_3^{-1} \frac{\alpha_1}{h_3} C_2 \frac{\ud x_2}{\ud x_0}
96 % + J_3^{-1} \frac{\alpha_2}{h_3} C_1 \frac{\ud x_1}{\ud x_0}.
98 \frac{\ud x_3}{\ud x_0} \!=\! \frac{\partial x_3}{\partial x_2}\frac{\ud x_2}{\ud x_0}
99 + \frac{\partial x_3}{\partial x_1}\frac{\ud x_1}{\ud x_0}
100 \!=\! J_3^{-1} \!\left[ \frac{\alpha_1}{h_3} C_2 \frac{\ud x_2}{\ud x_0}
101 \!+\!\frac{\alpha_2}{h_3} C_1 \frac{\ud x_1}{\ud x_0}\right],
103 where both $\ud x_2/\ud x_0$ and $\ud x_1/\ud x_0$
104 are ready from the previous two time steps, i.e., Eqs.~(\ref{eq:sens2}) and~(\ref{eq:sens1}).
105 Follow this chain rule in the aforementioned recursive style,
106 the sensitivity matrix for Gear-2 integration is computed
107 along the remaining time steps up to the $M$-th step,
108 \begin{equation}\label{eq:sensM}
110 % J = \frac{\ud x_M}{\ud x_0}
111 % & = J_M^{-1} \frac{\alpha_1}{h_M} C_{M-1} \frac{\ud x_{M-1}}{\ud x_0} \\
112 % &\quad + J_M^{-1} \frac{\alpha_2}{h_M} C_{M-2} \frac{\ud x_{M-2}}{\ud x_0}.
114 J\! =\! \frac{\ud x_M}{\ud x_0}
115 \! =\! J_M^{-1} \! \left[ \frac{\alpha_1}{h_M} C_{M-1} \frac{\ud x_{M-1}}{\ud x_0}
116 \! + \! \frac{\alpha_2}{h_M} C_{M-2} \frac{\ud x_{M-2}}{\ud x_0}\right].
120 \caption{the matrix-free\index{matrix-free} method for
121 Krylov subspace\index{iterative method!Krylov subspace} construction}
123 \KwIn{ current Krylov subspace basis vector $v$,
124 time step lengths $h_i$,
125 saved $C_i$ matrices and LU\index{LU} factors of $J_i$, $i=0,\ldots,M$}
126 \KwOut{ matrix-vector product $r$, such that $r = A v$}
127 solve $J_1 p_2 = h_1^{-1} C_0 v$ for $p_2$\;
128 solve $J_2 p_1 = h_2^{-1} \left( \alpha_1 C_1 p_2 + \alpha_2 C_0 v \right)$ for $p_1$\;
129 \For{$i=3 \ldots M$}{
131 $J_i p_0 = \alpha_1 h_i^{-1} C_{i-1} p_1 + \alpha_2 h_i^{-1} C_{i-2} p_2$
132 for $p_0$ \label{line:mf_Gear_loop}\;
136 $r = (k-1) p_0 - k v$ \label{line:shift}\;
141 %% \caption{The matrix-free\index{matrix-free} method for
142 %% Krylov subspace\index{Krylov subspace} construction.}
143 %% \label{alg:mf_Gear}
144 %% \begin{algorithmic}[1]
145 %% \REQUIRE current Krylov subspace basis vector $v$,
146 %% time step lengths $h_i$,
147 %% saved $C_i$ matrices and LU\index{LU} factors of $J_i$, $i=0,\ldots,M$
148 %% \ENSURE matrix-vector product $r$, such that $r = A v$
149 %% \STATE solve $J_1 p_2 = h_1^{-1} C_0 v$ for $p_2$
150 %% \STATE solve $J_2 p_1 = h_2^{-1} \left( \alpha_1 C_1 p_2 + \alpha_2 C_0 v \right)$ for $p_1$
151 %% \FOR{$i=3 \ldots M$}
153 %% $J_i p_0 = \alpha_1 h_i^{-1} C_{i-1} p_1 + \alpha_2 h_i^{-1} C_{i-2} p_2$
154 %% for $p_0$ \label{line:mf_Gear_loop}
155 %% \STATE $p_2 = p_1$
156 %% \STATE $p_1 = p_0$
158 %% \STATE $r = (k-1) p_0 - k v$ \label{line:shift}
162 %It is easy to judge that, as $J$ is dense, explicitly
163 %computing $J$ is expensive and impractical for large problems.
164 %In addition, with a dense matrix,
165 %both direct or iterative solvers will be inefficient.
166 %Fortunately, one iterative solver, GMRES,
167 %only requires $J x$ (actually $Ax$),
168 %instead of an explicit $J$, to fulfill the solving process.
169 %This virtue greatly saves computation cost.
170 Note that as the matrix-free\index{matrix-free} GMRES method is applied,
171 which only requires matrix-vector multiplication and no explicit $J$ is required,
172 as explained in Algorithm~\ref{alg:mf_Gear}.
173 % Next section will explain how to use this technique for
174 % our proposed Gear-2 envelope-following.
176 With matrix-free method, the matrix-vector multiplication
177 in Line~\ref{line:mvp} of Algorithm~\ref{alg:GMRES}
178 is replaced by the iteration shown in Algorithm~\ref{alg:mf_Gear},
179 which calculates the multiplication product of the Gear-2 sensitivity we
180 encounter in envelope-following and a basis vector in the Krylov
182 Note that the scaling and shift of $J$ in $A=(k-1)J-kI$,
183 as described in Eq.~\eqref{eq:A},
184 is taken into consideration by Line~\ref{line:shift}
185 of Algorithm~\ref{alg:mf_Gear}.
187 For the matrix-free generation of new basis vectors, it is
188 straightforward to apply CUBLAS and CUSPARSE routines, and some
189 customized GPU kernel functions to implement
190 Algorithm~\ref{alg:mf_Gear} with the stored LU matrices of $J_i$ and
191 $C_i$ mentioned earlier. For example in Line~\ref{line:mf_Gear_loop},
192 as the iteration index $i$ traverses all the $M$ time points'
194 sparse matrix-vector multiplication \verb|csrmv|
195 is first called to calculate $C_{i-1} p_1$ and $C_{i-2} p_2$.
196 And after the two resulted vectors are combined by \verb|axpy| of CUBLAS,
197 $p_0$ is solved for by \verb|trsv|, since as we noted before,
198 $J_i$ is already in LU factorization form when transient simulation
199 at step $i$ finished.