use matrix-free GMRES to solve
the Newton update problems with implicit sensitivity calculation,
i.e., the steps enclosed by the double dashed block
in Figure~\ref{fig:ef_flow}.
Then, the implementation issues of GPU acceleration
are discussed in detail.
Finally, Gear-2 integration is briefly introduced.
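
The matrix-free idea can be illustrated with a short sketch of unrestarted GMRES in which $A$ is never formed and is accessed only through a function that returns $Av$. This is a generic textbook variant (Arnoldi with modified Gram-Schmidt, plus Givens rotations that keep the Hessenberg least-squares problem triangular). It is not the GPU implementation discussed in this section. The names \texttt{gmres}, \texttt{matvec} and \texttt{tridiag\_apply}, as well as the nonsymmetric tridiagonal test operator, are hypothetical and serve only as an example. The sketch assumes a nonsingular $A$.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def norm(u: Vector) -> float:
    return math.sqrt(dot(u, u))


def gmres(matvec: Callable[[Vector], Vector], b: Vector, x0: Vector,
          tol: float = 1e-10, max_iter: int = 50) -> Tuple[Vector, float]:
    """Unrestarted GMRES; A is accessed only through matvec(v) = A v."""
    r0 = [bi - ai for bi, ai in zip(b, matvec(x0))]
    beta = norm(r0)
    if beta < tol:
        return list(x0), beta
    V = [[ri / beta for ri in r0]]      # orthonormal Arnoldi vectors
    R: List[Vector] = []                # Hessenberg columns after Givens rotations
    cs: List[float] = []                # Givens cosines
    sn: List[float] = []                # Givens sines
    g = [beta]                          # rotated right-hand side (starts as beta*e1)
    res = beta
    k = 0                               # number of Arnoldi steps taken
    for j in range(min(max_iter, len(b))):
        w = matvec(V[j])
        h = []
        for i in range(j + 1):          # modified Gram-Schmidt orthogonalization
            hij = dot(w, V[i])
            h.append(hij)
            w = [wk - hij * vk for wk, vk in zip(w, V[i])]
        h_next = norm(w)
        h.append(h_next)
        for i in range(j):              # apply earlier rotations to the new column
            t = cs[i] * h[i] + sn[i] * h[i + 1]
            h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
            h[i] = t
        r = math.hypot(h[j], h[j + 1])  # new rotation annihilates h[j + 1]
        cs.append(h[j] / r)
        sn.append(h[j + 1] / r)
        h[j], h[j + 1] = r, 0.0
        g.append(-sn[j] * g[j])
        g[j] *= cs[j]
        R.append(h)
        k = j + 1
        res = abs(g[j + 1])             # equals ||b - A x_k||_2 in exact arithmetic
        if res < tol or h_next == 0.0:
            break
        V.append([wk / h_next for wk in w])
    y = [0.0] * k                       # back-substitution on the k-by-k triangle
    for i in range(k - 1, -1, -1):
        s = g[i] - sum(R[m][i] * y[m] for m in range(i + 1, k))
        y[i] = s / R[i][i]
    x = list(x0)
    for m in range(k):                  # x = x0 + V_k y
        x = [xi + y[m] * vi for xi, vi in zip(x, V[m])]
    return x, res


def tridiag_apply(x: Vector) -> Vector:
    """Hypothetical nonsymmetric tridiagonal operator, applied without storing A."""
    n = len(x)
    return [4.0 * x[i]
            - (1.5 * x[i - 1] if i > 0 else 0.0)
            - (0.5 * x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


b = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 1.0]
x, res = gmres(tridiag_apply, b, [0.0] * len(b))
```

In a Newton solve, \texttt{matvec} would instead return the product of the Jacobian with a vector. This could come, for instance, from a sensitivity or finite-difference computation, so that the Jacobian itself is never stored. How that product is obtained is an assumption here and is not shown.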
%% \end{algorithm}
\begin{algorithm}
\caption{standard GMRES\index{iterative method!GMRES} algorithm} \label{alg:GMRES}
\KwIn{ $A \in \mathbb{R}^{N \times N}$, $b \in \mathbb{R}^N$,
initial guess $x_0 \in \mathbb{R}^N$, and tolerance $tol > 0$}
\KwOut{ $x \in \mathbb{R}^N$ such that $\| b - A x\|_2 < tol$}
At each time step, SPICE\index{SPICE} has
to linearize device models, stamp matrix elements
into modified nodal analysis (MNA)\index{modified nodal analysis, or MNA} matrices,
and solve circuit equations in its inner Newton iteration\index{iterative method!Newton iteration}.
When convergence is attained,
circuit states are saved and the next time step begins.
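
The Newton loop described above (linearize, stamp, solve, test convergence) can be made concrete with a minimal sketch. It assumes a single DC operating point, that is, one Newton solve as performed at one time point, for a 5~V source driving a diode through a 1~k$\Omega$ resistor. The device constants, the step limit and the function names \texttt{stamp}, \texttt{solve\_dense} and \texttt{newton\_dc} are hypothetical. The dense Gaussian elimination stands in for the sparse LU factorization that a SPICE-class simulator uses. The clamp on the diode voltage step is a crude substitute for SPICE's junction voltage limiting, not a reproduction of it.

```python
import math
from typing import List, Tuple

# Illustrative (hypothetical) circuit and device parameters
IS = 1e-14        # diode saturation current [A]
VT = 0.02585      # thermal voltage near 300 K [V]
R_SERIES = 1e3    # series resistor [ohm]
VS = 5.0          # source voltage [V]

Matrix = List[List[float]]


def solve_dense(A: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix [A | b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def stamp(vd: float) -> Tuple[Matrix, List[float]]:
    """Stamp the MNA system with the diode linearized at voltage vd.

    Unknowns x = [v1, v2, iV]: node 1 is the source terminal, node 2 the
    diode anode, iV the current leaving node 1 through the source.
    """
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    G = 1.0 / R_SERIES
    A[0][0] += G                     # resistor between nodes 1 and 2
    A[1][1] += G
    A[0][1] -= G
    A[1][0] -= G
    gd = IS / VT * math.exp(vd / VT)                # diode companion model:
    ieq = IS * (math.exp(vd / VT) - 1.0) - gd * vd  # gd in parallel with ieq
    A[1][1] += gd                    # conductance from node 2 to ground
    rhs[1] -= ieq                    # ieq flows from node 2 to ground
    A[0][2] += 1.0                   # ideal voltage source: KCL coupling
    A[2][0] += 1.0                   # and branch equation v1 = VS
    rhs[2] = VS
    return A, rhs


def newton_dc(x0: List[float], tol: float = 1e-9, max_iter: int = 100,
              max_dv: float = 0.1) -> Tuple[List[float], int]:
    x = list(x0)
    for it in range(1, max_iter + 1):
        A, rhs = stamp(x[1])                 # linearize the device, stamp MNA
        x_new = solve_dense(A, rhs)          # companion form: solve for next iterate
        dv = x_new[1] - x[1]                 # crude clamp on the diode voltage step
        x_new[1] = x[1] + max(-max_dv, min(max_dv, dv))
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new, it                 # converged; a simulator would save states here
        x = x_new
    raise RuntimeError("Newton iteration did not converge")


x_op, iters = newton_dc([0.0, 0.0, 0.0])
```

In a transient analysis the same loop would run at every time point. Before stamping, energy-storage elements would be replaced by the companion models of the integration formula. That part is omitted here.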
This is also the time when we store the needed matrices
the small size of the Hessenberg matrix
and the frequent inspection of its entries by the host, it is
preferable to allocate $\tilde{H}$ in CPU (host) memory.
As shown in Figure~\ref{fig:gmres}, a device-to-host memory copy
is issued each time the Arnoldi iteration generates a new vector
and the orthogonalization produces the vector $h$.
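
A rough count supports keeping $\tilde{H}$ on the host. Assume $m$ Arnoldi steps without restart. Assume also that the vector $h$ copied at the $j$-th step holds $j+1$ entries: the $j$ orthogonalization coefficients plus the norm of the new vector. The total number of scalars moved from device to host is then
\begin{equation*}
  \sum_{j=1}^{m} (j+1) = \frac{m(m+1)}{2} + m = \frac{m(m+3)}{2},
\end{equation*}
for example $495$ values for $m = 30$. This count does not depend on the circuit size $N$, whereas every Arnoldi vector has $N$ entries. The small volume is consistent with copying only $h$ at each step rather than the Arnoldi vectors themselves.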