use matrix-free GMRES to solve
the Newton update problems with implicit sensitivity calculation,
i.e., the steps enclosed by the double dashed block
-in Fig.~\ref{fig:ef_flow}.
+in Figure~\ref{fig:ef_flow}.
Implementation issues of GPU acceleration
are then discussed in detail.
Finally, the Gear-2 integration is briefly introduced.
the small size of the Hessenberg matrix,
and the frequent inspection of its values by the host, it is
preferable to allocate $\tilde{H}$ in CPU (host) memory.
-As shown in Fig.~\ref{fig:gmres}, the memory copy from device to host
+As shown in Figure~\ref{fig:gmres}, the memory copy from device to host
is invoked each time the Arnoldi iteration generates a new vector
and the orthogonalization produces the vector $h$.