X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/book_gpu.git/blobdiff_plain/4932967a1c684dc3f7ed04c19144101278b79972..HEAD:/BookGPU/Chapters/chapter16/gpu.tex diff --git a/BookGPU/Chapters/chapter16/gpu.tex b/BookGPU/Chapters/chapter16/gpu.tex index bcfc686..4d4d6ef 100644 --- a/BookGPU/Chapters/chapter16/gpu.tex +++ b/BookGPU/Chapters/chapter16/gpu.tex @@ -5,7 +5,7 @@ In this section, we explain how to efficiently use matrix-free GMRES to solve the Newton update problems with implicit sensitivity calculation, i.e., the steps enclosed by the double dashed block -in Fig.~\ref{fig:ef_flow}. +in Figure~\ref{fig:ef_flow}. Then implementation issues of GPU acceleration will be discussed in detail. Finally, the Gear-2 integration is briefly introduced. @@ -15,13 +15,13 @@ Finally, the Gear-2 integration is briefly introduced. \underline{G}eneralized \underline{M}inimum \underline{Res}idual, or GMRES method is an iterative method for solving systems of linear equations ($A x=b$) with dense matrix $A$. -The standard GMRES\index{GMRES} is given in Algorithm~\ref{alg:GMRES}. -It constructs a Krylov subspace\index{Krylov subspace} with order $m$, +The standard GMRES\index{iterative method!GMRES} is given in Algorithm~\ref{alg:GMRES}. +It constructs a Krylov subspace\index{iterative method!Krylov subspace} with order $m$, \[ \mathcal{K}_m = \mathrm{span}( b, A^{} b, A^2 b,\ldots, A^{m-1} b ),\] where the approximate solution $x_m$ resides. In practice, an orthonormal basis $V_m$ that spans the subspace $\mathcal{K}_{m}$ can be generated by -the Arnoldi iteration\index{Arnoldi iterations}. +the Arnoldi iterations\index{iterative method!Arnoldi iterations}. The goal of GMRES is to search for an optimal coefficient $y$ such that the linear combination $x_m = V_m y$ will minimize its residual $\| b-Ax_m \|_2$. @@ -77,6 +77,36 @@ a preset tolerance~\cite{Golub:Book'96}. %% \end{algorithmic} %% \end{algorithm} +\begin{algorithm} +\caption{standard GMRES\index{iterative method!GMRES} algorithm} \label{alg:GMRES} + \KwIn{ $ A \in \mathbb{R}^{N \times N}$, $b \in \mathbb{R}^N$, + and initial guess $x_0 \in \mathbb{R}^N$} + \KwOut{ $x \in \mathbb{R}^N$: $\| b - A x\|_2 < tol$} + + $r = b - A x_0$\; + $h_{1,0}=\left \| r \right \|_2$\; + $m=0$\; + + \While{$m < max\_iter$} { + $m = m+1$; + $v_{m} = r / h_{m,m-1}$\; + \label{line:mvp} $r = A v_m$\; + \For{$i = 1\ldots m$} { + $h_{i,m} = \langle v_i, r \rangle$\; + $r = r - h_{i,m} v_i$\; + } + $h_{m+1,m} = \left\| r \right\|_2$\label{line:newnorm} \; + %\STATE Generate Givens rotations to triangularize $\tilde{H}_m$ + %\STATE Apply Givens rotations on $h_{1,0}e_1$ to get residual $\epsilon$ + Compute the residual $\epsilon$\; + \If{$\epsilon < tol$} { + Solve the problem: minimize $\|b-Ax_m\|_2$\; + Return $x_m = x_0 + V_m y_m$\; + } + } +\end{algorithm} + + At a first glance, the cost of using standard GMRES directly to solve the Newton update in Eq.~\eqref{eq:Newton} seems to come mainly from two parts: the @@ -130,7 +160,7 @@ period in order to solve a Newton update. At each time step, SPICE\index{SPICE} has to linearize device models, stamp matrix elements into MNA (short for modified nodal analysis\index{modified nodal analysis, or MNA}) matrices, -and solve circuit equations in its inner Newton iteration\index{Newton iteration}. +and solve circuit equations in its inner Newton iteration\index{iterative method!Newton iteration}. When convergence is attained, circuit states are saved and then next time step begins. This is also the time when we store the needed matrices @@ -195,7 +225,7 @@ Hence, in consideration of the serial nature of the trianularization, the small size of Hessenberg matrix, and the frequent inspection of values by host, it is preferable to allocate $\tilde{H}$ in CPU (host) memory. -As shown in Fig.~\ref{fig:gmres}, the memory copy from device to host +As shown in Figure~\ref{fig:gmres}, the memory copy from device to host is called each time when Arnoldi iteration generates a new vector and the orthogonalization produces the vector $h$.