new

[book_gpu.git] / BookGPU / Chapters / chapter16 / gpu.tex
diff --git a/BookGPU/Chapters/chapter16/gpu.tex b/BookGPU/Chapters/chapter16/gpu.tex

index bcfc686529272c248c47bcd49b7f609553c839eb..623ac81d27112085ea12e9ff159e7c2683a84449 100644 (file)
--- a/BookGPU/Chapters/chapter16/gpu.tex
+++ b/BookGPU/Chapters/chapter16/gpu.tex
@@ -15,13 +15,13 @@ Finally,  the Gear-2 integration is briefly introduced.
  \underline{G}eneralized \underline{M}inimum \underline{Res}idual,
  or GMRES method is an iterative method for solving
  systems of linear equations ($A x=b$) with dense matrix $A$.
  \underline{G}eneralized \underline{M}inimum \underline{Res}idual,
  or GMRES method is an iterative method for solving
  systems of linear equations ($A x=b$) with dense matrix $A$.
-The standard GMRES\index{GMRES} is given in Algorithm~\ref{alg:GMRES}.
-It constructs a Krylov subspace\index{Krylov subspace} with order $m$,
+The standard GMRES\index{iterative method!GMRES} is given in Algorithm~\ref{alg:GMRES}.
+It constructs a Krylov subspace\index{iterative method!Krylov subspace} with order $m$,
  \[ \mathcal{K}_m = \mathrm{span}( b, A^{} b, A^2 b,\ldots, A^{m-1} b ),\]
  where the approximate solution $x_m$ resides.
  In practice, an orthonormal basis $V_m$ that spans the
  subspace $\mathcal{K}_{m}$ can be generated by
  \[ \mathcal{K}_m = \mathrm{span}( b, A^{} b, A^2 b,\ldots, A^{m-1} b ),\]
  where the approximate solution $x_m$ resides.
  In practice, an orthonormal basis $V_m$ that spans the
  subspace $\mathcal{K}_{m}$ can be generated by
-the Arnoldi iteration\index{Arnoldi iterations}.
+the Arnoldi iterations\index{iterative method!Arnoldi iterations}.
  The goal of GMRES is to search for an optimal coefficient $y$
  such that the linear combination $x_m = V_m y$ will minimize
  its residual $\| b-Ax_m \|_2$.
  The goal of GMRES is to search for an optimal coefficient $y$
  such that the linear combination $x_m = V_m y$ will minimize
  its residual $\| b-Ax_m \|_2$.
@@ -77,6 +77,36 @@ a preset tolerance~\cite{Golub:Book'96}.
  %% \end{algorithmic}
  %% \end{algorithm}
  
  %% \end{algorithmic}
  %% \end{algorithm}
  
+\begin{algorithm}
+\caption{Standard GMRES\index{iterative method!GMRES} algorithm.} \label{alg:GMRES}
+  \KwIn{ $ A \in \mathbb{R}^{N \times N}$, $b \in \mathbb{R}^N$,
+      and initial guess $x_0 \in \mathbb{R}^N$}
+  \KwOut{ $x \in \mathbb{R}^N$: $\| b - A x\|_2 < tol$}
+
+  $r = b - A x_0$\;
+  $h_{1,0}=\left \| r \right \|_2$\;
+  $m=0$\;
+
+  \While{$m < max\_iter$} {
+    $m = m+1$;
+    $v_{m} = r / h_{m,m-1}$\;
+    \label{line:mvp} $r = A v_m$\;
+    \For{$i = 1\ldots m$} {
+      $h_{i,m} = \langle v_i, r \rangle$\;
+      $r = r - h_{i,m} v_i$\;
+    }
+    $h_{m+1,m} = \left\| r \right\|_2$\label{line:newnorm} \;
+    %\STATE Generate Givens rotations to triangularize $\tilde{H}_m$
+    %\STATE Apply Givens rotations on $h_{1,0}e_1$ to get residual $\epsilon$
+    Compute the residual $\epsilon$\;
+    \If{$\epsilon < tol$} {
+      Solve the problem: minimize $\|b-Ax_m\|_2$\;
+      Return $x_m = x_0 + V_m y_m$\;
+    }
+  }
+\end{algorithm}
+
+
  At a first glance, the cost of using standard GMRES directly to
  solve the Newton update in Eq.~\eqref{eq:Newton}
  seems to come mainly from two parts: the
  At a first glance, the cost of using standard GMRES directly to
  solve the Newton update in Eq.~\eqref{eq:Newton}
  seems to come mainly from two parts: the
@@ -130,7 +160,7 @@ period in order to solve a Newton update.
  At each time step, SPICE\index{SPICE} has
  to linearize device models, stamp matrix elements
  into MNA (short for modified nodal analysis\index{modified nodal analysis, or MNA}) matrices,
  At each time step, SPICE\index{SPICE} has
  to linearize device models, stamp matrix elements
  into MNA (short for modified nodal analysis\index{modified nodal analysis, or MNA}) matrices,
-and solve circuit equations in its inner Newton iteration\index{Newton iteration}.
+and solve circuit equations in its inner Newton iteration\index{iterative method!Newton iteration}.
  When convergence is attained,
  circuit states are saved and then next time step begins.
  This is also the time when we store the needed matrices
  When convergence is attained,
  circuit states are saved and then next time step begins.
  This is also the time when we store the needed matrices