new

[book_gpu.git] / BookGPU / Chapters / chapter11 / ch11.tex
diff --git a/BookGPU/Chapters/chapter11/ch11.tex b/BookGPU/Chapters/chapter11/ch11.tex

index 57b3328125a773cb8ae95cb5360b4975d374206c..0aa6e8cc8aec787a1721a3fdd4e4b66db81ba197 100644 (file)
--- a/BookGPU/Chapters/chapter11/ch11.tex
+++ b/BookGPU/Chapters/chapter11/ch11.tex
@@ -116,8 +116,8 @@ It is almost straightforward to parallelize this scheme for GPUs, by processing
  At the spline evaluation stage we need to compute $s(z_k)$ for a sequence of query values ${z_k}, k=1,\ldots,K$. For each $z_k$ we locate the interval $[t_i,t_{i+1}]$ containing $z_k$, using the bisection algorithm presented in Listing \ref{ch11:algeval}, and then apply the appropriate coefficients of the quadratic function. This is also  done in parallel.
  The bisection algorithm could be implemented using texture memory (to cache the array \texttt{z}), but this is not shown in Listing \ref{ch11:algeval}.
  
  At the spline evaluation stage we need to compute $s(z_k)$ for a sequence of query values ${z_k}, k=1,\ldots,K$. For each $z_k$ we locate the interval $[t_i,t_{i+1}]$ containing $z_k$, using the bisection algorithm presented in Listing \ref{ch11:algeval}, and then apply the appropriate coefficients of the quadratic function. This is also  done in parallel.
  The bisection algorithm could be implemented using texture memory (to cache the array \texttt{z}), but this is not shown in Listing \ref{ch11:algeval}.
  
-%\pagebreak
-\lstinputlisting[label=ch11:algcoef,caption=Implementation of the kernel for calculating spline knots and coefficients; function fmax is used to avoid division by zero for data with coinciding abscissae.]{Chapters/chapter11/code1.cu}
+\pagebreak
+\lstinputlisting[label=ch11:algcoef,caption=implementation of the kernel for calculating spline knots and coefficients; function fmax is used to avoid division by zero for data with coinciding abscissae.]{Chapters/chapter11/code1.cu}
  
  
  %% \begin{figure}[!hp]
  
  
  %% \begin{figure}[!hp]
@@ -346,7 +346,7 @@ A popular PAV algorithm (PAVA) is one method that provides efficient numerical s
  
  Various serial implementations of the PAVA exist. It is noted \cite{Kearsley_2006} that in PAVA, which is based on the ideas from convex analysis, a decomposition theorem holds, namely, performing PAVA separately on two contiguous subsets of data and then performing PAVA on the result produces isotonic regression on the whole data set. Thus, isotonic regression is parallelizable, and the divide-and-conquer approach, decomposing the original problem into two smaller subproblems, can be implemented on multiple processors. However, to our knowledge, no parallel PAVA for many-core systems such as GPUs exist.
  
  
  Various serial implementations of the PAVA exist. It is noted \cite{Kearsley_2006} that in PAVA, which is based on the ideas from convex analysis, a decomposition theorem holds, namely, performing PAVA separately on two contiguous subsets of data and then performing PAVA on the result produces isotonic regression on the whole data set. Thus, isotonic regression is parallelizable, and the divide-and-conquer approach, decomposing the original problem into two smaller subproblems, can be implemented on multiple processors. However, to our knowledge, no parallel PAVA for many-core systems such as GPUs exist.
  
-\index{MLS algorithm} \index{Minimum Lower Sets}
+\index{MLS (minimum lower sets) algorithm} 
  Another approach to isotonic regression is called the MLS algorithm 
  \cite{Best1990, Robertson_book}. It provides the same solution as the PAVA, but works differently.  For each datum (or block), MLS selects the largest contiguous block of subsequent data with the smallest average. If this average is smaller than that of the preceding block, the blocks are merged, and the data in the block are replaced with their average. MLS is also an active set method \cite{Best1990}, but its complexity is $O(n^2)$ as opposed to $O(n)$ of the PAVA, and of another active set algorithm proposed in \cite{Best1990} by the name of Algorithm A.
  
  Another approach to isotonic regression is called the MLS algorithm 
  \cite{Best1990, Robertson_book}. It provides the same solution as the PAVA, but works differently.  For each datum (or block), MLS selects the largest contiguous block of subsequent data with the smallest average. If this average is smaller than that of the preceding block, the blocks are merged, and the data in the block are replaced with their average. MLS is also an active set method \cite{Best1990}, but its complexity is $O(n^2)$ as opposed to $O(n)$ of the PAVA, and of another active set algorithm proposed in \cite{Best1990} by the name of Algorithm A.