ch17

[book_gpu.git] / BookGPU / Chapters / chapter11 / ch11.tex
diff --git a/BookGPU/Chapters/chapter11/ch11.tex b/BookGPU/Chapters/chapter11/ch11.tex

index 57b3328125a773cb8ae95cb5360b4975d374206c..736d68ae01fca641bd2945862d72137efeffd09b 100644 (file)
--- a/BookGPU/Chapters/chapter11/ch11.tex
+++ b/BookGPU/Chapters/chapter11/ch11.tex
@@ -45,7 +45,7 @@ In this work we examine several monotone spline fitting algorithms, and select t
  
  The rest of the chapter is organized as follows. Section \ref{ch11:splines} discusses monotone spline interpolation methods and presents two parallel algorithms. Section \ref{ch11:smoothing} deals with the smoothing problem. It presents the isotonic regression problem and discusses the Pool Adjacent Violators (PAV) and MLS algorithms. Combined with monotone spline interpolation, the parallel MLS method makes it possible to build a monotone spline approximation to noisy data entirely on GPU. Section \ref{ch11:conc} concludes.
  
  
  The rest of the chapter is organized as follows. Section \ref{ch11:splines} discusses monotone spline interpolation methods and presents two parallel algorithms. Section \ref{ch11:smoothing} deals with the smoothing problem. It presents the isotonic regression problem and discusses the Pool Adjacent Violators (PAV) and MLS algorithms. Combined with monotone spline interpolation, the parallel MLS method makes it possible to build a monotone spline approximation to noisy data entirely on GPU. Section \ref{ch11:conc} concludes.
  
-
+\clearpage
  \section{Monotone splines} \label{ch11:splines}
  
  \index{constrained splines} \index{monotonicity}
  \section{Monotone splines} \label{ch11:splines}
  
  \index{constrained splines} \index{monotonicity}
@@ -116,8 +116,8 @@ It is almost straightforward to parallelize this scheme for GPUs, by processing
  At the spline evaluation stage we need to compute $s(z_k)$ for a sequence of query values ${z_k}, k=1,\ldots,K$. For each $z_k$ we locate the interval $[t_i,t_{i+1}]$ containing $z_k$, using the bisection algorithm presented in Listing \ref{ch11:algeval}, and then apply the appropriate coefficients of the quadratic function. This is also  done in parallel.
  The bisection algorithm could be implemented using texture memory (to cache the array \texttt{z}), but this is not shown in Listing \ref{ch11:algeval}.
  
  At the spline evaluation stage we need to compute $s(z_k)$ for a sequence of query values ${z_k}, k=1,\ldots,K$. For each $z_k$ we locate the interval $[t_i,t_{i+1}]$ containing $z_k$, using the bisection algorithm presented in Listing \ref{ch11:algeval}, and then apply the appropriate coefficients of the quadratic function. This is also  done in parallel.
  The bisection algorithm could be implemented using texture memory (to cache the array \texttt{z}), but this is not shown in Listing \ref{ch11:algeval}.
  
-%\pagebreak
-\lstinputlisting[label=ch11:algcoef,caption=Implementation of the kernel for calculating spline knots and coefficients; function fmax is used to avoid division by zero for data with coinciding abscissae.]{Chapters/chapter11/code1.cu}
+\pagebreak
+\lstinputlisting[label=ch11:algcoef,caption=implementation of the kernel for calculating spline knots and coefficients; function fmax is used to avoid division by zero for data with coinciding abscissae.]{Chapters/chapter11/code1.cu}
  
  
  %% \begin{figure}[!hp]
  
  
  %% \begin{figure}[!hp]
@@ -346,7 +346,7 @@ A popular PAV algorithm (PAVA) is one method that provides efficient numerical s
  
  Various serial implementations of the PAVA exist. It is noted \cite{Kearsley_2006} that in PAVA, which is based on the ideas from convex analysis, a decomposition theorem holds, namely, performing PAVA separately on two contiguous subsets of data and then performing PAVA on the result produces isotonic regression on the whole data set. Thus, isotonic regression is parallelizable, and the divide-and-conquer approach, decomposing the original problem into two smaller subproblems, can be implemented on multiple processors. However, to our knowledge, no parallel PAVA for many-core systems such as GPUs exist.
  
  
  Various serial implementations of the PAVA exist. It is noted \cite{Kearsley_2006} that in PAVA, which is based on the ideas from convex analysis, a decomposition theorem holds, namely, performing PAVA separately on two contiguous subsets of data and then performing PAVA on the result produces isotonic regression on the whole data set. Thus, isotonic regression is parallelizable, and the divide-and-conquer approach, decomposing the original problem into two smaller subproblems, can be implemented on multiple processors. However, to our knowledge, no parallel PAVA for many-core systems such as GPUs exist.
  
-\index{MLS algorithm} \index{Minimum Lower Sets}
+\index{MLS (minimum lower sets) algorithm} 
  Another approach to isotonic regression is called the MLS algorithm 
  \cite{Best1990, Robertson_book}. It provides the same solution as the PAVA, but works differently.  For each datum (or block), MLS selects the largest contiguous block of subsequent data with the smallest average. If this average is smaller than that of the preceding block, the blocks are merged, and the data in the block are replaced with their average. MLS is also an active set method \cite{Best1990}, but its complexity is $O(n^2)$ as opposed to $O(n)$ of the PAVA, and of another active set algorithm proposed in \cite{Best1990} by the name of Algorithm A.
  
  Another approach to isotonic regression is called the MLS algorithm 
  \cite{Best1990, Robertson_book}. It provides the same solution as the PAVA, but works differently.  For each datum (or block), MLS selects the largest contiguous block of subsequent data with the smallest average. If this average is smaller than that of the preceding block, the blocks are merged, and the data in the block are replaced with their average. MLS is also an active set method \cite{Best1990}, but its complexity is $O(n^2)$ as opposed to $O(n)$ of the PAVA, and of another active set algorithm proposed in \cite{Best1990} by the name of Algorithm A.