diff --git a/BookGPU/Chapters/chapter8/ch8.tex b/BookGPU/Chapters/chapter8/ch8.tex

index 5d012eb5ec2b5c8047b425b9f43a46de1f0bb72e..f5c001ad7f962bb7e76ed71b663555e9822b4f19 100644 (file)
--- a/BookGPU/Chapters/chapter8/ch8.tex
+++ b/BookGPU/Chapters/chapter8/ch8.tex
@@ -140,7 +140,6 @@ The parallel evaluation of bounds \index{parallel evaluation of bounds} model, a
  \label{ch8:BB-FSP}
  
  \subsection{Definition of the Flowshop Scheduling Problem} 
-\label{ch8:LB-FSP}
  
  As a case study for our GPU-based Branch-and-Bound, we consider a well-known NP-hard problem in scheduling theory: the ``Permutation Flow-shop Scheduling Problem'' (FSP). 
  In this work, the single-objective case is considered. The FSP aims to find the optimal schedule of $n$ jobs on $m$ machines so that the overall completion time of all jobs, called the {\it makespan}, is minimized. 
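To make the definition of the makespan concrete, the following minimal sketch evaluates it for a given permutation. It assumes the usual permutation flow-shop model (every job visits the machines in the same order, and each machine processes one job at a time); the names \texttt{makespan}, \texttt{perm} and \texttt{p} are hypothetical and are not taken from the chapter's implementation.

```python
def makespan(perm, p):
    """Return the makespan of the job order `perm`.

    p[j][k] is the processing time of job j on machine k (hypothetical layout).
    """
    m = len(p[0])
    # completion[k] = completion time of the last scheduled job on machine k
    completion = [0] * m
    for j in perm:
        # On the first machine, jobs simply follow one another.
        completion[0] += p[j][0]
        for k in range(1, m):
            # Job j starts on machine k once machine k is free and
            # job j has left machine k-1.
            completion[k] = max(completion[k], completion[k - 1]) + p[j][k]
    return completion[-1]


if __name__ == "__main__":
    times = [[3, 2], [1, 4]]  # 2 jobs, 2 machines
    print(makespan([0, 1], times), makespan([1, 0], times))
```

On this two-job instance, the two possible permutations yield different makespans, which is exactly what the FSP sets out to minimize.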
@@ -169,7 +168,6 @@ Figure~\ref{flow-shop} illustrates a solution of a flow-shop problem instance de
  \vspace{0.3cm}
  
  \subsection{Lower Bound \index{Lower Bound} for the Flowshop Scheduling Problem}
-\label{ch8:LB-FSP}
  
  The lower bounding technique provides a lower bound (LB) for each sub-problem generated by the branching operator. The more accurate the bound, the more unpromising nodes it allows us to eliminate from the search tree. Therefore, the efficiency of a B\&B algorithm depends strongly on the quality of its lower bound function. In this chapter, we use the lower bound proposed by Lenstra {\it et al.}~\cite{ch8:Lenstra_1978} for the FSP, based on Johnson's algorithm~\cite{ch8:Johnson_1954}.
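As a reminder of the building block mentioned above, the sketch below implements the classical Johnson's rule for the two-machine flow-shop, which yields an optimal job order in that special case. It is not the lower bound of Lenstra {\it et al.} itself; the function name \texttt{johnson\_order} and the arguments \texttt{a} and \texttt{b} (processing times on the first and second machine) are hypothetical.

```python
def johnson_order(a, b):
    """Return a job order given by Johnson's rule for two machines.

    a[j], b[j]: processing times of job j on machine 1 and machine 2.
    """
    jobs = range(len(a))
    # Jobs that are shorter (or equal) on machine 1 go first,
    # by increasing machine-1 time.
    first = sorted((j for j in jobs if a[j] <= b[j]), key=lambda j: a[j])
    # The remaining jobs go last, by decreasing machine-2 time.
    last = sorted((j for j in jobs if a[j] > b[j]),
                  key=lambda j: b[j], reverse=True)
    return first + last


if __name__ == "__main__":
    print(johnson_order([3, 5, 1, 6, 7], [6, 2, 2, 6, 5]))
```

The rule only needs two sorts, so it runs in $O(n \log n)$ time for $n$ jobs.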
  
@@ -464,7 +462,6 @@ CUDA enabled devices use several memory spaces, which have different characteris
  The data access optimization challenge is to find the best mapping between the data structures of the application at hand (with different sizes and access frequencies) and the GPU memory hierarchy (with different sizes and access latencies). For instance, among these memory spaces, global memory is the most plentiful but has the highest access latency. In contrast, shared memory is smaller but has much higher bandwidth and lower latency than global memory.
  
  \subsection{Complexity analysis of the memory usage of the Lower Bound}
-\label{ch8:MemComplex}
  
  In this section, the characteristics of the data structures used by the lower bound function are studied in terms of size and access frequency. For an efficient implementation of the LB, six data structures are required: the matrix $PTM$ of the processing times of the jobs, the matrix of lags $LM$, Johnson's matrix $JM$, the matrix $RM$ of the earliest starting times of jobs, the matrix $QM$ of their lowest latency times and the matrix $MM$ containing the pairs of machines. The complexities of the different data structures are summarized in Table~\ref{ch8:tabMemComplex}, where the columns give, respectively, the name of the data structure, its size and the number of times it is accessed.
  
@@ -505,7 +502,6 @@ To reduce the computation time cost of the term $\min\limits_{(i,j)\in \jmath^2,
  \end{table}
  
  \subsection{Data placement pattern of the Lower Bound on GPU}
-\label{ch8:MemComplex}
  
  This section discusses how best to map the six data structures identified above onto the various memory spaces of the GPU device.