-\begin{figure}
-\centering
-\includegraphics[height=5cm]{Chapters/chapter19/fig/scoo-a.pdf}
-\caption{Performance comparison of SCOO and other GPU formats for each test matrix on a Fermi Tesla C2075 (ECC disabled).}
-\label{fig:scoo-vs-gpu}
-\end{figure}
-
-\subsubsection{Performance comparison to existing GPU formats}
-We compare the performance of our SCOO format to available SpMV implementations on both GPU and CPU. The set of selected test matrices are collected from the University of Florida Sparse Matrix Collection \cite{ch19:matrix-collection}. We have chosen the biggest matrices from different areas that with their corresponding input and output vector can still fit into the 6GB global memory of a C2075 GPU. Table \ref{table:matrices} gives an overview of those matrices.
-
-We compare the SCOO format to the CSR, COO and HYB format of Cusp 0.3.0. Other Cusp formats are not able to run on the large tested matrices that we selected. The results are shown in Figure \ref{fig:scoo-vs-gpu}. The performance are in terms of Gflop/s which is based on the assumption of two flops per non-zero entry of the matrix \cite{ch19:nvidia-spmv, ch19:bellpack}.
-
-The SCOO format achieves a stable performance for different matrices in single-precision mode. In most cases a performance of over 10 Gflop/s can be sustained. For some highly unstructured matrices such as \emph{GL7d19}, \emph{wikipedia-20070206}, \emph{rgg\_n\_2\_24\_s0} and \emph{kron\_g500-logn21} SCOO achieves high speedups ranging from 3 to 6 compared to the best performaning Cusp format.
-
-For most matrices, HYB produces the best performance among the tested Cusp formats. HYB is able to outperform SCOO only for two matrices: \emph{nlpkkt120} and \emph{nlpkkt160}. Both matrices have a similar structure i.e. they consist of consecutive rows that have a very similar number of non-zero coefficients which is suitable to be stored in the ELL section of the HYB format. Moreover the non-zeros are close to each other facilitating coaleasing and cache-friendly access patterns by nature. SCOO is able to outperform COO and CSR for all tested matrices.
-
-In matrix $Relat9$ we observe some patterns but the matrix is still generally unstructured, thus SCOO is able to achieve about 2 times speed up compared to HYB which is the best among tested Cusp formats in this case. The average speedup of SCOO for the tested matrices is 3.0 compared to CSR, 5.02 compared to COO, 2.15 compared to HYB.
-
-We show the visualization of sparse matrices \emph{nlpkkt120}, \emph{Relat9}, \emph{GL7d19} in Figure \ref{fig:mat-str}, \ref{fig:mat-mid}, {fig:mat-unstr} using MatView \cite{ch19:matview}. The white color represents zero entries, gray color represents non-zero entries.