+\begin{table}
+\begin{center}
+\begin{tabular}{|c||c|c|c||c|c|c|}
+\hline
+\multirow{2}{*}{Matrix} & \multicolumn{3}{c||}{GPU version} & \multicolumn{3}{c|}{CPU version} \\ \cline{2-7}
+ & $Time_{comput}$ & $Time_{comm}$ & $Ratio$ & $Time_{comput}$ & $Time_{comm}$ & $Ratio$ \\ \hline \hline
+2cubes\_sphere & 28.440 s & 7.768 s & {\bf 3.661} & 327.109 s & 63.788 s & {\bf 5.128}\\
+ecology2 & 3.652 s & 0.757 s & {\bf 4.823} & 63.632 s & 13.520 s & {\bf 4.707}\\
+finan512 & 7.579 s & 4.569 s & {\bf 1.659} & 74.120 s & 22.505 s & {\bf 3.294}\\
+G3\_circuit & 4.876 s & 8.745 s & {\bf 0.558} & 72.280 s & 28.395 s & {\bf 2.546}\\
+shallow\_water2 & 3.146 s & 0.606 s & {\bf 5.191} & 52.903 s & 11.177 s & {\bf 4.733}\\
+thermal2 & 6.473 s & 4.325 s & {\bf 1.497} & 81.171 s & 20.907 s & {\bf 3.882}\\ \hline \hline
+cage13 & 11.676 s & 7.723 s & {\bf 1.512} & 145.755 s & 46.547 s & {\bf 3.131}\\
+crashbasis & 42.799 s & 29.399 s & {\bf 1.456} & 650.386 s & 203.918 s & {\bf 3.189}\\
+FEM\_3D\_thermal2 & 29.875 s & 8.915 s & {\bf 3.351} & 382.887 s & 93.252 s & {\bf 4.106}\\
+language & 20.991 s & 11.197 s & {\bf 1.875} & 310.679 s & 82.480 s & {\bf 3.767}\\
+poli\_large & 13.817 s & 102.760 s & {\bf 0.134} & 197.508 s & 151.672 s & {\bf 1.302}\\
+torso3 & 57.469 s & 16.828 s & {\bf 3.415} & 926.588 s & 242.721 s & {\bf 3.817}\\ \hline
+\end{tabular}
+\caption{Ratios of the computation time over the communication time obtained from the parallel GMRES algorithm using hypergraph partitioning and compressed format for vectors on 12 GPUs and 24 CPUs.}
+\label{tab:12}
+\end{center}
+\end{table}
+
+\begin{figure}
+\centering
+ \includegraphics[width=120mm,keepaspectratio]{weak}
+\caption{Weak scaling of the parallel GMRES algorithm on a GPU cluster.}
+\label{fig:09}
+\end{figure}
+
+\textcolor{red}{\bf Figure~\ref{fig:09} presents the weak scaling of four versions of the parallel GMRES algorithm on a GPU cluster. We fixed the size of a sub-matrix to 5 million of rows per GPU computing node. We used matrices having five bands generated from the symmetric matrix thermal2. This figure shows that the parallel GMRES algorithm in its naive version or using either the compression format for vectors or the hypergraph partitioning is not scalable on a GPU cluster due to the large amount of communications between GPUs. In contrast, we can see that the algorithm using both optimization techniques is fairly scalable. This means in this version the cost of communications is relatively constant regardless the number of computing nodes in the cluster.}
+