\begin{tabular}{r c }
\hline
Grid & 2x16, 4x8, 4x16 and 8x8\\ %\hline
- Network & N2 : bw=1Gbits/s - lat=\np{5E-5} \\ %\hline
+ Network & N2 : bw=1Gbits/s - lat=5.10$^{-5}$ \\ %\hline
Input matrix size & N$_{x}$ x N$_{y}$ x N$_{z}$ =150 x 150 x 150\\ %\hline
- & N$_{x}$ x N$_{y}$ x N$_{z}$ =170 x 170 x 170 \\ \hline
\end{tabular}
\begin{figure} [ht!]
\centering
\includegraphics[width=100mm]{cluster_x_nodes_nx_150_and_nx_170.pdf}
-\caption{Cluster x Nodes NX=150 and NX=170}
+\caption{Cluster x Nodes N$_{x}$=150 and N$_{x}$=170}
%\label{overflow}}
\end{figure}
%\end{wrapfigure}
\begin{tabular}{r c }
\hline
Grid & 2x16, 4x8\\ %\hline
- Network & N1 : bw=10Gbs-lat=8E-06 \\ %\hline
- - & N2 : bw=1Gbs-lat=5E-05 \\
- Input matrix size & N$_{x}$ =150 x 150 x 150\\ \hline \\
+ Network & N1 : bw=10Gbits/s - lat=8.10$^{-6}$ \\ %\hline
+ - & N2 : bw=1Gbits/s - lat=5.10$^{-5}$ \\
+ Input matrix size & N$_{x}$ x N$_{y}$ x N$_{z}$ =150 x 150 x 150\\ \hline \\
\end{tabular}
Table 2 : Clusters x Nodes - Networks N1 x N2 \\
%\end{wrapfigure}
The experiments compare the behavior of the algorithms running first on
-speed inter- cluster network (N1) and a less performant network (N2).
-The figure 2 shows that end users will gain to reduce the execution time
+a high-speed inter-cluster network (N1) and then on a less performant network (N2).
+Figure 4 shows that end users can reduce the execution time
for both algorithms by using a grid architecture such as 4x16 or 8x8: the
performance increased by a factor of 2. The results also show that
when the network speed drops, the difference between the execution
\hline
Grid & 2x16\\ %\hline
Network & N1 : bw=1Gbs \\ %\hline
- Input matrix size & N$_{x}$ =150 x 150 x 150\\ \hline\\
+ Input matrix size & N$_{x}$ x N$_{y}$ x N$_{z}$ =150 x 150 x 150\\ \hline\\
\end{tabular}
-
Table 3 : Network latency impact \\
\end{footnotesize}
\end{figure}
-According the results in table and figure 3, degradation of the network
+According to the results in the table and Figure 5, degradation of the network
latency from 8.10$^{-6}$ to 6.10$^{-5}$ implies an absolute increase of
more than 75\% (resp. 82\%) in the execution time of the classical
GMRES (resp. multisplitting) algorithm. In addition, it appears that the
multisplitting method is more tolerant of network latency variation, with
-a less rate increase. Consequently, in the worst case (lat=6.10$^{-5
+a lower rate of increase of the execution time. Consequently, in the worst case (lat=6.10$^{-5
}$), the execution time for GMRES is almost double that of
the multisplitting method, even though the performance was of the same order
of magnitude with a latency of 8.10$^{-6}$.