In the scope of this paper, our first objective is to analyze when the Krylov
Multisplitting method performs better than the classical GMRES
-method. With an iterative method, better performances mean a smaller number of
-iterations and execution time before reaching the convergence. For a systematic
-study, the experiments should figure out that, for various grid parameters
-values, the simulator will confirm the targeted outcomes, particularly for poor
-and slow networks, focusing on the impact on the communication performance on
-the chosen class of algorithm.
+method. With a synchronous iterative method, better performance means a
+smaller number of iterations and a shorter execution time before reaching
+convergence. For a systematic study, the experiments should show that, for
+various grid parameter values, the simulator confirms the targeted outcomes,
+particularly for poor and slow networks, focusing on the impact of the
+communication performance on the chosen class of algorithms.
The following paragraphs present the test conditions, the output results
and our comments.\\
-\subsubsection{Execution of the the algorithms on various computational grid
-architecture and scaling up the input matrix size}
+\subsubsection{Execution of the algorithms on various computational grid
+architectures and scaling up the input matrix size}
\ \\
% environment
In this section, we analyze the performance of the algorithms running on various
-grid configuration (2x16, 4x8, 4x16 and 8x8). First, the results in Figure~\ref{fig:01}
-show for all grid configuration the non-variation of the number of iterations of
-classical GMRES for a given input matrix size; it is not the case for the
+grid configurations (2x16, 4x8, 4x16 and 8x8). First, the results in Figure~\ref{fig:01}
+show that, for all grid configurations, the number of iterations of
+classical GMRES does not vary for a given input matrix size; this is not the case for the
multisplitting method.
\RC{CE, careful: you did not put labels in your figures, so it is a mess; I am adding some, but please check...}
-The execution time difference between the two algorithms is important when
-comparing between different grid architectures, even with the same number of
-processors (like 2x16 and 4x8 = 32 processors for example). The
-experiment concludes the low sensitivity of the multisplitting method
-(compared with the classical GMRES) when scaling up the number of the processors in the grid: in average, the GMRES (resp. Multisplitting) algorithm performs 40\% better (resp. 48\%) less when running from 2x16=32 to 8x8=64 processors.
+The difference in execution time between the two algorithms is significant
+across different grid architectures, even with the same number of processors
+(for example, 2x16 and 4x8). We can observe the low sensitivity of the Krylov
+multisplitting method (compared with the classical GMRES) when scaling up the
+number of processors in the grid: on average, the GMRES (resp. Krylov
+multisplitting) algorithm performs $40\%$ (resp. $48\%$) better when going from
+2x16=32 to 8x8=64 processors.
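+Assuming that $T_{p}$ denotes the execution time measured on $p$ processors (a
+notation introduced here only for illustration), one possible reading of these
+percentages is a relative reduction of the execution time:
+\[
+G_{32 \rightarrow 64} = \frac{T_{32} - T_{64}}{T_{32}} \times 100\%,
+\]
+so that, under this reading, a gain of $40\%$ would correspond to
+$T_{64} = 0.6 \, T_{32}$.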
-\textit{\\3.b Running on two different speed cluster inter-networks\\}
+\subsubsection{Running on two different inter-cluster network speeds}
+\ \\
-% environment
+\begin{figure} [ht!]
\begin{tabular}{r c }
Grid & 2x16, 4x8\\ %\hline
Network & N1 : bw=10Gbs-lat=8.10$^{-6}$ \\ %\hline
- & N2 : bw=1Gbs-lat=5.10$^{-5}$ \\
- Input matrix size & N$_{x}$ x N$_{y}$ x N$_{z}$ =150 x 150 x 150\\ \hline \\
+ Input matrix size & N$_{x}$ x N$_{y}$ x N$_{z}$ =150 x 150 x 150\\ \hline
-Table 2 : Clusters x Nodes - Networks N1 x N2 \\
- \end{footnotesize}
+\caption{Clusters x Nodes - Networks N1 x N2}
\caption{Cluster x Nodes N1 x N2}
-The experiments compare the behavior of the algorithms running first on
-a speed inter- cluster network (N1) and also on a less performant network (N2).
-Figure 4 shows that end users will gain to reduce the execution time
-for both algorithms in using a grid architecture like 4x16 or 8x8: the
-performance was increased in a factor of 2. The results depict also that
-when the network speed drops down (12.5\%), the difference between the execution
-times can reach more than 25\%.
+These experiments compare the behavior of the algorithms running first on a
+fast inter-cluster network (N1) and then on a slower network (N2).
+Figure~\ref{fig:02} shows that end users can reduce the execution time of
+both algorithms by using a grid architecture like 4x16 or 8x8: the
+performance increases by a factor of $2$. The results also show that when
+the network speed drops (12.5\%), the difference between the execution
+times can reach more than 25\%. \RC{this is unclear: the difference between what and what?}
+\DL{unclear}
-\textit{\\3.c Network latency impacts on performance\\}
-% environment
+\subsubsection{Network latency impacts on performance}
+\ \\
+\begin{figure} [ht!]
\begin{tabular}{r c }
Grid & 2x16\\ %\hline
Network & N1 : bw=1Gbs \\ %\hline
- Input matrix size & N$_{x}$ x N$_{y}$ x N$_{z}$ =150 x 150 x 150\\ \hline\\
+ Input matrix size & N$_{x}$ x N$_{y}$ x N$_{z}$ =150 x 150 x 150\\ \hline
-Table 3 : Network latency impact \\
+\caption{Network latency impacts}
\begin{figure} [ht!]
-\caption{Network latency impact on execution time}
+\caption{Network latency impacts on execution time}
-According the results in figure 5, degradation of the network
-latency from 8.10$^{-6}$ to 6.10$^{-5}$ implies an absolute time
-increase more than 75\% (resp. 82\%) of the execution for the classical
-GMRES (resp. multisplitting) algorithm. In addition, it appears that the
-multisplitting method tolerates more the network latency variation with
-a less rate increase of the execution time. Consequently, in the worst case (lat=6.10$^{-5
-}$), the execution time for GMRES is almost the double of the time for
-the multisplitting, even though, the performance was on the same order
-of magnitude with a latency of 8.10$^{-6}$.
+According to the results of Figure~\ref{fig:03}, a degradation of the network
+latency from $8 \times 10^{-6}$ to $6 \times 10^{-5}$ implies an absolute
+increase of more than $75\%$ (resp. $82\%$) of the execution time for the
+classical GMRES (resp. Krylov multisplitting) algorithm. In addition, it appears
+that the Krylov multisplitting method better tolerates the network latency
+variation, with a lower rate of increase of the execution time. Consequently, in
+the worst case ($lat=6 \times 10^{-5}$), the execution time for GMRES is almost
+double that of the Krylov multisplitting, even though the performance was of the
+same order of magnitude with a latency of $8 \times 10^{-6}$.
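+As an illustration only, assuming the usual linear communication cost model
+(an assumption made here for the sake of interpretation, not necessarily the
+model used by the simulator), the time needed to send a message of $m$ bytes
+can be written as
+\[
+T_{comm}(m) = lat + \frac{m}{bw},
+\]
+where $lat$ is the network latency and $bw$ the network bandwidth. Under this
+model, the term $lat$ is paid once per message whatever its size, so it
+dominates for small messages, whereas the term $m/bw$ dominates for large ones.
+A method that exchanges fewer messages is therefore expected to be less
+affected by a latency increase.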
-\textit{\\3.d Network bandwidth impacts on performance\\}
-% environment
+\subsubsection{Network bandwidth impacts on performance}
+\ \\
+\begin{figure} [ht!]
\begin{tabular}{r c }
Grid & 2x16\\ %\hline
Network & N1 : bw=1Gbs - lat=5.10$^{-5}$ \\ %\hline
Input matrix size & N$_{x}$ x N$_{y}$ x N$_{z}$ =150 x 150 x 150\\ \hline \\
-Table 4 : Network bandwidth impact \\
+\caption{Network bandwidth impacts}
\begin{figure} [ht!]
-\caption{Network bandwith impact on execution time}
+\caption{Network bandwidth impacts on execution time}
+Increasing the network bandwidth improves the performance of both algorithms
+by reducing the execution time (see Figure~\ref{fig:04}). However, in this
+case, the Krylov multisplitting method performs better over the considered
+bandwidth interval, with a gain of 40\%, compared with only around 24\% for
+classical GMRES.
-The results of increasing the network bandwidth show the improvement
-of the performance for both of the two algorithms by reducing the execution time (Figure 6). However, and again in this case, the multisplitting method presents a better performance in the considered bandwidth interval with a gain of 40\% which is only around 24\% for classical GMRES.
-\textit{\\3.e Input matrix size impacts on performance\\}
-% environment
+\subsubsection{Input matrix size impacts on performance}
+\ \\
+\begin{figure} [ht!]
\begin{tabular}{r c }
Grid & 4x8\\ %\hline
- Network & N2 : bw=1Gbs - lat=5.10$^{-5}$ \\ %\hline
- Input matrix size & N$_{x}$ = From 40 to 200\\ \hline \\
+ Network & N2 : bw=1Gbs - lat=5.10$^{-5}$ \\
+ Input matrix size & N$_{x}$ = From 40 to 200\\ \hline
-Table 5 : Input matrix size impact\\
+\caption{Input matrix size impact}
\begin{figure} [ht!]
-\caption{Pb size impact on execution time}
+\caption{Problem size impact on execution time}
-In this experimentation, the input matrix size has been set from
-N$_{x}$ = N$_{y}$ = N$_{z}$ = 40 to 200 side elements that is from 40$^{3}$ = 64.000 to
-200$^{3}$ = 8.000.000 points. Obviously, as shown in the figure 7,
-the execution time for the two algorithms convergence increases with the
-iinput matrix size. But the interesting results here direct on (i) the
-drastic increase (300 times) of the number of iterations needed before
-the convergence for the classical GMRES algorithm when the matrix size
-go beyond N$_{x}$=150; (ii) the classical GMRES execution time also almost
-the double from N$_{x}$=140 compared with the convergence time of the
-multisplitting method. These findings may help a lot end users to setup
-the best and the optimal targeted environment for the application
-deployment when focusing on the problem size scale up. Note that the
-same test has been done with the grid 2x16 getting the same conclusion.
-\textit{\\3.f CPU Power impact on performance\\}
+In these experiments, the input matrix size has been varied from N$_{x}$ =
+N$_{y}$ = N$_{z}$ = 40 to 200 side elements, that is, from 40$^{3}$ = 64,000 to
+200$^{3}$ = 8,000,000 points. As expected, and as shown in Figure~\ref{fig:05},
+the execution time of both algorithms increases with the input matrix size.
+The interesting results are:
+ \item the drastic increase (300 times) \RC{I do not see this on the figure}
+of the number of iterations needed to reach convergence for the classical
+GMRES algorithm when the matrix size goes beyond N$_{x}$=150;
+\item the classical GMRES execution time is almost double that of the Krylov
+ multisplitting method for N$_{x}$=140.
+These findings may greatly help end users to set up the best targeted
+environment for the application deployment when scaling up the problem
+size. It should be noted that the same test has been done with the 2x16
+grid, leading to the same conclusion.
-% environment
+\subsubsection{CPU Power impact on performance}
+\begin{figure} [ht!]
\begin{tabular}{r c }
Grid & 2x16\\ %\hline
Network & N2 : bw=1Gbs - lat=5.10$^{-5}$ \\ %\hline
Input matrix size & N$_{x}$ = 150 x 150 x 150\\ \hline
-Table 6 : CPU Power impact \\
+\caption{CPU Power impact}
\begin{figure} [ht!]
\caption{CPU Power impact on execution time}
-Using the Simgrid simulator flexibility, we have tried to determine the
-impact on the algorithms performance in varying the CPU power of the
-clusters nodes from 1 to 19 GFlops. The outputs depicted in the figure 6
-confirm the performance gain, around 95\% for both of the two methods,
-after adding more powerful CPU.
-\subsection{Comparing GMRES in native synchronous mode and
-Multisplitting algorithms in asynchronous mode}
-The previous paragraphs put in evidence the interests to simulate the
-behavior of the application before any deployment in a real environment.
-We have focused the study on analyzing the performance in varying the
-key factors impacting the results. The study compares
-the performance of the two proposed algorithms both in \textit{synchronous mode
-}. In this section, following the same previous methodology, the goal is to
-demonstrate the efficiency of the multisplitting method in \textit{
-asynchronous mode} compared with the classical GMRES staying in
-\textit{synchronous mode}.
+Using the flexibility of the Simgrid simulator, we have tried to determine the
+impact on the performance of the algorithms of varying the CPU power of the
+cluster nodes from 1 to 19 GFlops. The outputs depicted in
+Figure~\ref{fig:06} confirm the performance gain, around 95\% for both
+methods, after adding more powerful CPUs.
+\subsection{Comparing GMRES in native synchronous mode and the multisplitting algorithm in asynchronous mode}
+The previous paragraphs highlight the interest of simulating the behavior
+of the application before any deployment in a real environment. We have focused
+the study on analyzing the performance while varying the key factors impacting
+the results. The study compares the performance of the two proposed algorithms,
+both in \textit{synchronous mode}. In this section, following the same
+methodology, the goal is to demonstrate the efficiency of the multisplitting
+method in \textit{asynchronous mode} compared with the classical GMRES staying
+in \textit{synchronous mode}.
Note that the interest of using the asynchronous mode for data exchange
is mainly, as opposed to the synchronous mode, the non-wait aspects of