In the scope of this paper, our first objective is to analyze when the Krylov
Multisplitting method performs better than the classical GMRES
-method. With an iterative method, better performances mean a smaller number of
-iterations and execution time before reaching the convergence. For a systematic
-study, the experiments should figure out that, for various grid parameters
-values, the simulator will confirm the targeted outcomes, particularly for poor
-and slow networks, focusing on the impact on the communication performance on
-the chosen class of algorithm.
+method. With a synchronous iterative method, better performance means
+fewer iterations and a shorter execution time before reaching convergence.
+For a systematic study, the experiments should establish whether, for various
+grid parameter values, the simulator confirms the targeted outcomes,
+particularly for poor and slow networks, focusing on the impact of
+communication performance on the chosen class of algorithms.
The following paragraphs present the test conditions, the output results
and our comments.\\
-\subsubsection{Execution of the the algorithms on various computational grid
-architecture and scaling up the input matrix size}
+\subsubsection{Execution of the algorithms on various computational grid
+architectures and scaling up the input matrix size}
\ \\
% environment
In this section, we analyze the performance of the algorithms running on various
-grid configuration (2x16, 4x8, 4x16 and 8x8). First, the results in Figure~\ref{fig:01}
-show for all grid configuration the non-variation of the number of iterations of
-classical GMRES for a given input matrix size; it is not the case for the
+grid configurations (2x16, 4x8, 4x16 and 8x8). First, the results in Figure~\ref{fig:01}
+show that, for all grid configurations, the number of iterations of
+classical GMRES does not vary for a given input matrix size; this is not the case for the
multisplitting method.
\RC{CE careful, you did not put any labels in your figures, so it is a mess; I am adding some, but please check...}
and 4x8). We can observe the low sensitivity of the Krylov multisplitting method
(compared with the classical GMRES) when scaling up the number of processors
in the grid: on average, the GMRES (resp. Multisplitting) algorithm performs
-40\% better (resp. 48\%) less when running from 2x16=32 to 8x8=64 processors.
+$40\%$ better (resp. $48\%$) when running from 2x16=32 to 8x8=64 processors.
-\subsubsection{Running on two different speed cluster inter-networks}
+\subsubsection{Running on two different inter-cluster network speeds}
\ \\
\begin{figure} [ht!]
Grid & 2x16, 4x8\\ %\hline
Network & N1 : bw=10Gbs-lat=$8\times10^{-6}$ \\ %\hline
- & N2 : bw=1Gbs-lat=5.10$^{-5}$ \\
- Input matrix size & N$_{x}$ x N$_{y}$ x N$_{z}$ =150 x 150 x 150\\ \hline
+ Input matrix size & N$_{x}$ x N$_{y}$ x N$_{z}$ =150 x 150 x 150\\ \hline
\end{tabular}
\caption{Clusters x Nodes - Networks N1 x N2}
\end{center}
speed inter-cluster network (N1) and also on a less performant network (N2).
Figure~\ref{fig:02} shows that end users can reduce the execution time
of both algorithms by using a grid architecture like 4x16 or 8x8: the
-performance was increased in a factor of 2. The results depict also that when
+performance was increased by a factor of $2$. The results also show that when
the network speed drops (12.5\%), the difference between the execution
times can reach more than 25\%. \RC{this is unclear: the difference between what and what?}
+\DL{unclear}
\subsubsection{Network latency impacts on performance}
\ \\
Network & N1 : bw=1Gbs \\ %\hline
Input matrix size & N$_{x}$ x N$_{y}$ x N$_{z}$ =150 x 150 x 150\\ \hline
\end{tabular}
-\caption{Network latency impact}
+\caption{Network latency impacts}
\end{figure}
\begin{figure} [ht!]
\centering
\includegraphics[width=100mm]{network_latency_impact_on_execution_time.pdf}
-\caption{Network latency impact on execution time}
+\caption{Network latency impacts on execution time}
\label{fig:03}
\end{figure}
-According the results in Figure~\ref{fig:03}, a degradation of the network
-latency from 8.10$^{-6}$ to 6.10$^{-5}$ implies an absolute time increase more
-than 75\% (resp. 82\%) of the execution for the classical GMRES (resp. Krylov
+According to the results of Figure~\ref{fig:03}, a degradation of the network
+latency from $8\times10^{-6}$ to $6\times10^{-5}$ implies an absolute increase of more
+than $75\%$ (resp. $82\%$) in the execution time of the classical GMRES (resp. Krylov
multisplitting) algorithm. In addition, it appears that the Krylov
multisplitting method is more tolerant of network latency variation, with a
smaller rate of increase in execution time. Consequently, in the worst case
-(lat=6.10$^{-5 }$), the execution time for GMRES is almost the double than the
+(latency of $6\times10^{-5}$), the execution time for GMRES is almost twice that
of the Krylov multisplitting, even though the performance was of the same
-order of magnitude with a latency of 8.10$^{-6}$.
+order of magnitude with a latency of $8\times10^{-6}$.
\subsubsection{Network bandwidth impacts on performance}
\ \\
Network & N1 : bw=1Gbs - lat=$5\times10^{-5}$ \\ %\hline
Input matrix size & N$_{x}$ x N$_{y}$ x N$_{z}$ =150 x 150 x 150\\ \hline \\
\end{tabular}
-\caption{Network bandwidth impact}
+\caption{Network bandwidth impacts}
\end{figure}
\begin{figure} [ht!]
\centering
\includegraphics[width=100mm]{network_bandwith_impact_on_execution_time.pdf}
-\caption{Network bandwith impact on execution time}
+\caption{Network bandwidth impacts on execution time}
\label{fig:04}
\end{figure}
-
-
Increasing the network bandwidth improves the
performance of both algorithms by reducing the execution time (see
Figure~\ref{fig:04}). However, in this case, the Krylov multisplitting method
\begin{tabular}{r c }
\hline
Grid & 4x8\\ %\hline
- Network & N2 : bw=1Gbs - lat=5.10$^{-5}$ \\
+ Network & N2 : bw=1Gbs - lat=$5\times10^{-5}$ \\
Input matrix size & N$_{x}$ = From 40 to 200\\ \hline
\end{tabular}
\caption{Input matrix size impact}
\subsection{Comparing GMRES in native synchronous mode and the multisplitting algorithm in asynchronous mode}
The previous paragraphs highlight the value of simulating the behavior
- of the application before any deployment in a real environment. We have focused
- the study on analyzing the performance in varying the key factors impacting the
- results. The study compares the performance of the two proposed algorithms both
- in \textit{synchronous mode }. In this section, following the same previous
- methodology, the goal is to demonstrate the efficiency of the multisplitting
- method in \textit{ asynchronous mode} compared with the classical GMRES staying
- in \textit{synchronous mode}.
-
- Note that the interest of using the asynchronous mode for data exchange
- is mainly, in opposite of the synchronous mode, the non-wait aspects of
- the current computation after a communication operation like sending
- some data between nodes. Each processor can continue their local
- calculation without waiting for the end of the communication. Thus, the
- asynchronous may theoretically reduce the overall execution time and can
- improve the algorithm performance.
-
- As stated supra, Simgrid simulator tool has been used to prove the
- efficiency of the multisplitting in asynchronous mode and to find the
- best combination of the grid resources (CPU, Network, input matrix size,
- \ldots ) to get the highest \textit{"relative gain"} (exec\_time$_{GMRES}$ / exec\_time$_{multisplitting}$) in comparison with the classical GMRES time.
+ of the application before any deployment in a real environment. In this
+ section, following the same methodology, our goal is to compare the
+ efficiency of the multisplitting method in \textit{asynchronous mode} with the
+ classical GMRES in \textit{synchronous mode}.
+
+ The benefit of using an asynchronous algorithm is that it removes
+ synchronization. With geographically distant clusters, this may be essential.
+ In this case, each processor can compute its iterations freely without any
+ synchronization with the other processors. Thus, the asynchronous mode may
+ theoretically reduce the overall execution time and improve the algorithm
+ performance.
+
+ \RC{the following sentence is odd, I do not understand why it comes here}
+ As stated before, the Simgrid simulator tool has been successfully used to show
+ the efficiency of the multisplitting in asynchronous mode and to find the best
+ combination of the grid resources (CPU, Network, input matrix size, \ldots) to
+ get the highest \textit{``relative gain''} (exec\_time$_{GMRES}$ /
+ exec\_time$_{multisplitting}$) in comparison with the classical GMRES time.
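+ As an illustrative restatement only, writing $T_{\mathrm{GMRES}}$ and
+ $T_{\mathrm{multi}}$ (symbols introduced here for convenience, they are an
+ assumed notation and not part of the experiments) for the execution times of
+ the classical GMRES and of the multisplitting method, the relative gain can
+ be written as
+ \[
+   G = \frac{T_{\mathrm{GMRES}}}{T_{\mathrm{multi}}},
+ \]
+ so that $G > 1$ corresponds to the multisplitting method being faster than
+ the classical GMRES.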
 The test conditions are summarized in the table below: \\
- % environment
- \begin{footnotesize}
+ \begin{figure} [ht!]
+ \centering
\begin{tabular}{r c }
\hline
Grid & 2x50 totaling 100 processors\\ %\hline
Input matrix size & N$_{x}$ = From 62 to 150\\ %\hline
Residual error precision & 10$^{-5}$ to 10$^{-9}$\\ \hline \\
\end{tabular}
- \end{footnotesize}
+ \end{figure}
- Again, comprehensive and extensive tests have been conducted varying the
- CPU power and the network parameters (bandwidth and latency) in the
- simulator tool with different problem size. The relative gains greater
- than 1 between the two algorithms have been captured after each step of
- the test. Table 7 below has recorded the best grid configurations
- allowing the multisplitting method execution time more performant 2.5 times than
- the classical GMRES execution and convergence time. The experimentation has demonstrated the relative multisplitting algorithm tolerance when using a low speed network that we encounter usually with distant clusters thru the internet.
+ Again, comprehensive and extensive tests have been conducted with different
+ parameters, such as the CPU power and the network parameters (bandwidth and
+ latency) in the simulator tool, and with different problem sizes. The relative gains greater
+ than 1 between the two algorithms have been captured after each step of the
+ test. Figure~\ref{table:01} reports the best grid configurations
+ allowing the multisplitting method to be more than 2.5 times faster than the
+ classical GMRES. These experiments also show the relative tolerance of the
+ multisplitting algorithm when using a low speed network, as usually observed with
+ geographically distant clusters through the internet.
% use the same column width for the following three tables
\newlength{\mytablew}\settowidth{\mytablew}{\footnotesize\np{E-11}}
\end{tabular}}
- \begin{table}[!t]
- \centering
+ \begin{figure}[!t]
+ \centering
+ %\begin{table}
% \caption{Relative gain of the multisplitting algorithm compared with the classical GMRES}
% \label{"Table 7"}
- Table 7. Relative gain of the multisplitting algorithm compared with
- the classical GMRES \\
-
- \begin{mytable}{11}
+ \begin{mytable}{11}
\hline
bandwidth (Mbit/s)
& 5 & 5 & 5 & 5 & 5 & 50 & 50 & 50 & 50 & 50 \\
& 2.52 & 2.55 & 2.52 & 2.57 & 2.54 & 2.53 & 2.51 & 2.58 & 2.55 & 2.54 \\
\hline
\end{mytable}
- \end{table}
+ %\end{table}
+ \caption{Relative gain of the multisplitting algorithm compared with the classical GMRES}
+ \label{table:01}
+ \end{figure}
+
\section{Conclusion}
CONCLUSION