X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/rce2015.git/blobdiff_plain/b0ab45cb63050774fc164efad1e58659b71acfa6..f1ca3116c910d634d5282b5b3e4dc929cae46560:/paper.tex

diff --git a/paper.tex b/paper.tex
index 35ab88f..1391f0f 100644
--- a/paper.tex
+++ b/paper.tex
@@ -588,6 +588,7 @@ efficient for distributed systems with high latency networks.
 \includegraphics[width=100mm]{cluster_x_nodes_n1_x_n2.pdf}
 \caption{Various grid configurations with networks $N1$ vs. $N2$}
 \LZK{CE, replace the decimal ``,'' with a ``.''}
+\RCE{ok}
 \label{fig:02}
 \end{figure}
 
@@ -639,7 +640,7 @@ both GMRES and Krylov two-stage algorithms is in accordance with
 larger real executions on large scale supercomputers~\cite{couturier15}.
 
 
-\subsection{Comparing GMRES in native synchronous mode and the multisplitting algorithm in asynchronous mode}
+\subsection{Comparison between synchronous GMRES and asynchronous two-stage multisplitting algorithms}
 
 The previous paragraphs put in evidence the interests to simulate the behavior
 of the application before any deployment in a real environment. In this
@@ -654,42 +655,34 @@ synchronization with the other processors. Thus, the asynchronous may
 theoretically reduce the overall execution time and can improve the algorithm
 performance.
 
-In this section, the Simgrid simulator is used to compare the behavior of the
-multisplitting in asynchronous mode with GMRES in synchronous mode. Several
-benchmarks have been performed with various combination of the grid resources
-(CPU, Network, input matrix size, \ldots ). The test conditions are summarized
-in Table~\ref{tab:07}. In order to compare the execution times, this table
+In this section, the SimGrid simulator is used to compare the behavior of the
+two-stage algorithm in asynchronous mode with GMRES in synchronous mode. Several
+benchmarks have been performed with various combinations of the grid resources
+(CPU, Network, matrix size, \ldots). The test conditions are summarized
+in Table~\ref{tab:02}. In order to compare the execution times, Table~\ref{tab:03}
 reports the relative gain between both algorithms. It is defined by the ratio
 between the execution time of GMRES and the execution time of the
-multisplitting. The ratio is greater than one because the asynchronous
+multisplitting.
+\LZK{Which table reports the relative gains?? Surely not Table II!!}
+\RCE{Table III with the new numbering}
+The ratio is greater than one because the asynchronous
 multisplitting version is faster than GMRES.
 
-
-
-\begin{table} [htbp]
+\begin{table}[htbp]
 \centering
-\begin{tabular}{r c }
+\begin{tabular}{ll}
 \hline
- Grid Architecture & 2 $\times$ 50 totaling 100 processors\\ %\hline
- Processors Power & 1 GFlops to 1.5 GFlops\\
- Intra-Network & bw=1.25 Gbits - lat=5.10$^{-5}$ \\ %\hline
- Inter-Network & bw=5 Mbits - lat=2.10$^{-2}$\\
- Input matrix size & $N_{x}$ = From 62 to 150\\ %\hline
- Residual error precision & 10$^{-5}$ to 10$^{-9}$\\ \hline \\
+ Grid architecture & 2$\times$50 totaling 100 processors\\
+ Processors Power & 1 GFlops to 1.5 GFlops \\
+ \multirow{2}{*}{Network inter-clusters} & $bw$=1.25 Gbits, $lat=50\mu$s \\
+ & $bw$=5 Mbits, $lat=20ms$s\\
+ Matrix size & from $62^3$ to $150^3$\\
+ Residual error precision & $10^{-5}$ to $10^{-9}$\\ \hline \\
 \end{tabular}
-\caption{Test conditions: GMRES in synchronous mode vs Krylov Multisplitting in asynchronous mode}
-\label{tab:07}
+\caption{Test conditions: GMRES in synchronous mode vs. Krylov two-stage in asynchronous mode}
+\label{tab:02}
 \end{table}
 
-Again, comprehensive and extensive tests have been conducted with different
-parameters as the CPU power, the network parameters (bandwidth and latency)
-and with different problem size. The relative gains greater than $1$ between the
-two algorithms have been captured after each step of the test. In
-Table~\ref{tab:08} are reported the best grid configurations allowing
-the multisplitting method to be more than $2.5$ times faster than the
-classical GMRES. These experiments also show the relative tolerance of the
-multisplitting algorithm when using a low speed network as usually observed with
-geographically distant clusters through the internet.
 
 % use the same column width for the following three tables
 \newlength{\mytablew}\settowidth{\mytablew}{\footnotesize\np{E-11}}
@@ -727,15 +720,24 @@ geographically distant clusters through the internet.
 \hline
 \end{mytable}
 %\end{table}
- \caption{Relative gain of the multisplitting algorithm compared with the classical GMRES}
- \label{tab:08}
+ \caption{Relative gains of the two-stage multisplitting algorithm compared with the classical GMRES}
+ \label{tab:03}
 \end{table}
 
+Again, comprehensive and extensive tests have been conducted with different
+parameters as the CPU power, the network parameters (bandwidth and latency)
+and with different problem size. The relative gains greater than $1$ between the
+two algorithms have been captured after each step of the test. In
+Table~\ref{tab:08} are reported the best grid configurations allowing
+the two-stage multisplitting algorithm to be more than $2.5$ times faster than the
+classical GMRES. These experiments also show the relative tolerance of the
+multisplitting algorithm when using a low speed network as usually observed with
+geographically distant clusters through the internet.
 
-\section{Conclusion}
+\section{Conclusion}
 
 In this paper we have presented the simulation of the execution of three
-different parallel solvers on some multi-core architectures. We have show that
+different parallel solvers on some multi-core architectures. We have shown that
 the SimGrid toolkit is an interesting simulation tool that has allowed us to
 determine which method to choose given a specified multi-core architecture.
 Moreover the simulated results are in accordance (i.e. with the same order of
@@ -757,7 +759,7 @@ converge and so to very different execution times.
 In future works, we plan to investigate how to simulate the behavior of really
 large scale applications. For example, if we are interested to simulate the
 execution of the solvers of this paper with thousand or even dozens of thousands
-or core, it is not possible to do that with SimGrid. In fact, this tool will
+of cores, it is not possible to do that with SimGrid. In fact, this tool will
 make the real computation. So we plan to focus our research on that problematic.