X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/rce2015.git/blobdiff_plain/96e726f9524ffdb8eaead14a943d80870ea1da65..538afc25a7a90630d2f90891d0a4d700bfe3460f:/paper.tex?ds=sidebyside diff --git a/paper.tex b/paper.tex index c3cfdbb..a2e23a2 100644 --- a/paper.tex +++ b/paper.tex @@ -442,8 +442,6 @@ In this section, experiments for both multisplitting algorithms are reported. Fi \subsection{The 3D Poisson problem} \label{3dpoisson} - - We use our two-stage algorithms to solve the well-known Poisson problem $\nabla^2\phi=f$~\cite{Polyanin01}. In three-dimensional Cartesian coordinates in $\mathbb{R}^3$, the problem takes the following form: \begin{equation} \frac{\partial^2}{\partial x^2}\phi(x,y,z)+\frac{\partial^2}{\partial y^2}\phi(x,y,z)+\frac{\partial^2}{\partial z^2}\phi(x,y,z)=f(x,y,z)\mbox{~in the domain~}\Omega @@ -489,13 +487,7 @@ and on the other hand the execution time and the number of iterations to reach t simulated in the simulator tool to run the program. The following architectures have been configured in SimGrid : 2$\times$16, 4$\times$8, 4$\times$16, 8$\times$8 and 2$\times$50. The first number represents the number of clusters in the grid and the second number represents -the number of hosts (processors/cores) in each cluster. The network has been -designed to operate with a bandwidth equals to 10Gbits (resp. 1Gbits/s) and a -latency of 8.10$^{-6}$ seconds (resp. 5.10$^{-5}$) for the intra-clusters links -(resp. inter-clusters backbone links). \\ - -\LZK{Il me semble que le bw et lat des deux réseaux varient dans les expés d'une simu à l'autre. On vire la dernière phrase?} -\RC{il me semble qu'on peut laisser ca} +the number of hosts (processors/cores) in each cluster. \\ \textbf{Step 5}: Conduct an extensive and comprehensive testings within these configurations by varying the key parameters, especially @@ -536,60 +528,57 @@ and between distant clusters. This parameter is application dependent. a lower speed. The network between distant clusters might be a bottleneck for the global performance of the application. -\subsection{Comparison of GMRES and Krylov two-stage algorithms in synchronous mode} - -In the scope of this paper, our first objective is to analyze when the Krylov -two-stage method has better performance than the classical GMRES method. With a synchronous iterative method, better performance means a -smaller number of iterations and execution time before reaching the convergence. -In what follows, we will present the test conditions, the output results and our comments. - -%%RAPH : on vire ca, c'est pas clair et pas important -%For a systematic study, the experiments should figure out that, for various -%grid parameters values, the simulator will confirm Multisplitting method better performance compared to classical GMRES, particularly on poor and slow networks. -%\LZK{Pas du tout claire la dernière phrase (For a systematic...)!!} -%\RCE { Reformule autrement} - +\subsection{Comparison between GMRES and two-stage multisplitting algorithms in synchronous mode} +In the scope of this paper, our first objective is to analyze when the synchronous Krylov two-stage method has better performance than the classical GMRES method. With a synchronous iterative method, better performance means a smaller number of iterations and execution time before reaching the convergence. -%\subsubsection{Execution of the algorithms on various computational grid architectures and scaling up the input matrix size} -\subsubsection{Simulations for various grid architectures and scaling-up matrix sizes} -\ \\ -% environment - -\RC{Je ne comprends plus rien CE : pourquoi dans 5.4.1 il y a 2 network et aussi dans 5.4.2. Quelle est la différence? Dans la figure 3 de la section 5.4.1 pourquoi il n'y a pas N1 et N2?} +Table~\ref{tab:01} summarizes the parameters used in the different simulations: the grid architectures, the network of inter-clusters backbone links and the matrix sizes of the 3D Poisson problem. However, for all simulations we fix the network parameters of the intra-clusters links: the bandwidth $bw$=10Gbs and the latency $lat$=8$\times$10$^{-6}$. In what follows, we will present the test conditions, the output results and our comments. \begin{table} [ht!] \begin{center} -\begin{tabular}{ll } - \hline - Grid architecture & 2$\times$16, 4$\times$8, 4$\times$16 and 8$\times$8\\ %\hline - \multirow{2}{*}{Network} & Inter (N2): $bw$=1Gbs, $lat$=5$\times$10$^{-5}$ \\ %\hline - & Intra (N1): $bw$=10Gbs, $lat$=8$\times$10$^{-6}$ \\ - \multirow{2}{*}{Matrix size} & N$_{x}$ $\times$ N$_{y}$ $\times$ N$_{z}$ =150 $\times$ 150 $\times$ 150\\ %\hline - & N$_{x}$ $\times$ N$_{y}$ $\times$ N$_{z}$ =170 $\times$ 170 $\times$ 170 \\ \hline - \end{tabular} -\caption{Test conditions: various grid configurations with the matrix sizes 150$^3$ or 170$^3$} -%\LZK{Ce sont les caractéristiques du réseau intra ou inter clusters? Ce n'est pas précisé...} -%\RCE{oui c est precise} +\begin{tabular}{ll} +\hline +Grid architecture & 2$\times$16, 4$\times$8, 4$\times$16 and 8$\times$8\\ +\multirow{2}{*}{Network inter-clusters} & $N1$: $bw$=1Gbs, $lat$=5$\times$10$^{-5}$ \\ + & $N2$: $bw$=10Gbs, $lat$=8$\times$10$^{-6}$ \\ +\multirow{2}{*}{Matrix size} & $Mat1$: N$_{x}\times$N$_{y}\times$N$_{z}$=150$\times$150$\times$150\\ + & $Mat2$: N$_{x}\times$N$_{y}\times$N$_{z}$=170$\times$170$\times$170 \\ \hline +\end{tabular} +\caption{Parameters for the different simulations} \label{tab:01} \end{center} \end{table} + +\subsubsection{Simulations for various grid architectures and scaling-up matrix sizes} +\ \\ +% environment +In this section, we analyze the simulations conducted on various grid configurations and for different sizes of the 3D Poisson problem. The parameters of the network between clusters is fixed to $N1$ (see Table~\ref{tab:01}. Figure~\ref{fig:01} shows, for all grid configurations and a given matrix size 170$^3$ elements, a non-variation in the number of iterations for the classical GMRES algorithm, which is not the case of the Krylov two-stage algorithm. + + + + + + + + + + + + + + + + + + + + + + + + -In this section, we analyze the simulations conducted on various grid -configurations presented in Table~\ref{tab:01}. It should be noticed that two -networks are considered: N1 is the network between clusters (inter-cluster) and -N2 is the network inside a cluster (intra-cluster). Figure~\ref{fig:01} shows, -for all grid configurations and a given matrix size, a non-variation in the -number of iterations for the classical GMRES algorithm, which is not the case of -the Krylov two-stage algorithm. -%% First, the results in Figure~\ref{fig:01} -%% show for all grid configurations the non-variation of the number of iterations of -%% classical GMRES for a given input matrix size; it is not the case for the -%% multisplitting method. -%\RC{CE attention tu n'as pas mis de label dans tes figures, donc c'est le bordel, j'en mets mais vérifie...} -%\RC{Les légendes ne sont pas explicites...} -%\RCE{Corrige} \begin{figure} [htbp] \begin{center} @@ -644,9 +633,9 @@ the network speed drops down (variation of 12.5\%), the difference between t \begin{figure} [htbp] \centering \includegraphics[width=100mm]{cluster_x_nodes_n1_x_n2.pdf} -\caption{Various grid configurations with networks N1 vs N2 -\AG{\np{8E-6}, \np{5E-6} au lieu de 8E-6, 5E-6}} -\RCE{Corrige} +\caption{Various grid configurations with networks N1 vs N2} +%\AG{\np{8E-6}, \np{5E-6} au lieu de 8E-6, 5E-6}} +%\RCE{Corrige} \label{fig:02} \end{figure} %\end{wrapfigure} @@ -670,18 +659,19 @@ the network speed drops down (variation of 12.5\%), the difference between t \begin{figure} [htbp] \centering \includegraphics[width=100mm]{network_latency_impact_on_execution_time.pdf} -\caption{Network latency impacts on execution time -\AG{\np{E-6}}} +\caption{Network latency impacts on execution time} +%\AG{\np{E-6}}} \label{fig:03} \end{figure} -According to the results of Figure~\ref{fig:03}, a degradation of the network -latency from $8.10^{-6}$ to $6.10^{-5}$ implies an absolute time increase of -more than $75\%$ (resp. $82\%$) of the execution for the classical GMRES -(resp. Krylov multisplitting) algorithm which means that the GMRES seems tolerate more the network latency variation with a less rate increase of the execution time. However, the execution time factor between the two algorithms varies from 2.2 to 1.5 times with a network latency decreasing from $8.10^{-6}$ to $6.10^{-5}$. +In Table~\ref{tab:03}, parameters for the influence of the network latency are +reported. According to the results of Figure~\ref{fig:03}, a degradation of the +network latency from $8.10^{-6}$ to $6.10^{-5}$ implies an absolute time +increase of more than $75\%$ (resp. $82\%$) of the execution for the classical +GMRES (resp. Krylov multisplitting) algorithm. The execution time factor +between the two algorithms varies from 2.2 to 1.5 times with a network latency +decreasing from $8.10^{-6}$ to $6.10^{-5}$. -\RC{Les 2 précédentes phrases me semblent en contradiction....} -\RCE{Reformule} \subsubsection{Network bandwidth impacts on performance} \ \\ @@ -694,8 +684,9 @@ more than $75\%$ (resp. $82\%$) of the execution for the classical GMRES & $lat$= 5.10$^{-5}$ second \\ Input matrix size & $N_{x} \times N_{y} \times N_{z} =150 \times 150 \times 150$\\ \hline \\ \end{tabular} -\caption{Test conditions: Network bandwidth impacts\RC{Qu'est ce qui varie ici? Il n'y a pas de variation dans le tableau}} -\RCE{C est le bw} +\caption{Test conditions: Network bandwidth impacts} +% \RC{Qu'est ce qui varie ici? Il n'y a pas de variation dans le tableau} +%\RCE{C est le bw} \label{tab:04} \end{table} @@ -723,7 +714,7 @@ of $40\%$ which is only around $24\%$ for the classical GMRES. \hline Grid Architecture & 4 $\times$ 8\\ %\hline Inter Network & $bw$=1Gbs - $lat$=5.10$^{-5}$ \\ - Input matrix size & $N_{x} \times N_{y} \times N_{z}$ = From 40$^{3}$ to 200$^{3}$\\ \hline + Input matrix size & $N_{x} \times N_{y} \times N_{z}$ = From 50$^{3}$ to 190$^{3}$\\ \hline \end{tabular} \caption{Test conditions: Input matrix size impacts} \label{tab:05} @@ -737,20 +728,15 @@ of $40\%$ which is only around $24\%$ for the classical GMRES. \label{fig:05} \end{figure} -In these experiments, the input matrix size has been set from $N_{x} = N_{y} -= N_{z} = 40$ to $200$ side elements that is from $40^{3} = 64.000$ to $200^{3} -= 8,000,000$ points. Obviously, as shown in Figure~\ref{fig:05}, the execution -time for both algorithms increases when the input matrix size also increases. -But the interesting results are: -\begin{enumerate} - \item the important increase ($10$ times) of the number of iterations needed to - reach the convergence for the classical GMRES algorithm particularly, when the matrix size - go beyond $N_{x}=150$; \RC{C'est toujours pas clair... ok le nommbre d'itérations est 10 fois plus long mais la suite de la phrase ne veut rien dire} - \RCE{Le nombre d'iterations augmente de 10 fois, cela surtout a partir de N=150} - -\item the classical GMRES execution time is almost the double for $N_{x}=140$ - compared with the Krylov multisplitting method. -\end{enumerate} +In these experiments, the input matrix size has been set from $50^3$ to +$190^3$. Obviously, as shown in Figure~\ref{fig:05}, the execution time for both +algorithms increases when the input matrix size also increases. For all problem +sizes, GMRES is always slower than the Krylov multisplitting. Moreover, for this +benchmark, it seems that the greater the problem size is, the bigger the ratio +between both algorithm execution times is. We can also observ that for some +problem sizes, the Krylov multisplitting convergence varies quite a +lot. Consequently the execution times in that cases also varies. + These findings may help a lot end users to setup the best and the optimal targeted environment for the application deployment when focusing on the problem