X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/rce2015.git/blobdiff_plain/0ff5badea3e5156e9795cf5724f3dffed49ca7b5..2110422a08b1d1879e5dd4435e37b6f372327aa1:/paper.tex?ds=sidebyside
diff --git a/paper.tex b/paper.tex
index 523716f..b9187dd 100644
--- a/paper.tex
+++ b/paper.tex
@@ -495,6 +495,7 @@ latency of 8.10$^{-6}$ seconds (resp. 5.10$^{-5}$) for the intra-clusters links
 (resp. inter-clusters backbone links). \\
 \LZK{It seems to me that the bw and lat of the two networks vary across the experiments from one simulation to another. Should we drop the last sentence?}
+\RC{I think we can leave it as is}
 \textbf{Step 5}: Conduct an extensive and comprehensive testings within these configurations by varying the key parameters, especially
@@ -540,12 +541,15 @@ and between distant clusters. This parameter is application dependent.
 In the scope of this paper, our first objective is to analyze when the Krylov two-stage method has better performance than the classical GMRES method. With a synchronous iterative method, better performance means a smaller number of iterations and execution time before reaching the convergence.
-For a systematic study, the experiments should figure out that, for various
-grid parameters values, the simulator will confirm Multisplitting method better performance compared to classical GMRES, particularly on poor and slow networks.
-\LZK{The last sentence (For a systematic...) is not clear at all!!}
-\RCE { Rephrased differently}
+In what follows, we will present the test conditions, the output results and our comments.
+
+%%RAPH: removing this, it is unclear and not important
+%For a systematic study, the experiments should figure out that, for various
+%grid parameters values, the simulator will confirm Multisplitting method better performance compared to classical GMRES, particularly on poor and slow networks.
+%\LZK{The last sentence (For a systematic...) is not clear at all!!}
+%\RCE { Rephrased differently}
+
-In what follows, we will present the test conditions, the output results and our comments.\\
 %\subsubsection{Execution of the algorithms on various computational grid architectures and scaling up the input matrix size}
 \subsubsection{Simulations for various grid architectures and scaling-up matrix sizes}
@@ -563,33 +567,42 @@ In what follows, we will present the test conditions, the output results and our
 & N$_{x}$ $\times$ N$_{y}$ $\times$ N$_{z}$ =170 $\times$ 170 $\times$ 170 \\ \hline
 \end{tabular}
 \caption{Test conditions: various grid configurations with the matrix sizes 150$^3$ or 170$^3$}
-\LZK{Are these the characteristics of the intra- or inter-cluster network? It is not specified...}
-\RCE{yes, it is specified}
+%\LZK{Are these the characteristics of the intra- or inter-cluster network? It is not specified...}
+%\RCE{yes, it is specified}
 \label{tab:01}
 \end{center}
 \end{table}
-In this section, we analyze the simulations conducted on various grid configurations presented in Table~\ref{tab:01}. Figure~\ref{fig:01} shows, for all grid configurations and a given matrix size, a non-variation in the number of iterations for the classical GMRES algorithm, which is not the case of the Krylov two-stage algorithm.
+In this section, we analyze the simulations conducted on various grid
+configurations presented in Table~\ref{tab:01}. It should be noticed that two
+networks are considered: N1 is the network between clusters (inter-cluster) and
+N2 is the network inside a cluster (intra-cluster). Figure~\ref{fig:01} shows,
+for all grid configurations and a given matrix size, a non-variation in the
+number of iterations for the classical GMRES algorithm, which is not the case of
+the Krylov two-stage algorithm.
 %% First, the results in Figure~\ref{fig:01}
 %% show for all grid configurations the non-variation of the number of iterations of
 %% classical GMRES for a given input matrix size; it is not the case for the
 %% multisplitting method.
-\RC{CE, careful: you did not put labels in your figures, so it is a mess; I am adding some, but please check...}
-\RC{The captions are not explicit...}
-\RCE{Fixed}
+%\RC{CE, careful: you did not put labels in your figures, so it is a mess; I am adding some, but please check...}
+%\RC{The captions are not explicit...}
+%\RCE{Fixed}
 \begin{figure} [ht!]
 \begin{center}
 \includegraphics[width=100mm]{cluster_x_nodes_nx_150_and_nx_170.pdf}
 \end{center}
- \caption{Various grid configurations with the matrix sizes 150$^3$ and 170$^3$
-\AG{Use a period as the decimal separator, not a comma. Same in the other figures.}}
-\LZK{For which problem size are the iteration counts computed? What does the 2 Clusters x 16 Nodes with Nx=150 and Nx=170 at the top of the figure represent?}
-\RCE {Fixed}
+ \caption{Various grid configurations with the matrix sizes 150$^3$ and 170$^3$}
+%\AG{Use a period as the decimal separator, not a comma. Same in the other figures.}
+%\LZK{For which problem size are the iteration counts computed? What does the 2 Clusters x 16 Nodes with Nx=150 and Nx=170 at the top of the figure represent?}
+ %\RCE {Fixed}
+ \RC{Ideally the caption should indicate Pb size=$150^3$ or $170^3$, because for now Nx=150 says nothing about Ny and Nz}
 \label{fig:01}
 \end{figure}
+
+
 The execution times between the two algorithms is significant with different grid architectures, even with the same number of processors (for example, 2 $\times$ 16 and 4 $\times 8$). We can observe a better sensitivity of the Krylov multisplitting method
@@ -617,7 +630,8 @@ $40\%$ better (resp.
$48\%$) when running from 32 (grid 2 $\times$ 16) to 64 pro
 \end{table}
 In this section, the experiments compare the behavior of the algorithms running on a
-speeder inter-cluster network (N2) and also on a less performant network (N1) respectively defined in the test conditions Table~\ref{tab:02}. \RC{This needs to be defined earlier...}
+speeder inter-cluster network (N2) and also on a less performant network (N1) respectively defined in the test conditions Table~\ref{tab:02}.
+%\RC{This needs to be defined earlier...}
 Figure~\ref{fig:02} shows that end users will reduce the execution time for both algorithms when using a grid architecture like 4 $\times$ 16 or 8 $\times$ 8: the reduction factor is around $2$. The results depict also that when the network speed drops down (variation of 12.5\%), the difference between the two Multisplitting algorithms execution times can reach more than 25\%.