From 43651bc69ac69595ccd000528fd530f485ffd7fe Mon Sep 17 00:00:00 2001
From: David Laiymani
Date: Sat, 9 May 2015 09:35:43 +0200
Subject: [PATCH] DL: experiments, continued and concluded
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

---
 paper.tex | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/paper.tex b/paper.tex
index 31eb0e8..de34cb6 100644
--- a/paper.tex
+++ b/paper.tex
@@ -670,7 +670,15 @@ These findings may help a lot end users to setup the best and the optimal target
 \end{figure}
 \subsubsection{CPU power impacts on performances\\}
-Using the SimGrid simulator flexibility, we have tried to determine the impact of the CPU power of the processors in the different clusters on performances of both algorithms. We have varied the CPU power from $1$GFlops to $19$GFlops. The simulation is conducted in a grid of 2$\times$16 processors interconnected by the network $N2$ (see Table~\ref{tab:01}) to solve a 3D Poisson problem of size $150^3$. The results depicted in Figure~\ref{fig:06} confirm the performance gain, about $95\%$ for both algorithms, after improving the CPU power of processors.
+
+Using the flexibility of the SimGrid simulator, we have tried to determine the
+impact of the CPU power of the processors in the different clusters on the
+performance of both algorithms. We have varied the CPU power from $1$~GFlops to
+$19$~GFlops. The simulation is conducted on a grid of $2\times16$ processors
+interconnected by the network $N2$ (see Table~\ref{tab:01}) to solve a 3D
+Poisson problem of size $150^3$. The results depicted in Figure~\ref{fig:06}
+confirm a performance gain of about $95\%$ for both algorithms when the CPU
+power of the processors is increased.
 \begin{figure}[ht]
 \centering
@@ -679,11 +687,12 @@ Using the SimGrid simulator flexibility, we have tried to determine the impact o
 \label{fig:06}
 \end{figure}
 \ \\
+
 To conclude these series of experiments, with SimGrid we have been able to make
 many simulations with many parameters variations. Doing all these experiments
-with a real platform is most of the time not possible. Moreover the behavior of
-both GMRES and Krylov two-stage algorithms is in accordance with larger real
-executions on large scale supercomputers~\cite{couturier15}.
+with a real platform is, most of the time, impossible or very costly. Moreover,
+the behavior of both GMRES and Krylov two-stage algorithms is consistent with
+larger real executions on large-scale supercomputers~\cite{couturier15}.
 \subsection{Comparison between synchronous GMRES and asynchronous two-stage
 multisplitting algorithms}
@@ -696,7 +705,7 @@ classical GMRES in \textit{synchronous mode}.
 The interest of using an asynchronous algorithm is that there is no more
 synchronization. With geographically distant clusters, this may be essential.
-In this case, each processor can compute its iteration freely without any
+In this case, each processor can compute its iterations freely without any
 synchronization with the other processors. Thus, the asynchronous may
 theoretically reduce the overall execution time and can improve the algorithm
 performance.
@@ -705,8 +714,8 @@ In this section, the SimGrid simulator is used to compare the behavior of the
 two-stage algorithm in asynchronous mode with GMRES in synchronous mode.
 Several benchmarks have been performed with various combinations of the grid
 resources (CPU, Network, matrix size, \ldots). The test conditions are summarized
-in Table~\ref{tab:02}. In order to compare the execution times, Table~\ref{tab:03}
+in Table~\ref{tab:02}. In order to compare the execution times, 
Table~\ref{tab:03}
+reports the relative gains between both algorithms. A gain is defined by the ratio
 between the execution time of GMRES and the execution time of the multisplitting.
 \LZK{Which table reports the relative gains?? Surely not Table II!!}
@@ -721,7 +730,7 @@ multisplitting version is faster than GMRES.
 Grid architecture & 2$\times$50 totaling 100 processors\\
 Processors Power & 1 GFlops to 1.5 GFlops \\
 \multirow{2}{*}{Network inter-clusters} & $bw$=1.25 Gbits, $lat=50\mu$s \\
- & $bw$=5 Mbits, $lat=20ms$s\\
+ & $bw$=5 Mbits, $lat=20$ms\\
 Matrix size & from $62^3$ to $150^3$\\
 Residual error precision & $10^{-5}$ to $10^{-9}$\\
 \hline \\
 \end{tabular}
-- 
2.39.5
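As a reading aid for the relative gain defined in the fourth hunk, the ratio can be written out explicitly so that its direction is unambiguous. This is only a sketch: the symbols $T_{\mathrm{GMRES}}$ and $T_{\mathrm{MS}}$ are notation introduced here (not taken from the paper), and the timings in the second display are hypothetical values chosen purely for illustration.

```latex
% Relative gain as described in the text:
% execution time of GMRES divided by execution time of the multisplitting.
\[
  \mathrm{gain} = \frac{T_{\mathrm{GMRES}}}{T_{\mathrm{MS}}}
\]
% Hypothetical timings, for illustration only (not results from the paper):
% T_GMRES = 10 s and T_MS = 4 s.
\[
  \mathrm{gain} = \frac{10\,\mathrm{s}}{4\,\mathrm{s}} = 2.5 > 1
\]
```

With this orientation, a gain above 1 means the multisplitting run finished first, which appears to be the reading the context line "multisplitting version is faster than GMRES" relies on.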