\subsection{The 3D Poisson problem}
\label{3dpoisson}
-
-
We use our two-stage algorithms to solve the well-known Poisson problem $\nabla^2\phi=f$~\cite{Polyanin01}. In three-dimensional Cartesian coordinates in $\mathbb{R}^3$, the problem takes the following form:
\begin{equation}
\frac{\partial^2}{\partial x^2}\phi(x,y,z)+\frac{\partial^2}{\partial y^2}\phi(x,y,z)+\frac{\partial^2}{\partial z^2}\phi(x,y,z)=f(x,y,z)\mbox{~in the domain~}\Omega
simulated in the simulator tool to run the program. The following architectures
have been configured in SimGrid : 2$\times$16, 4$\times$8, 4$\times$16, 8$\times$8 and 2$\times$50. The first number
represents the number of clusters in the grid and the second number represents
-the number of hosts (processors/cores) in each cluster. The network has been
-designed to operate with a bandwidth equals to 10Gbits (resp. 1Gbits/s) and a
-latency of 8.10$^{-6}$ seconds (resp. 5.10$^{-5}$) for the intra-clusters links
-(resp. inter-clusters backbone links). \\
-
-%\LZK{Il me semble que le bw et lat des deux réseaux varient dans les expés d'une simu à l'autre. On vire la dernière phrase?}
-%\RC{il me semble qu'on peut laisser ca}
+the number of hosts (processors/cores) in each cluster. \\
\textbf{Step 5}: Conduct an extensive and comprehensive testings
within these configurations by varying the key parameters, especially
a lower speed. The network between distant clusters might be a bottleneck
for the global performance of the application.
-\subsection{Comparison of GMRES and Krylov two-stage algorithms in synchronous mode}
-In the scope of this paper, our first objective is to analyze when the Krylov
-two-stage method has better performance than the classical GMRES method. With a synchronous iterative method, better performance means a
-smaller number of iterations and execution time before reaching the convergence.
-In what follows, we will present the test conditions, the output results and our comments.
+\subsection{Comparison between GMRES and two-stage multisplitting algorithms in synchronous mode}
+In the scope of this paper, our first objective is to analyze when the synchronous Krylov two-stage method has better performance than the classical GMRES method. With a synchronous iterative method, better performance means a smaller number of iterations and execution time before reaching the convergence.
-%%RAPH : on vire ca, c'est pas clair et pas important
-%For a systematic study, the experiments should figure out that, for various
-%grid parameters values, the simulator will confirm Multisplitting method better performance compared to classical GMRES, particularly on poor and slow networks.
-%\LZK{Pas du tout claire la dernière phrase (For a systematic...)!!}
-%\RCE { Reformule autrement}
-
-
-
-%\subsubsection{Execution of the algorithms on various computational grid architectures and scaling up the input matrix size}
-\subsubsection{Simulations for various grid architectures and scaling-up matrix sizes}
-\ \\
-% environment
-
-\RC{Je ne comprends plus rien CE : pourquoi dans 5.4.1 il y a 2 network et aussi dans 5.4.2. Quelle est la différence? Dans la figure 3 de la section 5.4.1 pourquoi il n'y a pas N1 et N2?}
+Table~\ref{tab:01} summarizes the parameters used in the different simulations: the grid architectures, the network of inter-clusters backbone links and the matrix sizes of the 3D Poisson problem. However, for all simulations we fix the network parameters of the intra-clusters links: the bandwidth $bw$=10Gbs and the latency $lat$=8$\times$10$^{-6}$. In what follows, we will present the test conditions, the output results and our comments.
\begin{table} [ht!]
\begin{center}
-\begin{tabular}{ll }
- \hline
- Grid architecture & 2$\times$16, 4$\times$8, 4$\times$16 and 8$\times$8\\ %\hline
- \multirow{2}{*}{Network} & Inter (N2): $bw$=1Gbs, $lat$=5$\times$10$^{-5}$ \\ %\hline
- & Intra (N1): $bw$=10Gbs, $lat$=8$\times$10$^{-6}$ \\
- \multirow{2}{*}{Matrix size} & N$_{x}$ $\times$ N$_{y}$ $\times$ N$_{z}$ =150 $\times$ 150 $\times$ 150\\ %\hline
- & N$_{x}$ $\times$ N$_{y}$ $\times$ N$_{z}$ =170 $\times$ 170 $\times$ 170 \\ \hline
- \end{tabular}
-\caption{Test conditions: various grid configurations with the matrix sizes 150$^3$ or 170$^3$}
-%\LZK{Ce sont les caractéristiques du réseau intra ou inter clusters? Ce n'est pas précisé...}
-%\RCE{oui c est precise}
+\begin{tabular}{ll}
+\hline
+Grid architecture & 2$\times$16, 4$\times$8, 4$\times$16 and 8$\times$8\\
+\multirow{2}{*}{Network inter-clusters} & $N1$: $bw$=1Gbs, $lat$=5$\times$10$^{-5}$ \\
+ & $N2$: $bw$=10Gbs, $lat$=8$\times$10$^{-6}$ \\
+\multirow{2}{*}{Matrix size} & $Mat1$: N$_{x}\times$N$_{y}\times$N$_{z}$=150$\times$150$\times$150\\
+ & $Mat2$: N$_{x}\times$N$_{y}\times$N$_{z}$=170$\times$170$\times$170 \\ \hline
+\end{tabular}
+\caption{Parameters for the different simulations}
\label{tab:01}
\end{center}
\end{table}
+
+\subsubsection{Simulations for various grid architectures and scaling-up matrix sizes}
+\ \\
+% environment
+In this section, we analyze the simulations conducted on various grid configurations and for different sizes of the 3D Poisson problem. The parameters of the network between clusters is fixed to $N1$ (see Table~\ref{tab:01}. Figure~\ref{fig:01} shows, for all grid configurations and a given matrix size 170$^3$ elements, a non-variation in the number of iterations for the classical GMRES algorithm, which is not the case of the Krylov two-stage algorithm.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-In this section, we analyze the simulations conducted on various grid
-configurations presented in Table~\ref{tab:01}. It should be noticed that two
-networks are considered: N1 is the network between clusters (inter-cluster) and
-N2 is the network inside a cluster (intra-cluster). Figure~\ref{fig:01} shows,
-for all grid configurations and a given matrix size, a non-variation in the
-number of iterations for the classical GMRES algorithm, which is not the case of
-the Krylov two-stage algorithm.
-%% First, the results in Figure~\ref{fig:01}
-%% show for all grid configurations the non-variation of the number of iterations of
-%% classical GMRES for a given input matrix size; it is not the case for the
-%% multisplitting method.
-%\RC{CE attention tu n'as pas mis de label dans tes figures, donc c'est le bordel, j'en mets mais vérifie...}
-%\RC{Les légendes ne sont pas explicites...}
-%\RCE{Corrige}
\begin{figure} [htbp]
\begin{center}
\hline
Grid Architecture & 4 $\times$ 8\\ %\hline
Inter Network & $bw$=1Gbs - $lat$=5.10$^{-5}$ \\
- Input matrix size & $N_{x} \times N_{y} \times N_{z}$ = From 40$^{3}$ to 200$^{3}$\\ \hline
+ Input matrix size & $N_{x} \times N_{y} \times N_{z}$ = From 50$^{3}$ to 190$^{3}$\\ \hline
\end{tabular}
\caption{Test conditions: Input matrix size impacts}
\label{tab:05}
\label{fig:05}
\end{figure}
-In these experiments, the input matrix size has been set from $N_{x} = N_{y}
-= N_{z} = 40$ to $200$ side elements that is from $40^{3} = 64.000$ to $200^{3}
-= 8,000,000$ points. Obviously, as shown in Figure~\ref{fig:05}, the execution
-time for both algorithms increases when the input matrix size also increases.
-But the interesting results are:
-\begin{enumerate}
- \item the important increase ($10$ times) of the number of iterations needed to
- reach the convergence for the classical GMRES algorithm particularly, when the matrix size
- go beyond $N_{x}=150$; \RC{C'est toujours pas clair... ok le nommbre d'itérations est 10 fois plus long mais la suite de la phrase ne veut rien dire}
- \RCE{Le nombre d'iterations augmente de 10 fois, cela surtout a partir de N=150}
-
-\item the classical GMRES execution time is almost the double for $N_{x}=140$
- compared with the Krylov multisplitting method.
-\end{enumerate}
+In these experiments, the input matrix size has been set from $50^3$ to
+$190^3$. Obviously, as shown in Figure~\ref{fig:05}, the execution time for both
+algorithms increases when the input matrix size also increases. For all problem
+sizes, GMRES is always slower than the Krylov multisplitting. Moreover, for this
+benchmark, it seems that the greater the problem size is, the bigger the ratio
+between both algorithm execution times is. We can also observ that for some
+problem sizes, the Krylov multisplitting convergence varies quite a
+lot. Consequently the execution times in that cases also varies.
+
These findings may help a lot end users to setup the best and the optimal
targeted environment for the application deployment when focusing on the problem