\todo[color=blue!10,#1]{\sffamily\textbf{LZK:} #2}\xspace}
\newcommand{\RCE}[2][inline]{%
\todo[color=yellow!10,#1]{\sffamily\textbf{RCE:} #2}\xspace}
+\newcommand{\DL}[2][inline]{%
+ \todo[color=pink!10,#1]{\sffamily\textbf{DL:} #2}\xspace}
\algnewcommand\algorithmicinput{\textbf{Input:}}
\algnewcommand\Input{\item[\algorithmicinput]}
In addition, the following arguments are given to the programs at runtime:
\begin{itemize}
- \item maximum number of inner and outer iterations;
- \item inner and outer precisions;
- \item maximum number of the GMRES restarts in the Arnorldi process;
- \item maximum number of iterations and the tolerance threshold in classical GMRES;
- \item tolerance threshold for outer and inner-iterations;
- \item matrix size (N$_{x}$, N$_{y}$ and N$_{z}$) respectively on $x, y, z$ axis;
- \item matrix diagonal value is fixed to $6.0$ for synchronous Krylov multisplitting experiments and $6.2$ for asynchronous block Jacobi experiments; \RC{CE tu vérifies, je dis ca de tête}
- \item matrix off-diagonal value;
- \item execution mode: synchronous or asynchronous;
- \RCE {C'est ok la liste des arguments du programme mais si Lilia ou toi pouvez preciser pour les arguments pour CGLS ci dessous} \RC{Vu que tu n'as pas fait varier ce paramètre, on peut ne pas en parler}
- \item Size of matrix S;
- \item Maximum number of iterations and tolerance threshold for CGLS.
+ \item maximum number of inner iterations $\MIG$ and outer iterations $\MIM$,
+ \item inner precision $\TOLG$ and outer precision $\TOLM$,
+ \item matrix sizes of the 3D Poisson problem: N$_{x}$, N$_{y}$ and N$_{z}$ along the $x$, $y$ and $z$ axes respectively,
+ \item matrix diagonal value is fixed to $6.0$ for synchronous Krylov multisplitting experiments and $6.2$ for asynchronous block Jacobi experiments, \RC{CE, please check this; I am saying it from memory}
+ \item matrix off-diagonal value is fixed to $-1.0$,
+ \item number of vectors in matrix $S$ (i.e. the value of $s$),
+ \item maximum number of iterations $\MIC$ and precision $\TOLC$ for the CGLS method,
+ \item maximum number of iterations and precision for the classical GMRES method,
+ \item maximum number of restarts for the Arnoldi process in the GMRES method,
+ \item execution mode: synchronous or asynchronous.
\end{itemize}
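+As an illustration of the matrix values listed above, and assuming the standard
+seven-point finite-difference discretization of the 3D Poisson problem (an
+assumption, since the stencil is not spelled out here), the row of the linear
+system associated with an interior grid point $(i,j,k)$ would read
+\[
+6\,u_{i,j,k} - u_{i-1,j,k} - u_{i+1,j,k} - u_{i,j-1,k} - u_{i,j+1,k}
+- u_{i,j,k-1} - u_{i,j,k+1} = b_{i,j,k},
+\]
+where $u$ and $b$ are illustrative names for the unknown vector and the
+right-hand side, with the mesh step absorbed into $b$. Under this assumption,
+the matrix has N$_{x} \times$ N$_{y} \times$ N$_{z}$ rows, the value $6.0$ on its
+diagonal and at most six off-diagonal entries equal to $-1.0$ per row.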
It should also be noticed that both solvers have been executed with the Simgrid selector \texttt{-cfg=smpi/running\_power} which determines the computational power (here 19GFlops) of the simulator host machine.
performance for both algorithms by reducing the execution time (see
Figure~\ref{fig:04}). However, in this case, the Krylov multisplitting method
presents a better performance in the considered bandwidth interval with a gain
-of 40\% which is only around 24\% for classical GMRES.
+of $40\%$, compared with only around $24\%$ for the classical GMRES.
\subsubsection{Input matrix size impacts on performance}
\ \\
Network & N2 : bw=1Gbs - lat=5.10$^{-5}$ \\
Input matrix size & N$_{x}$ = From 40 to 200\\ \hline
\end{tabular}
-\caption{Input matrix size impact}
+\caption{Input matrix size impacts}
\end{figure}
\begin{figure} [ht!]
\centering
\includegraphics[width=100mm]{pb_size_impact_on_execution_time.pdf}
-\caption{Problem size impact on execution time}
+\caption{Problem size impacts on execution time}
\label{fig:05}
\end{figure}
-In these experiments, the input matrix size has been set from N$_{x}$ = N$_{y}$
-= N$_{z}$ = 40 to 200 side elements that is from 40$^{3}$ = 64.000 to 200$^{3}$
-= 8,000,000 points. Obviously, as shown in Figure~\ref{fig:05}, the execution
+In these experiments, the input matrix size has been set from $N_{x} = N_{y}
+= N_{z} = 40$ to $200$ side elements, that is, from $40^{3} = 64{,}000$ to $200^{3}
+= 8{,}000{,}000$ points. Obviously, as shown in Figure~\ref{fig:05}, the execution
time for both algorithms increases when the input matrix size also increases.
But the interesting results are:
\begin{enumerate}
- \item the drastic increase (300 times) \RC{Je ne vois pas cela sur la figure}
+ \item the drastic increase ($300$ times) \RC{I do not see this in the figure}
of the number of iterations needed to reach the convergence for the classical
-GMRES algorithm when the matrix size go beyond N$_{x}$=150;
-\item the classical GMRES execution time is almost the double for N$_{x}$=140
+GMRES algorithm when the matrix size goes beyond $N_{x}=150$;
+\item the classical GMRES execution time is almost double for $N_{x}=140$
compared with the Krylov multisplitting method.
\end{enumerate}
size scale up. It should be noticed that the same test has been done with the
grid 2x16 leading to the same conclusion.
-\subsubsection{CPU Power impact on performance}
+\subsubsection{CPU Power impacts on performance}
\begin{figure} [ht!]
\centering
Network & N2 : bw=1Gbs - lat=5.10$^{-5}$ \\ %\hline
Input matrix size & N$_{x}$ = 150 x 150 x 150\\ \hline
\end{tabular}
-\caption{CPU Power impact}
+\caption{CPU Power impacts}
\end{figure}
\begin{figure} [ht!]
\centering
\includegraphics[width=100mm]{cpu_power_impact_on_execution_time.pdf}
-\caption{CPU Power impact on execution time}
+\caption{CPU Power impacts on execution time}
\label{fig:06}
\end{figure}
Using the Simgrid simulator flexibility, we have tried to determine the impact
on the algorithms performance in varying the CPU power of the clusters nodes
-from 1 to 19 GFlops. The outputs depicted in Figure~\ref{fig:06} confirm the
-performance gain, around 95\% for both of the two methods, after adding more
+from $1$ to $19$ GFlops. The outputs depicted in Figure~\ref{fig:06} confirm the
+performance gain, around $95\%$ for both methods, after adding more
powerful CPU.
+\DL{A conclusion is needed on these tests: they confirm the results already
+obtained at real scale. So the simulation is a valuable aid for developers. No
+need to deploy on a real architecture}
+
\subsection{Comparing GMRES in native synchronous mode and the multisplitting algorithm in asynchronous mode}
The previous paragraphs put in evidence the interests to simulate the behavior