accurate performance models. That is why another solution is to use a simulation
tool which allows us to change many parameters of the architecture (network
bandwidth, latency, number of processors) and to simulate the execution of such
- applications. We have decided to use SimGrid as it enables to benchmark MPI
- applications.
+ applications. The main contribution of this paper is to show that the use of a
+ simulation tool (here we have decided to use the SimGrid toolkit) can really
+ help developers to better tune their applications for a given multi-core
+ architecture.
- In this paper, we focus our attention on two parallel iterative algorithms based
+ In particular, we focus our attention on two parallel iterative algorithms based
on the Multisplitting algorithm and we compare them to the GMRES algorithm.
- These algorithms are used to solve libear systems. Two different variants of
+ These algorithms are used to solve linear systems. Two different variants of
the Multisplitting are studied: one using synchronoous iterations and another
- one with asynchronous iterations. For each algorithm we have tested different
- parameters to see their influence. We strongly recommend people interested
- by investing into a new expensive hardware architecture to benchmark
- their applications using a simulation tool before.
-
-
-
+ one with asynchronous iterations. For each algorithm we have simulated
+ different architecture parameters to evaluate their influence on the overall
+ execution time. The simulation results obtained confirm the real results
+ previously obtained on different real multi-core architectures and also confirm
+ the efficiency of the asynchronous multisplitting algorithm compared to the
+ synchronous GMRES method.
\end{abstract}
\subsection{Comparing GMRES in native synchronous mode and the multisplitting algorithm in asynchronous mode}
The previous paragraphs put in evidence the interests to simulate the behavior
-of the application before any deployment in a real environment. We have focused
-the study on analyzing the performance in varying the key factors impacting the
-results. The study compares the performance of the two proposed algorithms both
-in \textit{synchronous mode }. In this section, following the same previous
-methodology, the goal is to demonstrate the efficiency of the multisplitting
-method in \textit{ asynchronous mode} compared with the classical GMRES staying
-in \textit{synchronous mode}.
-
-Note that the interest of using the asynchronous mode for data exchange
-is mainly, in opposite of the synchronous mode, the non-wait aspects of
-the current computation after a communication operation like sending
-some data between nodes. Each processor can continue their local
-calculation without waiting for the end of the communication. Thus, the
-asynchronous may theoretically reduce the overall execution time and can
-improve the algorithm performance.
-
-As stated supra, Simgrid simulator tool has been used to prove the
-efficiency of the multisplitting in asynchronous mode and to find the
-best combination of the grid resources (CPU, Network, input matrix size,
-\ldots ) to get the highest \textit{"relative gain"} (exec\_time$_{GMRES}$ / exec\_time$_{multisplitting}$) in comparison with the classical GMRES time.
+of the application before any deployment in a real environment. In this
+section, following the same methodology as before, our goal is to compare the
+efficiency of the multisplitting method in \textit{asynchronous mode} with the
+classical GMRES in \textit{synchronous mode}.
+
+The benefit of using an asynchronous algorithm is that no synchronization is
+required. With geographically distant clusters, this may be essential.
+In this case, each processor can compute its iterations freely without any
+synchronization with the other processors. Thus, the asynchronous mode may
+theoretically reduce the overall execution time and improve the algorithm
+performance.
+
+\RC{the following sentence is odd, I do not understand why it appears here}
+As stated before, the SimGrid simulator has been successfully used to show
+the efficiency of the multisplitting in asynchronous mode and to find the best
+combination of the grid resources (CPU, network, input matrix size, \ldots) to
+get the highest \textit{``relative gain''} (exec\_time$_{GMRES}$ /
+exec\_time$_{multisplitting}$) in comparison with the classical GMRES time.
The test conditions are summarized in the table below : \\
-% environment
-\begin{footnotesize}
+\begin{figure} [ht!]
+\centering
\begin{tabular}{r c }
\hline
Grid & 2x50 totaling 100 processors\\ %\hline
Input matrix size & N$_{x}$ = From 62 to 150\\ %\hline
Residual error precision & 10$^{-5}$ to 10$^{-9}$\\ \hline \\
\end{tabular}
-\end{footnotesize}
+\end{figure}
-Again, comprehensive and extensive tests have been conducted varying the
-CPU power and the network parameters (bandwidth and latency) in the
-simulator tool with different problem size. The relative gains greater
-than 1 between the two algorithms have been captured after each step of
-the test. Table 7 below has recorded the best grid configurations
-allowing the multisplitting method execution time more performant 2.5 times than
-the classical GMRES execution and convergence time. The experimentation has demonstrated the relative multisplitting algorithm tolerance when using a low speed network that we encounter usually with distant clusters thru the internet.
+Again, comprehensive and extensive tests have been conducted in the simulator
+with different parameters, such as the CPU power and the network parameters
+(bandwidth and latency), and with different problem sizes. The relative gains
+greater than 1 between the two algorithms have been captured after each step of
+the test. Figure~\ref{table:01} reports the best grid configurations
+allowing the multisplitting method to be more than 2.5 times faster than the
+classical GMRES. These experiments also show the relative tolerance of the
+multisplitting algorithm when using a low-speed network, as usually observed with
+geographically distant clusters through the Internet.
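+
+For reference, the relative gain used in this section can be written, from its
+definition above, as
+\[
+  \mbox{relative gain} =
+  \frac{\mbox{exec\_time}_{GMRES}}{\mbox{exec\_time}_{multisplitting}},
+\]
+so that a value greater than 1 means that the multisplitting method ran faster
+than GMRES on the corresponding configuration.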
% use the same column width for the following three tables
\newlength{\mytablew}\settowidth{\mytablew}{\footnotesize\np{E-11}}
\end{tabular}}
-\begin{table}[!t]
- \centering
+\begin{figure}[!t]
+\centering
+%\begin{table}
% \caption{Relative gain of the multisplitting algorithm compared with the classical GMRES}
% \label{"Table 7"}
-Table 7. Relative gain of the multisplitting algorithm compared with
-the classical GMRES \\
-
- \begin{mytable}{11}
+ \begin{mytable}{11}
\hline
bandwidth (Mbit/s)
& 5 & 5 & 5 & 5 & 5 & 50 & 50 & 50 & 50 & 50 \\
& 2.52 & 2.55 & 2.52 & 2.57 & 2.54 & 2.53 & 2.51 & 2.58 & 2.55 & 2.54 \\
\hline
\end{mytable}
-\end{table}
+%\end{table}
+ \caption{Relative gain of the multisplitting algorithm compared with the classical GMRES}
+ \label{table:01}
+\end{figure}
+
\section{Conclusion}
CONCLUSION