X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/rce2015.git/blobdiff_plain/2110422a08b1d1879e5dd4435e37b6f372327aa1..96e726f9524ffdb8eaead14a943d80870ea1da65:/paper.tex?ds=sidebyside diff --git a/paper.tex b/paper.tex index b9187dd..c3cfdbb 100644 --- a/paper.tex +++ b/paper.tex @@ -321,7 +321,7 @@ A_{\ell\ell} x_\ell = c_\ell,\mbox{~for~}\ell=1,\ldots,L, \end{equation} where right-hand sides $c_\ell=b_\ell-\sum_{m\neq\ell}A_{\ell m}x_m$ are computed using the shared vectors $x_m$. In this paper, we use the well-known iterative method GMRES~\cite{saad86} as an inner iteration to approximate the solutions of the different splittings arising from the block Jacobi multisplitting of matrix $A$. The algorithm in Figure~\ref{alg:01} shows the main key points of our block Jacobi two-stage method executed by a cluster of processors. In line~\ref{solve}, the linear sub-system~(\ref{eq:03}) is solved in parallel using GMRES method where $\MIG$ and $\TOLG$ are the maximum number of inner iterations and the tolerance threshold for GMRES respectively. The convergence of the two-stage multisplitting methods, based on synchronous or asynchronous iterations, has been studied by many authors for example~\cite{Bru95,bahi07}. -\begin{figure}[t] +\begin{figure}[htpb] %\begin{algorithm}[t] %\caption{Block Jacobi two-stage multisplitting method} \begin{algorithmic}[1] @@ -359,7 +359,7 @@ At each $s$ outer iterations, the algorithm computes a new approximation $\tilde \end{equation} The algorithm in Figure~\ref{alg:02} includes the procedure of the residual minimization and the outer iteration is restarted with a new approximation $\tilde{x}$ at every $s$ iterations. The least-squares problem~(\ref{eq:06}) is solved in parallel by all clusters using CGLS method~\cite{Hestenes52} such that $\MIC$ is the maximum number of iterations and $\TOLC$ is the tolerance threshold for this method (line~\ref{cgls} in Figure~\ref{alg:02}). -\begin{figure}[t] +\begin{figure}[htbp] %\begin{algorithm}[t] %\caption{Krylov two-stage method using block Jacobi multisplitting} \begin{algorithmic}[1] @@ -407,10 +407,10 @@ in which several clusters are geographically distant, so there are intra and inter-cluster communications. In the following, these parameters are described: \begin{itemize} - \item hostfile: hosts description file. + \item hostfile: hosts description file, \item platform: file describing the platform architecture: clusters (CPU power, \dots{}), intra cluster network description, inter cluster network (bandwidth $bw$, -latency $lat$, \dots{}). +latency $lat$, \dots{}), \item archi : grid computational description (number of clusters, number of nodes/processors in each cluster). \end{itemize} @@ -485,7 +485,7 @@ results comparison and analysis. In the scope of this study, we retain on the one hand the algorithm execution mode (synchronous and asynchronous) and on the other hand the execution time and the number of iterations to reach the convergence. \\ -\textbf{Step 4 }: Set up the different grid testbed environments that will be +\textbf{Step 4}: Set up the different grid testbed environments that will be simulated in the simulator tool to run the program. The following architectures have been configured in SimGrid : 2$\times$16, 4$\times$8, 4$\times$16, 8$\times$8 and 2$\times$50. The first number represents the number of clusters in the grid and the second number represents @@ -556,6 +556,8 @@ In what follows, we will present the test conditions, the output results and our \ \\ % environment +\RC{Je ne comprends plus rien CE : pourquoi dans 5.4.1 il y a 2 network et aussi dans 5.4.2. Quelle est la différence? Dans la figure 3 de la section 5.4.1 pourquoi il n'y a pas N1 et N2?} + \begin{table} [ht!] \begin{center} \begin{tabular}{ll } @@ -589,7 +591,7 @@ the Krylov two-stage algorithm. %\RC{Les légendes ne sont pas explicites...} %\RCE{Corrige} -\begin{figure} [ht!] +\begin{figure} [htbp] \begin{center} \includegraphics[width=100mm]{cluster_x_nodes_nx_150_and_nx_170.pdf} \end{center} @@ -639,7 +641,7 @@ the network speed drops down (variation of 12.5\%), the difference between t %\begin{wrapfigure}{l}{100mm} -\begin{figure} [ht!] +\begin{figure} [htbp] \centering \includegraphics[width=100mm]{cluster_x_nodes_n1_x_n2.pdf} \caption{Various grid configurations with networks N1 vs N2 @@ -665,7 +667,7 @@ the network speed drops down (variation of 12.5\%), the difference between t \label{tab:03} \end{table} -\begin{figure} [ht!] +\begin{figure} [htbp] \centering \includegraphics[width=100mm]{network_latency_impact_on_execution_time.pdf} \caption{Network latency impacts on execution time @@ -698,12 +700,12 @@ more than $75\%$ (resp. $82\%$) of the execution for the classical GMRES \end{table} -\begin{figure} [ht!] +\begin{figure} [htbp] \centering \includegraphics[width=100mm]{network_bandwith_impact_on_execution_time.pdf} -\caption{Network bandwith impacts on execution time -\AG{``Execution time'' avec un 't' minuscule}. Idem autres figures.} -\RCE{Corrige} +\caption{Network bandwith impacts on execution time} +%\AG{``Execution time'' avec un 't' minuscule}. Idem autres figures.} +%\RCE{Corrige} \label{fig:04} \end{figure} @@ -728,7 +730,7 @@ of $40\%$ which is only around $24\%$ for the classical GMRES. \end{table} -\begin{figure} [ht!] +\begin{figure} [htbp] \centering \includegraphics[width=100mm]{pb_size_impact_on_execution_time.pdf} \caption{Problem size impacts on execution time} @@ -757,7 +759,7 @@ grid 2 $\times$ 16 leading to the same conclusion. \subsubsection{CPU Power impacts on performance} -\begin{table} [ht!] +\begin{table} [htbp] \centering \begin{tabular}{r c } \hline @@ -809,18 +811,19 @@ synchronization with the other processors. Thus, the asynchronous may theoretically reduce the overall execution time and can improve the algorithm performance. -\RC{la phrase suivante est bizarre, je ne comprends pas pourquoi elle vient ici} -\RCE{C est la description du dernier test sync/async avec l'introduction de la notion de relative gain} -In this section, Simgrid simulator tool has been successfully used to show -the efficiency of the multisplitting in asynchronous mode and to find the best -combination of the grid resources (CPU, Network, input matrix size, \ldots ) to -get the highest \textit{"relative gain"} (exec\_time$_{GMRES}$ / -exec\_time$_{multisplitting}$) in comparison with the classical GMRES time. +In this section, the Simgrid simulator is used to compare the behavior of the +multisplitting in asynchronous mode with GMRES in synchronous mode. Several +benchmarks have been performed with various combination of the grid resources +(CPU, Network, input matrix size, \ldots ). The test conditions are summarized +in Table~\ref{tab:07}. In order to compare the execution times, this table +reports the relative gain between both algorithms. It is defined by the ratio +between the execution time of GMRES and the execution time of the +multisplitting. The ration is greater than one because the asynchronous +multisplitting version is faster than GMRES. -The test conditions are summarized in the table~\ref{tab:07}: \\ -\begin{table} [ht!] +\begin{table} [htbp] \centering \begin{tabular}{r c } \hline @@ -870,7 +873,7 @@ geographically distant clusters through the internet. power (GFlops) & 1 & 1 & 1 & 1.5 & 1.5 & 1.5 & 1.5 & 1 & 1.5 & 1.5 \\ \hline - size (N) + size ($N^3$) & 62 & 62 & 62 & 100 & 100 & 110 & 120 & 130 & 140 & 150 \\ \hline Precision