petites corrections

author ziane <ziane@spirou.ltas.ulg.ac.be>

Fri, 8 May 2015 08:41:52 +0000 (10:41 +0200)

committer ziane <ziane@spirou.ltas.ulg.ac.be>

Fri, 8 May 2015 08:41:52 +0000 (10:41 +0200)
author ziane <ziane@spirou.ltas.ulg.ac.be>
Fri, 8 May 2015 08:41:52 +0000 (10:41 +0200)
committer ziane <ziane@spirou.ltas.ulg.ac.be>
Fri, 8 May 2015 08:41:52 +0000 (10:41 +0200)
diff --combined paper.tex

index 0e69066fc5e1c495315a7ceeaebc33a08d97cfa1,beb3140bc7f69bf624ae6202e24f4452c6bd4c80..c3cfdbb74a2a898b69f582dc5db4f58c7d0676c8
--- 1/paper.tex
--- 2/paper.tex
+++ b/paper.tex
@@@ -321,7 -321,7 +321,7 @@@ A_{\ell\ell} x_\ell = c_\ell,\mbox{~for
   \end{equation}
   where right-hand sides $c_\ell=b_\ell-\sum_{m\neq\ell}A_{\ell m}x_m$ are computed using the shared vectors $x_m$. In this paper, we use the well-known iterative method GMRES~\cite{saad86} as an inner iteration to approximate the solutions of the different splittings arising from the block Jacobi multisplitting of matrix $A$. The algorithm in Figure~\ref{alg:01} shows the main key points of our block Jacobi two-stage method executed by a cluster of processors. In line~\ref{solve}, the linear sub-system~(\ref{eq:03}) is solved in parallel using GMRES method where $\MIG$ and $\TOLG$ are the maximum number of inner iterations and the tolerance threshold for GMRES respectively. The convergence of the two-stage multisplitting methods, based on synchronous or asynchronous iterations, has been studied by many authors for example~\cite{Bru95,bahi07}.
   
- \begin{figure}[t]
+ \begin{figure}[htpb]
   %\begin{algorithm}[t]
   %\caption{Block Jacobi two-stage multisplitting method}
   \begin{algorithmic}[1]
@@@ -359,7 -359,7 +359,7 @@@ At each $s$ outer iterations, the algor
   \end{equation}
   The algorithm in Figure~\ref{alg:02} includes the procedure of the residual minimization and the outer iteration is restarted with a new approximation $\tilde{x}$ at every $s$ iterations. The least-squares problem~(\ref{eq:06}) is solved in parallel by all clusters using CGLS method~\cite{Hestenes52} such that $\MIC$ is the maximum number of iterations and $\TOLC$ is the tolerance threshold for this method (line~\ref{cgls} in Figure~\ref{alg:02}).
   
- \begin{figure}[t]
+ \begin{figure}[htbp]
   %\begin{algorithm}[t]
   %\caption{Krylov two-stage method using block Jacobi multisplitting}
   \begin{algorithmic}[1]
@@@ -407,10 -407,10 +407,10 @@@ in which  several clusters are  geograp
   inter-cluster communications. In the following, these parameters are described:
   
   \begin{itemize}
- -      \item hostfile: hosts description file.
+ +      \item hostfile: hosts description file,
         \item platform: file describing the platform architecture: clusters (CPU power,
   \dots{}), intra cluster network description, inter cluster network (bandwidth $bw$,
- -latency $lat$, \dots{}).
+ +latency $lat$, \dots{}),
         \item archi   : grid computational description (number of clusters, number of
   nodes/processors in each cluster).
   \end{itemize}
@@@ -485,7 -485,7 +485,7 @@@ results comparison and analysis. In th
   on the  one hand the algorithm execution mode (synchronous and asynchronous)
   and on the other hand the execution time and the number of iterations to reach the convergence. \\
   
- -\textbf{Step 4  }: Set up the  different grid testbed environments  that will be
+ +\textbf{Step 4}: Set up the  different grid testbed environments  that will be
   simulated in the  simulator tool to run the program.  The following architectures
   have been configured in SimGrid : 2$\times$16, 4$\times$8, 4$\times$16, 8$\times$8 and 2$\times$50. The first number
   represents the number  of clusters in the grid and  the second number represents
@@@ -591,7 -591,7 +591,7 @@@ the Krylov two-stage algorithm
   %\RC{Les légendes ne sont pas explicites...}
   %\RCE{Corrige}
   
- \begin{figure} [ht!]
+ \begin{figure} [htbp]
     \begin{center}
       \includegraphics[width=100mm]{cluster_x_nodes_nx_150_and_nx_170.pdf}
     \end{center}
@@@ -641,7 -641,7 +641,7 @@@ the  network speed  drops down (variati
   
   
   %\begin{wrapfigure}{l}{100mm}
- \begin{figure} [ht!]
+ \begin{figure} [htbp]
   \centering
   \includegraphics[width=100mm]{cluster_x_nodes_n1_x_n2.pdf}
   \caption{Various grid configurations with networks N1 vs N2
@@@ -667,7 -667,7 +667,7 @@@
   \label{tab:03}
   \end{table}
   
- \begin{figure} [ht!]
+ \begin{figure} [htbp]
   \centering
   \includegraphics[width=100mm]{network_latency_impact_on_execution_time.pdf}
   \caption{Network latency impacts on execution time
@@@ -700,12 -700,12 +700,12 @@@ more  than $75\%$  (resp.  $82\%$)  of 
   \end{table}
   
   
- \begin{figure} [ht!]
+ \begin{figure} [htbp]
   \centering
   \includegraphics[width=100mm]{network_bandwith_impact_on_execution_time.pdf}
- \caption{Network bandwith impacts on execution time
- \AG{``Execution time'' avec un 't' minuscule}. Idem autres figures.}
- \RCE{Corrige}
+ \caption{Network bandwith impacts on execution time}
+ %\AG{``Execution time'' avec un 't' minuscule}. Idem autres figures.}
+ %\RCE{Corrige}
   \label{fig:04}
   \end{figure}
   
@@@ -730,7 -730,7 +730,7 @@@ of $40\%$ which is only around $24\%$ f
   \end{table}
   
   
- \begin{figure} [ht!]
+ \begin{figure} [htbp]
   \centering
   \includegraphics[width=100mm]{pb_size_impact_on_execution_time.pdf}
   \caption{Problem size impacts on execution time}
@@@ -759,7 -759,7 +759,7 @@@ grid 2 $\times$ 16 leading to the same 
   
   \subsubsection{CPU Power impacts on performance}
   
- \begin{table} [ht!]
+ \begin{table} [htbp]
   \centering
   \begin{tabular}{r c }
    \hline
@@@ -811,18 -811,19 +811,19 @@@ synchronization  with   the  other   pr
   theoretically reduce  the overall execution  time and can improve  the algorithm
   performance.
   
- \RC{la phrase suivante est bizarre, je ne comprends pas pourquoi elle vient ici}
- \RCE{C est la description du dernier test sync/async avec l'introduction de la notion de relative gain}
- In this section, Simgrid simulator tool has been successfully used to show
- the efficiency of  the multisplitting in asynchronous mode and  to find the best
- combination of the grid resources (CPU,  Network, input matrix size, \ldots ) to
- get    the   highest    \textit{"relative    gain"}   (exec\_time$_{GMRES}$    /
- exec\_time$_{multisplitting}$) in comparison with the classical GMRES time.
+ In this section,  the Simgrid simulator is  used to compare the  behavior of the
+ multisplitting in  asynchronous mode  with GMRES  in synchronous  mode.  Several
+ benchmarks have  been performed with  various combination of the  grid resources
+ (CPU, Network, input  matrix size, \ldots ). The test  conditions are summarized
+ in  Table~\ref{tab:07}. In  order to  compare  the execution  times, this  table
+ reports the  relative gain between both  algorithms. It is defined  by the ratio
+ between  the   execution  time  of   GMRES  and   the  execution  time   of  the
+ multisplitting.  The  ration  is  greater  than  one  because  the  asynchronous
+ multisplitting version is faster than GMRES.
   
   
- The test conditions are summarized in the table~\ref{tab:07}: \\
   
- \begin{table} [ht!]
+ \begin{table} [htbp]
   \centering
   \begin{tabular}{r c }
    \hline
@@@ -872,7 -873,7 +873,7 @@@ geographically distant clusters throug
       power (GFlops)
       & 1    & 1    & 1    & 1.5       & 1.5  & 1.5         & 1.5         & 1         & 1.5       & 1.5 \\
       \hline
-     size (N)
+     size ($N^3$)
       & 62  & 62   & 62        & 100       & 100 & 110       & 120       & 130       & 140       & 150 \\
       \hline
       Precision
author	ziane <ziane@spirou.ltas.ulg.ac.be>
	Fri, 8 May 2015 08:41:52 +0000 (10:41 +0200)
committer	ziane <ziane@spirou.ltas.ulg.ac.be>
	Fri, 8 May 2015 08:41:52 +0000 (10:41 +0200)