début du travail sur a patie async

[hpcc2014.git] / hpcc.tex
diff --git a/hpcc.tex b/hpcc.tex

index 02940f04f8a350946f3b8bf9ba0e51b7663fcb30..5ab9faf540739761cb6c88495eb6688aa0841ad7 100644 (file)
--- a/hpcc.tex
+++ b/hpcc.tex
@@ -4,7 +4,7 @@
  \usepackage[utf8]{inputenc}
  \usepackage{amsfonts,amssymb}
  \usepackage{amsmath}
  \usepackage[utf8]{inputenc}
  \usepackage{amsfonts,amssymb}
  \usepackage{amsmath}
-\usepackage{algorithm}
+%\usepackage{algorithm}
  \usepackage{algpseudocode}
  %\usepackage{amsthm}
  \usepackage{graphicx}
  \usepackage{algpseudocode}
  %\usepackage{amsthm}
  \usepackage{graphicx}
@@ -169,12 +169,75 @@ our future work after the results.
   
  \section{The asynchronous iteration model}
  
   
  \section{The asynchronous iteration model}
  
-\DL{Décrire le modèle asynchrone. Je m'en charge}
+As exposed in the introduction, parallel iterative methods are now
+widely used in many scientific domains. They can be classified in three main classes
+depending on how iterations and communications are managed (for more
+details readers can refer to \cite{bcvc02:ip}). In the
+\textit{Synchronous Iterations - Synchronous Communications (SISC)}
+model data are exchanged at the end of each iteration. All the
+processors must begin the same iteration at the same time and
+important idle times on processors are generated. The
+\textit{Synchronous Iterations - Asynchronous Communications (SIAC)}
+model can be compared to the previous one except that data required on
+another processor are sent asynchronously i.e.  without stopping
+current computations. This technique allows to partially overlap
+communications by computations but unfortunately, the overlapping is
+only partial and important idle times remain.  It is clear that, in a
+grid computing context, where the number of computational nodes is large,
+heterogeneous and widely distributed, the idle times generated by
+synchronizations are very penalizing. One way to overcome this problem
+is to use the \textit{Asynchronous Iterations - Asynchronous
+  Communications (AIAC)} model. Here, local computations do not need
+to wait for required data. Processors can then perform their
+iterations with the data present at that time. Figure \ref{fig:aiac}
+illustrates this model where the grey blocks represent the computation
+phases, the white spaces the idle times and the arrows the
+communications. With this algorithmic model, the number of iterations
+required before the convergence is generally greater than for the two
+former classes. But, and as detailed in \cite{bcvc06:ij}, AIAC
+algorithms can significantly reduce overall execution times by
+suppressing idle times due to synchronizations especially in a grid
+computing context.
+
+\begin{figure}[htbp]
+  \centering
+    \includegraphics[width=8cm]{AIAC.pdf}
+  \caption{The Asynchronous Iterations - Asynchronous Communications model } 
+  \label{fig:aiac}
+\end{figure}
+
+
  
  
-\section{SimGrid}
  
  
-\AG{Décrire SimGrid~\cite{casanova+legrand+quinson.2008.simgrid} (Arnaud)}
+\section{SimGrid}
  
  
+SimGrid~\cite{casanova+legrand+quinson.2008.simgrid,SimGrid} is a simulation
+framework to sudy the behavior of large-scale distributed systems.  As its name
+says, it emanates from the grid computing community, but is nowadays used to
+study grids, clouds, HPC or peer-to-peer systems.
+%- open source, developped since 1999, one of the major solution in the field
+%
+SimGrid provides several programming interfaces: MSG to simulate Concurrent
+Sequential Processes, SimDAG to simulate DAGs of (parallel) tasks, and SMPI to
+run real applications written in MPI~\cite{MPI}.  Apart from the native C
+interface, SimGrid provides bindings for the C++, Java, Lua and Ruby programming
+languages.  The SMPI interface supports applications written in C or Fortran,
+with little or no modifications.
+%- implements most of MPI-2 \cite{ref} standard [CHECK]
+
+%%% explain simulation
+%- simulated processes folded in one real process
+%- simulates interactions on the network, fluid model
+%- able to skip long-lasting computations
+%- traces + visu?
+
+%%% platforms
+%- describe resources and their interconnection, with their properties
+%- XML files
+
+%%% validation + refs
+
+\AG{Décrire SimGrid~\cite{casanova+legrand+quinson.2008.simgrid,SimGrid} (Arnaud)}
  
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  \section{Simulation of the multisplitting method}
  
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  \section{Simulation of the multisplitting method}
@@ -212,8 +275,9 @@ Y_l = B_l - \displaystyle\sum_{\substack{m=1\\ m\neq l}}^{L}A_{lm}X_m
  \end{equation}
  is solved independently by a cluster and communications are required to update the right-hand side sub-vector $Y_l$, such that the sub-vectors $X_m$ represent the data dependencies between the clusters. As each sub-system (\ref{eq:4.1}) is solved in parallel by a cluster of processors, our multisplitting method uses an iterative method as an inner solver which is easier to parallelize and more scalable than a direct method. In this work, we use the parallel algorithm of GMRES method~\cite{ref1} which is one of the most used iterative method by many researchers. 
  
  \end{equation}
  is solved independently by a cluster and communications are required to update the right-hand side sub-vector $Y_l$, such that the sub-vectors $X_m$ represent the data dependencies between the clusters. As each sub-system (\ref{eq:4.1}) is solved in parallel by a cluster of processors, our multisplitting method uses an iterative method as an inner solver which is easier to parallelize and more scalable than a direct method. In this work, we use the parallel algorithm of GMRES method~\cite{ref1} which is one of the most used iterative method by many researchers. 
  
-\begin{algorithm}
-\caption{A multisplitting solver with GMRES method}
+\begin{figure}
+  %%% IEEE instructions forbid to use an algorithm environment here, use figure
+  %%% instead
  \begin{algorithmic}[1]
  \Input $A_l$ (sparse sub-matrix), $B_l$ (right-hand side sub-vector)
  \Output $X_l$ (solution sub-vector)\vspace{0.2cm}
  \begin{algorithmic}[1]
  \Input $A_l$ (sparse sub-matrix), $B_l$ (right-hand side sub-vector)
  \Output $X_l$ (solution sub-vector)\vspace{0.2cm}
@@ -234,10 +298,23 @@ is solved independently by a cluster and communications are required to update t
  \State \Return $X_l^k$
  \EndFunction
  \end{algorithmic}
  \State \Return $X_l^k$
  \EndFunction
  \end{algorithmic}
+\caption{A multisplitting solver with GMRES method}
  \label{algo:01}
  \label{algo:01}
-\end{algorithm}
+\end{figure}
  
  
-Algorithm~\ref{algo:01} shows the main key points of the multisplitting method to solve a large sparse linear system. This algorithm is based on an outer-inner iteration method where the parallel synchronous GMRES method is used to solve the inner iteration. It is executed in parallel by each cluster of processors. For all $l,m\in\{1,\ldots,L\}$, the matrices and vectors with the subscript $l$ represent the local data for cluster $l$, while $\{A_{lm}\}_{m\neq l}$ are off-diagonal matrices of sparse matrix $A$ and $\{X_m\}_{m\neq l}$ contain vector elements of solution $x$ shared with neighboring clusters. At every outer iteration $k$, asynchronous communications are performed between processors of the local cluster and those of distant clusters (lines $6$ and $7$ in Algorithm~\ref{algo:01}). The shared vector elements of the solution $x$ are exchanged by message passing using MPI non-blocking communication routines. 
+Algorithm on Figure~\ref{algo:01} shows the main key points of the
+multisplitting method to solve a large sparse linear system. This algorithm is
+based on an outer-inner iteration method where the parallel synchronous GMRES
+method is used to solve the inner iteration. It is executed in parallel by each
+cluster of processors. For all $l,m\in\{1,\ldots,L\}$, the matrices and vectors
+with the subscript $l$ represent the local data for cluster $l$, while
+$\{A_{lm}\}_{m\neq l}$ are off-diagonal matrices of sparse matrix $A$ and
+$\{X_m\}_{m\neq l}$ contain vector elements of solution $x$ shared with
+neighboring clusters. At every outer iteration $k$, asynchronous communications
+are performed between processors of the local cluster and those of distant
+clusters (lines $6$ and $7$ in Figure~\ref{algo:01}). The shared vector
+elements of the solution $x$ are exchanged by message passing using MPI
+non-blocking communication routines.
  
  \begin{figure}
  \centering
  
  \begin{figure}
  \centering
@@ -425,7 +502,7 @@ The authors would like to thank\dots{}
  % adjust value as needed - may need to be readjusted if
  % the document is modified later
  \bibliographystyle{IEEEtran}
  % adjust value as needed - may need to be readjusted if
  % the document is modified later
  \bibliographystyle{IEEEtran}
-\bibliography{hpccBib}
+\bibliography{IEEEabrv,hpccBib}
  
  \end{document}
  
  
  \end{document}