X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/rce2015.git/blobdiff_plain/0a43df714a5c16cc2f27439bb28f84c2d1f1db16..9b8612a74b1682a47832402cd8871ab7f7fc9c69:/paper.tex?ds=inline diff --git a/paper.tex b/paper.tex index 93f215d..b11da8c 100644 --- a/paper.tex +++ b/paper.tex @@ -96,20 +96,21 @@ performed. With some applications, it is difficult, if not impossible, to build accurate performance models. That is why another solution is to use a simulation tool which allows us to change many parameters of the architecture (network bandwidth, latency, number of processors) and to simulate the execution of such -applications. We have decided to use SimGrid as it enables to benchmark MPI -applications. +applications. The main contribution of this paper is to show that the use of a +simulation tool (here we have decided to use the SimGrid toolkit) can really +help developpers to better tune their applications for a given multi-core +architecture. -In this paper, we focus our attention on two parallel iterative algorithms based +In particular we focus our attention on two parallel iterative algorithms based on the Multisplitting algorithm and we compare them to the GMRES algorithm. -These algorithms are used to solve libear systems. Two different variants of +These algorithms are used to solve linear systems. Two different variants of the Multisplitting are studied: one using synchronoous iterations and another -one with asynchronous iterations. For each algorithm we have tested different -parameters to see their influence. We strongly recommend people interested -by investing into a new expensive hardware architecture to benchmark -their applications using a simulation tool before. - - - +one with asynchronous iterations. For each algorithm we have simulated +different architecture parameters to evaluate their influence on the overall +execution time. The obtain simulated results confirm the real results +previously obtained on different real multi-core architectures and also confirm +the efficiency of the asynchronous multisplitting algorithm compared to the +synchronous GMRES method. \end{abstract} @@ -367,19 +368,19 @@ It should also be noticed that both solvers have been executed with the Simgrid In this section, experiments for both Multisplitting algorithms are reported. First the 3D Poisson problem used in our experiments is described. -\subsection{3D Poisson} +\subsection{The 3D Poisson problem} -We use our two-stage algorithms to solve the well-known Poisson problem $\nabla^2\phi=f$~\cite{Polyanin01}. In three-dimensional Cartesian coordinates in $\mathbb{R}^3$, the problem takes the following form +We use our two-stage algorithms to solve the well-known Poisson problem $\nabla^2\phi=f$~\cite{Polyanin01}. In three-dimensional Cartesian coordinates in $\mathbb{R}^3$, the problem takes the following form: \begin{equation} \frac{\partial^2}{\partial x^2}\phi(x,y,z)+\frac{\partial^2}{\partial y^2}\phi(x,y,z)+\frac{\partial^2}{\partial z^2}\phi(x,y,z)=f(x,y,z)\mbox{~in the domain~}\Omega \label{eq:07} \end{equation} -such that +such that: \begin{equation*} \phi(x,y,z)=0\mbox{~on the boundary~}\partial\Omega \end{equation*} -where the real-valued function $\phi(x,y,z)$ is the solution sought, $f(x,y,z)$ is a known function and $\Omega=[0,1]^3$. The 3D discretization of the Laplace operator $\nabla^2$ with the finite difference scheme includes 7 points stencil on the computational grid. The numerical approximation of the Poisson problem on three-dimensional grid is repeatedly computed as $\phi=\phi^\star$ such that +where the real-valued function $\phi(x,y,z)$ is the solution sought, $f(x,y,z)$ is a known function and $\Omega=[0,1]^3$. The 3D discretization of the Laplace operator $\nabla^2$ with the finite difference scheme includes 7 points stencil on the computational grid. The numerical approximation of the Poisson problem on three-dimensional grid is repeatedly computed as $\phi=\phi^\star$ such that: \begin{equation} \begin{array}{ll} \phi^\star(x,y,z)=&\frac{1}{6}(\phi(x-h,y,z)+\phi(x,y-h,z)+\phi(x,y,z-h)\\&+\phi(x+h,y,z)+\phi(x,y+h,z)+\phi(x,y,z+h)\\&-h^2f(x,y,z)) @@ -390,7 +391,7 @@ until convergence where $h$ is the grid spacing between two adjacent elements in In the parallel context, the 3D Poisson problem is partitioned into $L\times p$ sub-problems such that $L$ is the number of clusters and $p$ is the number of processors in each cluster. We apply the three-dimensional partitioning instead of the row-by-row one in order to reduce the size of the data shared at the sub-problems boundaries. In this case, each processor is in charge of parallelepipedic block of the problem and has at most six neighbors in the same cluster or in distant clusters with which it shares data at boundaries. -\subsection{Study setup and Simulation Methodology} +\subsection{Study setup and simulation methodology} First, to conduct our study, we propose the following methodology which can be reused for any grid-enabled applications.\\ @@ -399,10 +400,12 @@ which can be reused for any grid-enabled applications.\\ the application to be tested. Numerical parallel iterative algorithms have been chosen for the study in this paper. \\ -\textbf{Step 2}: Collect the software materials needed for the -experimentation. In our case, we have two variants algorithms for the -resolution of the 3D-Poisson problem: (1) using the classical GMRES; (2) and the Multisplitting method. In addition, the Simgrid simulator has been chosen to simulate the behaviors of the -distributed applications. Simgrid is running on the Mesocentre datacenter in the University of Franche-Comte and also in a virtual machine on a simple laptop. \\ +\textbf{Step 2}: Collect the software materials needed for the experimentation. +In our case, we have two variants algorithms for the resolution of the +3D-Poisson problem: (1) using the classical GMRES; (2) and the Multisplitting +method. In addition, the Simgrid simulator has been chosen to simulate the +behaviors of the distributed applications. Simgrid is running in a virtual +machine on a simple laptop. \\ \textbf{Step 3}: Fix the criteria which will be used for the future results comparison and analysis. In the scope of this study, we retain @@ -443,23 +446,20 @@ the program output results: capacity" of the network is defined as the maximum of data that can transit from one point to another in a unit of time. \item the network latency (lat : microsecond) defined as the delay from the - start time to send the data from a source and the final time the destination - have finished to receive it. + start time to send a simple data from a source to a destination. \end{enumerate} -Upon the network characteristics, another impacting factor is the -application dependent volume of data exchanged between the nodes in the cluster -and between distant clusters. Large volume of data can be transferred and -transit between the clusters and nodes during the code execution. +Upon the network characteristics, another impacting factor is the volume of data exchanged between the nodes in the cluster +and between distant clusters. This parameter is application dependent. In a grid environment, it is common to distinguish, on the one hand, the - "intra-network" which refers to the links between nodes within a cluster and, + "intra-network" which refers to the links between nodes within a cluster and on the other hand, the "inter-network" which is the backbone link between - clusters. In practice, these two networks have different speeds. The - intra-network generally works like a high speed local network with a high - bandwith and very low latency. In opposite, the inter-network connects clusters - sometime via heterogeneous networks components throuth internet with a lower - speed. The network between distant clusters might be a bottleneck for the - global performance of the application. + clusters. In practice, these two networks have different speeds. + The intra-network generally works like a high speed local network with a + high bandwith and very low latency. In opposite, the inter-network connects + clusters sometime via heterogeneous networks components throuth internet with + a lower speed. The network between distant clusters might be a bottleneck + for the global performance of the application. \subsection{Comparison of GMRES and Krylov Multisplitting algorithms in synchronous mode}