X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/hpcc2014.git/blobdiff_plain/7ec471dd9a97eba2f485ebc6c130f4f289b59793..0e6d063a9e15647ffb71be54897a88cbe7c9a5b4:/hpcc.tex

diff --git a/hpcc.tex b/hpcc.tex
index 34c35a7..c79ed41 100644
--- a/hpcc.tex
+++ b/hpcc.tex
@@ -395,9 +395,9 @@ processor is designated (for example the processor with rank 1) and masters of
 all clusters are interconnected by a virtual unidirectional ring network (see
 Figure~\ref{fig:4.1}). During the resolution, a Boolean token circulates around
 the virtual ring from a master processor to another until the global convergence
-is achieved. So starting from the cluster with rank 1, each master processor $i$
+is achieved. So starting from the cluster with rank 1, each master processor $\ell$
 sets the token to \textit{True} if the local convergence is achieved or to
-\textit{False} otherwise, and sends it to master processor $i+1$. Finally, the
+\textit{False} otherwise, and sends it to master processor $\ell+1$. Finally, the
 global convergence is detected when the master of cluster 1 receives from the
 master of cluster $L$ a token set to \textit{True}. In this case, the master of
 cluster 1 broadcasts a stop message to masters of other clusters.
In this work,
@@ -508,7 +508,8 @@ $\text{62}^\text{3} = \text{\np{238328}}$ to $\text{150}^\text{3} =
 \begin{table}[!t]
  \centering
- \caption{2 clusters, each with 50 nodes}
+ \caption{Relative gain of the multisplitting algorithm compared to GMRES for
+ different configurations with 2 clusters, each one composed of 50 nodes.}
  \label{tab.cluster.2x50}
 \begin{mytable}{5}
@@ -656,10 +657,10 @@ Note that the program was run with the following parameters:
 After analyzing the outputs, generally, for the two clusters including one hundred hosts
 configuration (Tables~\ref{tab.cluster.2x50}), some combinations of parameters affecting
 the results have given a relative gain more than 2.5, showing the effectiveness of the
-asynchronous performance compared to the synchronous mode.
+asynchronous multisplitting compared to GMRES with two distant clusters.
 With these settings, Table~\ref{tab.cluster.2x50} shows
-that after a deterioration of inter cluster network with a bandwidth of \np[Mbit/s]{5} and a latency in order of one hundredth of millisecond and a processor power
+that after setting the bandwidth of the inter cluster network to \np[Mbit/s]{5} and a latency in order of one hundredth of millisecond and a processor power
 of one GFlops, an efficiency of about \np[\%]{40} is obtained in asynchronous
 mode for a matrix size of 62 elements. It is noticed that the result remains
 stable even we vary the residual error precision from \np{E-5} to \np{E-9}. By
@@ -689,10 +690,8 @@ elements.
 %\LZK{Ma question est: le bandwidth et latency sont ceux inter-clusters ou pour les deux inter et intra cluster??}
 %\CER{Définitivement, les paramètres réseaux variables ici se rapportent au réseau INTER cluster.}
 \section{Conclusion}
-The experimental results on executing a parallel iterative algorithm in
-asynchronous mode on an environment simulating a large scale of virtual
-computers organized with interconnected clusters have been presented.
-Our work has demonstrated that using such a simulation tool allow us to
+The simulation of the execution of parallel asynchronous iterative algorithms on large scale clusters has been presented.
+In this work, we show that SIMGRID is an efficient simulation tool that allows us to
 reach the following three objectives:
 \begin{enumerate}
@@ -706,22 +705,23 @@ of the cluster and network specifications permitting to save time in executing
 the algorithm in asynchronous mode.
 \end{enumerate}
 Our results have shown that in certain conditions, asynchronous mode is
-speeder up to \np[\%]{40} than executing the algorithm in synchronous mode
+speeder up to \np[\%]{40} comparing to the synchronous GMRES method
 which is not negligible for solving complex practical problems with more
 and more increasing size.
- Several studies have already addressed the performance execution time of
+Several studies have already addressed the performance execution time of
 this class of algorithm. The work presented in this paper has
 demonstrated an original solution to optimize the use of a simulation
 tool to run efficiently an iterative parallel algorithm in asynchronous
 mode in a grid architecture.
-\LZK{Perspectives???}
+For our futur works, we plan to extend our experimentations to larger scale platforms by increasing the number of computing cores and the number of clusters.
+We will also have to increase the size of the input problem which will require the use of a more powerful simulation platform. At last, we expect to compare our simulation results to real execution results on real architectures in order to experimentally validate our study.
 \section*{Acknowledgment}
 This work is partially funded by the Labex ACTION program (contract ANR-11-LABX-01-01).
-\todo[inline]{The authors would like to thank\dots{}}
+%\todo[inline]{The authors would like to thank\dots{}}
 % trigger a \newpage just before the given reference
 % number - used to balance the columns on the last page