X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/hpcc2014.git/blobdiff_plain/7314cfe257c8b75f34a34995a4a2075edc1d3888..664f844ffe608be37e65a2061489a0c09ebb731d:/hpcc.tex?ds=sidebyside

diff --git a/hpcc.tex b/hpcc.tex
index 865fb94..ae3a229 100644
--- a/hpcc.tex
+++ b/hpcc.tex
@@ -448,13 +448,10 @@ and with the addition of the primitive MPI\_Test was needed to avoid a memory fa
 \CER{We actually wanted to show how simple it is to adapt the algorithm to SimGrid. The problems encountered and described in this paragraph mostly concern the async mode.}\LZK{OK. I would have preferred a few more details on the adaptation of the async version.} 
 \CER{The major problem in adapting MPI to SMPI for the asynchronous part of the algorithm was that Waitall crashed under SMPI after an Isend and Irecv. I had proposed a workaround using a separate MPI\_Wait for each exchange instead of a single Waitall for ALL the exchanges, an instruction that seems to work fine in MPI. This workaround also works well. But afterwards, you modified the program by adding an MPI\_Test in the convergence detection routine, and as a result the global exchange with Waitall worked too.}
 Note that using the SMPI functions that optimize memory footprint and CPU usage is not recommended when the simulation is meant to produce realistic results.
-As mentioned, upon this adaptation, the algorithm is executed as in the real life in the simulated environment after the following minor changes. First, all declared 
-global variables have been moved to local variables for each subroutine. In fact, global variables generate side effects arising from the concurrent access of 
-shared memory used by threads simulating each computing unit in the SimGrid architecture. Second, the alignment of certain types of variables such as ``long int'' had
-also to be reviewed.
-\AG{About these alignment problems: say more if it is of interest, or remove it.}
-\CER{This problem is one of the changes I had to make when adapting the MPI program to SMPI. It stems from the difference in memory word size: in 32-bit, for variables declared as long int, the output instructions (printf, sprintf, ...) keep the \%lu format, whereas in 64-bit it is replaced by \%llu.} 
- Finally, some compilation errors on MPI\_Waitall and MPI\_Finalize primitives have been fixed with the latest version of SimGrid.
+As mentioned, after this adaptation the algorithm runs in the simulated environment as it would in real life, given the following minor changes. First, all declared 
+global variables have been made local to their subroutines. Indeed, global variables generate side effects arising from concurrent access to 
+the shared memory used by the threads that simulate each computing unit in the SimGrid architecture. 
+Second, some compilation errors on MPI\_Waitall and MPI\_Finalize primitives have been fixed with the latest version of SimGrid.
 In total, after a very simple adaptation, the initial MPI program running in the SMPI simulation environment gave the same results as those obtained in a real 
 environment. We have successfully executed the code in synchronous mode, using the parallel GMRES algorithm, and compared it with our multisplitting algorithm in asynchronous mode after a few modifications. 
 
@@ -476,10 +473,6 @@ study that the results depend on the following parameters:
   compared to the asynchronous mode ($t_\text{sync} / t_\text{async}$) is defined as the \emph{relative gain}. So,
   our objective running the algorithm in SimGrid is to obtain a relative gain
   greater than 1.
-  \AG{$t_\text{async} / t_\text{sync} > 1$, so the objective is for it to take
-    longer (to run slower) in asynchronous mode than in synchronous mode?
-    Shouldn't it be the other way around?}
-  \CER{I have modified the sentence.}
 \end{itemize}
 
 A priori, obtaining a relative gain greater than 1 would be difficult in a local
@@ -512,51 +505,51 @@ $\text{62}^\text{3} = \text{\np{238328}}$ to $\text{150}^\text{3} =
   \caption{2 clusters, each with 50 nodes}
   \label{tab.cluster.2x50}
 
-  \begin{mytable}{6}
+  \begin{mytable}{5}
     \hline
-    bandwidth (Mbits/s)
-    & 5         & 5         & 5         & 5         & 5         & 50 \\
+    bandwidth (Mbit/s)
+    & 5         & 5         & 5         & 5         & 5         \\
     \hline
     latency (ms)
-    & 0.02      & 0.02      & 0.02      & 0.02      & 0.02      & 0.02 \\
+    & 0.02      & 0.02      & 0.02      & 0.02      & 0.02      \\
     \hline
     power (GFlops)
-    & 1         & 1         & 1         & 1.5       & 1.5       & 1.5 \\
+    & 1         & 1         & 1         & 1.5       & 1.5       \\
     \hline
     size
-    & 62        & 62        & 62        & 100       & 100       & 110 \\
+    & 62        & 62        & 62        & 100       & 100       \\
     \hline
     Precision
-    & \np{E-5}   & \np{E-8}  & \np{E-9}  & \np{E-11} & \np{E-11} & \np{E-11} \\
+    & \np{E-5}  & \np{E-8}  & \np{E-9}  & \np{E-11} & \np{E-11} \\
     \hline
     \hline
     Relative gain
-    & 2.52     & 2.55     & 2.52     & 2.57     & 2.54     & 2.53 \\
+    & 2.52      & 2.55      & 2.52      & 2.57      & 2.54      \\
     \hline
   \end{mytable}
 
   \bigskip
 
-  \begin{mytable}{6}
+  \begin{mytable}{5}
     \hline
-    bandwidth (Mbits/s)
-    & 50        & 50        & 50        & 50 \\ %       & 10        & 10 \\
+    bandwidth (Mbit/s)
+    & 50        & 50        & 50        & 50        & 50 \\ %       & 10        & 10 \\
     \hline
     latency (ms)
-    & 0.02      & 0.02      & 0.02      & 0.02 \\ %      & 0.03      & 0.01 \\
+    & 0.02      & 0.02      & 0.02      & 0.02      & 0.02 \\ %      & 0.03      & 0.01 \\
     \hline
     Power (GFlops)
-    & 1.5       & 1.5       & 1.5       & 1.5 \\ %      & 1         & 1.5 \\
+    & 1.5       & 1.5       & 1.5       & 1.5       & 1.5 \\ %      & 1         & 1.5 \\
     \hline
     size
-    & 120       & 130       & 140       & 150  \\ %     & 171       & 171 \\
+    & 110       & 120       & 130       & 140       & 150  \\ %     & 171       & 171 \\
     \hline
     Precision
-    & \np{E-11} & \np{E-11} & \np{E-11} & \np{E-11} \\ % & \np{E-5}  & \np{E-5} \\
+    & \np{E-11} & \np{E-11} & \np{E-11} & \np{E-11} & \np{E-11} \\ % & \np{E-5}  & \np{E-5} \\
     \hline
     \hline
     Relative gain
-    & 2.51     & 2.58     & 2.55     & 2.54   \\ %  & 1.59      & 1.29 \\
+    & 2.53      & 2.51     & 2.58     & 2.55     & 2.54   \\ %  & 1.59      & 1.29 \\
     \hline
   \end{mytable}
 \end{table}
@@ -628,18 +621,16 @@ Note that the program was run with the following parameters:
 
 \paragraph*{SMPI parameters}
 
-~\\{}\AG{Give a bit more detail (the platform in particular).}
-\CER {Details added}
-
 \begin{itemize}
 \item HOSTFILE: Text file containing the list of processor unit names (here, 100 hosts);
 \item PLATFORM: XML file describing the platform architecture: two clusters (cluster1 and cluster2) with the following characteristics:
-
-	- Processor unit power : 1.5 GFlops;
-
-	- Intracluster network : bandwidth = 1,25 Gbits/s and latency = 5E-05 ms;
-
-	- Intercluster network : bandwidth = 5 Mbits/s and latency = 5E-03 ms;
+  \begin{itemize}
+  \item Processor unit power: \np[GFlops]{1.5};
+  \item Intracluster network bandwidth: \np[Gbit/s]{1.25} and latency:
+    \np[$\mu$s]{0.05};
+  \item Intercluster network bandwidth: \np[Mbit/s]{5} and latency:
+    \np[$\mu$s]{5};
+  \end{itemize}
 \end{itemize}
 
 
@@ -668,7 +659,7 @@ obtained in asynchronous mode for a matrix size of 62 elements. It is noticed th
 stable even when we vary the residual error precision from \np{E-5} to \np{E-9}. By
 increasing the matrix size up to 100 elements, it was necessary to increase the
 CPU power by \np[\%]{50}, to \np[GFlops]{1.5}, to obtain convergence of the algorithm and the same order of efficiency in asynchronous mode.  Keeping this processor power but increasing the inter-cluster network throughput up to
-\np[Mbit/s]{50}, the result of efficiency with a relative gain of 2.5\AG[]{2.5 ?} is obtained with
+\np[Mbit/s]{50}, a relative gain of about 2.5 is obtained with a
 high external precision of \np{E-11} for a matrix size from 110 to 150 side
 elements.