From: Martin Quinson <martin.quinson@ens-rennes.fr>
Date: Wed, 5 May 2021 15:43:52 +0000 (+0200)
Subject: Extend the description of the IB configuration
X-Git-Tag: v3.28~325
X-Git-Url: http://bilbo.iut-bm.univ-fcomte.fr/pub/gitweb/simgrid.git/commitdiff_plain/37dc62cfad50da4014f99f18f8a8dc24c994fe00

Extend the description of the IB configuration
---
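
Note: the penalty rules that this patch adds to the InfiniBand section can be sketched in a few lines of Python. This is only an illustration of the documented formulas, not the simulator's implementation. The names `parse_ib_factors` and `ib_penalty` are hypothetical. The sketch assumes that the factor string uses the order given in the option (βe;βs;γs), that the degrees written ∆s(i) and ∆e(i) are those of the sender and of the receiver of the message, and that the final penalty is the larger of ps and pe.

```python
def parse_ib_factors(value: str) -> tuple[float, float, float]:
    """Split a 'beta_e;beta_s;gamma' string into three floats (hypothetical helper)."""
    beta_e, beta_s, gamma = (float(x) for x in value.split(";"))
    return beta_e, beta_s, gamma


def ib_penalty(delta_s: int, delta_e: int, phi_e: int, omega_se: int,
               beta_e: float, beta_s: float, gamma_r: float) -> float:
    """Penalty of one message, following the rules quoted from the thesis.

    delta_s  -- outgoing degree of the sending node
    delta_e  -- incoming degree of the receiving node
    phi_e    -- value of Phi(e) for the receiving node
    omega_se -- value of Omega(s, e) for this sender/receiver pair
    """
    # Penalty caused by the conflict in transmission (ps).
    if delta_s == 1:
        ps = 1.0
    elif delta_s >= 2 and delta_e >= 3:
        ps = delta_s * beta_s * gamma_r
    else:
        ps = delta_s * beta_s

    # Penalty caused by the conflict in reception (pe).
    if delta_e == 1:
        pe = 1.0
    else:
        pe = phi_e * beta_e * omega_se

    # Assumption: the communication gets the larger of the two penalties.
    return max(ps, pe)


# Illustrative factor values only; they are not calibrated on any platform.
BETA_E, BETA_S, GAMMA_R = parse_ib_factors("0.5;0.75;2.0")
uncontended = ib_penalty(1, 1, 1, 1, BETA_E, BETA_S, GAMMA_R)
saturated = ib_penalty(2, 3, 3, 1, BETA_E, BETA_S, GAMMA_R)
```

With these made-up factors, a message with no contention keeps a penalty of 1, while a sender with two outgoing messages towards a node that receives three gets the γr factor applied on the transmission side.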

diff --git a/docs/source/Configuring_SimGrid.rst b/docs/source/Configuring_SimGrid.rst
index 08e9d63473..a5497a827d 100644
--- a/docs/source/Configuring_SimGrid.rst
+++ b/docs/source/Configuring_SimGrid.rst
@@ -347,7 +347,6 @@ and you should use the last one, which is the maximal size.
    cat /proc/sys/net/ipv4/tcp_rmem # gives the sender window
    cat /proc/sys/net/ipv4/tcp_wmem # gives the receiver window
 
-.. _cfg=smpi/IB-penalty-factors:
 .. _cfg=network/bandwidth-factor:
 .. _cfg=network/latency-factor:
 .. _cfg=network/weight-S:
@@ -370,15 +369,63 @@ exchange.  By default SMPI uses factors computed on the Stampede
 Supercomputer at TACC, with optimal deployment of processes on
 nodes. Again, only hardcore experts should bother about this fact.
 
-InfiniBand network behavior can be modeled through 3 parameters
-``smpi/IB-penalty-factors:"βe;βs;γs"``, as explained in `this PhD
-thesis
-<http://mescal.imag.fr/membres/jean-marc.vincent/index.html/PhD/Vienne.pdf>`_.
 
 .. todo:: This section should be rewritten, and actually explain the
 	  options network/bandwidth-factor, network/latency-factor,
 	  network/weight-S.
 
+.. _cfg=smpi/IB-penalty-factors:
+
+InfiniBand model
+^^^^^^^^^^^^^^^^
+
+InfiniBand network behavior can be modeled through 3 parameters
+``smpi/IB-penalty-factors:"βe;βs;γs"``, as explained in `this PhD
+thesis
+<http://mescal.imag.fr/membres/jean-marc.vincent/index.html/PhD/Vienne.pdf>`_ (in French)
+or more concisely in `this paper <https://hal.inria.fr/hal-00953618/document>`_,
+even though that paper only describes models for Myrinet and Ethernet.
+You can still see some results for InfiniBand in Fig. 2, for example. This model
+may be outdated for modern InfiniBand hardware, so a new
+validation would be welcome.
+
+The three parameters are defined as follows:
+
+- βs: penalty factor for outgoing messages, computed by running a simple send to
+  two nodes, measuring the slowdown compared to a single send to one node, and
+  dividing by 2
+- βe: penalty factor for incoming messages, same computation method but with one
+  node receiving several messages
+- γr: slowdown factor when communication buffer memory is saturated. It needs a
+  more complicated pattern to run in order to be computed (5.3 in the thesis,
+  page 107), and the resulting formula is γr = time(c)/(3×βe×time(ref)), where
+  time(ref) is the time of a single comm with no contention.
+
+Once these values are computed, a penalty is assessed for each message (this is
+the part implemented in the simulator) as shown on page 106 of the thesis. Here is
+a simple translation of this text. First, some notation:
+
+- ∆e(e) is the incoming degree of node e, that is to say the number of communications having node e as destination.
+- ∆s(s) is the outgoing degree of node s, that is to say the number of communications sent by node s.
+- Φ(e) is the number of communications destined for node e but coming from a different node.
+- Ω(s, e) is the number of messages coming from node s to node e. If node e only receives communications from different nodes then Φ(e) = ∆e(e). On the other hand if, for example, there are three messages coming from node s and going to node e then Φ(e) ≠ ∆e(e) and Ω(s, e) = 3.
+
+To determine the penalty for a communication, two values need to be calculated. First, the penalty caused by the conflict in transmission, denoted ps.
+
+
+- if ∆s(i) = 1 then ps = 1
+- if ∆s(i) ≥ 2 and ∆e(i) ≥ 3 then ps = ∆s(i) × βs × γr
+- else, ps = ∆s(i) × βs
+
+
+Then, the penalty caused by the conflict in reception (denoted pe) should be computed as follows:
+
+- if ∆e(i) = 1 then pe = 1
+- else, pe = Φ(e) × βe × Ω(s, e)
+
+Finally, the penalty associated with the communication is:
+p = max (ps ∈ s, pe)
+
 .. _cfg=network/crosstraffic:
 
 Simulating Cross-Traffic
@@ -1082,6 +1129,7 @@ https://framagit.org/simgrid/platform-calibration/
 https://simgrid.org/contrib/smpi-saturation-doc.html
 
 .. _cfg=smpi/display-timing:
+
 Reporting Simulation Time
 .........................
 
@@ -1099,8 +1147,9 @@ in application code and in SMPI internals, to provide hints about the
 need to use sampling to reduce simulation time.
 
 .. _cfg=smpi/display-allocs:
+
 Reporting memory allocations
-.........................
+............................
 
 **Option** ``smpi/display-allocs`` **Default:** 0 (false)
 
@@ -1457,8 +1506,9 @@ Then, you can pass the option
 actually activate the huge page support in shared mallocs.
 
 .. _cfg=smpi/auto-shared-malloc-thresh:
+
 Automatically share allocations
-.........................
+...............................
 
 **Option** ``smpi/auto-shared-malloc-thresh:`` **Default:** 0 (false)
    This value in bytes represents the size above which all allocations