docs/source/platform_howtos.rst

.. _platform:

.. raw:: html

   <object id="TOC" data="graphical-toc.svg" type="image/svg+xml"></object>
   <script>
   window.onload=function() { // Wait for the SVG to be loaded before changing it
     var elem=document.querySelector("#TOC").contentDocument.getElementById("PlatformBox")
     elem.style="opacity:0.93999999;fill:#ff0000;fill-opacity:0.1;stroke:#000000;stroke-width:0.35277778;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1";
   }
   </script>
   <br/>
   <br/>

.. _howto:

Modeling Hints
##############

There is no perfect model, only models that are adapted to the
specific study that you want to do. SimGrid provides several advanced
mechanisms that you can adapt to model the situation that you are
interested in, and it is often difficult to see where to start.
This page collects several hints and tricks on modeling situations.
Even if you are looking for a very advanced, specific use case, these
examples may help you design the solution you need.

.. _howto_science:

Doing Science with SimGrid
**************************

Many users rely on SimGrid as a scientific instrument for their
research. The tool was indeed designed for that purpose, and we strive
to streamline this kind of usage. But SimGrid is no magical tool, and
it is your responsibility to make sure that it actually provides sensible
results. Fortunately, there is a vast literature on how to avoid
Modeling & Simulation pitfalls. We review here some specific works.

In `An Integrated Approach to Evaluating Simulation Credibility
<http://www.dtic.mil/dtic/tr/fulltext/u2/a405051.pdf>`_, the authors
provide a methodology enabling users to increase their confidence
in the simulation tools they use. First of all, you must know what you
actually expect to discover, and whether the tool actually covers your
needs. Then, as they say, "a fool with a tool is still a fool", so you
need to think about your methodology before you submit your articles.
`Towards a Credibility Assessment of Models and Simulations
<https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20080015742.pdf>`_
gives a formal methodology to assess the credibility of your
simulation results.

`Seven Pitfalls in Modeling and Simulation Research
<https://dl.acm.org/citation.cfm?id=2430188>`_ is even more
specific. Here are the listed pitfalls: (1) Don't know whether it's
modeling or simulation, (2) No separation of concerns, (3) No clear
scientific question, (4) Implementing everything from scratch, (5)
Unsupported claims, (6) Toy duck approach, and (7) The tunnel view. As
you can see, this article is a must-read. It's a pity that it's not
freely available, though.

.. _howto_churn:

Modeling Churn (e.g., in P2P)
*****************************

One of the biggest challenges in P2P settings is to cope with
churn, meaning that resources keep appearing and disappearing. In
SimGrid, you can always change the state of each host manually, e.g.
with :cpp:func:`simgrid::s4u::Host::turn_on`. To reduce the burden when
the churn is high, you can also attach a **state profile** to the host
directly.

This can be done through the XML file, using the ``state_file``
attribute of :ref:`pf_tag_host`, :ref:`pf_tag_cluster` or
:ref:`pf_tag_link`. Every line (but the last) of such a file describes
a timed event in the form "date value". Example:

.. code-block:: text

   1 0
   2 1
   LOOPAFTER 8

  - At time t = 1, the host is turned off (a zero value means OFF)
  - At time t = 2, the host is turned back on (any value other than zero means ON)
  - At time t = 10, the profile is reset (as we are 8 seconds after the last event). Then the host will be turned off
    again at time t = 11.

If your profile does not contain any LOOPAFTER line, then it is executed only once and does not repeat.
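
To make these semantics concrete, here is a minimal Python sketch that
replays such a profile and reports whether the host is up at a given date.
It only encodes the behavior described above and is not SimGrid's own
parser: the helper names (``parse_profile``, ``is_on``) are hypothetical,
and it assumes that the host starts ON and that an event takes effect
exactly at its date.

.. code-block:: python

   PROFILE = "1 0\n2 1\nLOOPAFTER 8"

   def parse_profile(text):
       """Return ([(date, is_on), ...], loop_after or None)."""
       events, loop_after = [], None
       for line in text.strip().splitlines():
           fields = line.split()
           if not fields:
               continue
           if fields[0] == "LOOPAFTER":
               loop_after = float(fields[1])
           else:
               # A zero value means OFF, any other value means ON.
               events.append((float(fields[0]), float(fields[1]) != 0))
       return events, loop_after

   def is_on(text, t, initially_on=True):
       """State of the host at date t (assumption: ON before the first event)."""
       events, loop_after = parse_profile(text)
       state = initially_on
       if not events:
           return state
       if loop_after is not None:
           # The profile restarts loop_after seconds after its last event.
           period = events[-1][0] + loop_after
           if t >= period:
               state = events[-1][1]  # state inherited from the previous cycle
               t = t % period
       for date, on in events:
           if date <= t:
               state = on
       return state

   print(is_on(PROFILE, 1.5))   # False: turned off at t = 1
   print(is_on(PROFILE, 11.5))  # False: turned off again at t = 11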

Another possibility is to use the
:cpp:func:`simgrid::s4u::Host::set_state_profile()` or
:cpp:func:`simgrid::s4u::Link::set_state_profile()` functions. These
functions take a profile, which can be a fixed profile exhaustively
listing the events, or something else if you wish.

.. _howto_multicore:

Modeling Multicore Machines
***************************

Default Model
=============

Multicore machines are very complex, and there are many ways to model
them. The default models of SimGrid are coarse-grained and capture only some
elements of this reality. Here is how to declare a simple multicore host:

.. code-block:: xml

   <host id="mymachine" speed="8Gf" core="4"/>

It declares a 4-core host called "mymachine", each core computing 8
Gflop per second. If you put one activity of 8 Gflop on this host, it
will be computed in 1 second (by default, activities are
single-threaded and cannot leverage the computing power of more than
one core). If you run two such activities simultaneously, they will still be
computed in one second, and so on up to 4 activities. If you start 5 activities,
they will share the total computing power, and each activity will be
computed in 5/4 = 1.25 seconds. This is a very simple model, but that is
all that you get by default from SimGrid.
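
The arithmetic of this sharing rule can be sketched in a few lines of
Python. This is only an illustration of the reasoning above for identical
activities that all start at the same time, not the actual SimGrid solver,
and the function name ``completion_time`` is hypothetical.

.. code-block:: python

   def completion_time(n_activities, flops_per_activity, core_speed, cores):
       """Duration of n identical single-threaded activities started together."""
       total_speed = core_speed * cores
       # An activity never uses more than one core, and never gets more
       # than its fair share of the whole host.
       share = min(core_speed, total_speed / n_activities)
       return flops_per_activity / share

   # The "mymachine" host above: 4 cores at 8 Gflop/s, activities of 8 Gflop.
   for n in (1, 2, 4, 5):
       print(n, completion_time(n, 8e9, 8e9, 4))  # 1.0 up to n = 4, then 1.25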

Pinning tasks to cores
======================

The default model does not account for task pinning, where you
manually select the core on which each activity should
execute. The best solution to model this is probably to model your
4-core processor as 4 distinct hosts, and to assign the activities to
cores by migrating them to the declared hosts. In some sense, this
takes the whole Network-On-Chip idea really seriously.

Some extra complications may arise here. If you have more activities than
cores, you'll have to `schedule your activities
<https://en.wikipedia.org/wiki/Scheduling_%28computing%29#Operating_system_process_scheduler_implementations>`_
yourself on the cores (so you'd better avoid this complexity). Since
you cannot have more than one network model in a given SimGrid
simulation, you will end up with a TCP connection between your cores. A
possible workaround is to never start any simulated communication
between the cores and to have the same routes from each core to the
rest of the external network.

Modeling a multicore CPU as a set of SimGrid hosts may seem strange
and unconvincing, but some users achieved very realistic simulations
of multicore and GPU machines this way.

Modeling machine boot and shutdown periods
********************************************

When a physical host boots up, a lot of things happen. Booting takes time
during which the machine is not usable but dissipates energy, and
programs actually die and restart during a reboot. Since there are many
ways to model this, SimGrid makes no modeling choice for you beyond
the most obvious ones.

Any actor (or process in MSG) running on a host that is shut down
will be killed and all its activities (tasks in MSG) will be
automatically canceled. If the killed actor was marked as
auto-restartable (with
:cpp:func:`simgrid::s4u::Actor::set_auto_restart` or with
:cpp:func:`MSG_process_auto_restart_set`), it will start anew with the
same parameters when the host boots back up.

By default, shutdowns and boots are instantaneous. If you want to
add an extra delay, you have to do that yourself, for example from a
`controller` actor that runs on another host. The best way to do so is
to declare a fictional pstate where the CPU delivers 0 flop per
second (so every activity on that host will be frozen when the host is
in this pstate). When you want to switch the host off, your controller
switches the host to that specific pstate (with
:cpp:func:`simgrid::s4u::Host::set_pstate`), waits for the amount of
time that you deem necessary for your host to shut down, and turns
the host off (with :cpp:func:`simgrid::s4u::Host::turn_off`). To boot
up, switch the host on, go into the specific pstate, wait a while and
go to a more regular pstate.

To model the energy dissipation, you need to put the right energy
consumption in your startup/shutdown specific pstate. Remember that
the energy consumed is equal to the instantaneous consumption
multiplied by the time that the host stays in that state. Do the
maths, and set the right instantaneous consumption for your pstate, and
you'll get the whole boot period to consume the amount of energy that
you want. You may want to have one fictional pstate for the boot
period and another one for the shutdown period.
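
The maths mentioned above boil down to a single division. The following
Python sketch shows it with made-up figures: the 150 s boot duration and
the 18 kJ energy budget are hypothetical values, and ``pstate_power`` is
only an illustrative helper, not a SimGrid function.

.. code-block:: python

   def pstate_power(energy_joules, duration_seconds):
       """Constant power (in watts) dissipating the given energy over the given duration."""
       if duration_seconds <= 0:
           raise ValueError("duration must be positive")
       return energy_joules / duration_seconds

   # Hypothetical boot: it lasts 150 s and should consume 18 kJ overall,
   # so the fictional boot pstate must be declared at 120 W.
   boot_watts = pstate_power(18_000, 150)
   print(boot_watts)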

Of course, this is only one possible way to model these things. YMMV ;)

.. _understanding_lv08:

Understanding the default TCP model
***********************************

When simulating a data transfer between two hosts, you may be surprised
by the obtained simulation time. Let's consider the following platform:

.. code-block:: xml

   <host id="A" speed="1Gf"/>
   <host id="B" speed="1Gf"/>

   <link id="link1" latency="10ms" bandwidth="1Mbps"/>

   <route src="A" dst="B">
     <link_ctn id="link1"/>
   </route>

If host `A` sends `100kB` (a hundred kilobytes) to host `B`, one could expect
that this communication would take `0.81` seconds to complete according to a
simple latency-plus-size-divided-by-bandwidth model (0.01 + 8e5/1e6 = 0.81).
However, the default TCP model of SimGrid is a bit more complex than that. It
accounts for three phenomena that directly impact the simulation time even
on such a simple example:

  - The size of a message at the application level (i.e., 100kB in this
    example) is not the size that will actually be transferred over the
    network. To mimic the fact that TCP and IP headers are added to each packet of
    the original payload, the TCP model of SimGrid empirically considers that
    `only 97% of the nominal bandwidth` is available. In other words, the
    size of your message is increased by a few percent, whatever its size.

  - In the real world, the TCP protocol is not able to fully exploit the
    bandwidth of a link from the emission of the first packet. To reflect this
    `slow start` phenomenon, the latency declared in the platform file is
    multiplied by `a factor of 13.01`. Here again, this is an empirically
    determined value that may not correspond to every TCP implementation on
    every network. It can be tuned when more realistic simulated times for
    short messages are needed, though.

  - When data is transferred from A to B, some TCP ACK messages travel in the
    opposite direction. To reflect the impact of this `cross-traffic`, SimGrid
    simulates a flow from B to A that represents an additional bandwidth
    consumption of `0.05`. The route from B to A is implicitly declared in the
    platform file and uses the same link `link1` as if the two hosts were
    connected through a communication bus. The bandwidth share allocated to the
    flow from A to B is then the available bandwidth of `link1` (i.e., 97% of
    the nominal bandwidth of 1Mb/s) divided by 1.05 (i.e., the total consumption).
    This feature, activated by default, can be disabled by adding the
    `--cfg=network/crosstraffic:0` flag to the command line.

As a consequence, the time to transfer 100kB from A to B as simulated by the
default TCP model of SimGrid is not 0.81 seconds but

.. code-block:: python

    # 13.01 * latency + size_in_bits / (0.97 * bandwidth / 1.05)
    0.01 * 13.01 + 800000 / ((0.97 * 1e6) / 1.05)  # = 0.996079 seconds
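
The same computation can be wrapped in a small Python helper to experiment
with other message sizes or factors. This is only a sketch of the three
corrections listed above for a single flow that is alone on a single link,
not the actual SimGrid implementation. The function name
``lv08_transfer_time`` and its parameter names are hypothetical, and
setting ``crosstraffic`` to 0 merely drops the division by 1.05, as the
description above suggests.

.. code-block:: python

   def lv08_transfer_time(size_bytes, latency_s, bandwidth_bps,
                          bandwidth_factor=0.97, latency_factor=13.01,
                          crosstraffic=0.05):
       """Estimated duration (in seconds) of one isolated transfer."""
       size_bits = 8 * size_bytes
       # 97% of the nominal bandwidth, shared with the reverse ACK flow.
       effective_bandwidth = bandwidth_factor * bandwidth_bps / (1 + crosstraffic)
       return latency_factor * latency_s + size_bits / effective_bandwidth

   # 100kB over the 1Mbps, 10ms link of the platform above
   print(round(lv08_transfer_time(100_000, 0.01, 1e6), 6))  # 0.996079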