.. raw:: html

   <object id="TOC" data="graphical-toc.svg" type="image/svg+xml"></object>
   <script>
   window.onload=function() { // Wait for the SVG to be loaded before changing it
     var elem=document.querySelector("#TOC").contentDocument.getElementById("PlatformBox")
     elem.style="opacity:0.93999999;fill:#ff0000;fill-opacity:0.1;stroke:#000000;stroke-width:0.35277778;stroke-linecap:round;stroke-linejoin:round;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1";
   }
   </script>
   <br/>
   <br/>

.. _howto:

Modeling Hints
##############

There is no perfect model, only models that are adapted to the
specific study that you want to conduct. SimGrid provides several advanced
mechanisms that you can adapt to model the situation that you are
interested in, and it is often hard to see where to start.
This page collects several hints and tricks on modeling situations.
Even if you are looking for a very advanced, specific use case, these
examples may help you design the solution you need.

.. _howto_science:

Doing Science with SimGrid
**************************

Many users rely on SimGrid as a scientific instrument for their
research. This tool was indeed designed for that purpose, and we strive
to streamline this kind of usage. But SimGrid is no magical tool, and
it is your responsibility to ensure that the tool actually provides
sensible results. Fortunately, there is a vast literature on how to avoid
Modeling & Simulation pitfalls. We review here some specific works.

In `An Integrated Approach to Evaluating Simulation Credibility
<http://www.dtic.mil/dtic/tr/fulltext/u2/a405051.pdf>`_, the authors
provide a methodology enabling the users to increase their confidence
in the simulation tools they use. First of all, you must know what you
expect to discover, so that you can determine whether the tool actually
covers your needs. Then, as they say, "a fool with a tool is still a fool", so you
need to think about your methodology before you submit your articles.
`Towards a Credibility Assessment of Models and Simulations
<https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20080015742.pdf>`_
gives a formal methodology to assess the credibility of your
simulation results.

`Seven Pitfalls in Modeling and Simulation Research
<https://dl.acm.org/citation.cfm?id=2430188>`_ is even more
specific. Here are the listed pitfalls: (1) Don't know whether it's
modeling or simulation, (2) No separation of concerns, (3) No clear
scientific question, (4) Implementing everything from scratch, (5)
Unsupported claims, (6) Toy duck approach, and (7) The tunnel view. As
you can see, this article is a must-read. It's a pity that it's not
freely available, though.

.. _howto_calibration:

Getting realistic results
*************************

The simulation models in SimGrid have been developed with care and have
been the object of thorough validation/invalidation campaigns. These models
come with parameters that configure their behaviors. The values of
these parameters are set based on the :ref:`XML platform description
file <platform>` and on parameters passed via :ref:`--cfg=Item:Value
command-line arguments <options>`. A simulator may also include any
number of custom model parameters that are used to instantiate
particular simulated activities (e.g., a simulator developed with the
S4U API typically defines volumes of computation, communication, and
time to pass to methods such as :cpp:func:`execute()
<simgrid::s4u::this_actor::execute>`, :cpp:func:`put()
<simgrid::s4u::Mailbox::put>`, or :cpp:func:`sleep_for()
<simgrid::s4u::this_actor::sleep_for>`).  Regardless of the potential
accuracy of the simulation models, if they are instantiated with
unrealistic parameter values, then the simulation will be inaccurate.
The provided default values may or may not be appropriate for
simulating a particular system.

Given the above, an integral and crucial part of simulation-driven
research is **simulation calibration**: the process by which one picks
simulation parameter values based on observed real-world executions so
that simulated executions have high accuracy.  We then say that a
simulator is "calibrated".  Once a simulator is calibrated for a
real-world system, it can be used to simulate that system accurately.
But it can also be used to simulate different but structurally
similar systems (e.g., different scales, different basic hardware
characteristics, different application workloads) with high confidence.

Research conclusions derived from simulation results obtained with an
uncalibrated simulator are questionable in terms of their relevance
for real-world systems. Unfortunately, because simulation calibration
is often a painstaking process, it is often not performed sufficiently
thoroughly (or at all!). We strongly urge SimGrid users to perform
simulation calibration. Here is an example of a research publication
in which the authors have calibrated their (SimGrid) simulators:
https://hal.inria.fr/hal-01523608
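
As a toy illustration of what calibration means, the sketch below fits two
parameters (a latency and a bandwidth) to measured transfer times by ordinary
least squares, assuming the simple model ``time = latency + size / bandwidth``.
This model, the function names and the measurements are assumptions made for
the example. They are not SimGrid's network model, and a real calibration
usually involves more parameters and more careful statistics.

```python
def fit_latency_bandwidth(sizes, times):
    """Least-squares fit of time = latency + size / bandwidth.

    sizes are in bytes and times in seconds; returns (latency, bandwidth).
    """
    n = len(sizes)
    mean_size = sum(sizes) / n
    mean_time = sum(times) / n
    # Slope of the regression line, i.e. the estimated 1 / bandwidth.
    sxx = sum((s - mean_size) ** 2 for s in sizes)
    sxy = sum((s - mean_size) * (t - mean_time) for s, t in zip(sizes, times))
    slope = sxy / sxx
    latency = mean_time - slope * mean_size
    return latency, 1.0 / slope


def mean_relative_error(sizes, times, latency, bandwidth):
    """Average relative gap between predicted and measured times."""
    errors = [abs(latency + s / bandwidth - t) / t for s, t in zip(sizes, times)]
    return sum(errors) / len(errors)


# Hypothetical measurements: message sizes (bytes) and durations (seconds).
sizes = [1e6, 2e6, 4e6]
times = [0.011, 0.021, 0.041]

latency, bandwidth = fit_latency_bandwidth(sizes, times)
error = mean_relative_error(sizes, times, latency, bandwidth)
print(f"latency={latency:.6f}s bandwidth={bandwidth:.3e}B/s error={error:.2e}")
```

The fitted values would then be used to set the corresponding parameters of
your simulator, and the remaining error tells you how far the calibrated model
is from the observations.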


.. _howto_churn:

Modeling churn (e.g., in P2P)
*****************************

One of the biggest challenges in P2P settings is to cope with
churn, meaning that resources keep appearing and disappearing. In
SimGrid, you can always change the state of each host manually, with,
e.g., :cpp:func:`simgrid::s4u::Host::turn_on`. To reduce the burden when
the churn is high, you can also attach a **state profile** to the host
directly.

This can be done through the XML file, using the ``state_file``
attribute of :ref:`pf_tag_host`, :ref:`pf_tag_cluster` or
:ref:`pf_tag_link`. Every line (but the last) of such files describes
a timed event of the form "date value". Example:

.. code-block:: text

   1 0
   2 1
   LOOPAFTER 8

This file uses a cryptic yet simple formalism:

  * At time t = 1, the host is turned off (a zero value means OFF).
  * At time t = 2, the host is turned back on (any value other than zero means ON).
  * At time t = 10, the profile is reset (as we are 8 seconds after the last event). Then the host will be turned off again at time t = 11.

If your profile does not contain any LOOPAFTER line, then it will be executed only once and not in a repetitive way.
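
To double-check how you read such a file, here is a minimal Python sketch that
unrolls a profile into its timed events, following the rules listed above. It
is not SimGrid code: the ``expand_profile`` function and its ``horizon``
parameter are hypothetical, and the sketch assumes that event lines are sorted
by increasing date.

```python
def expand_profile(text, horizon):
    """List the (date, state) events of a state profile up to `horizon`.

    Assumes event lines are sorted by increasing date.
    """
    events = []
    loop_delay = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        if fields[0] == "LOOPAFTER":
            loop_delay = float(fields[1])
        else:
            events.append((float(fields[0]), float(fields[1])))

    timeline = []
    if not events:
        return timeline
    offset = 0.0  # date at which the current iteration of the profile starts
    while True:
        for date, value in events:
            if offset + date > horizon:
                return timeline
            timeline.append((offset + date, "OFF" if value == 0 else "ON"))
        if loop_delay is None:
            return timeline  # no LOOPAFTER line: the profile is played once
        period = events[-1][0] + loop_delay
        if period <= 0:
            return timeline  # guard against a degenerate, zero-length loop
        offset += period


profile = "1 0\n2 1\nLOOPAFTER 8\n"
for date, state in expand_profile(profile, horizon=25):
    print(date, state)
```

With the example profile, the second iteration starts at t = 10, so the host
goes off again at t = 11 and back on at t = 12.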

Another possibility is to use the
:cpp:func:`simgrid::s4u::Host::set_state_profile()` or
:cpp:func:`simgrid::s4u::Link::set_state_profile()` functions. These
functions take a profile, which can be a fixed profile exhaustively
listing the events, or something else if you wish.

.. _howto_multicore:

Modeling multicore machines
***************************

Default model
=============

Multicore machines are very complex, and there are many ways to model
them. The default models of SimGrid are coarse-grained and capture some
elements of this reality. Here is how to declare simple multicore hosts:

.. code-block:: xml

   <host id="mymachine" speed="8Gf" core="4"/>

It declares a 4-core host called "mymachine", each core computing 8
Gflop per second. If you put one activity of 8 Gflop on this host, it
will be computed in 1 second (by default, activities are
single-threaded and cannot leverage the computing power of more than
one core). If you run two such activities simultaneously, they will still be
computed in one second, and so on up to 4 activities. If you start 5 activities,
they will share the total computing power, and each activity will be
computed in 5/4 = 1.25 seconds. This is a very simple model, but that is
all that you get by default from SimGrid.
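
The arithmetic above can be sketched in a few lines of Python. This is only a
restatement of the sharing rule described in this section for identical
activities that start together. It is not how SimGrid is implemented, and
``completion_time`` is a hypothetical helper.

```python
def completion_time(flops, core_speed, cores, n_activities):
    """Duration of n identical single-threaded activities started together."""
    if n_activities <= cores:
        # Each activity gets a full core, but never more than one.
        return flops / core_speed
    # More activities than cores: the total computing power is shared equally.
    share = core_speed * cores / n_activities
    return flops / share


# The "mymachine" host above: 4 cores at 8 Gflop/s, activities of 8 Gflop.
for n in (1, 2, 4, 5):
    print(n, completion_time(8e9, 8e9, 4, n))
```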

Pinning tasks to cores
======================

The default model does not account for task pinning, where you
manually select the core on which each activity should
execute. The best solution to model this is probably to model your
4-core processor as 4 distinct hosts, and to assign the activities to
cores by migrating them to the declared hosts. In some sense, this
takes the whole Network-On-Chip idea really seriously.

Some extra complications may arise here. If you have more activities than
cores, you'll have to `schedule your activities
<https://en.wikipedia.org/wiki/Scheduling_%28computing%29#Operating_system_process_scheduler_implementations>`_
yourself on the cores (so you'd better avoid this complexity). Since
you cannot have more than one network model in a given SimGrid
simulation, you will end up with a TCP connection between your cores. A
possible workaround is to never start any simulated communication
between the cores and have the same routes from each core to the
rest of the external network.

Modeling a multicore CPU as a set of SimGrid hosts may seem strange
and unconvincing, but some users achieved very realistic simulations
of multicore and GPU machines this way.

Modeling machine boot and shutdown periods
******************************************

When a physical host boots up, a lot of things happen. Booting takes time
during which the machine is not usable but dissipates energy, and
programs actually die and restart during a reboot. Since there are many
ways to model this, SimGrid makes no modeling choices for you beyond
the most obvious ones.

Any actor (or process in MSG) running on a host that is shut down
will be killed and all its activities (tasks in MSG) will be
automatically canceled. If the killed actor was marked as
auto-restartable (with
:cpp:func:`simgrid::s4u::Actor::set_auto_restart` or with
:cpp:func:`MSG_process_auto_restart_set`), it will start anew with the
same parameters when the host boots back up.

By default, shutdowns and boots are instantaneous. If you want to
add an extra delay, you have to do that yourself, for example from a
`controller` actor that runs on another host. The best way to do so is
to declare a fictional pstate where the CPU delivers 0 flop per
second (so every activity on that host will be frozen when the host is
in this pstate). When you want to switch the host off, your controller
switches the host to that specific pstate (with
:cpp:func:`simgrid::s4u::Host::set_pstate`), waits for the amount of
time that you deem necessary for your host to shut down, and turns
the host off (with :cpp:func:`simgrid::s4u::Host::turn_off`). To boot
up, switch the host on, go into the specific pstate, wait a while and
go to a more regular pstate.

To model the energy dissipation, you need to put the right energy
consumption in your startup/shutdown specific pstate. Remember that
the energy consumed is equal to the instantaneous consumption
multiplied by the time that the host spends in that state. Do the
maths, and set the right instantaneous consumption to your pstate, and
you'll get the whole boot period to consume the amount of energy that
you want. You may want to have one fictional pstate for the boot
period and another one for the shutdown period.
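
The "maths" mentioned above boil down to one division. Here is a minimal
sketch, in which ``pstate_power`` is a hypothetical helper and the durations
and energy budgets are made-up values that you would replace with measurements
of your own machine.

```python
def pstate_power(target_energy, duration):
    """Constant power (W) that dissipates target_energy (J) in duration (s)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return target_energy / duration


# Hypothetical machine: booting takes 150 s and should cost 18 kJ,
# shutting down takes 10 s and should cost 1 kJ.
boot_power = pstate_power(target_energy=18000.0, duration=150.0)
shutdown_power = pstate_power(target_energy=1000.0, duration=10.0)
print(boot_power, shutdown_power)
```

Each resulting value is the instantaneous consumption to declare for the
corresponding fictional pstate.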

Of course, this is only one possible way to model these things. YMMV ;)

.. _howto_parallel_links:

Modeling parallel links
***********************

Most HPC topologies, such as fat-trees, allow parallel links (a
router A and a router B can be connected by more than one link).
You might be tempted to model this configuration as follows:

.. code-block:: xml

    <router id="routerA"/>
    <router id="routerB"/>

    <link id="link1" bandwidth="10GBps" latency="2us"/>
    <link id="link2" bandwidth="10GBps" latency="2us"/>

    <route src="routerA" dst="routerB">
        <link_ctn id="link1"/>
    </route>
    <route src="routerA" dst="routerB">
        <link_ctn id="link2"/>
    </route>

But that will not work, since SimGrid doesn't allow several routes for
a single `{src ; dst}` pair. Instead, what you should do is:

  - Use a single route with both links (so both will be traversed
    each time a message is exchanged between router A and B)

  - Double the bandwidth of one link, to model the total bandwidth of
    both links used in parallel. This will make sure that the combined
    communications between router A and B never use more than the
    bandwidth of two links

  - Assign the other link a `FATPIPE` sharing policy, which will allow
    several communications to use the full bandwidth of this link without
    having to share it. This will model the fact that individual
    communications can use at most this link's bandwidth

  - Set the latency of one of the links to 0, so that latency is only
    accounted for once (since both links are traversed by each message)

So the final platform for our example becomes:

.. code-block:: xml

    <router id="routerA"/>
    <router id="routerB"/>

    <!-- This link limits the total bandwidth of all parallel communications -->
    <link id="link1" bandwidth="20GBps" latency="2us"/>

    <!-- This link only limits the bandwidth of individual communications -->
    <link id="link2" bandwidth="10GBps" latency="0us" sharing_policy="FATPIPE"/>

    <!-- Each message traverses both links -->
    <route src="routerA" dst="routerB">
        <link_ctn id="link1"/>
        <link_ctn id="link2"/>
    </route>
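
To see why this combination behaves like two parallel links, the following
Python sketch computes the bandwidth that each of ``n`` identical, concurrent
communications would get between the two routers. It is a simplified
illustration: it assumes an equal split of the shared link and ignores latency
and any protocol-related correction that the network model may apply.
``per_flow_bandwidth`` is a hypothetical helper, not a SimGrid function.

```python
def per_flow_bandwidth(n_flows, shared_bw, fatpipe_bw):
    """Bandwidth of each of n identical flows crossing both links."""
    if n_flows <= 0:
        raise ValueError("n_flows must be positive")
    # The shared link is split between the flows, while the FATPIPE link
    # caps every flow individually without being shared.
    return min(fatpipe_bw, shared_bw / n_flows)


# Values from the platform above, in GBps.
for n in (1, 2, 4):
    print(n, per_flow_bandwidth(n, shared_bw=20.0, fatpipe_bw=10.0))
```

One or two communications each get the bandwidth of a single physical link,
while four communications get half of it each, as they would when sharing two
parallel links.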

.. include:: tuto_disk/analysis.irst

.. include:: tuto_network_calibration/network_calibration_tutorial.rst