X-Git-Url: http://bilbo.iut-bm.univ-fcomte.fr/pub/gitweb/simgrid.git/blobdiff_plain/209b2c14ca411bd87d8c1eb856cb7bf5013d76ef..abf3b21ad3f4a19475431076e6e2b26f76d2e7d6:/docs/source/Configuring_SimGrid.rst diff --git a/docs/source/Configuring_SimGrid.rst b/docs/source/Configuring_SimGrid.rst index 05db361b2c..d06882f609 100644 --- a/docs/source/Configuring_SimGrid.rst +++ b/docs/source/Configuring_SimGrid.rst @@ -105,22 +105,17 @@ Existing Configuration Items - **host/model:** :ref:`options_model_select` -- **maxmin/precision:** :ref:`cfg=maxmin/precision` - **maxmin/concurrency-limit:** :ref:`cfg=maxmin/concurrency-limit` - **model-check:** :ref:`options_modelchecking` -- **model-check/checkpoint:** :ref:`cfg=model-check/checkpoint` - **model-check/communications-determinism:** :ref:`cfg=model-check/communications-determinism` - **model-check/dot-output:** :ref:`cfg=model-check/dot-output` - **model-check/max-depth:** :ref:`cfg=model-check/max-depth` -- **model-check/property:** :ref:`cfg=model-check/property` - **model-check/reduction:** :ref:`cfg=model-check/reduction` - **model-check/replay:** :ref:`cfg=model-check/replay` - **model-check/send-determinism:** :ref:`cfg=model-check/send-determinism` - **model-check/setenv:** :ref:`cfg=model-check/setenv` -- **model-check/termination:** :ref:`cfg=model-check/termination` - **model-check/timeout:** :ref:`cfg=model-check/timeout` -- **model-check/visited:** :ref:`cfg=model-check/visited` - **network/bandwidth-factor:** :ref:`cfg=network/bandwidth-factor` - **network/crosstraffic:** :ref:`cfg=network/crosstraffic` @@ -140,7 +135,8 @@ Existing Configuration Items - **storage/max_file_descriptors:** :ref:`cfg=storage/max_file_descriptors` -- **surf/precision:** :ref:`cfg=surf/precision` +- **precision/timing:** :ref:`cfg=precision/timing` +- **precision/work-amount:** :ref:`cfg=precision/work-amount` - **For collective operations of SMPI,** please refer to Section :ref:`cfg=smpi/coll-selector` - **smpi/auto-shared-malloc-thresh:** :ref:`cfg=smpi/auto-shared-malloc-thresh` @@ -228,30 +224,20 @@ models for all existing resources. `_. - **ns-3** (only available if you compiled SimGrid accordingly): Use the packet-level network - simulators as network models (see :ref:`model_ns3`). + simulators as network models (see :ref:`models_ns3`). This model can be :ref:`further configured `. -- ``cpu/model``: specify the used CPU model. We have only one model - for now: +- ``cpu/model``: specify the used CPU model. We have only one model for now: - **Cas01:** Simplistic CPU model (time=size/speed) -- ``host/model``: The host concept is the aggregation of a CPU with a - network card. Three models exists, but actually, only 2 of them are - interesting. The "compound" one is simply due to the way our - internal code is organized, and can easily be ignored. So at the - end, you have two host models: The default one allows aggregation of - an existing CPU model with an existing network model, but does not - allow parallel tasks because these beasts need some collaboration - between the network and CPU model. - - - **default:** Default host model. Currently, CPU:Cas01 and - network:LV08 (with cross traffic enabled) - - **compound:** Host model that is automatically chosen if - you change the network and CPU models - - **ptask_L07:** Host model somehow similar to Cas01+CM02 but - allowing "parallel tasks", that are intended to model the moldable - tasks of the grid scheduling literature. +- ``host/model``: we have two such models for now. + + - **default:** Default host model. It simply uses the otherwise configured models for cpu, disk and network (i.e. CPU:Cas01, + disk:S19 and network:LV08 by default) + - **ptask_L07:** This model is mandatory if you plan to use parallel tasks (and useless otherwise). ptasks are intended to + model the moldable tasks of the grid scheduling literature. A specific host model is necessary because each such activity + has a both compute and communicate components, so the CPU and network models must be mixed together. - ``storage/model``: specify the used storage model. Only one model is provided so far. @@ -268,7 +254,7 @@ Solver The different models rely on a linear inequalities solver to share the underlying resources. SimGrid allows you to change the solver, but be cautious, **don't change it unless you are 100% sure**. - + - items ``cpu/solver``, ``network/solver``, ``disk/solver`` and ``host/solver`` allow you to change the solver for each model: @@ -313,14 +299,14 @@ configurations. and slow pattern that follows the actual dependencies. .. _cfg=bmf/precision: -.. _cfg=maxmin/precision: -.. _cfg=surf/precision: +.. _cfg=precision/timing: +.. _cfg=precision/work-amount: Numerical Precision ................... -**Option** ``maxmin/precision`` **Default:** 1e-5 (in flops or bytes) |br| -**Option** ``surf/precision`` **Default:** 1e-9 (in seconds) |br| +**Option** ``precision/timing`` **Default:** 1e-9 (in seconds) |br| +**Option** ``precision/work-amount`` **Default:** 1e-5 (in flops or bytes) |br| **Option** ``bmf/precision`` **Default:** 1e-12 (no unit) The analytical models handle a lot of floating point values. It is @@ -373,16 +359,17 @@ Maximal TCP Window Size **Option** ``network/TCP-gamma`` **Default:** 4194304 -The analytical models need to know the maximal TCP window size to take -the TCP congestion mechanism into account. On Linux, this value can -be retrieved using the following commands. Both give a set of values, -and you should use the last one, which is the maximal size. +The analytical models need to know the maximal TCP window size to take the TCP congestion mechanism into account (see +:ref:`this page ` for details). On Linux, this value can be retrieved using the following commands. +Both give a set of values, and you should use the last one, which is the maximal size. .. code-block:: console $ cat /proc/sys/net/ipv4/tcp_rmem # gives the sender window $ cat /proc/sys/net/ipv4/tcp_wmem # gives the receiver window +If you want to disable the TCP windowing mechanism, set this parameter to 0. + .. _cfg=network/bandwidth-factor: .. _cfg=network/latency-factor: .. _cfg=network/weight-S: @@ -441,8 +428,8 @@ This was also computed on the Stampede Supercomputer. **Option** ``network/weight-S`` **Default:** depends on the model Value used to account for RTT-unfairness when sharing a bottleneck (network connections with a large RTT are generally penalized -against those with a small one). Described in `Accuracy Study and Improvement of Network Simulation in the SimGrid Framework -`_ +against those with a small one). See :ref:`models_TCP` and also this scientific paper: `Accuracy Study and Improvement of Network +Simulation in the SimGrid Framework `_ Default values for ``CM02`` is 0. ``LV08`` sets it to 20537 while both ``SMPI`` and ``IB`` set it to 8775. @@ -465,7 +452,7 @@ Infiniband model InfiniBand network behavior can be modeled through 3 parameters ``smpi/IB-penalty-factors:"βe;βs;γs"``, as explained in `the PhD -thesis of Jean-Marc Vincent +thesis of Jérôme Vienne `_ (in French) or more concisely in `this paper `_, even if that paper does only describe models for myrinet and ethernet. @@ -536,8 +523,6 @@ Note that with the default host model this option is activated by default. Simulating Asynchronous Send ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -(this configuration item is experimental and may change or disappear) - It is possible to specify that messages below a certain size (in bytes) will be sent as soon as the call to MPI_Send is issued, without waiting for the correspondent receive. This threshold can be configured through @@ -556,13 +541,12 @@ are meant to be detached as well. Configuring ns-3 ^^^^^^^^^^^^^^^^ -**Option** ``ns3/TcpModel`` **Default:** "default" (ns-3 default) +**Option** ``ns3/NetworkModel`` **Default:** "default" (ns-3 default TCP) -When using ns-3, there is an extra item ``ns3/TcpModel``, corresponding -to the ``ns3::TcpL4Protocol::SocketType`` configuration item in -ns-3. The only valid values (enforced on the SimGrid side) are -'default' (no change to the ns-3 configuration), 'NewReno' or 'Reno' or -'Tahoe'. +When using ns-3, the item ``ns3/NetworkModel`` can be used to switch between TCP or UDP, and switch the used TCP variante. If +the item is left unchanged, ns-3 uses the default TCP implementation. With a value of "UDP", ns-3 is set to use UDP instead. +With the value of either 'NewReno' or 'Cubic', the ``ns3::TcpL4Protocol::SocketType`` configuration item in ns-3 is set to the +corresponding protocol. **Option** ``ns3/seed`` **Default:** "" (don't set the seed in ns-3) @@ -643,9 +627,12 @@ receive call for small messages sent when the system buffers are empty. In SMPI, this depends on the message size, that is compared against two thresholds: - if (size < :ref:`smpi/async-small-thresh `) then - MPI_Send returns immediately, even if the corresponding receive has not be issued yet. -- if (:ref:`smpi/async-small-thresh ` < size < :ref:`smpi/send-is-detached-thresh `) then - MPI_Send returns as soon as the corresponding receive has been issued. This is known as the eager mode. + MPI_Send returns immediately, and the message is sent even if the + corresponding receive has not be issued yet. This is known as the eager mode. +- if (:ref:`smpi/async-small-thresh ` < size < + :ref:`smpi/send-is-detached-thresh `) then + MPI_Send also returns immediately, but SMPI waits for the corresponding + receive to be posted, in order to perform the communication operation. - if (:ref:`smpi/send-is-detached-thresh ` < size) then MPI_Send returns only when the message has actually been sent over the network. This is known as the rendez-vous mode. @@ -654,102 +641,50 @@ The ``smpi/buffering`` (only valid with MC) option gives an easier interface to - **zero:** means that buffering should be disabled. All communications are actually blocking. - **infty:** means that buffering should be made infinite. All communications are non-blocking. -.. _cfg=model-check/property: - -Specifying a liveness property -.............................. - -**Option** ``model-check/property`` **Default:** unset - -If you want to specify liveness properties, you have to pass them on -the command line, specifying the name of the file containing the -property, as formatted by the `ltl2ba `_ program. -Note that ltl2ba is not part of SimGrid and must be installed separately. - -.. code-block:: console - - $ simgrid-mc ./my_program --cfg=model-check/property: - -.. _cfg=model-check/checkpoint: - -Going for Stateful Verification -............................... - -By default, the system is backtracked to its initial state to explore -another path, instead of backtracking to the exact step before the fork -that we want to explore (this is called stateless verification). This -is done this way because saving intermediate states can rapidly -exhaust the available memory. If you want, you can change the value of -the ``model-check/checkpoint`` item. For example, -``--cfg=model-check/checkpoint:1`` asks to take a checkpoint every -step. Beware, this will certainly explode your memory. Larger values -are probably better, make sure to experiment a bit to find the right -setting for your specific system. - .. _cfg=model-check/reduction: Specifying the kind of reduction ................................ +**Option** model-check/reduction **Default:** "dpor" + The main issue when using the model-checking is the state space explosion. You can activate some reduction technique with ``--cfg=model-check/reduction:``. For now, this configuration variable can take 2 values: - - **none:** Do not apply any kind of reduction (mandatory for - liveness properties, as our current DPOR algorithm breaks cycles) - - **dpor:** Apply Dynamic Partial Ordering Reduction. Only valid if - you verify local safety properties (default value for safety - checks). - -Another way to mitigate the state space explosion is to search for -cycles in the exploration with the :ref:`cfg=model-check/visited` -configuration. Note that DPOR and state-equality reduction may not -play well together. You should choose between them. + - **none:** Do not apply any kind of reduction + - **dpor:** Apply Dynamic Partial Ordering Reduction. Only valid if you verify local safety properties (default value for + safety checks). + - **sdpor:** Source-set DPOR, as described in "Source Sets: A Foundation for Optimal Dynamic Partial Order Reduction" + by Abdulla et al. + - **odpor:** Optimal DPOR, as described in "Source Sets: A Foundation for Optimal Dynamic Partial Order Reduction" + by Abdulla et al. Our current DPOR implementation could be improved in may ways. We are currently improving its efficiency (both in term of reduction ability -and computational speed), and future work could make it compatible -with liveness properties. - -.. _cfg=model-check/visited: - -Size of Cycle Detection Set (state equality reduction) -...................................................... - -Mc SimGrid can be asked to search for cycles during the exploration, -i.e. situations where a new explored state is in fact the same state -than a previous one.. This can prove useful to mitigate the state -space explosion with safety properties, and this is the crux when -searching for counter-examples to the liveness properties. - -Note that this feature may break the current implementation of the -DPOR reduction technique. - -The ``model-check/visited`` item is the maximum number of states, which -are stored in memory. If the maximum number of snapshotted state is -reached, some states will be removed from the memory and some cycles -might be missed. Small values can lead to incorrect verifications, but -large values can exhaust your memory and be CPU intensive as each new -state must be compared to that amount of older saved states. - -The default settings depend on the kind of exploration. With safety -checking, no state is snapshotted and cycles cannot be detected. With -liveness checking, all states are snapshotted because missing a cycle -could hinder the exploration soundness. +and computational speed). -.. _cfg=model-check/termination: +.. _cfg=model-check/strategy: -Non-Termination Detection -......................... +Guiding strategy +................ -The ``model-check/termination`` configuration item can be used to -report if a non-termination execution path has been found. This is a -path with a cycle, which means that the program might never terminate. +**Option** model-check/strategy **Default:** "none" -This only works in safety mode, not in liveness mode. +Even after the DPOR's reduction, the state space that we have to explore remains huge. SimGrid provides several guiding +strategies aiming at converging faster toward bugs. By default, none of these strategy is enabled, and SimGrid does a regular +DFS exploration. -This options is disabled by default. + - **max_match_comm**: Try to minimize the number of in-fly communication by appairing matching send and receive. This tend to + produce nicer backtraces, in particular when a user-level ``send`` is broken down internally into a ``send_async`` + ``wait``. + This strategy will ensure that the ``wait`` occures as soon as possible, easing the understanding of the user who do not + expect her ``send`` to be split. + - **min_match_comm**: Try to maximize the number of in-fly communication by not appairing matching send and receive. This is + the exact opposite strategy, but it is still useful as it tend to explore first the branches where the risk of deadlock is + higher. + - **uniform**: this is a boring random strategy where choices are based on a uniform sampling of possible choices. + Surprisingly, it gives really really good results. .. _cfg=model-check/dot-output: @@ -758,8 +693,7 @@ Dot Output If set, the ``model-check/dot-output`` configuration item is the name of a file in which to write a dot file of the path leading to the -property violation discovered (safety or liveness violation), as well -as the cycle for liveness properties. This dot file can then be fed to the +property violation discovered (safety violation). This dot file can then be fed to the graphviz dot tool to generate a corresponding graphical representation. .. _cfg=model-check/max-depth: @@ -824,17 +758,14 @@ memory (see :ref:`contexts/guard-size `). Replaying buggy execution paths from the model checker ...................................................... -Debugging the problems reported by the model checker is challenging: -First, the application under verification cannot be debugged with gdb -because the model checker already traces it. Then, the model checker may -explore several execution paths before encountering the issue, making it -very difficult to understand the output. Fortunately, SimGrid provides +Debugging the problems reported by the model checker is challenging because +the model checker may explore several execution paths before encountering the issue, +the output very difficult to understand. Fortunately, SimGrid provides the execution path leading to any reported issue so that you can replay -this path reported by the model checker, enabling the usage of classical -debugging tools. +this path reported by the model checker, restoring a classical experience of debugging. When the model checker finds an interesting path in the application -execution graph (where a safety or liveness property is violated), it +execution graph (where a safety property is violated), it generates an identifier for this path. Here is an example of the output: .. code-block:: console @@ -855,10 +786,84 @@ The interesting line is ``Path = 1/3;1/4``, which means that you should use ``--cfg=model-check/replay:1/3;1/4`` to replay your application on the buggy execution path. All options (but the model checker related ones) must remain the same. In particular, if you ran your application with -``smpirun -wrapper simgrid-mc``, then do it again. Remove all +``smpirun -wrapper simgrid-mc``, then remove the ``-wrapper simgrid-mc`` part +(you may want to use valgrind or gdb as wrappers instead). Also remove all MC-related options, keep non-MC-related ones and add ``--cfg=model-check/replay:???``. +Things are very similar if you are using sthread. Simply drop ``simgrid-mc`` from your command line, as follows: + +.. code-block:: console + + $ LD_PRELOAD=../../lib/libsthread.so ./pthread-mutex-simpledeadlock --cfg=model-check/replay:'2;2;3;2;3;3' + sthread is intercepting the execution of ./pthread-mutex-simpledeadlock + [0.000000] [xbt_cfg/INFO] Configuration change: Set 'model-check/replay' to '2;2;3;2;3;3' + [0.000000] [mc_record/INFO] path=2;2;3;2;3;3 + All threads are started. + [0.000000] [mc_record/INFO] *********************************************************************************** + [0.000000] [mc_record/INFO] * Path chunk #1 '2/0' Actor thread 1(pid:2): MUTEX_ASYNC_LOCK(mutex_id:0 owner:none) + [0.000000] [mc_record/INFO] *********************************************************************************** + Backtrace (displayed in actor thread 1): + -> #0 simgrid::s4u::Mutex::lock() at ../../src/s4u/s4u_Mutex.cpp:26 + -> #1 sthread_mutex_lock at ../../src/sthread/sthread_impl.cpp:188 + -> #2 pthread_mutex_lock at ../../src/sthread/sthread.c:141 + -> #3 thread_fun1 at ../../examples/sthread/pthread-mutex-simpledeadlock.c:21 + + [0.000000] [mc_record/INFO] *********************************************************************************** + [0.000000] [mc_record/INFO] * Path chunk #2 '2/0' Actor thread 1(pid:2): MUTEX_WAIT(mutex_id:0 owner:2) + [0.000000] [mc_record/INFO] *********************************************************************************** + Backtrace (displayed in actor thread 1): + -> #0 simgrid::s4u::Mutex::lock() at ../../src/s4u/s4u_Mutex.cpp:29 + -> #1 sthread_mutex_lock at ../../src/sthread/sthread_impl.cpp:188 + -> #2 pthread_mutex_lock at ../../src/sthread/sthread.c:141 + -> #3 thread_fun1 at ../../examples/sthread/pthread-mutex-simpledeadlock.c:21 + + [0.000000] [mc_record/INFO] *********************************************************************************** + [0.000000] [mc_record/INFO] * Path chunk #3 '3/0' Actor thread 2(pid:3): MUTEX_ASYNC_LOCK(mutex_id:1 owner:none) + [0.000000] [mc_record/INFO] *********************************************************************************** + Backtrace (displayed in actor thread 2): + -> #0 simgrid::s4u::Mutex::lock() at ../../src/s4u/s4u_Mutex.cpp:26 + -> #1 sthread_mutex_lock at ../../src/sthread/sthread_impl.cpp:188 + -> #2 pthread_mutex_lock at ../../src/sthread/sthread.c:141 + -> #3 thread_fun2 at ../../examples/sthread/pthread-mutex-simpledeadlock.c:31 + + [0.000000] [mc_record/INFO] *********************************************************************************** + [0.000000] [mc_record/INFO] * Path chunk #4 '2/0' Actor thread 1(pid:2): MUTEX_ASYNC_LOCK(mutex_id:1 owner:3) + [0.000000] [mc_record/INFO] *********************************************************************************** + Backtrace (displayed in actor thread 1): + -> #0 simgrid::s4u::Mutex::lock() at ../../src/s4u/s4u_Mutex.cpp:26 + -> #1 sthread_mutex_lock at ../../src/sthread/sthread_impl.cpp:188 + -> #2 pthread_mutex_lock at ../../src/sthread/sthread.c:141 + -> #3 thread_fun1 at ../../examples/sthread/pthread-mutex-simpledeadlock.c:22 + + [0.000000] [mc_record/INFO] *********************************************************************************** + [0.000000] [mc_record/INFO] * Path chunk #5 '3/0' Actor thread 2(pid:3): MUTEX_WAIT(mutex_id:1 owner:3) + [0.000000] [mc_record/INFO] *********************************************************************************** + Backtrace (displayed in actor thread 2): + -> #0 simgrid::s4u::Mutex::lock() at ../../src/s4u/s4u_Mutex.cpp:29 + -> #1 sthread_mutex_lock at ../../src/sthread/sthread_impl.cpp:188 + -> #2 pthread_mutex_lock at ../../src/sthread/sthread.c:141 + -> #3 thread_fun2 at ../../examples/sthread/pthread-mutex-simpledeadlock.c:31 + + [0.000000] [mc_record/INFO] *********************************************************************************** + [0.000000] [mc_record/INFO] * Path chunk #6 '3/0' Actor thread 2(pid:3): MUTEX_ASYNC_LOCK(mutex_id:0 owner:2) + [0.000000] [mc_record/INFO] *********************************************************************************** + Backtrace (displayed in actor thread 2): + -> #0 simgrid::s4u::Mutex::lock() at ../../src/s4u/s4u_Mutex.cpp:26 + -> #1 sthread_mutex_lock at ../../src/sthread/sthread_impl.cpp:188 + -> #2 pthread_mutex_lock at ../../src/sthread/sthread.c:141 + -> #3 thread_fun2 at ../../examples/sthread/pthread-mutex-simpledeadlock.c:32 + + [0.000000] [mc_record/INFO] The replay of the trace is complete. DEADLOCK detected. + [0.000000] [ker_engine/INFO] 3 actors are still running, waiting for something. + [0.000000] [ker_engine/INFO] Legend of the following listing: "Actor (@): " + [0.000000] [ker_engine/INFO] Actor 1 (main thread@Lilibeth) simcall ActorJoin(pid:2) + [0.000000] [ker_engine/INFO] Actor 2 (thread 1@Lilibeth) simcall MUTEX_WAIT(mutex_id:1 owner:3) + [0.000000] [ker_engine/INFO] Actor 3 (thread 2@Lilibeth) simcall MUTEX_WAIT(mutex_id:0 owner:2) + [0.000000] [sthread/INFO] All threads exited. Terminating the simulation. + [0.000000] ../../src/kernel/EngineImpl.cpp:265: [ker_engine/WARNING] Process called exit when leaving - Skipping cleanups + [0.000000] ../../src/kernel/EngineImpl.cpp:265: [ker_engine/WARNING] Process called exit when leaving - Skipping cleanups + Currently, if the path is of the form ``X;Y;Z``, each number denotes the actor's pid that is selected at each indecision point. If it's of the form ``X/a;Y/b``, the X and Y are the selected pids while the a @@ -1071,7 +1076,7 @@ options, check See the :ref:`tracing_tracing_options`. Configuring SMPI ---------------- -The SMPI interface provides several specific configuration items. +The SMPI interface provides several specific configuration items. These are not easy to see with ``--help-cfg``, since SMPI binaries are usually launched through the ``smiprun`` script. .. _cfg=smpi/host-speed: @@ -1336,12 +1341,10 @@ existing MPI libraries. The ``smpi/coll-selector`` item can be used to select the decision logic either of the OpenMPI or the MPICH libraries. (By default SMPI uses naive version of collective operations.) -Each collective operation can be manually selected with a -``smpi/collective_name:algo_name``. Available algorithms are listed in -:ref:`SMPI_use_colls`. - -.. TODO:: All available collective algorithms will be made available - via the ``smpirun --help-coll`` command. +Each collective operation can be manually selected with a ``smpi/collective_name:algo_name``. For example, if you want to use +the Bruck algorithm for the Alltoall algorithm, you should use ``--cfg=smpi/alltoall:bruck`` on the command-line of smpirun. The +reference of all available algorithms are listed in :ref:`SMPI_use_colls`, and you can get the full list implemented in your +version using ``smpirun --help-coll``. .. _cfg=smpi/barrier-collectives: