X-Git-Url: http://bilbo.iut-bm.univ-fcomte.fr/pub/gitweb/simgrid.git/blobdiff_plain/f5defacd4191925aa1bbdf1ce62abb617a747886..abf3b21ad3f4a19475431076e6e2b26f76d2e7d6:/docs/source/Configuring_SimGrid.rst diff --git a/docs/source/Configuring_SimGrid.rst b/docs/source/Configuring_SimGrid.rst index d344ae1374..d06882f609 100644 --- a/docs/source/Configuring_SimGrid.rst +++ b/docs/source/Configuring_SimGrid.rst @@ -108,18 +108,14 @@ Existing Configuration Items - **maxmin/concurrency-limit:** :ref:`cfg=maxmin/concurrency-limit` - **model-check:** :ref:`options_modelchecking` -- **model-check/checkpoint:** :ref:`cfg=model-check/checkpoint` - **model-check/communications-determinism:** :ref:`cfg=model-check/communications-determinism` - **model-check/dot-output:** :ref:`cfg=model-check/dot-output` - **model-check/max-depth:** :ref:`cfg=model-check/max-depth` -- **model-check/property:** :ref:`cfg=model-check/property` - **model-check/reduction:** :ref:`cfg=model-check/reduction` - **model-check/replay:** :ref:`cfg=model-check/replay` - **model-check/send-determinism:** :ref:`cfg=model-check/send-determinism` - **model-check/setenv:** :ref:`cfg=model-check/setenv` -- **model-check/termination:** :ref:`cfg=model-check/termination` - **model-check/timeout:** :ref:`cfg=model-check/timeout` -- **model-check/visited:** :ref:`cfg=model-check/visited` - **network/bandwidth-factor:** :ref:`cfg=network/bandwidth-factor` - **network/crosstraffic:** :ref:`cfg=network/crosstraffic` @@ -527,8 +523,6 @@ Note that with the default host model this option is activated by default. Simulating Asynchronous Send ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -(this configuration item is experimental and may change or disappear) - It is possible to specify that messages below a certain size (in bytes) will be sent as soon as the call to MPI_Send is issued, without waiting for the correspondent receive. This threshold can be configured through @@ -647,102 +641,50 @@ The ``smpi/buffering`` (only valid with MC) option gives an easier interface to - **zero:** means that buffering should be disabled. All communications are actually blocking. - **infty:** means that buffering should be made infinite. All communications are non-blocking. -.. _cfg=model-check/property: - -Specifying a liveness property -.............................. - -**Option** ``model-check/property`` **Default:** unset - -If you want to specify liveness properties, you have to pass them on -the command line, specifying the name of the file containing the -property, as formatted by the `ltl2ba `_ program. -Note that ltl2ba is not part of SimGrid and must be installed separately. - -.. code-block:: console - - $ simgrid-mc ./my_program --cfg=model-check/property: - -.. _cfg=model-check/checkpoint: - -Going for Stateful Verification -............................... - -By default, the system is backtracked to its initial state to explore -another path, instead of backtracking to the exact step before the fork -that we want to explore (this is called stateless verification). This -is done this way because saving intermediate states can rapidly -exhaust the available memory. If you want, you can change the value of -the ``model-check/checkpoint`` item. For example, -``--cfg=model-check/checkpoint:1`` asks to take a checkpoint every -step. Beware, this will certainly explode your memory. Larger values -are probably better, make sure to experiment a bit to find the right -setting for your specific system. - .. _cfg=model-check/reduction: Specifying the kind of reduction ................................ +**Option** model-check/reduction **Default:** "dpor" + The main issue when using the model-checking is the state space explosion. You can activate some reduction technique with ``--cfg=model-check/reduction:``. For now, this configuration variable can take 2 values: - - **none:** Do not apply any kind of reduction (mandatory for - liveness properties, as our current DPOR algorithm breaks cycles) - - **dpor:** Apply Dynamic Partial Ordering Reduction. Only valid if - you verify local safety properties (default value for safety - checks). - -Another way to mitigate the state space explosion is to search for -cycles in the exploration with the :ref:`cfg=model-check/visited` -configuration. Note that DPOR and state-equality reduction may not -play well together. You should choose between them. + - **none:** Do not apply any kind of reduction + - **dpor:** Apply Dynamic Partial Ordering Reduction. Only valid if you verify local safety properties (default value for + safety checks). + - **sdpor:** Source-set DPOR, as described in "Source Sets: A Foundation for Optimal Dynamic Partial Order Reduction" + by Abdulla et al. + - **odpor:** Optimal DPOR, as described in "Source Sets: A Foundation for Optimal Dynamic Partial Order Reduction" + by Abdulla et al. Our current DPOR implementation could be improved in may ways. We are currently improving its efficiency (both in term of reduction ability -and computational speed), and future work could make it compatible -with liveness properties. +and computational speed). -.. _cfg=model-check/visited: +.. _cfg=model-check/strategy: -Size of Cycle Detection Set (state equality reduction) -...................................................... - -Mc SimGrid can be asked to search for cycles during the exploration, -i.e. situations where a new explored state is in fact the same state -than a previous one.. This can prove useful to mitigate the state -space explosion with safety properties, and this is the crux when -searching for counter-examples to the liveness properties. - -Note that this feature may break the current implementation of the -DPOR reduction technique. - -The ``model-check/visited`` item is the maximum number of states, which -are stored in memory. If the maximum number of snapshotted state is -reached, some states will be removed from the memory and some cycles -might be missed. Small values can lead to incorrect verifications, but -large values can exhaust your memory and be CPU intensive as each new -state must be compared to that amount of older saved states. - -The default settings depend on the kind of exploration. With safety -checking, no state is snapshotted and cycles cannot be detected. With -liveness checking, all states are snapshotted because missing a cycle -could hinder the exploration soundness. - -.. _cfg=model-check/termination: - -Non-Termination Detection -......................... +Guiding strategy +................ -The ``model-check/termination`` configuration item can be used to -report if a non-termination execution path has been found. This is a -path with a cycle, which means that the program might never terminate. +**Option** model-check/strategy **Default:** "none" -This only works in safety mode, not in liveness mode. +Even after the DPOR's reduction, the state space that we have to explore remains huge. SimGrid provides several guiding +strategies aiming at converging faster toward bugs. By default, none of these strategy is enabled, and SimGrid does a regular +DFS exploration. -This options is disabled by default. + - **max_match_comm**: Try to minimize the number of in-fly communication by appairing matching send and receive. This tend to + produce nicer backtraces, in particular when a user-level ``send`` is broken down internally into a ``send_async`` + ``wait``. + This strategy will ensure that the ``wait`` occures as soon as possible, easing the understanding of the user who do not + expect her ``send`` to be split. + - **min_match_comm**: Try to maximize the number of in-fly communication by not appairing matching send and receive. This is + the exact opposite strategy, but it is still useful as it tend to explore first the branches where the risk of deadlock is + higher. + - **uniform**: this is a boring random strategy where choices are based on a uniform sampling of possible choices. + Surprisingly, it gives really really good results. .. _cfg=model-check/dot-output: @@ -751,8 +693,7 @@ Dot Output If set, the ``model-check/dot-output`` configuration item is the name of a file in which to write a dot file of the path leading to the -property violation discovered (safety or liveness violation), as well -as the cycle for liveness properties. This dot file can then be fed to the +property violation discovered (safety violation). This dot file can then be fed to the graphviz dot tool to generate a corresponding graphical representation. .. _cfg=model-check/max-depth: @@ -817,17 +758,14 @@ memory (see :ref:`contexts/guard-size `). Replaying buggy execution paths from the model checker ...................................................... -Debugging the problems reported by the model checker is challenging: -First, the application under verification cannot be debugged with gdb -because the model checker already traces it. Then, the model checker may -explore several execution paths before encountering the issue, making it -very difficult to understand the output. Fortunately, SimGrid provides +Debugging the problems reported by the model checker is challenging because +the model checker may explore several execution paths before encountering the issue, +the output very difficult to understand. Fortunately, SimGrid provides the execution path leading to any reported issue so that you can replay -this path reported by the model checker, enabling the usage of classical -debugging tools. +this path reported by the model checker, restoring a classical experience of debugging. When the model checker finds an interesting path in the application -execution graph (where a safety or liveness property is violated), it +execution graph (where a safety property is violated), it generates an identifier for this path. Here is an example of the output: .. code-block:: console @@ -848,10 +786,84 @@ The interesting line is ``Path = 1/3;1/4``, which means that you should use ``--cfg=model-check/replay:1/3;1/4`` to replay your application on the buggy execution path. All options (but the model checker related ones) must remain the same. In particular, if you ran your application with -``smpirun -wrapper simgrid-mc``, then do it again. Remove all +``smpirun -wrapper simgrid-mc``, then remove the ``-wrapper simgrid-mc`` part +(you may want to use valgrind or gdb as wrappers instead). Also remove all MC-related options, keep non-MC-related ones and add ``--cfg=model-check/replay:???``. +Things are very similar if you are using sthread. Simply drop ``simgrid-mc`` from your command line, as follows: + +.. code-block:: console + + $ LD_PRELOAD=../../lib/libsthread.so ./pthread-mutex-simpledeadlock --cfg=model-check/replay:'2;2;3;2;3;3' + sthread is intercepting the execution of ./pthread-mutex-simpledeadlock + [0.000000] [xbt_cfg/INFO] Configuration change: Set 'model-check/replay' to '2;2;3;2;3;3' + [0.000000] [mc_record/INFO] path=2;2;3;2;3;3 + All threads are started. + [0.000000] [mc_record/INFO] *********************************************************************************** + [0.000000] [mc_record/INFO] * Path chunk #1 '2/0' Actor thread 1(pid:2): MUTEX_ASYNC_LOCK(mutex_id:0 owner:none) + [0.000000] [mc_record/INFO] *********************************************************************************** + Backtrace (displayed in actor thread 1): + -> #0 simgrid::s4u::Mutex::lock() at ../../src/s4u/s4u_Mutex.cpp:26 + -> #1 sthread_mutex_lock at ../../src/sthread/sthread_impl.cpp:188 + -> #2 pthread_mutex_lock at ../../src/sthread/sthread.c:141 + -> #3 thread_fun1 at ../../examples/sthread/pthread-mutex-simpledeadlock.c:21 + + [0.000000] [mc_record/INFO] *********************************************************************************** + [0.000000] [mc_record/INFO] * Path chunk #2 '2/0' Actor thread 1(pid:2): MUTEX_WAIT(mutex_id:0 owner:2) + [0.000000] [mc_record/INFO] *********************************************************************************** + Backtrace (displayed in actor thread 1): + -> #0 simgrid::s4u::Mutex::lock() at ../../src/s4u/s4u_Mutex.cpp:29 + -> #1 sthread_mutex_lock at ../../src/sthread/sthread_impl.cpp:188 + -> #2 pthread_mutex_lock at ../../src/sthread/sthread.c:141 + -> #3 thread_fun1 at ../../examples/sthread/pthread-mutex-simpledeadlock.c:21 + + [0.000000] [mc_record/INFO] *********************************************************************************** + [0.000000] [mc_record/INFO] * Path chunk #3 '3/0' Actor thread 2(pid:3): MUTEX_ASYNC_LOCK(mutex_id:1 owner:none) + [0.000000] [mc_record/INFO] *********************************************************************************** + Backtrace (displayed in actor thread 2): + -> #0 simgrid::s4u::Mutex::lock() at ../../src/s4u/s4u_Mutex.cpp:26 + -> #1 sthread_mutex_lock at ../../src/sthread/sthread_impl.cpp:188 + -> #2 pthread_mutex_lock at ../../src/sthread/sthread.c:141 + -> #3 thread_fun2 at ../../examples/sthread/pthread-mutex-simpledeadlock.c:31 + + [0.000000] [mc_record/INFO] *********************************************************************************** + [0.000000] [mc_record/INFO] * Path chunk #4 '2/0' Actor thread 1(pid:2): MUTEX_ASYNC_LOCK(mutex_id:1 owner:3) + [0.000000] [mc_record/INFO] *********************************************************************************** + Backtrace (displayed in actor thread 1): + -> #0 simgrid::s4u::Mutex::lock() at ../../src/s4u/s4u_Mutex.cpp:26 + -> #1 sthread_mutex_lock at ../../src/sthread/sthread_impl.cpp:188 + -> #2 pthread_mutex_lock at ../../src/sthread/sthread.c:141 + -> #3 thread_fun1 at ../../examples/sthread/pthread-mutex-simpledeadlock.c:22 + + [0.000000] [mc_record/INFO] *********************************************************************************** + [0.000000] [mc_record/INFO] * Path chunk #5 '3/0' Actor thread 2(pid:3): MUTEX_WAIT(mutex_id:1 owner:3) + [0.000000] [mc_record/INFO] *********************************************************************************** + Backtrace (displayed in actor thread 2): + -> #0 simgrid::s4u::Mutex::lock() at ../../src/s4u/s4u_Mutex.cpp:29 + -> #1 sthread_mutex_lock at ../../src/sthread/sthread_impl.cpp:188 + -> #2 pthread_mutex_lock at ../../src/sthread/sthread.c:141 + -> #3 thread_fun2 at ../../examples/sthread/pthread-mutex-simpledeadlock.c:31 + + [0.000000] [mc_record/INFO] *********************************************************************************** + [0.000000] [mc_record/INFO] * Path chunk #6 '3/0' Actor thread 2(pid:3): MUTEX_ASYNC_LOCK(mutex_id:0 owner:2) + [0.000000] [mc_record/INFO] *********************************************************************************** + Backtrace (displayed in actor thread 2): + -> #0 simgrid::s4u::Mutex::lock() at ../../src/s4u/s4u_Mutex.cpp:26 + -> #1 sthread_mutex_lock at ../../src/sthread/sthread_impl.cpp:188 + -> #2 pthread_mutex_lock at ../../src/sthread/sthread.c:141 + -> #3 thread_fun2 at ../../examples/sthread/pthread-mutex-simpledeadlock.c:32 + + [0.000000] [mc_record/INFO] The replay of the trace is complete. DEADLOCK detected. + [0.000000] [ker_engine/INFO] 3 actors are still running, waiting for something. + [0.000000] [ker_engine/INFO] Legend of the following listing: "Actor (@): " + [0.000000] [ker_engine/INFO] Actor 1 (main thread@Lilibeth) simcall ActorJoin(pid:2) + [0.000000] [ker_engine/INFO] Actor 2 (thread 1@Lilibeth) simcall MUTEX_WAIT(mutex_id:1 owner:3) + [0.000000] [ker_engine/INFO] Actor 3 (thread 2@Lilibeth) simcall MUTEX_WAIT(mutex_id:0 owner:2) + [0.000000] [sthread/INFO] All threads exited. Terminating the simulation. + [0.000000] ../../src/kernel/EngineImpl.cpp:265: [ker_engine/WARNING] Process called exit when leaving - Skipping cleanups + [0.000000] ../../src/kernel/EngineImpl.cpp:265: [ker_engine/WARNING] Process called exit when leaving - Skipping cleanups + Currently, if the path is of the form ``X;Y;Z``, each number denotes the actor's pid that is selected at each indecision point. If it's of the form ``X/a;Y/b``, the X and Y are the selected pids while the a