From: Martin Quinson Date: Fri, 4 Mar 2022 21:03:19 +0000 (+0100) Subject: Reimplement s4u::Barrier natively, and make them visible from MC X-Git-Tag: v3.31~226 X-Git-Url: http://bilbo.iut-bm.univ-fcomte.fr/pub/gitweb/simgrid.git/commitdiff_plain/97e476dc536bfd9fada85509d1c1f93714d46a10 Reimplement s4u::Barrier natively, and make them visible from MC It comes with a DPOR dependency computation and a s4u hijacked test. The ordering of some simcalls changed in SMPI, as there now much less of them when using barriers. This impacts in particular the order in which PID are given in the gh-139 test. Adding a link here for reference: https://github.com/simgrid/simgrid/issues/139 MSG_barrier_destroy() cannot take a const barrier anymore, but how was it possible in the first place? --- diff --git a/ChangeLog b/ChangeLog index dc7acaef3c..25be7b7112 100644 --- a/ChangeLog +++ b/ChangeLog @@ -4,7 +4,7 @@ MC: - Rework the internals, for simpler and modern code. This shall unlock many future improvements. - You can now define plugins onto SafetyChecker (a simple DFS explorer), using the declared signals. See CommunicationDeterminism for an example. - - Support mutex and semaphore in DPOR reduction + - Support mutex, semaphore and barrier in DPOR reduction - Seems to work on Arm64 architectures too. - Display a nice error message when ptrace is not usable. @@ -19,6 +19,12 @@ SMPI: S4U: - New signal: Engine::on_simulation_start_cb() + - Reimplementation of barriers natively. + Previously, they were implemented on top of s4u::Mutex and s4u::ConditionVariable. + The new version should be faster (and can be used in the model-checker). + +MSG: + - MSG_barrier_destroy now expects a non-const msg_barrier parameter. New plugin: the Chaos Monkey (killing actors at any time) - Along with the new simgrid-monkey script, it tests whether your simulation diff --git a/MANIFEST.in b/MANIFEST.in index e357c90b19..ef27bfc583 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -359,6 +359,7 @@ include examples/cpp/replay-io/s4u-replay-io.txt include examples/cpp/replay-io/s4u-replay-io_d.xml include examples/cpp/routing-get-clusters/s4u-routing-get-clusters.cpp include examples/cpp/routing-get-clusters/s4u-routing-get-clusters.tesh +include examples/cpp/synchro-barrier/s4u-mc-synchro-barrier.tesh include examples/cpp/synchro-barrier/s4u-synchro-barrier.cpp include examples/cpp/synchro-barrier/s4u-synchro-barrier.tesh include examples/cpp/synchro-condition-variable-waituntil/s4u-synchro-condition-variable-waituntil.cpp @@ -2242,6 +2243,8 @@ include src/kernel/EngineImpl.cpp include src/kernel/EngineImpl.hpp include src/kernel/activity/ActivityImpl.cpp include src/kernel/activity/ActivityImpl.hpp +include src/kernel/activity/BarrierImpl.cpp +include src/kernel/activity/BarrierImpl.hpp include src/kernel/activity/CommImpl.cpp include src/kernel/activity/CommImpl.hpp include src/kernel/activity/ConditionVariableImpl.cpp diff --git a/docs/source/Release_Notes.rst b/docs/source/Release_Notes.rst index 8867503ef6..a78ce9e341 100644 --- a/docs/source/Release_Notes.rst +++ b/docs/source/Release_Notes.rst @@ -491,9 +491,10 @@ This release introduces a new design, where the simcalls are given object-orient This information is used on the checker side to build Transition objects, representing the application simcalls. This explanation may not be crystal clear, but the checker code is now much easier to work with as the formal logic is not spoiled with system-level tricks to retrieve the needed information. -This cleaned design allowed us to finally implement the support for mutexes and semaphores in the model-checker. +This cleaned design allowed us to finally implement the support for mutexes, semaphores and barriers in the model-checker. In particular, this should +enable the verification of RMA primitives with Mc SimGrid, as their implementation in SMPI is based on mutexes and barriers. -Future work on the model checker include: support for condition variables and barriers (that are still not handled), implementation of another exploration algorithm based on UDPOR +Future work on the model checker include: support for condition variables (that are still not handled), implementation of another exploration algorithm based on UDPOR (`The Anh Pham's thesis `_ was defended in 2019), and robustness improvement using the `MPI Bug Initiative `_ tests. Many things that were long dreamed of now become technically possible in this code base. diff --git a/examples/cpp/CMakeLists.txt b/examples/cpp/CMakeLists.txt index 6d15aa7694..89d9024236 100644 --- a/examples/cpp/CMakeLists.txt +++ b/examples/cpp/CMakeLists.txt @@ -80,7 +80,7 @@ else() endif() # Hijack some regular tests to run them on top of the MC -foreach (example synchro-mutex synchro-semaphore) +foreach (example synchro-barrier synchro-mutex synchro-semaphore) set(tesh_files ${tesh_files} ${CMAKE_CURRENT_SOURCE_DIR}/${example}/s4u-mc-${example}.tesh) if (SIMGRID_HAVE_MC) diff --git a/examples/cpp/synchro-barrier/s4u-mc-synchro-barrier.tesh b/examples/cpp/synchro-barrier/s4u-mc-synchro-barrier.tesh new file mode 100644 index 0000000000..e7cb4a33c2 --- /dev/null +++ b/examples/cpp/synchro-barrier/s4u-mc-synchro-barrier.tesh @@ -0,0 +1,63 @@ +#!/usr/bin/env tesh + +$ ${bindir:=.}/../../../bin/simgrid-mc --log=mc_safety.thres:verbose --log=root.fmt="[Checker]%e%m%n" -- ${bindir:=.}/s4u-synchro-barrier 1 --log=s4u_test.thres:critical --log=root.fmt="[App%e%e%e%e]%e%m%n" +> [Checker] Start a DFS exploration. Reduction is: dpor. +> [Checker] Execute 1: BARRIER_LOCK(barrier: 0) (stack depth: 1, state: 1, 0 interleaves) +> [Checker] Execute 1: BARRIER_WAIT(barrier: 0) (stack depth: 2, state: 2, 0 interleaves) +> [Checker] Backtracking from 1;1;0 +> [Checker] DFS exploration ended. 3 unique states visited; 1 backtracks (3 transition replays, 0 states visited overall) + +$ ${bindir:=.}/../../../bin/simgrid-mc --log=mc_safety.thres:verbose --log=root.fmt="[Checker]%e%m%n" -- ${bindir:=.}/s4u-synchro-barrier 2 --log=s4u_test.thres:critical --log=root.fmt="[App%e%e%e%e]%e%m%n" +> [Checker] Start a DFS exploration. Reduction is: dpor. +> [Checker] Execute 1: BARRIER_LOCK(barrier: 0) (stack depth: 1, state: 1, 0 interleaves) +> [Checker] Execute 2: BARRIER_LOCK(barrier: 0) (stack depth: 2, state: 2, 0 interleaves) +> [Checker] Execute 1: BARRIER_WAIT(barrier: 0) (stack depth: 3, state: 3, 0 interleaves) +> [Checker] Execute 2: BARRIER_WAIT(barrier: 0) (stack depth: 4, state: 4, 0 interleaves) +> [Checker] Backtracking from 1;2;1;2;0 +> [Checker] INDEPENDENT Transitions: +> [Checker] BARRIER_WAIT(barrier: 0) (state=3) +> [Checker] BARRIER_WAIT(barrier: 0) (state=4) +> [Checker] Dependent Transitions: +> [Checker] BARRIER_LOCK(barrier: 0) (state=2) +> [Checker] BARRIER_WAIT(barrier: 0) (state=3) +> [Checker] INDEPENDENT Transitions: +> [Checker] BARRIER_LOCK(barrier: 0) (state=1) +> [Checker] BARRIER_LOCK(barrier: 0) (state=2) +> [Checker] Backtracking from 1;2 +> [Checker] DFS exploration ended. 5 unique states visited; 2 backtracks (7 transition replays, 1 states visited overall) + +$ ${bindir:=.}/../../../bin/simgrid-mc --log=mc_safety.thres:verbose --log=root.fmt="[Checker]%e%m%n" -- ${bindir:=.}/s4u-synchro-barrier 3 --log=s4u_test.thres:critical --log=root.fmt="[App%e%e%e%e]%e%m%n" +> [Checker] Start a DFS exploration. Reduction is: dpor. +> [Checker] Execute 1: BARRIER_LOCK(barrier: 0) (stack depth: 1, state: 1, 0 interleaves) +> [Checker] Execute 2: BARRIER_LOCK(barrier: 0) (stack depth: 2, state: 2, 0 interleaves) +> [Checker] Execute 3: BARRIER_LOCK(barrier: 0) (stack depth: 3, state: 3, 0 interleaves) +> [Checker] Execute 1: BARRIER_WAIT(barrier: 0) (stack depth: 4, state: 4, 0 interleaves) +> [Checker] Execute 2: BARRIER_WAIT(barrier: 0) (stack depth: 5, state: 5, 0 interleaves) +> [Checker] Execute 3: BARRIER_WAIT(barrier: 0) (stack depth: 6, state: 6, 0 interleaves) +> [Checker] Backtracking from 1;2;3;1;2;3;0 +> [Checker] INDEPENDENT Transitions: +> [Checker] BARRIER_WAIT(barrier: 0) (state=5) +> [Checker] BARRIER_WAIT(barrier: 0) (state=6) +> [Checker] INDEPENDENT Transitions: +> [Checker] BARRIER_WAIT(barrier: 0) (state=4) +> [Checker] BARRIER_WAIT(barrier: 0) (state=6) +> [Checker] INDEPENDENT Transitions: +> [Checker] BARRIER_WAIT(barrier: 0) (state=4) +> [Checker] BARRIER_WAIT(barrier: 0) (state=5) +> [Checker] Dependent Transitions: +> [Checker] BARRIER_LOCK(barrier: 0) (state=3) +> [Checker] BARRIER_WAIT(barrier: 0) (state=5) +> [Checker] Dependent Transitions: +> [Checker] BARRIER_LOCK(barrier: 0) (state=3) +> [Checker] BARRIER_WAIT(barrier: 0) (state=4) +> [Checker] INDEPENDENT Transitions: +> [Checker] BARRIER_LOCK(barrier: 0) (state=2) +> [Checker] BARRIER_LOCK(barrier: 0) (state=3) +> [Checker] INDEPENDENT Transitions: +> [Checker] BARRIER_LOCK(barrier: 0) (state=1) +> [Checker] BARRIER_LOCK(barrier: 0) (state=3) +> [Checker] Backtracking from 1;2;3 +> [Checker] INDEPENDENT Transitions: +> [Checker] BARRIER_LOCK(barrier: 0) (state=1) +> [Checker] BARRIER_LOCK(barrier: 0) (state=2) +> [Checker] DFS exploration ended. 7 unique states visited; 2 backtracks (10 transition replays, 2 states visited overall) diff --git a/examples/cpp/synchro-barrier/s4u-synchro-barrier.cpp b/examples/cpp/synchro-barrier/s4u-synchro-barrier.cpp index 969366c523..f7bbf0e58a 100644 --- a/examples/cpp/synchro-barrier/s4u-synchro-barrier.cpp +++ b/examples/cpp/synchro-barrier/s4u-synchro-barrier.cpp @@ -41,7 +41,7 @@ int main(int argc, char **argv) int actor_count = std::stoi(argv[1]); xbt_assert(actor_count > 0, " must be greater than 0"); - e.load_platform("../../platforms/two_hosts.xml"); + e.load_platform(argc > 2 ? argv[2] : "../../platforms/two_hosts.xml"); simgrid::s4u::Actor::create("master", e.host_by_name("Tremblay"), master, actor_count); e.run(); diff --git a/examples/cpp/synchro-barrier/s4u-synchro-barrier.tesh b/examples/cpp/synchro-barrier/s4u-synchro-barrier.tesh index a1e7709163..f701c034db 100644 --- a/examples/cpp/synchro-barrier/s4u-synchro-barrier.tesh +++ b/examples/cpp/synchro-barrier/s4u-synchro-barrier.tesh @@ -9,17 +9,17 @@ $ ${bindir:=.}/s4u-synchro-barrier 2 > [Tremblay:master:(1) 0.000000] [s4u_test/INFO] Spawning 1 workers > [Jupiter:worker:(2) 0.000000] [s4u_test/INFO] Waiting on the barrier > [Tremblay:master:(1) 0.000000] [s4u_test/INFO] Waiting on the barrier -> [Tremblay:master:(1) 0.000000] [s4u_test/INFO] Bye > [Jupiter:worker:(2) 0.000000] [s4u_test/INFO] Bye +> [Tremblay:master:(1) 0.000000] [s4u_test/INFO] Bye $ ${bindir:=.}/s4u-synchro-barrier 3 > [Tremblay:master:(1) 0.000000] [s4u_test/INFO] Spawning 2 workers > [Jupiter:worker:(2) 0.000000] [s4u_test/INFO] Waiting on the barrier > [Jupiter:worker:(3) 0.000000] [s4u_test/INFO] Waiting on the barrier > [Tremblay:master:(1) 0.000000] [s4u_test/INFO] Waiting on the barrier -> [Tremblay:master:(1) 0.000000] [s4u_test/INFO] Bye > [Jupiter:worker:(2) 0.000000] [s4u_test/INFO] Bye > [Jupiter:worker:(3) 0.000000] [s4u_test/INFO] Bye +> [Tremblay:master:(1) 0.000000] [s4u_test/INFO] Bye $ ${bindir:=.}/s4u-synchro-barrier 10 > [Tremblay:master:(1) 0.000000] [s4u_test/INFO] Spawning 9 workers @@ -33,7 +33,6 @@ $ ${bindir:=.}/s4u-synchro-barrier 10 > [Jupiter:worker:(9) 0.000000] [s4u_test/INFO] Waiting on the barrier > [Jupiter:worker:(10) 0.000000] [s4u_test/INFO] Waiting on the barrier > [Tremblay:master:(1) 0.000000] [s4u_test/INFO] Waiting on the barrier -> [Tremblay:master:(1) 0.000000] [s4u_test/INFO] Bye > [Jupiter:worker:(2) 0.000000] [s4u_test/INFO] Bye > [Jupiter:worker:(3) 0.000000] [s4u_test/INFO] Bye > [Jupiter:worker:(4) 0.000000] [s4u_test/INFO] Bye @@ -43,3 +42,4 @@ $ ${bindir:=.}/s4u-synchro-barrier 10 > [Jupiter:worker:(8) 0.000000] [s4u_test/INFO] Bye > [Jupiter:worker:(9) 0.000000] [s4u_test/INFO] Bye > [Jupiter:worker:(10) 0.000000] [s4u_test/INFO] Bye +> [Tremblay:master:(1) 0.000000] [s4u_test/INFO] Bye diff --git a/include/simgrid/barrier.h b/include/simgrid/barrier.h index ff92119c12..f7b2f5739d 100644 --- a/include/simgrid/barrier.h +++ b/include/simgrid/barrier.h @@ -1,4 +1,4 @@ -/* Public interface to the Link datatype */ +/* Public interface to the Barrier datatype */ /* Copyright (c) 2018-2022. The SimGrid Team. All rights reserved. */ @@ -11,7 +11,7 @@ #include #ifdef __cplusplus -constexpr int SG_BARRIER_SERIAL_THREAD = -1; +constexpr bool SG_BARRIER_SERIAL_THREAD = true; #else #define SG_BARRIER_SERIAL_THREAD -1 #endif @@ -20,7 +20,7 @@ constexpr int SG_BARRIER_SERIAL_THREAD = -1; SG_BEGIN_DECL XBT_PUBLIC sg_bar_t sg_barrier_init(unsigned int count); -XBT_PUBLIC void sg_barrier_destroy(const_sg_bar_t bar); +XBT_PUBLIC void sg_barrier_destroy(sg_bar_t bar); XBT_PUBLIC int sg_barrier_wait(sg_bar_t bar); SG_END_DECL diff --git a/include/simgrid/forward.h b/include/simgrid/forward.h index 8f9c545de1..a49a499106 100644 --- a/include/simgrid/forward.h +++ b/include/simgrid/forward.h @@ -125,6 +125,13 @@ namespace activity { XBT_PUBLIC void intrusive_ptr_add_ref(ActivityImpl* activity); XBT_PUBLIC void intrusive_ptr_release(ActivityImpl* activity); + class BarrierImpl; + using BarrierImplPtr = boost::intrusive_ptr; + XBT_PUBLIC void intrusive_ptr_add_ref(BarrierImpl* cond); + XBT_PUBLIC void intrusive_ptr_release(BarrierImpl* cond); + class BarrierAcquisitionImpl; + using BarrierAcquisitionImplPtr = boost::intrusive_ptr; + class ConditionVariableImpl; using ConditionVariableImplPtr = boost::intrusive_ptr; XBT_PUBLIC void intrusive_ptr_add_ref(ConditionVariableImpl* cond); diff --git a/include/simgrid/msg.h b/include/simgrid/msg.h index 6267fb0e1d..cac1d4440a 100644 --- a/include/simgrid/msg.h +++ b/include/simgrid/msg.h @@ -410,7 +410,7 @@ typedef sg_bar_t msg_bar_t; /** @brief Initializes a barrier, with count elements */ XBT_PUBLIC msg_bar_t MSG_barrier_init(unsigned int count); /** @brief Destroys barrier */ -XBT_PUBLIC void MSG_barrier_destroy(const_sg_bar_t bar); +XBT_PUBLIC void MSG_barrier_destroy(sg_bar_t bar); /** @brief Performs a barrier already initialized */ XBT_PUBLIC int MSG_barrier_wait(msg_bar_t bar); diff --git a/include/simgrid/s4u/Barrier.hpp b/include/simgrid/s4u/Barrier.hpp index b17adf4f05..1e5e589f75 100644 --- a/include/simgrid/s4u/Barrier.hpp +++ b/include/simgrid/s4u/Barrier.hpp @@ -19,18 +19,12 @@ namespace simgrid { namespace s4u { class XBT_PUBLIC Barrier { -private: - MutexPtr mutex_ = Mutex::create(); - ConditionVariablePtr cond_ = ConditionVariable::create(); - unsigned int expected_actors_; - unsigned int arrived_actors_ = 0; + kernel::activity::BarrierImpl* pimpl_; + friend kernel::activity::BarrierImpl; - /* refcounting */ - std::atomic_int_fast32_t refcount_{0}; + explicit Barrier(kernel::activity::BarrierImpl* pimpl) : pimpl_(pimpl) {} public: - /** Creates a barrier for the given amount of actors */ - explicit Barrier(unsigned int expected_actors) : expected_actors_(expected_actors) {} #ifndef DOXYGEN Barrier(Barrier const&) = delete; Barrier& operator=(Barrier const&) = delete; diff --git a/src/kernel/activity/BarrierImpl.cpp b/src/kernel/activity/BarrierImpl.cpp new file mode 100644 index 0000000000..c1537f1531 --- /dev/null +++ b/src/kernel/activity/BarrierImpl.cpp @@ -0,0 +1,71 @@ +/* Copyright (c) 2007-2022. The SimGrid Team. All rights reserved. */ + +/* This program is free software; you can redistribute it and/or modify it + * under the terms of the license (GNU LGPL) which comes with this package. */ + +#include "src/kernel/activity/BarrierImpl.hpp" +#include "src/kernel/activity/Synchro.hpp" + +XBT_LOG_NEW_DEFAULT_SUBCATEGORY(ker_barrier, ker_synchro, "Barrier kernel-space implementation"); + +namespace simgrid { +namespace kernel { +namespace activity { + +/* -------- Acquisition -------- */ +bool BarrierAcquisitionImpl::test(actor::ActorImpl*) +{ + return granted_; +} +void BarrierAcquisitionImpl::wait_for(actor::ActorImpl* issuer, double timeout) +{ + xbt_assert(issuer == issuer_, "Cannot wait on acquisitions created by another actor (id %ld)", issuer_->get_pid()); + xbt_assert(timeout < 0, "Timeouts on barrier acquisitions are not implemented yet."); + + this->register_simcall(&issuer_->simcall_); // Block on that acquisition + + if (granted_) { // This was unblocked already + finish(); + } else { + // Already in the queue + } +} +void BarrierAcquisitionImpl::finish() +{ + xbt_assert(simcalls_.size() == 1, "Unexpected number of simcalls waiting: %zu", simcalls_.size()); + smx_simcall_t simcall = simcalls_.front(); + simcalls_.pop_front(); + + simcall->issuer_->waiting_synchro_ = nullptr; + simcall->issuer_->simcall_answer(); +} +/* -------- Barrier -------- */ + +unsigned BarrierImpl::next_id_ = 0; + +BarrierAcquisitionImplPtr BarrierImpl::acquire_async(actor::ActorImpl* issuer) +{ + auto res = BarrierAcquisitionImplPtr(new kernel::activity::BarrierAcquisitionImpl(issuer, this), true); + + XBT_DEBUG("%s acquires barrier #%u (%zu of %u)", issuer->get_cname(), id_, ongoing_acquisitions_.size(), + expected_actors_); + if (ongoing_acquisitions_.size() < expected_actors_ - 1) { + /* Not everybody arrived yet */ + ongoing_acquisitions_.push_back(res); + + } else { + for (auto const& acqui : ongoing_acquisitions_) { + acqui->granted_ = true; + if (acqui == acqui->get_issuer()->waiting_synchro_) + acqui->finish(); + // else, the issuer is not blocked on this acquisition so no need to release it + } + ongoing_acquisitions_.clear(); // Rearm the barier for subsequent uses + res->granted_ = true; + } + return res; +} + +} // namespace activity +} // namespace kernel +} // namespace simgrid diff --git a/src/kernel/activity/BarrierImpl.hpp b/src/kernel/activity/BarrierImpl.hpp new file mode 100644 index 0000000000..44736cd3ce --- /dev/null +++ b/src/kernel/activity/BarrierImpl.hpp @@ -0,0 +1,82 @@ +/* Copyright (c) 2012-2022. The SimGrid Team. All rights reserved. */ + +/* This program is free software; you can redistribute it and/or modify it + * under the terms of the license (GNU LGPL) which comes with this package. */ + +#ifndef SIMGRID_KERNEL_ACTIVITY_BARRIER_HPP +#define SIMGRID_KERNEL_ACTIVITY_BARRIER_HPP + +#include "simgrid/s4u/Barrier.hpp" +#include "src/kernel/activity/ActivityImpl.hpp" +#include "src/kernel/actor/ActorImpl.hpp" +#include "src/kernel/actor/SynchroObserver.hpp" + +namespace simgrid { +namespace kernel { +namespace activity { +/** Barrier Acquisition: the act / process of acquiring the barrier. + * + * This is the asynchronous activity associated to Barriers. See the doc of MutexImpl for more details on the rationnal. + */ +class XBT_PUBLIC BarrierAcquisitionImpl : public ActivityImpl_T { + actor::ActorImpl* issuer_ = nullptr; + BarrierImpl* barrier_ = nullptr; + bool granted_ = false; + + friend actor::BarrierObserver; + friend BarrierImpl; + +public: + BarrierAcquisitionImpl(actor::ActorImpl* issuer, BarrierImpl* bar) : issuer_(issuer), barrier_(bar) {} + BarrierImplPtr get_barrier() { return barrier_; } + actor::ActorImpl* get_issuer() { return issuer_; } + + bool test(actor::ActorImpl* issuer = nullptr) override; + void wait_for(actor::ActorImpl* issuer, double timeout) override; + void post() override + { /*no surf action*/ + } + void finish() override; + void set_exception(actor::ActorImpl* issuer) override + { /* nothing to do */ + } +}; + +class XBT_PUBLIC BarrierImpl { + std::atomic_int_fast32_t refcount_{1}; + s4u::Barrier piface_; + unsigned int expected_actors_; + // std::vector arrived_actors_; + std::deque ongoing_acquisitions_; + static unsigned next_id_; + unsigned id_ = next_id_++; + + friend BarrierAcquisitionImpl; + friend s4u::Barrier; + +public: + BarrierImpl(int expected_actors) : piface_(this), expected_actors_(expected_actors) {} + BarrierImpl(BarrierImpl const&) = delete; + BarrierImpl& operator=(BarrierImpl const&) = delete; + + BarrierAcquisitionImplPtr acquire_async(actor::ActorImpl* issuer); + unsigned get_id() const { return id_; } + + friend void intrusive_ptr_add_ref(BarrierImpl* barrier) + { + XBT_ATTRIB_UNUSED auto previous = barrier->refcount_.fetch_add(1); + xbt_assert(previous != 0); + } + + friend void intrusive_ptr_release(BarrierImpl* barrier) + { + if (barrier->refcount_.fetch_sub(1) == 1) + delete barrier; + } + + s4u::Barrier& get_iface() { return piface_; } +}; +} // namespace activity +} // namespace kernel +} // namespace simgrid +#endif diff --git a/src/kernel/actor/SynchroObserver.cpp b/src/kernel/actor/SynchroObserver.cpp index ef52035c1a..d35d2e6fd0 100644 --- a/src/kernel/actor/SynchroObserver.cpp +++ b/src/kernel/actor/SynchroObserver.cpp @@ -5,6 +5,7 @@ #include "src/kernel/actor/SynchroObserver.hpp" #include "simgrid/s4u/Host.hpp" +#include "src/kernel/activity/BarrierImpl.hpp" #include "src/kernel/activity/MutexImpl.hpp" #include "src/kernel/activity/SemaphoreImpl.hpp" #include "src/kernel/actor/ActorImpl.hpp" @@ -61,6 +62,27 @@ void SemaphoreAcquisitionObserver::serialize(std::stringstream& stream) const stream << (short)type_ << ' ' << acquisition_->semaphore_->get_id() << ' ' << acquisition_->granted_; } +BarrierObserver::BarrierObserver(ActorImpl* actor, mc::Transition::Type type, activity::BarrierImpl* bar) + : ResultingSimcall(actor, false), type_(type), barrier_(bar), timeout_(-1) +{ + xbt_assert(type_ == mc::Transition::Type::BARRIER_LOCK); +} +BarrierObserver::BarrierObserver(ActorImpl* actor, mc::Transition::Type type, activity::BarrierAcquisitionImpl* acqui, + double timeout) + : ResultingSimcall(actor, false), type_(type), acquisition_(acqui), timeout_(timeout) +{ + xbt_assert(type_ == mc::Transition::Type::BARRIER_WAIT); +} +void BarrierObserver::serialize(std::stringstream& stream) const +{ + stream << (short)type_ << ' ' << (barrier_ != nullptr ? barrier_->get_id() : acquisition_->barrier_->get_id()); +} +bool BarrierObserver::is_enabled() +{ + return type_ == mc::Transition::Type::BARRIER_LOCK || + (type_ == mc::Transition::Type::BARRIER_WAIT && acquisition_ != nullptr && acquisition_->granted_); +} + } // namespace actor } // namespace kernel } // namespace simgrid diff --git a/src/kernel/actor/SynchroObserver.hpp b/src/kernel/actor/SynchroObserver.hpp index 5cfa830baf..de283d1c08 100644 --- a/src/kernel/actor/SynchroObserver.hpp +++ b/src/kernel/actor/SynchroObserver.hpp @@ -60,6 +60,24 @@ public: double get_timeout() const { return timeout_; } }; +/* This observer is ued for BARRIER_LOCK and BARRIER_WAIT. WAIT is returning and needs the acquisition */ +class BarrierObserver : public ResultingSimcall { + mc::Transition::Type type_; + activity::BarrierImpl* const barrier_ = nullptr; + activity::BarrierAcquisitionImpl* const acquisition_ = nullptr; + const double timeout_; + +public: + BarrierObserver(ActorImpl* actor, mc::Transition::Type type, activity::BarrierImpl* bar); + BarrierObserver(ActorImpl* actor, mc::Transition::Type type, activity::BarrierAcquisitionImpl* acqui, + double timeout = -1.0); + + void serialize(std::stringstream& stream) const override; + bool is_enabled() override; + + double get_timeout() const { return timeout_; } +}; + } // namespace actor } // namespace kernel } // namespace simgrid diff --git a/src/mc/remote/AppSide.cpp b/src/mc/remote/AppSide.cpp index afca0e3cb8..b61d3fa9c3 100644 --- a/src/mc/remote/AppSide.cpp +++ b/src/mc/remote/AppSide.cpp @@ -127,7 +127,6 @@ void AppSide::handle_actor_enabled(const s_mc_message_actor_enabled_t* msg) cons { bool res = mc::actor_is_enabled(kernel::actor::ActorImpl::by_pid(msg->aid)); s_mc_message_int_t answer{MessageType::ACTOR_ENABLED_REPLY, res}; - XBT_DEBUG("Actor %ld %s enabled", msg->aid, (res ? "IS" : "is NOT")); xbt_assert(channel_.send(answer) == 0, "Could not send ACTOR_ENABLED_REPLY"); } diff --git a/src/mc/transition/Transition.cpp b/src/mc/transition/Transition.cpp index 70537d95d6..64cb123651 100644 --- a/src/mc/transition/Transition.cpp +++ b/src/mc/transition/Transition.cpp @@ -63,6 +63,10 @@ Transition* deserialize_transition(aid_t issuer, int times_considered, std::stri auto simcall = static_cast(type); switch (simcall) { + case Transition::Type::BARRIER_LOCK: + case Transition::Type::BARRIER_WAIT: + return new BarrierTransition(issuer, times_considered, simcall, stream); + case Transition::Type::COMM_RECV: return new CommRecvTransition(issuer, times_considered, stream); case Transition::Type::COMM_SEND: diff --git a/src/mc/transition/Transition.hpp b/src/mc/transition/Transition.hpp index 4897de6289..e06a460aa0 100644 --- a/src/mc/transition/Transition.hpp +++ b/src/mc/transition/Transition.hpp @@ -32,8 +32,9 @@ class Transition { public: /* Ordering is important here. depends() implementations only consider subsequent types in this ordering */ - XBT_DECLARE_ENUM_CLASS(Type, RANDOM, /* First because indep with anybody */ - TESTANY, WAITANY, /* high priority because they can rewrite themselves to *_WAIT */ + XBT_DECLARE_ENUM_CLASS(Type, RANDOM, /* First because indep with anybody */ + TESTANY, WAITANY, /* high priority because they can rewrite themselves to *_WAIT */ + BARRIER_LOCK, BARRIER_WAIT, /* BARRIER transitions sorted alphabetically */ COMM_RECV, COMM_SEND, COMM_TEST, COMM_WAIT, /* Alphabetical ordering of COMM_* */ MUTEX_LOCK, MUTEX_TEST, MUTEX_TRYLOCK, MUTEX_UNLOCK, MUTEX_WAIT, /* alphabetical */ SEM_LOCK, SEM_UNLOCK, SEM_WAIT, /* alphabetical ordering of SEM transitions */ diff --git a/src/mc/transition/TransitionSynchro.cpp b/src/mc/transition/TransitionSynchro.cpp index 8813a15e49..2c6b916f8c 100644 --- a/src/mc/transition/TransitionSynchro.cpp +++ b/src/mc/transition/TransitionSynchro.cpp @@ -15,6 +15,39 @@ XBT_LOG_NEW_DEFAULT_SUBCATEGORY(mc_trans_synchro, mc_transition, "Logging specif namespace simgrid { namespace mc { + +std::string BarrierTransition::to_string(bool verbose) const +{ + return xbt::string_printf("%s(barrier: %u)", Transition::to_c_str(type_), bar_); +} +BarrierTransition::BarrierTransition(aid_t issuer, int times_considered, Type type, std::stringstream& stream) + : Transition(type, issuer, times_considered) +{ + xbt_assert(stream >> bar_); +} +bool BarrierTransition::depends(const Transition* o) const +{ + if (o->type_ < type_) + return o->depends(this); + + if (auto* other = dynamic_cast(o)) { + if (bar_ != other->bar_) + return false; + + // LOCK indep LOCK: requests are not ordered in a barrier + if (type_ == Type::BARRIER_LOCK && other->type_ == Type::BARRIER_LOCK) + return false; + + // WAIT indep WAIT: requests are not ordered + if (type_ == Type::BARRIER_WAIT && other->type_ == Type::BARRIER_WAIT) + return false; + + return true; // LOCK/WAIT is dependent because lock may enable wait + } + + return false; // barriers are INDEP with non-barrier transitions +} + std::string MutexTransition::to_string(bool verbose) const { return xbt::string_printf("%s(mutex: %" PRIxPTR ", owner:%ld)", Transition::to_c_str(type_), mutex_, owner_); diff --git a/src/mc/transition/TransitionSynchro.hpp b/src/mc/transition/TransitionSynchro.hpp index 814908a998..bea07f268f 100644 --- a/src/mc/transition/TransitionSynchro.hpp +++ b/src/mc/transition/TransitionSynchro.hpp @@ -11,6 +11,15 @@ namespace simgrid { namespace mc { +class BarrierTransition : public Transition { + unsigned bar_; + +public: + std::string to_string(bool verbose) const override; + BarrierTransition(aid_t issuer, int times_considered, Type type, std::stringstream& stream); + bool depends(const Transition* other) const override; +}; + class MutexTransition : public Transition { uintptr_t mutex_; aid_t owner_; diff --git a/src/msg/msg_legacy.cpp b/src/msg/msg_legacy.cpp index 95f5915be2..54afa6f647 100644 --- a/src/msg/msg_legacy.cpp +++ b/src/msg/msg_legacy.cpp @@ -403,7 +403,7 @@ sg_bar_t MSG_barrier_init(unsigned int count) return sg_barrier_init(count); } -void MSG_barrier_destroy(const_sg_bar_t bar) +void MSG_barrier_destroy(sg_bar_t bar) { sg_barrier_destroy(bar); } diff --git a/src/s4u/s4u_Barrier.cpp b/src/s4u/s4u_Barrier.cpp index 1bb6e23225..80ce0e29b2 100644 --- a/src/s4u/s4u_Barrier.cpp +++ b/src/s4u/s4u_Barrier.cpp @@ -4,8 +4,12 @@ * under the terms of the license (GNU LGPL) which comes with this package. */ #include +#include #include -#include + +#include "src/kernel/activity/BarrierImpl.hpp" +#include "src/kernel/actor/SynchroObserver.hpp" +#include "src/mc/mc_replay.hpp" XBT_LOG_NEW_DEFAULT_SUBCATEGORY(s4u_barrier, s4u, "S4U barrier"); @@ -18,45 +22,45 @@ namespace s4u { */ BarrierPtr Barrier::create(unsigned int expected_actors) { - return BarrierPtr(new Barrier(expected_actors)); + auto* res = new kernel::activity::BarrierImpl(expected_actors); + return BarrierPtr(&res->piface_); } /** @brief Block the current actor until all expected actors reach the barrier. * * This method is meant to be somewhat consistent with the pthread_barrier_wait function. * - * @return 0 for all actors but one: exactly one actor will get SG_BARRIER_SERIAL_THREAD as a return value. + * @return false for all actors but one: exactly one actor will get true as a return value. */ int Barrier::wait() { - mutex_->lock(); - arrived_actors_++; - XBT_DEBUG("waiting %p %u/%u", this, arrived_actors_, expected_actors_); - if (arrived_actors_ == expected_actors_) { - cond_->notify_all(); - arrived_actors_ = 0; - mutex_->unlock(); - return SG_BARRIER_SERIAL_THREAD; - } + kernel::actor::ActorImpl* issuer = kernel::actor::ActorImpl::self(); - cond_->wait(mutex_); - mutex_->unlock(); - return 0; + if (MC_is_active() || MC_record_replay_is_active()) { // Split in 2 simcalls for transition persistency + kernel::actor::BarrierObserver lock_observer{issuer, mc::Transition::Type::BARRIER_LOCK, pimpl_}; + auto acquisition = + kernel::actor::simcall_answered([issuer, this] { return pimpl_->acquire_async(issuer); }, &lock_observer); + + kernel::actor::BarrierObserver wait_observer{issuer, mc::Transition::Type::BARRIER_WAIT, acquisition.get()}; + return kernel::actor::simcall_blocking([issuer, acquisition] { return acquisition->wait_for(issuer, -1); }, + &wait_observer); + + } else { // Do it in one simcall only + kernel::activity::BarrierAcquisitionImpl* acqui = nullptr; // unused here, but must be typed to pick the right ctor + kernel::actor::BarrierObserver observer{issuer, mc::Transition::Type::BARRIER_WAIT, acqui}; + return kernel::actor::simcall_blocking([issuer, this] { pimpl_->acquire_async(issuer)->wait_for(issuer, -1); }, + &observer); + } } void intrusive_ptr_add_ref(Barrier* barrier) { - xbt_assert(barrier); - barrier->refcount_.fetch_add(1, std::memory_order_relaxed); + intrusive_ptr_add_ref(barrier->pimpl_); } void intrusive_ptr_release(Barrier* barrier) { - xbt_assert(barrier); - if (barrier->refcount_.fetch_sub(1, std::memory_order_release) == 1) { - std::atomic_thread_fence(std::memory_order_acquire); - delete barrier; - } + intrusive_ptr_release(barrier->pimpl_); } } // namespace s4u } // namespace simgrid @@ -65,17 +69,23 @@ void intrusive_ptr_release(Barrier* barrier) sg_bar_t sg_barrier_init(unsigned int count) { - return new simgrid::s4u::Barrier(count); + simgrid::s4u::BarrierPtr bar = simgrid::s4u::Barrier::create(count); + intrusive_ptr_add_ref(bar.get()); + return bar.get(); } /** @brief Initializes a barrier, with count elements */ -void sg_barrier_destroy(const_sg_bar_t bar) +void sg_barrier_destroy(sg_bar_t bar) { - delete bar; + intrusive_ptr_release(bar); } -/** @brief Performs a barrier already initialized */ +/** @brief Performs a barrier already initialized. + * + * @return 0 for all actors but one: exactly one actor will get SG_BARRIER_SERIAL_THREAD as a return value. */ int sg_barrier_wait(sg_bar_t bar) { - return bar->wait(); + if (bar->wait()) + return SG_BARRIER_SERIAL_THREAD; + return 0; } diff --git a/src/smpi/include/smpi_win.hpp b/src/smpi/include/smpi_win.hpp index f020e8648c..0789da5aec 100644 --- a/src/smpi/include/smpi_win.hpp +++ b/src/smpi/include/smpi_win.hpp @@ -28,7 +28,7 @@ class Win : public F2C, public Keyval { MPI_Comm comm_; std::vector requests_; s4u::MutexPtr mut_ = s4u::Mutex::create(); - s4u::Barrier* bar_ = nullptr; + s4u::BarrierPtr bar_; std::vector connected_wins_; std::string name_; int opened_ = 0; diff --git a/src/smpi/internals/smpi_deployment.cpp b/src/smpi/internals/smpi_deployment.cpp index ff50710791..929a7bee35 100644 --- a/src/smpi/internals/smpi_deployment.cpp +++ b/src/smpi/internals/smpi_deployment.cpp @@ -26,9 +26,9 @@ public: auto* group = new simgrid::smpi::Group(size_); comm_world_ = new simgrid::smpi::Comm(group, nullptr, false, -1); universe_size += max_no_processes; - bar_ = std::make_shared(size_); + bar_ = s4u::Barrier::create(size_); } - std::shared_ptr bar_; + s4u::BarrierPtr bar_; unsigned int size_; unsigned int finalized_ranks_ = 0; MPI_Comm comm_world_; diff --git a/src/smpi/mpi/smpi_win.cpp b/src/smpi/mpi/smpi_win.cpp index 672063de57..8b0dfdacc0 100644 --- a/src/smpi/mpi/smpi_win.cpp +++ b/src/smpi/mpi/smpi_win.cpp @@ -62,10 +62,8 @@ Win::Win(void* base, MPI_Aint size, int disp_unit, MPI_Info info, MPI_Comm comm, colls::allgather(&connected_wins_[rank_], sizeof(MPI_Win), MPI_BYTE, connected_wins_.data(), sizeof(MPI_Win), MPI_BYTE, comm); if (MC_is_active() || MC_record_replay_is_active()){ - if(rank_==0){ - bar_ = new s4u::Barrier(comm->size()); - } - colls::bcast(&bar_, sizeof(s4u::Barrier*), MPI_BYTE, 0, comm); + if (bar_.get() == 0) // First to arrive on the barrier + bar_ = s4u::Barrier::create(comm->size()); bar_->wait(); }else{ colls::barrier(comm); @@ -91,9 +89,6 @@ Win::~Win(){ colls::barrier(comm_); Comm::unref(comm_); - if (rank_ == 0) - delete bar_; - if (allocated_) xbt_free(base_); diff --git a/teshsuite/smpi/gh-139/gh-139.tesh b/teshsuite/smpi/gh-139/gh-139.tesh index 14ba8f1ee8..a1fcc1ceed 100644 --- a/teshsuite/smpi/gh-139/gh-139.tesh +++ b/teshsuite/smpi/gh-139/gh-139.tesh @@ -2,15 +2,15 @@ $ ${bindir:=.}/../../../smpi_script/bin/smpirun -np 2 -platform ${platfdir:=.}/small_platform.xml -hostfile ../hostfile ${bindir:=.}/gh-139 --cfg=smpi/simulate-computation:no --log=smpi_config.thres:warning --log=xbt_cfg.thres:warning > [Tremblay:0:(1) 0.000000] [smpi_test/INFO] I'm 0/2 > [Jupiter:1:(2) 0.000000] [smpi_test/INFO] I'm 1/2 -> [Tremblay:wait send:(4) 0.000000] [smpi_test/INFO] new thread has parameter rank 0 and global variable rank 0 -> [Tremblay:wait send:(4) 0.000000] [smpi_test/INFO] 0 has MPI rank 0 and global variable rank 0 -> [Tremblay:wait send:(4) 0.000000] [smpi_test/INFO] 0 waiting request -> [Tremblay:wait send:(4) 0.000000] [smpi_test/INFO] 0 request done, return MPI_SUCCESS -> [Tremblay:wait send:(4) 0.000000] [smpi_test/INFO] 0 still has MPI rank 0 and global variable 0 -> [Jupiter:wait recv:(3) 0.000000] [smpi_test/INFO] new thread has parameter rank 1 and global variable rank 1 -> [Jupiter:wait recv:(3) 0.000000] [smpi_test/INFO] 1 has MPI rank 1 and global variable rank 1 -> [Jupiter:wait recv:(3) 0.000000] [smpi_test/INFO] 1 waiting request -> [Jupiter:wait recv:(3) 0.002948] [smpi_test/INFO] 1 request done, return MPI_SUCCESS -> [Jupiter:wait recv:(3) 0.002948] [smpi_test/INFO] 1 still has MPI rank 1 and global variable 1 +> [Tremblay:wait send:(3) 0.000000] [smpi_test/INFO] new thread has parameter rank 0 and global variable rank 0 +> [Tremblay:wait send:(3) 0.000000] [smpi_test/INFO] 0 has MPI rank 0 and global variable rank 0 +> [Tremblay:wait send:(3) 0.000000] [smpi_test/INFO] 0 waiting request +> [Tremblay:wait send:(3) 0.000000] [smpi_test/INFO] 0 request done, return MPI_SUCCESS +> [Tremblay:wait send:(3) 0.000000] [smpi_test/INFO] 0 still has MPI rank 0 and global variable 0 +> [Jupiter:wait recv:(4) 0.000000] [smpi_test/INFO] new thread has parameter rank 1 and global variable rank 1 +> [Jupiter:wait recv:(4) 0.000000] [smpi_test/INFO] 1 has MPI rank 1 and global variable rank 1 +> [Jupiter:wait recv:(4) 0.000000] [smpi_test/INFO] 1 waiting request +> [Jupiter:wait recv:(4) 0.002948] [smpi_test/INFO] 1 request done, return MPI_SUCCESS +> [Jupiter:wait recv:(4) 0.002948] [smpi_test/INFO] 1 still has MPI rank 1 and global variable 1 > [Tremblay:0:(1) 1.000000] [smpi_test/INFO] finally 42 > [Jupiter:1:(2) 2.000000] [smpi_test/INFO] finally 42 diff --git a/tools/cmake/DefinePackages.cmake b/tools/cmake/DefinePackages.cmake index 2fd0da30f5..042419795f 100644 --- a/tools/cmake/DefinePackages.cmake +++ b/tools/cmake/DefinePackages.cmake @@ -386,6 +386,8 @@ set(SIMIX_SRC src/simix/popping.cpp src/kernel/activity/ActivityImpl.cpp src/kernel/activity/ActivityImpl.hpp + src/kernel/activity/BarrierImpl.cpp + src/kernel/activity/BarrierImpl.hpp src/kernel/activity/ConditionVariableImpl.cpp src/kernel/activity/ConditionVariableImpl.hpp src/kernel/activity/CommImpl.cpp