/*! @page uhood_switch Process Synchronizations and Context Switching

@tableofcontents

@section uhood_switch_DES SimGrid as an Operating System

SimGrid is a discrete event simulator of distributed systems: it does
not simulate the world by small fixed-size steps but determines the
date of the next event (such as the end of a communication, the end of
a computation) and jumps to this date.

A number of actors executing user-provided code run on top of the
simulation kernel. The interactions between these actors and the
simulation kernel are very similar to the ones between the system
processes and the Operating System (except that the actors and
simulation kernel share the same address space in a single OS
process).

When an actor needs to interact with the outer world (e.g. to start a
communication), it issues a <i>simcall</i> (simulation call), just
like a system process issues a <i>syscall</i> to interact with its
environment through the Operating System. Any <i>simcall</i> freezes
the actor until it is woken up by the simulation kernel (e.g. when the
communication is finished).

Mimicking the OS behavior may seem over-engineered here, but this is
mandatory for the model checker. The simcalls, representing actors'
actions, are the transitions of the formal system. Verifying the
system requires manipulating these transitions explicitly. This also
makes it possible to safely run the actors in parallel, even if this
is less commonly used by our users.

So, the key ideas here are:

 - The simulator is a discrete event simulator (event-driven).

 - An actor can issue a blocking simcall and will be suspended until
   it is woken up by the simulation kernel (when the operation is
   completed).

 - In order to move forward in (simulated) time, the simulation kernel
   needs to know which actions the actors want to do.

 - The simulated time will only move forward when all the actors are
   blocked, waiting on a simcall.

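The event-driven time advance listed above can be sketched as follows. This
is only a toy illustration, not SimGrid code: `Event`, `ToySimulator` and
`run_toy_simulation()` are hypothetical names, and a plain priority queue
stands in for the simulation kernel.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical toy types, for illustration only (not part of SimGrid).
struct Event {
  double date;                   // simulated date at which the event occurs
  std::function<void()> action;  // what happens at that date
};

// Orders the priority queue so that the earliest event is on top.
struct LaterFirst {
  bool operator()(Event const& a, Event const& b) const { return a.date > b.date; }
};

class ToySimulator {
public:
  double now() const { return now_; }

  void schedule(double date, std::function<void()> action) {
    events_.push(Event{date, std::move(action)});
  }

  // Jump from event to event instead of advancing by fixed-size steps.
  void run() {
    while (!events_.empty()) {
      Event event = events_.top();
      events_.pop();
      now_ = event.date;  // the clock jumps directly to the next event date
      event.action();
    }
  }

private:
  double now_ = 0.0;
  std::priority_queue<Event, std::vector<Event>, LaterFirst> events_;
};

// Schedules a few events and returns the dates at which they were observed.
std::vector<double> run_toy_simulation() {
  ToySimulator sim;
  std::vector<double> seen;
  sim.schedule(30.0, [&] { seen.push_back(sim.now()); });
  sim.schedule(2.5, [&] {
    seen.push_back(sim.now());
    // An event may schedule further events in the future:
    sim.schedule(sim.now() + 1.0, [&] { seen.push_back(sim.now()); });
  });
  sim.run();
  return seen;
}
```

Calling `run_toy_simulation()` visits the events in date order, whatever the
order in which they were scheduled, and the clock never takes a value that is
not an event date.
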
This leads to some very important consequences:

 - An actor cannot synchronize with another actor using OS-level primitives
   such as `pthread_mutex_lock()` or `std::mutex`. The simulation kernel
   would wait for the actor to issue a simcall and would deadlock. Instead it
   must use simulation-level synchronization primitives
   (such as `simcall_mutex_lock()`).

 - Similarly, an actor cannot sleep using
   `std::this_thread::sleep_for()`, which waits in the real world. It
   must instead use
   `simgrid::s4u::Actor::this_actor::sleep_for()`, which waits in the
   simulation.

 - The simulation kernel cannot block.
   Only the actors can block (using simulation primitives).

@section uhood_switch_futures Futures and Promises

@subsection uhood_switch_futures_what What is a future?

Futures are a nice classical programming abstraction, present in many
languages.  Wikipedia defines a
[future](https://en.wikipedia.org/wiki/Futures_and_promises) as an
object that acts as a proxy for a result that is initially unknown,
usually because the computation of its value is yet incomplete. This
concept is thus perfectly adapted to represent in the kernel the
asynchronous operations corresponding to the actors' simcalls.

Futures can be manipulated using two kinds of APIs:

 - a <b>blocking API</b> where we wait for the result to be available
   (`res = f.get()`);

 - a <b>continuation-based API</b> where we say what should be done
   with the result when the operation completes
   (`future.then(something_to_do_with_the_result)`). This is heavily
   used in ECMAScript, which exhibits the same kind of never-blocking
   asynchronous model as our discrete event simulator.

C++11 includes a generic class (`std::future<T>`) which implements a
blocking API.  The continuation-based API is not available in the
standard (yet) but is [already
described](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0159r0.html#futures.unique_future.6)
in the Concurrency Technical Specification.

`Promise`s are the counterparts of `Future`s: `std::future<T>` is used
<em>by the consumer</em> of the result. On the other hand,
`std::promise<T>` is used <em>by the producer</em> of the result. The
producer calls `promise.set_value(42)` or `promise.set_exception(e)`
in order to <em>set the result</em> which will be made available to
the consumer by `future.get()`.

@subsection uhood_switch_futures_needs Which future do we need?

The blocking API provided by the standard C++11 futures does not suit
our needs since the simulation kernel <em>cannot</em> block, and since
we want to explicitly schedule the actors.  Instead, we need to
reimplement a continuation-based API to be used in our event-driven
simulation kernel.

Our futures are based on the C++ Concurrency Technical Specification
API, with a few differences:

 - The simulation kernel is single-threaded so we do not need
   inter-thread synchronization for our futures.

 - As the simulation kernel cannot block, `f.wait()` is not meaningful
   in this context.

 - Similarly, `future.get()` does an implicit wait. Calling this method in the
   simulation kernel only makes sense if the future is already ready. If the
   future is not ready, this would deadlock the simulator and an error is
   raised instead.

 - We always call the continuations in the simulation loop (and not
   inside the `future.then()` or `promise.set_value()` calls). That
   way, we don't have to fear problems like invariants not being
   restored when the callbacks are called :fearful: or stack overflows
   triggered by deeply nested continuation chains :cold_sweat:. The
   continuations are all called in a nice and predictable place in the
   simulator with a nice and predictable state :relieved:.

 - Some features of the standard (such as shared futures) are not
   needed in our context, and thus not considered here.

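The rule that continuations run only from the simulation loop can be
illustrated with a small single-threaded sketch. It is not the SimGrid
implementation: `ToyState`, `pending_jobs`, `run_pending_jobs()` and
`demo_deferred_continuation()` are hypothetical names, and the value type is
assumed to be copyable and default-constructible to keep the code short.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded toy, for illustration only (not the SimGrid classes).

// Jobs waiting to be run by the main loop.
std::deque<std::function<void()>> pending_jobs;

template <class T> class ToyState {
public:
  void set_value(T value) {
    value_     = std::move(value);
    has_value_ = true;
    schedule_if_ready();
  }
  void then_(std::function<void(T)> continuation) {
    continuation_ = std::move(continuation);
    schedule_if_ready();
  }

private:
  // Queue the continuation once both the value and the continuation are known.
  // It is never called from inside set_value() or then_().
  void schedule_if_ready() {
    if (!has_value_ || !continuation_)
      return;
    std::function<void(T)> continuation = std::move(continuation_);
    continuation_ = nullptr;
    has_value_    = false;
    T value       = std::move(value_);
    pending_jobs.push_back([continuation, value] { continuation(value); });
  }

  T value_{};
  bool has_value_ = false;
  std::function<void(T)> continuation_;
};

// The only place where continuations actually run.
void run_pending_jobs() {
  while (!pending_jobs.empty()) {
    std::function<void()> job = std::move(pending_jobs.front());
    pending_jobs.pop_front();
    job();
  }
}

// Records the order in which things happen.
std::vector<std::string> demo_deferred_continuation() {
  std::vector<std::string> trace;
  ToyState<int> state;
  state.then_([&trace](int v) { trace.push_back("continuation got " + std::to_string(v)); });
  state.set_value(42);
  trace.push_back("set_value returned");
  run_pending_jobs();
  return trace;
}
```

In this sketch `set_value()` returns before the continuation runs, so the
caller's invariants are restored by the time the callback executes, and
chained continuations do not nest on the call stack.
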
@subsection uhood_switch_futures_implem Implementing `Future` and `Promise`

The `simgrid::kernel::Future` and `simgrid::kernel::Promise` use a
shared state defined as follows:

@code{cpp}
enum class FutureStatus {
  not_ready,
  ready,
  done,
};

class FutureStateBase : private boost::noncopyable {
public:
  void schedule(simgrid::xbt::Task<void()>&& job);
  void set_exception(std::exception_ptr exception);
  void set_continuation(simgrid::xbt::Task<void()>&& continuation);
  FutureStatus get_status() const;
  bool is_ready() const;
  // [...]
private:
  FutureStatus status_ = FutureStatus::not_ready;
  std::exception_ptr exception_;
  simgrid::xbt::Task<void()> continuation_;
};

template<class T>
class FutureState : public FutureStateBase {
public:
  void set_value(T value);
  T get();
private:
  boost::optional<T> value_;
};

template<class T>
class FutureState<T&> : public FutureStateBase {
  // ...
};
template<>
class FutureState<void> : public FutureStateBase {
  // ...
};
@endcode

Both `Future` and `Promise` have a reference to the shared state:

@code{cpp}
template<class T>
class Future {
  // [...]
private:
  std::shared_ptr<FutureState<T>> state_;
};

template<class T>
class Promise {
  // [...]
private:
  std::shared_ptr<FutureState<T>> state_;
  bool future_get_ = false;
};
@endcode

The crux of `future.then()` is:

@code{cpp}
template<class T>
template<class F>
auto simgrid::kernel::Future<T>::thenNoUnwrap(F continuation)
-> Future<decltype(continuation(std::move(*this)))>
{
  typedef decltype(continuation(std::move(*this))) R;

  if (state_ == nullptr)
    throw std::future_error(std::future_errc::no_state);

  auto state = std::move(state_);
  // Create a new future...
  Promise<R> promise;
  Future<R> future = promise.get_future();
  // ...and when the current future is ready...
  state->set_continuation(simgrid::xbt::makeTask(
    [](Promise<R> promise, std::shared_ptr<FutureState<T>> state,
         F continuation) {
      // ...set the new future value by running the continuation.
      Future<T> future(std::move(state));
      simgrid::xbt::fulfillPromise(promise,[&]{
        return continuation(std::move(future));
      });
    },
    std::move(promise), state, std::move(continuation)));
  return std::move(future);
}
@endcode

We added a (much simpler) `future.then_()` method which does not
create a new future:

@code{cpp}
template<class T>
template<class F>
void simgrid::kernel::Future<T>::then_(F continuation)
{
  if (state_ == nullptr)
    throw std::future_error(std::future_errc::no_state);
  // Give shared-ownership to the continuation:
  auto state = std::move(state_);
  state->set_continuation(simgrid::xbt::makeTask(
    std::move(continuation), state));
}
@endcode

The `.get()` method delegates to the shared state. As we mentioned previously,
an error is raised if the future is not ready:

@code{cpp}
template<class T>
T simgrid::kernel::Future<T>::get()
{
  if (state_ == nullptr)
    throw std::future_error(std::future_errc::no_state);
  std::shared_ptr<FutureState<T>> state = std::move(state_);
  return state->get();
}

template<class T>
T simgrid::kernel::FutureState<T>::get()
{
  if (status_ != FutureStatus::ready)
    xbt_die("Deadlock: this future is not ready");
  status_ = FutureStatus::done;
  if (exception_) {
    std::exception_ptr exception = std::move(exception_);
    std::rethrow_exception(std::move(exception));
  }
  xbt_assert(this->value_);
  auto result = std::move(this->value_.get());
  this->value_ = boost::optional<T>();
  return std::move(result);
}
@endcode

@section uhood_switch_simcalls Implementing the simcalls

So a simcall is a way for the actor to push a request to the
simulation kernel and yield the control until the request is
fulfilled. The performance requirements are very high because
the actors usually issue a very large number of simcalls during the
simulation.

As for real syscalls, the basic idea is to write the wanted call and
its arguments in a memory area that is specific to the actor, and
yield the control to the simulation kernel. Once in kernel mode, the
simcalls of each demanding actor are evaluated sequentially in a
strictly reproducible order. This makes the whole simulation
reproducible.

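The idea of writing the request in a per-actor memory area and letting the
kernel serve all requests in a fixed order can be sketched as follows. This
toy model is an assumption made for illustration: `ToyCall`, `ToyRequest`,
`ToyActor`, `issue()` and `handle_all()` are hypothetical names, and the
context switch from the actor to the kernel is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, for illustration only (not the SimGrid data structures).
enum class ToyCall { none, add, negate };

struct ToyRequest {
  ToyCall call = ToyCall::none;  // which operation is requested
  long args[2] = {0, 0};         // marshalled arguments
  long result  = 0;              // filled in by the kernel
};

struct ToyActor {
  int pid = 0;
  ToyRequest request;            // per-actor memory area for the pending simcall
};

// Actor side: write the call and its arguments (a real actor would then yield).
void issue(ToyActor& actor, ToyCall call, long a, long b = 0) {
  actor.request.call    = call;
  actor.request.args[0] = a;
  actor.request.args[1] = b;
}

// Kernel side: serve every pending request, always in the order of the vector.
std::vector<int> handle_all(std::vector<ToyActor>& actors) {
  std::vector<int> handled;      // pids, in the order they were served
  for (ToyActor& actor : actors) {
    ToyRequest& req = actor.request;
    switch (req.call) {
      case ToyCall::add:    req.result = req.args[0] + req.args[1]; break;
      case ToyCall::negate: req.result = -req.args[0]; break;
      case ToyCall::none:   continue;  // nothing requested by this actor
    }
    req.call = ToyCall::none;
    handled.push_back(actor.pid);
  }
  return handled;
}
```

The serving order depends only on the position of the actors in the vector,
not on the order in which they issued their requests, which is what makes a
run repeatable in this sketch.
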
@subsection uhood_switch_simcalls_v2 The historical way

In the very first implementation, everything was written by hand and
highly optimized, making our software very hard to maintain and
evolve. We decided to sacrifice some performance for
maintainability. In a second try (that is still in use in SimGrid
v3.13), we had a lot of boilerplate code generated from a Python script,
taking the [list of simcalls](https://github.com/simgrid/simgrid/blob/4ae2fd01d8cc55bf83654e29f294335e3cb1f022/src/simix/simcalls.in)
as input. It looks like this:

@code{cpp}
# This looks like C++ but it is a basic IDL-like language
# (one definition per line) parsed by a python script:

void process_kill(smx_process_t process);
void process_killall(int reset_pid);
void process_cleanup(smx_process_t process) [[nohandler]];
void process_suspend(smx_process_t process) [[block]];
void process_resume(smx_process_t process);
void process_set_host(smx_process_t process, sg_host_t dest);
int  process_is_suspended(smx_process_t process) [[nohandler]];
int  process_join(smx_process_t process, double timeout) [[block]];
int  process_sleep(double duration) [[block]];

smx_mutex_t mutex_init();
void        mutex_lock(smx_mutex_t mutex) [[block]];
int         mutex_trylock(smx_mutex_t mutex);
void        mutex_unlock(smx_mutex_t mutex);

[...]
@endcode

At runtime, a simcall is represented by a structure containing a simcall
number and its arguments (among some other things):

@code{cpp}
struct s_smx_simcall {
  // Simcall number:
  e_smx_simcall_t call;
  // Issuing actor:
  smx_process_t issuer;
  // Arguments of the simcall:
  union u_smx_scalar args[11];
  // Result of the simcall:
  union u_smx_scalar result;
  // Some additional stuff:
  smx_timer_t timer;
  int mc_value;
};
@endcode

with a scalar union type:

@code{cpp}
union u_smx_scalar {
  char            c;
  short           s;
  int             i;
  long            l;
  long long       ll;
  unsigned char   uc;
  unsigned short  us;
  unsigned int    ui;
  unsigned long   ul;
  unsigned long long ull;
  double          d;
  void*           dp;
  FPtr            fp;
};
@endcode

Manually calling the relevant [Python
script](https://github.com/simgrid/simgrid/blob/4ae2fd01d8cc55bf83654e29f294335e3cb1f022/src/simix/simcalls.py)
generates a bunch of C++ files:

* an enum of all the [simcall numbers](https://github.com/simgrid/simgrid/blob/4ae2fd01d8cc55bf83654e29f294335e3cb1f022/src/simix/popping_enum.h#L19);

* [user-side wrappers](https://github.com/simgrid/simgrid/blob/4ae2fd01d8cc55bf83654e29f294335e3cb1f022/src/simix/popping_bodies.cpp)
  responsible for wrapping the parameters in the `struct s_smx_simcall`
  and unwrapping the result;

* [accessors](https://github.com/simgrid/simgrid/blob/4ae2fd01d8cc55bf83654e29f294335e3cb1f022/src/simix/popping_accessors.h)
  to get/set values of `struct s_smx_simcall`;

* a simulation-kernel-side [big switch](https://github.com/simgrid/simgrid/blob/4ae2fd01d8cc55bf83654e29f294335e3cb1f022/src/simix/popping_generated.cpp#L106)
  handling all the simcall numbers.

Then one has to write the code of the kernel-side handler for the simcall
and the code of the simcall itself (which calls the code-generated
marshaling/unmarshaling stuff).

In order to simplify this process, we added two generic simcalls which can be
used to execute a function in the simulation kernel:

@code{cpp}
# This one should really be called run_immediate:
void run_kernel(std::function<void()> const* code) [[nohandler]];
void run_blocking(std::function<void()> const* code) [[block,nohandler]];
@endcode

### Immediate simcall

The first one (`simcall_run_kernel()`) executes a function in the simulation
kernel context and returns immediately (without blocking the actor):

@code{cpp}
void simcall_run_kernel(std::function<void()> const& code)
{
  simcall_BODY_run_kernel(&code);
}

template<class F> inline
void simcall_run_kernel(F& f)
{
  simcall_run_kernel(std::function<void()>(std::ref(f)));
}
@endcode

On top of this, we add a wrapper which can be used to return a value of any
type and properly handles exceptions:

@code{cpp}
template<class F>
typename std::result_of<F()>::type kernelImmediate(F&& code)
{
  // If we are in the simulation kernel, we take the fast path and
  // execute the code directly without simcall
  // marshalling/unmarshalling/dispatch:
  if (SIMIX_is_maestro())
    return std::forward<F>(code)();

  // If we are in the application, pass the code to the simulation
  // kernel which executes it for us and reports the result:
  typedef typename std::result_of<F()>::type R;
  simgrid::xbt::Result<R> result;
  simcall_run_kernel([&]{
    xbt_assert(SIMIX_is_maestro(), "Not in maestro");
    simgrid::xbt::fulfillPromise(result, std::forward<F>(code));
  });
  return result.get();
}
@endcode

where [`Result<R>`](#result) can store either an `R` or an exception.

Example of usage:

@code{cpp}
xbt_dict_t Host::properties() {
  return simgrid::simix::kernelImmediate([&] {
    simgrid::surf::HostImpl* surf_host =
      this->extension<simgrid::surf::HostImpl>();
    return surf_host->getProperties();
  });
}
@endcode

### Blocking simcall {#uhood_switch_v2_blocking}

The second generic simcall (`simcall_run_blocking()`) executes a function in
the SimGrid simulation kernel immediately but does not wake up the calling actor
immediately:

@code{cpp}
void simcall_run_blocking(std::function<void()> const& code);

template<class F>
void simcall_run_blocking(F& f)
{
  simcall_run_blocking(std::function<void()>(std::ref(f)));
}
@endcode

The `f` function is expected to set up some callbacks in the simulation
kernel which will wake up the actor (with
`simgrid::simix::unblock(actor)`) when the operation is completed.

This is wrapped in a higher-level primitive as well. The
`kernelSync()` function expects a function-object which is executed
immediately in the simulation kernel and returns a `Future<T>`.  The
simulator blocks the actor and resumes it when the `Future<T>` becomes
ready with its result:

@code{cpp}
template<class F>
auto kernelSync(F code) -> decltype(code().get())
{
  typedef decltype(code().get()) T;
  if (SIMIX_is_maestro())
    xbt_die("Can't execute blocking call in kernel mode");

  smx_process_t self = SIMIX_process_self();
  simgrid::xbt::Result<T> result;

  simcall_run_blocking([&result, self, &code]{
    try {
      auto future = code();
      future.then_([&result, self](simgrid::kernel::Future<T> value) {
        // Propagate the result from the future
        // to the simgrid::xbt::Result:
        simgrid::xbt::setPromise(result, value);
        simgrid::simix::unblock(self);
      });
    }
    catch (...) {
      // The code failed immediately. We can wake up the actor
      // immediately with the exception:
      result.set_exception(std::current_exception());
      simgrid::simix::unblock(self);
    }
  });

  // Get the result of the operation (which might be an exception):
  return result.get();
}
@endcode

A contrived example of this would be:

@code{cpp}
int res = simgrid::simix::kernelSync([&] {
  return kernel_wait_until(30).then(
    [](simgrid::kernel::Future<void> future) {
      return 42;
    }
  );
});
@endcode

### Asynchronous operations {#uhood_switch_v2_async}

We can write the related `kernelAsync()` which wakes up the actor immediately
and returns a future to the actor. As this future is used in the actor context,
it is a different future
(`simgrid::simix::Future` instead of `simgrid::kernel::Future`)
which implements a C++11 `std::future` wait-based API:

@code{cpp}
template <class T>
class Future {
public:
  Future() {}
  Future(simgrid::kernel::Future<T> future) : future_(std::move(future)) {}
  bool valid() const { return future_.valid(); }
  T get();
  bool is_ready() const;
  void wait();
private:
  // We wrap an event-based kernel future:
  simgrid::kernel::Future<T> future_;
};
@endcode

The `future.get()` method is implemented as:

@code{cpp}
template<class T>
T simgrid::simix::Future<T>::get()
{
  if (!valid())
    throw std::future_error(std::future_errc::no_state);
  smx_process_t self = SIMIX_process_self();
  simgrid::xbt::Result<T> result;
  simcall_run_blocking([this, &result, self]{
    try {
      // When the kernel future is ready...
      this->future_.then_(
        [this, &result, self](simgrid::kernel::Future<T> value) {
          // ... wake up the process with the result of the kernel future.
          simgrid::xbt::setPromise(result, value);
          simgrid::simix::unblock(self);
      });
    }
    catch (...) {
      result.set_exception(std::current_exception());
      simgrid::simix::unblock(self);
    }
  });
  return result.get();
}
@endcode

`kernelAsync()` simply :wink: calls `kernelImmediate()` and wraps the
`simgrid::kernel::Future` into a `simgrid::simix::Future`:

@code{cpp}
template<class F>
auto kernelAsync(F code)
  -> Future<decltype(code().get())>
{
  typedef decltype(code().get()) T;

  // Execute the code in the simulation kernel and get the kernel future:
  simgrid::kernel::Future<T> future =
    simgrid::simix::kernelImmediate(std::move(code));

  // Wrap the kernel future in a user future:
  return simgrid::simix::Future<T>(std::move(future));
}
@endcode

A contrived example of this would be:

@code{cpp}
simgrid::simix::Future<int> future = simgrid::simix::kernelAsync([&] {
  return kernel_wait_until(30).then(
    [](simgrid::kernel::Future<void> future) {
      return 42;
    }
  );
});
do_some_stuff();
int res = future.get();
@endcode

`kernelSync()` could be rewritten as:

@code{cpp}
template<class F>
auto kernelSync(F code) -> decltype(code().get())
{
  return kernelAsync(std::move(code)).get();
}
@endcode

The semantics are equivalent but this form would require two simcalls
instead of one to do the same job (one in `kernelAsync()` and one in
`.get()`).

## Mutexes and condition variables

### Mutexes

SimGrid has had a C-based API for mutexes and condition variables for
some time.  These mutexes are different from the standard
system-level mutex (`std::mutex`, `pthread_mutex_t`, etc.) because
they work at simulation-level.  Locking on a simulation mutex does
not block the thread directly but makes a simcall
(`simcall_mutex_lock()`) which asks the simulation kernel to wake the calling
actor when it can get ownership of the mutex. Blocking directly at the
OS level would deadlock the simulation.

Reusing the C++ standard API for our simulation mutexes has many
benefits:

 * it makes it easier for people familiar with the `std::mutex` to
   understand and use SimGrid mutexes;

 * we can benefit from a proven API;

 * we can reuse generic library code in SimGrid.

We defined a reference-counted `Mutex` class for this (which supports
the [`Lockable`](http://en.cppreference.com/w/cpp/concept/Lockable)
requirements, see
[`[thread.req.lockable.req]`](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf#page=1175)
in the C++14 standard):

@code{cpp}
class Mutex {
  friend ConditionVariable;
private:
  friend simgrid::simix::Mutex;
  simgrid::simix::Mutex* mutex_;
  Mutex(simgrid::simix::Mutex* mutex) : mutex_(mutex) {}
public:

  friend void intrusive_ptr_add_ref(Mutex* mutex);
  friend void intrusive_ptr_release(Mutex* mutex);
  using Ptr = boost::intrusive_ptr<Mutex>;

  // No copy:
  Mutex(Mutex const&) = delete;
  Mutex& operator=(Mutex const&) = delete;

  static Ptr createMutex();

public:
  void lock();
  void unlock();
  bool try_lock();
};
@endcode

The methods are simply wrappers around existing simcalls:

@code{cpp}
void Mutex::lock()
{
  simcall_mutex_lock(mutex_);
}
@endcode

Using the same API as `std::mutex` (`Lockable`) means we can use existing
C++-standard code such as `std::unique_lock<Mutex>` or
`std::lock_guard<Mutex>` for exception-safe mutex handling:

@code{cpp}
{
  std::lock_guard<simgrid::s4u::Mutex> lock(*mutex);
  sum += 1;
}
@endcode

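To make the role of the `Lockable` requirements concrete, here is a small
sketch in which a user-defined class is driven by the standard lock wrappers.
`RecordingMutex` and `guarded_increment()` are hypothetical names used only
for this illustration: the class merely records the calls it receives, where
a simulation mutex would issue simcalls.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a simulation mutex: it only records the calls.
class RecordingMutex {
public:
  void lock() {
    trace.push_back("lock");
    locked = true;
  }
  void unlock() {
    trace.push_back("unlock");
    locked = false;
  }
  bool try_lock() {
    if (locked)
      return false;
    lock();
    return true;
  }

  bool locked = false;
  std::vector<std::string> trace;
};

// std::lock_guard only needs lock() and unlock() on its template argument.
int guarded_increment(RecordingMutex& mutex, int sum) {
  std::lock_guard<RecordingMutex> guard(mutex);  // calls mutex.lock()
  return sum + 1;                                // mutex.unlock() runs on scope exit
}
```

Because the standard wrappers are templates over the mutex type, nothing in
them is tied to OS-level locking: `std::unique_lock<RecordingMutex>` with
`std::try_to_lock` calls `try_lock()` on the same class in the same way.
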
### Condition Variables

Similarly SimGrid already had simulation-level condition variables
which can be exposed using the same API as `std::condition_variable`:

@code{cpp}
class ConditionVariable {
private:
  friend s_smx_cond;
  smx_cond_t cond_;
  ConditionVariable(smx_cond_t cond) : cond_(cond) {}
public:

  ConditionVariable(ConditionVariable const&) = delete;
  ConditionVariable& operator=(ConditionVariable const&) = delete;

  friend void intrusive_ptr_add_ref(ConditionVariable* cond);
  friend void intrusive_ptr_release(ConditionVariable* cond);
  using Ptr = boost::intrusive_ptr<ConditionVariable>;
  static Ptr createConditionVariable();

  void wait(std::unique_lock<Mutex>& lock);
  template<class P>
  void wait(std::unique_lock<Mutex>& lock, P pred);

  // Wait functions taking a plain double as time:

  std::cv_status wait_until(std::unique_lock<Mutex>& lock,
    double timeout_time);
  std::cv_status wait_for(
    std::unique_lock<Mutex>& lock, double duration);
  template<class P>
  bool wait_until(std::unique_lock<Mutex>& lock,
    double timeout_time, P pred);
  template<class P>
  bool wait_for(std::unique_lock<Mutex>& lock,
    double duration, P pred);

  // Wait functions taking a std::chrono time:

  template<class Rep, class Period, class P>
  bool wait_for(std::unique_lock<Mutex>& lock,
    std::chrono::duration<Rep, Period> duration, P pred);
  template<class Rep, class Period>
  std::cv_status wait_for(std::unique_lock<Mutex>& lock,
    std::chrono::duration<Rep, Period> duration);
  template<class Duration>
  std::cv_status wait_until(std::unique_lock<Mutex>& lock,
    const SimulationTimePoint<Duration>& timeout_time);
  template<class Duration, class P>
  bool wait_until(std::unique_lock<Mutex>& lock,
    const SimulationTimePoint<Duration>& timeout_time, P pred);

  // Notify:

  void notify_one();
  void notify_all();

};
@endcode

We currently accept both `double` (for simplicity and consistency with
the current codebase) and `std::chrono` types (for compatibility with
C++ code) as durations and timepoints. One important thing to notice here is
that `cond.wait_for()` and `cond.wait_until()` work in the simulated time,
not in the real time.

The simple `cond.wait()` and `cond.wait_for()` delegate to
pre-existing simcalls:

@code{cpp}
void ConditionVariable::wait(std::unique_lock<Mutex>& lock)
{
  simcall_cond_wait(cond_, lock.mutex()->mutex_);
}

std::cv_status ConditionVariable::wait_for(
  std::unique_lock<Mutex>& lock, double timeout)
{
  // The simcall uses -1 for "any timeout" but we don't want this:
  if (timeout < 0)
    timeout = 0.0;

  try {
    simcall_cond_wait_timeout(cond_, lock.mutex()->mutex_, timeout);
    return std::cv_status::no_timeout;
  }
  catch (xbt_ex& e) {

    // If the exception was a timeout, we have to take the lock again:
    if (e.category == timeout_error) {
      try {
        lock.mutex()->lock();
        return std::cv_status::timeout;
      }
      catch (...) {
        std::terminate();
      }
    }

    std::terminate();
  }
  catch (...) {
    std::terminate();
  }
}
@endcode

Other methods are simple wrappers around those two:

@code{cpp}
template<class P>
void ConditionVariable::wait(std::unique_lock<Mutex>& lock, P pred)
{
  while (!pred())
    wait(lock);
}

template<class P>
bool ConditionVariable::wait_until(std::unique_lock<Mutex>& lock,
  double timeout_time, P pred)
{
  while (!pred())
    if (this->wait_until(lock, timeout_time) == std::cv_status::timeout)
      return pred();
  return true;
}

template<class P>
bool ConditionVariable::wait_for(std::unique_lock<Mutex>& lock,
  double duration, P pred)
{
  return this->wait_until(lock,
    SIMIX_get_clock() + duration, std::move(pred));
}
@endcode

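The way the predicate overloads are layered can be tried out on a toy model
with a scripted simulated clock. Everything here is a hypothetical
illustration, not SimGrid code: `ToyCondition` has no mutex at all and
receives the dates of its future notifications in its constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy: a "condition variable" driven by a scripted list of
// notification dates, with its own simulated clock.
class ToyCondition {
public:
  explicit ToyCondition(std::vector<double> wakeups) : wakeups_(std::move(wakeups)) {}

  double clock() const { return clock_; }

  // Returns true when notified before the deadline, false on timeout.
  bool wait_until(double deadline) {
    if (next_ < wakeups_.size() && wakeups_[next_] <= deadline) {
      clock_ = wakeups_[next_++];  // woken up by a notification
      return true;
    }
    clock_ = deadline;             // nothing happened before the deadline
    return false;
  }

  // Same loop shape as the predicate overload shown above.
  template <class P> bool wait_until(double deadline, P pred) {
    while (!pred())
      if (!wait_until(deadline))
        return pred();
    return true;
  }

  // The absolute deadline is computed once, from the current simulated date.
  template <class P> bool wait_for(double duration, P pred) {
    return wait_until(clock_ + duration, std::move(pred));
  }

private:
  std::vector<double> wakeups_;
  std::size_t next_ = 0;
  double clock_     = 0.0;
};
```

Since `wait_for()` converts the duration into an absolute date before
looping, wake-ups that do not satisfy the predicate do not extend the
timeout in this sketch.
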

## Conclusion

We wrote two future implementations based on the `std::future` API:

* the first one is a non-blocking event-based (`future.then(stuff)`)
  future used inside our (non-blocking event-based) simulation kernel;

* the second one is a wait-based (`future.get()`) future used in the actors
  which waits using a simcall.

These futures are used to implement `kernelSync()` and `kernelAsync()` which
expose asynchronous operations in the simulation kernel to the actors.

In addition, we wrote variations of some other C++ standard library
classes (`SimulationClock`, `Mutex`, `ConditionVariable`) which work in
the simulation:

  * using simulated time;

  * using simcalls for synchronisation.

Reusing the same API as the C++ standard library is very useful because:

  * we use a proven API with clearly defined semantics;

  * people already familiar with those APIs can use ours easily;

  * users can rely on documentation, examples and tutorials made by other
    people;

  * we can reuse generic code with our types (`std::unique_lock`,
   `std::lock_guard`, etc.).

This type of approach might be useful for other libraries which define
their own contexts. An example of this is
[Mordor](https://github.com/mozy/mordor), an I/O library using fibers
(cooperative scheduling): it implements cooperative/fiber
[mutex](https://github.com/mozy/mordor/blob/4803b6343aee531bfc3588ffc26a0d0fdf14b274/mordor/fibersynchronization.h#L70),
[recursive
mutex](https://github.com/mozy/mordor/blob/4803b6343aee531bfc3588ffc26a0d0fdf14b274/mordor/fibersynchronization.h#L105)
which are compatible with the
[`BasicLockable`](http://en.cppreference.com/w/cpp/concept/BasicLockable)
requirements (see
[`[thread.req.lockable.basic]`](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf#page=1175)
in the C++14 standard).

 881 ## Appendix: useful helpers
 882
 883 ### `Result`
 884
 885 `Result` is like a mix of `std::future` and `std::promise` in a
 886 single object, without shared state or synchronisation:
 887
 888 @code{cpp}
 889 template<class T>
 890 class Result {
 891   enum class ResultStatus {
 892     invalid,
 893     value,
 894     exception,
 895   };
 896 public:
 897   Result();
 898   ~Result();
 899   Result(Result const& that);
 900   Result& operator=(Result const& that);
 901   Result(Result&& that);
 902   Result& operator=(Result&& that);
 903   bool is_valid() const;
 904   void reset();
 905   void set_exception(std::exception_ptr e);
 906   void set_value(T&& value);
 907   void set_value(T const& value);
 908   T get();
 909 private:
 910   ResultStatus status_ = ResultStatus::invalid;
 911   union {
 912     T value_;
 913     std::exception_ptr exception_;
 914   };
 915 };
 916 @endcode
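
To make the intent of this interface more concrete, here is a reduced sketch of how such a class might manage its union by hand. `ToyResult` is a hypothetical, simplified variant (no copy or move support, a single by-value `set_value()`), and the choice of having `get()` consume the stored value is an assumption of this sketch rather than a statement about the real class.

```cpp
#include <exception>
#include <new>
#include <stdexcept>
#include <utility>

template<class T>
class ToyResult {
  enum class Status { invalid, value, exception };
  Status status_ = Status::invalid;
  union {
    T value_;
    std::exception_ptr exception_;
  };
public:
  ToyResult() {}          // the union starts with no active member
  ~ToyResult() { reset(); }
  ToyResult(ToyResult const&) = delete;
  ToyResult& operator=(ToyResult const&) = delete;

  bool is_valid() const { return status_ != Status::invalid; }

  // Destroy whichever union member is currently active.
  void reset()
  {
    if (status_ == Status::value)
      value_.~T();
    else if (status_ == Status::exception)
      exception_.~exception_ptr();
    status_ = Status::invalid;
  }
  void set_value(T value)
  {
    reset();
    new (&value_) T(std::move(value));
    status_ = Status::value;
  }
  void set_exception(std::exception_ptr e)
  {
    reset();
    new (&exception_) std::exception_ptr(std::move(e));
    status_ = Status::exception;
  }
  // Return the value or rethrow the exception; the result is consumed.
  T get()
  {
    if (status_ == Status::value) {
      T result = std::move(value_);
      reset();
      return result;
    }
    if (status_ == Status::exception) {
      std::exception_ptr e = std::move(exception_);
      reset();
      std::rethrow_exception(e);
    }
    throw std::logic_error("ToyResult: nothing stored");
  }
};
```

No shared state is allocated here: the value or exception lives directly inside the object, which is why the status tag and manual destruction are needed.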
 917
 918 ### Promise helpers
 919
 920 These helpers are useful for dealing with generic future-based code:
 921
 922 @code{cpp}
 923 template<class P, class F>
 924 auto fulfillPromise(P& promise, F&& code)
 925 -> decltype(promise.set_value(code()))
 926 {
 927   try {
 928     promise.set_value(std::forward<F>(code)());
 929   }
 930   catch(...) {
 931     promise.set_exception(std::current_exception());
 932   }
 933 }
 934
 935 template<class P, class F>
 936 auto fulfillPromise(P& promise, F&& code)
 937 -> decltype(promise.set_value())
 938 {
 939   try {
 940     std::forward<F>(code)();
 941     promise.set_value();
 942   }
 943   catch(...) {
 944     promise.set_exception(std::current_exception());
 945   }
 946 }
 947
 948 template<class P, class F>
 949 void setPromise(P& promise, F&& future)
 950 {
 951   fulfillPromise(promise, [&]{ return std::forward<F>(future).get(); });
 952 }
 953 @endcode
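
The following sketch shows how overload resolution picks the right helper. The helpers are repeated so that the example compiles on its own; `ToyIntPromise`, `ToyVoidPromise` and `ToyReadyFuture` are hypothetical promise-like and future-like types introduced only for this illustration (any type with a matching `set_value()`, `set_exception()` or `get()` should behave similarly).

```cpp
#include <exception>
#include <stdexcept>
#include <utility>

// Selected when promise.set_value(code()) is well-formed (value result).
template<class P, class F>
auto fulfillPromise(P& promise, F&& code)
-> decltype(promise.set_value(code()))
{
  try { promise.set_value(std::forward<F>(code)()); }
  catch(...) { promise.set_exception(std::current_exception()); }
}

// Selected when promise.set_value() takes no argument (void result).
template<class P, class F>
auto fulfillPromise(P& promise, F&& code)
-> decltype(promise.set_value())
{
  try { std::forward<F>(code)(); promise.set_value(); }
  catch(...) { promise.set_exception(std::current_exception()); }
}

template<class P, class F>
void setPromise(P& promise, F&& future)
{
  fulfillPromise(promise, [&]{ return std::forward<F>(future).get(); });
}

// Hypothetical promise-like and future-like types, just enough for a demo.
struct ToyIntPromise {
  bool has_value = false;
  int value      = 0;
  std::exception_ptr exception;
  void set_value(int v) { value = v; has_value = true; }
  void set_exception(std::exception_ptr e) { exception = e; }
};

struct ToyVoidPromise {
  bool done = false;
  std::exception_ptr exception;
  void set_value() { done = true; }
  void set_exception(std::exception_ptr e) { exception = e; }
};

struct ToyReadyFuture {
  int result;
  int get() { return result; }
};
```

With these definitions, `fulfillPromise(p, []{ return 42; })` on a `ToyIntPromise` uses the first overload, while a callable returning `void` on a `ToyVoidPromise` uses the second; an exception thrown by the callable ends up in the promise instead of propagating to the caller.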
 954
 955 ### Task
 956
 957 `Task<R(F...)>` is a type-erased callable object similar to
 958 `std::function<R(F...)>` but works for move-only types. It is similar to
 959 `std::packaged_task<R(F...)>` but does not wrap the result in a `std::future<R>`
 960 (it is not <i>packaged</i>).
 961
 962 |               |`std::function` |`std::packaged_task`|`simgrid::xbt::Task`
 963 |---------------|----------------|--------------------|--------------------------
 964 |Copyable       | Yes            | No                 | No
 965 |Movable        | Yes            | Yes                | Yes
 966 |Call           | `const`        | non-`const`        | non-`const`
 967 |Callable       | multiple times | once               | once
 968 |Sets a promise | No             | Yes                | No
 969
 970 It could be implemented as:
 971
 972 @code{cpp}
 973 template<class T>
 974 class Task {
 975 private:
 976   std::packaged_task<T> task_;
 977 public:
 978
 979   template<class F>
 980   Task(F f) :
 981     task_(std::move(f))
 982   {}
 983
 984   template<class... ArgTypes>
 985   auto operator()(ArgTypes... args)
 986   -> decltype(task_.get_future().get())
 987   {
 988     task_(std::forward<ArgTypes>(args)...);
 989     return task_.get_future().get();
 990   }
 991
 992 };
 993 @endcode
 994
 995 but we don't need a shared state.
 996
 997 This is useful in order to bind move-only type arguments:
 998
 999 @code{cpp}
1000 template<class F, class... Args>
1001 class TaskImpl {
1002 private:
1003   F code_;
1004   std::tuple<Args...> args_;
1005   typedef decltype(simgrid::xbt::apply(
1006     std::move(code_), std::move(args_))) result_type;
1007 public:
1008   TaskImpl(F code, std::tuple<Args...> args) :
1009     code_(std::move(code)),
1010     args_(std::move(args))
1011   {}
1012   result_type operator()()
1013   {
1014     // simgrid::xbt::apply is C++17 std::apply:
1015     return simgrid::xbt::apply(std::move(code_), std::move(args_));
1016   }
1017 };
1018
1019 template<class F, class... Args>
1020 auto makeTask(F code, Args... args)
1021 -> Task< decltype(code(std::move(args)...))() >
1022 {
1023   TaskImpl<F, Args...> task(
1024     std::move(code), std::make_tuple(std::move(args)...));
1025   return std::move(task);
1026 }
1027 @endcode
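
For comparison with the `std::packaged_task`-based version, here is a minimal sketch of a move-only, call-once, type-erased callable that does not allocate any shared state. `ToyTask` and `ToyAddToOwned` are hypothetical names, and this is only one possible design (a `std::unique_ptr` to a virtual base); it is not meant to describe how `simgrid::xbt::Task` is actually implemented.

```cpp
#include <memory>
#include <utility>

template<class Signature> class ToyTask;

template<class R, class... Args>
class ToyTask<R(Args...)> {
  // Type erasure: the concrete callable hides behind a virtual interface.
  struct Base {
    virtual ~Base() = default;
    virtual R call(Args... args) = 0;
  };
  template<class F>
  struct Impl final : Base {
    F code_;
    explicit Impl(F code) : code_(std::move(code)) {}
    R call(Args... args) override
    {
      return std::move(code_)(std::forward<Args>(args)...);
    }
  };
  std::unique_ptr<Base> impl_;
public:
  ToyTask() = default;
  template<class F>
  ToyTask(F code) : impl_(new Impl<F>(std::move(code))) {}
  ToyTask(ToyTask&&) = default;            // movable ...
  ToyTask& operator=(ToyTask&&) = default; // ... but not copyable

  explicit operator bool() const { return impl_ != nullptr; }

  // Precondition: the task is non-empty. The task is emptied by the call.
  R operator()(Args... args)
  {
    std::unique_ptr<Base> impl = std::move(impl_);
    return impl->call(std::forward<Args>(args)...);
  }
};

// A move-only callable: it could not be stored in a std::function.
struct ToyAddToOwned {
  std::unique_ptr<int> base;
  int operator()(int x) { return *base + x; }
};
```

A `ToyTask<int(int)>` built from `ToyAddToOwned{std::unique_ptr<int>(new int(41))}` can be moved around and then called once, which matches the "Movable: Yes, Copyable: No, Callable: once" column of the table above.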
1028
1029
1030 ## Notes
1031
1032 [^getcompared]:
1033
1034     You might want to compare this method with `simgrid::kernel::Future::get()`
1035     we showed previously: the method of the kernel future does not block and
1036     raises an error if the future is not ready; the method of the actor future
1037     blocks after having set a continuation to wake the actor when the future
1038     is ready.
1039
1040 [^lock]:
1041
1042     `std::lock()` might work too, but it may not be such a good idea to
1043     use it as it may use a [<q>deadlock avoidance algorithm such as
1044     try-and-back-off</q>](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf#page=1199).
1045     A backoff would probably uselessly wait in real time instead of simulated
1046     time. The deadlock avoidance algorithm might also add non-determinism
1047     to the simulation, which we would like to avoid.
1048     `std::try_lock()` should be safe to use though.
1049
1050 */