Martin Quinson [Mon, 3 Sep 2018 07:20:56 +0000 (09:20 +0200)]
Somehow fix the killing of actors in Java
Things are somehow fixed, as all tests seem to pass, but the situation
is still very messy after this commit. Contents:
- Reimplement ContextJava as a subclass of ContextThread to reduce duplication.
- Don't send the StopRequest exception on host failure if we are in
Java because *some* of the actors don't catch it well, resulting in
simulation failure.
- Forcefully kill the process ("exit(0)" in C) after MSG_run() because
dead actors are sometimes not completely killed, preventing the
simulation from ending.
See the comment in ActorImpl for a better understanding of this mess
and how to fix it in the future.
Martin Quinson [Sun, 2 Sep 2018 00:09:27 +0000 (02:09 +0200)]
don't catch an exception that is never thrown
xbt_os_thread_create() asserts that it succeeds; it does not throw
anything. So put the explanation in the documentation instead of
displaying it when that non-existent exception is received.
Martin Quinson [Wed, 29 Aug 2018 20:04:11 +0000 (22:04 +0200)]
simplify the actor finalization a tiny bit by using a callback
This is part of the removal of all trace-related pimpl from the MSG
code (my goal is to kill MSG_process_cleanup_from_SIMIX()
altogether).
Note that I changed from Container::by_name() to
Container::by_name_or_null(). It seems that not all actors have a
container by their name, not sure why.
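The difference between the two lookups can be sketched as follows. This is only an illustration: the Registry class, its members and on_actor_exit() are hypothetical names, and the sketch assumes that by_name() fails loudly on a missing name while by_name_or_null() returns a null pointer. It is not the actual instr code.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

namespace sketch {
// Hypothetical stand-in for a tracing container.
struct Container {
  std::string name;
  bool removed;
};

// Hypothetical registry offering both lookup flavors.
class Registry {
  std::map<std::string, Container> containers_;

public:
  void add(const std::string& name) { containers_.emplace(name, Container{name, false}); }

  // Tolerant lookup: a missing name is reported as nullptr.
  Container* by_name_or_null(const std::string& name)
  {
    auto it = containers_.find(name);
    return it == containers_.end() ? nullptr : &it->second;
  }

  // Strict lookup: a missing name is an error (assumed behavior).
  Container& by_name(const std::string& name)
  {
    Container* c = by_name_or_null(name);
    if (c == nullptr)
      throw std::out_of_range("no container named " + name);
    return *c;
  }
};

// Finalization callback: clean up the container only if the actor has one.
bool on_actor_exit(Registry& registry, const std::string& actor_name)
{
  if (Container* c = registry.by_name_or_null(actor_name)) {
    c->removed = true;
    return true;
  }
  return false; // actors without a container are silently skipped
}
} // namespace sketch
```

With the tolerant lookup, the callback can run for every actor, including those that never got a container.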
Martin Quinson [Wed, 29 Aug 2018 09:35:10 +0000 (11:35 +0200)]
Display a msg when contexts are killed by uncaught exceptions
Also, when I want to really kill an actor (e.g. when its host is turned
off), I now throw an uncatchable kernel::Context::StopRequest instead of
a catchable simgrid::HostFailureException (which will be used for
remote execs and similar cases).
Maybe there should be a config flag to decide if we want to kill the
simulation when an actor fails. The current setting forces the user to
add try/catch (simgrid::Exception) around their main functions. That's
not a bad thing either, not sure.
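A minimal sketch of what this means for user code. All the types below are simplified stand-ins defined locally, not the SimGrid classes, and the commit does not say how StopRequest is made uncatchable: the sketch simply assumes that it lives outside the Exception hierarchy, so a catch (Exception) in the actor does not intercept it.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

namespace sketch {
// Simplified stand-ins, for illustration only.
struct Exception : std::runtime_error {
  explicit Exception(const std::string& msg) : std::runtime_error(msg) {}
};
struct HostFailureException : Exception {
  explicit HostFailureException(const std::string& msg) : Exception(msg) {}
};
// Assumption: the forced-kill signal is not part of the Exception hierarchy.
struct StopRequest {
};

// What the user is expected to write around the actor's main function.
std::string guarded_actor(const std::function<void()>& work)
{
  try {
    work();
    return "done";
  } catch (const Exception& e) {
    return std::string("recovered: ") + e.what(); // catchable failure
  }
}

// Hypothetical runtime layer: only this level sees the StopRequest.
std::string run_actor(const std::function<void()>& work)
{
  try {
    return guarded_actor(work);
  } catch (const StopRequest&) {
    return "killed";
  }
}
} // namespace sketch
```

For instance, run_actor([] { throw sketch::StopRequest(); }) goes through the user's try/catch untouched and is only handled by the runtime layer.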
Martin Quinson [Wed, 29 Aug 2018 00:10:12 +0000 (02:10 +0200)]
Let's exhaustively test the activity lifecycle
This test is not complete yet. It aims at being as exhaustive and
paranoid as possible, just like cloud-sharing, even if I didn't find a
good DSL to specify the tests this time.
Augustin Degomme [Tue, 28 Aug 2018 15:39:33 +0000 (17:39 +0200)]
Switch to ompi for umpire tests.
MPICH changes brought in SMP-aware algorithms, which MC does not really like.
I guess init_smp is the culprit here, as it uses various collectives badly.
Martin Quinson [Sat, 25 Aug 2018 22:36:16 +0000 (00:36 +0200)]
Do not convert TimeoutError to xbt_ex(timeout) when handling a wait_any
If there is an issue while dealing with a test_any or a wait_any, the
caller must be told which activity failed. I'm not sure of how to
cleanly do so. For now, we use exception.value to store the rank of
that activity in the container.
To modify the exception, C++ leaves us no choice but to rethrow it,
catch it again, change its value field, and store it back in
issuer->exception. But then, the stored exception takes the type named
in the catch clause. Wicked! Vicious! It means that since we were
catching (xbt_ex& e), we actually converted the
simgrid::TimeoutException into a xbt_ex. And this conversion was done
in every case, even though the value was set only if the simcall was
actually a wait_any or test_any...
With this commit, we first catch, extend and rethrow any
TimeoutException; if the exception is not one, we do the same for a
plain xbt_ex.
A proper version could involve a WaitAnyException (with failing_rank
and cause fields), or maybe the TimeoutException could contain a
pointer to the timed-out activity so that the caller can find its rank
by itself.
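The type loss described in this message can be reproduced in isolation. The sketch below uses hypothetical stand-ins (BaseError for xbt_ex, TimeoutError for the timeout exception) and std::make_exception_ptr() to re-store the exception; the actual SIMIX code differs, so take this only as an illustration of the mechanism.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

namespace sketch {
// Hypothetical stand-ins for xbt_ex and the timeout exception.
struct BaseError : std::runtime_error {
  explicit BaseError(const std::string& msg) : std::runtime_error(msg) {}
  int value = -1; // rank of the failing activity
};
struct TimeoutError : BaseError {
  explicit TimeoutError(const std::string& msg) : BaseError(msg) {}
};

// Lossy variant: the copy is made through the static type of `e`,
// so the stored exception is a plain BaseError.
std::exception_ptr annotate_sliced(std::exception_ptr stored, int rank)
{
  try {
    std::rethrow_exception(stored);
  } catch (BaseError& e) {
    e.value = rank;
    return std::make_exception_ptr(e); // slices a TimeoutError
  }
}

// Type-preserving variant: catch the most derived type first.
std::exception_ptr annotate_preserving(std::exception_ptr stored, int rank)
{
  try {
    std::rethrow_exception(stored);
  } catch (TimeoutError& e) {
    e.value = rank;
    return std::make_exception_ptr(e);
  } catch (BaseError& e) {
    e.value = rank;
    return std::make_exception_ptr(e);
  }
}

// Helpers to inspect what was stored.
bool is_timeout(std::exception_ptr p)
{
  try {
    std::rethrow_exception(p);
  } catch (const TimeoutError&) {
    return true;
  } catch (...) {
    return false;
  }
}

int rank_of(std::exception_ptr p)
{
  try {
    std::rethrow_exception(p);
  } catch (const BaseError& e) {
    return e.value;
  } catch (...) {
    return -1;
  }
}
} // namespace sketch
```

Both variants record the rank, but only the second one still lets the caller catch a TimeoutError afterwards. Another option in standard C++ is to modify the exception inside the handler and capture it with std::current_exception(), which keeps the dynamic type.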
Martin Quinson [Sat, 25 Aug 2018 10:14:54 +0000 (12:14 +0200)]
Merge the content of xbt::WithContextException into simgrid::Exception
simgrid::Exception was inheriting from xbt::WithContextException anyway.
Plus, move all of the throw-point context into xbt::ThrownPoint.
Earlier, it only contained __FILE__, __LINE__ and __func__. This
commit adds the backtrace, the procname and the pid.
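As a rough sketch of the idea, an exception can carry a small record describing where it was thrown. Every name below (ThrowPoint, SKETCH_THROW_POINT, throw_point()) is hypothetical and does not reproduce the xbt::ThrownPoint interface; the backtrace is left out because capturing it is platform-specific.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {
// Hypothetical record of the place where an exception was thrown.
struct ThrowPoint {
  const char* file;
  int line;
  const char* function;
  std::string procname; // name of the throwing actor
  int pid;              // pid of the throwing actor
};

// Exception carrying its throw point along with the message.
class Exception : public std::runtime_error {
  ThrowPoint where_;

public:
  Exception(ThrowPoint where, const std::string& msg) : std::runtime_error(msg), where_(std::move(where)) {}
  const ThrowPoint& throw_point() const { return where_; }
};
} // namespace sketch

// Must be a macro so that __FILE__, __LINE__ and __func__ refer to the throw site.
#define SKETCH_THROW_POINT(procname, pid) ::sketch::ThrowPoint{__FILE__, __LINE__, __func__, (procname), (pid)}

namespace sketch {
void failing_actor()
{
  throw Exception(SKETCH_THROW_POINT("worker", 42), "something failed");
}

// Run the failing function and return the recorded throw point.
ThrowPoint capture_failure()
{
  try {
    failing_actor();
  } catch (const Exception& e) {
    return e.throw_point();
  }
  return ThrowPoint{"", 0, "", "", -1}; // not reached in this sketch
}
} // namespace sketch
```

Keeping the whole context in one structure means that adding a field only touches that structure and the macro that fills it.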
Martin Quinson [Sat, 25 Aug 2018 09:25:29 +0000 (11:25 +0200)]
Please people, stop including internal_config.h in generic header files
Every file including src/internal_config.h (directly or indirectly)
must be rebuilt when the cmake file or config are changed. I change
these files *a lot* during my refactorings, and I get tired of
recompiling a large number of files that were not affected in any way.
This time, all of SMPI was recompiled each time. Including the **many**
collectives that we integrated a long time ago and never changed since
then. These files are not build-configured in any way, so please don't
make me recompile them again and again.
Martin Quinson [Sat, 25 Aug 2018 08:22:28 +0000 (10:22 +0200)]
Rename simgrid::exception into simgrid::Exception
Also move simgrid/exception.hpp to simgrid/Exception.hpp (our coding
standards say that a file defining a class must have its name
capitalized like the class).