AND Public Git Repository - simgrid.git/log

]> AND Public Git Repository - simgrid.git/log

Algorithmique Numérique Distribuée

Public GIT Repository

projects / simgrid.git / log

commit | commitdiff | tree

Loic Guegan [Tue, 27 Aug 2019 08:08:47 +0000 (10:08 +0200)]

WIFI: Now implemented

commit | commitdiff | tree

Martin Quinson [Sun, 25 Aug 2019 20:44:39 +0000 (22:44 +0200)]

fix pip packaging

commit | commitdiff | tree

Martin Quinson [Sun, 25 Aug 2019 19:41:40 +0000 (21:41 +0200)]

tesh kill timeouted processes with KILL also

It seems that the sanitizers processes are surviving their timeouts on
the server, so this may help fixing the build robots.

On the way, I switched to save the PID of the forked process instead
of its PGID. This helps when I try to see if the process is still
alive, later on: I can always do PID->PGID later (as I do now) while
it seems impossible to do PGID->PID to see whether the process is
still alive.

Fun fact: I think that PID=PGID on POSIX systems, but let's play safe :)

commit | commitdiff | tree

Martin Quinson [Mon, 19 Aug 2019 14:50:52 +0000 (16:50 +0200)]

DPOR: improve debug messages

commit | commitdiff | tree

Augustin Degomme [Thu, 22 Aug 2019 23:44:10 +0000 (01:44 +0200)]

add test proposed in #39
For now in teshsuite, might be moved to examples if someone asks

commit | commitdiff | tree

Augustin Degomme [Thu, 22 Aug 2019 13:40:40 +0000 (15:40 +0200)]

should work better with the hostfile in the dist.

commit | commitdiff | tree

Augustin Degomme [Thu, 22 Aug 2019 12:49:48 +0000 (14:49 +0200)]

One rma test actually needs exactly 2 processes per node.
Let's indulge it and add a hostfile for mpich tests, with various ppn setups.

commit | commitdiff | tree

Augustin Degomme [Thu, 22 Aug 2019 08:46:20 +0000 (10:46 +0200)]

typos

commit | commitdiff | tree

Augustin Degomme [Mon, 19 Aug 2019 09:49:56 +0000 (11:49 +0200)]

attach errhandlers to some forgotten calls

commit | commitdiff | tree

Augustin Degomme [Mon, 19 Aug 2019 09:49:40 +0000 (11:49 +0200)]

Errors occurring during calls to routines that create MPI windows (e.g., MPI_WIN_CREATE (...,comm,...)) cause the error handler currently associated with comm to be invoked.

commit | commitdiff | tree

Augustin Degomme [Tue, 20 Aug 2019 19:26:20 +0000 (21:26 +0200)]

Merge branch 'trace_smpi_execute_flops' into 'master'

execute_flops now logs compute

See merge request simgrid/simgrid!15

commit | commitdiff | tree

Faure Adrien [Tue, 20 Aug 2019 19:21:32 +0000 (21:21 +0200)]

fix bad private function location
and fix bad reference to spmi_execute_benched

commit | commitdiff | tree

Faure Adrien [Tue, 20 Aug 2019 18:08:28 +0000 (20:08 +0200)]

execute_flops now logs compute
I created a internal function that execute flops without tracing for
specific needs

commit | commitdiff | tree

Augustin Degomme [Mon, 19 Aug 2019 09:33:25 +0000 (11:33 +0200)]

Allgatherv : don't output MPI_ERR_BUFFER if recvbuf is null if we don't receive any data.
FIXME : other collectives can have the same constraint relaxed

commit | commitdiff | tree

Augustin Degomme [Mon, 19 Aug 2019 08:50:38 +0000 (10:50 +0200)]

For File, we can change the default error handler by specifying MPI_FILE_NULL as the fh argument

commit | commitdiff | tree

Augustin Degomme [Mon, 19 Aug 2019 08:09:22 +0000 (10:09 +0200)]

Unlike errors on communicators and windows, the default behavior for files is to have MPI_ERRORS_RETURN. ( End of advice to users.)
https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node223.htm#Node223
because why not.

commit | commitdiff | tree

Martin Quinson [Sun, 18 Aug 2019 21:42:30 +0000 (23:42 +0200)]

These tests really need zero buffering

commit | commitdiff | tree

Martin Quinson [Sun, 18 Aug 2019 21:29:10 +0000 (23:29 +0200)]

switch to infty buffering for now as some tests are broken in zero-buffering mode

commit | commitdiff | tree

Augustin Degomme [Sun, 18 Aug 2019 21:37:52 +0000 (23:37 +0200)]

update changelog

commit | commitdiff | tree

Augustin Degomme [Sun, 18 Aug 2019 21:36:50 +0000 (23:36 +0200)]

This particular RMA test is filled with stupid calls... We send errors for most of them.
So let's put that to rest by setting MPI_ERRORS_RETURN for this one.

commit | commitdiff | tree

Augustin Degomme [Sun, 18 Aug 2019 21:35:24 +0000 (23:35 +0200)]

fix request and coll failing tests

commit | commitdiff | tree

Augustin Degomme [Sun, 18 Aug 2019 21:34:56 +0000 (23:34 +0200)]

topo was a bit too eager to return errors.

commit | commitdiff | tree

Augustin Degomme [Sun, 18 Aug 2019 21:33:50 +0000 (23:33 +0200)]

Don't return an error when a key is not found in an MPI_Info.

commit | commitdiff | tree

Augustin Degomme [Sun, 18 Aug 2019 21:32:48 +0000 (23:32 +0200)]

our coll-* tests include bad calls for coverage, they need MPI_ERRORS_RETURN

commit | commitdiff | tree

Augustin Degomme [Sun, 18 Aug 2019 21:31:58 +0000 (23:31 +0200)]

set default error handler to MPI_ERRORS_ARE_FATAL, as in a proper MPI implementation

commit | commitdiff | tree

Augustin Degomme [Sun, 18 Aug 2019 19:02:50 +0000 (21:02 +0200)]

disalign tags for collectives from their nonblocking counterparts, to correctly deadlock when both are entangled
as some blocking ones used internally are actually implement with nonblocking+wait, use the right blocking tag in this case.

commit | commitdiff | tree

Augustin Degomme [Sun, 18 Aug 2019 18:40:32 +0000 (20:40 +0200)]

add tests for errhandlers
Most of them actually need mpi_add_error features, which are not really implemented yet

commit | commitdiff | tree

Augustin Degomme [Sun, 18 Aug 2019 18:39:15 +0000 (20:39 +0200)]

Add support for MPI Errhandlers in Comm, File, Win.
Default remains MPI_ERRORS_RETURN for now, pending test fixes

commit | commitdiff | tree

Martin Quinson [Sun, 18 Aug 2019 15:01:40 +0000 (17:01 +0200)]

Apply the default settings of 'smpi/buffering' too

Previously, we did obey to that option when given, but the default
value was ignored. This is because the handling was done in the value
verification callback, that is not used for the default value.

commit | commitdiff | tree

Martin Quinson [Sun, 18 Aug 2019 14:21:56 +0000 (16:21 +0200)]

more informative error message when checking params of MPI_Iallgather

commit | commitdiff | tree

Martin Quinson [Sun, 18 Aug 2019 14:21:41 +0000 (16:21 +0200)]

appveyor: don't build python as it was not tested anyway, and don't build anymore

commit | commitdiff | tree

Martin Quinson [Sun, 18 Aug 2019 10:34:26 +0000 (12:34 +0200)]

MPI_Scatterv: sendcounts and displs params can be NULL on non-root ranks

commit | commitdiff | tree

Martin Quinson [Sun, 18 Aug 2019 10:21:35 +0000 (12:21 +0200)]

more informative error messages on parameter checks of MPI_Iscatterv

This should be done for each and every collective, but not today...

commit | commitdiff | tree

Martin Quinson [Sun, 18 Aug 2019 08:12:39 +0000 (10:12 +0200)]

Fix tesh for the new mc-sendsend test

- Test for all factories but thread (that is borken with MC)
- Don't run this test when Java, no matter what. Not sure how I came
to such a stupid idea :)

commit | commitdiff | tree

Martin Quinson [Sun, 18 Aug 2019 00:34:08 +0000 (02:34 +0200)]

fix make dist and python dist

commit | commitdiff | tree

Martin Quinson [Sat, 4 Nov 2017 17:51:02 +0000 (18:51 +0100)]

add a test of MC detecting blocking send/send patterns

commit | commitdiff | tree

Martin Quinson [Sat, 17 Aug 2019 23:33:14 +0000 (01:33 +0200)]

New MC option: smpi/buffering, to control MPI buffering

commit | commitdiff | tree

Martin Quinson [Sat, 17 Aug 2019 23:19:27 +0000 (01:19 +0200)]

smpirun: add a -quiet argument, allowing failing tests in tesh

commit | commitdiff | tree

Martin Quinson [Sat, 17 Aug 2019 22:42:48 +0000 (00:42 +0200)]

more informative message when setting inexistant config items

commit | commitdiff | tree

Martin Quinson [Sat, 17 Aug 2019 22:10:43 +0000 (00:10 +0200)]

docs: fix borken links

commit | commitdiff | tree

Martin Quinson [Sat, 17 Aug 2019 21:07:21 +0000 (23:07 +0200)]

PMPI_Cart_create: check that each dim is positive

commit | commitdiff | tree

Martin Quinson [Sat, 17 Aug 2019 20:24:44 +0000 (22:24 +0200)]

Have SMPI fail on MPI_ERR_* in MC mode

The standard says that upon error, implementations should call the
current MPI error handler, which is MPI_ERRORS_ARE_FATAL by default
but could be changed to MPI_ERRORS_RETURN on need.

Since we don't implement MPI_Comm_set_errhandler() to switch between
modes, the simulation mode of SMPI is only issuing a warning on
errors (which is similar to MPI_ERRORS_RETURN).

This commit adds a MC_assert() stating that every MPI call succeed.
This will lead to a property failure (visible only in MC mode) when a
MPI_ERR_* is issued by the implementation (which is somehow similar to
MPI_ERRORS_ARE_FATAL).

commit | commitdiff | tree

Martin Quinson [Sat, 17 Aug 2019 15:53:57 +0000 (17:53 +0200)]

fix non-MC builds

commit | commitdiff | tree

Martin Quinson [Sat, 17 Aug 2019 14:57:04 +0000 (16:57 +0200)]

use assignment to non-trivial class rather than artificial trivialization and memset

plus some other small cleanups

commit | commitdiff | tree

Martin Quinson [Thu, 15 Aug 2019 16:28:24 +0000 (18:28 +0200)]

Restore triviality of s_smx_simcall to please GCC

Field initialization make this struct non-trivial, making GCC to panic
when we memset it. Even if we initialize some fields to nullptr and
memset it to 0.

commit | commitdiff | tree

Martin Quinson [Thu, 15 Aug 2019 14:02:22 +0000 (16:02 +0200)]

fix java build

commit | commitdiff | tree

Martin Quinson [Thu, 15 Aug 2019 13:37:24 +0000 (15:37 +0200)]

small comments improvements around a complex code

commit | commitdiff | tree

Martin Quinson [Wed, 14 Aug 2019 00:03:56 +0000 (02:03 +0200)]

now, kernel::actor::simcall_blocking can return a value

commit | commitdiff | tree

Martin Quinson [Tue, 13 Aug 2019 23:50:00 +0000 (01:50 +0200)]

Let simcall.py produce valid code after recent renamings

commit | commitdiff | tree

Martin Quinson [Tue, 13 Aug 2019 23:26:51 +0000 (01:26 +0200)]

code simplification + replace a FIXME with an assert

commit | commitdiff | tree

Martin Quinson [Tue, 13 Aug 2019 22:49:31 +0000 (00:49 +0200)]

small simplifications around simcalls

- Use ActivityImpl::register_simcall() where possible
- Uniformity in ActivityImpl::post() methods
- Rename processes into actors

commit | commitdiff | tree

Martin Quinson [Tue, 13 Aug 2019 21:42:10 +0000 (23:42 +0200)]

small logic simplification

That function is a bit long, but that's not a reason to test twice
whether we should kill the issuer because it runs on a dead host.

commit | commitdiff | tree

Martin Quinson [Tue, 13 Aug 2019 20:54:53 +0000 (22:54 +0200)]

Introduce a class mc::SimcallInspector, that allows MC to learn about the ongoing simcalls

Not quite used yet.

commit | commitdiff | tree

Martin Quinson [Tue, 13 Aug 2019 10:02:53 +0000 (12:02 +0200)]

MC: rename processes to actors

commit | commitdiff | tree

Martin Quinson [Tue, 13 Aug 2019 09:47:52 +0000 (11:47 +0200)]

obey our coding standards, and snake_case some parts of MC

commit | commitdiff | tree

Martin Quinson [Mon, 12 Aug 2019 10:21:51 +0000 (12:21 +0200)]

start to make generic simcalls observable from the MC

WIP, done in the easy parts for now.

Not finished in MC_state_get_request_for_process() yet. That's too bad
because this function is the core of the use of simcalls by the MC,
where the next simcall is picked.

Not done in the independence computing part either.

commit | commitdiff | tree

Martin Quinson [Mon, 12 Aug 2019 10:04:18 +0000 (12:04 +0200)]

transparent cleanups around simcalls mechanism

commit | commitdiff | tree

Martin Quinson [Mon, 12 Aug 2019 09:37:10 +0000 (11:37 +0200)]

move the simcall template to the kernel::actor namespace

commit | commitdiff | tree

Frederic Suter [Tue, 13 Aug 2019 08:07:45 +0000 (10:07 +0200)]

protect accesses to cnst->cnst_light_

commit | commitdiff | tree

Martin Quinson [Sat, 10 Aug 2019 22:51:16 +0000 (00:51 +0200)]

Port simcall_process_suspend to the modernity

Also simplify its logic by moving the s4u parts within the kernel

commit | commitdiff | tree

Martin Quinson [Sat, 10 Aug 2019 20:41:56 +0000 (22:41 +0200)]

getters don't have to be simcalls

commit | commitdiff | tree

Martin Quinson [Sat, 10 Aug 2019 13:41:08 +0000 (15:41 +0200)]

convert simcall_process_sleep to modernity

commit | commitdiff | tree

Martin Quinson [Sat, 10 Aug 2019 08:35:55 +0000 (10:35 +0200)]

reduce the use of simcall_process_sleep() -> this_actor::sleep_for()

commit | commitdiff | tree

Martin Quinson [Fri, 9 Aug 2019 23:12:06 +0000 (01:12 +0200)]

port a blocking simcall to the modernity

commit | commitdiff | tree

Martin Quinson [Fri, 9 Aug 2019 22:31:53 +0000 (00:31 +0200)]

small simplification

commit | commitdiff | tree

Martin Quinson [Fri, 9 Aug 2019 22:24:53 +0000 (00:24 +0200)]

introduce a simcall_blocking(), and improve the comments

commit | commitdiff | tree

Martin Quinson [Fri, 9 Aug 2019 11:38:21 +0000 (13:38 +0200)]

Fix GCC+MC builds

commit | commitdiff | tree

Martin Quinson [Fri, 9 Aug 2019 10:24:48 +0000 (12:24 +0200)]

convert SIMIX_simcall_answer() into ActorImpl::simcall_answer()

commit | commitdiff | tree

Martin Quinson [Thu, 8 Aug 2019 21:20:44 +0000 (23:20 +0200)]

change SIMIX_simcall_handle() into ActorImpl::simcall_handle()

commit | commitdiff | tree

Martin Quinson [Thu, 8 Aug 2019 20:41:36 +0000 (22:41 +0200)]

another assert to make one segfault more explicit

commit | commitdiff | tree

Martin Quinson [Thu, 8 Aug 2019 16:32:13 +0000 (18:32 +0200)]

namespacify a global function of SIMIX

commit | commitdiff | tree

Martin Quinson [Thu, 8 Aug 2019 16:08:43 +0000 (18:08 +0200)]

shy attempt at simplifying the simcall mechanism

commit | commitdiff | tree

Martin Quinson [Fri, 2 Aug 2019 18:12:24 +0000 (20:12 +0200)]

forcefully kill exiting actors even if their host is not off

Without this, this_actor::exit() deadlocks on Thread factory because
the actor does not exit soon enough to release maestro that is joining
its thread.

commit | commitdiff | tree

Martin Quinson [Fri, 2 Aug 2019 18:11:41 +0000 (20:11 +0200)]

improve debug msg

commit | commitdiff | tree

Martin Quinson [Fri, 2 Aug 2019 16:55:49 +0000 (18:55 +0200)]

run Actor::on_destruction even if the actor was killed before starting, but not on maestro

It seems that the actors are sometimes killed after the simix_global
destruction... Maybe when the user code keeps a reference on them.

commit | commitdiff | tree

Martin Quinson [Fri, 2 Aug 2019 15:50:49 +0000 (17:50 +0200)]

New signal: Actor::on_termination (when its code terminates)

commit | commitdiff | tree

Martin Quinson [Fri, 2 Aug 2019 15:49:14 +0000 (17:49 +0200)]

activity-lifecycle: make tests shorter to ensure that they are really done when we test them so

commit | commitdiff | tree

Martin Quinson [Thu, 1 Aug 2019 20:49:34 +0000 (22:49 +0200)]

Actor: make the refcount observable, and improve debug messages

commit | commitdiff | tree

Martin Quinson [Thu, 1 Aug 2019 20:47:23 +0000 (22:47 +0200)]

smpicxx: obey the VERBOSE environment variable and display what we do

commit | commitdiff | tree

Martin Quinson [Thu, 1 Aug 2019 20:44:06 +0000 (22:44 +0200)]

Actor::by_pid: also search through the dead actors

Finally, my trick works with it. MPI ranks are not garbage collected
as soon as they end because the MPI instance keeps a reference on
them. With this, by_pid() works properly, even on dead but not
collected actors.

But this only works until the trash is emptied, obviously. This seem
to be enough for the test I wanted to get running, so I will not fix
that point tonight.

commit | commitdiff | tree

Martin Quinson [Thu, 1 Aug 2019 20:41:59 +0000 (22:41 +0200)]

ActorImpl: postpone the on_destroy signal to the destructor

This is to prevent that the SMPI extension gets destroyed too early.
The problem is also that this extension is not really an extension but
manually handled, but I'll not fix that tonight.

commit | commitdiff | tree

Martin Quinson [Thu, 1 Aug 2019 20:40:14 +0000 (22:40 +0200)]

Restore the -no-privatize to the ampi example

commit | commitdiff | tree

Martin Quinson [Thu, 1 Aug 2019 15:55:38 +0000 (17:55 +0200)]

fix ampi example by not cleaning the tracing submodule before the actor end

commit | commitdiff | tree

Martin Quinson [Thu, 1 Aug 2019 12:57:31 +0000 (14:57 +0200)]

add a dependency of this test to mpi, as required on jenkins

commit | commitdiff | tree

Martin Quinson [Thu, 1 Aug 2019 07:34:53 +0000 (09:34 +0200)]

SMPI: redesign the end of actors/ranks' lifetime

The problem is that we don't use enough of refcounting in SMPI, so we
should not let any rank finish before the others, because it may be
involved in a communication or something.

Previously, there were a barrier at the end of the user code, so that
every ranks finishes exactly at the same time.

Now, the MPI instance keeps a reference on every actor it contains,
and the actor terminates with no delay after its code. The terminating
actors unregister from their MPI instance, but they are still
referenced until the last actor unregisters from the MPI instance.
Once the MPI instance is empty, it unregisters all the actors,
allowing their collection by the refcounting.

This commit changes the ending time of many ranks in many examples, as
expected. The ranks now terminate as soon as they are done, they are
not waiting the others anymore.

It introduces a segfault in ampi that I fail to understand. It seems
that a container is used after being collected in this example, but I
fail to see the reason so far.

commit | commitdiff | tree

Martin Quinson [Thu, 1 Aug 2019 05:11:31 +0000 (07:11 +0200)]

sort out the functions on MPI init/fini

commit | commitdiff | tree

Martin Quinson [Thu, 1 Aug 2019 04:45:53 +0000 (06:45 +0200)]

small simplification of MPI initialization

commit | commitdiff | tree

Martin Quinson [Thu, 1 Aug 2019 03:53:31 +0000 (05:53 +0200)]

MPI: we don't mess with argc/argv anymore nowadays

Previously, the rank and instance were added in argv, mandating a
specific handling of MPI_Init parameters. But now, they are passed as
properties, and the argv is left unmodified. So there is no need to
deal specifically with the MPI_Init parameters.

commit | commitdiff | tree

Martin Quinson [Thu, 1 Aug 2019 03:25:31 +0000 (05:25 +0200)]

smpi_deployment: obey our coding rules

commit | commitdiff | tree

Martin Quinson [Wed, 31 Jul 2019 16:55:08 +0000 (18:55 +0200)]

I'd like to turn process_data into a regular extension of Actor

commit | commitdiff | tree

Martin Quinson [Wed, 31 Jul 2019 16:46:37 +0000 (18:46 +0200)]

SMPI: prefer xbt_assert to 'if () xbt_die'

commit | commitdiff | tree

Martin Quinson [Wed, 31 Jul 2019 16:35:02 +0000 (18:35 +0200)]

MPI init: inline a function and various small cleanups

commit | commitdiff | tree

Martin Quinson [Wed, 31 Jul 2019 16:12:10 +0000 (18:12 +0200)]

smpi::ActorExt: this constructor parameter is not used

commit | commitdiff | tree

Martin Quinson [Wed, 31 Jul 2019 15:58:33 +0000 (17:58 +0200)]

MPI init: inline a function

commit | commitdiff | tree

Martin Quinson [Wed, 31 Jul 2019 09:25:17 +0000 (11:25 +0200)]

SMPI init: stringify and rename a variable

commit | commitdiff | tree

Martin Quinson [Sun, 28 Jul 2019 23:55:25 +0000 (01:55 +0200)]

kill smpi_process_count(), use smpi_get_universe_size() instead

process_count was probably the original name while universe_size was
added to implement MPI_Attr_get(MPI_UNIVERSE_SIZE). Merge both to sort
things out.

While I'm at it, move all of it to smpi_deployment to reduce the
amount of globals made visible to more than one module.

commit | commitdiff | tree

Martin Quinson [Sun, 28 Jul 2019 23:42:33 +0000 (01:42 +0200)]

cmake: build smpi internals before the collectives

My changes break more often the internals than the collectives (that
are rarely modified)

commit | commitdiff | tree

Martin Quinson [Sun, 28 Jul 2019 23:31:06 +0000 (01:31 +0200)]

MC: kill a useless function

commit | commitdiff | tree

Martin Quinson [Sun, 28 Jul 2019 22:38:00 +0000 (00:38 +0200)]

smpi: some useless cleanups while I read this code

commit | commitdiff | tree

Augustin Degomme [Sat, 27 Jul 2019 09:04:36 +0000 (11:04 +0200)]

avoid mixing travis builders

Clone of SimGrid repository with local patches

RSS Atom