Bruno Donassolo [Tue, 15 Jun 2021 14:36:32 +0000 (16:36 +0200)]
Fix some links in graphical toc [doc]
Bruno Donassolo [Tue, 15 Jun 2021 10:52:27 +0000 (12:52 +0200)]
Adding some missing packages [doc]
Packages needed to run ./Build.sh script
Bruno Donassolo [Tue, 15 Jun 2021 10:52:12 +0000 (12:52 +0200)]
Adding C++ platform documentation [doc]
- Point to the examples.
- Rewrite routing part to explain how we calculate the routes between
different zones.
Augustin Degomme [Wed, 16 Jun 2021 22:24:24 +0000 (00:24 +0200)]
Add SMPI_SAMPLE_LOCAL_TAG and SMPI_SAMPLE_GLOBAL_TAG macros for sampling, to provide unique parameters that distinguish separate calls to sampling.
This can be used when a kernel is called with various distinct sets of parameters.
Tag is a string of max size 128.
Augustin Degomme [Wed, 16 Jun 2021 20:26:25 +0000 (22:26 +0200)]
Try to be more coherent, fix misleading message, and fix test.
It was asking 0 iterations max, let's say that this means "exit after the first".
Augustin Degomme [Wed, 16 Jun 2021 19:00:57 +0000 (21:00 +0200)]
fix bug in sampling, it was ignoring max number of iterations
Augustin Degomme [Mon, 14 Jun 2021 12:10:19 +0000 (14:10 +0200)]
set this error added in 9d5b713b1 as a pedantic one, to allow ignoring it.
Rationale: we don't really know if it's actually an error, and read2shmem code triggers it in proxy apps.
Augustin Degomme [Mon, 14 Jun 2021 12:08:23 +0000 (14:08 +0200)]
Add smpi/pedantic flag to avoid reporting controversial errors that may or may not be important.
Augustin Degomme [Mon, 14 Jun 2021 12:07:41 +0000 (14:07 +0200)]
Add option smpi/errors-are-fatal to allow users to bypass MPI errors returned by SMPI.
This basically sets the default errhandler to MPI_ERRORS_RETURN if set to false.
Arnaud Giersch [Sun, 13 Jun 2021 19:48:26 +0000 (21:48 +0200)]
Cleanup in .gitignore files; delete obsolete tesh file. [ci-skip]
Arnaud Giersch [Sun, 13 Jun 2021 13:45:25 +0000 (15:45 +0200)]
[sonar] Replace redundant type with 'auto'.
Arnaud Giersch [Sat, 12 Jun 2021 13:01:42 +0000 (15:01 +0200)]
Remove misleading comment: there is no simcall between the two assignments.
Arnaud Giersch [Sat, 12 Jun 2021 13:00:49 +0000 (15:00 +0200)]
Add more optimizations for MC builds.
Arnaud Giersch [Fri, 11 Jun 2021 11:52:17 +0000 (13:52 +0200)]
Factorize common code to assemble vector<LinkImpl*> and update latency.
Arnaud Giersch [Fri, 11 Jun 2021 09:33:14 +0000 (11:33 +0200)]
Fix loop: iterator is invalid after insertion.
Arnaud Giersch [Fri, 11 Jun 2021 08:20:04 +0000 (10:20 +0200)]
Use 'std::vector' instead of a C-style array.
Arnaud Giersch [Fri, 11 Jun 2021 07:57:38 +0000 (09:57 +0200)]
For sonar.
Arnaud Giersch [Thu, 10 Jun 2021 14:35:37 +0000 (16:35 +0200)]
Cosmetics: snake_case.
Arnaud Giersch [Thu, 10 Jun 2021 14:13:22 +0000 (16:13 +0200)]
Use existing helper function to create zones.
Arnaud Giersch [Thu, 10 Jun 2021 14:09:22 +0000 (16:09 +0200)]
Useless static function.
Arnaud Giersch [Thu, 10 Jun 2021 09:23:06 +0000 (11:23 +0200)]
Reduce depth of nested statements (sonar).
Arnaud Giersch [Thu, 10 Jun 2021 09:21:07 +0000 (11:21 +0200)]
Superfluous blank lines.
Arnaud Giersch [Thu, 10 Jun 2021 08:36:32 +0000 (10:36 +0200)]
Remove unused parameter.
Arnaud Giersch [Thu, 10 Jun 2021 08:23:37 +0000 (10:23 +0200)]
Misc. Sonar smells.
Arnaud Giersch [Thu, 10 Jun 2021 08:19:38 +0000 (10:19 +0200)]
Use empty() to check for container emptiness.
Arnaud Giersch [Thu, 10 Jun 2021 07:25:19 +0000 (09:25 +0200)]
Use xbt logs, not printf.
Update tesh files accordingly.
Arnaud Giersch [Thu, 10 Jun 2021 09:25:20 +0000 (09:25 +0000)]
Merge branch 'host-energy' into 'master'
host energy: no direct W/J access of invalid data
See merge request simgrid/simgrid!65
Millian Poquet [Thu, 10 Jun 2021 08:18:04 +0000 (10:18 +0200)]
host energy: explicit 'return 0.0' when badly init
Millian Poquet [Mon, 7 Jun 2021 15:59:50 +0000 (17:59 +0200)]
host energy: no direct W/J access of invalid data
When energy consumption data is missing about a host,
SimGrid assumes it consumes no energy (0 W).
This change does not modify this behavior.
It prevents direct user access to power/energy consumption of such hosts,
as this is likely a user bug:
- either forgot to parametrize the host energy consumption
- or requested the energy consumption of the wrong host
Arnaud Giersch [Wed, 9 Jun 2021 14:34:34 +0000 (16:34 +0200)]
Fix a FIXME: use a dynamic table.
Arnaud Giersch [Wed, 9 Jun 2021 13:09:26 +0000 (15:09 +0200)]
Don't disable malloc override anymore when using realloc().
Arnaud Giersch [Wed, 9 Jun 2021 13:07:49 +0000 (15:07 +0200)]
Finally implement smpi shared realloc.
Use legacy malloc/free again (through xbt) to be able to call realloc on need.
Arnaud Giersch [Wed, 9 Jun 2021 15:21:48 +0000 (17:21 +0200)]
Add memset(0) for shared calloc.
Add missing barrier to auto-shared test.
Arnaud Giersch [Tue, 8 Jun 2021 14:52:52 +0000 (16:52 +0200)]
Let's be more tolerant with shared realloc() and only issue a warning.
Arnaud Giersch [Tue, 8 Jun 2021 13:53:02 +0000 (15:53 +0200)]
Die loudly if smpi_shared_realloc_intercept is used.
Arnaud Giersch [Tue, 8 Jun 2021 13:20:43 +0000 (15:20 +0200)]
Plug memory leaks with tests mpich3-test/rma/linked_list_*.
Arnaud Giersch [Tue, 8 Jun 2021 12:11:50 +0000 (14:11 +0200)]
Fix memleaks with MPI_*_get_info, when info is duplicated.
Augustin Degomme [Tue, 8 Jun 2021 09:19:19 +0000 (09:19 +0000)]
Merge branch 'factor_in_actions' into 'master'
New implementation for bandwidth factors
See merge request simgrid/simgrid!64
Augustin Degomme [Tue, 8 Jun 2021 07:26:04 +0000 (09:26 +0200)]
reduce number of iterations to speedup rma test
Augustin Degomme [Mon, 7 Jun 2021 19:12:53 +0000 (21:12 +0200)]
protect type_creation routines against null pointers in output types.
Augustin Degomme [Mon, 7 Jun 2021 18:57:48 +0000 (20:57 +0200)]
catch if MPI_Win_fence was only called once (not enough) when MPI_Win_free is called.
Augustin Degomme [Mon, 7 Jun 2021 18:56:32 +0000 (20:56 +0200)]
better check for mpi_datatype_null
Augustin Degomme [Mon, 7 Jun 2021 15:12:45 +0000 (17:12 +0200)]
check that we are not using RMA-reserved MPI_Op in non-RMA calls.
Augustin Degomme [Mon, 7 Jun 2021 14:32:21 +0000 (16:32 +0200)]
get_accumulate: if MPI_NO_OP is specified, origin* inputs are irrelevant
+ activate test.
Bruno Donassolo [Mon, 7 Jun 2021 15:15:16 +0000 (17:15 +0200)]
Try to avoid another ifdef WIN32.
Use unique_ptr to manage handle. Thanks @agiersch
Bruno Donassolo [Mon, 7 Jun 2021 14:03:21 +0000 (16:03 +0200)]
As always, forgot windows build
Bruno Donassolo [Mon, 7 Jun 2021 10:49:10 +0000 (12:49 +0200)]
Move load_platf to EngineImpl
Keep the platf lib open until the end of the simulation, in case the user
is using some network callback defined in it.
Bruno Donassolo [Mon, 7 Jun 2021 08:08:30 +0000 (10:08 +0200)]
Please sonar
Arnaud Giersch [Sun, 6 Jun 2021 20:40:42 +0000 (22:40 +0200)]
Destroy dead actors after mc::replay() is completed (fix memory leak).
Arnaud Giersch [Sun, 6 Jun 2021 10:03:26 +0000 (12:03 +0200)]
Correctly remember buffer between persistent communications.
Fixes lots of Petsc tests, especially vec/is/sf/tests/ex14.c.
The buffer was lost after the first communication, and no more data could
be transferred effectively.
Arnaud Giersch [Fri, 4 Jun 2021 19:47:14 +0000 (21:47 +0200)]
Call cleanup_attr<Comm> before marking Comm as deleted.
The MPI_Comm may be used by the attr cleanup callbacks.
Arnaud Giersch [Fri, 4 Jun 2021 19:34:02 +0000 (21:34 +0200)]
Remove a global variable, and use a static to remember if smpi_main is running.
The value of 'running_with_smpi_main' is effectively used later, when the
simgrid::config callback is executed.
Arnaud Giersch [Fri, 4 Jun 2021 15:45:07 +0000 (17:45 +0200)]
Restore public smpi_init_options().
It was wrongly removed in commit 6a046487fb ("Make smpi_switch_data_segment check if a switch is needed, and return true when it occurs.").
Bruno Donassolo [Fri, 4 Jun 2021 14:37:13 +0000 (16:37 +0200)]
Fix test on MAC OS
Bruno Donassolo [Fri, 4 Jun 2021 12:17:04 +0000 (14:17 +0200)]
Try to fix test on CI
Bruno Donassolo [Thu, 3 Jun 2021 13:59:50 +0000 (15:59 +0200)]
An example with SMPI and CPP platform
Shows how to use smpirun to execute an application with the platform described in C++.
Bruno Donassolo [Thu, 3 Jun 2021 13:59:27 +0000 (15:59 +0200)]
Minor fix usage
Bruno Donassolo [Thu, 3 Jun 2021 12:40:31 +0000 (14:40 +0200)]
Adjust test
Empty hostfiles are now checked inside the C++ code, not in smpirun
Bruno Donassolo [Thu, 3 Jun 2021 12:40:06 +0000 (14:40 +0200)]
Adjust test
Nodes chosen to run the test aren't the same anymore.
Bruno Donassolo [Thu, 3 Jun 2021 09:23:45 +0000 (11:23 +0200)]
Minor fix in test.
Just messages order has changed.
Bruno Donassolo [Wed, 2 Jun 2021 18:11:09 +0000 (20:11 +0200)]
Moving SMPI app deployment to C++ code
Enable the deployment of SMPI experiments with C++ platform description.
Move application deployment from smpirun to smpi_main function.
The smpirun script used to parse the platform XML to create an
application deployment. This isn't possible anymore since we may not
have a platform XML anymore.
Move the necessary input to smpi_main through specific cfg variables:
- smpi/hostfile: host file
- smpi/replay: replay file
- smpi/np: number of processes
- smpi/map: mapping process/rank
These cfg variables aren't used by users directly; they are cached inside the smpirun script.
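As an illustration of how a launcher could hand these variables over, here is a minimal sketch that formats them with SimGrid's --cfg=key:value command-line syntax. The helper names make_cfg_flag and make_smpi_deploy_flags are hypothetical and are not taken from smpirun; only the variable names smpi/hostfile and smpi/np come from the list above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: formats one configuration flag as --cfg=key:value.
std::string make_cfg_flag(const std::string& key, const std::string& value)
{
  assert(!key.empty());
  return "--cfg=" + key + ":" + value;
}

// Hypothetical helper: builds the flags for two of the variables listed above
// (smpi/replay and smpi/map could be added the same way when needed).
std::vector<std::string> make_smpi_deploy_flags(const std::string& hostfile, int np)
{
  return {make_cfg_flag("smpi/hostfile", hostfile),
          make_cfg_flag("smpi/np", std::to_string(np))};
}
```

With a hostfile named hosts.txt and 4 processes, this would yield the two flags --cfg=smpi/hostfile:hosts.txt and --cfg=smpi/np:4.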
Bruno Donassolo [Fri, 23 Apr 2021 16:56:43 +0000 (18:56 +0200)]
New platform example: StarZone of StarZone
Re-implements the griffon.xml using the C++ interface.
Simplify the implementation of homogeneous clusters organized in
cabinets.
Add tests in teshsuite to use both files.
Bruno Donassolo [Fri, 16 Apr 2021 19:12:18 +0000 (21:12 +0200)]
Fix build mac/windows
Bruno Donassolo [Thu, 15 Apr 2021 12:42:50 +0000 (14:42 +0200)]
Change C++ platform example
Remove small_platform.cpp.
Add a more programmatic platform using the StarZone.
Bruno Donassolo [Wed, 14 Apr 2021 18:31:46 +0000 (20:31 +0200)]
Try to fix python build
Add dependency for dl library.
Bruno Donassolo [Mon, 15 Mar 2021 17:45:25 +0000 (18:45 +0100)]
Runs examples with C++ platform description
*Loading platform
- Generates library files for C++ platforms. They can be loaded by the
engine using the same load_platform method.
- Engine::load_platform checks whether the extension is .so; if so, it
opens the file using dlopen and searches for the load_platform symbol.
- The platform .so must contain a load_platform function that will be
called by the engine to generate the platform properly.
*Implementing an example
- Added a CMakeLists.txt in examples/platform to generate the .so for
each example.
- Pass to the tesh files a new variable "libdir" containing the
directory where the libraries are located.
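The loading steps described above can be sketched roughly as follows. This is not the engine's actual code: the names has_so_extension, DlCloser, LibHandle and open_platform_library are hypothetical, and the sketch only looks the symbol up without calling it, since the exact signature of load_platform is not given in this log. It assumes a POSIX system providing dlfcn.h (older toolchains may need to link with -ldl).

```cpp
#include <cassert>
#include <dlfcn.h>
#include <memory>
#include <stdexcept>
#include <string>

// Returns true when the path ends with ".so".
bool has_so_extension(const std::string& path)
{
  const std::string ext = ".so";
  return path.size() >= ext.size() &&
         path.compare(path.size() - ext.size(), ext.size(), ext) == 0;
}

// Deleter closing the library when the handle goes out of scope.
struct DlCloser {
  void operator()(void* handle) const
  {
    if (handle != nullptr)
      dlclose(handle);
  }
};
using LibHandle = std::unique_ptr<void, DlCloser>;

// Opens the library and looks up "load_platform"; throws on failure.
// The caller must keep the returned handle alive for as long as the
// symbol (or any callback defined in the library) may be used.
LibHandle open_platform_library(const std::string& path, void*& symbol)
{
  if (!has_so_extension(path))
    throw std::invalid_argument("not a shared library: " + path);
  LibHandle handle(dlopen(path.c_str(), RTLD_LAZY));
  if (!handle) {
    const char* err = dlerror();
    throw std::runtime_error(std::string("dlopen failed: ") + (err ? err : "unknown error"));
  }
  symbol = dlsym(handle.get(), "load_platform");
  if (symbol == nullptr)
    throw std::runtime_error("no load_platform symbol in " + path);
  return handle;
}
```

Returning an owning handle rather than closing the library right away keeps the loaded code available until the caller releases it.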
Arnaud Giersch [Fri, 4 Jun 2021 08:16:08 +0000 (10:16 +0200)]
[sonar] Replace redundant type with 'auto'.
Arnaud Giersch [Fri, 4 Jun 2021 07:40:39 +0000 (09:40 +0200)]
[sonar] Redundant parentheses.
Arnaud Giersch [Fri, 4 Jun 2021 07:38:35 +0000 (09:38 +0200)]
[sonar] Pointer-to-const.
Arnaud Giersch [Thu, 3 Jun 2021 15:15:07 +0000 (17:15 +0200)]
Fix build with enable_smpi=OFF.
Arnaud Giersch [Thu, 3 Jun 2021 14:57:51 +0000 (16:57 +0200)]
Fix include.
Arnaud Giersch [Thu, 3 Jun 2021 13:34:01 +0000 (15:34 +0200)]
Ensure correct ordering of the accumulate requests.
Arnaud Giersch [Thu, 3 Jun 2021 11:57:25 +0000 (13:57 +0200)]
Useless test; TODO--.
Arnaud Giersch [Thu, 3 Jun 2021 11:07:08 +0000 (13:07 +0200)]
Initialize mmap-privatized segments earlier (before main).
Sometimes we may want to initialize a global before MPI_Init.
Arnaud Giersch [Thu, 3 Jun 2021 09:28:56 +0000 (11:28 +0200)]
Make smpi_switch_data_segment check if a switch is needed, and return true when it occurs.
Kill global SMPI_switch_data_segment.
Arnaud Giersch [Thu, 3 Jun 2021 07:38:57 +0000 (09:38 +0200)]
Use existing function (also empties requests_ after waitall).
Arnaud Giersch [Thu, 3 Jun 2021 07:37:39 +0000 (09:37 +0200)]
Improve debug messages and avoid calling finish_comms twice when for myself.
Arnaud Giersch [Thu, 3 Jun 2021 07:23:48 +0000 (09:23 +0200)]
Initialize variable.
Arnaud Giersch [Wed, 2 Jun 2021 15:38:03 +0000 (17:38 +0200)]
Use existing functions to finish comms (and fix Win::flush).
Arnaud Giersch [Wed, 2 Jun 2021 15:10:47 +0000 (17:10 +0200)]
Review usage of rank/rank_/rank() in smpi_win.
Arnaud Giersch [Wed, 2 Jun 2021 14:37:24 +0000 (16:37 +0200)]
Little simplifications in loops.
Arnaud Giersch [Wed, 2 Jun 2021 12:55:46 +0000 (14:55 +0200)]
Some int -> bool conversions (+ use of existing macro).
Arnaud Giersch [Wed, 2 Jun 2021 11:49:30 +0000 (13:49 +0200)]
Call rank() only once.
Arnaud Giersch [Wed, 2 Jun 2021 10:23:02 +0000 (12:23 +0200)]
Ooops, fmt is second arg.
Arnaud Giersch [Wed, 2 Jun 2021 09:09:14 +0000 (11:09 +0200)]
Prefer emplace_back.
Arnaud Giersch [Wed, 2 Jun 2021 09:05:48 +0000 (11:05 +0200)]
Get rid of "%s" in second argument of function xbt_str_parse_*.
Arnaud Giersch [Wed, 2 Jun 2021 08:45:13 +0000 (10:45 +0200)]
XBT_ATTRIB_PRINTF for vprintf-like functions.
Arnaud Giersch [Wed, 2 Jun 2021 08:15:28 +0000 (10:15 +0200)]
Define class SmpiBenchGuard, and use RAII to handle smpi_bench_end()/smpi_bench_begin().
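A minimal sketch of the RAII idea, assuming the guard calls smpi_bench_end() on construction and smpi_bench_begin() on destruction. The two functions below are stand-ins with hypothetical bodies (a simple counter named bench_depth), and checked_call is a hypothetical caller; the real class may differ.

```cpp
#include <cassert>

// Stand-ins for the real hooks: only their names come from the log.
// bench_depth == 1 means "benchmarking user code", 0 means "paused".
int bench_depth = 1;
void smpi_bench_end() { --bench_depth; }
void smpi_bench_begin() { ++bench_depth; }

// Pauses benchmarking for the lifetime of the object.
class SmpiBenchGuard {
public:
  SmpiBenchGuard() { smpi_bench_end(); }
  SmpiBenchGuard(const SmpiBenchGuard&) = delete;
  SmpiBenchGuard& operator=(const SmpiBenchGuard&) = delete;
  ~SmpiBenchGuard() { smpi_bench_begin(); }
};

// Hypothetical call with an error path.
int checked_call(bool fail)
{
  const SmpiBenchGuard guard;
  assert(bench_depth == 0); // benchmarking is paused inside the call
  if (fail)
    return -1; // early return: the destructor still resumes benchmarking
  return 0;
}
```

The point of the guard is that every return path, including error paths, resumes benchmarking without an explicit call.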
Arnaud Giersch [Tue, 1 Jun 2021 20:45:17 +0000 (22:45 +0200)]
Add missing calls to smpi_bench_begin() on error paths.
Bruno Donassolo [Tue, 1 Jun 2021 15:48:47 +0000 (17:48 +0200)]
Cannot set split-duplex through s4u intf.
This makes sense only in XML where it properly creates the
link-up/link-down.
Bruno Donassolo [Tue, 1 Jun 2021 15:48:18 +0000 (17:48 +0200)]
Add fg#71 to changelog [ci-skip]
Bruno Donassolo [Tue, 1 Jun 2021 09:12:51 +0000 (11:12 +0200)]
Update ChangeLog
Bruno Donassolo [Mon, 31 May 2021 12:52:56 +0000 (14:52 +0200)]
Adjust timing of SMPI tests
SMPI is certainly the most impacted by the bandwidth factor changes.
It seems especially impacted when collective comms are involved.
The old version used 2 factors for SMPI comms:
1) network/bandwidth-factor was used to reduce the link capacity (e.g.
0.97*C for LV08)
2) smpi/bw-factor was used for each communication, limiting the flow
capacity.
Now, the code is simplified: each communication has only 1 bw-factor,
which is applied after it's done, at the update remaining phase.
In most cases, only the bw-factor is applied now, but after the comm is
done, not before.
Bruno Donassolo [Mon, 31 May 2021 08:51:47 +0000 (10:51 +0200)]
Fix link-load test
In the old version, our link capacities were 0.97*C; now they are just C.
So, more bytes can be transmitted through the links.
Bruno Donassolo [Mon, 31 May 2021 08:41:40 +0000 (10:41 +0200)]
Fix timing of Vivaldi/two_peers.xml tests
In this platform file, the communication is bounded by the big latency
between nodes (as a consequence of the distance between vivaldi
coordinates).
Therefore, we now apply the bandwidth factor (0.97) on top of this
time, slightly delaying the communications.
Bruno Donassolo [Mon, 31 May 2021 08:18:41 +0000 (10:18 +0200)]
Fix timing of Wi-Fi tests.
Their timings were calculated assuming that no factor was applied in
Wi-Fi communications.
This isn't the case anymore, since by default, we would apply the 0.97
factor from LV08 to these communications.
Setting CM02 as base network model since it doesn't apply any bandwidth
factor.
Bruno Donassolo [Fri, 28 May 2021 10:08:26 +0000 (12:08 +0200)]
Adjust dynamic network-factors test.
Improve test, including no crosstraffic config.
Bruno Donassolo [Thu, 27 May 2021 14:57:57 +0000 (16:57 +0200)]
New implementation for bandwidth factors
Bandwidth factors are now implemented at the Action level, reducing the
speed that an action advances (e.g. the number of bytes transmitted).
In the past, the factor took place at the maxmin system, limiting the
amount of resources a communication could use.
For example, a bw factor of 0.97 (default for LV08 network model) was
reflected by reducing the link capacity to 0.97*C. So, a 100MB/s link had
97MB/s capacity in the maxmin system.
Now, a communication alone using this link may use the full 100MB/s, but
after one second, it'll have transmitted only 97MB of data.
NOTE: This change may impact the timing of your experiments.
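The arithmetic of the example above can be sketched as follows for a single communication alone on a link. This is only an illustration of the two descriptions, not SimGrid code; all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Old description: the factor shrinks the capacity seen by the sharing
// system, so a lone flow is allotted factor * capacity.
double old_allotted_rate(double capacity, double factor)
{
  assert(factor > 0.0);
  return factor * capacity;
}

double old_bytes_sent(double capacity, double factor, double seconds)
{
  return old_allotted_rate(capacity, factor) * seconds;
}

// New description: a lone flow is allotted the full capacity...
double new_allotted_rate(double capacity)
{
  return capacity;
}

// ...and the factor slows down how fast the action progresses.
double new_bytes_sent(double capacity, double factor, double seconds)
{
  return factor * new_allotted_rate(capacity) * seconds;
}
```

With capacity = 100e6 bytes/s and factor = 0.97, the old model allots 97e6 bytes/s while the new one allots 100e6 bytes/s, yet both transmit 97e6 bytes after one second. The difference shows up in how much link capacity is accounted as used, and therefore in scenarios where links are shared.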
Arnaud Giersch [Tue, 1 Jun 2021 13:51:01 +0000 (15:51 +0200)]
Missing include.
Arnaud Giersch [Tue, 1 Jun 2021 13:30:44 +0000 (15:30 +0200)]
Coding style: no global "using namespace".