From: Augustin Degomme
Date: Tue, 16 Sep 2014 12:44:57 +0000 (+0200)
Subject: doc update
X-Git-Tag: v3_12~815
X-Git-Url: http://bilbo.iut-bm.univ-fcomte.fr/pub/gitweb/simgrid.git/commitdiff_plain/eb15ac18c9e874325ef1326cf10f8e60b9d8d628

doc update
---

diff --git a/COPYING b/COPYING
index 4ff2f05d39..a7c05048f3 100644
--- a/COPYING
+++ b/COPYING
@@ -366,4 +366,53 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 ==========================================================================
+Some collective algorithms and selection logic have been taken from MVAPICH2, available at http://mvapich.cse.ohio-state.edu/
+
+
+ COPYRIGHT
+
+Copyright (c) 2001-2014, The Ohio State University. All rights
+reserved.
+
+The MVAPICH2 software package is developed by the team members of The
+Ohio State University's Network-Based Computing Laboratory (NBCL),
+headed by Professor Dhabaleswar K. (DK) Panda.
+
+Contact:
+Prof. Dhabaleswar K. (DK) Panda
+Dept. of Computer Science and Engineering
+The Ohio State University
+2015 Neil Avenue
+Columbus, OH - 43210-1277
+Tel: (614)-292-5199; Fax: (614)-292-2911
+E-mail:panda@cse.ohio-state.edu
+
+This program is available under BSD licensing.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+(1) Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+
+(2) Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+(3) Neither the name of The Ohio State University nor the names of
+their contributors may be used to endorse or promote products derived
+from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/doc/doxygen/module-smpi.doc b/doc/doxygen/module-smpi.doc
index b51f5dfe04..f747121319 100644
--- a/doc/doxygen/module-smpi.doc
+++ b/doc/doxygen/module-smpi.doc
@@ -171,14 +171,14 @@ to allow the user to tune the library and use the better collective if the
 default one is not good enough.
 SMPI tries to apply the same logic, regrouping algorithms from OpenMPI, MPICH
-libraries, and from StarMPI (STAR-MPI).
-This collection of more than a hundred algorithms allows a simple and effective
+libraries, StarMPI (STAR-MPI), and MVAPICH2 libraries.
+This collection of more than 115 algorithms allows a simple and effective
 comparison of their behavior and performance, making SMPI a tool of choice for the
 development of such algorithms.
 \subsection Tracing_internals Tracing of internal communications
-For each collective, default tracing only outputs only global data.
+For each collective, default tracing only outputs global data.
 Internal communication operations are not traced to avoid outputting too much data
 to the trace. To debug and compare algorithm, this can be changed with the item
 \b tracing/smpi/internals , which has 0 for default value.
@@ -195,8 +195,17 @@ the first one with a ring algorithm, the second with a pairwise one :
 The default selection logic implemented by default in OpenMPI (version 1.7)
 and MPICH (version 3.0.4) has been replicated and can be used by setting the
-\b smpi/coll_selector item to either ompi or mpich. The code and details for each
-selector can be found in the src/smpi/colls/smpi_(openmpi/mpich)_selector.c file.
+\b smpi/coll_selector item to either ompi or mpich. A selector based on the selection logic of MVAPICH2 (version 1.9) tuned on the Stampede cluster has also been implemented, as well as a preliminary version of an Intel MPI selector (version 4.1.3, also tuned for the Stampede cluster). Due to the closed-source nature of Intel MPI, some of the algorithms described in the documentation are not available, and are replaced by mvapich ones.
+
+Values for option \b smpi/coll_selector are :
+ - ompi
+ - mpich
+ - mvapich2
+ - impi
+ - default
+
+The code and details for each
+selector can be found in the src/smpi/colls/smpi_(openmpi/mpich/mvapich2/impi)_selector.c file.
 As this is still in development, we do not insure that all algorithms are correctly
 replicated and that they will behave exactly as the real ones.
 If you notice a difference, please contact SimGrid developers mailing list
@@ -222,6 +231,8 @@ Most of these are best described in Rabenseifner's reduce algorithm
 \subsubsection MPI_Allreduce
  - default : naive one, by default
  - ompi : use openmpi selector for the allreduce operations
  - mpich : use mpich selector for the allreduce operations
+ - mvapich2 : use mvapich2 selector for the allreduce operations
+ - impi : use intel mpi selector for the allreduce operations
  - automatic (experimental) : use an automatic self-benchmarking algorithm
  - lr : logical ring reduce-scatter then logical ring allgather
  - rab1 : variations of the Rabenseifner algorithm : reduce_scatter then allgather
@@ -328,24 +359,29 @@ one in most cases)
  - rab_rsag : variation of the Rabenseifner algorithm : recursive doubling
 reduce_scatter then recursive doubling allgather
  - rdb : recursive doubling
- - smp_binomial : binomial tree with smp : 8 cores/SMP, binomial intra
+ - smp_binomial : binomial tree with smp : binomial intra
 SMP reduce, inter reduce, inter broadcast then intra broadcast
  - smp_binomial_pipeline : same with segment size = 4096 bytes
- - smp_rdb : 8 cores/SMP, intra : binomial allreduce, inter : Recursive
+ - smp_rdb : intra : binomial allreduce, inter : Recursive
 doubling allreduce, intra : binomial broadcast
- - smp_rsag : 8 cores/SMP, intra : binomial allreduce, inter : reduce-scatter,
+ - smp_rsag : intra : binomial allreduce, inter : reduce-scatter,
 inter:allgather, intra : binomial broadcast
- - smp_rsag_lr : 8 cores/SMP, intra : binomial allreduce, inter : logical ring
+ - smp_rsag_lr : intra : binomial allreduce, inter : logical ring
 reduce-scatter, logical ring inter:allgather, intra : binomial broadcast
- - smp_rsag_rab : 8 cores/SMP, intra : binomial allreduce, inter : rab
+ - smp_rsag_rab : intra : binomial allreduce, inter : rab
 reduce-scatter, rab inter:allgather, intra : binomial broadcast
  - redbcast : reduce then broadcast, using default or tuned algorithms if
 specified
  - ompi_ring_segmented : ring algorithm used by OpenMPI
+ - mvapich2_rs : rdb for small messages, reduce-scatter then allgather otherwise
+ - mvapich2_two_level : SMP-aware algorithm, with mpich as intra algorithm, and rdb as inter (Change this behavior by using mvapich2 selector to use tuned values)
+ - rab : default Rabenseifner implementation
 \subsubsection MPI_Reduce_scatter
  - default : naive one, by default
  - ompi : use openmpi selector for the reduce_scatter operations
  - mpich : use mpich selector for the reduce_scatter operations
+ - mvapich2 : use mvapich2 selector for the reduce_scatter operations
+ - impi : use intel mpi selector for the reduce_scatter operations
  - automatic (experimental) : use an automatic self-benchmarking algorithm
  - ompi_basic_recursivehalving : recursive halving version from OpenMPI
  - ompi_ring : ring version from OpenMPI
@@ -359,6 +395,8 @@ reduce-scatter, rab inter:allgather, intra : binomial broadcast
  - default : naive one, by default
  - ompi : use openmpi selector for the allgather operations
  - mpich : use mpich selector for the allgather operations
+ - mvapich2 : use mvapich2 selector for the allgather operations
+ - impi : use intel mpi selector for the allgather operations
  - automatic (experimental) : use an automatic self-benchmarking algorithm
  - 2dmesh : see alltoall
  - 3dmesh : see alltoall
@@ -383,12 +421,15 @@ using simple algorithm (hardcoded, default processes/SMP: 8)
 i + 2, ..., i -> (i + p -1) % P
  - ompi_neighborexchange : Neighbor Exchange algorithm for allgather. Described by Chen et.al.
 in Performance Evaluation of Allgather Algorithms on Terascale Linux Cluster with Fast Ethernet
+ - mvapich2_smp : SMP aware algorithm, performing intra-node gather, inter-node allgather with one process/node, and bcast intra-node
 \subsubsection MPI_Allgatherv
  - default : naive one, by default
  - ompi : use openmpi selector for the allgatherv operations
  - mpich : use mpich selector for the allgatherv operations
+ - mvapich2 : use mvapich2 selector for the allgatherv operations
+ - impi : use intel mpi selector for the allgatherv operations
  - automatic (experimental) : use an automatic self-benchmarking algorithm
  - GB : Gatherv - Broadcast (uses tuned version if specified, but only for Bcast, gatherv is not tuned)
@@ -404,6 +445,8 @@ one from STAR-MPI
  - default : naive one, by default
  - ompi : use openmpi selector for the bcast operations
  - mpich : use mpich selector for the bcast operations
+ - mvapich2 : use mvapich2 selector for the bcast operations
+ - impi : use intel mpi selector for the bcast operations
  - automatic (experimental) : use an automatic self-benchmarking algorithm
  - arrival_pattern_aware : root exchanges with the first process to arrive
  - arrival_pattern_aware_wait : same with slight variation
@@ -421,7 +464,9 @@ one from STAR-MPI
  - SMP_linear : linear algorithm with 8 cores/SMP
  - ompi_split_bintree : binary tree algorithm from OpenMPI, with message split in 8192 bytes pieces
  - ompi_pipeline : pipeline algorithm from OpenMPI, with message split in 128KB pieces
-
+ - mvapich2_inter_node : Inter node default mvapich worker
+ - mvapich2_intra_node : Intra node default mvapich worker
+ - mvapich2_knomial_intra_node : k-nomial intra node default mvapich worker. default factor is 4.
 \subsection auto Automatic evaluation