X-Git-Url: http://bilbo.iut-bm.univ-fcomte.fr/pub/gitweb/simgrid.git/blobdiff_plain/67d66b0cf79b9fc02c0450f254584693dbf21d3b..8d9c110f5bf839dcb7426f7750c09b3ff196bdf3:/docs/source/Introduction.rst
diff --git a/docs/source/Introduction.rst b/docs/source/Introduction.rst
index 4102473b54..32975c934f 100644
--- a/docs/source/Introduction.rst
+++ b/docs/source/Introduction.rst
@@ -9,201 +9,90 @@ Introduction

-Main Concepts
--------------
-
-Typical Study based on SimGrid
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Any SimGrid study entails the following components:
-
- - The studied **application**. This can be either a distributed
-   algorithm described in our simple APIs or a full-featured real
-   parallel application using for example the MPI interface
-   :ref:`(more info) `.
-
- - The **simulated platform**. This is a description of a given
-   distributed system (machines, links, disks, clusters, etc). Most of
-   the platform files are written in XML although a Lua interface is
-   under development. SimGrid makes it easy to augment the Simulated
-   Platform with a Dynamic Scenario where for example the links are
-   slowed down (because of external usage) or the machines fail. You
-   even have support to specify the applicative workload that you want
-   to feed to your application
-   :ref:`(more info) `.
-
- - The application's **deployment description**. In SimGrid
-   terminology, the application is an inert set of source files and
-   binaries. To make it run, you have to describe how your application
-   should be deployed on the simulated platform. You need to specify
-   which process is mapped onto which machine, along with their parameters
-   :ref:`(more info) `.
-
- - The **platform models**. They describe how the simulated platform
-   reacts to the actions of the application. For example, they compute
-   the time taken by a given communication on the simulated platform.
-   These models are already included in SimGrid, and you only need to
-   pick one and maybe tweak its configuration to get your results
-   :ref:`(more info) `.
-
-These components are put together to run a **simulation**, that is an
-experiment or a probe. Simulations produce **outcomes** (logs,
-visualization, or statistical analysis) that help to answer the
-**question** targeted by this study.
-
-Here are some questions on which SimGrid is particularly relevant:
-
- - **Compare an Application to another**. This is the classical use
-   case for scientists, who use SimGrid to test how the solution that
-   they contribute to compares to the existing solutions from the
-   literature.
-
- - **Design the best [Simulated] Platform for a given Application.**
-   Tweaking the platform file is much easier than building a new real
-   platform for testing purposes. SimGrid also allows for the co-design
-   of the platform and the application by modifying both of them.
-
- - **Debug Real Applications**. With real systems, is sometimes
-   difficult to reproduce the exact run leading to the bug that you
-   are tracking. With SimGrid, you are *clairvoyant* about your
-   *reproducible experiments*: you can explore every part of the
-   system, and your probe will not change the simulated state. It also
-   makes it easy to mock some parts of the real system that are not
-   under study.
-
-Depending on the context, you may see some parts of this process as
-less important, but you should pay close attention if you want to be
-confident in the results coming out of your simulations. In
-particular, you should not blindly trust your results but always
-strive to double-check them. Likewise, :ref:`you should question the
-realism of your input configuration `, and we even
-encourage you to :ref:`doubt (and check) the provided performance models
-`.
-
-To ease such questioning, you really should logically separate these
-parts in your experimental setup. It is seen as a very bad practice to
-merge the application, the platform, and the deployment altogether.
-SimGrid is versatile and your mileage may vary, but you should start
-with your Application specified as a C++ or Java program, using one of
-the provided XML platform files, and with your deployment in a separate
-XML file.
-
-SimGrid Execution Modes
-^^^^^^^^^^^^^^^^^^^^^^^
-
-Depending on the intended study, SimGrid can be run in several execution modes.
-
-**Simulation Mode**. This is the most common execution mode, where you want
-to study how your application behaves on the simulated platform under
-the experimental scenario.
-
-In this mode, SimGrid can provide information about the time taken by
-your application, the amount of energy dissipated by the platform to
-run your application, and the detailed usage of each resource.
-
-**Model-Checking Mode**. This can be seen as a sort of exhaustive
-testing mode, where every possible outcome of your application is
-explored. In some sense, this mode tests your application for all
-possible platforms that you could imagine (and more).
-
-You just provide the application and its deployment (number of
-processes and parameters), and the model checker will
-explore all possible outcomes by testing all possible message
-interleavings: if at some point a given process can either receive the
-message A first or the message B depending on the platform
-characteristics, the model checker will explore the scenario where A
-arrives first, and then rewind to the same point to explore the
-scenario where B arrives first.
-
-This is a very powerful mode, where you can evaluate the correctness of
-your application. It can verify either **safety properties** (assertions)
-or **liveness properties** stating for example that if a given event
-occurs, then another given event will occur in a finite amount of
-steps. This mode is not only usable with the abstract algorithms
-developed on top of the SimGrid APIs, but also with real MPI
-applications (to some extent).
-
-The main limit of Model Checking lies in the huge amount of scenarios
-to explore. SimGrid tries to explore only non-redundant scenarios
-thanks to classical reduction techniques (such as DPOR and stateful
-exploration) but the exploration may well never finish if you don't
-carefully adapt your application to this mode.
-
-A classical trap is that the Model Checker can only verify whether
-your application fits the properties provided, which is useless if you
-have a bug in your property. Remember also that one way for your
-application to never violate a given assertion is to not start at all,
-because of a stupid bug.
-
-Another limit of this mode is that it does not use the performance
-models of the simulation mode. Time becomes discrete: You can say for
-example that the application took 42 steps to run, but there is no way
-to know how much time it took or the number of watts that were dissipated.
-
-Finally, the model checker only explores the interleavings of
-computations and communications. Other factors such as thread
-execution interleaving are not considered by the SimGrid model
-checker.
-
-The model checker may well miss existing issues, as it computes the
-possible outcomes *from a given initial situation*. There is no way to
-prove the correctness of your application in full generality with this
-tool.
-
-**Benchmark Recording Mode**. During debug sessions, continuous
-integration testing, and other similar use cases, you are often only
-interested in the control flow. If your application applies filters to
-huge images split into small blocks, the filtered image is probably not
-what you are interested in. You are probably looking for a way to run
-each computational kernel only once, and record the time it takes to cache it.
-This code block can then be skipped in simulation
-and replaced by a synthetic block using the cached information. The
-simulated platform will take this block into account without requesting
-the actual hosting machine to benchmark it.
+What is SimGrid
+---------------
 
-SimGrid Limits
-^^^^^^^^^^^^^^
+SimGrid is a framework for developing simulators of distributed application executions on distributed platforms. It can
+be used to prototype, evaluate and compare relevant platform configurations, system designs, and algorithmic approaches.
 
-This framework is by no means the holy grail, able to solve
-every problem on Earth.
+What SimGrid allows you to do
+-----------------------------
 
-**SimGrid scope is limited to distributed systems.** Real-time
-multi-threaded systems are out of this scope. You could probably tweak
-SimGrid for such studies (or the framework could be extended
-in this direction), but another framework specifically targeting such a
-use case would probably be more suited.
+Here are some objectives for which SimGrid is particularly relevant and has been used extensively:
 
-**There is currently no support for 5G or LoRa networks**.
-The framework could certainly be improved in this direction, but this
-still has to be done.
+ - **Compare designs**. This is a classical use case for researchers/developers, who use SimGrid to assess how their contributed solution (a platform, system, application, and/or algorithm design) compares to existing solutions from the literature.
 
-**There is no perfect model, only models adapted to your study.** The SimGrid
-models target fast and large studies, and yet they target realistic results. In
-particular, our models abstract away parameters and phenomena that are often
-irrelevant to reality in our context.
+ - **Design the best [Simulated] Platform for a given Application.** Modifying a platform file used to drive simulations is much easier than building
+   real-world platforms for testing purposes. SimGrid also allows for the co-design of the platform and the application, as both can be modified with little work.
 
-SimGrid is obviously not intended for a study of any phenomenon that our
-abstraction removes. Here are some **studies that you should not do with
-SimGrid**:
+ - **Debug Real Applications**. With real systems, it is often difficult to reproduce the exact run that would lead to a bug that is being tracked.
+   With SimGrid, you are *clairvoyant* about your *reproducible experiments*: you can explore every part of the
+   system, and your exploration will not change the simulated state. It also makes it easy to mock or abstract away parts of the real system that
+   are not under study.
 
- - Studying the effect of L3 vs. L2 cache effects on your application
- - Comparing kernel schedulers and policies
- - Comparing variants of TCP
- - Exploring pathological cases where TCP breaks down, resulting in
-   abnormal executions.
- - Studying security aspects of your application, in presence of
-   malicious agents.
+ - **Formally assess an algorithm**. Inspired by model checking, SimGrid provides an execution mode that does not
+   quantify an application's performance behavior, but that instead explores all causally possible outcomes of the application so as to evaluate application correctness. This exhaustive
+   search is ideal for finding bugs that are difficult to trigger experimentally. But because it is exhaustive, there is a limit to the scale of the applications for which it can be used.
+
+Anatomy of a project that uses SimGrid
+--------------------------------------
+
+Any project that uses SimGrid as its simulation framework comprises the following components:
+
+ - An **application**. An application consists of one or more processes that can either implement distributed algorithms described using a simple API (either in C++, Python or
+   C) or be part of a full-featured real parallel application implemented with, for example, the MPI standard :ref:`(more info) `.
+
+ - A **simulated platform**. This is a description (in either XML or C++) of a distributed system's hardware (machines, links,
+   disks, clusters, etc). SimGrid makes it straightforward to augment the simulated platform with dynamic behaviors where, for example, the
+   links are slowed down (because of external usage) or the machines fail :ref:`(more info) `.
+
+ - An application's **deployment description**. To simulate the execution of the application on the platform, the way in which the application is
+   deployed on the platform must be described. This is done by specifying which process is mapped onto which machine :ref:`(more
+   info) `.
+
+ - **Platform models**. SimGrid implements models that describe how the simulated platform reacts to the simulated activities performed by
+   application processes. SimGrid provides a range of documented models,
+   which the user can select and configure for their particular use case. A
+   big selling point of SimGrid, which sets it apart from its competitors,
+   is that it can accurately model the network contention that results from
+   concurrent communications. :ref:`(more info) `.
+
+
+The above components are put together to run a **simulation experiment**
+that produces **outcomes** (logs, visualization, statistics) that help
+answer the user's research and development **question**. The outcomes
+typically include a timeline of the application execution and information
+about its energy consumption.
+
+
+We work hard to make SimGrid easy to use, but you should not blindly trust your results; always strive to validate
+the simulation outcomes. Assessing the realism of these outcomes will lead you to better :ref:`calibrate the models `,
+which is the best way to achieve high simulation accuracy. Please refer to the section :ref:`howto_science`.
+
+Using SimGrid in practice
+-------------------------
+
+SimGrid is versatile and can be used in many ways, but the most typical setup is to specify your algorithm as a C++ or Python
+program using our API, along with one of the provided XML platform files as shown in the **first tutorial** on
+:ref:`usecase_simalgo`. If your application is already written in MPI, then you are in luck because SimGrid comes with MPI support,
+as explained in our **second tutorial** on :ref:`usecase_smpi`. The **third tutorial** is on
+:ref:`usecase_modelchecking`. Docker images are provided to run these tutorials without installing any software, other than Docker, on your machine.
+
+SimGrid comes with :ref:`many examples `, so that you can quick-start your simulator by
+assembling and modifying some of the provided examples (see :ref:`this section ` on how to get your own project
+to compile with SimGrid). Extensive documentation is available from the left menu bar. If you want to get an idea of how
+SimGrid works, you can read about its :ref:`design goals `.
 
 SimGrid Success Stories
-^^^^^^^^^^^^^^^^^^^^^^^
+-----------------------
 
 SimGrid was cited in over 3,000 scientific papers (according to Google
 Scholar). Among them,
 `over 500 publications `_
 (written by hundreds of individuals) use SimGrid as a scientific
-instrument to conduct their experimental evaluation. These
-numbers do not include the articles contributing to SimGrid.
-This instrument was used in many research communities, such as
+instrument to conduct experimental evaluations. These
+numbers do not include those articles that directly contribute to SimGrid itself.
+SimGrid was used in many research communities, such as
 `High-Performance Computing `_,
 `Cloud Computing `_,
 `Workflow Scheduling `_,
@@ -214,8 +103,7 @@ This instrument was used in many research communities, such as
 `Peer-to-Peer Computing `_,
 `Network Architecture `_,
 `Fog Computing `_, or
-`Batch Scheduling `_
-`(more info) `_.
+`Batch Scheduling `_.
 
 If your platform description is accurate enough (see
 `here `_ or
@@ -223,15 +111,16 @@ If your platform description is accurate enough (see
 SimGrid can provide high-quality performance predictions. For example,
 we determined the speedup achieved by the Tibidabo ARM-based
 cluster before its construction
-(`paper `_). In this case,
-some differences between the prediction and the real timings were due to
-misconfigurations with the real platform. To some extent,
-SimGrid could even be used to debug the real platform :)
+(`paper `_). Some
+differences between the simulated and the real timings were observed, and
+turned out to be due to
+misconfigurations in the real platform!
+SimGrid can thus even be used to debug a real platform :)
 
 SimGrid is also used to debug, improve, and tune several large
 applications.
 `BigDFT `_ (a massively parallel code
-computing the electronic structure of chemical elements developed by
+for computing the electronic structure of chemical elements developed by
 the CEA), `StarPU `_ (a
 Unified Runtime System for Heterogeneous Multicore Architectures
 developed by Inria Bordeaux), and
@@ -239,4 +128,35 @@ developed by Inria Bordeaux), and
 key-value pair storage library developed at the University of Zurich).
 Some of these applications enjoy large user communities themselves.
 
+SimGrid Limits
+--------------
+
+SimGrid is by no means the holy grail that is able to solve every conceivable simulation problem.
+
+**SimGrid's scope is limited to distributed systems.** Real-time
+multi-threaded systems are out of this scope. You could probably use and/or
+extend SimGrid for this purpose, but another framework that specifically
+targets this use case would probably be more suitable.
+
+**There is currently no support for 5G or LoRa networks**.
+SimGrid could certainly be extended with models for these networks, but this
+has yet to be done.
+
+**There is no perfect model, only models adapted to your purposes.** SimGrid's
+models were designed to make it possible to run fast and accurate
+simulations of large systems. As a result, the models abstract away many
+parameters and phenomena that are often irrelevant for most use cases in the
+field. This means that SimGrid cannot be used to study any phenomenon that our
+models do not capture. Here are some **studies that you currently cannot conduct with
+SimGrid**:
+
+ - Studying the effect of L3 vs. L2 caches on your application;
+ - Comparing kernel schedulers and policies;
+ - Comparing variants of TCP;
+ - Exploring pathological cases where TCP breaks down, resulting in
+   abnormal executions;
+ - Studying security aspects of your application, in the presence of
+   malicious agents.
+
+
 .. LocalWords: SimGrid
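The reworked Introduction describes a project as an application, a simulated platform, and a separate deployment description. As a minimal sketch of that separation, assuming SimGrid's XML format (version 4.1), a two-host platform file and a matching deployment file might look like the fragments below. The host names `host0` and `host1`, the link `link0`, its bandwidth and latency values, and the actor functions `sender` and `receiver` are all hypothetical placeholders; the functions in particular would have to be registered by the simulator program itself.

```xml
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "https://simgrid.org/simgrid.dtd">
<platform version="4.1">
  <!-- Hypothetical platform: two hosts joined by a single link -->
  <zone id="zone0" routing="Full">
    <host id="host0" speed="1Gf"/>
    <host id="host1" speed="1Gf"/>
    <link id="link0" bandwidth="125MBps" latency="100us"/>
    <!-- Routes are symmetrical by default, so host1 to host0 also uses link0 -->
    <route src="host0" dst="host1">
      <link_ctn id="link0"/>
    </route>
  </zone>
</platform>
```

```xml
<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "https://simgrid.org/simgrid.dtd">
<platform version="4.1">
  <!-- Hypothetical deployment: one actor per host. The function names must
       match functions registered by the simulator program. -->
  <actor host="host0" function="sender">
    <argument value="host1"/>
  </actor>
  <actor host="host1" function="receiver"/>
</platform>
```

Keeping the two files apart lets you run the same application and deployment against a different platform description without touching the application code.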