/*! \page tracing Tracing Simulations

Tracing is widely used to observe and understand the behavior of
parallel applications and distributed algorithms. Usually, this is
done in a two-step fashion: the user instruments the application and
the traces are analyzed after the end of the execution. The analysis
can highlight unexpected behaviors and bottlenecks, and can sometimes
be used to correct distributed algorithms. The SimGrid team has
instrumented the library so that users can trace their simulations
and analyze them. This part of the user manual explains how the
tracing-related features can be enabled and used during the
development of simulators using the SimGrid library.

\section tracing_tracing_enabling Enabling using CMake

With the sources of SimGrid, it is possible to enable tracing
using the parameter <b>-Denable_tracing=ON</b> when cmake is
executed. The sections \ref instr_category_functions, \ref
instr_mark_functions, and \ref instr_uservariables_functions describe
all the functions available when this CMake option is
activated. These functions have no effect if SimGrid is
configured without this option (they are wiped out by the
C preprocessor).

\verbatim
$ cmake -Denable_tracing=ON .
\endverbatim

\section instr_category_functions Tracing categories functions

The SimGrid library is instrumented so users can trace the platform
utilization using the MSG, SimDAG and SMPI interfaces. It registers how
much power is used for each host and how much bandwidth is used for
each link of the platform. The idea with this type of tracing is to
first get an overall view of resource utilization, especially to
identify bottlenecks, load-balancing among hosts, and so on.

Another possibility is to trace resource utilization by
categories. Categorized resource utilization tracing gives SimGrid
users the possibility to classify MSG and SimDAG tasks by category,
tracing resource utilization for each of the categories. The functions
below let the user declare a category and apply it to tasks. <em>The
tasks that are not classified according to a category are not
traced</em>. Even if the user does not specify any category, the
simulations can still be traced in terms of resource utilization by
using a special parameter that is detailed below (see section \ref
tracing_tracing_options).

\li \c TRACE_category(const char *category)
\li \c TRACE_category_with_color(const char *category, const char *color)
\li \c MSG_task_set_category(msg_task_t task, const char *category)
\li \c MSG_task_get_category(msg_task_t task)
\li \c SD_task_set_category(SD_task_t task, const char *category)
\li \c SD_task_get_category(SD_task_t task)

\section instr_mark_functions Tracing marks functions
\li \c TRACE_declare_mark(const char *mark_type)
\li \c TRACE_mark(const char *mark_type, const char *mark_value)

\section instr_uservariables_functions Tracing user variables functions

For hosts:

\li \c TRACE_host_variable_declare(const char *variable)
\li \c TRACE_host_variable_declare_with_color(const char *variable, const char *color)
\li \c TRACE_host_variable_set(const char *host, const char *variable, double value)
\li \c TRACE_host_variable_add(const char *host, const char *variable, double value)
\li \c TRACE_host_variable_sub(const char *host, const char *variable, double value)
\li \c TRACE_host_variable_set_with_time(double time, const char *host, const char *variable, double value)
\li \c TRACE_host_variable_add_with_time(double time, const char *host, const char *variable, double value)
\li \c TRACE_host_variable_sub_with_time(double time, const char *host, const char *variable, double value)

For links:

\li \c TRACE_link_variable_declare(const char *variable)
\li \c TRACE_link_variable_declare_with_color(const char *variable, const char *color)
\li \c TRACE_link_variable_set(const char *link, const char *variable, double value)
\li \c TRACE_link_variable_add(const char *link, const char *variable, double value)
\li \c TRACE_link_variable_sub(const char *link, const char *variable, double value)
\li \c TRACE_link_variable_set_with_time(double time, const char *link, const char *variable, double value)
\li \c TRACE_link_variable_add_with_time(double time, const char *link, const char *variable, double value)
\li \c TRACE_link_variable_sub_with_time(double time, const char *link, const char *variable, double value)

For links, identified by the source and destination of the route they belong to:

\li \c TRACE_link_srcdst_variable_set(const char *src, const char *dst, const char *variable, double value)
\li \c TRACE_link_srcdst_variable_add(const char *src, const char *dst, const char *variable, double value)
\li \c TRACE_link_srcdst_variable_sub(const char *src, const char *dst, const char *variable, double value)
\li \c TRACE_link_srcdst_variable_set_with_time(double time, const char *src, const char *dst, const char *variable, double value)
\li \c TRACE_link_srcdst_variable_add_with_time(double time, const char *src, const char *dst, const char *variable, double value)
\li \c TRACE_link_srcdst_variable_sub_with_time(double time, const char *src, const char *dst, const char *variable, double value)

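As a rough illustration of how the host variable functions listed above
fit together, the sketch below declares a variable and then updates it.
It assumes that the `set` variant overwrites the current value while
`add` and `sub` adjust it, as the function names suggest. The function
bodies are stand-ins written only so that the sketch compiles without
SimGrid; the `record_queue_activity` helper, the `tracked_value` global
and the `queue_length` variable name are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

// Stand-ins for the SimGrid functions of the same name, so that this
// sketch compiles on its own. They track one variable on one host.
// In a real simulator, delete them and use the library versions.
static double tracked_value = 0.0; // hypothetical: last value "traced"

static void TRACE_host_variable_declare(const char *variable)
{
  assert(variable != NULL);
  printf("declare %s\n", variable);
}

static void TRACE_host_variable_set(const char *host, const char *variable, double value)
{
  tracked_value = value; // assumed semantics: overwrite
  printf("%s/%s = %g\n", host, variable, tracked_value);
}

static void TRACE_host_variable_add(const char *host, const char *variable, double value)
{
  tracked_value += value; // assumed semantics: increment
  printf("%s/%s = %g\n", host, variable, tracked_value);
}

static void TRACE_host_variable_sub(const char *host, const char *variable, double value)
{
  tracked_value -= value; // assumed semantics: decrement
  printf("%s/%s = %g\n", host, variable, tracked_value);
}

// Hypothetical helper: trace the number of jobs queued on a host.
double record_queue_activity(const char *host)
{
  TRACE_host_variable_declare("queue_length");      // declare before first use
  TRACE_host_variable_set(host, "queue_length", 0); // start from an empty queue
  TRACE_host_variable_add(host, "queue_length", 3); // three jobs arrive
  TRACE_host_variable_sub(host, "queue_length", 1); // one job completes
  return tracked_value;
}
```

The link and source/destination variants take the same arguments apart
from the resource identification, so the same pattern should carry over
to them.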
\section tracing_tracing_options Tracing configuration Options

To check which tracing options are available for your simulator, you
can just run it with the option \verbatim --help-tracing \endverbatim
to get a detailed and up-to-date explanation of each tracing
parameter. These are some of the options accepted by the tracing
system of SimGrid; you can use them by running your simulator with the
<b>--cfg=</b> switch:

\li <b>\c
tracing
</b>:
  Safe switch. It activates (or deactivates) the tracing system.
  No other tracing options take effect if this one is not activated.
\verbatim
--cfg=tracing:1
\endverbatim

\li <b>\c
tracing/categorized
</b>:
  It activates the categorized resource utilization tracing. It should
  be enabled if tracing categories are used by this simulator.
\verbatim
--cfg=tracing/categorized:1
\endverbatim

\li <b>\c
tracing/uncategorized
</b>:
  It activates the uncategorized resource utilization tracing. Use it if
  this simulator does not use tracing categories and resource use has to be
  traced.
\verbatim
--cfg=tracing/uncategorized:1
\endverbatim

\li <b>\c
tracing/filename
</b>:
  A file with this name will be created to register the simulation. The file
  is in the Paje format and can be analyzed using the Viva or Paje visualization
  tools. More information can be found on these webpages:
  <a href="http://github.com/schnorr/viva/">http://github.com/schnorr/viva/</a>
  <a href="http://github.com/schnorr/pajeng/">http://github.com/schnorr/pajeng/</a>
\verbatim
--cfg=tracing/filename:mytracefile.trace
\endverbatim
  If you do not provide this parameter, the trace file will be named simgrid.trace.

\li <b>\c
tracing/onelink_only
</b>:
  By default, the tracing system uses all routes in the platform file
  to re-create a "graph" of the platform and register it in the trace file.
  This option lets the user tell the tracing system to use only the routes
  that are composed of just one link.
\verbatim
--cfg=tracing/onelink_only:1
\endverbatim

\li <b>\c
tracing/smpi
</b>:
  This option only has effect if this simulator is SMPI-based. It traces the MPI
  interface and generates a trace that can be analyzed using Gantt-like
  visualizations. Every MPI function (implemented by SMPI) is transformed into a
  state, and point-to-point communications can be analyzed with arrows.
\verbatim
--cfg=tracing/smpi:1
\endverbatim

\li <b>\c
tracing/smpi/group
</b>:
  This option only has effect if this simulator is SMPI-based. The processes
  are grouped by the hosts where they were executed.
\verbatim
--cfg=tracing/smpi/group:1
\endverbatim

\li <b>\c
tracing/smpi/computing
</b>:
  This option only has effect if this simulator is SMPI-based. The parts external
  to SMPI are also written to the trace, which makes it easier to analyze the
  data automatically.
\verbatim
--cfg=tracing/smpi/computing:1
\endverbatim

\li <b>\c
tracing/smpi/internals
</b>:
  This option only has effect if this simulator is SMPI-based. It displays the
  internal communications happening during a collective MPI call.
\verbatim
--cfg=tracing/smpi/internals:1
\endverbatim

\li <b>\c
tracing/smpi/display_sizes
</b>:
  This option only has effect if this simulator is SMPI-based. It displays the
  sizes of the messages exchanged in the trace, both on the links and on the
  states. For collective operations, the size is the total amount of data sent
  by the process.
\verbatim
--cfg=tracing/smpi/display_sizes:1
\endverbatim

\li <b>\c
tracing/msg/process
</b>:
  This option only has effect if this simulator is MSG-based. It traces the
  behavior of all categorized MSG processes, grouping them by hosts. This option
  can be used to track process location if this simulator has process migration.
\verbatim
--cfg=tracing/msg/process:1
\endverbatim

\li <b>\c
tracing/buffer
</b>:
  This option puts some events in a time-ordered buffer using the
  insertion sort algorithm. Acquiring and releasing the locks that
  protect this buffer, together with the cost of the sorting algorithm,
  makes this process slow. The simulator performance can be severely
  impacted if this option is activated, but you are sure to get a trace
  file with sorted events.
\verbatim
--cfg=tracing/buffer:1
\endverbatim

\li <b>\c
tracing/onelink_only
</b>:
  This option changes the way SimGrid registers its platform in the trace
  file. Normally, the tracing considers all routes (no matter their
  size) in the platform file to re-create the resource topology. If this
  option is activated, only the routes with one link are used to
  register the topology within an AS. Routes among AS continue to be
  traced as usual.
\verbatim
--cfg=tracing/onelink_only:1
\endverbatim

\li <b>\c
tracing/disable_destroy
</b>:
  Disable the destruction of containers at the end of the simulation. This
  can be used with simulators that have a different notion of time
  (different from the simulated time).
\verbatim
--cfg=tracing/disable_destroy:1
\endverbatim

\li <b>\c
tracing/basic
</b>:
  Some visualization tools cannot parse the Paje file format correctly.
  Use this option if you are using one of these tools to visualize the simulation
  trace. Keep in mind that the trace might be incomplete, without all the
  information that would be registered otherwise.
\verbatim
--cfg=tracing/basic:1
\endverbatim

\li <b>\c
tracing/comment
</b>:
  Use this to add a comment line to the top of the trace file.
\verbatim
--cfg=tracing/comment:my_string
\endverbatim

\li <b>\c
tracing/comment_file
</b>:
  Use this to add the contents of a file to the top of the trace file as a comment.
\verbatim
--cfg=tracing/comment_file:textual_file.txt
\endverbatim

\li <b>\c
viva/categorized
</b>:
  This option generates a graph configuration file for Viva considering
  categorized resource utilization.
\verbatim
--cfg=viva/categorized:graph_categorized.plist
\endverbatim

\li <b>\c
viva/uncategorized
</b>:
  This option generates a graph configuration file for Viva considering
  uncategorized resource utilization.
\verbatim
--cfg=viva/uncategorized:graph_uncategorized.plist
\endverbatim

Please pass \verbatim --help-tracing \endverbatim to your simulator
for the updated list of tracing options.

\section tracing_tracing_example_parameters Case studies

Some scenarios that might help you decide which tracing options
you should use to analyze your simulator.

\li I want to trace the resource utilization of all hosts
and links of the platform, and my simulator <b>does not</b> use
the tracing API. For that, you can run an uncategorized trace
with the following parameters (it will work with <b>any</b> SimGrid
simulator):
\verbatim
./your_simulator \
          --cfg=tracing:1 \
          --cfg=tracing/uncategorized:1 \
          --cfg=tracing/filename:mytracefile.trace \
          --cfg=viva/uncategorized:uncat.plist
\endverbatim

\li I want to trace only a subset of my MSG (or SimDAG) tasks.
For that, you will need to create tracing categories using the
<b>TRACE_category (...)</b> function (as explained above),
and then assign your tasks to a previously declared category
using the <b>MSG_task_set_category (...)</b> function
(or <b>SD_task_set_category (...)</b> for SimDAG tasks). After
recompiling, run your simulator with the following parameters:
\verbatim
./your_simulator \
          --cfg=tracing:1 \
          --cfg=tracing/categorized:1 \
          --cfg=tracing/filename:mytracefile.trace \
          --cfg=viva/categorized:cat.plist
\endverbatim

\section tracing_tracing_example Example of Instrumentation

A simplified example using the mandatory tracing functions.

\verbatim
int main (int argc, char **argv)
{
  MSG_init (&argc, &argv);

  //(... after deployment ...)

  //note that category declaration must be called after MSG_create_environment
  TRACE_category_with_color ("request", "1 0 0");
  TRACE_category_with_color ("computation", "0.3 1 0.4");
  TRACE_category ("finalize");

  msg_task_t req1 = MSG_task_create("1st_request_task", 10, 10, NULL);
  msg_task_t req2 = MSG_task_create("2nd_request_task", 10, 10, NULL);
  msg_task_t req3 = MSG_task_create("3rd_request_task", 10, 10, NULL);
  msg_task_t req4 = MSG_task_create("4th_request_task", 10, 10, NULL);
  MSG_task_set_category (req1, "request");
  MSG_task_set_category (req2, "request");
  MSG_task_set_category (req3, "request");
  MSG_task_set_category (req4, "request");

  msg_task_t comp = MSG_task_create ("comp_task", 100, 100, NULL);
  MSG_task_set_category (comp, "computation");

  msg_task_t finalize = MSG_task_create ("finalize", 0, 0, NULL);
  MSG_task_set_category (finalize, "finalize");

  //(...)

  return 0;
}
\endverbatim

\section tracing_tracing_analyzing Analyzing SimGrid Simulation Traces

A SimGrid-based simulator, when executed with the correct parameters
(see above), creates a trace file in the Paje file format holding the
simulated behavior of the application or the platform. You have
several options to analyze this trace file:

- Dump its contents to a CSV-like format using `pj_dump` (see <a
  href="https://github.com/schnorr/pajeng/wiki/pj_dump">PajeNG's wiki
  on pj_dump</a> and more generally the <a
  href="https://github.com/schnorr/pajeng/">PajeNG suite</a>) and use
  gnuplot to plot resource usage, time spent on blocking/executing
  functions, and so on. You can filter the dump with `grep` and a
  suitable regular expression to keep only parts of the trace (for
  instance, only a subset of resources or processes).

- Derive statistics from trace metrics (the ones built into any
  SimGrid simulation, but also the metrics you injected in the trace
  using the TRACE module) using the <a
  href="http://www.r-project.org/">R project</a> and all its
  modules. You can also combine R with <a
  href="http://ggplot2.org/">ggplot2</a> to get a number of high
  quality plots from your simulation metrics. You need to `pj_dump`
  the contents of the SimGrid trace file to use R.

- Visualize the behavior of your simulation using classic space/time
  views (Gantt charts) provided by the <a
  href="https://github.com/schnorr/pajeng/">PajeNG suite</a> and any
  other tool that supports the <a
  href="http://paje.sourceforge.net/download/publication/lang-paje.pdf">Paje
  file format</a>. Consider this option if you need to understand the
  causality of your distributed simulation.

- Visualize the behavior of your simulation with treemaps (especially
  if your simulation has a platform with several thousand resources),
  provided by the <a href="http://github.com/schnorr/viva/">Viva</a>
  visualization tool. See <a
  href="https://github.com/schnorr/viva/wiki">Viva's wiki</a> for
  further details on what a treemap is and how to use it.

- Correlate the behavior of your simulator with the platform topology
  with an interactive, force-directed, and hierarchical graph
  visualization, provided by <a
  href="http://github.com/schnorr/viva/">Viva</a>. Check <a
  href="https://github.com/schnorr/viva/wiki">Viva's wiki</a> for
  further details. This <a
  href="http://hal.inria.fr/hal-00738321/">research report</a>,
  published at ISPASS 2013, has a detailed description of this
  visualization technique.

- You can also check our online <a
  href="http://simgrid.gforge.inria.fr/tutorials.html">tutorial
  section</a> that contains a dedicated tutorial with several
  suggestions on how to use the tracing infrastructure. Look for the
  SimGrid User::Visualization 101 tutorial.

- Ask for help on the <a
  href="mailto:simgrid-user@lists.gforge.inria.fr">simgrid-user@lists.gforge.inria.fr</a>
  mailing list, giving us a detailed explanation of what your
  simulator does and what kind of information you want to trace. You
  can also check the <a
  href="http://lists.gforge.inria.fr/pipermail/simgrid-user/">mailing
  list archive</a> for old messages regarding tracing and analysis.

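For the first option in the list above, the dumped lines can also be
filtered programmatically instead of with `grep`. The sketch below
parses one line of `pj_dump` output, under the assumption that variable
records have the layout
`Variable, <container>, <type>, <start>, <end>, <duration>, <value>`;
please check this layout against the pj_dump documentation for your
version. The `variable_record` structure and both function names are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// One "Variable" record of a pj_dump output (assumed column layout).
struct variable_record {
  char container[64]; // resource name, e.g. a host or a link
  char type[64];      // variable name, e.g. power_used
  double start, end, duration, value;
};

// Parses one line. Returns 1 if it is a Variable record in the assumed
// layout, and 0 for any other kind of line.
int parse_variable_line(const char *line, struct variable_record *out)
{
  assert(line != NULL && out != NULL);
  int fields = sscanf(line, "Variable, %63[^,], %63[^,], %lf, %lf, %lf, %lf",
                      out->container, out->type,
                      &out->start, &out->end, &out->duration, &out->value);
  return fields == 6;
}

// Returns 1 if the record describes the given variable type.
int has_type(const struct variable_record *record, const char *type)
{
  return strcmp(record->type, type) == 0;
}
```

A caller would typically read the dump line by line, skip the lines for
which `parse_variable_line` returns 0, and keep only the records whose
type or container matches the resources of interest.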
\subsection tracing_viva_analysis Viva Visualization Tool

This subsection describes some of the concepts regarding the <a
href="http://github.com/schnorr/viva/">Viva Visualization Tool</a> and
its relation with SimGrid traces. You should refer to Viva's website
for further details on all its visualization techniques.

\subsubsection tracing_viva_time_slice Time Slice

The analysis of a trace file using the tool always takes into account
the concept of the <em>time-slice</em>. This concept means that what
is being visualized on the screen is always calculated considering a
specific time frame, with its beginning and end timestamps. The
time-slice is configured by the user and can be changed dynamically
through the window called <em>Time Interval</em> that is opened
whenever a trace file is being analyzed. Users can select
the beginning and the size of the time slice.

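To make the time-slice idea more concrete, the sketch below computes
the value a variable would take over a slice defined by its beginning
and size. It assumes that the variable is piecewise constant and that
the slice value is its time-weighted mean; Viva's actual computation
may differ. The `sample` structure and the `time_slice_mean` function
are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stddef.h>

// One piece of a piecewise-constant trace variable:
// `value` holds on the interval [start, end).
struct sample {
  double start, end, value;
};

// Time-weighted mean of the variable over the slice
// [slice_start, slice_start + slice_size). Portions of the samples
// that fall outside the slice are ignored.
double time_slice_mean(const struct sample *samples, size_t count,
                       double slice_start, double slice_size)
{
  assert(slice_size > 0.0);
  double slice_end = slice_start + slice_size;
  double integral = 0.0;
  for (size_t i = 0; i < count; i++) {
    // Clip the sample to the slice boundaries.
    double lo = samples[i].start > slice_start ? samples[i].start : slice_start;
    double hi = samples[i].end < slice_end ? samples[i].end : slice_end;
    if (hi > lo)
      integral += samples[i].value * (hi - lo);
  }
  return integral / slice_size;
}
```

With a variable worth 10 on [0, 2) and 30 on [2, 4), the slice that
begins at 1 with size 2 covers one time unit of each value, so the
function returns 20.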
\subsubsection tracing_viva_graph Hierarchical Graph View

As stated above (see section \ref tracing_tracing_analyzing), one
possibility to analyze SimGrid traces is to use Viva's graph view with
a graph configuration to customize the graph according to the
traces. A valid graph configuration (we are using the non-XML <a
href="http://en.wikipedia.org/wiki/Property_list">Property List
Format</a> to describe the configuration) can be created for any
SimGrid-based simulator using the
<em>--cfg=viva/uncategorized:graph_uncategorized.plist</em> or
<em>--cfg=viva/categorized:graph_categorized.plist</em> (if the
simulator defines resource utilization categories) option when
executing the simulator.

\subsubsection basic_conf Basic Graph Configuration

The basic description of the configuration is as follows:

\verbatim
  node = (LINK, HOST, );
  edge = (HOST-LINK, LINK-HOST, LINK-LINK, );
\endverbatim

The nodes of the graph will be created based on the <i>node</i>
parameter, which in this case is the different <em>"HOST"</em>s and
<em>"LINK"</em>s of the platform used to simulate. The <i>edge</i>
parameter indicates that the edges of the graph will be created based
on the <em>"HOST-LINK"</em>s, <em>"LINK-HOST"</em>s, and
<em>"LINK-LINK"</em>s of the platform. After the definition of these
two parameters, the configuration must detail how the nodes
(<em>HOST</em>s and <em>LINK</em>s) should be drawn.

For that, the configuration must have an entry for each of
the types used. For <em>HOST</em>, as basic configuration, we have:

\verbatim
  HOST = {
    type = square;
    size = power;
    values = (power_used);
  };
\endverbatim

The parameter <em>size</em> indicates which variable from the trace
file will be used to define the size of the node HOST in the
visualization. If the simulation was executed with availability
traces, the size of the nodes will be changed according to these
traces. The parameter <em>type</em> indicates which geometrical shape
will be used to represent HOST, and the <em>values</em> parameter
indicates which values from the trace will be used to fill the shape.

For <em>LINK</em> we have:

\verbatim
  LINK = {
    type = rhombus;
    size = bandwidth;
    values = (bandwidth_used);
  };
\endverbatim

The same configuration parameters are used here: <em>type</em> (with a
rhombus), <em>size</em> (whose value comes from the trace's bandwidth
variable) and <em>values</em>.

\subsubsection custom_graph Customizing the Graph Representation

Viva can handle a customized graph representation based on
the variables present in the trace file. In the case of SimGrid, every
time a category is created for tasks, two variables in the trace file
are defined: one to indicate node utilization (how much power was used
by that task category), and another to indicate link utilization (how
much bandwidth was used by that category). For instance, if the user
declares a category named <i>request</i>, there will be variables
named <b>p</b><i>request</i> and <b>b</b><i>request</i> (<b>p</b>
for power and <b>b</b> for bandwidth). It is important to notice that
the variable <i>prequest</i> in this case is only available for HOST,
and <i>brequest</i> is only available for LINK. <b>Example</b>:
suppose there are two categories for tasks: request and computation. To
create a customized graph representation with a proportional
separation of host and link utilization, use as configuration for HOST
and LINK the following:

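\verbatim
  HOST = {
    type = square;
    size = power;
    values = (prequest, pcomputation);
  };
  LINK = {
    type = rhombus;
    size = bandwidth;
    values = (brequest, bcomputation);
  };
\endverbatim
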
This configuration enables the analysis of resource utilization by MSG
tasks through the identification of load-balancing issues and network
bottlenecks, for instance.
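
The proportional separation mentioned above can be pictured as follows.
This is only a sketch of the arithmetic, assuming that each entry of
<em>values</em> fills a share of the shape equal to its value divided
by the <em>size</em> variable; how Viva actually draws the shapes is
not specified here. The `fill_fractions` function is a hypothetical
name introduced for this illustration.

```c
#include <assert.h>
#include <stddef.h>

// Computes the share of a node taken by each entry of `values`,
// relative to `size` (for instance prequest and pcomputation relative
// to the power of a HOST). The shares are written to `fractions`,
// which must have room for `count` entries.
// Returns the share of the node that is left unfilled.
double fill_fractions(double size, const double *values, size_t count,
                      double *fractions)
{
  assert(size > 0.0);
  double used = 0.0;
  for (size_t i = 0; i < count; i++) {
    fractions[i] = values[i] / size;
    used += fractions[i];
  }
  return 1.0 - used;
}
```

For a host whose power is 100, with 25 used by the request category and
50 by the computation category, the shares are 0.25 and 0.5, and a
quarter of the shape stays unfilled.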