/*! @page introduction Introduction to SimGrid

[SimGrid](http://simgrid.gforge.inria.fr/) is a toolkit
that provides core functionalities for the simulation of distributed
applications in heterogeneous distributed environments.

The specific goal of the project is to facilitate research in the area of
distributed and parallel application scheduling on distributed computing
platforms ranging from simple networks of workstations to Computational
Grids.

The goal of this practical session is to illustrate various uses of
the MSG interface. To this end we will use the following simple setting:

> Assume we have a (possibly large) bunch of (possibly large) data to
> process, which originally resides on a server (a.k.a. master). For
> the sake of simplicity, we assume all input files require the same amount
> of computation. We assume the server can be helped by a (possibly
> large) set of worker machines. What is the best way to organize the
> computations?

Although this looks like a very simple setting, it raises several
interesting questions:

- Which algorithm should the master use to send workload?

  The most obvious algorithm would be to send tasks to workers in a
  round-robin fashion. This is the initial code we provide you.

  A less obvious but probably more efficient one would be to set up
  a request mechanism where clients first ask for tasks, which allows
  the server to decide which request to answer and possibly to send
  the tasks to the fastest machines. Maybe you can think of a
  smarter mechanism.

- How many tasks should the client ask for?

  Indeed, if we set up a request mechanism and workers only
  send a request when they have no more tasks to process, they are
  likely to be poorly exploited since they will have to wait for the
  master to consider their request and for the input data to be
  transferred. A client should thus probably request a pool of tasks,
  but if it requests too many tasks, it is likely to lead to a poor
  load balance.

- How does the quality of such an algorithm depend on the platform
  characteristics? On the task characteristics?

  Whenever the input communication time is very small compared to
  the processing time and workers are homogeneous, it is likely that the
  round-robin algorithm performs very well. Would this still hold true
  when the transfer time is not negligible and the platform is, say,
  a volunteer computing system?

- The network topology interconnecting the master and the workers
  may be quite complicated. How does such a topology impact the
  results?

  When data transfers are the bottleneck, it is likely that a good
  model of the platform becomes essential, in which case you may
  want to be able to account for complex platform topologies.

- Do the algorithms depend on perfect knowledge of this
  topology?

  Should we still use a flat master-worker deployment or should we
  use a hierarchical one?

- How sensitive is such an algorithm to external workload variation?

  What if bandwidth, latency and power can vary with no warning?
  Shouldn't you study whether your algorithm is sensitive to such
  variations?

- Although an algorithm may be more efficient than another, how
  does it interfere with other applications?

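The round-robin policy mentioned in the first question can be
sketched in a few lines of plain C. This is only an illustration, not
the code shipped with the tutorial: `round_robin_worker` and
`round_robin_counts` are hypothetical helper names, and no MSG call is
involved.

```c
#include <stddef.h>

// Hypothetical helper: index of the worker that receives task number
// task_index. worker_count must be greater than zero.
size_t round_robin_worker(size_t task_index, size_t worker_count)
{
  return task_index % worker_count;
}

// Adds to counts[w] the number of tasks worker w receives under
// round-robin. counts must hold worker_count entries, initialized by
// the caller.
void round_robin_counts(size_t task_count, size_t worker_count, size_t *counts)
{
  for (size_t t = 0; t < task_count; t++)
    counts[round_robin_worker(t, worker_count)]++;
}
```

With 20 tasks and 5 workers, every worker receives 4 tasks regardless
of its speed, which is why the questions above about heterogeneity
and transfer time matter.
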
As you can see, this very simple setting may need to evolve way
beyond what you initially imagined.

<blockquote> Premature optimization is the root of all evil. -- D.E.Knuth</blockquote>

Furthermore, writing your own simulator is much harder than you
may imagine. This is why you should rely on an established and flexible
one.

The following figure is a screenshot of [triva][fn:1] visualizing a [SimGrid
simulation][fn:2] of two master-worker applications (one in light gray and
the other in dark gray) running concurrently and showing resource
usage over a long period of time.

![Test](./sc3-description.png)

A lot of information on how to install and use SimGrid is
available in the [online documentation][fn:4] and in the tutorials:

- http://simgrid.gforge.inria.fr/tutorials/simgrid-use-101.pdf
- http://simgrid.gforge.inria.fr/tutorials/simgrid-tracing-101.pdf
- http://simgrid.gforge.inria.fr/tutorials/simgrid-platf-101.pdf

## Installing SimGrid

    sudo apt-get install simgrid

This tutorial requires at least SimGrid 3.8, so you may need to get
the [Debian package](http://packages.debian.org/unstable/main/simgrid). Here is a shortcut:

- AMD64: http://ftp.de.debian.org/debian/pool/main/s/simgrid/simgrid_3.8.1-2_amd64.deb
- i386: http://ftp.de.debian.org/debian/pool/main/s/simgrid/simgrid_3.8.1-2_i386.deb

Then install the downloaded package:

    sudo dpkg -i simgrid_3.8*.deb

[Viva][fn:1] will be useful to make fancy graph or treemap
visualizations and to get a better understanding of simulations. You
will first need to install pajeng:

    sudo apt-get install git cmake build-essential libqt4-dev libboost-dev freeglut3-dev
    git clone https://github.com/schnorr/pajeng.git
    cd pajeng && mkdir -p build && cd build && cmake ../ -DCMAKE_INSTALL_PREFIX=$HOME && make -j install

Then you can install viva:

    sudo apt-get install libboost-dev libconfig++-dev libconfig8-dev libgtk2.0-dev freeglut3-dev
    git clone https://github.com/schnorr/viva.git
    cd viva && mkdir -p build_graph && cd build_graph && cmake ../ -DTUPI_LIBRARY=ON -DVIVA=ON -DCMAKE_INSTALL_PREFIX=$HOME && make -j install

[Paje][fn:5] provides a Gantt-chart visualization.

    sudo apt-get install paje.app

[Vite][fn:6] also provides a Gantt-chart visualization.

    sudo apt-get install vite

## Setting up and Compiling.

The corresponding archive with all source files and platform files
can be obtained [here](http://simgrid.gforge.inria.fr/tutorials/msg-tuto/msg-tuto.tgz).

As you can see, there is already a nice Makefile that compiles
everything for you. Now that the tiny example has been compiled, it
can easily be run as follows:

    ./masterworker0 platforms/platform.xml deployment0.xml 2>&1

If you create a single self-contained C file named `foo.c`, the
corresponding program will simply be compiled and linked with
SimGrid.

For a fancier output, you can try:

    ./masterworker0 platforms/platform.xml deployment0.xml 2>&1 | simgrid-colorizer

For a really fancy output, you should use [viva/triva][fn:1]:

    ./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:1 \
        --cfg=tracing/uncategorized:1 --cfg=viva/uncategorized:uncat.plist
    LANG=C ; viva simgrid.trace uncat.plist

For a more classical Gantt-chart visualization, you can produce a
[Paje][fn:5] trace:

    ./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:1 \
        --cfg=tracing/msg/process:1
    LANG=C ; Paje simgrid.trace

Alternatively, you can use [vite][fn:6].

    ./masterworker0 platforms/platform.xml deployment0.xml --cfg=tracing:1 \
        --cfg=tracing/msg/process:1 --cfg=tracing/basic:1

## Getting Rid of Workers in the Deployment File

In the previous example, the deployment file `deployment0.xml`
is tightly connected to the platform file `platform.xml` and a
worker process is launched on each host:

    <?xml version='1.0'?>
    <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
    <platform version="3">
      <!-- The master process (with some arguments) -->
      <process host="Tremblay" function="master">
        <argument value="20"/>       <!-- Number of tasks -->
        <argument value="50000000"/> <!-- Computation size of tasks -->
        <argument value="1000000"/>  <!-- Communication size of tasks -->
        <argument value="Jupiter"/>  <!-- First worker -->
        <argument value="Fafard"/>   <!-- Second worker -->
        <argument value="Ginette"/>  <!-- Third worker -->
        <argument value="Bourassa"/> <!-- Last worker -->
        <argument value="Tremblay"/> <!-- Me! I can work too! -->
      </process>
      <!-- The worker process (with no argument) -->
      <process host="Tremblay" function="worker" on_failure="RESTART"/>
      <process host="Jupiter" function="worker" on_failure="RESTART"/>
      <process host="Fafard" function="worker" on_failure="RESTART"/>
      <process host="Ginette" function="worker" on_failure="RESTART"/>
      <process host="Bourassa" function="worker" on_failure="RESTART"/>
    </platform>

This is fine as long as the platform is rather small, but it becomes
painful with larger platforms. Instead, modify the simulator
`masterworker0.c` into `masterworker1.c` so that the master
launches a worker process on all the other machines at startup. The
new deployment file `deployment1.xml` should thus simply be:

    <?xml version='1.0'?>
    <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
    <platform version="3">
      <!-- The master process (with some arguments) -->
      <process host="Tremblay" function="master">
        <argument value="20"/>       <!-- Number of tasks -->
        <argument value="50000000"/> <!-- Computation size of tasks -->
        <argument value="1000000"/>  <!-- Communication size of tasks -->
      </process>
    </platform>

To this end you may need the following MSG functions, whose
behavior is described in the [online documentation](http://simgrid.gforge.inria.fr/simgrid/3.8.1/ref_guide/html/index.html) (hint: use the
search field to directly access the function you are looking for):

    int MSG_get_host_number(void);
    xbt_dynar_t MSG_hosts_as_dynar(void);
    void *xbt_dynar_to_array(xbt_dynar_t dynar);
    msg_process_t MSG_process_create(const char *name, xbt_main_func_t code,
                                     void *data, msg_host_t host);

Note that not launching a worker on the master host may avoid bugs
later, so you probably want to remove it from the host list.

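Removing the master from the host list can be sketched on a plain
array of host names. This is a minimal illustration under stated
assumptions: `exclude_host` is a hypothetical helper, and in actual
MSG code the names would presumably come from `MSG_host_get_name`
applied to the hosts returned by `MSG_hosts_as_dynar`.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: compacts hosts[] in place, dropping every entry
// whose name equals master. Returns the number of entries kept.
size_t exclude_host(const char **hosts, size_t count, const char *master)
{
  size_t kept = 0;
  for (size_t i = 0; i < count; i++) {
    if (strcmp(hosts[i], master) != 0)
      hosts[kept++] = hosts[i];
  }
  return kept;
}
```

The master can then iterate over the first `kept` entries and create
one worker process per remaining host.
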
The `data` argument of `MSG_process_create` can be used to pass
a channel name that will be private between the master
and a worker (e.g., `master_name:worker_name`). Including the
`master_name` in the channel name makes it easy to have several
masters and a worker per master on each machine. To this end, you
may need to use the following functions:

    msg_host_t MSG_host_self(void);
    const char *MSG_host_get_name(msg_host_t host);
    msg_process_t MSG_process_self(void);
    void *MSG_process_get_data(msg_process_t process);

Again, you should check the [online documentation](http://simgrid.gforge.inria.fr/simgrid/3.8.1/ref_guide/html/index.html)
for more information. If you are not very familiar with string
manipulation in C, you may want to use the following functions:

    char *strcpy(char *dest, const char *src);
    char *strcat(char *dest, const char *src);

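Since `strcpy` and `strcat` do not check the size of the destination
buffer, a bounded alternative may be safer for building the
`master_name:worker_name` channel name. The sketch below uses the
standard `snprintf`; the helper name `build_channel_name` is
hypothetical and not part of MSG.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper: writes "master:worker" into dest.
// Returns 0 on success, -1 if the result does not fit in size bytes.
int build_channel_name(char *dest, size_t size,
                       const char *master, const char *worker)
{
  int written = snprintf(dest, size, "%s:%s", master, worker);
  return (written >= 0 && (size_t)written < size) ? 0 : -1;
}
```

For instance, calling it with `"Tremblay"` and `"Jupiter"` on a
32-byte buffer yields `Tremblay:Jupiter`, while a buffer that is too
small is reported as an error instead of being overrun.
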
## Setting up a Time Limit Mechanism

In the current version, the number of tasks is defined in the
master arguments. Hence, tasks are created at the very beginning of
the simulation. Instead, create tasks as needed and provide a time
limit indicating when the master stops sending tasks. To this end, you will
obviously need to know what time it is (see the [reference guide][fn:7]):

    double MSG_get_clock(void);

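The shape of a time-limited sending loop can be sketched without any
simulator. Everything here is an assumption made for illustration:
the `clock_fn` and `send_fn` callbacks, `send_tasks_until`, and the
`toy_sim` structure are hypothetical. In MSG code, the clock callback
would presumably wrap `MSG_get_clock()` and the send callback would
create and send one task.

```c
// Callbacks standing in for the simulator (hypothetical types).
typedef double (*clock_fn)(void *ctx);
typedef void (*send_fn)(void *ctx);

// Sends tasks one at a time until the clock reaches timeout.
// Returns the number of tasks sent.
long send_tasks_until(double timeout, clock_fn now, send_fn send_task, void *ctx)
{
  long sent = 0;
  while (now(ctx) < timeout) {
    send_task(ctx);
    sent++;
  }
  return sent;
}

// Toy stand-in for a simulation: each send advances the clock by a
// fixed cost.
struct toy_sim {
  double clock;
  double send_cost;
};

double toy_now(void *ctx)
{
  return ((struct toy_sim *)ctx)->clock;
}

void toy_send(void *ctx)
{
  struct toy_sim *sim = ctx;
  sim->clock += sim->send_cost;
}
```

With a timeout of 3600 and a toy cost of 100 per send, the loop sends
36 tasks and stops when the clock reaches the limit. The number of
tasks is thus a result of the run rather than an input.
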
Otherwise, a quite effective way of terminating the simulation
would be to use one of the following functions (also described in the
[reference guide][fn:7]):

    void MSG_process_kill(msg_process_t process);
    int MSG_process_killall(int reset_PIDs);

Anyway, the new deployment file `deployment2.xml` should thus look
like this:

    <?xml version='1.0'?>
    <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
    <platform version="3">
      <process host="Tremblay" function="master">
        <argument value="3600"/>     <!-- Simulation timeout -->
        <argument value="50000000"/> <!-- Computation size of tasks -->
        <argument value="1000000"/>  <!-- Communication size of tasks -->
      </process>
    </platform>

It may also be a good idea to turn most of the `XBT_INFO` calls into
`XBT_DEBUG` (keeping as `XBT_INFO`, e.g., the information on the total
number of tasks processed). These debug messages can be activated as follows:

    ./masterworker2 platforms/platform.xml deployment2.xml --log=msg_test.thres:debug

## Using the Tracing Mechanism

SimGrid can trace all resource consumption and the outcome can be
displayed with viva as illustrated in the section "Setting up and
Compiling". However, when several
masters are deployed, it is hard to understand what happens.

    <?xml version='1.0'?>
    <!DOCTYPE platform SYSTEM "http://simgrid.gforge.inria.fr/simgrid.dtd">
    <platform version="3">
      <process host="Tremblay" function="master">
        <argument value="3600"/>     <!-- Simulation timeout -->
        <argument value="50000000"/> <!-- Computation size of tasks -->
        <argument value="10"/>       <!-- Communication size of tasks -->
      </process>
      <process host="Fafard" function="master">
        <argument value="3600"/>     <!-- Simulation timeout -->
        <argument value="50000000"/> <!-- Computation size of tasks -->
        <argument value="10"/>       <!-- Communication size of tasks -->
      </process>
      <process host="Jupiter" function="master">
        <argument value="3600"/>     <!-- Simulation timeout -->
        <argument value="50000000"/> <!-- Computation size of tasks -->
        <argument value="10"/>       <!-- Communication size of tasks -->
      </process>
    </platform>

So let's use categories to track more precisely who does what and
when:

    void TRACE_category(const char *category);
    void MSG_task_set_category(msg_task_t task, const char *category);

The outcome can then be visualized as follows:

    ./masterworker3 platforms/platform.xml deployment3.xml --cfg=tracing:1 \
        --cfg=tracing/categorized:1 --cfg=viva/categorized:viva_cat.plist
    LANG=C; viva simgrid.trace viva_cat.plist

Right now, you should realize that nothing is behaving as you
expect. Most workers are idle even though input data are ridiculously
small and there are several masters deployed on the platform. Using a
Gantt-chart visualization may help:

    ./masterworker3 platforms/platform.xml deployment3.xml --cfg=tracing:1 \
        --cfg=tracing/msg/process:1
    LANG=C; Paje simgrid.trace

OK, so it should now be obvious that round robin is actually
a poor strategy here.

## Improving the Scheduling

Instead of round-robin scheduling, let's implement a first-come
first-served mechanism. To this end, workers need to send a tiny
request first. A possible way to implement such a request with MSG
is to send, on a specific channel (e.g., the name of the master),
a task with payload 0 whose attached data is the worker
name. This way, the master can keep track of which workers are idle.

To know whether it has pending requests, the master can use the
following function (see the [reference guide][fn:7]):

    int MSG_task_listen(const char *alias);

If so, it should get the request and push the corresponding host
into a dynar so that it can later be retrieved when sending a
task:

    xbt_dynar_t xbt_dynar_new(const unsigned long elm_size,
                              void_f_pvoid_t const free_f);
    void xbt_dynar_push(xbt_dynar_t const dynar, const void *src);
    void xbt_dynar_shift(xbt_dynar_t const dynar, void *const dst);
    unsigned long xbt_dynar_length(const xbt_dynar_t dynar);

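The first-come first-served bookkeeping described above amounts to a
FIFO of idle workers. The sketch below shows that idea with a
fixed-size ring buffer in plain C. It is not MSG code: the
`request_queue` type, `queue_push`, `queue_shift`, and `MAX_PENDING`
are hypothetical names, and in the tutorial a dynar filled with
`xbt_dynar_push` and drained with `xbt_dynar_shift` would play this
role.

```c
#include <stddef.h>

#define MAX_PENDING 64

// Fixed-size FIFO of worker names (hypothetical stand-in for a dynar).
struct request_queue {
  const char *names[MAX_PENDING];
  size_t head;   // index of the oldest pending request
  size_t count;  // number of pending requests
};

// Records a request. Returns 0 on success, -1 if the queue is full.
int queue_push(struct request_queue *q, const char *worker)
{
  if (q->count == MAX_PENDING)
    return -1;
  q->names[(q->head + q->count) % MAX_PENDING] = worker;
  q->count++;
  return 0;
}

// Removes and returns the oldest request, or NULL if none is pending.
const char *queue_shift(struct request_queue *q)
{
  if (q->count == 0)
    return NULL;
  const char *worker = q->names[q->head];
  q->head = (q->head + 1) % MAX_PENDING;
  q->count--;
  return worker;
}
```

The master pushes a worker name each time a request arrives and shifts
one each time it has a task to send, so workers are served in the
order in which they asked.
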
As you will soon realize, such simple mechanisms quickly lead to
simple deadlocks. They can easily be removed with a
simple polling mechanism, hence the need for the following
function:

    msg_error_t MSG_process_sleep(double nb_sec);

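A polling loop of this kind can be sketched as follows. The callback
types, `wait_for_request`, and the `toy_mailbox` structure are all
hypothetical and only serve to make the example self-contained. In MSG
code, the check would presumably be a call to `MSG_task_listen` and
the pause a call to `MSG_process_sleep`.

```c
// Callbacks standing in for the simulator (hypothetical types).
typedef int (*listen_fn)(void *ctx);               // nonzero if a request is pending
typedef void (*sleep_fn)(void *ctx, double secs);  // lets simulated time pass

// Polls at most max_polls times, sleeping period seconds between checks.
// Returns 1 as soon as a request is pending, 0 if none showed up.
int wait_for_request(listen_fn has_request, sleep_fn sleep_for, void *ctx,
                     double period, int max_polls)
{
  for (int i = 0; i < max_polls; i++) {
    if (has_request(ctx))
      return 1;
    sleep_for(ctx, period);
  }
  return 0;
}

// Toy stand-in: a request becomes visible once the clock reaches arrival.
struct toy_mailbox {
  double clock;
  double arrival;
};

int toy_has_request(void *ctx)
{
  struct toy_mailbox *box = ctx;
  return box->clock >= box->arrival;
}

void toy_sleep(void *ctx, double secs)
{
  ((struct toy_mailbox *)ctx)->clock += secs;
}
```

Because the caller never blocks indefinitely on a receive, it can do
something else (for example, stop when the time limit is reached) when
no request arrives. The polling period trades reactivity against the
number of wake-ups.
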
As you should quickly realize, on the simple previous example, this
will double the throughput of the platform but will be quite
ineffective when the input size of the tasks is no longer negligible.

From this, many things can easily be added. For example, you could:

- add a performance measurement mechanism;
- enable the master to make smart scheduling choices using
  measurement information;
- allow workers to have several pending requests so as to overlap
  communication and computation as much as possible;

## Using More Elaborate Platforms

SimGrid offers a rather powerful platform modeling mechanism. The
`src/platform/` directory contains a variety of platforms ranging
from simple ones to quite elaborate ones. Combined with a good
visualization tool to ensure your simulation is meaningful, they
allow you to study to what extent your algorithm scales.

What is the largest number of tasks requiring 50e6 flops and 1e5
bytes that you manage to distribute and process in one hour on
`g5k.xml` (you should use `deployment_general.xml`)?

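To judge your result, a back-of-the-envelope upper bound may help.
The sketch below assumes a single master whose outgoing link carries
all input data, and it ignores latency, contention, and topology, so
it is only an optimistic estimate. `max_tasks_upper_bound` is a
hypothetical helper, and the aggregate speed and bandwidth must be
taken from the platform file you actually use.

```c
// Optimistic bound on the number of tasks processed in `seconds`:
// limited by the total compute speed of the workers and by the
// master's outgoing bandwidth, whichever is smaller.
double max_tasks_upper_bound(double seconds,
                             double total_flops_per_sec, double task_flops,
                             double master_bytes_per_sec, double task_bytes)
{
  double compute_bound = seconds * total_flops_per_sec / task_flops;
  double network_bound = seconds * master_bytes_per_sec / task_bytes;
  return compute_bound < network_bound ? compute_bound : network_bound;
}
```

As a purely illustrative example (these figures are not taken from
`g5k.xml`), with an aggregate speed of 1e11 flop/s and a master link
of 1.25e8 bytes/s, one hour gives a compute bound of 7.2e6 tasks and a
network bound of 4.5e6 tasks, so the network would be the limiting
resource.
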
# Points to improve for the next time

- Propose equivalent exercises and skeleton in Java.
- Propose a VirtualBox image with everything (simgrid, paje, viva,
- Ease the installation on Mac OS X (binary installer) and
- Explain that programming in C or Java and having a working
  development environment is a prerequisite.

[fn:1]: http://triva.gforge.inria.fr/index.html
[fn:2]: http://hal.inria.fr/inria-00529569
[fn:3]: http://hal.inria.fr/hal-00738321
[fn:4]: http://simgrid.gforge.inria.fr/documentation.html
[fn:5]: http://paje.sourceforge.net/
[fn:6]: http://vite.gforge.inria.fr/
[fn:7]: http://simgrid.gforge.inria.fr/simgrid/3.8.1/ref_guide/html/index.html