.. This file has "irst" as an extension to ensure that it's not parsed by Sphinx as is. Instead, it's included in another file that is parsed.
This tutorial presents how to perform faithful IO experiments in
SimGrid. It is based on the paper "Adding Storage Simulation
Capacities to the SimGrid Toolkit: Concepts, Models, and API".

The paper presents a series of experiments to analyze the performance
of IO operations (read/write) on different kinds of disks (SATA, SAS,
SSD). In this tutorial, we present a detailed example of how to
extract experimental data to simulate: i) performance degradation
with concurrent operations (Fig. 8 in the paper) and ii) variability
in IO operations (Fig. 5 to 7).

- Link for the paper: `https://hal.inria.fr/hal-01197128 <https://hal.inria.fr/hal-01197128>`_

- Link for the data: `https://figshare.com/articles/dataset/Companion_of_the_SimGrid_storage_modeling_article/1175156 <https://figshare.com/articles/dataset/Companion_of_the_SimGrid_storage_modeling_article/1175156>`_
- The purpose of this document is to illustrate how we can
  extract data from experiments and inject it into SimGrid. However, the
  data shown on this page may **not** reflect reality.

- You must run similar experiments on your hardware to get realistic
  data for your context.

- SimGrid has been in active development since the paper was released in
  2015, so the XML description used in the paper may have evolved,
  and MSG has been superseded by S4U since then.
A Dockerfile is available in ``docs/source/tuto_disk``. It allows you to
re-run this tutorial. For that, build the image and run the container:

- ``docker build -t tuto_disk .``

- ``docker run -it tuto_disk``
Analyzing the experimental data
===============================

We start by analyzing and extracting the real data available.
We use a special method to create non-uniform histograms to represent
the noise in IO operations.

As the library could not be installed properly, the important methods were
copied here from: `https://rdrr.io/github/dlebauer/pecan-priors/src/R/plots.R <https://rdrr.io/github/dlebauer/pecan-priors/src/R/plots.R>`_
Some initial configurations/list of packages. Use
``suppressPackageStartupMessages()`` to eliminate package startup messages.

::

   Attaching package: 'dplyr'

   The following objects are masked from 'package:plyr':

       arrange, count, desc, failwith, id, mutate, rename, summarise,
       summarize

   The following objects are masked from 'package:stats':

       filter, lag

   The following objects are masked from 'package:base':

       intersect, setdiff, setequal, union

   Attaching package: 'gridExtra'

   The following object is masked from 'package:dplyr':

       combine
This was copied from the ``sg_storage_ccgrid15.org`` available in the
figshare companion of the paper. Before executing this code, please download and
decompress the appropriate file.

.. code-block:: shell

   curl -O -J -L "https://ndownloader.figshare.com/files/1928095"
Preparing the data for the variability analysis.
.. code-block:: r

   clean_up <- function (df, infra){
     names(df) <- c("Hostname","Date","DirectIO","IOengine","IOscheduler","Error","Operation","Jobs","BufferSize","FileSize","Runtime","Bandwidth","BandwidthMin","BandwidthMax","Latency","LatencyMin","LatencyMax","IOPS")
     df = subset(df, Error == "0")
     df = subset(df, DirectIO == "1")
     df <- merge(df, infra, by = "Hostname")
     df$Hostname = sapply(strsplit(df$Hostname, "[.]"), "[", 1)
     df$HostModel = paste(df$Hostname, df$Model, sep = " - ")
     df$Duration = df$Runtime/1000 # fio outputs runtime in msec, we want to display seconds
     df$Size = df$FileSize/1024/1024
     df = subset(df, Duration != 0.000)
     df$Bwi = df$Duration/df$Size
     df[df$Operation == "read",]$Operation <- "Read"
     df[df$Operation == "write",]$Operation <- "Write"
     df # return the cleaned data frame
   }
.. code-block:: r

   grenoble <- read.csv('./bench/grenoble.csv', header=FALSE, sep=";", stringsAsFactors=FALSE)
   luxembourg <- read.csv('./bench/luxembourg.csv', header=FALSE, sep=";", stringsAsFactors=FALSE)
   nancy <- read.csv('./bench/nancy.csv', header=FALSE, sep=";", stringsAsFactors=FALSE)
   all <- rbind(grenoble, nancy, luxembourg)
   infra <- read.csv('./bench/infra.csv', header=FALSE, sep=";", stringsAsFactors=FALSE)
   names(infra) <- c("Hostname","Model","DiskSize")

   all = clean_up(all, infra)
   griffon = subset(all, grepl("^griffon", Hostname))
   griffon$Cluster <- "Griffon (SATA II)"
   edel = subset(all, grepl("^edel", Hostname))
   edel$Cluster <- "Edel (SSD)"

   df = rbind(griffon[griffon$Jobs=="1" & griffon$IOscheduler=="cfq",],
              edel[edel$Jobs=="1" & edel$IOscheduler=="cfq",])
   # Get rid of the 64 GB disks of Edel as they behave differently (used to be "edel-51")
   df = df[!(grepl("^Edel", df$Cluster) & df$DiskSize=="64 GB"),]
Preparing the data for the concurrency analysis.
.. code-block:: r

   dfc = rbind(griffon[griffon$Jobs>1 & griffon$IOscheduler=="cfq",],
               edel[edel$Jobs>1 & edel$IOscheduler=="cfq",])
   dfc2 = rbind(griffon[griffon$Jobs==1 & griffon$IOscheduler=="cfq",],
                edel[edel$Jobs==1 & edel$IOscheduler=="cfq",])
   dfc = rbind(dfc, dfc2[sample(nrow(dfc2), size=200),])

   dd <- data.frame(Hostname = NA, #tmpl$Hostname,
                    Date = NA, #tmpl$Date,
                    DirectIO = NA,
                    IOengine = NA,
                    IOscheduler = NA,
                    Error = NA,
                    Operation = NA, #tmpl$Operation,
                    Jobs = NA, # #d$nb.of.concurrent.access,
                    BufferSize = NA, #d$bs,
                    FileSize = NA, #d$size,
                    Runtime = NA,
                    Bandwidth = NA,
                    BandwidthMin = NA,
                    BandwidthMax = NA,
                    Latency = NA,
                    LatencyMin = NA,
                    LatencyMax = NA,
                    IOPS = NA,
                    Model = NA, #tmpl$Model,
                    DiskSize = NA, #tmpl$DiskSize,
                    HostModel = NA,
                    Duration = NA, #d$time,
                    Size = NA,
                    Bwi = NA,
                    Cluster = NA) #tmpl$Cluster)

   dd$Size = dd$FileSize/1024/1024
   dd$Bwi = dd$Duration/dd$Size
.. code-block:: r

   # Let's get rid of small files!
   dfc = subset(dfc, Size >= 10)
   # Let's get rid of the 64 GB edel disks
   dfc = dfc[!(grepl("^Edel", dfc$Cluster) & dfc$DiskSize=="64 GB"),]

   dfc$TotalSize = dfc$Size * dfc$Jobs
   dfc$BW = dfc$TotalSize / dfc$Duration
   dfc = dfc[dfc$BW >= 20,] # get rid of one point that is typically an outlier and does not make sense
.. code-block:: r

   dfc$method = ""
   dfc[dfc$Cluster=="Edel (SSD)" & dfc$Operation=="Read",]$method = "loess"
   dfc[dfc$Cluster=="Edel (SSD)" & dfc$Operation=="Write",]$method = "lm"
   dfc[dfc$Cluster=="Edel (SSD)" & dfc$Operation=="Write" & dfc$Jobs==1,]$method = ""
   dfc[dfc$Cluster=="Griffon (SATA II)" & dfc$Operation=="Write",]$method = "lm"
   dfc[dfc$Cluster=="Griffon (SATA II)" & dfc$Operation=="Write" & dfc$Jobs==1,]$method = ""
.. code-block:: r

   dfd = dfc[dfc$Operation=="Write" & dfc$Jobs==1 &
             (dfc$Cluster %in% c("Griffon (SATA II)", "Edel (SSD)")),]
   dfd = ddply(dfd, c("Cluster","Operation","Jobs","DiskSize"), summarize,
               mean = mean(BW), num = length(BW), sd = sd(BW))

   dfd$ci = 2*dfd$sd/sqrt(dfd$num)

   dfrange = ddply(dfc, c("Cluster","Operation","DiskSize"), summarize,
   dfrange = ddply(dfrange, c("Cluster","DiskSize"), mutate,
Griffon (SATA II)
-----------------

Modeling resource sharing w/ concurrent access
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This figure presents the overall performance of IO operations with
concurrent access to the disk. Note that the image differs
from the one in the paper; we would probably need to further clean the
available data to obtain exactly the same results.
.. code-block:: r

   ggplot(data=dfc, aes(x=Jobs, y=BW, color=Operation)) + theme_bw() +
     geom_point(alpha=.3) +
     geom_point(data=dfrange, size=0) +
     facet_wrap(Cluster~Operation, ncol=2, scale="free_y") +
     geom_smooth(data=dfc[dfc$method=="loess",], color="black", method=loess, se=TRUE, fullrange=T) +
     geom_smooth(data=dfc[dfc$method=="lm",], color="black", method=lm, se=TRUE) +
     geom_point(data=dfd, aes(x=Jobs, y=BW), color="black", shape=21, fill="white") +
     geom_errorbar(data=dfd, aes(x=Jobs, ymin=BW-ci, ymax=BW+ci), color="black", width=.6) +
     xlab("Number of concurrent operations") + ylab("Aggregated Bandwidth (MiB/s)") +
     guides(color=FALSE) + xlim(0,NA) + ylim(0,NA)
.. image:: tuto_disk/fig/griffon_deg.png
Getting the read data for Griffon, from 1 to 15 concurrent reads.

.. code-block:: r

   IO_INFO = list()
   deg_griffon = dfc %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Read")
   model = lm(BW~Jobs, data = deg_griffon)
   IO_INFO[["griffon"]][["degradation"]][["read"]] = predict(model, data.frame(Jobs=seq(1,15)))
   toJSON(IO_INFO, pretty = TRUE)

::

   "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575]
Same for the write operations.

.. code-block:: r

   deg_griffon = dfc %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs > 2)
   mean_job_1 = dfc %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs == 1) %>% summarize(mean = mean(BW))
   model = lm(BW~Jobs, data = deg_griffon)
   IO_INFO[["griffon"]][["degradation"]][["write"]] = c(mean_job_1$mean, predict(model, data.frame(Jobs=seq(2,15))))
   toJSON(IO_INFO, pretty = TRUE)

::

   "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575],
   "write": [49.4576, 26.5981, 27.7486, 28.8991, 30.0495, 31.2, 32.3505, 33.501, 34.6515, 35.8019, 36.9524, 38.1029, 39.2534, 40.4038, 41.5543]
Modeling read/write bandwidth variability
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Fig. 5 in the paper presents the noise in the read/write operations on
the Griffon SATA disk.

The paper uses regular histograms to illustrate the distribution of the
effective bandwidth. However, in this tutorial, we use dhist
(`https://rdrr.io/github/dlebauer/pecan-priors/man/dhist.html <https://rdrr.io/github/dlebauer/pecan-priors/man/dhist.html>`_) to obtain
more precise information about the highly dense areas around the mean.
First, we present the histogram for read operations.

.. code-block:: r

   griffon_read = df %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Read") %>% select(Bwi)
   dhist(1/griffon_read$Bwi)

.. image:: tuto_disk/fig/griffon_read_dhist.png
Saving it to be exported in JSON format.

.. code-block:: r

   griffon_read_dhist = dhist(1/griffon_read$Bwi, plot=FALSE)
   IO_INFO[["griffon"]][["noise"]][["read"]] = c(breaks=list(griffon_read_dhist$xbr), heights=list(unclass(griffon_read_dhist$heights)))
   IO_INFO[["griffon"]][["read_bw"]] = mean(1/griffon_read$Bwi)
   toJSON(IO_INFO, pretty = TRUE)

::

   In hist.default(x, breaks = cut.pt, plot = FALSE, probability = TRUE) :
     argument 'probability' is not made use of

   "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575],
   "write": [49.4576, 26.5981, 27.7486, 28.8991, 30.0495, 31.2, 32.3505, 33.501, 34.6515, 35.8019, 36.9524, 38.1029, 39.2534, 40.4038, 41.5543]

   "breaks": [39.257, 51.3413, 60.2069, 66.8815, 71.315, 74.2973, 80.8883, 95.1944, 109.6767, 125.0231, 140.3519, 155.6807, 171.0094, 186.25],
   "heights": [15.3091, 41.4578, 73.6826, 139.5982, 235.125, 75.3357, 4.1241, 3.3834, 0, 0.0652, 0.0652, 0.0652, 0.3937]
Same analysis for the write operations.

.. code-block:: r

   griffon_write = df %>% filter(grepl("^Griffon", Cluster)) %>% filter(Operation == "Write") %>% select(Bwi)
   dhist(1/griffon_write$Bwi)

.. image:: tuto_disk/fig/griffon_write_dhist.png

.. code-block:: r

   griffon_write_dhist = dhist(1/griffon_write$Bwi, plot=FALSE)
   IO_INFO[["griffon"]][["noise"]][["write"]] = c(breaks=list(griffon_write_dhist$xbr), heights=list(unclass(griffon_write_dhist$heights)))
   IO_INFO[["griffon"]][["write_bw"]] = mean(1/griffon_write$Bwi)
   toJSON(IO_INFO, pretty = TRUE)

::

   In hist.default(x, breaks = cut.pt, plot = FALSE, probability = TRUE) :
     argument 'probability' is not made use of

   "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575],
   "write": [49.4576, 26.5981, 27.7486, 28.8991, 30.0495, 31.2, 32.3505, 33.501, 34.6515, 35.8019, 36.9524, 38.1029, 39.2534, 40.4038, 41.5543]

   "breaks": [39.257, 51.3413, 60.2069, 66.8815, 71.315, 74.2973, 80.8883, 95.1944, 109.6767, 125.0231, 140.3519, 155.6807, 171.0094, 186.25],
   "heights": [15.3091, 41.4578, 73.6826, 139.5982, 235.125, 75.3357, 4.1241, 3.3834, 0, 0.0652, 0.0652, 0.0652, 0.3937]

   "breaks": [5.2604, 21.0831, 31.4773, 39.7107, 45.5157, 50.6755, 54.4726, 59.7212, 67.8983, 81.2193, 95.6333, 111.5864, 127.8409, 144.3015],
   "heights": [1.7064, 22.6168, 38.613, 70.8008, 84.4486, 128.5118, 82.3692, 39.1431, 9.2256, 5.6195, 1.379, 0.6429, 0.1549]

   "read_bw": [68.5425],
   "write_bw": [50.6045]
Edel (SSD)
----------

This section presents exactly the same analysis for the Edel SSDs.
Modeling resource sharing w/ concurrent access
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Getting the read data for Edel, from 1 to 15 concurrent operations.

.. code-block:: r

   deg_edel = dfc %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Read")
   model = loess(BW~Jobs, data = deg_edel)
   IO_INFO[["edel"]][["degradation"]][["read"]] = predict(model, data.frame(Jobs=seq(1,15)))
   toJSON(IO_INFO, pretty = TRUE)

::

   "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575],
   "write": [49.4576, 26.5981, 27.7486, 28.8991, 30.0495, 31.2, 32.3505, 33.501, 34.6515, 35.8019, 36.9524, 38.1029, 39.2534, 40.4038, 41.5543]

   "breaks": [39.257, 51.3413, 60.2069, 66.8815, 71.315, 74.2973, 80.8883, 95.1944, 109.6767, 125.0231, 140.3519, 155.6807, 171.0094, 186.25],
   "heights": [15.3091, 41.4578, 73.6826, 139.5982, 235.125, 75.3357, 4.1241, 3.3834, 0, 0.0652, 0.0652, 0.0652, 0.3937]

   "breaks": [5.2604, 21.0831, 31.4773, 39.7107, 45.5157, 50.6755, 54.4726, 59.7212, 67.8983, 81.2193, 95.6333, 111.5864, 127.8409, 144.3015],
   "heights": [1.7064, 22.6168, 38.613, 70.8008, 84.4486, 128.5118, 82.3692, 39.1431, 9.2256, 5.6195, 1.379, 0.6429, 0.1549]

   "read_bw": [68.5425],
   "write_bw": [50.6045]

   "read": [150.5119, 167.4377, 182.2945, 195.1004, 205.8671, 214.1301, 220.411, 224.6343, 227.7141, 230.6843, 233.0923, 235.2027, 236.8369, 238.0249, 238.7515]
Same for the write operations.

.. code-block:: r

   deg_edel = dfc %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs > 2)
   mean_job_1 = dfc %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Write") %>% filter(Jobs == 1) %>% summarize(mean = mean(BW))
   model = lm(BW~Jobs, data = deg_edel)
   IO_INFO[["edel"]][["degradation"]][["write"]] = c(mean_job_1$mean, predict(model, data.frame(Jobs=seq(2,15))))
   toJSON(IO_INFO, pretty = TRUE)

::

   "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575],
   "write": [49.4576, 26.5981, 27.7486, 28.8991, 30.0495, 31.2, 32.3505, 33.501, 34.6515, 35.8019, 36.9524, 38.1029, 39.2534, 40.4038, 41.5543]

   "breaks": [39.257, 51.3413, 60.2069, 66.8815, 71.315, 74.2973, 80.8883, 95.1944, 109.6767, 125.0231, 140.3519, 155.6807, 171.0094, 186.25],
   "heights": [15.3091, 41.4578, 73.6826, 139.5982, 235.125, 75.3357, 4.1241, 3.3834, 0, 0.0652, 0.0652, 0.0652, 0.3937]

   "breaks": [5.2604, 21.0831, 31.4773, 39.7107, 45.5157, 50.6755, 54.4726, 59.7212, 67.8983, 81.2193, 95.6333, 111.5864, 127.8409, 144.3015],
   "heights": [1.7064, 22.6168, 38.613, 70.8008, 84.4486, 128.5118, 82.3692, 39.1431, 9.2256, 5.6195, 1.379, 0.6429, 0.1549]

   "read_bw": [68.5425],
   "write_bw": [50.6045]

   "read": [150.5119, 167.4377, 182.2945, 195.1004, 205.8671, 214.1301, 220.411, 224.6343, 227.7141, 230.6843, 233.0923, 235.2027, 236.8369, 238.0249, 238.7515],
   "write": [132.2771, 170.174, 170.137, 170.1, 170.063, 170.026, 169.9889, 169.9519, 169.9149, 169.8779, 169.8408, 169.8038, 169.7668, 169.7298, 169.6927]
Modeling read/write bandwidth variability
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: r

   edel_read = df %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Read") %>% select(Bwi)
   dhist(1/edel_read$Bwi)

.. image:: tuto_disk/fig/edel_read_dhist.png

Saving it to be exported in JSON format.

.. code-block:: r

   edel_read_dhist = dhist(1/edel_read$Bwi, plot=FALSE)
   IO_INFO[["edel"]][["noise"]][["read"]] = c(breaks=list(edel_read_dhist$xbr), heights=list(unclass(edel_read_dhist$heights)))
   IO_INFO[["edel"]][["read_bw"]] = mean(1/edel_read$Bwi)
   toJSON(IO_INFO, pretty = TRUE)

::

   In hist.default(x, breaks = cut.pt, plot = FALSE, probability = TRUE) :
     argument 'probability' is not made use of

   "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575],
   "write": [49.4576, 26.5981, 27.7486, 28.8991, 30.0495, 31.2, 32.3505, 33.501, 34.6515, 35.8019, 36.9524, 38.1029, 39.2534, 40.4038, 41.5543]

   "breaks": [39.257, 51.3413, 60.2069, 66.8815, 71.315, 74.2973, 80.8883, 95.1944, 109.6767, 125.0231, 140.3519, 155.6807, 171.0094, 186.25],
   "heights": [15.3091, 41.4578, 73.6826, 139.5982, 235.125, 75.3357, 4.1241, 3.3834, 0, 0.0652, 0.0652, 0.0652, 0.3937]

   "breaks": [5.2604, 21.0831, 31.4773, 39.7107, 45.5157, 50.6755, 54.4726, 59.7212, 67.8983, 81.2193, 95.6333, 111.5864, 127.8409, 144.3015],
   "heights": [1.7064, 22.6168, 38.613, 70.8008, 84.4486, 128.5118, 82.3692, 39.1431, 9.2256, 5.6195, 1.379, 0.6429, 0.1549]

   "read_bw": [68.5425],
   "write_bw": [50.6045]

   "read": [150.5119, 167.4377, 182.2945, 195.1004, 205.8671, 214.1301, 220.411, 224.6343, 227.7141, 230.6843, 233.0923, 235.2027, 236.8369, 238.0249, 238.7515],
   "write": [132.2771, 170.174, 170.137, 170.1, 170.063, 170.026, 169.9889, 169.9519, 169.9149, 169.8779, 169.8408, 169.8038, 169.7668, 169.7298, 169.6927]

   "breaks": [104.1667, 112.3335, 120.5003, 128.6671, 136.8222, 144.8831, 149.6239, 151.2937, 154.0445, 156.3837, 162.3555, 170.3105, 178.3243],
   "heights": [0.1224, 0.1224, 0.1224, 0.2452, 1.2406, 61.6128, 331.2201, 167.6488, 212.1086, 31.3996, 2.3884, 1.747]

   "read_bw": [152.7139]
.. code-block:: r

   edel_write = df %>% filter(grepl("^Edel", Cluster)) %>% filter(Operation == "Write") %>% select(Bwi)
   dhist(1/edel_write$Bwi)

.. image:: tuto_disk/fig/edel_write_dhist.png

Saving it to be exported later.

.. code-block:: r

   edel_write_dhist = dhist(1/edel_write$Bwi, plot=FALSE)
   IO_INFO[["edel"]][["noise"]][["write"]] = c(breaks=list(edel_write_dhist$xbr), heights=list(unclass(edel_write_dhist$heights)))
   IO_INFO[["edel"]][["write_bw"]] = mean(1/edel_write$Bwi)
   toJSON(IO_INFO, pretty = TRUE)

::

   In hist.default(x, breaks = cut.pt, plot = FALSE, probability = TRUE) :
     argument 'probability' is not made use of

   "read": [66.6308, 64.9327, 63.2346, 61.5365, 59.8384, 58.1403, 56.4423, 54.7442, 53.0461, 51.348, 49.6499, 47.9518, 46.2537, 44.5556, 42.8575],
   "write": [49.4576, 26.5981, 27.7486, 28.8991, 30.0495, 31.2, 32.3505, 33.501, 34.6515, 35.8019, 36.9524, 38.1029, 39.2534, 40.4038, 41.5543]

   "breaks": [39.257, 51.3413, 60.2069, 66.8815, 71.315, 74.2973, 80.8883, 95.1944, 109.6767, 125.0231, 140.3519, 155.6807, 171.0094, 186.25],
   "heights": [15.3091, 41.4578, 73.6826, 139.5982, 235.125, 75.3357, 4.1241, 3.3834, 0, 0.0652, 0.0652, 0.0652, 0.3937]

   "breaks": [5.2604, 21.0831, 31.4773, 39.7107, 45.5157, 50.6755, 54.4726, 59.7212, 67.8983, 81.2193, 95.6333, 111.5864, 127.8409, 144.3015],
   "heights": [1.7064, 22.6168, 38.613, 70.8008, 84.4486, 128.5118, 82.3692, 39.1431, 9.2256, 5.6195, 1.379, 0.6429, 0.1549]

   "read_bw": [68.5425],
   "write_bw": [50.6045]

   "read": [150.5119, 167.4377, 182.2945, 195.1004, 205.8671, 214.1301, 220.411, 224.6343, 227.7141, 230.6843, 233.0923, 235.2027, 236.8369, 238.0249, 238.7515],
   "write": [132.2771, 170.174, 170.137, 170.1, 170.063, 170.026, 169.9889, 169.9519, 169.9149, 169.8779, 169.8408, 169.8038, 169.7668, 169.7298, 169.6927]

   "breaks": [104.1667, 112.3335, 120.5003, 128.6671, 136.8222, 144.8831, 149.6239, 151.2937, 154.0445, 156.3837, 162.3555, 170.3105, 178.3243],
   "heights": [0.1224, 0.1224, 0.1224, 0.2452, 1.2406, 61.6128, 331.2201, 167.6488, 212.1086, 31.3996, 2.3884, 1.747]

   "breaks": [70.9593, 79.9956, 89.0654, 98.085, 107.088, 115.9405, 123.5061, 127.893, 131.083, 133.6696, 135.7352, 139.5932, 147.4736],
   "heights": [0.2213, 0, 0.3326, 0.4443, 1.4685, 11.8959, 63.869, 110.286, 149.9741, 202.887, 80.8298, 9.0298]

   "read_bw": [152.7139],
   "write_bw": [131.7152]
Finally, let's save it to a file to be opened by our simulator.

.. code-block:: r

   json = toJSON(IO_INFO, pretty = TRUE)
   cat(json, file="IO_noise.json")
Injecting this data in SimGrid
==============================

To mimic this behavior in SimGrid, we use two features of the platform
description: a non-linear sharing policy and bandwidth factors. For more
details, please see the source code in ``tuto_disk.cpp``.
Modeling resource sharing w/ concurrent access
----------------------------------------------

The ``set_sharing_policy`` method allows the user to set a callback to
dynamically change the disk capacity. The callback is called each time
SimGrid shares the disk between a set of I/O operations.

The callback has access to the number of activities sharing the
resource and its current capacity. It must return the new resource
capacity.

.. code-block:: cpp

   static double disk_dynamic_sharing(double capacity, int n)
   {
     return capacity; // useless callback
   }

   auto* disk = host->create_disk("dump", 1e6, 1e6);
   disk->set_sharing_policy(sg4::Disk::Operation::READ, sg4::Disk::SharingPolicy::NONLINEAR, &disk_dynamic_sharing);
Modeling read/write bandwidth variability
-----------------------------------------

The noise in I/O operations can be obtained by applying a factor to
the I/O bandwidth of the disk. This factor is applied when we update
the remaining amount of bytes to be transferred, increasing or
decreasing the effective disk bandwidth.

The ``set_factor_cb`` method allows the user to set a callback to
dynamically change the factor to be applied to each I/O operation.
The callback has access to the size of the operation and its type (read or
write). It must return a multiplicative factor (e.g. 1.0 for doing nothing).

.. code-block:: cpp

   static double disk_variability(sg_size_t size, sg4::Io::OpType op)
   {
     return 1.0; // useless callback
   }

   auto* disk = host->create_disk("dump", 1e6, 1e6);
   disk->set_factor_cb(&disk_variability);
Running our simulation
----------------------

The binary was compiled in the provided docker container.

.. code-block:: shell

   ./tuto_disk > ./simgrid_disk.csv
Analyzing the SimGrid results
=============================

The figure below presents the results obtained by SimGrid.

The experiment performs I/O operations, varying the number of
concurrent operations from 1 to 15. We run only 20 simulations for
each case.

We can see that the graphics are quite similar to the ones obtained on
the real platform.
.. code-block:: r

   sg_df = read.csv("./simgrid_disk.csv")
   sg_df = sg_df %>% group_by(disk, op, flows) %>%
     mutate(bw=((size*flows)/elapsed)/10^6,
            method=if_else(disk=="edel" & op=="read", "loess", "lm"))
   sg_dfd = sg_df %>% filter(flows==1 & op=="write") %>% group_by(disk, op, flows) %>%
     summarize(mean = mean(bw), sd = sd(bw), se=sd/sqrt(n()))

   sg_df[sg_df$op=="write" & sg_df$flows==1,]$method = ""

   ggplot(data=sg_df, aes(x=flows, y=bw, color=op)) + theme_bw() +
     geom_point(alpha=.3) +
     geom_smooth(data=sg_df[sg_df$method=="loess",], color="black", method=loess, se=TRUE, fullrange=T) +
     geom_smooth(data=sg_df[sg_df$method=="lm",], color="black", method=lm, se=TRUE) +
     geom_errorbar(data=sg_dfd, aes(x=flows, y=mean, ymin=mean-2*se, ymax=mean+2*se), color="black", width=.6) +
     facet_wrap(disk~op, ncol=2, scale="free_y") +
     xlab("Number of concurrent operations") + ylab("Aggregated Bandwidth (MiB/s)") +
     guides(color=FALSE) + xlim(0,NA) + ylim(0,NA)
.. image:: tuto_disk/fig/simgrid_results.png
Note: the variability of the griffon read operations seems to decrease when
we have more concurrent operations. This is a particularity of the
griffon read speed profile and of the elapsed time calculation.

- Each point represents the time to perform the N I/O operations.

- Griffon read speed decreases with the number of concurrent
  operations.

With 15 read operations:

- At the beginning, every read gets the same bandwidth.

- We sample the noise in I/O operations, so some operations will be
  faster than others (e.g. factor > 1).

When the first read operation finishes:

- We recalculate the bandwidth sharing, now considering that we
  have 14 active read operations. This increases the bandwidth for
  each operation (about 44MiB/s).

- The remaining "slower" activities are sped up.

This behavior keeps happening until the end of the 15 operations: at
each step, we speed up the slowest operations a little and,
consequently, decrease the variability we see.