\documentclass{llncs}
%\usepackage{latex8}
%\usepackage{times}
%\documentclass[a4paper,11pt]{article}
%\usepackage{fullpage}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{graphicx,subfigure,graphics}
\usepackage{epsfig}
%\usepackage[usenames]{color}
%\usepackage{latexsym,stmaryrd}
%\usepackage{amsfonts,amssymb}
\usepackage{verbatim,theorem,moreverb}
%\usepackage{float,floatflt}
\usepackage{boxedminipage}
\usepackage{url}
%\usepackage{psfig}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{algorithm}
\usepackage{algorithmic}
%\usepackage{floatfig}
%\usepackage{picins}



\def\sfixme#1{\fbox{\textbf{FIXME: }#1}}

\newcommand{\fixme}[1]{%
  \begin{center}
    \begin{boxedminipage}{.8\linewidth}
      \textsl{{\bf #1}}
    \end{boxedminipage}
  \end{center}
}
\newcommand{\FIXME}[1]{\marginpar[\null\hspace{2cm} FIXME]{FIXME} \fixme{#1}}

%\psfigurepath{.:fig:IMAGES}
\graphicspath{{./}{fig/}{IMAGES/}}

%\initfloatingfigs

\begin{document}

\title{Gridification of a Radiotherapy Dose Computation Application with the XtremWeb-CH Environment}


\author{Nabil Abdennadher\inst{1} \and Mohamed Ben Belgacem\inst{1} \and Raphaël Couturier\inst{2} \and
  David Laiymani\inst{2} \and Sébastien  Miquée\inst{2} \and Marko Niinimaki\inst{1} \and Marc Sauget\inst{3}}

\institute{
University of Applied Sciences Western Switzerland, hepia Geneva,
Switzerland \\
\email{nabil.abdennadher@hesge.ch,mohamed.benbelgacem@unige.ch,markopekka.niinimaeki@hesge.ch}
\and
Laboratoire d'Informatique de l'universit\'{e}
  de Franche-Comt\'{e} \\
  IUT Belfort-Montbéliard, Rue Engel Gros, 90016 Belfort - France \\
\email{\{raphael.couturier,david.laiymani,sebastien.miquee\}@univ-fcomte.fr}
\and
 FEMTO-ST, ENISYS/IRMA, F-25210 Montb\'{e}liard, France\\
\email{marc.sauget@femto-st.fr}
}


\maketitle

\begin{abstract}
  This paper presents the design and evaluation of the gridification
  of a radiotherapy dose computation application. Given the inherent
  characteristics of the application and of its execution, we chose
  the architectural context of global (or volunteer) computing, and
  used the XtremWeb-CH environment to implement it. Experiments
  conducted on a real global computing testbed show good speed-ups
  and an acceptable platform overhead, which makes XtremWeb-CH a good
  candidate for deploying parallel applications over a global
  computing environment.
\end{abstract}


%-------------INTRODUCTION--------------------
\section{Introduction}

Distributed architectures are becoming mandatory for solving many
large scientific problems. Radiotherapy dose computation is one
domain where the need is acute. The main goal of external beam
radiotherapy is to treat tumors while minimizing the exposure of
healthy tissue. Dosimetric planning has to be carried out in order to
optimize the dose distribution within the patient. To determine the
most accurate dose distribution during treatment planning, a
compromise must therefore be found between the precision and the
speed of the calculation. Current techniques, which rely on analytic
methods, models and databases, are fast but lack precision. Enhanced
precision can be achieved with calculation codes based, for example,
on Monte Carlo methods, whose main drawback is that their computation
times can quickly become prohibitive. In \cite{} the authors proposed
a novel approach, called Neurad, using neural networks. This approach
is based on the collaboration of computation codes and multi-layer
neural networks used as universal approximators. It provides a fast
and accurate evaluation of radiation doses in any given environment
for given irradiation parameters. As the learning step is often very
time consuming, in \cite{} the authors proposed a parallel algorithm
that decomposes the learning domain into subdomains. The
decomposition has the advantage of significantly reducing the
complexity of the target functions to approximate.

Since there are several classes of distributed/parallel architectures
(supercomputers, clusters, global computing, etc.), we have to choose
the one best suited to the parallel Neurad application. The Global
(or Volunteer) Computing model seems to be an interesting approach.
Here, the computing power is obtained by aggregating unused (or
volunteer) public resources connected to the Internet. In our case,
we can imagine, for example, that part of the architecture would be
composed of some of the computers of the hospital. This approach has
the advantage of being clearly cheaper than a more dedicated approach
such as the use of supercomputers or clusters.

The aim of this paper is to propose and evaluate a gridification of
the Neurad application (more precisely, of its most time-consuming
part, the learning step) using a Global Computing approach. For this,
we focus on the XtremWeb-CH environment~\cite{}. We chose this
environment because it addresses the centralized nature of other
global computing environments such as XtremWeb~\cite{} or
Seti~\cite{}. It tends towards a peer-to-peer approach by
distributing some components of the architecture. For instance, the
computing nodes are allowed to communicate directly with each other.
Experiments were conducted on a real Global Computing testbed. The
results are very encouraging: they exhibit an interesting speed-up
and show that the overhead induced by the use of XtremWeb-CH is
acceptable.

The paper is organized as follows. In Section 2 we present the Neurad
application and particularly its most time-consuming part, i.e. the
learning step. Section 3 details the XtremWeb-CH environment and
Section 4 describes the gridification of the Neurad application.
Experimental results are presented in Section 5, and we end in
Section 6 with some concluding remarks and perspectives.

\section{The Neurad application}

\begin{figure}[htp]
  \centering
  \includegraphics[width=0.7\columnwidth]{figures/neurad.pdf}
  \caption{The Neurad project}
  \label{f_neurad}
\end{figure}

The \emph{Neurad}~\cite{Neurad} project presented in this paper is
part of a multi-disciplinary effort involving medical physicists and
computer scientists, whose goal is to enhance the treatment planning
of cancerous tumors by external radiotherapy. In our previous
works~\cite{RADIO09,ICANN10,NIMB2008}, we proposed an original
approach to solve scientific problems whose accurate modeling and/or
analytical description are difficult. That method is based on the
collaboration of computational codes and neural networks used as
universal interpolators. Thanks to that method, the \emph{Neurad}
software provides a fast and accurate evaluation of radiation doses in
any given environment (possibly inhomogeneous) for given irradiation
parameters. In a previous work~\cite{AES2009}, we showed the benefit
of using a distributed algorithm for the neural network learning. The
neural network is trained with a classical RPROP (resilient
backpropagation) algorithm and an HPU topology.

Figure~\ref{f_neurad} presents the \emph{Neurad} scheme. Three parts
are clearly independent: the initial data production, the learning
process and the dose deposit evaluation. The first step, the data
production, is outside the scope of the \emph{Neurad} project. There
are many ways to obtain data about radiotherapy treatments, such as
measurement or simulation. The only essential criterion is that the
result must be obtained in a homogeneous environment.

% We have chosen to
% use only a Monte Carlo simulation because this kind of tool is the
% reference in the radiotherapy domains. The advantages to use data
% obtained with a Monte Carlo simulator are the following: accuracy,
% profusion, quantified error and regularity of measure points. But,
% there exist also some disagreements and the most important is the
% statistical noise, forcing a data post treatment. Figure~\ref{f_tray}
% presents the general behavior of a dose deposit in water.


% \begin{figure}[http]
%   \centering
%   \includegraphics[width=0.7\columnwidth]{figures/testC.pdf}
%   \caption{Dose deposit by a photon beam  of 24 mm of width in water (normalized value).}
%   \label{f_tray}
% \end{figure}

The second stage of the \emph{Neurad} project is the learning step,
which is the most time-consuming one. This step is performed off-line,
but it is important to reduce the learning time in order to keep the
tool workable. Indeed, if the learning time is too long (for the
moment, it can reach one week for a limited domain), the process
cannot be launched at any time, but only when a major modification
occurs in the environment, such as a change of context. However, it is
worth updating the knowledge of the neural network, by rerunning the
learning process, whenever the domain evolves (a change in the
material used for a prosthesis, or in the size, shape or energy of the
beam). The learning time is related to the volume of data, which can
be very large in a real medical context. Previous work reduced this
learning time by parallelizing the learning process with a
partitioning of the global dataset. The goal of this method is to
train many neural networks on sub-domains of the global dataset. After
this training, these neural networks are used together to obtain a
response over the whole domain of study.


\begin{figure}[h]
  \centering
  \includegraphics[width=0.5\columnwidth]{figures/overlap.pdf}
  \caption{Overlapping for a sub-network  in a two-dimensional domain with ratio
    $\alpha$}
  \label{fig:overlap}
\end{figure}


However, performing the learning on sub-domains that form a strict
partition of the initial domain does not give results of satisfactory
quality. This comes from the fact that the accuracy of the
approximation performed by a neural network is not constant over the
learned domain. It is therefore necessary to make the sub-domains
overlap. The overall principle is depicted in
Figure~\ref{fig:overlap}. In this way, each sub-network has an
exploitation domain smaller than its training domain, and the
differences observed at the borders are no longer relevant.
Nonetheless, in order to preserve the performance of the parallel
algorithm, it is important to set the overlapping ratio $\alpha$
carefully. It must be large enough to avoid errors at the borders, and
as small as possible to limit the size increase of the data subsets
(what about our tests?).
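
As an illustration of this trade-off, consider the following sketch.
It assumes, purely for the sake of the example, that $\alpha$ denotes
the fraction of a sub-domain's extent that is added on each side along
every dimension, and that the data points are uniformly distributed;
the notation $d$, $\ell_i$, $V_{\mathrm{exploit}}$ and
$V_{\mathrm{train}}$ is introduced here only for that purpose. For an
interior sub-domain of a $d$-dimensional domain with side lengths
$\ell_1,\dots,\ell_d$, the training domain is then larger than the
exploitation domain by the factor
\begin{equation*}
  \frac{V_{\mathrm{train}}}{V_{\mathrm{exploit}}}
  = \frac{\prod_{i=1}^{d} (1+2\alpha)\,\ell_i}{\prod_{i=1}^{d} \ell_i}
  = (1+2\alpha)^{d} .
\end{equation*}
Under these assumptions, in the two-dimensional case of
Figure~\ref{fig:overlap}, $\alpha = 0.1$ would increase the size of
each data subset by a factor $1.2^{2} = 1.44$, and $\alpha = 0.25$ by
a factor $1.5^{2} = 2.25$, which shows how quickly the extra learning
cost grows with the overlap.
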


\section{The XtremWeb-CH environment}
\input{xwch.tex}

\section{The Neurad gridification}

\label{sec:neurad_gridif}


As explained above, the Neurad application can be divided into three
steps. The goal of the first step is to decompose the data
representing the dose distribution over an area. This area is
described by various parameters, such as the nature of the medium and
its density. This part is outside the scope of this paper.
%Multiple ``views'' can be
%superposed in order to obtain a more accurate learning.

The second step of the application, and the most time-consuming one,
is the learning itself. This is the step that has been parallelized
using the XtremWeb-CH (XWCH) environment. As explained in Section 2,
the parallelization relies on a partitioning of the global dataset.
Following this partitioning, all learning tasks are executed in
parallel and independently, each on its own local part of the data,
with no communication, following the fork/join model. Clearly, this
computation fits well with the model of the chosen middleware.

The execution scheme is then the following (see
Figure~\ref{fig:neurad_grid}):
\begin{enumerate}
\item We first send the learning application and its data to the
  middleware (more precisely, to the warehouses (DW)) and create the
  computation module;
\item When a worker (W) is ready to compute, it requests a task from
  the coordinator (Coord.);
\item The coordinator assigns a task to the worker, which retrieves
  the application and its assigned data and can then start the
  computation;
\item At the end of the learning process, the worker sends the result
  to a warehouse.
\end{enumerate}
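
The worker side of this scheme can be summarized by the sketch of
Algorithm~\ref{alg:worker_sketch}. It is only an illustration written
from the four steps above: the operation names (\textit{RequestTask},
\textit{Fetch}, \textit{Learn}, \textit{Store}) are hypothetical and
do not correspond to the actual XtremWeb-CH programming interface, and
the loop over successive tasks is an assumption.

\begin{algorithm}[ht]
  \caption{Sketch of the behavior of a worker (hypothetical operation names)}
  \label{alg:worker_sketch}
  \begin{algorithmic}
    \REQUIRE the application and the data parts are stored on the warehouses (step 1)
    \LOOP
      \STATE $t \leftarrow$ \textit{RequestTask}(coordinator) \COMMENT{step 2}
      \IF{no task is left}
        \STATE leave the loop
      \ENDIF
      \STATE $(app, data) \leftarrow$ \textit{Fetch}(warehouse, $t$) \COMMENT{step 3}
      \STATE $net \leftarrow$ \textit{Learn}($app$, $data$) \COMMENT{local computation, no communication}
      \STATE \textit{Store}(warehouse, $net$) \COMMENT{step 4}
    \ENDLOOP
  \end{algorithmic}
\end{algorithm}

Because the tasks never communicate with each other, the order in
which the workers obtain them does not affect the result; only the
final collection of the stored networks acts as the join.
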

The last step of the application is to retrieve these results (a set
of weighted neural networks) and to exploit them through a dose
distribution process. This last step is outside the scope of this
paper.


\begin{figure}[ht]
  \centering
  \includegraphics[width=8cm]{figures/neurad_gridif}
  \caption{The proposed Neurad gridification}
  \label{fig:neurad_grid}
\end{figure}

\section{Experimental results}
\label{sec:neurad_xp}

The aim of this section is to describe and analyze the experimental
results obtained with the parallel version of Neurad described above.
Our goal was to run this application with real input data and on a
real global computing testbed.

\subsubsection{Experimental conditions}
\label{sec:neurad_cond}

The size of the input data is about 2.4~GB. To prevent data noise from
appearing and disturbing the learning process, these data can be
divided into at most 25 parts. This generates input data parts of
about 15~MB each (in a compressed format). The output data, which are
retrieved after the process, amount to about 30~KB for each part.
Unfortunately, this limit on the data decomposition does not allow us
to use more than 25 computers (XWCH workers). Nevertheless, we used
two distinct deployments of XWCH:
\begin{enumerate}

\item In the first one, called ``distributed XWCH'' in the following,
  the XWCH coordinator and the warehouses were located in Geneva,
  Switzerland, while the workers were running in the same local
  cluster in Belfort, France.

\item The second deployment, called ``local XWCH'', is a local
  deployment where the coordinator, the warehouses and the workers
  were all in the same local cluster.

\end{enumerate}
For both deployments, during the day these machines were used by
students of the Computer Science Department of the IUT of Belfort.

In order to evaluate the overhead induced by the use of the platform,
we also compared the execution of the Neurad application with and
without the XWCH platform. In the latter case, the testbed consists
only of workers deployed with their respective data by means of shell
scripts. No specific middleware was used, and the workers were in the
same local cluster.

Finally, five computation precisions were used: $1\times10^{-1}$,
$0.75\times10^{-1}$, $0.50\times10^{-1}$, $0.25\times10^{-1}$, and
$1\times10^{-2}$.


\subsubsection{Results}
\label{sec:neurad_result}

Table~\ref{tab:neurad_res} presents the execution times of the Neurad
application on a single machine and on 25 machines, with XWCH (local
and distributed deployments) and without XWCH. These results cover the
same steps for both kinds of execution, i.e. sending the local data
and the executable, running the learning process, and retrieving the
results. Results represent the average time of $?? x ??$ executions.


\begin{table}[h!]
  \renewcommand{\arraystretch}{1.7}
  \centering
  \begin{tabular}{|c|c|c|c|c|}
    \hline
    ~Precision~ & ~1 machine~ & ~Without XWCH~ & ~With distributed XWCH~ & ~With
    local XWCH~ \\
    \hline
    $1\times10^{-1}$ & 5190 & 558 & 759 & 629 \\
    $0.75\times10^{-1}$ & 6307 & 792 & 1298 & 801 \\
    $0.50\times10^{-1}$ & 7487 & 792 & 1010 & 844 \\
    $0.25\times10^{-1}$ & 7787 & 791 & 1000 & 852 \\
    $1\times10^{-2}$ & 11030 & 1035 & 1447 & 1108 \\
    \hline
  \end{tabular}
  \vspace{0.3cm}
  \caption{Execution time in seconds of the Neurad application, with and without using the XWCH platform}
  \label{tab:neurad_res}
\end{table}
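
The speed-ups can be read directly from Table~\ref{tab:neurad_res}.
As a worked example, and with a notation introduced here only for that
purpose, let $T_{1}$ be the time on one machine and $T_{25}$ the time
on 25 machines for a given configuration; the speed-up is
\begin{equation*}
  S = \frac{T_{1}}{T_{25}} .
\end{equation*}
For the precision $10^{-1}$ this gives
\begin{equation*}
  S_{\mathrm{without}} = \frac{5190}{558} \approx 9.30, \qquad
  S_{\mathrm{distributed}} = \frac{5190}{759} \approx 6.84, \qquad
  S_{\mathrm{local}} = \frac{5190}{629} \approx 8.25,
\end{equation*}
and for the precision $10^{-2}$
\begin{equation*}
  S_{\mathrm{without}} = \frac{11030}{1035} \approx 10.66, \qquad
  S_{\mathrm{distributed}} = \frac{11030}{1447} \approx 7.62, \qquad
  S_{\mathrm{local}} = \frac{11030}{1108} \approx 9.95 .
\end{equation*}
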

%\begin{table}[ht]
%  \centering
%  \begin{tabular}[h]{|c|c|c|}
%    \hline
%    Precision & Without XWCH & With XWCH \\
%    \hline
%    $1e^{-1}$ & $558$s & $759$s\\
%    \hline
%  \end{tabular}
%  \caption{Execution time in seconds of Neurad application, with and without using XtremWeb-CH platform}
%  \label{tab:neurad_res}
%\end{table}


As we can see, in the case of a local deployment the overhead induced
by the use of the XWCH platform is about $7\%$, which is clearly low.
For the distributed deployment, the overhead is about $34\%$. In view
of the benefits of the platform, this overhead is acceptable, and it
can be explained by the following points.

First, the execution conditions with and without XWCH are not really
identical. Without XWCH, although the same steps are carried out, all
transfers take place inside a local cluster with a high bandwidth and
a low latency. With the distributed XWCH deployment, in contrast, all
transfers (between warehouses, workers, and the coordinator) go
through a wide area network with a smaller bandwidth. In addition, in
the executions without XWCH all the machines started the computation
immediately, whereas with the XWCH platform a latency is introduced by
the fact that a computation starts on a machine only when this machine
requests a task.

This underlines that, unsurprisingly, deploying a local coordinator
and one or more warehouses near a cluster of workers can improve
computation and platform performance.
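
The overhead discussed above can be made explicit as follows (the
symbols $O$, $T_{\mathrm{XWCH}}$ and $T_{\mathrm{ref}}$ are introduced
here for illustration, assuming that the overhead is measured
relative to the execution without XWCH on the same 25 machines):
\begin{equation*}
  O = \frac{T_{\mathrm{XWCH}} - T_{\mathrm{ref}}}{T_{\mathrm{ref}}} .
\end{equation*}
For instance, for the precision $10^{-2}$ in
Table~\ref{tab:neurad_res},
\begin{equation*}
  O_{\mathrm{local}} = \frac{1108 - 1035}{1035} \approx 7.1\%, \qquad
  O_{\mathrm{distributed}} = \frac{1447 - 1035}{1035} \approx 39.8\% .
\end{equation*}
The value obtained for a single precision can differ from the average
figures quoted in the text, since the overhead varies from one
precision to another.
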


\section{Conclusion and future work}

In this paper, we have presented a gridification of a real medical
application, the Neurad application. This radiotherapy application
aims to optimize the irradiated dose distribution within a patient.
Based on a multi-layer neural network, it includes a very
time-consuming step, namely the learning step. Given the computing
characteristics of this step, we chose to parallelize it using the
XtremWeb-CH global computing environment. The experimental results
show good speed-ups and underline that the overheads induced by XWCH
are acceptable, which makes it a good candidate for deploying parallel
applications over a global computing environment.

Our future work includes testing the application on a larger-scale
testbed. This implies choosing an input data set that allows a finer
decomposition. Unfortunately, this choice of input data is not trivial
and relies on a large number of parameters (ask Marc for details
here).

\bibliographystyle{plain}
\bibliography{biblio}



\end{document}