\title{Gridification of a Radiotherapy Dose Computation Application with the XtremWeb-CH Environment}
-\author{Nabil Abdennhader\inst{1} \and Raphaël Couturier\inst{1} \and David \and
- Julien Henriet\inst{2} \and Laiymani\inst{1} \and Sébastien Miquée\inst{1}
- \and Marc Sauget\inst{2}}
-\institute{Laboratoire d'Informatique de l'universit\'{e}
+\author{Nabil Abdennadher\inst{1} \and Mohamed Ben Belgacem\inst{1} \and Raphaël Couturier\inst{2} \and
+ David Laiymani\inst{2} \and Sébastien Miquée\inst{2} \and Marko Niinimaki\inst{1} \and Marc Sauget\inst{2}}
+
+\institute{
+University of Applied Sciences Western Switzerland, hepia Geneva,
+Switzerland \\
+\email{nabil.abdennadher@hesge.ch, mohamed.benbelgacem@unige.ch, markopekka.niinimaeki@hesge.ch}
+\and
+Laboratoire d'Informatique de l'universit\'{e}
de Franche-Comt\'{e} \\
IUT Belfort-Montbéliard, Rue Engel Gros, 90016 Belfort - France \\
\email{raphael.couturier, david.laiymani, sebastien.miquee@univ-fcomte.fr}
\and
FEMTO-ST, ENISYS/IRMA, F-25210 Montb\'{e}liard, France \\
+\email{marc.sauget@femtost.fr}
}
-%\email{\texttt{[laiymani]@lifc.univ-fcomte.fr}}}
\maketitle
\begin{abstract}
-
+ This paper presents the design and the evaluation of the
+ gridification of a radiotherapy dose computation application. Due to
+ the inherent characteristics of the application and its execution,
+ we chose the architectural context of global (or volunteer) computing.
+ For this, we used the XtremWeb-CH environment. Experiments were
+ conducted on a real global computing testbed and show good speed-ups
+ and very acceptable platform overhead.
\end{abstract}
+
%-------------INTRODUCTION--------------------
\section{Introduction}
-The use of distributed architectures for solving large scientific problems seems
-to become mandatory in a lot of cases. For example, in the domain of
-radiotherapy dose computation the problem is crucial. The main goal of external
-beam radiotherapy is the treatment of tumours while minimizing exposure to
-healthy tissue. Dosimetric planning has to be carried out in order to optimize
-the dose distribution within the patient is necessary. Thus, for determining the
-most accurate dose distribution during treatment planning, a compromise must be
-found between the precision and the speed of calculation. Current techniques,
-using analytic methods, models and databases, are rapid but lack
-precision. Enhanced precision can be achieved by using calculation codes based,
-for example, on Monte Carlo methods. In [] the authors proposed a novel approach
-based on the use of neural networks. The approach is based on the collaboration
-of computation codes and multi-layer neural networks used as universal
-approximators. It provides a fast and accurate evaluation of radiation doses in
-any given environment for given irradiation parameters. As the learning step is
-often very time consumming, in \cite{bcvsv08:ip} the authors proposed a parallel
-algorithm that enable to decompose the learning domain into subdomains. The
-decomposition has the advantage to significantly reduce the complexity of the
-target functions to approximate.
-
-Now, as there exist several classes of distributed/parallel architectures
-(supercomputers, clusters, global computing...) we have to choose the best
-suited one for the parallel Neurad application. The Global or Volunteer
-computing model seems to be an interesting approach. Here, the computing power
-is obtained by agregating unused (or volunteer) public resources connected to
-the Internet. For our case, we can imagine for example, that a part of the
-architecture will be composed of some of the different computers of the
-hospital. This approach present the advantage to be clearly cheaper than a more
-dedicated approach like the use of supercomputer or clusters.
-
-The aim of this paper is to propose and evaluate a gridification of the Neurad
-application (more precisely, of the most time consuming part, the learning step)
-using a Global computing approach. For this, we focus on the XtremWeb-CH
-environnement []. We choose this environnent because it tackles the centralized
-aspect of other global computing environments such as XTremWeb [] or Seti []. It
-tends to a peer-to-peer approach by distributing some components of the
-architecture. For instance, the computing nodes are allowed to directly
-communicate. Experimentations were conducted on a real Global Computing
-testbed. The results are very encouraging. They exhibit an interesting speed-up
-and show that the overhead induced by the use of XTremWeb-CH is very acceptable.
-
-The paper is organized as follows. In section 2 we present the Neurad
-application and particularly it most time consuming part i.e. the learning
-step. Section 3 details the XtremWeb-CH environnement while in section 4 we
-expose the gridification of the Neurad application. Experimental results are
-presented in section 5 and we end in section 6 by some concluding remarks and
-perspectives.
+The use of distributed architectures for solving large scientific
+problems is becoming mandatory in many cases. For example, in
+the domain of radiotherapy dose computation the problem is
+crucial. The main goal of external beam radiotherapy is the treatment
+of tumors while minimizing exposure to healthy tissue. Dosimetric
+planning has to be carried out in order to optimize the dose
+distribution within the patient. Thus, to determine the most accurate
+dose distribution during treatment planning, a compromise must be
+found between the precision and the speed of calculation. Current
+techniques, using analytic methods, models and databases, are rapid
+but lack precision. Enhanced precision can be achieved by using
+calculation codes based, for example, on Monte Carlo methods. The main
+drawback of these methods is their computation time, which can
+rapidly become huge. In [] the authors proposed a novel approach, called
+Neurad, using neural networks. This approach is based on the
+collaboration of computation codes and multi-layer neural networks
+used as universal approximators. It provides a fast and accurate
+evaluation of radiation doses in any given environment for given
+irradiation parameters. As the learning step is often very time
+consuming, in \cite{bcvsv08:ip} the authors proposed a parallel
+algorithm that decomposes the learning domain into
+subdomains. The decomposition has the advantage of significantly
+reducing the complexity of the target functions to approximate.
+
+Since several classes of distributed/parallel architectures exist
+(supercomputers, clusters, global computing...), we have
+to choose the one best suited to the parallel Neurad application.
+The Global or Volunteer Computing model seems to be an interesting
+approach. Here, the computing power is obtained by aggregating unused
+(or volunteer) public resources connected to the Internet. In our
+case, we can imagine, for example, that part of the architecture will
+be composed of some of the computers of the hospital. This
+approach has the advantage of being clearly cheaper than a more
+dedicated approach such as the use of supercomputers or clusters.
+
+The aim of this paper is to propose and evaluate a gridification of
+the Neurad application (more precisely, of its most time-consuming
+part, the learning step) using a Global Computing approach. For this,
+we focus on the XtremWeb-CH environment []. We chose this environment
+because it tackles the centralized aspect of other global computing
+environments such as XtremWeb [] or Seti []. It tends towards a
+peer-to-peer approach by distributing some components of the
+architecture. For instance, the computing nodes are allowed to
+communicate directly. Experiments were conducted on a real Global
+Computing testbed. The results are very encouraging. They exhibit an
+interesting speed-up and show that the overhead induced by the use of
+XtremWeb-CH is very acceptable.
+
+The paper is organized as follows. In Section 2 we present the Neurad
+application and particularly its most time-consuming part, i.e., the
+learning step. Section 3 details the XtremWeb-CH environment and
+Section 4 describes the gridification of the Neurad
+application. Experimental results are presented in Section 5 and we
+end in Section 6 with some concluding remarks and perspectives.
\section{The Neurad application}
\begin{figure}[http]
\centering
\includegraphics[width=0.7\columnwidth]{figures/neurad.pdf}
- \caption{The Neurad projects}
+ \caption{The Neurad project}
\label{f_neurad}
\end{figure}
-The \emph{Neurad}~\cite{Neurad} project presented in this paper takes place in a
-multi-disciplinary project , involving medical physicists and computer
-scientists whose goal is to enhance the treatment planning of cancerous tumors
-by external radiotherapy. In our previous
-works~\cite{RADIO09,ICANN10,NIMB2008}, we have proposed an original approach to
-solve scientific problems whose accurate modeling and/or analytical description
-are difficult. That method is based on the collaboration of computational codes
-and neural networks used as universal interpolator. Thanks to that method, the
-\emph{Neurad} software provides a fast and accurate evaluation of radiation
-doses in any given environment (possibly inhomogeneous) for given irradiation
-parameters. We have shown in a previous work (\cite{AES2009}) the interest to
-use a distributed algorithm for the neural network learning. We use a classical
-RPROP algorithm with a HPU topology to do the training of our neural network.
-
-The Figure~\ref{f_neurad} presents the {\it{Neurad}} scheme. Three parts are
-clearly independant : the initial data production, the learning process and the
-dose deposit evaluation. The first step, the data production, is outside the
-{\it{Neurad}} project. They are many solutions to obtains data about the
-radiotherapy treatments like the measure or the simulation. The only essential
-criterion is that the result must be obtain in a homogeneous environment. We
-have chosen to use only a Monte Carlo simulation because this tools are the
-references in the radiotherapy domains. The advantages to use data obtain with a
-Monte Carlo simulator are the following : accuracy, profusing, quantify error
-and regularity of measure point. But, they are too disagreement and the most
-important is the statistical noise forcing a data post treatment. The
-Figure~\ref{f_tray} present the general behavior of a dose deposit in water.
-
-
-\begin{figure}[http]
- \centering
- \includegraphics[width=0.7\columnwidth]{figures/testC.pdf}
- \caption{Dose deposit by a photon beam of 24 mm of width in water (Normalized value). }
- \label{f_tray}
-\end{figure}
-
-The secondary stage of the {\it{Neurad}} project is about the learning step and
-it is the most time consuming step. This step is off-line but is it important to
-reduce the time used for the learning process to keep a workable tools. Indeed,
-if the learning time is too important (for the moment, this time could reach one
-week for a limited works domain), the use of this process could be be limited
-only at a major modification of the use context. However, it is interesting to
-do an update to the learning process when the bound of the learning domain
-evolves (evolution in material used for the prosthesis or evolution on the beam
-(size, shape or energy)). The learning time is linked with the volume of data
-who could be very important in real medical context. We have work to reduce
-this learning time with a parallel method of the learning process using a
-partitioning method of the global dataset. The goal of this method is to train
-many neural networks on sub-domain of the global dataset. After this training,
-the use of this neural networks together allows to obtain a response for the
-global domain of study.
+The \emph{Neurad}~\cite{Neurad} project presented in this paper is
+part of a multi-disciplinary project, involving medical physicists
+and computer scientists, whose goal is to enhance the treatment
+planning of cancerous tumors by external radiotherapy. In our
+previous works~\cite{RADIO09,ICANN10,NIMB2008}, we have proposed an
+original approach to solve scientific problems whose accurate modeling
+and/or analytical description are difficult. That method is based on
+the collaboration of computational codes and neural networks used as
+universal interpolators. Thanks to that method, the \emph{Neurad}
+software provides a fast and accurate evaluation of radiation doses in
+any given environment (possibly inhomogeneous) for given irradiation
+parameters. We have shown in a previous work~\cite{AES2009} the
+benefit of using a distributed algorithm for the neural network
+learning. We use a classical RPROP algorithm with an HPU topology to
+train our neural network.
+
+Figure~\ref{f_neurad} presents the {\it{Neurad}} scheme. Three parts
+are clearly independent: the initial data production, the learning
+process and the dose deposit evaluation. The first step, the data
+production, is outside the {\it{Neurad}} project. There are many
+ways to obtain data about radiotherapy treatments, such as
+measurement or simulation. The only essential criterion is that the
+result must be obtained in a homogeneous environment.
+
+% We have chosen to
+% use only a Monte Carlo simulation because this kind of tool is the
+% reference in the radiotherapy domains. The advantages to use data
+% obtained with a Monte Carlo simulator are the following: accuracy,
+% profusion, quantified error and regularity of measure points. But,
+% there exist also some disagreements and the most important is the
+% statistical noise, forcing a data post treatment. Figure~\ref{f_tray}
+% presents the general behavior of a dose deposit in water.
+
+
+% \begin{figure}[http]
+% \centering
+% \includegraphics[width=0.7\columnwidth]{figures/testC.pdf}
+% \caption{Dose deposit by a photon beam of 24 mm of width in water (normalized value).}
+% \label{f_tray}
+% \end{figure}
+
+The second stage of the {\it{Neurad}} project is the learning step,
+which is the most time-consuming one. This step is off-line, but it
+is important to reduce the time used for the learning process to keep
+a workable tool. Indeed, if the learning time is too long (for the
+moment, this time could reach one week for a limited domain), this
+process cannot be launched at any time, but only when a major
+modification occurs in the environment, like a change of context for
+instance. However, it is useful to update the knowledge of the
+neural network, by using the learning process, when the domain evolves
+(evolution in the material used for the prosthesis or evolution of the
+beam (size, shape or energy)). The learning time is related to the
+volume of data, which can be very large in a real medical context.
+Work has been done to reduce this learning time by parallelizing
+the learning process using a partitioning
+of the global dataset. The goal of this method is to train many neural
+networks on sub-domains of the global dataset. After this training,
+using these neural networks together makes it possible to obtain a
+response for the global domain of study.
\begin{figure}[h]
\end{figure}
-However, performing the learnings on sub-domains constituting a partition of the
-initial domain is not satisfying according to the quality of the results. This
-comes from the fact that the accuracy of the approximation performed by a neural
-network is not constant over the learned domain. Thus, it is necessary to use
-an overlapping of the sub-domains. The overall principle is depicted in
-Figure~\ref{fig:overlap}. In this way, each sub-network has an exploitation
-domain smaller than its training domain and the differences observed at the
-borders are no longer relevant. Nonetheless, in order to preserve the
-performances of the parallel algorithm, it is important to carefully set the
-overlapping ratio $\alpha$. It must be large enough to avoid the border's
-errors, and as small as possible to limit the size increase of the data subsets.
-
-
+However, performing the learning on sub-domains constituting a
+partition of the initial domain is not satisfactory with respect to the
+quality of the results. This comes from the fact that the accuracy of
+the approximation performed by a neural network is not constant over
+the learned domain. Thus, it is necessary to use an overlapping of
+the sub-domains. The overall principle is depicted in
+Figure~\ref{fig:overlap}. In this way, each sub-network has an
+exploitation domain smaller than its training domain and the
+differences observed at the borders are no longer relevant.
+Nonetheless, in order to preserve the performance of the parallel
+algorithm, it is important to carefully set the overlapping ratio
+$\alpha$. It must be large enough to avoid border errors, and
+as small as possible to limit the size increase of the data subsets.
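+
+As an illustration of this trade-off, consider a one-dimensional sketch
+under assumptions that are ours and not part of the method description:
+the initial domain has width $L$, it is split into $k$ equal
+exploitation sub-domains, and each training sub-domain is extended by a
+fraction $\alpha$ of its width on both sides. The width of one training
+sub-domain is then
+\[
+  w_{\mathrm{train}} = (1 + 2\alpha)\,\frac{L}{k},
+\]
+so the total amount of training data grows by a factor of about
+$1 + 2\alpha$ compared with a partition without overlap (slightly less
+in practice, since the outermost sub-domains cannot extend beyond the
+initial domain). For instance, a hypothetical value $\alpha = 0.1$
+would increase each data subset by about $20\%$.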
\section{The XtremWeb-CH environment}
-\section{Neurad gridification with XTremweb-ch}
+\input{xwch.tex}
+
+\section{Neurad gridification with XtremWeb-CH}
+
+\label{sec:neurad_gridif}
+
+
+The Neurad application can be divided into three parts. The first one
+aims at dividing data representing dose distribution on an area. This
+area contains various parameters, like the density of the medium and
+its nature. Multiple ``views'' can be superposed in order to obtain a
+more accurate learning. The second part of the application is the
+learning itself. This is the most time consuming part and therefore
+this is the one which has been ported to XWCH. This part fits well
+with the model of the middleware -- all learning tasks execute in
+parallel independently with their own local data part, with no
+communication, following the fork-join model. As shown in
+Figure~\ref{fig:neurad_grid}, we first send the learning application and data
+to the middleware (more precisely, to the warehouses (DW)) and create the
+computation module. When a worker (W) is ready to compute, it requests
+a task from the coordinator (Coord.). The latter assigns it
+a task. The worker retrieves the application and its assigned data,
+and can start the computation. At the end of the learning process, it
+sends the result, a weighted neural network which will be used in a
+dose distribution process, to a warehouse. The last step of the
+application is to retrieve these results and exploit them.
+
+
+\begin{figure}[ht]
+ \centering
+ \includegraphics[width=\linewidth]{neurad_gridif}
+ \caption{Neurad gridification}
+ \label{fig:neurad_grid}
+\end{figure}
+
\section{Experimental results}
+
+\label{sec:neurad_xp}
+
+\subsubsection{Conditions}
+\label{sec:neurad_cond}
+
+
+The evaluation of the execution of the Neurad application on XWCH was
+set up as follows. The size of the input data is about 2.4Gb. This
+amount of data can be divided into 25 parts; otherwise, data noise
+appears and disturbs the learning. We have used 25 computers (XWCH
+workers) to execute this part of the application. This generates input
+data parts of about 15Mb (in a compressed format). The output data,
+which are retrieved after the process, are about 30Kb for each part. We
+used two distinct deployments of XWCH. In the first one, the XWCH
+coordinator and the warehouses were situated in Geneva, Switzerland
+while the workers were running in the same local cluster in Belfort,
+France. The second deployment is a local deployment where the
+coordinator, warehouses and workers were all in the same local cluster.
+During the day these machines were used by students of the Computer
+Science Department of the IUT of Belfort.
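+
+As a rough check of these orders of magnitude, assuming parts of equal
+size, no overlap between parts, and $1$Gb $= 1000$Mb (all three
+assumptions are ours), each part holds
+\[
+  \frac{2.4\mbox{Gb}}{25} \approx 96\mbox{Mb}
+\]
+of uncompressed input, so the 15Mb compressed parts would correspond to
+a compression ratio of about $96/15 \approx 6.4$. The 25 outputs
+together amount to about $25 \times 30\mbox{Kb} = 750\mbox{Kb}$, which
+is small compared with the input transfers.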
+
+We have furthermore compared the execution of the Neurad application
+with and without the XWCH platform in order to measure the overhead
+induced by the use of the platform. By ``without XWCH'' we mean that the
+testbed consists only of workers deployed with their respective data by
+means of shell scripts. No specific middleware was used and the
+workers were in the same local cluster.
+
+Five computation precisions were used: $1e^{-1}$, $0.75e^{-1}$, $0.50e^{-1}$, $0.25e^{-1}$ and $1e^{-2}$.
+
+
+\subsubsection{Results}
+\label{sec:neurad_result}
+
+
+In these experiments, we measured the same steps in both kinds of
+execution. The steps consist of sending the local data and the
+executable, running the learning process, and retrieving the result.
+Table~\ref{tab:neurad_res} presents the execution times of the Neurad
+application on 25 machines with XWCH (local and distributed deployment)
+and without XWCH.
+
+
+\begin{table}[h!]
+ \centering
+ \begin{tabular}{|c|c|c|c|c|}
+ \hline
+ Precision & 1 machine & Without XWCH & With XWCH (distributed) & With XWCH (local)\\
+ \hline
+ $1e^{-1}$ & 5190 & 558 & 759 & 629\\
+ $0.75e^{-1}$ & 6307 & 792 & 1298 & 801 \\
+ $0.50e^{-1}$ & 7487 & 792 & 1010 & 844 \\
+ $0.25e^{-1}$ & 7787 & 791 & 1000 & 852\\
+ $1e^{-2}$ & 11030 & 1035 & 1447 & 1108 \\
+ \hline
+ \end{tabular}
+\caption{Execution time in seconds of the Neurad application, with and without using the XWCH platform}
+ \label{tab:neurad_res}
+\end{table}
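+
+The speed-ups can be read directly from Table~\ref{tab:neurad_res}.
+Writing $T_1$ for the time on one machine and $T_{25}$ for the time on
+25 machines (notation introduced here for illustration), the speed-up
+is $S = T_1 / T_{25}$. For the precision $1e^{-1}$ this gives
+\[
+  S_{\mathrm{without}} = \frac{5190}{558} \approx 9.3, \qquad
+  S_{\mathrm{distributed}} = \frac{5190}{759} \approx 6.8, \qquad
+  S_{\mathrm{local}} = \frac{5190}{629} \approx 8.3,
+\]
+and for the precision $1e^{-2}$
+\[
+  S_{\mathrm{without}} = \frac{11030}{1035} \approx 10.7, \qquad
+  S_{\mathrm{distributed}} = \frac{11030}{1447} \approx 7.6, \qquad
+  S_{\mathrm{local}} = \frac{11030}{1108} \approx 10.0.
+\]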
+
+%\begin{table}[ht]
+% \centering
+% \begin{tabular}[h]{|c|c|c|}
+% \hline
+% Precision & Without XWCH & With XWCH \\
+% \hline
+% $1e^{-1}$ & $558$s & $759$s\\
+% \hline
+% \end{tabular}
+% \caption{Execution time in seconds of Neurad application, with and without using XtremWeb-CH platform}
+% \label{tab:neurad_res}
+%\end{table}
+
+
+These experiments show that the overhead induced by the use of the XWCH
+platform is about $34\%$ in the distributed deployment and about $7\%$
+in the local deployment. For the latter, the overhead is very acceptable given the benefits of the platform.
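+
+To make the notion of overhead explicit, one possible definition (our
+assumption; the aggregation behind the figures above is not detailed)
+is the relative increase in execution time with respect to the run
+without XWCH,
+\[
+  O = \frac{T_{\mathrm{XWCH}} - T_{\mathrm{without}}}{T_{\mathrm{without}}}.
+\]
+With the values of Table~\ref{tab:neurad_res} for the precision
+$1e^{-1}$, this gives $O = (759 - 558)/558 \approx 36\%$ for the
+distributed deployment and $O = (629 - 558)/558 \approx 13\%$ for the
+local one; for the precision $1e^{-2}$, it gives
+$(1447 - 1035)/1035 \approx 40\%$ and $(1108 - 1035)/1035 \approx 7\%$,
+respectively.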
+
+In the distributed deployment, the overhead is also acceptable and can be explained by
+different factors. First, we point out that the execution conditions
+are not really identical with and without XWCH. In the latter
+case, though the same steps were performed, all transfers take place inside
+a local cluster with high bandwidth and low latency, whereas when
+using XWCH, all transfers (between data warehouses, workers, and
+the coordinator) used a wide area network with a smaller bandwidth.
+
+In addition, in executions without XWCH, all the machines started
+the computation immediately, whereas when using the XWCH platform, a
+latency is introduced by the fact that a task starts on a machine only
+when that machine requests a task.
+
+These experiments underline that deploying a local coordinator and one
+or more warehouses near a cluster of workers can improve computation
+and platform performance. They also show a limited overhead due to the
+use of the platform.
+
+
+\end{document}
+
+
+
\section{Conclusion and future work}