\chapter*{Introduction}
\markboth{Introduction}{Introduction}
\addcontentsline{toc}{chapter}{Introduction}

%%-------------------------------------------------------------------------------------------------------%%

\section*{1. General Introduction}
\addcontentsline{toc}{section}{1. General Introduction}
The need and the demand for more computing power have been increasing since the
birth of the first computing units and they are not expected to slow down in the
coming years. To meet these demands, the frequency of CPUs was at first regularly increased until it reached the thermal limit. Then, researchers and supercomputer
manufacturers started to regularly increase the number of computing cores and
processors in supercomputers. Many parallel and distributed architectures, such as multi-core processors, clusters and grids, were implemented in order to obtain more computing power. This approach consists in using many computing nodes at the same time to solve a big problem that cannot be solved on a single node.
These two approaches are still the most common ways to obtain more computing power, but they increase the energy consumption of the resulting computing architecture.
Indeed, the power consumed by a processor increases superlinearly, roughly with the cube of its frequency, because the supply voltage must be raised along with the frequency, and a platform composed of $N$ computing nodes consumes the sum of the power consumed by each of its nodes.
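As a rough illustration, assuming the classical CMOS approximation of the dynamic power (and not the exact model derived later in this dissertation), the power drawn by a core can be written as
\[
P_{dyn} \approx \alpha \cdot C \cdot V^{2} \cdot f ,
\]
where $\alpha$ is the switching activity, $C$ the load capacitance, $V$ the supply voltage and $f$ the operating frequency. Since $V$ must grow roughly linearly with $f$, the dynamic power grows approximately with the cube of the frequency.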
As an example, the Chinese supercomputer
Tianhe-2 was ranked first in the Top500 list of November 2015
\cite{ref101}. However, it was also the most power hungry
platform: its more than 3 million cores consumed around 17.8 megawatts.
Moreover, according to the U.S. annual energy outlook 2015
\cite{ref102}, the price of one megawatt-hour of energy
was approximately \$70. Therefore, the energy consumed by
the Tianhe-2 platform costs more than \$10 million each year.
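Indeed, a rough estimate based on the figures above gives
\[
17.8~\mbox{MW} \times \$70/\mbox{MWh} \times 8760~\mbox{h/year} \approx \$10.9~\mbox{million per year}.
\]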
Moreover, such a platform generates a lot of heat and, to prevent it from overheating, a cooling
infrastructure that itself consumes a lot of energy must be deployed \cite{ref103}.
A high CPU temperature can also drastically increase its energy consumption, see \cite{ref104} for more details.
An efficient computing platform must offer the highest possible number
of FLOPS per watt, such as the Shoubu-ExaScaler from RIKEN
which topped the Green500 list in November 2015 \cite{ref105}.
This heterogeneous platform achieves more than 7 GFLOPS per watt.

For all these reasons, energy reduction has become an important topic in the high performance
computing (HPC) field. To tackle this problem, many researchers use DVFS (Dynamic
Voltage and Frequency Scaling) operations which dynamically reduce the frequency and
voltage of the cores and thus their energy consumption \cite{ref49}.
Indeed, modern CPUs offer a set of available frequencies, usually called gears, and the user or
the operating system can modify the frequency of the processor according to its
needs. However, DVFS reduces the number of FLOPS executed by the processor, which may increase the execution
time of the application running over that processor.
Therefore, researchers try to reduce the frequency to its minimum when processors are idle
(waiting for data from other processors or communicating with other processors).
Moreover, depending on their objectives, they use heuristics to find the best
frequency scaling factor during the computation. If they aim for performance, they choose
the frequency scaling factor that reduces the consumed energy while affecting the
performance as little as possible. On the other hand, if they aim for energy
reduction, the chosen frequency scaling factor must produce the most energy efficient
execution without considering the degradation of the performance. However, it is
important to notice that lowering the frequency to its minimum value does not always
give the most energy efficient execution, because of the static (leakage) energy which increases the total energy consumption of the CPU when the execution time increases. Thus, a more important question is how to select the frequency gears that simultaneously minimize the total energy consumption and maximize the performance of a parallel application running over a parallel platform.
\section*{2. Motivation of the Dissertation}
\addcontentsline{toc}{section}{2. Motivation of the Dissertation}
The main objective of HPC systems such as clusters, grids and supercomputers is to execute a given task over the system as fast as possible.
Hence, while using DVFS to scale down the frequencies of the CPUs composing the system reduces their energy consumption, it can also significantly degrade the performance of the executed program, especially if it is compute bound. A compute bound program contains a lot of computations and relatively few communication and input/output operations. The execution time of the program is directly dependent on
the computing powers of the CPUs and their selected frequencies.
Therefore, the chosen frequency scaling factor must give the best possible trade-off between the energy reduction and the performance of the parallel application.

On the other hand, the relation between the energy consumption and the execution time of parallel applications is complex and non-linear. It is very hard to optimize both the energy consumption and the performance of parallel applications when scaling the frequency of the processors executing them, because one affects the other. In order to evaluate the impact of scaling down the CPU's frequency on its energy consumption and computing power, mathematical models should be defined to predict both of them for different frequencies.
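As a simple illustration of this trade-off, and not the exact models derived in the following chapters, the energy consumed by a CPU running at frequency $f$ during an execution of total duration $T(f)$ can be decomposed as
\[
E(f) \approx P_{dyn}(f) \cdot T_{comp}(f) + P_{static} \cdot T(f),
\]
where the dynamic part decreases when the frequency is scaled down while the static part grows with the longer execution time. This decomposition explains why running at the lowest frequency is not always the most energy efficient choice.
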
Furthermore, researchers use different optimization strategies to select the frequencies of the CPUs. These strategies might be applied during the execution of the application (online) or during a pre-execution phase (offline). In our opinion, a good approach should minimize the energy consumption while preserving the performance at the same time. Finally, it should also be applicable to the application during its execution, without requiring any training or profiling and with a minimal overhead.
\section*{3. Main Contributions of this Dissertation}
\addcontentsline{toc}{section}{3. Main Contributions of this Dissertation}
The main objective of this work is to minimize the energy consumption of parallel applications with iterations running over clusters and grids while preserving their performance. Its main contributions can be summarized as follows:
\begin{enumerate}[I)]
\item Energy consumption and performance models for synchronous and asynchronous message passing applications with iterations were developed. These models take into consideration both the computation and communication times of these applications as well as their relation to the frequency scaling factors.
\item The parallel applications with iterations were executed over different parallel architectures such as a homogeneous local cluster, a heterogeneous local cluster and distributed clusters (grid platform). The main goal behind using these different platforms is to study how the heterogeneity of the computing powers of the nodes and of the communication networks connecting them affects the energy consumption and the performance of parallel applications with iterations.
\item Based on the proposed energy consumption and performance models, a new objective function to optimize both the energy consumption and the performance of parallel applications with iterations at the same time was defined. It computes the maximum distance between the predicted energy consumption and the predicted performance curves to define the best possible trade-off between them (a schematic formulation is given just after this list).
\item New online frequency selecting algorithms for clusters and grids were developed. They use the new objective function and select the frequency scaling factors that simultaneously optimize both the energy consumption and the performance. They have a very small overhead compared to other methods in the state of the art and they require neither training nor profiling.
\item The proposed algorithms were applied to the NAS parallel benchmarks \cite{ref65} and to the Multi-splitting method. These applications offer different computation to communication ratios and provide a good testbed to evaluate the proposed algorithms in different scenarios.
\item The proposed algorithms were evaluated over the SimGrid simulator \cite{ref66}, which offers flexible and easy-to-use tools to build different types of parallel architectures. Furthermore, real experiments were conducted over the Grid'5000 testbed \cite{ref21} and compared with the simulated ones.
The experiments were conducted over different numbers of nodes and different platform scenarios.
\item All the proposed methods were compared with either the method of Rauber and Rünger \cite{ref47} or the objective function of Spiliopoulos et al. \cite{ref67}. Both the simulations and the real experiments showed that the proposed methods give better energy to performance trade-offs than the other methods.
\end{enumerate}
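To sketch the objective function mentioned in contribution III (with notation introduced here only for illustration), let $E_{\mathrm{norm}}(S)$ and $P_{\mathrm{norm}}(S)$ denote the normalized predicted energy consumption and the normalized predicted performance obtained for a vector of frequency scaling factors $S$. The selected scaling factors are then those that maximize the distance between the two curves:
\[
S_{\mathrm{opt}} = \arg\max_{S} \left( P_{\mathrm{norm}}(S) - E_{\mathrm{norm}}(S) \right).
\]
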
\section*{4. Dissertation Outline}
\addcontentsline{toc}{section}{4. Dissertation Outline}
The dissertation is organized as follows: Chapter~\ref{ch1} presents different types of parallel architectures and parallel applications with iterations. It also presents an energy consumption model from the state of the art that can be used to estimate the energy consumption of these applications.
Chapter~\ref{ch2} describes the proposed energy and performance optimization method for synchronous applications with iterations running over homogeneous clusters. Chapter~\ref{ch3} presents two algorithms for the energy and performance optimization of synchronous applications with iterations running over heterogeneous clusters and grids. In Chapter~\ref{ch4}, the energy and performance models and the optimization method are adapted to asynchronous iterative applications running over grids. Finally, this dissertation ends with a summary of the contributions and some perspectives for future work.