\chapter*{Introduction}
\markboth{Introduction}{Introduction}
\addcontentsline{toc}{chapter}{Introduction}

%%-------------------------------------------------------------------------------------------------------%%

\section*{1. General Introduction}
\addcontentsline{toc}{section}{1. General Introduction}
The need and the demand for more computing power have been increasing since the
birth of the first computing units and they are not expected to slow down in the
coming years. To meet these demands, the frequency of CPUs was at first regularly increased until it reached the thermal limit. Then, researchers and supercomputer
manufacturers started to regularly increase the number of computing cores and
processors in supercomputers. Many parallel and distributed architectures, such as multi-core processors, clusters and grids, were implemented in order to obtain more computing power. This approach consists in using many computing nodes at the same time to solve a big problem that cannot be solved on a single node.
These two approaches are still the most common ways to obtain more computing power, but they increase the energy consumption of the resulting computing architecture.
Indeed, the power consumed by a processor increases superlinearly, roughly with the cube of its frequency, because the supply voltage must be raised along with the frequency, and a platform composed of $N$ computing nodes consumes the sum of the power consumed by each of its nodes.
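As a rough illustration, assuming the classical CMOS approximation of the dynamic power (and not the exact model derived later in this dissertation), the power drawn by a core can be written as
\[
P_{dyn} \approx \alpha \cdot C \cdot V^{2} \cdot f ,
\]
where $\alpha$ is the switching activity, $C$ the load capacitance, $V$ the supply voltage and $f$ the operating frequency. Since $V$ must grow roughly linearly with $f$, the dynamic power grows approximately with the cube of the frequency.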
As an example, the Chinese supercomputer
Tianhe-2 was ranked first in the Top500 list of November 2015
\cite{ref101}. However, it was also the most power hungry
platform: its more than 3 million cores consumed around 17.8 megawatts.
Moreover, according to the U.S. annual energy outlook 2015
\cite{ref102}, the price of one megawatt-hour of energy
was approximately \$70. Therefore, the energy consumed by
the Tianhe-2 platform costs more than \$10 million each year.
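Indeed, a rough estimate based on the figures above gives
\[
17.8~\mbox{MW} \times \$70/\mbox{MWh} \times 8760~\mbox{h/year} \approx \$10.9~\mbox{million per year}.
\]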
Moreover, such a platform generates a lot of heat and, to prevent it from overheating, a cooling
infrastructure that itself consumes a lot of energy must be deployed \cite{ref103}.
A high CPU temperature can also drastically increase its energy consumption, see \cite{ref104} for more details.
An efficient computing platform must offer the highest possible number
of FLOPS per watt, such as the Shoubu-ExaScaler from RIKEN
which topped the Green500 list in November 2015 \cite{ref105}.
This heterogeneous platform achieves more than 7 GFLOPS per watt.

For all these reasons, energy reduction has become an important topic in the high performance
computing (HPC) field. To tackle this problem, many researchers use DVFS (Dynamic
Voltage and Frequency Scaling) operations which dynamically reduce the frequency and
voltage of the cores and thus their energy consumption \cite{ref49}.
Indeed, modern CPUs offer a set of available frequencies, usually called gears, and the user or
the operating system can modify the frequency of the processor according to its
needs. However, DVFS reduces the number of FLOPS executed by the processor, which may increase the execution
time of the application running over that processor.
Therefore, researchers try to reduce the frequency to its minimum when processors are idle
(waiting for data from other processors or communicating with other processors).
Moreover, depending on their objectives, they use heuristics to find the best
frequency scaling factor during the computation. If they aim for performance, they choose
the frequency scaling factor that reduces the consumed energy while affecting the
performance as little as possible. On the other hand, if they aim for energy
reduction, the chosen frequency scaling factor must produce the most energy efficient
execution without considering the degradation of the performance. However, it is
important to notice that lowering the frequency to its minimum value does not always
give the most energy efficient execution, because of the static (leakage) energy which increases the total energy consumption of the CPU when the execution time increases. Thus, a more important question is how to select the frequency gears that simultaneously minimize the total energy consumption and maximize the performance of a parallel application running over a parallel platform.
\section*{2. Motivation of the Dissertation}
\addcontentsline{toc}{section}{2. Motivation of the Dissertation}
The main objective of HPC systems such as clusters, grids and supercomputers is to execute a given task over the system as fast as possible.
Hence, while using DVFS to scale down the frequencies of the CPUs composing the system reduces their energy consumption, it can also significantly degrade the performance of the executed program, especially if it is compute bound. A compute bound program contains a lot of computations and relatively few communication and input/output operations. The execution time of the program is directly dependent on
the computing powers of the CPUs and their selected frequencies.
Therefore, the chosen frequency scaling factor must give the best possible trade-off between the energy reduction and the performance of the parallel application.

On the other hand, the relation between the energy consumption and the execution time of parallel applications is complex and non-linear. It is very hard to optimize both the energy consumption and the performance of parallel applications when scaling the frequency of the processors executing them, because one affects the other. In order to evaluate the impact of scaling down the CPU's frequency on its energy consumption and computing power, mathematical models should be defined to predict both of them for different frequencies.
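As a simple illustration of this trade-off, and not the exact models derived in the following chapters, the energy consumed by a CPU running at frequency $f$ during an execution of total duration $T(f)$ can be decomposed as
\[
E(f) \approx P_{dyn}(f) \cdot T_{comp}(f) + P_{static} \cdot T(f),
\]
where the dynamic part decreases when the frequency is scaled down while the static part grows with the longer execution time. This decomposition explains why running at the lowest frequency is not always the most energy efficient choice.
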
Furthermore, researchers use different optimization strategies to select the frequencies of the CPUs. These strategies might be applied during the execution of the application (online) or during a pre-execution phase (offline). In our opinion, a good approach should minimize the energy consumption while preserving the performance at the same time. Finally, it should also be applicable to the application during its execution, without requiring any training or profiling and with a minimal overhead.
\section*{3. Main Contributions of this Dissertation}
\addcontentsline{toc}{section}{3. Main Contributions of this Dissertation}
The main objective of this work is to minimize the energy consumption of parallel applications with iterations running over clusters and grids while preserving their performance. Its main contributions can be summarized as follows:
\begin{enumerate}[I)]
\item Energy consumption and performance models for synchronous and asynchronous message passing applications with iterations were developed. These models take into consideration both the computation and communication times of these applications as well as their relation to the frequency scaling factors.
\item The parallel applications with iterations were executed over different parallel architectures such as a homogeneous local cluster, a heterogeneous local cluster and distributed clusters (grid platform). The main goal behind using these different platforms is to study how the heterogeneity of the computing powers of the nodes and of the communication networks connecting them affects the energy consumption and the performance of parallel applications with iterations.
\item Based on the proposed energy consumption and performance models, a new objective function to optimize both the energy consumption and the performance of parallel applications with iterations at the same time was defined. It computes the maximum distance between the predicted energy consumption and the predicted performance curves to define the best possible trade-off between them (a schematic formulation is given just after this list).
\item New online frequency selecting algorithms for clusters and grids were developed. They use the new objective function and select the frequency scaling factors that simultaneously optimize both the energy consumption and the performance. They have a very small overhead compared to other methods in the state of the art and they require neither training nor profiling.
\item The proposed algorithms were applied to the NAS parallel benchmarks \cite{ref65} and to the Multi-splitting method. These applications offer different computation to communication ratios and provide a good testbed to evaluate the proposed algorithms in different scenarios.
\item The proposed algorithms were evaluated over the SimGrid simulator \cite{ref66}, which offers flexible and easy-to-use tools to build different types of parallel architectures. Furthermore, real experiments were conducted over the Grid'5000 testbed \cite{ref21} and compared with the simulated ones.
The experiments were conducted over different numbers of nodes and different platform scenarios.
\item All the proposed methods were compared with either the method of Rauber and Rünger \cite{ref47} or the objective function of Spiliopoulos et al. \cite{ref67}. Both the simulations and the real experiments showed that the proposed methods give better energy to performance trade-offs than the other methods.
\end{enumerate}
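To sketch the objective function mentioned in contribution III (with notation introduced here only for illustration), let $E_{\mathrm{norm}}(S)$ and $P_{\mathrm{norm}}(S)$ denote the normalized predicted energy consumption and the normalized predicted performance obtained for a vector of frequency scaling factors $S$. The selected scaling factors are then those that maximize the distance between the two curves:
\[
S_{\mathrm{opt}} = \arg\max_{S} \left( P_{\mathrm{norm}}(S) - E_{\mathrm{norm}}(S) \right).
\]
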
\section*{4. Dissertation Outline}
\addcontentsline{toc}{section}{4. Dissertation Outline}
The dissertation is organized as follows: Chapter~\ref{ch1} presents different types of parallel architectures and parallel applications with iterations. It also presents an energy consumption model from the state of the art that can be used to estimate the energy consumption of these applications.
Chapter~\ref{ch2} describes the proposed energy and performance optimization method for synchronous applications with iterations running over homogeneous clusters. Chapter~\ref{ch3} presents two algorithms for the energy and performance optimization of synchronous applications with iterations running over heterogeneous clusters and grids. In Chapter~\ref{ch4}, the energy and performance models and the optimization method are adapted to asynchronous iterative applications running over grids. Finally, this dissertation ends with a summary of the contributions and some perspectives for future work.