X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/ThesisAhmed.git/blobdiff_plain/46e8710e38caa11a5ea1ac4d5878d02ebe1cb610..1e2dba6a4221941261eb9fc136576db92c7b1e56:/INTRODUCTION.tex diff --git a/INTRODUCTION.tex b/INTRODUCTION.tex index c69e7df..da10810 100644 --- a/INTRODUCTION.tex +++ b/INTRODUCTION.tex @@ -6,27 +6,28 @@ \section*{1. General Introduction} \addcontentsline{toc}{section}{1. General Introduction } - -The need and demand for more computing power have been increasing since the -birth of the first computing unit and it is not expected to slow down in the -coming years. To meet these demands, at first the frequency of the CPU was regularly increased until reaching the thermal limit. Also, researchers and supercomputers +The need and the demand for more computing power have been increasing since the +birth of the first computing unit and they are not expected to slow down in the +coming years. To meet these demands, at first the frequency of the CPU was regularly increased until reaching the thermal limit. Then, researchers and supercomputers constructors have been regularly increasing the number of computing cores and processors in supercomputers. Many parallel and distributed architectures, such as multi-core, clusters and grids, were implemented in order to obtain more computing power. This approach consists in using at the same time many computing nodes to solve a big problem that cannot be solved on a single node. -Therefore, these two common approaches are the most common up to now to get more computing power, but they are increasing the energy consumption of the resulting computing architecture. -Indeed, the power consumed by a processor exponentially increases when its frequency increases and a platform consisting of $N$ computing nodes consumes as much as the sum of the power consumed by each computing node. +These two approaches are the most common up to now to get more computing power, but they increase the energy consumption of the resulting computing architecture. +Indeed, the power consumed by a processor exponentially increases when its frequency is increased and a platform consisting of $N$ computing nodes consumes as much as the sum of the power consumed by each computing node. As an example, the Chinese supercomputer Tianhe-2 had the highest FLOPS in November 2015 according to the Top500 list \cite{ref101}. However, it was also the most power hungry -platform with its over 3 million cores consuming around 17.8 megawatts. +platform with more than 3 million cores consuming around 17.8 megawatts. Moreover, according to the U.S. annual energy outlook 2015 \cite{ref102}, the price of energy for 1 megawatt per hour was approximately equal to \$70. Therefore, the price of the energy consumed by the Tianhe-2 platform is approximately more than \$10 million each year. -Moreover, the heat generated by the platform and therefore a cooling infrastructure \cite{ref103} which also consumes a lot of energy, must be implemented to keep the platform from overheating. High CPU's temperatures can also drastically increase its energy consumption, see \cite{ref104} for more details. -The computing platforms must be more energy efficient and offer the highest number +Moreover, the platform generates a lot of heat and to prevent it from overheating a cooling +infrastructure \cite{ref103} which consumes a lot of energy must be implemented. + High CPU's temperatures can also drastically increase its energy consumption, see \cite{ref104} for more details. +An efficient computing platform must offer the highest number of FLOPS per watt possible, such as the Shoubu-ExaScaler from RIKEN which became the top of the Green500 list in November 2015 \cite{ref105}. -This heterogeneous platform executes more than 7 GFlops per watt while consuming +This heterogeneous platform executes more than 7 GFlops per watt while only consuming 50.32 kilowatts. For all these reasons energy reduction has become an important topic in the high performance @@ -46,7 +47,7 @@ little as possible the performance. On the other hand, if they aim for energy reduction, the chosen frequency scaling factor must produce the most energy efficient execution without considering the degradation of the performance. Whereas, it is important to notice that lowering the frequency to the minimum value does not always -give the most energy efficient execution due to energy leakage that increases the total energy consumption of the CPU when the execution time increases. However, the more important question is how to select the best frequency gears that minimizes the total energy consumption and the maximizes the performance of parallel application running over a parallel platform at the same time? +give the most energy efficient execution due to energy leakage that increases the total energy consumption of the CPU when the execution time increases. However, a more important question is how to select the best frequency gears that minimize the total energy consumption and the maximize the performance of a parallel application, running over a parallel platform, at the same time? @@ -55,46 +56,46 @@ give the most energy efficient execution due to energy leakage that increases th \section*{2. Motivation of the Dissertation} \addcontentsline{toc}{section}{2. Motivation of the Dissertation } -The main objective of HPC systems is to execute as fast as possible the sequential application over a parallel architecture. -Hence, using DVFS to scale down the frequencies of CPUs composing the parallel architecture for energy reduction process, it can also significantly degrade the performance of the executed program if it is compute bound, when the program depends mainly of the computing power of the processor, and if a low CPU frequency is selected. +The main objective of an HPC system such as clusters, grids and supercomputers is to execute as fast as possible a given task over that system. +Hence, using DVFS to scale down the frequencies of the CPUs composing the system to reduce their energy consumption, it can also significantly degrade the performance of the executed program, especially if it is compute bound. A compute bound program contain a lot of computations and a relatively small amount of communicators and Inputs/Outputs operations. The execution time of the program is directly dependent on +the computing powers of the CPUs and their selected frequencies. Therefore, the chosen frequency scaling factor must give the best possible trade-off between the energy reduction and the performance of the parallel application. -On the other hand, the relation between energy consumption and the execution time of parallel applications is complex and non-linear. It is very hard to optimize both the energy consumption and the performance of the parallel applications when scaling the frequency of processors executing them because one affects the other. There are a very few works in the state of the art have been dedicated for this problem. Therefore, mathematical models of both the energy consumption and performance of the parallel application running over a parallel platform are required and should be defined precisely to discover the best relation between them. +On the other hand, the relation between energy consumption and the execution time of parallel applications is complex and non-linear. It is very hard to optimize both the energy consumption and the performance of parallel applications when scaling the frequency of the processors executing them because one affects the other. In order to evaluate the impact of scaling down the CPU's frequency on its energy consumption and computing power, mathematical models should be defined to predict them for different frequencies. + -Furthermore, researchers use different optimization strategies to select the frequencies that gives the best trade-off between the energy reduction and performance degradation ratio, which might be chosen during execution (online) or during a pre-execution phase (offline). Thus, the best optimization approach to optimize the energy consumption and the performance at the same time must be applied online with a very low overhead on the execution time of the parallel application. +Furthermore, researchers use different optimization strategies to select the frequencies of the CPUs. They might be executed during the execution of the application (online) or during a pre-execution phase (offline). In our opinion a good approach should minimize the energy consumption while preserving the performance at the same time. Finally, it should also be applied to the application during its execution without requiring any training or profiling and with minimal overhead. \section*{3. Main Contributions of this Dissertation} \addcontentsline{toc}{section}{3. Main Contributions of this Dissertation} -The main contributions of this dissertation focus on optimizing both the energy consumption and the performance of iterative parallel applications running over clusters and grids. The main contributions of this work summarize as follows: +The main objective of this work is to minimize the energy consumption of parallel applications with iterations running over clusters and grids while preserving their performance. The main contributions of this work can be summarized as follows: \begin{enumerate} [I)] -\item We develop an energy consumption and performance models for synchronous and asynchronous message passing iterative applications. These models take into consideration both the computation and communications times of these application, in addition to their relation with frequency scaling factors. +\item Energy consumption and performance models for synchronous and asynchronous message passing applications with iterations were developed. These models take into consideration both the computation and communications times of these applications in addition to their relation to the frequency scaling factors. -\item The iterative parallel applications were executed over different parallel architectures such as: homogeneous local cluster, heterogeneous local cluster and distributed clusters (grid platform). The main goal behind using these different platforms is to study the effect of the heterogeneity in the computing powers of the the commuting nodes and the heterogeneity in the communication networks which connecting these nodes on the energy consumption and the performance of iterative applications. +\item The parallel applications with iterations were executed over different parallel architectures such as: homogeneous local cluster, heterogeneous local cluster and distributed clusters (grid platform). The main goal behind using these different platforms is to study the effect of the heterogeneity in the computing powers of the the commuting nodes and the heterogeneity in the communication networks which connect these nodes on the energy consumption and the performance of parallel applications with iterations. -\item Depending on the proposed energy consumption and the performance models, we define a new objective function to optimize both the energy consumption and the performance of the iterative parallel applications at the same. The proposed objective function compute the maximum distance between the predicted energy consumption and the predicted performance curves to define the best possible trade-off between them. +\item Depending on the proposed energy consumption and the performance models, a new objective function to optimize both the energy consumption and the performance of the parallel applications with iterations at the same were defined. It computes the maximum distance between the predicted energy consumption and the predicted performance curves to define the best possible trade-off between them. -\item a new online frequencies selecting algorithms for cluster and grid are developed which used the new objective function. These algorithms selected the frequency scaling factors that simultaneously optimize both the energy consumption and performance. They have a very small overhead when comparing them to other methods in the state of the art and they are working without training and profiling. +\item New online frequency selecting algorithms for clusters and grids were developed. They use the new objective function and select the frequency scaling factors that simultaneously optimize both the energy consumption and performance. They have a very small overhead when comparing them to other methods in the state of the art and they work without training and profiling. -\item We conducted extensive simulation experiments over SimGrid simulator \cite{ref66}, which offers flexible and easy tools to built different types of parallel architectures. Furthermore, real experiments were conducted over Grid'5000 testbed \cite{ref21} and compared with simulation ones. -The experimental results were executed over different number of nodes and different platform scenarios. -\item In both the simulation and real experiments, NAS parallel benchmarks \cite{ref65} and Multi-splitting method solving 3D problem with different sizes used as a parallel applications were executed on clusters and grids. The final goal, is to evaluate the proposed methods over these applications and test their adaptation to these applications when different computation and communication ratios are existed. +\item The proposed algorithms were applied to the NAS parallel benchmarks \cite{ref65} and the Multi-splitting method. These applications offer different computations to communications ratios and a good testbed to evaluate the proposed algorithm in different scenarios. -\item All the proposed methods of this work compared with two approaches found in the literature: Rauber and Rünger \cite{ref47} and Spiliopoulos et al. \cite{ref67} methods. Both the simulation and real testbed results showed that the proposed methods gives better energy and performance trade-off ratios than these methods. +\item The proposed algorithms were evaluated over the SimGrid simulator \cite{ref66} which offers flexible and easy tools to built different types of parallel architectures. Furthermore, real experiments were conducted over Grid'5000 testbed \cite{ref21} and compared with the simulated ones. +The experiments were conducted over different number of nodes and different platform scenarios. +\item All the proposed methods were compared with either Rauber and Rünger \cite{ref47} method or Spiliopoulos et al. \cite{ref67} objective function. Both the simulation and real experiments showed that the proposed methods give better energy to performance trade-offs than the other methods. \end{enumerate} - \section*{4. Dissertation Outline} \addcontentsline{toc}{section}{4. Dissertation Outline} -The dissertation is organized as follows: chapter \ref{ch1} presents a scientific background about types of parallel architectures, the parallel iterative applications and the energy consumption model from the state of the art that can be used to measure the energy consumption of these applications. -Chapter \ref{ch2} describes the propose energy and performance optimization method for synchronous iterative applications running over homogeneous cluster. Chapter \ref{ch3} presents two algorithms for the energy and performance optimization of synchronous iterative applications running over heterogeneous cluster and grid. Chapter \ref{ch4} presents -the proposed energy and performance optimization method for asynchronous iterative applications running over grid. Finally, we conclude our work of this dissertation in chapter \ref{ch5}. +The dissertation is organized as follows: chapter \ref{ch1} presents different types of parallel architectures and parallel applications with iterations. It also presents an energy consumption model from the state of the art that can be used to measure the energy consumption of these applications. +Chapter \ref{ch2} describes the proposed energy and performance optimization method for synchronous applications with iterations running over homogeneous clusters. Chapter \ref{ch3} presents two algorithms for the energy and performance optimization of synchronous applications with iterations running over heterogeneous clusters and grids. In chapter \ref{ch4} the energy and performance models and the optimization method are adapted for asynchronous iterative applications running over grids. Finally, this dissertation ends with a summary and some perspective works. \ No newline at end of file