corrections

diff --git a/INTRODUCTION.tex b/INTRODUCTION.tex

index c69e7dfb090ed389b076f0867de7852bb3404032..da10810aa50e60a3e582755bd195e985cd08fb5a 100644
--- a/INTRODUCTION.tex
+++ b/INTRODUCTION.tex
@@ -6,27 +6,28 @@
  
  \section*{1. General Introduction}  
  \addcontentsline{toc}{section}{1. General Introduction }
-
-The need and demand for more computing power have been increasing since the
-birth of the first computing unit and it is not expected to slow down in the
-coming years. To meet these demands, at first the frequency of the CPU was regularly increased  until  reaching the thermal limit. Also, researchers and supercomputers
+The need and the demand for more computing power have been increasing since the
+birth of the first computing unit and they are not expected to slow down in the
+coming years. To meet these demands, the frequency of the CPU was at first regularly increased until it reached the thermal limit. Then, researchers and supercomputer
  constructors have been regularly increasing the number of computing cores and
  processors in supercomputers. Many parallel and distributed architectures, such as multi-core, clusters and grids, were implemented in order to obtain more computing power. This approach consists in using at the same time many computing nodes to solve a big problem that cannot be solved on a single node. 
-Therefore, these two  common approaches are the most common up to now to get more computing power, but they are increasing  the energy consumption of the resulting computing architecture.
-Indeed, the power consumed by a processor exponentially increases when its frequency increases and a platform consisting of $N$ computing nodes consumes as much as the sum of the power consumed by each computing node.
+These two  approaches are the most common up to now to get more computing power, but they increase the energy consumption of the resulting computing architecture.
+Indeed, the power consumed by a processor increases superlinearly when its frequency is increased and a platform consisting of $N$ computing nodes consumes as much as the sum of the power consumed by each computing node.
  As an example, the Chinese supercomputer
  Tianhe-2 had the highest FLOPS in November 2015  according to the Top500 list
  \cite{ref101}.  However, it was also the most power hungry
-platform with its over 3 million cores consuming around 17.8 megawatts.
+platform with more than 3 million cores consuming around 17.8 megawatts.
  Moreover, according to the U.S.  annual energy outlook 2015
  \cite{ref102}, the price of energy for 1 megawatt per hour
  was approximately equal to \$70.  Therefore, the price of the energy consumed by
  the Tianhe-2 platform is approximately more than \$10 million each year.  
-Moreover,  the heat generated by the platform and therefore a cooling infrastructure \cite{ref103} which also consumes a lot of energy, must be implemented to keep the platform from overheating.  High CPU's temperatures can also drastically increase its energy consumption,  see \cite{ref104} for more details. 
-The computing platforms must be more energy efficient and offer the highest number
+Moreover, the platform generates a lot of heat, and to prevent it from overheating, a cooling 
+infrastructure \cite{ref103}, which itself consumes a lot of energy, must be implemented. 
+High CPU temperatures can also drastically increase the energy consumption of the CPU; see \cite{ref104} for more details. 
+An efficient computing platform must offer the highest number
  of FLOPS per watt possible, such as the Shoubu-ExaScaler from RIKEN
  which became the top of the Green500 list in November 2015 \cite{ref105}.
-This heterogeneous platform executes more than 7 GFlops per watt while consuming
+This heterogeneous platform delivers more than 7 GFLOPS per watt while consuming only
  50.32 kilowatts. 
  
  For all these reasons energy reduction has become an important topic in the high performance
@@ -46,7 +47,7 @@ little as possible the performance.  On the other hand, if they aim for energy
  reduction, the chosen frequency scaling factor must produce the most energy efficient
  execution without considering the degradation of the performance. Whereas, it is
  important to notice that lowering the frequency to the minimum value does not always
-give the most energy efficient execution due to energy leakage that increases the total energy consumption of the CPU when the execution time increases.  However, the more important question is how to select the best frequency gears that minimizes the total energy consumption and the maximizes the performance of parallel application running over a parallel platform at the same time?
+give the most energy efficient execution due to energy leakage that increases the total energy consumption of the CPU when the execution time increases.  However, a more important question is how to select the best frequency gears that simultaneously minimize the total energy consumption and maximize the performance of a parallel application running over a parallel platform.
  
  
  
@@ -55,46 +56,46 @@ give the most energy efficient execution due to energy leakage that increases th
  \section*{2. Motivation of the Dissertation}
  \addcontentsline{toc}{section}{2. Motivation of the Dissertation }
  
-The main objective of HPC systems is to execute as fast as possible the sequential application over a parallel architecture.
-Hence, using DVFS to scale down the frequencies of CPUs composing the parallel architecture for energy reduction process, it can also significantly  degrade the performance of the executed program if it is compute bound, when the program depends mainly of the computing power of the processor, and if a low CPU frequency is selected.    
+The main objective of HPC systems such as clusters, grids and supercomputers is to execute a given task as fast as possible.
+However, using DVFS to scale down the frequencies of the CPUs composing the system in order to reduce their energy consumption can also significantly degrade the performance of the executed program, especially if it is compute bound. A compute bound program contains a lot of computations and a relatively small amount of communications and input/output operations. Its execution time is directly dependent on 
+the computing powers of the CPUs and their selected frequencies.
  Therefore, the chosen frequency scaling factor must  give the best possible trade-off between the energy reduction and the performance of the parallel application. 
  
-On the other hand, the relation between energy consumption and the execution time of parallel applications is complex and non-linear. It is very hard to optimize both the energy consumption and the performance of the parallel applications when scaling the frequency of  processors executing them because one affects the other. There are a very few works in the state of the art have been dedicated for this problem. Therefore, mathematical models of both the energy consumption and  performance of the parallel application running over a parallel platform are required and should be defined precisely to discover the best relation between them.
+On the other hand, the relation between energy consumption and the execution time of parallel applications is complex and non-linear. It is very hard to optimize both the energy consumption and the performance of  parallel applications when scaling the frequency of  the processors executing them because one affects the other. In order to evaluate the impact of scaling down the CPU's frequency on its energy consumption and computing power, mathematical models should be defined to predict them for different frequencies.
+
  
-Furthermore, researchers use different optimization strategies to select the frequencies that gives the best trade-off between the energy reduction and performance degradation ratio, which might be chosen during execution (online) or during a pre-execution phase (offline). Thus, the best optimization approach to optimize the energy consumption and the performance at the same time  must be applied online with a very low overhead on the execution time of the parallel application.
+Furthermore, researchers use different optimization strategies to select the frequencies of the CPUs. These strategies might be applied during the execution of the application (online) or during a pre-execution phase (offline). In our opinion, a good approach should minimize the energy consumption while preserving the performance. Finally, it should also be applied to the application during its execution without requiring any training or profiling and with minimal overhead.
  
  
  \section*{3. Main Contributions of this Dissertation}
   \addcontentsline{toc}{section}{3. Main Contributions of this Dissertation}
  
-The main contributions of this dissertation focus on optimizing both the energy consumption and the performance of iterative parallel applications running over clusters and grids. The main contributions of this work summarize as follows: 
+The main objective of this work is to minimize the energy consumption of  parallel applications with iterations running over clusters and grids while preserving their performance. The main contributions of this work can be summarized as follows: 
  
  \begin{enumerate} [I)]
  
-\item  We develop an energy consumption and performance models for synchronous and asynchronous message passing iterative applications. These models take into consideration both the computation and communications times of these application, in addition to their relation with frequency scaling factors. 
+\item  Energy consumption and performance models for synchronous and asynchronous message passing applications with iterations were developed. These models take into consideration both the computation and communication times of these applications in addition to their relation to the frequency scaling factors. 
  
-\item The iterative parallel applications were executed over different parallel architectures such as: homogeneous local cluster, heterogeneous local cluster and distributed clusters (grid platform). The main goal behind using these different platforms is to study the effect of the heterogeneity  in the computing powers of the the commuting nodes and the heterogeneity in the communication networks which  connecting these nodes on the energy consumption and the performance of iterative  applications.
+\item The parallel applications with iterations were executed over different parallel architectures such as a homogeneous local cluster, a heterogeneous local cluster and distributed clusters (grid platform). The main goal behind using these different platforms is to study the effect of the heterogeneity in the computing powers of the computing nodes and the heterogeneity in the communication networks which connect these nodes on the energy consumption and the performance of parallel applications with iterations.
  
-\item Depending on the proposed energy consumption and the performance models, we define a new  objective function to optimize both  the energy consumption and the performance of the iterative parallel applications at the same. The proposed objective function compute the maximum distance between the predicted energy consumption and the predicted performance curves to define the best possible trade-off between them. 
+\item Based on the proposed energy consumption and performance models, a new objective function to optimize both the energy consumption and the performance of the parallel applications with iterations at the same time was defined. It computes the maximum distance between the predicted energy consumption and the predicted performance curves to define the best possible trade-off between them. 
  
-\item  a new online frequencies selecting algorithms for cluster and grid are developed which used the new objective function. These algorithms  selected the frequency scaling factors that simultaneously optimize both the  energy consumption and  performance. They have a very small overhead when comparing them to other methods in the state of the art and they are working without training and profiling. 
+\item  New online frequency selecting algorithms for clusters and grids were developed. They use the new objective function and select the frequency scaling factors that simultaneously optimize both the energy consumption and the performance. They have a very small overhead when compared to other methods in the state of the art and they work without training or profiling. 
  
-\item We conducted extensive simulation experiments over SimGrid simulator \cite{ref66}, which offers  flexible and easy tools to built different types of parallel architectures.  Furthermore, real experiments were conducted over Grid'5000 testbed \cite{ref21} and  compared with simulation ones.
-The experimental results were executed over different number of nodes and different  platform scenarios.
  
-\item  In both the simulation and real experiments, NAS parallel benchmarks \cite{ref65} and Multi-splitting method solving 3D problem with different sizes used as a parallel applications were executed on clusters and grids. The final goal, is to evaluate the proposed methods  over these applications and test their adaptation to these applications when different computation and communication ratios are existed.
+\item  The proposed algorithms were applied to the NAS parallel benchmarks \cite{ref65} and the Multi-splitting method.  These applications offer different computation to communication ratios and constitute a good testbed to evaluate the proposed algorithms in different scenarios.
  
-\item All the proposed methods of this work compared with two approaches found in the literature: Rauber and Rünger    \cite{ref47} and  Spiliopoulos et al.  \cite{ref67} methods. Both the simulation and real testbed results  showed that the proposed methods gives better energy and performance trade-off ratios than these methods.
+\item The proposed algorithms were evaluated over the SimGrid simulator \cite{ref66}, which offers flexible and easy tools to build different types of parallel architectures.  Furthermore, real experiments were conducted over the Grid'5000 testbed \cite{ref21} and compared with the simulated ones.
+The experiments were conducted over different numbers of nodes and different platform scenarios.
  
+\item All the proposed methods were compared with either the method of Rauber and Rünger \cite{ref47} or the objective function of Spiliopoulos et al. \cite{ref67}. Both the simulation and real experiments showed that the proposed methods give better energy to performance trade-offs than the other methods.
  \end{enumerate}
   
   
- 
  \section*{4. Dissertation Outline}
  \addcontentsline{toc}{section}{4. Dissertation Outline}
-The dissertation is organized as follows: chapter \ref{ch1} presents a scientific background about types of parallel architectures, the parallel iterative applications and the energy consumption model from the state of the art that can be used to measure the energy consumption of these applications. 
-Chapter \ref{ch2} describes the propose energy and performance optimization method for synchronous iterative applications running over homogeneous cluster. Chapter \ref{ch3}  presents two algorithms for the energy and performance optimization of synchronous iterative applications running over heterogeneous cluster and grid. Chapter \ref{ch4} presents  
-the proposed energy and performance optimization method for asynchronous iterative applications running over grid. Finally, we conclude our work of this dissertation in chapter \ref{ch5}.
+The dissertation is organized as follows: chapter \ref{ch1} presents  different types of parallel architectures and  parallel  applications with iterations. It also presents an  energy consumption model from the state of the art that can be used to measure the energy consumption of these applications. 
+Chapter \ref{ch2} describes the proposed energy and performance optimization method for synchronous applications with iterations running over homogeneous clusters. Chapter \ref{ch3} presents two algorithms for the energy and performance optimization of synchronous applications with iterations running over heterogeneous clusters and grids. In chapter \ref{ch4}, the energy and performance models and the optimization method are adapted for asynchronous iterative applications running over grids. Finally, this dissertation ends with a summary and some perspectives for future work.
  
   
   
 \ No newline at end of file