\section*{1. General Introduction}
\addcontentsline{toc}{section}{1. General Introduction }
+The need and demand for more computing power have been increasing since the
+birth of the first computing unit and it is not expected to slow down in the
+coming years. To meet these demands, at first the frequency of the CPU was regularly increased until reaching the thermal limit. Also, researchers and supercomputers
+constructors have been regularly increasing the number of computing cores and
+processors in supercomputers. Many parallel and distributed architectures, such as multi-core, clusters and grids, were implemented in order to obtain more computing power. This approach consists in using at the same time many computing nodes to solve a big problem that cannot be solved on a single node.
+Therefore, these two common approaches are the most common up to now to get more computing power, but they are increasing the energy consumption of the resulting computing architecture.
+Indeed, the power consumed by a processor exponentially increases when its frequency increases and a platform consisting of $N$ computing nodes consumes as much as the sum of the power consumed by each computing node.
+As an example, the Chinese supercomputer
+Tianhe-2 had the highest FLOPS in November 2015 according to the Top500 list
+\cite{ref101}. However, it was also the most power hungry
+platform with its over 3 million cores consuming around 17.8 megawatts.
+Moreover, according to the U.S. annual energy outlook 2015
+\cite{ref102}, the price of energy for 1 megawatt per hour
+was approximately equal to \$70. Therefore, the price of the energy consumed by
+the Tianhe-2 platform is approximately more than \$10 million each year.
+Moreover, the heat generated by the platform and therefore a cooling infrastructure \cite{ref103} which also consumes a lot of energy, must be implemented to keep the platform from overheating. High CPU's temperatures can also drastically increase its energy consumption, see \cite{ref104} for more details.
+The computing platforms must be more energy efficient and offer the highest number
+of FLOPS per watt possible, such as the Shoubu-ExaScaler from RIKEN
+which became the top of the Green500 list in November 2015 \cite{ref105}.
+This heterogeneous platform executes more than 7 GFlops per watt while consuming
+50.32 kilowatts.
+
+For all these reasons energy reduction has become an important topic in the high performance
+computing (HPC) field. To tackle this problem, many researchers use DVFS (Dynamic
+Voltage and Frequency Scaling) operations which reduce dynamically the frequency and
+voltage of cores and thus their energy consumption \cite{ref49}.
+Indeed, modern CPUs offer a set of acceptable frequencies which are usually called gears, and the user or
+the operating system can modify the frequency of the processor according to its
+needs. However, DVFS reduces the number of FLOPS executed by the processor which may increase the execution
+time of the application running over that processor.
+Therefore researchers try to reduce the frequency to the minimum when processors are idle
+(waiting for data from other processors or communicating with other processors).
+Moreover, depending on their objectives, they use heuristics to find the best
+frequency scaling factor during the computation. If they aim for performance they choose
+the best frequency scaling factor that reduces the consumed energy while affecting as
+little as possible the performance. On the other hand, if they aim for energy
+reduction, the chosen frequency scaling factor must produce the most energy efficient
+execution without considering the degradation of the performance. Whereas, it is
+important to notice that lowering the frequency to the minimum value does not always
+give the most energy efficient execution due to energy leakage that increases the total energy consumption of the CPU when the execution time increases. However, the more important question is how to select the best frequency gears that minimizes the total energy consumption and the maximizes the performance of parallel application running over a parallel platform at the same time?
+
+
+
+
\section*{2. Motivation of the Dissertation}
\addcontentsline{toc}{section}{2. Motivation of the Dissertation }
-\section*{3. The Objective of this Dissertation}
-\addcontentsline{toc}{section}{3. The Objective of this Dissertation}
+The main objective of HPC systems is to execute as fast as possible the sequential application over a parallel architecture.
+Hence, using DVFS to scale down the frequencies of CPUs composing the parallel architecture for energy reduction process, it can also significantly degrade the performance of the executed program if it is compute bound, when the program depends mainly of the computing power of the processor, and if a low CPU frequency is selected.
+Therefore, the chosen frequency scaling factor must give the best possible trade-off between the energy reduction and the performance of the parallel application.
+
+On the other hand, the relation between energy consumption and the execution time of parallel applications is complex and non-linear. It is very hard to optimize both the energy consumption and the performance of the parallel applications when scaling the frequency of processors executing them because one affects the other. There are a very few works in the state of the art have been dedicated for this problem. Therefore, mathematical models of both the energy consumption and performance of the parallel application running over a parallel platform are required and should be defined precisely to discover the best relation between them.
+
+Furthermore, researchers use different optimization strategies to select the frequencies that gives the best trade-off between the energy reduction and performance degradation ratio, which might be chosen during execution (online) or during a pre-execution phase (offline). Thus, the best optimization approach to optimize the energy consumption and the performance at the same time must be applied online with a very low overhead on the execution time of the parallel application.
\section*{3. Main Contributions of this Dissertation}
\addcontentsline{toc}{section}{3. Main Contributions of this Dissertation}
+The main contributions of this dissertation focus on optimizing both the energy consumption and the performance of iterative parallel applications running over clusters and grids. The main contributions of this work summarize as follows:
-
-%\section{Methodology}
-%In this dissertation, analytical, as well as computational, methods are used.
+\begin{enumerate} [I)]
+\item We develop an energy consumption and performance models for synchronous and asynchronous message passing iterative applications. These models take into consideration both the computation and communications times of these application, in addition to their relation with frequency scaling factors.
-% \section{ Refereed Journal and Conference Publications}
+\item The iterative parallel applications were executed over different parallel architectures such as: homogeneous local cluster, heterogeneous local cluster and distributed clusters (grid platform). The main goal behind using these different platforms is to study the effect of the heterogeneity in the computing powers of the the commuting nodes and the heterogeneity in the communication networks which connecting these nodes on the energy consumption and the performance of iterative applications.
+
+\item Depending on the proposed energy consumption and the performance models, we define a new objective function to optimize both the energy consumption and the performance of the iterative parallel applications at the same. The proposed objective function compute the maximum distance between the predicted energy consumption and the predicted performance curves to define the best possible trade-off between them.
+
+\item a new online frequencies selecting algorithms for cluster and grid are developed which used the new objective function. These algorithms selected the frequency scaling factors that simultaneously optimize both the energy consumption and performance. They have a very small overhead when comparing them to other methods in the state of the art and they are working without training and profiling.
+
+\item We conducted extensive simulation experiments over SimGrid simulator \cite{ref66}, which offers flexible and easy tools to built different types of parallel architectures. Furthermore, real experiments were conducted over Grid'5000 testbed \cite{ref21} and compared with simulation ones.
+The experimental results were executed over different number of nodes and different platform scenarios.
+
+\item In both the simulation and real experiments, NAS parallel benchmarks \cite{ref65} and Multi-splitting method solving 3D problem with different sizes used as a parallel applications were executed on clusters and grids. The final goal, is to evaluate the proposed methods over these applications and test their adaptation to these applications when different computation and communication ratios are existed.
+
+\item All the proposed methods of this work compared with two approaches found in the literature: Rauber and Rünger \cite{ref47} and Spiliopoulos et al. \cite{ref67} methods. Both the simulation and real testbed results showed that the proposed methods gives better energy and performance trade-off ratios than these methods.
+
+\end{enumerate}
+
+
\section*{4. Dissertation Outline}
\addcontentsline{toc}{section}{4. Dissertation Outline}
+The dissertation is organized as follows: chapter \ref{ch1} presents a scientific background about types of parallel architectures, the parallel iterative applications and the energy consumption model from the state of the art that can be used to measure the energy consumption of these applications.
+Chapter \ref{ch2} describes the propose energy and performance optimization method for synchronous iterative applications running over homogeneous cluster. Chapter \ref{ch3} presents two algorithms for the energy and performance optimization of synchronous iterative applications running over heterogeneous cluster and grid. Chapter \ref{ch4} presents
+the proposed energy and performance optimization method for asynchronous iterative applications running over grid. Finally, we conclude our work of this dissertation in chapter \ref{ch5}.
+
\ No newline at end of file