From: afanfakh Date: Thu, 23 Jun 2016 15:23:16 +0000 (+0200) Subject: adding some changes X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/ThesisAhmed.git/commitdiff_plain/7b4f492a307bee60f4dfc90adb0aee0bad5f48ed?ds=inline adding some changes --- diff --git a/ACRONYMS.tex b/ACRONYMS.tex index a18d0ca..8546dcf 100644 --- a/ACRONYMS.tex +++ b/ACRONYMS.tex @@ -1,3 +1,5 @@ + + \chapter*{abbreviations \markboth{abbreviations}{abbreviations}} \label{chap} \addcontentsline{toc}{chapter}{List of Abbreviations} @@ -28,9 +30,9 @@ \item[GPU] \textbf{G}raphical \textbf{P}rocessing \textbf{U}nit \item[HSA] \textbf{H}eterogeneous \textbf{S}caling \textbf{A}lgorithm \item[ILP] \textbf{I}nstruction \textbf{L}evel \textbf{P}arallelism -\item[LAN] \textbf{L}ocal \textbf{A}rea Network +\item[LAN] \textbf{L}ocal \textbf{A}rea \textbf{N}etwork \item[LLP] \textbf{L}oop \textbf{L}evel \textbf{P}arallelism -\item[LU] \textbf{L}ower-\textbf{U}pper Symmetric Gauss-Seidel +\item[LU] \textbf{L}ower-\textbf{U}pper \item[MaxDist] \textbf{Max}imum \textbf{Dist}ance \item[MG] \textbf{M}ulti-\textbf{G}rid \item[MIMD] \textbf{M}ultiple \textbf{I}nstruction and \textbf{M}ultiple \textbf{D}ata @@ -40,8 +42,8 @@ \item[MS] \textbf{M}ulti-\textbf{S}plitting \item[NAS] \textbf{N}umerical \textbf{A}eronautical \textbf{S}imulation \item[NASA] \textbf{N}ational \textbf{A}eronautics and \textbf{S}pace \textbf{A}dministrations -\item[OPENCL] \textbf{O}pen \textbf{C}omputing \textbf{L}anguage -\item[OPENMP] \textbf{O}pen \textbf{M}ulti-\textbf{P}rocessing +\item[OPENCL] \textbf{Open} \textbf{C}omputing \textbf{L}anguage +\item[OPENMP] \textbf{Open} \textbf{M}ulti-\textbf{P}rocessing \item[RENATER] \textbf{Ré}seau \textbf{Na}tional de \textbf{T}élécommunications pour la \textbf{T}echnologie, \newline\hspace*{3em} l'\textbf{E}nseignement et la \textbf{R}echerche \item[SIAC] \textbf{S}ynchronous \textbf{I}terations and \textbf{A}synchronous \textbf{C}ommunications diff --git a/Abstruct.tex b/Abstruct.tex index 9aa1432..4c99eb3 100644 --- a/Abstruct.tex +++ b/Abstruct.tex @@ -12,29 +12,33 @@ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\emph{ \begin{center} \Large Energy Consumption Optimization of Parallel Iterative Applications using CPU Frequency Scaling\end{center}} +\emph{ \begin{center} \Large Energy Consumption Optimization of Parallel Applications with Iterations using CPU Frequency Scaling\end{center}} %\emph{ \begin{center} \large By \end{center}} -\emph{ \begin{center} \large Ahmed Badri Muslim Fanfakh \\ University of Franche-Comt\'e, 2016 \end{center}} +\emph{ \begin{center} \large Ahmed Badri Muslim FANFAKH \\ University of Franche-Comt\'e, 2016 \end{center}} %\emph{ \begin{center} \large The University of Franche-Comt\'e, 2015 \end{center}} -\emph{ \begin{center} \large Supervisors: Raphaël Couturier, Jean-Claude Charr and Arnaud Giersch \end{center}} +\emph{ \begin{center} \large Supervisors: Raphaël Couturier and Jean-Claude Charr \end{center}} In recent years, green computing has become an important topic in the supercomputing -research domain. However, the computing platforms are still consuming more and more energy due to the increasing number of nodes composing them. +research domain. However, the computing platforms are still consuming more and more energy due to the increase in the number of nodes composing them. To minimize the operating costs of these platforms many techniques have been used. Dynamic voltage and frequency scaling (DVFS) is one of them. It can be used to reduce the power consumption of the CPU while computing, by lowering its frequency. However, lowering the frequency of a CPU may increase -the execution time of an application running on that processor. Therefore, the +the execution time of the application running on that processor. Therefore, the frequency that gives the best trade-off between the energy consumption and the performance of an application must be selected. -In this thesis, a set of works are presented to optimize the energy consumption -and the performance of the synchronous and asynchronous message passing iterative applications running over clusters and grids. The energy consumption and performance models for each type of iterative application -are well modelled and defined according to the characteristics of both the application itself and the parallel architecture executing this application. +This thesis, presents the algorithms developed to optimize the energy consumption +and the performance of synchronous and asynchronous message passing applications with iterations running over clusters or grids. The energy consumption and performance models for each type of parallel application predicts its execution time and energy consumption for any selected frequency +according to the characteristics of both the application and the architecture executing this application. -Firstly, we propose a frequency scaling factors selection algorithm for synchronous iterative applications running over a homogeneous platform. It is applied to the NAS parallel benchmarks and stimulated by SimGrid simulator. -Secondly, we develop two frequency scaling factors selection algorithms for synchronous iterative applications running over a heterogeneous cluster and grid. Both algorithms are implemented to the NAS parallel benchmarks and conducted over SimGrid simulator and Grid'5000 testbed. -Thirdly, we propose a frequency scaling factors selection algorithm for an asynchronous and a hybrid iterative applications running over a grid. The algorithm is evaluated over SimGrid simulator and Grid'5000 testbed while running a multi-splitting application that solves 3D problem. -All the proposed algorithms are compared to an existing methods, which are the Rauber and Rünger and the energy and delay products (EDP) methods. The comparison results show that all the proposed algorithms give better energy consumption and performance trade-off results. The proposed algorithms have a very small overhead on the execution time of the applications and they work online without training and profiling. -\textbf{KEY WORDS:} Dynamic voltage and frequency scaling, Grid computing, Energy optimization, iterative applications and frequency scaling online algorithm. \ No newline at end of file +The contribution of this thesis can be divided into three parts: Firstly, optimizing the trade-off between +the energy consumption and the performance of the message passing applications with synchronous iterations +running over homogeneous clusters. Secondly, adapting the energy and performance models to heterogeneous platforms where each node can have different specifications such as computing power, energy consumption, available frequency gears or network's latency and bandwidth. The frequency scaling algorithm was also modified to suit the heterogeneity of the platform. Thirdly, the models and the frequency scaling algorithm were completely rethought to take into considerations the asynchronism in the communication and computation. +All these models and algorithms were applied to message passing applications with iterations and evaluated +over either SimGrid simulator or Grid'5000 platform. The experiments showed that the proposed algorithms are +efficient and outperform existing methods such as the energy and delay product. They also introduce a small +runtime overhead and work online without any training or profiling. + +\textbf{KEY WORDS:} Dynamic voltage and frequency scaling, Grid computing, Energy optimization, parallel applications with iterations and online frequency scaling algorithm. \ No newline at end of file diff --git a/CHAPITRE_01.tex b/CHAPITRE_01.tex index c3b0641..cc23717 100644 --- a/CHAPITRE_01.tex +++ b/CHAPITRE_01.tex @@ -323,8 +323,8 @@ some examples for each type of the parallel programming models are discussed: \section{Iterative Methods} \label{ch1:3} -In this work, we are interested in solving linear equations which are well known in the scientific area. -It is generally expressed in the following form: +In this work, we are interested in solving system of linear equations which are very common in the scientific field. A system of linear equations can be expressed as follows: + \begin{equation} \label{eq:linear} @@ -332,14 +332,14 @@ It is generally expressed in the following form: \end{equation} Where $A$ is a two dimensional matrix of size $N \times N$, $x$ is the unknown vector, -and $b$ is a vector of constant, each of size $N$. There are two types of solution methods to solve this linear system. -The first type of methods is called \textbf{Direct methods}, which consist of a finite number of steps depending on the -size of the linear system to give the exact solution. If the problem size is very big, these methods are expensive or their -solutions are impossible in some cases. The second type is called \textbf{Iterative methods}, which computes -the same block of operations several times, starting from the initial vector until reaching the acceptable -approximation of the exact solution. However, they can be effectively applied in parallel. Moreover, iterative methods can be used to solve both linear and non-linear equations. +and $b$ is a vector of constant, each of size $N$. There are two types of solution methods to solve this linear system: the \textbf{direct} and the \textbf{iterative methods}. +A direct method executes a finite number of steps, depending on the +size of the linear system and gives the exact solution of the system. If the problem is very big, this method is expensive or its +solution is impossible in some cases. On the other hand, methods with iterations execute the same block of instructions many times. The number of iterations can be predefined or the application iterates until a criterion is satisfied. Iterative methods are methods with iterations that start from an initial guess and +improve successively the solution until reaching an acceptable approximation of the exact solution. +These methods are well adapted for large systems and can be easily parallelized. -The sequential iterative algorithm is typically organized as a series of steps essentially of the form: +A sequential iterative algorithm is typically organized as a series of steps essentially of the form: \begin{equation} \label{eq:iter} @@ -372,6 +372,7 @@ The sequential iterative algorithm at each iteration computes the value of the r \end{equation} Where $N$ is the size of the vector $X$. Then, the iterative sequential algorithm stops iterating if the maximum error between the last two successive solution vectors, as in \ref{eq:res}, is less than or equal to a threshold value. Otherwise, it replaces the new vector $X^{(k+1)}$ with the old vector $X^k$ and computes a new iteration. + \subsection{Synchronous Parallel Iterative method} \label{ch1:3:1} The sequential iterative algorithm \ref{sia} can be parallelized by executing it on many computing units. To solve this algorithm on $M$ computing units, first the elements of the problem vector $X$ must be subdivided into $M$ sub-vectors, $X^k=(X_1^k,\dots,X_M^k)$. diff --git a/CHAPITRE_02.tex b/CHAPITRE_02.tex index 48ea63c..c6a9a29 100644 --- a/CHAPITRE_02.tex +++ b/CHAPITRE_02.tex @@ -54,8 +54,8 @@ model for homogeneous platforms from other researchers. Section~\ref{ch2:3} des performance of MPI programs can be predicted. Section~\ref{ch2:4} presents the energy-performance objective function that maximizes the reduction of energy consumption while minimizing the degradation of the program's performance. -Section~\ref{ch2:5} details the algorithm that returns the scaling factor that gives the best energy-performance -trade-off for a parallel iterative application. +Section~\ref{ch2:5} details the algorithm that returns the scaling factor that gives the best energy-performance +trade-off for a parallel application with iterations Section~\ref{ch2:6} verifies the accuracy of the performance prediction model and presents the results of the proposed algorithm. It also shows the comparison results between our method and other existing methods. @@ -219,7 +219,7 @@ This model computes the frequency scaling factor which minimizes the energy con \section{Performance evaluation of MPI programs} \label{ch2:3} -The execution time of a parallel synchronous iterative application is equal +The execution time of a parallel synchronous application with iteration is equal to the execution time of its slowest task as in figure~(\ref{fig:homo}). If there is no communication in the application and it is not data bounded, the execution time of this parallel application is linearly proportional to the operational frequency. Any @@ -420,6 +420,8 @@ two successive frequencies. The simulated network link is 1 GB Ethernet \end{tabular} \label{table:platform-homo} \end{table} + + \subsection{Performance prediction verification} \label{ch2:6:1} In this section, the precision of the proposed performance prediction method @@ -444,7 +446,9 @@ maximum frequency by the new one see EQ~(\ref{eq:s}). In our cluster there are 18 available frequency states for each processor. This leads to 18 run states for each program. Seven MPI programs of the NAS parallel benchmarks were used: CG, MG, EP, FT, BT, LU and SP. Table \ref{table:NAS-dec} shows -the description of these seven benchmarks. +the description of these seven benchmarks. Some of these benchmarks are considered MPI parallel applications with synchronous iterations or iterative applications that repeat the same block of instructions until convergence. +However, the proposed method can be applied to any application that executes the same block +of instructions many times and it is not limited to iterative methods. Figure~(\ref{fig:pred}) presents plots of the real execution times compared to the simulated ones. The maximum normalized error between these two execution times varies between 0.0073 to 0.031 depending on the executed benchmark. The smallest prediction error @@ -457,13 +461,19 @@ was for CG and the worst one was for LU. \begin{tabular}{|l|l|l|} \hline Benchmark & Full Name & Description \\ \hline -CG & Conjugate Gradiant & \begin{tabular}[c]{@{}l@{}}Estimate the smallest eigenvalue of a large \\ sparse symmetric positive-definite matrix \\ using the inverse iteration with the conjugate \\ gradient method as a subroutine for solving \\ systems of linear equations\end{tabular} \\ \hline -MG & MultiGrid & \begin{tabular}[c]{@{}l@{}}Approximate the solution to a three-dimensional \\ discrete Poisson equation using the V-cycle \\ multigrid method\end{tabular} \\ \hline -EP & Embarrassingly Parallel & \begin{tabular}[c]{@{}l@{}}Generate independent Gaussian random \\ variates using the Marsaglia polar method\end{tabular} \\ \hline -FT & Fast Fourier Transform & \begin{tabular}[c]{@{}l@{}}Solve a three-dimensional partial differential\\ equation (PDE) using the fast Fourier transform \\ (FFT)\end{tabular} \\ \hline -BT & Block Tridiagonal & \multirow{3}{*}{\begin{tabular}[c]{@{}l@{}}Solve a synthetic system of nonlinear PDEs \\ using three different algorithms involving \\ block tridiagonal, scalar pentadiagonal and \\ symmetric successive over-relaxation \\ (SSOR) solver kernels, respectively\end{tabular}} \\ \cline{1-2} +CG & Conjugate Gradiant & \begin{tabular}[c]{@{}l@{}} +It solves a system of linear equations by estimating\\ the smallest eigenvalue of a large sparse matrix \end{tabular}\\ \hline + +MG & MultiGrid & \begin{tabular}[c]{@{}l@{}}It uses the multigrid method to approximate the solution \\of a three-dimensional discrete Poisson equation\end{tabular} + \\ \hline +EP & Embarrassingly Parallel & \begin{tabular}[c]{@{}l@{}} It applies the Marsaglia polar method to randomly \\generates independent Gaussian variates +\end{tabular} \\ \hline +FT & Fast Fourier Transform & \begin{tabular}[c]{@{}l@{}}It uses the fast Fourier transform to solve a \\ three-dimensional partial differential equation + +\end{tabular} \\ \hline +BT & Block Tridiagonal & \multirow{3}{*}{\begin{tabular}[c]{@{}l@{}} \\They solve nonlinear partial differential equations \end{tabular}} \\ \cline{1-2} LU & \begin{tabular}[c]{@{}l@{}}Lower-Upper symmetric \\ Gauss-Seidel\end{tabular} & \\ \cline{1-2} -SP & \begin{tabular}[c]{@{}l@{}}Scalar \\ Pentadiagonal\end{tabular} & \\ \hline +SP & \begin{tabular}[c]{@{}l@{}}Scalar Pentadiagonal\end{tabular} & \\ \hline \end{tabular} \end{table} @@ -669,8 +679,8 @@ Therefore, the energy consumption model of $N$ parallel task executed synchrono \end{equation} According to this model, the frequency scaling factor $S$ reduces the energy consumption of the homogeneous architecture by a factor of $S^{-2}$ and increases the execution time by a factor of $S$. -This model can be used to predict the energy consumption of the message passing synchronous iterative applications after gathering the computation and communication times of the first iteration. -Furthermore, it can be used to measure the energy consumption of the iterative application by multiplying the energy consumed of all tasks in one iteration by the number of the iterations. +This model can be used to predict the energy consumption of the message passing applications with synchronous iterations after gathering the computation and communication times of the first iteration. +Furthermore, it can be used to measure the energy consumption of the parallel application with iterations by multiplying the energy consumed of all tasks in one iteration by the number of the iterations. This model is used by the algorithm \ref{EPSA} to predict the energy consumption and to select the optimal frequency scaling factor. The new frequency $F_i$ can be computed as in \ref{eq:fi} while using the new selected frequency scaling factor. In the next section, algorithm \ref{EPSA} is re-evaluated while using this new energy model and the new results are presented. @@ -734,7 +744,7 @@ have applied it to the NAS benchmarks and it was compared to the Rauber and Rün method while being executed on the SimGrid simulator. The results showed that our method, outperforms the Rauber and Rünger's method in terms of energy-performance ratio. Finally, this chapter presents a new energy consumption model for parallel -synchronous iterative methods running on homogeneous clusters. This model takes into consideration + applications with synchronous iterations running on homogeneous clusters. This model takes into consideration both the computation and communication times and their relation with the frequency scaling factor. The results obtained using the new energy model have shown that different frequency scaling factors were selected which gave new experimental results that are more accurate and realistic. \ No newline at end of file diff --git a/CHAPITRE_03.tex b/CHAPITRE_03.tex index 27d95ec..953bcfc 100644 --- a/CHAPITRE_03.tex +++ b/CHAPITRE_03.tex @@ -68,16 +68,16 @@ clusters (heterogeneous CPUs) and grid platforms are presented. They select the frequencies that try to give the best trade-off between energy saving and performance degradation, for each node - computing the synchronous message passing iterative application. These algorithms have a small + computing the synchronous message passing application with iterations. These algorithms have a small overhead and work without training or profiling. They use new energy models - for message passing iterative synchronous applications running on both the heterogeneous + for message passing synchronous applications with iterations running on both the heterogeneous local cluster and the grid platform. The first proposed algorithm for a heterogeneous local cluster was evaluated on the SimGrid simulator while running the class C of the NAS parallel benchmarks. The experiments conducted over 8 heterogeneous nodes show that it reduces on average the energy consumption by 29.8\% while limiting the performance degradation to 3.8\%. The second proposed algorithm for a grid platform was evaluated on the Grid5000 testbed platform while running the class D of the NAS parallel benchmarks. - The experiments were run on 16 nodes, distributed on three clusters, and show that it reduces + The experiments were run on 16 nodes, distributed on three clusters, and show that the algorithm reduces on average the energy consumption by 30\% while the performance is on average only degraded by 3.2\%. Finally, both algorithms were compared to the EDP method. The comparison @@ -96,13 +96,12 @@ Section~\ref{ch3:3} shows the energy and performance models in addition to the f selecting algorithm of synchronous message passing programs running over a grid platform. Section~\ref{ch3:4} presents the results of applying the algorithm on the NAS parallel benchmarks (class D) and executing them on the Grid'5000 testbed. -The algorithm is also evaluated over multi-core architectures and over three different power scenarios. Moreover, section~\ref{ch3:4}, shows the comparison results between the proposed method and the EDP method. +The algorithm is also evaluated over multi-core architectures and over three different power scenarios. Moreover, Section~\ref{ch3:4}, shows the comparison results between the proposed method and the EDP method. Finally, in Section~\ref{ch3:concl} the chapter ends with a summary. \section{Related works} \label{ch3:relwork} -As same as in CPUs, DVFS is also allowed in GPUs to reduce their energy consumption. The process of selecting the appropriate frequency for a processor to satisfy some objectives, while taking into account all the constraints, is not a trivial operation. Many researchers used different @@ -119,7 +118,7 @@ sequential, parallel or distributed architecture, homogeneous or heterogeneous platform, synchronous or asynchronous application, \dots{} In this chapter, we are interested in reducing the energy consumption when running a message passing -iterative synchronous applications over a heterogeneous platform. Some +synchronous applications with iterations over a heterogeneous platform. Some works have already been done for such platforms which can be classified into two types of heterogeneous platforms: \begin{itemize} @@ -166,8 +165,8 @@ while respecting the given time constraint. This approach had considerable overhead. In contrast to the above described works, the work of this chapter presents the following contributions: \begin{enumerate} -\item two new energy and two performance models for message passing iterative - synchronous applications running over a heterogeneous local cluster and a grid platform. +\item two new energy and two performance models for message passing + synchronous applications with iterations running over a heterogeneous local cluster and a grid platform. All the models take into account the communications and the slack times. The models can predict the energy consumption and the execution time of the application. @@ -175,18 +174,18 @@ following contributions: local cluster and a grid platform. The algorithms have a very small overhead and do not need any training or profiling. They use a new optimization function which simultaneously maximizes the performance and minimizes the energy consumption - of a message passing iterative synchronous application. + of a message passing synchronous application with iterations. \end{enumerate} -\section[The energy optimization of a heterogeneous cluster]{The energy optimization of parallel iterative applications running over local heterogeneous +\section[The energy optimization of a heterogeneous cluster]{The energy optimization of parallel applications with iterations running over local heterogeneous clusters} \label{ch3:1} -\subsection{The execution time of message passing distributed iterative - applications on a heterogeneous local cluster} +\subsection{The execution time of message passing distributed + applications with iterations on a heterogeneous local cluster} \label{ch3:1:1} In this section, we are interested in reducing the energy consumption of message -passing distributed iterative synchronous applications running over heterogeneous local clusters. +passing distributed synchronous applications with iterations running over heterogeneous local clusters. In this work, a heterogeneous local cluster is defined as a collection of heterogeneous computing nodes interconnected via a high speed homogeneous network. Therefore, the nodes may have different characteristics such as computing @@ -200,7 +199,7 @@ have the same network bandwidth and latency. \label{fig:task-heter} \end{figure} -The overall execution time of a distributed iterative synchronous application +The overall execution time of a distributed synchronous application with iterations over a heterogeneous local cluster consists of the sum of the computation time and the communication time for every iteration on a node. However, due to the heterogeneous computation power of the computing nodes, slack times may occur @@ -227,8 +226,8 @@ Since in a heterogeneous cluster the nodes may have different characteristics, especially different frequency gears, when applying DVFS operations on these nodes, they may get different scaling factors represented by a scaling vector: $(S_1, S_2,\dots, S_N)$ where $S_i$ is the scaling factor of processor $i$. To -be able to predict the execution time of message passing synchronous iterative -applications running over a heterogeneous local cluster, for different vectors of +be able to predict the execution time of message passing synchronous +applications with iterations running over a heterogeneous local cluster, for different vectors of scaling factors, the communication time and the computation time for all the tasks must be measured during the first iteration before applying any DVFS operation. Then the execution time for one iteration of the application with any @@ -242,27 +241,27 @@ where $\TcpOld[i]$ is the computation time of processor $i$ during the first iteration. The model computes the maximum computation time with scaling factor from each node added to the communication time of the slowest node. It means only the communication time without any slack time is taken into -account. Therefore, the execution time of the iterative application is equal to +account. Therefore, the execution time of the application with iterations is equal to the execution time of one iteration as in (\ref{eq:perf_heter}) multiplied by the number of iterations of that application. This prediction model is improved from the model that predicts the execution time of message passing distributed applications for homogeneous -architectures presented in chapter \ref{ch2} section \ref{ch2:3}. The execution time prediction model is +architectures presented in Chapter \ref{ch2} Section \ref{ch2:3}. The execution time prediction model is used in the method that optimizes both the energy consumption and the performance -of iterative methods, which is presented in the following sections. +of parallel application with iterations, which is presented in the following sections. \subsection{Energy model for heterogeneous local cluster} \label{ch3:1:2} -In chapter \ref{ch2}, the dynamic and the static energy consumption of a +In Chapter \ref{ch2}, the dynamic and the static energy consumption of a processor is computed according to Equations \ref{eq:Edyn_new} and \ref{eq:Estatic_new} respectively. Then, the total energy consumption of a processor is the sum of these two metrics. Therefore, the overall energy consumption for the parallel tasks over a parallel cluster is the summation of the energies consumed by all the processors. In the considered heterogeneous platform, each processor $i$ may have different dynamic and static powers, noted as $\Pd[i]$ and $\Ps[i]$ -respectively. Therefore, even if the distributed message passing iterative -application is load balanced, the computation time of each CPU $i$ noted +respectively. Therefore, even if the distributed message passing +application with iterations is load balanced, the computation time of each CPU $i$ noted $\Tcp[i]$ may be different and different frequency scaling factors may be computed in order to decrease the overall energy consumption of the application and reduce the slack times. The communication time of a processor $i$ is noted as @@ -273,7 +272,7 @@ frequency scaling factor and the dynamic power of each node as in (\ref{eq:Edyn_new}), the static energy is computed as the sum of the execution time of one iteration as in \ref{eq:perf_heter} multiplied by the static power of each processor. The overall energy consumption of a message passing distributed application executed over a -heterogeneous cluster during one iteration is the summation of all the dynamic and +heterogeneous cluster during one iteration is the summation of the dynamic and static energies for all the processors. It is computed as follows: \begin{equation} \label{eq:energy-heter} @@ -286,7 +285,7 @@ Reducing the frequencies of the processors according to the vector of scaling factors $(S_1, S_2,\dots, S_N)$ may degrade the performance of the application and thus, increase the consumed static energy because the execution time is increased~\cite{ref78}. The overall energy consumption -for an iterative application can be measured by measuring the energy +for an application with iterations can be measured by measuring the energy consumption for one iteration as in (\ref{eq:energy-heter}) multiplied by the number of iterations of that application. @@ -303,14 +302,14 @@ the application might not be the optimal one. It is not trivial to select the appropriate frequency scaling factor for each processor while considering the characteristics of each processor (computation power, range of frequencies, dynamic and static powers) and the task it is executing (computation/communication -ratio). In chapter~\ref{ch2}, we proposed a method that selects the optimal +ratio). In Chapter~\ref{ch2}, we proposed a method that selects the optimal frequency scaling factor for a homogeneous cluster executing a message passing -iterative synchronous application while giving the best trade-off between the + synchronous application with iterations while giving the best trade-off between the energy consumption and the performance for such applications. In this section, this optimization method is improved while considering a heterogeneous clusters. As described before, the relation between the energy consumption and the execution time for an -application is complex and nonlinear. Thus, to find the trade-off relation between the energy consumption computed in Equation \ref{eq:energy-heter} and the performance with Equation \ref{eq:perf_heter} for the iterative message passing applications, first we need to normalize both term as follows: +application is complex and nonlinear. Thus, to find the trade-off relation between the energy consumption computed in Equation \ref{eq:energy-heter} and the performance with Equation \ref{eq:perf_heter} for the message passing applications with iterations, first we need to normalize both terms as follows: \begin{equation} @@ -334,7 +333,7 @@ application is complex and nonlinear. Thus, to find the trade-off relation betwe \begin{figure}[!t] \centering \includegraphics[width=.7\textwidth]{fig/ch3/heter} - \caption{The energy and performance relation in Heterogeneous cluster} + \caption{The energy and performance relation in heterogeneous cluster} \label{fig:rel-heter} \end{figure} @@ -431,8 +430,8 @@ for the node $i$. Then, the set of scaling factors that maximizes the objective In this section, Algorithm~\ref{HSA} is presented. It selects the frequency scaling factors vector that gives the best trade-off between minimizing the energy consumption and maximizing the performance of a message passing -synchronous iterative application executed on a heterogeneous local cluster. It works -online during the execution time of the iterative message passing program. It +synchronous application with iterations executed on a heterogeneous local cluster. It works +online during the execution time of the message passing program with iterations. It uses information gathered during the first iteration such as the computation time and the communication time in one iteration for each node. The algorithm is executed after the first iteration and returns a vector of optimal frequency @@ -440,7 +439,7 @@ scaling factors that satisfies the objective function (\ref{eq:max-heter}). The program applies DVFS operations to change the frequencies of the CPUs according to the computed scaling factors. This algorithm is called just once during the execution of the program. Algorithm~\ref{dvfs-heter} shows where and when the proposed -scaling algorithm is called in the iterative MPI program. +scaling algorithm is called in the MPI program with iterations. \begin{figure}[!t] \centering @@ -578,10 +577,10 @@ specifications of real Intel processors. The heterogeneous cluster had up to 144 nodes and had nodes from the four types in equal proportions, for example if a benchmark was executed on 8 nodes, 2 nodes from each type were used. Since the constructors of CPUs do not specify the dynamic and the static power of their -CPUs, for each type of node they were chosen proportionally to its computing +CPUs, for each type of node they were chosen proportionally to their computing powers (FLOPS). The dynamic power corresponds to 80\% of the overall power consumption while executing at the higher frequency and the -remaining 20\% is the static power. The same assumption was made in chapter \ref{ch2} and +remaining 20\% is the static power. The same assumption was made in Chapter \ref{ch2} and \cite{ref3}. Finally, These nodes were connected via an Ethernet network with 1 \textit{Gbit/s} bandwidth. @@ -824,7 +823,7 @@ more dynamic power is consumed when nodes are running on their maximum frequencies, thus, scaling down the frequency of the nodes results in higher energy savings than in the 70\%-30\% scenario. On the other hand, the performance degradation percentage is smaller in the 70\%-30\% -scenario compared to the 90\%-\%10 scenario. This is due to the +scenario compared to the 90\%-10\% scenario. This is due to the higher static power percentage in the first scenario which makes it more relevant in the overall consumed energy. Indeed, the static energy is related to the execution time and if the performance is degraded the amount of consumed @@ -844,14 +843,14 @@ lowering the frequency of some processors. On the other hand, the energy saving decreases when the 70\%-30\% scenario is used because the dynamic energy is less relevant in the overall consumed energy and lowering the frequency does not return big energy savings. Moreover, the average of the -performance degradation is decreased when using a higher ratio for static power +performance degradation is decreased when using a higher ratio for the static power (e.g. 70\%-30\% scenario and 80\%-20\% scenario). Since the proposed algorithm optimizes the energy consumption when using a higher ratio for the dynamic power, the algorithm selects bigger frequency scaling factors that results in more energy saving but degrade the performance, for example see Figure~\ref{fig:powers-heter} (b). The opposite happens when using a higher ratio for the static power, the algorithm proportionally selects smaller -scaling values which result in less energy saving but also less performance +scaling values which results in less energy saving but also less performance degradation. \begin{table}[!t] @@ -989,13 +988,13 @@ the energy reduction to performance trade-off, see Figure~\ref{fig:compare_ because it maximizes the distance between the energy saving and the performance degradation values while giving the same weight for both metrics. -\section[The energy optimization of grid]{The energy optimization of parallel iterative applications running over grids} +\section[The energy optimization of grid]{The energy optimization of parallel applications with iterations running over grids} \label{ch3:3} \subsection{The energy and performance models of grid platform} \label{ch3:3:1} In this section, we are interested in reducing the energy consumption of message -passing iterative synchronous applications running over +passing applications with synchronous iterations running over heterogeneous grid platforms. A heterogeneous grid platform could be defined as a collection of heterogeneous computing clusters interconnected via a long distance network which has a lower bandwidth and a higher latency than the local networks of the clusters. Each computing cluster in the grid is composed of homogeneous nodes that are connected together via a high speed network. However, nodes from distinct clusters may have different characteristics such as computing power (FLOPS), energy consumption, CPU's frequency range, network bandwidth and latency. @@ -1003,9 +1002,9 @@ and a higher latency than the local networks of the clusters. Each computing clu Since in a heterogeneous grid each cluster has different characteristics, when applying DVFS operations on the nodes of these clusters, they may get different scaling factors represented by a scaling vector: -$(S_{11}, S_{12},\dots, S_{NM})$ where $S_{ij}$ is the scaling factor of processor $j$ in cluster $i$ . To -be able to predict the execution time of message passing synchronous iterative -applications running over a heterogeneous grid, for different vectors of +$(S_{11}, S_{12},\dots, S_{NM})$ where $S_{ij}$ is the scaling factor of processor $j$ in cluster $i$. To +be able to predict the execution time of message passing +applications with synchronous iterations running over a heterogeneous grid, for different vectors of scaling factors, the communication time and the computation time for all the tasks must be measured during the first iteration before applying any DVFS operation. Then the execution time for one iteration of the application with any @@ -1024,7 +1023,7 @@ first iteration. The execution time for one iteration is equal to the sum of t and the slowest communication time without slack time during one iteration. The latter is equal to the communication time of the slowest node in the slowest cluster $h$. It means that only the communication time without any slack time is taken into account. -Therefore, the execution time of the iterative application is equal to +Therefore, the execution time of the parallel application with iterations is equal to the execution time of one iteration as in Equation (\ref{eq:perf-grid}) multiplied by the number of iterations of that application. @@ -1032,7 +1031,7 @@ number of iterations of that application. In the considered heterogeneous grid platform, each node $j$ in cluster $i$ may have different dynamic and static powers from the nodes of the other clusters, noted as $\Pd[ij]$ and $\Ps[ij]$ respectively. Therefore, even if the distributed -message passing iterative application is load balanced, the computation time of each CPU $j$ +message passing application with iterations is load balanced, the computation time of each CPU $j$ in cluster $i$ noted $\Tcp[ij]$ may be different and different frequency scaling factors may be computed in order to decrease the overall energy consumption of the application and reduce slack times. The communication time of a processor $j$ in cluster $i$ is noted as @@ -1055,7 +1054,7 @@ static energies for $M_i$ processors in $N$ clusters. It is computed as follows To optimize both of the energy consumption model computed by \ref{eq:energy-grid} and the performance model computed by \ref{eq:perf-grid}, -they must be normalized as in \ref{eq:enorm-heter} and \ref{eq:pnorm-heter} Equations respectively. +they must be normalized as in Equation \ref{eq:enorm-heter} and Equation \ref{eq:pnorm-heter} respectively. While the original energy consumption is the consumed energy with the maximum frequency for all the nodes computed as follows: @@ -1149,7 +1148,7 @@ of scaling factors that satisfies (\ref{eq:max-grid}) can be selected. \begin{figure}[!t] \centering \includegraphics[scale=0.7]{fig/ch3/init_freq} - \caption{Selecting the initial frequencies in grid} + \caption{Selecting the initial frequencies in the grid architecture} \label{fig:st_freq-grid} \end{figure} @@ -1165,9 +1164,9 @@ In this section, the scaling factors selection algorithm for a grid, Algorithm~\ is presented. It selects the vector of frequency scaling factors that gives the best trade-off between minimizing the energy consumption and maximizing the performance of a message passing -synchronous iterative application executed on a grid. + application with synchronous iterations executed on a grid. It is similar to the frequency selection algorithm for heterogeneous -local clusters presented in section \ref{ch3:1:4}. +local clusters presented in Section \ref{ch3:1:4}. The value of the initial frequency scaling factor for each node is inversely proportional to its computation time that was gathered in the first iteration. The initial @@ -1237,7 +1236,7 @@ $\lbrace\Theta_1,\Theta_2\rbrace$ is the time interval for the measured idle po Therefore, the dynamic power of one core is computed as the difference between the maximum measured value in maximum powers vector and the minimum measured value in the idle powers vector. -On the other hand, the static power consumption by one core is a part of the measured idle power consumption of the node. Since in Grid'5000 there is no way to measure precisely the consumed static power and it was assumed, as in sections \ref{ch3:2} and \ref{ch2:6}, that the static power represents a ratio of the dynamic power, the value of the static power is assumed to be equal to 20\% of the dynamic power consumption of the core. +On the other hand, the static power consumption by one core is a part of the measured idle power consumption of the node. Since in Grid'5000 there is no way to measure precisely the consumed static power and it was assumed, as in Sections \ref{ch3:2} and \ref{ch2:6}, that the static power represents a ratio of the dynamic power, the value of the static power is assumed to be equal to 20\% of the dynamic power consumption of the core. In the experiments presented in the following sections, two sites of Grid'5000 were used, Lyon and Nancy sites. These two sites have in total seven different clusters as shown in Figure~\ref{fig:grid5000}. @@ -1297,7 +1296,6 @@ The benchmarks have seven different classes, S, W, A, B, C, D and E, that repres \end{tabular} \label{table:grid5000-1} \end{table} -CPUs \subsection{The experimental results of the scaling algorithm on a Grid} @@ -1387,7 +1385,7 @@ results in a lower energy consumption. Indeed, the dynamic consumed power is exponentially related to the CPU's frequency value. On the other hand, the increase in the number of computing nodes can increase the communication times and thus produces less energy saving depending on the benchmarks being executed. The results of the benchmarks CG, MG, BT and FT show more -energy saving percentage in the one site scenario when executed over 16 nodes than on 32 nodes. LU and SP consume more energy with 16 nodes than with 32 node on one site because their computations to communications ratio is not affected by the increase of the number of local communications. +energy saving percentage in the one site scenario when executed over 16 nodes than on 32 nodes. LU and SP consume more energy with 16 nodes than with 32 nodes on one site because their computations to communications ratio is not affected by the increase of the number of local communications. \begin{figure}[!h] \centering \centering @@ -1400,21 +1398,21 @@ energy saving percentage in the one site scenario when executed over 16 nodes th \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/eng_s.eps} -\caption{The energy reduction while executing the NAS benchmarks over different scenarios} +\caption{The energy reduction percentages while executing the NAS benchmarks over different scenarios} \label{fig:eng_s} \end{figure*} \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/per_d.eps} -\caption{The performance degradation of the NAS benchmarks over different scenarios} +\caption{The performance degradation percentages of the NAS benchmarks over different scenarios} \label{fig:per_d} \end{figure*} \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/dist.eps} -\caption{The trade-off distance between the energy reduction and the performance of the NAS benchmarks +\caption{The trade-off distance percentages between the energy reduction and the performance of the NAS benchmarks over different scenarios} \label{fig:dist-grid} \end{figure*} @@ -1486,13 +1484,13 @@ Scenario name & Cluster name & Nodes per cluster & \begin{figure}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/time.eps} - \caption{The execution times of NAS benchmarks running over the one core and the multi-core scenarios} + \caption{The execution times of the NAS benchmarks running over the one core and the multi-core scenarios} \label{fig:time-mc} \end{figure} \begin{figure}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/eng_con.eps} - \caption{The energy consumptions and execution times of NAS benchmarks over one core and multi-core per node architectures} + \caption{The energy consumptions and execution times of the NAS benchmarks over one core and multi-core per node architectures} \label{fig:eng-cons-mc} \end{figure} @@ -1518,21 +1516,21 @@ scenarios because there are no or small communications. Contrary to EP and MG, \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/eng_s_mc.eps} - \caption{The energy saving of running NAS benchmarks over one core and multi-core scenarios} + \caption{The energy saving percentages of running NAS benchmarks over one core and multi-core scenarios} \label{fig:eng-s-mc} \end{figure*} \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/per_d_mc.eps} - \caption{The performance degradation of running NAS benchmarks over one core and multi-core scenarios} + \caption{The performance degradation percentages of running NAS benchmarks over one core and multi-core scenarios} \label{fig:per-d-mc} \end{figure*} \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/dist_mc.eps} - \caption{The trade-off distance of running NAS benchmarks over one core and multi-core scenarios} + \caption{The trade-off distance percentages of running NAS benchmarks over one core and multi-core scenarios} \label{fig:dist-mc} \end{figure*} The energy saving percentages of all the NAS benchmarks running over these two scenarios are presented in Figure~\ref{fig:eng-s-mc}. @@ -1577,7 +1575,7 @@ In these experiments, the class D of the NAS parallel benchmarks were executed o \begin{figure}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/dist_pow.eps} - \caption{The trade-off distance between the energy reduction and the performance of the NAS benchmarks over the three power scenarios} + \caption{The trade-off distance percentages between the energy reduction and the performance of the NAS benchmarks over the three power scenarios} \label{fig:dist-pow} \end{figure} @@ -1632,21 +1630,21 @@ presented in Figures~\ref{fig:edp-eng}, \ref{fig:edp-perf} and \ref{fig:edp-dis \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/edp_eng} -\caption{The energy reduction induced by the Maxdist method and the EDP method} +\caption{The energy reduction percentages induced by the Maxdist method and the EDP method} \label{fig:edp-eng} \end{figure*} \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/edp_per} -\caption{The performance degradation induced by the Maxdist method and the EDP method} +\caption{The performance degradation percentages induced by the Maxdist method and the EDP method} \label{fig:edp-perf} \end{figure*} \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/edp_dist} -\caption{The trade-off distance between the energy consumption reduction and the performance for the Maxdist method and the EDP method} +\caption{The trade-off distance percentages between the energy consumption reduction and the performance for the Maxdist method and the EDP method} \label{fig:edp-dist} \end{figure*} @@ -1658,7 +1656,7 @@ Moreover, the proposed scaling algorithm gives the same weight for these two met Whereas, the EDP algorithm gives sometimes negative trade-off values for some benchmarks in the two sites scenarios. These negative trade-off values mean that the performance degradation percentage is higher than the energy saving percentage. The high positive values of the trade-off distance percentage mean that the energy saving percentage is much higher than the performance degradation percentage. -The complexity of both algoriths, Maxdist and EDP, are of order $O(N \cdot M_i \cdot F_j)$ and +The complexity of both algorithms, Maxdist and EDP, are of order $O(N \cdot M_i \cdot F_j)$ and $O(N \cdot M_i \cdot F_j^2)$ respectively, where $N$ is the number of the clusters, $M_i$ is the number of nodes and $F_j$ is the maximum number of available frequencies of node $j$. When Maxdist is applied to a benchmark that is being executed over 32 nodes distributed between Nancy and Lyon sites, it takes on average $0.01$ $ms$ to compute the best frequencies while the EDP method is on average ten times slower over the same architecture. @@ -1669,8 +1667,8 @@ In this chapter, two new online frequency scaling factors selecting algorithms maximum distance (optimal trade-off) between the predicted energy and the predicted performance curves for a heterogeneous cluster and grid. Both algorithms use a new energy models for measuring and predicting the energy consumption of message passing -iterative applications running over a heterogeneous local cluster and a grid platform. -Firstly, the proposed scaling factors selection algorithm for a heterogeneous local cluster is applied to the class C of NAS parallel benchmarks and simulated by SimGrid. + applications with iterations running over a heterogeneous local cluster and a grid platform. +Firstly, the proposed scaling factors selection algorithm for a heterogeneous local cluster is applied to the class C of the NAS parallel benchmarks and simulated by SimGrid. The results of the simulations showed that the algorithm on average reduces by 29.8\% the energy consumption of the NAS benchmarks executed over 8 nodes while limiting the degradation of the performance by 3.8\%. The algorithm also selects different scaling factors according to the percentage of the computing and communication times, and according to the diff --git a/CHAPITRE_04.tex b/CHAPITRE_04.tex index 66a6f0d..d9aaf56 100644 --- a/CHAPITRE_04.tex +++ b/CHAPITRE_04.tex @@ -36,7 +36,7 @@ Section~\ref{ch4:5} details the proposed frequency selecting algorithm. Section~\ref{ch4:6} presents the iterative multi-splitting application which is a hybrid method and was used as a benchmark to evaluate the efficiency of the proposed algorithm. Section~\ref{ch4:7} presents the simulation results of applying the algorithm on the multi-splitting application and executing it on different grid scenarios. It also shows the results of running -three different power scenarios and comparing them. Moreover, in the last subsection, the proposed algorithm is compared to the energy and delay product (EDP) method. Section \ref{ch4:8} presenting the results of real experiments executed over the Grid'5000 platform and compared to the EDP method. Finally, the chapter ends with a summary. +three different power scenarios and comparing them. Moreover, in the last subsection, the proposed algorithm is compared to the energy and delay product (EDP) method. Section \ref{ch4:8} presents the results of real experiments executed over the Grid'5000 platform and compared to the EDP method. Finally, the chapter ends with a summary. @@ -420,11 +420,11 @@ of this application is not directly related to the number of computing nodes. \label{fig:eng_time_dvfs} \end{figure} Figure \ref{fig:eng_time_dvfs} (a) shows that the energy - consumption of all four versions of the method, running over the 8 grid scenarios described in Table \ref{table:comp}, are not affected by the increase in the number of computing nodes. MS without applying DVFS operations had the same behavior. On the other hand, Figure \ref{fig:eng_time_dvfs} (b) shows that the execution time of the MS application with DVFS operations + consumption of all four versions of the method, running over the 8 grid scenarios described in Table \ref{table:comp}, are not affected by the increase in the number of computing nodes. MS without applying DVFS operations had the same behaviour. On the other hand, Figure \ref{fig:eng_time_dvfs} (b) shows that the execution time of the MS application with DVFS operations decreases in inverse proportion to the number of nodes. Moreover, it can be noticed that the asynchronous MS with synchronous DVFS consumes less energy when compared to the other versions of the method. Two reasons explain this energy consumption reduction: \begin{enumerate} - \item The asynchronous MS with synchronous DVFS version uses synchronous DVFS communications which allow it to apply the new computed frequencies at the begining of the second iteration. Thus, reducing the consumption of dynamic energy by the application from the second iteration until the end of the application. Whereas in - asynchronous DVFS versions where the DVFS communications are asynchronous, the new frequencies cannot be computed at the end of the first iteration and consequently cannot be applied at the begining of the second iteration. + \item The asynchronous MS with synchronous DVFS version uses synchronous DVFS communications which allow it to apply the new computed frequencies at the beginning of the second iteration. Thus, reducing the consumption of dynamic energy by the application from the second iteration until the end of the application. Whereas in + asynchronous DVFS versions where the DVFS communications are asynchronous, the new frequencies cannot be computed at the end of the first iteration and consequently cannot be applied at the beginning of the second iteration. Indeed, since the performance information gathered during the first iteration is not sent synchronously at the end of the first iteration, fast nodes might execute many iterations before receiving the performance information, computing the new frequencies based on this information and applying the new computed frequencies. Therefore, many iterations might be computed by CPUs running on their highest frequency and consuming more dynamic energy than scaled down processors. \item As shown in Figure \ref{fig:eng_time_ms} (b), the execution time of the asynchronous MS version is lower than the execution time of the synchronous MS version because there is no idle time in the asynchronous version and the communications are overlapped by computations. Since the consumption of static energy is proportional to the execution time, the asynchronous MS version consumes less static energy than the synchronous version. @@ -470,7 +470,7 @@ degradation percentages which means that the new execution time of a given versi $16.9\%$. While the worst case is the synchronous MS with synchronous DVFS where the performance is on average degraded by $2.9\%$ when compared to the reference method. - The energy consumption and performance tradeoff between these five versions is presented in Figure \ref{fig:dist}. + The energy consumption and performance trade-off between these five versions is presented in Figure \ref{fig:dist}. These distance values are computed as the differences between the energy saving and the performance degradation percentages as in the optimization function (\ref{eq:max-grid}). Thus, the best MS version is the one that has the maximum distance between the energy saving and performance degradation. The distance can be negative if the energy saving percentage is less than the performance degradation percentage. @@ -716,14 +716,14 @@ Size & Method &\begin{tabular}[c]{@{}l@{}}E Table \ref{table:exper} shows that there are positive and negative performance degradation percentages. A negative value means that the new execution time of a given version of the application is less than the execution time of the synchronous MS without DVFS. Therefore, the version with the smallest negative performance degradation percentage has actually the best speed up when compared to the other versions. - The energy consumption and performance tradeoffs between these four versions can be computed as in the optimization Function + The energy consumption and performance trade-offs between these four versions can be computed as in the optimization Function (\ref{eq:max-grid}). The asynchronous MS applying synchronously the HSA algorithm gives the best distance which is equal to $48.41\%$. This version saves up to $26.93\%$ of energy and even reduces the execution time of the application by $21.48\%$. This overall improvement is due to combining asynchronous computing and the synchronous application of the HSA algorithm. Finally, this section shows that the obtained results over Grid'5000 are comparable to the -simulation results of section \ref{ch4:7:2}, the asynchronous MS applying synchronously the HSA algorithm is the best version in both of sections. Moreover, the results over Grid'5000 are better +simulation results of Section \ref{ch4:7:2}, the asynchronous MS applying synchronously the HSA algorithm is the best version in both of sections. Moreover, the results over Grid'5000 are better than simulation results because the computing clusters used in the Grid'5000 experiments are more heterogeneous in term of the computing power and network characteristics than the simulated platform with SimGrid. For example, the nodes in StRemi cluster have lower computing powers compared to the other used three clusters of Grid'5000 platform. As a result, the increase in the heterogeneity between the clusters' computing nodes increases the idle times which forces the proposed algorithm to select a big scaling factors and thus saving more energy. @@ -734,7 +734,7 @@ As a result, the increase in the heterogeneity between the clusters' computing n \subsection{Comparing the HSA algorithm to the energy and delay product method} \label{res-comp} -The EDP algorithm, described in section \ref{ch4:7:5}, was applied synchronously and asynchronously to both the synchronous and asynchronous MS application of size $N=400^3$. The experiments were conducted over 4 distributed clusters, described in Table \ref{table:grid5000}, and 8 homogeneous nodes were used from each cluster. +The EDP algorithm, described in Section \ref{ch4:7:5}, was applied synchronously and asynchronously to both the synchronous and asynchronous MS application of size $N=400^3$. The experiments were conducted over 4 distributed clusters, described in Table \ref{table:grid5000}, and 8 homogeneous nodes were used from each cluster. Table \ref{table:comapre} presents the results of energy saving, performance degradation and distance percentages when applying the EDP method on four different MS versions. Figure \ref{fig:compare} compares the distance percentages, computed as the difference between energy saving and performance degradation percentages, of the EDP and HSA algorithms. This comparison shows that the proposed HSA algorithm gives better energy reduction and performance trade-off than the EDP method. EDP gives better results when evaluated over Grid'5000 than over the simulator because the nodes used from Grid'5000 are more heterogeneous than those simulated with SimGrid. @@ -759,9 +759,11 @@ Async MS with Async DVFS & 10.32 & -17.04 & 27.36 \\ \hlin \caption{Comparing the trade-off percentages of HSA and EDP methods over the Grid'5000} \label{fig:compare} \end{figure} + + + \section{Conclusions} \label{ch4:9} - This chapter presents a new online frequency selection algorithm for asynchronous iterative applications running over a grid. It selects the best vector of frequencies that maximizes the distance between the predicted energy consumption and the predicted execution time. @@ -770,7 +772,7 @@ energy and performance models to predict the energy consumption and the executi The proposed algorithm was evaluated twice over the SimGrid simulator and Grid'5000 testbed while running a multi-splitting (MS) application that solves 3D problems. The experiments were executed over different grid scenarios composed of different numbers of clusters and different numbers of nodes per cluster. - The HSA algorithm was applied synchronously and asynchronously on a synchronous and an asynchronous version of the MS application. Both the simulation and real experiment results show that applying synchronous HSA algorithm on an asynchronous MS application gives the best tradeoff between energy consumption reduction and performance compared to other scenarios. + The HSA algorithm was applied synchronously and asynchronously on a synchronous and an asynchronous version of the MS application. Both the simulation and real experiment results show that applying synchronous HSA algorithm on an asynchronous MS application gives the best trade-off between energy consumption reduction and performance compared to other scenarios. In the simulation results, this scenario saves on average the energy consumption by 22\% and reduces the execution time of the application by 5.72\%. This version optimizes both of the dynamic energy consumption by applying synchronously the HSA algorithm at the end of the first iteration and the static energy consumption by using asynchronous communications between nodes from different clusters which are overlapped by computations. The HSA algorithm was also evaluated over three power scenarios. As expected, the algorithm selects different vectors of frequencies for each power scenario. The highest energy consumption reduction was achieved in the power scenario with the highest dynamic power and the lowest performance degradation was obtained in the power scenario with the highest static power. The proposed algorithm was compared to another method that uses the well known energy and delay product as an objective function. diff --git a/CONCLUSION.tex b/CONCLUSION.tex index 06cdaac..b5faa80 100644 --- a/CONCLUSION.tex +++ b/CONCLUSION.tex @@ -4,68 +4,62 @@ \section{Conclusion} In this dissertation, we have proposed a method to optimize both the energy consumption and -the performance of synchronous and asynchronous iterative applications running over +the performance at the same time of synchronous and asynchronous applications with iterations running over cluster and grid platforms. Dynamic voltage and frequency scaling (DVFS) technique was used to lower the frequency of the processor to reduce its energy consumption while computing. Reducing -the frequency of the processor reduces the computing power of a processor and thus the performance of the application running over that processor is decreased too. However, as the first step in this work, different energy consumption and performance models were developed to effectively used in the prediction and measurement processes for energy and performance of iterative applications. Depending on these models, an objective function was defined the best trade-off relation between the energy and the performance. Then, this objective function was used in algorithms which optimize both energy consumption and the performance of the iterative application at the same time. +the frequency of the processor decreases its computing power which might increase the execution time. +In this work, different energy consumption and performance models were developed to predict the energy consumption and performance of parallel applications with iterations. Depending on these models, an objective function was defined as the best trade-off relation between the energy consumption and the performance of the parallel application. This objective function was used in the frequency selecting algorithms which optimize at the same time both the energy consumption and the performance of the parallel application with iterations. -The first part of this dissertation, chapter \ref{ch1}, presented different types of parallelism levels which have been categorized using some hardware and software techniques. Different parallel architectures have been -demonstrated and classified according to how the processing units and the memory model connected to each others. Both shared and distributed platforms as well as their depending parallel programming models have been categorized. The two types of parallel iterative applications: synchronous and asynchronous ones have -been discussed and compared to each others. It showed that applying the synchronous iterative methods are well adapted to local homogeneous clusters with a high speed network link, while the asynchronous iterative methods are more suited to the distributed heterogeneous clusters. At the end of this chapter, an energy consumption model proposed in the literature to measure the energy consumption of parallel applications was explained. This model does not consider the communication time of the parallel application being executed. Also, it is not well adapted to a heterogeneous architecture when there are different power consumption values for each type of processor. +The first part of this dissertation, chapter \ref{ch1}, presented different types of parallelism levels which have been classified according to the used hardware and software techniques. Different parallel architectures have also been +described and classified according to the used memory model: shared and distributed memory. The two types of parallel applications with iterations: synchronous and asynchronous ones have been discussed and compared to each others. Synchronous distributed applications are well adapted to local homogeneous clusters with a high speed network link, while the asynchronous ones are more suited to grids. At the end of this chapter, an energy consumption model proposed in the literature to estimate the energy consumption of parallel applications was explained. This model does not take into account the communication time of the parallel application being executed. Also, it is not well adapted to a heterogeneous architecture where each type of processor might have a different power consumption value. -In the second part of the dissertation, we have developed a new energy and performance models for -synchronous and asynchronous message passing applications running over a cluster and a grid. To simultaneously optimize the energy and performance of these applications, the trade-off relation between the two metrics have been defined as a maximum distance between the predicted energy and performance curves to select the best possible frequency scaling factors. We have proposed four different frequencies selecting algorithms to select the best frequencies that gives minimum energy reduction with maximum possible performance at the same time. They used the computation and communication times measured at the first iteration of the iterative application to predict the energy consumption and the performance of the parallel application at every available frequency. All these algorithms work online with a very small overhead on the execution time of a parallel application. +In the second part of the dissertation, a new energy and performance models for +synchronous and asynchronous message passing applications with iterations running over clusters and grid, were presented. To simultaneously optimize the energy and performance of these applications, the trade-off relation has been defined as the maximum distance between the predicted energy and performance curves. This objective function is used by the frequency selecting algorithm to select the available frequency scaling factors that give the optimal energy consumption to performance trade-off. We have proposed four different frequency scaling algorithms, each one of them is adapted to a different execution context, such as synchronous or asynchronous communications, homogeneous or heterogeneous nodes, and local or distributed architectures. They used the computation and communication times measured at the first iteration of the parallel application with iterations to predict the energy consumption and the performance of the parallel application at every available frequency. All these algorithms work online and introduce a very small runtime overhead. They also do not require any profiling or training. -In chapter \ref{ch2}, we were proposed a new online scaling factor selection method that optimized simultaneously the energy and performance of a distributed synchronous application running on a homogeneous cluster. We have applied this algorithm to the NAS benchmarks of the class C and evaluated by the SimGrid simulator. Firstly, Rauber and Rünger’s energy model used by the proposed algorithm to select best frequency gear. The proposed algorithm was compared to the Rauber and Rünger optimization method, which gives better energy and performance trade-off ratios compare to them. Secondly, a new energy consumption model was developed to take into consideration both the computation and communication times and their relation with the frequency scaling factor. The new enenrgy model was used by the proposed algorithm to select different frequencies. Thus, a new simulation results have been shown, which are more accurate and realistic than other results obtained using the Rauber and Rünger's energy model. -In chapter \ref{ch3}, we were proposed two new online frequency scaling factors selecting algorithms to select the best possible vectors of frequency scaling factors that give best trade-off between the predicted energy and the predicted performance values of synchronous iterative application running over a heterogeneous cluster and a grid. Each algorithm depends on a new energy and performance models, which takes into account the underline parallel platform being used. Firstly, the proposed -scaling factors selection algorithm for a heterogeneous local cluster was implemented to the NAS parallel benchmarks of the class C and simulated by SimGrid. The results of the experiments showed that the algorithm on average reduces by 29.8\% the energy consumption of the NAS benchmarks executed over 8 nodes while limiting the degradation of the performance by 3.8\%. -Different frequency scaling factors were selected by the algorithm according to the ratio between the computation and communication times when different number of nodes were used, and when different values have been used for static and dynamic powers of the CPU. Secondly, the proposed scaling factors selection algorithm for a grid was implemented to the NAS parallel benchmarks of the class D and executed over Grid5000 testbed platform. The experiments on 16 nodes, distributed over three clusters, showed that the algorithm on average reduces by 30\% the energy consumption for all the NAS benchmarks while on average only degrading by 3.2\% the performance. -The algorithm was also evaluated in different scenarios that vary in the distribution of the computing nodes between different clusters’ sites or use multi-cores per node architecture or consume different static power values. The algorithm selects different vectors of frequencies according to the computations and communication times ratios, and the values of the static and measured dynamic powers of the CPUs. -Both of the proposed algorithms were compared to another method that uses the well known energy and delay product as an objective function. The comparison results showed that the proposed algorithms outperform the latter by selecting a vectors of frequencies that give a better trade-off between energy consumption reduction and performance. -In chapter \ref{ch4}, we were presented a new online frequency selection algorithm for asynchronous iterative applications running over a grid. The algorithm uses new energy and performance models to predict the energy consumption and the execution time of asynchronous or hybrid message passing +In chapter \ref{ch2}, a new online scaling factor selection method that optimizes simultaneously the energy and performance of a distributed synchronous application with iterations running on a homogeneous cluster has been proposed. This algorithm was applied to the NAS benchmarks of the class C and executed over the SimGrid simulator. Firstly, Rauber and Rünger’s energy model was used in the proposed algorithm to select the best frequency gear. The proposed algorithm was compared to the Rauber and Rünger's optimization method. The results of the comparison showed that the proposed algorithm gives better energy to performance trade-off ratios compared to their methods while using the same energy model. Secondly, a new energy consumption model was developed to take into consideration both the computation and communication times and their relation with the frequency scaling factor. The new energy model was used by the proposed algorithm. The new simulation results demonstrated that the new model is more accurate and realistic than the previous one. + +In chapter \ref{ch3}, two new online frequency scaling factors selecting algorithms adapted for synchronous application with iterations running over a heterogeneous cluster and a grid were presented. Each algorithm uses new energy and performance models which take into account the characteristics of the parallel platform being used. Firstly, the proposed +scaling factors selection algorithm for a heterogeneous local cluster was applied to the NAS parallel benchmarks and evaluated over SimGrid. The results of the experiments showed that the algorithm on average reduces by 29.8\% the energy consumption of the class C of the NAS benchmarks executed over 8 nodes while limiting the degradation of the performance to 3.8\%. +Different frequency scaling factors were selected by the algorithm according to the ratio between the computation and communication times when different number of nodes were used, and when different static and dynamic CPU powers have been used. Secondly, the proposed scaling factors selection algorithm for a grid was applied to the NAS parallel benchmarks and the class D of these benchmarks was executed over the Grid5000 testbed platform. The experiments conducted over 16 nodes distributed over three clusters, showed that the algorithm on average reduces by 30\% the energy consumption for all the NAS benchmarks while on average only degrading by 3.2\% their performance. +The algorithm was also evaluated in different scenarios that vary in the distribution of the computing nodes between different clusters’ sites or use multi-cores per node architectures or consume different static power values. The algorithm selects different vectors of frequencies according to the computations and communication times ratios, and the values of the static and measured dynamic powers of the CPUs. +Both of the proposed algorithms were compared to another method that uses the well known energy and delay product as an objective function. The comparison results showed that the proposed algorithms outperform the latter by selecting vectors of frequencies that give a better trade-off between energy consumption reduction and performance. + +In chapter \ref{ch4}, a new online frequency selection algorithm were adapted for asynchronous iterative applications running over a grid was presented. The algorithm uses new energy and performance models to predict the energy consumption and the execution time of asynchronous or hybrid message passing iterative applications running over a grid. The proposed algorithm was evaluated twice over the SimGrid simulator and Grid’5000 testbed while running a multi-splitting (MS) application that solves 3D problems. The experiments were executed over different grid scenarios composed of different numbers of clusters and different numbers of nodes -per cluster. The proposed algorithm was applied synchronously and asynchronously on a -synchronous and an asynchronous version of the MS application. Both the simulation -and real experiment results show that applying synchronous frequency selecting algorithm on an +per cluster. The proposed algorithm was applied synchronously and asynchronously on +synchronous and asynchronous versions of the MS iterative application. Both the simulations +and real experiments results showed that applying synchronously the frequency selecting algorithm on an asynchronous MS application gives the best tradeoff between energy consumption reduction -and performance compared to other scenarios. In the simulation results, this scenario -saves on average the energy consumption by 22\% and reduces the execution time of +and performance when compared to the other scenarios. In the simulation results, this scenario +reduces on average the energy consumption by 22\% and decreases the execution time of the application by 5.72\%. This version optimizes both of the dynamic energy -consumption by applying synchronously the HSA algorithm at the end of the first iteration and the -static energy consumption by using asynchronous communications between nodes from +consumption by applying synchronously the HSA algorithm at the end of the first iteration of the iterative application and the static energy consumption by using asynchronous communications between nodes from different clusters which are overlapped by computations. The proposed algorithm was also -evaluated over three power scenarios, which selects different vectors of frequencies proportionally to dynamic and static powers values. More energy reduction has been achieved when the ratio of the -dynamic power have been increase and vice versa. whereas, the performance degradation percentages were decreased when the static power ratio has been increased. -In the Grid’5000 results, this scenario saves the energy consumption by 26.93\% and -reduces the execution time of the application by 21.48\%. The experiments executed over Grid'5000 give better results than those simulated with SimGrid because the nodes used in Grid'5000 were more heterogeneous than the ones simulated by SimGrid. -In both of the Simulation and Grid'5000 testbed experiments, we have compared the proposed algorithm to a method that uses the well known energy and delay product as an objective function. The comparison results showed that the proposed algorithm outperforms the latter by selecting a vector of frequencies that gives +evaluated over three power scenarios which selects different vectors of frequencies proportionally to the dynamic and static powers values. More energy reduction was achieved when the ratio of the +dynamic power was increased and vice versa. Whereas, the performance degradation percentages were decreased when the static power ratio was increased. +In the Grid’5000 experiments, this scenario reduces the energy consumption by 26.93\% and +decreases the execution time of the application by 21.48\%. The experiments executed over Grid'5000 give better results than those simulated with SimGrid because the nodes used in Grid'5000 were more heterogeneous than the ones simulated by SimGrid. +In both of the Simulations and real experiments, the proposed algorithm was compared to a method that uses the well known energy and delay product as an objective function. The comparison results showed that the proposed algorithm outperforms the latter by selecting a vector of frequencies that gives a better trade-off between the energy consumption reduction and the performance. -Finally, we outline some perspectives that will be applied to this work in the future as in the next section. \section{Perspectives} In the near future, we will adapt the proposed algorithms to take into consideration the variability between some iterations. For example, each proposed algorithm can be executed twice: after the first iteration the frequencies are scaled down according to the execution times measured in the first iteration, then after a fixed number of iterations, the frequencies are adjusted according to the execution times measured during the fixed number of iterations. If the computing power of the system is constantly changing, it would be interesting to implement a mechanism that detects this change and adjusts the frequencies according to the variability of the system. +Also, it would be interesting to evaluate the scalability of the proposed algorithms by running them on large platforms composed of many thousands of cores. The scalability of the algorithms can be improved by distributing them in a hierarchical manner where a leader is chosen for each cluster or a group of nodes to compute their scaled frequencies and by using asynchronous messages to exchange the the data measured at the first iteration. -Also, it would be interesting to evaluate the scalability of the proposed algorithm by running it on large platforms composed of many thousands of cores. The scalability of the algorithm can be improved by distributing it in a hierarchical manner where a leader is chosen for each cluster or a group of nodes to compute their scaled frequencies and by using asynchronous messages to exchange the the data measured at the first iteration. - -The proposed algorithms would evaluate to other message passing iterative methods in order to see how it adapts to the characteristics of the new method. +The proposed algorithms should be applied to other message passing methods with iterations in order to see how they adapt to the characteristics of these methods. Also, it would be interesting to explore if a relation can be found between the numbers of asynchronous iterations required to global convergence and the applied frequencies to the nodes. The number of iterations required by each node for global convergence is not known in advance and the change in CPUs frequencies changes the number of iterations required by each node for global convergence. -Furthermore, the proposed algorithms for a heterogeneous platforms, chapters \ref{ch3} and \ref{ch4}, should be applied to a heterogeneous platform composed of CPUs and GPUs, where all works of energy minimization in the literature over a collection of GPUs and CPUs platform have been shown more energy efficiency compared those composed from only CPUs. +Furthermore, the proposed algorithms for heterogeneous platforms, in chapters \ref{ch3} and \ref{ch4}, should be applied to heterogeneous platforms composed of CPUs and GPUs. Indeed, most of the works in the +green computing field showed that these mixed platforms of GPUs and CPUs are more energy efficient than those composed of only CPUS. -Finally, it would be interesting to compare the accuracy of the -results of the proposed energy models to the values given by instruments +Finally, it would be interesting to verify the accuracy of the +results returned by the energy models by comparing them to the values given by instruments that measure the energy consumptions of CPUs during the execution time, as in \cite{ref106}. - - - - - - diff --git a/Dedication.tex b/Dedication.tex index 40a6918..ad84041 100644 --- a/Dedication.tex +++ b/Dedication.tex @@ -8,5 +8,3 @@ \addcontentsline{toc}{chapter}{Dedication} %DEDICATION - - diff --git a/INTRODUCTION.tex b/INTRODUCTION.tex index c69e7df..da10810 100644 --- a/INTRODUCTION.tex +++ b/INTRODUCTION.tex @@ -6,27 +6,28 @@ \section*{1. General Introduction} \addcontentsline{toc}{section}{1. General Introduction } - -The need and demand for more computing power have been increasing since the -birth of the first computing unit and it is not expected to slow down in the -coming years. To meet these demands, at first the frequency of the CPU was regularly increased until reaching the thermal limit. Also, researchers and supercomputers +The need and the demand for more computing power have been increasing since the +birth of the first computing unit and they are not expected to slow down in the +coming years. To meet these demands, at first the frequency of the CPU was regularly increased until reaching the thermal limit. Then, researchers and supercomputers constructors have been regularly increasing the number of computing cores and processors in supercomputers. Many parallel and distributed architectures, such as multi-core, clusters and grids, were implemented in order to obtain more computing power. This approach consists in using at the same time many computing nodes to solve a big problem that cannot be solved on a single node. -Therefore, these two common approaches are the most common up to now to get more computing power, but they are increasing the energy consumption of the resulting computing architecture. -Indeed, the power consumed by a processor exponentially increases when its frequency increases and a platform consisting of $N$ computing nodes consumes as much as the sum of the power consumed by each computing node. +These two approaches are the most common up to now to get more computing power, but they increase the energy consumption of the resulting computing architecture. +Indeed, the power consumed by a processor exponentially increases when its frequency is increased and a platform consisting of $N$ computing nodes consumes as much as the sum of the power consumed by each computing node. As an example, the Chinese supercomputer Tianhe-2 had the highest FLOPS in November 2015 according to the Top500 list \cite{ref101}. However, it was also the most power hungry -platform with its over 3 million cores consuming around 17.8 megawatts. +platform with more than 3 million cores consuming around 17.8 megawatts. Moreover, according to the U.S. annual energy outlook 2015 \cite{ref102}, the price of energy for 1 megawatt per hour was approximately equal to \$70. Therefore, the price of the energy consumed by the Tianhe-2 platform is approximately more than \$10 million each year. -Moreover, the heat generated by the platform and therefore a cooling infrastructure \cite{ref103} which also consumes a lot of energy, must be implemented to keep the platform from overheating. High CPU's temperatures can also drastically increase its energy consumption, see \cite{ref104} for more details. -The computing platforms must be more energy efficient and offer the highest number +Moreover, the platform generates a lot of heat and to prevent it from overheating a cooling +infrastructure \cite{ref103} which consumes a lot of energy must be implemented. + High CPU's temperatures can also drastically increase its energy consumption, see \cite{ref104} for more details. +An efficient computing platform must offer the highest number of FLOPS per watt possible, such as the Shoubu-ExaScaler from RIKEN which became the top of the Green500 list in November 2015 \cite{ref105}. -This heterogeneous platform executes more than 7 GFlops per watt while consuming +This heterogeneous platform executes more than 7 GFlops per watt while only consuming 50.32 kilowatts. For all these reasons energy reduction has become an important topic in the high performance @@ -46,7 +47,7 @@ little as possible the performance. On the other hand, if they aim for energy reduction, the chosen frequency scaling factor must produce the most energy efficient execution without considering the degradation of the performance. Whereas, it is important to notice that lowering the frequency to the minimum value does not always -give the most energy efficient execution due to energy leakage that increases the total energy consumption of the CPU when the execution time increases. However, the more important question is how to select the best frequency gears that minimizes the total energy consumption and the maximizes the performance of parallel application running over a parallel platform at the same time? +give the most energy efficient execution due to energy leakage that increases the total energy consumption of the CPU when the execution time increases. However, a more important question is how to select the best frequency gears that minimize the total energy consumption and the maximize the performance of a parallel application, running over a parallel platform, at the same time? @@ -55,46 +56,46 @@ give the most energy efficient execution due to energy leakage that increases th \section*{2. Motivation of the Dissertation} \addcontentsline{toc}{section}{2. Motivation of the Dissertation } -The main objective of HPC systems is to execute as fast as possible the sequential application over a parallel architecture. -Hence, using DVFS to scale down the frequencies of CPUs composing the parallel architecture for energy reduction process, it can also significantly degrade the performance of the executed program if it is compute bound, when the program depends mainly of the computing power of the processor, and if a low CPU frequency is selected. +The main objective of an HPC system such as clusters, grids and supercomputers is to execute as fast as possible a given task over that system. +Hence, using DVFS to scale down the frequencies of the CPUs composing the system to reduce their energy consumption, it can also significantly degrade the performance of the executed program, especially if it is compute bound. A compute bound program contain a lot of computations and a relatively small amount of communicators and Inputs/Outputs operations. The execution time of the program is directly dependent on +the computing powers of the CPUs and their selected frequencies. Therefore, the chosen frequency scaling factor must give the best possible trade-off between the energy reduction and the performance of the parallel application. -On the other hand, the relation between energy consumption and the execution time of parallel applications is complex and non-linear. It is very hard to optimize both the energy consumption and the performance of the parallel applications when scaling the frequency of processors executing them because one affects the other. There are a very few works in the state of the art have been dedicated for this problem. Therefore, mathematical models of both the energy consumption and performance of the parallel application running over a parallel platform are required and should be defined precisely to discover the best relation between them. +On the other hand, the relation between energy consumption and the execution time of parallel applications is complex and non-linear. It is very hard to optimize both the energy consumption and the performance of parallel applications when scaling the frequency of the processors executing them because one affects the other. In order to evaluate the impact of scaling down the CPU's frequency on its energy consumption and computing power, mathematical models should be defined to predict them for different frequencies. + -Furthermore, researchers use different optimization strategies to select the frequencies that gives the best trade-off between the energy reduction and performance degradation ratio, which might be chosen during execution (online) or during a pre-execution phase (offline). Thus, the best optimization approach to optimize the energy consumption and the performance at the same time must be applied online with a very low overhead on the execution time of the parallel application. +Furthermore, researchers use different optimization strategies to select the frequencies of the CPUs. They might be executed during the execution of the application (online) or during a pre-execution phase (offline). In our opinion a good approach should minimize the energy consumption while preserving the performance at the same time. Finally, it should also be applied to the application during its execution without requiring any training or profiling and with minimal overhead. \section*{3. Main Contributions of this Dissertation} \addcontentsline{toc}{section}{3. Main Contributions of this Dissertation} -The main contributions of this dissertation focus on optimizing both the energy consumption and the performance of iterative parallel applications running over clusters and grids. The main contributions of this work summarize as follows: +The main objective of this work is to minimize the energy consumption of parallel applications with iterations running over clusters and grids while preserving their performance. The main contributions of this work can be summarized as follows: \begin{enumerate} [I)] -\item We develop an energy consumption and performance models for synchronous and asynchronous message passing iterative applications. These models take into consideration both the computation and communications times of these application, in addition to their relation with frequency scaling factors. +\item Energy consumption and performance models for synchronous and asynchronous message passing applications with iterations were developed. These models take into consideration both the computation and communications times of these applications in addition to their relation to the frequency scaling factors. -\item The iterative parallel applications were executed over different parallel architectures such as: homogeneous local cluster, heterogeneous local cluster and distributed clusters (grid platform). The main goal behind using these different platforms is to study the effect of the heterogeneity in the computing powers of the the commuting nodes and the heterogeneity in the communication networks which connecting these nodes on the energy consumption and the performance of iterative applications. +\item The parallel applications with iterations were executed over different parallel architectures such as: homogeneous local cluster, heterogeneous local cluster and distributed clusters (grid platform). The main goal behind using these different platforms is to study the effect of the heterogeneity in the computing powers of the the commuting nodes and the heterogeneity in the communication networks which connect these nodes on the energy consumption and the performance of parallel applications with iterations. -\item Depending on the proposed energy consumption and the performance models, we define a new objective function to optimize both the energy consumption and the performance of the iterative parallel applications at the same. The proposed objective function compute the maximum distance between the predicted energy consumption and the predicted performance curves to define the best possible trade-off between them. +\item Depending on the proposed energy consumption and the performance models, a new objective function to optimize both the energy consumption and the performance of the parallel applications with iterations at the same were defined. It computes the maximum distance between the predicted energy consumption and the predicted performance curves to define the best possible trade-off between them. -\item a new online frequencies selecting algorithms for cluster and grid are developed which used the new objective function. These algorithms selected the frequency scaling factors that simultaneously optimize both the energy consumption and performance. They have a very small overhead when comparing them to other methods in the state of the art and they are working without training and profiling. +\item New online frequency selecting algorithms for clusters and grids were developed. They use the new objective function and select the frequency scaling factors that simultaneously optimize both the energy consumption and performance. They have a very small overhead when comparing them to other methods in the state of the art and they work without training and profiling. -\item We conducted extensive simulation experiments over SimGrid simulator \cite{ref66}, which offers flexible and easy tools to built different types of parallel architectures. Furthermore, real experiments were conducted over Grid'5000 testbed \cite{ref21} and compared with simulation ones. -The experimental results were executed over different number of nodes and different platform scenarios. -\item In both the simulation and real experiments, NAS parallel benchmarks \cite{ref65} and Multi-splitting method solving 3D problem with different sizes used as a parallel applications were executed on clusters and grids. The final goal, is to evaluate the proposed methods over these applications and test their adaptation to these applications when different computation and communication ratios are existed. +\item The proposed algorithms were applied to the NAS parallel benchmarks \cite{ref65} and the Multi-splitting method. These applications offer different computations to communications ratios and a good testbed to evaluate the proposed algorithm in different scenarios. -\item All the proposed methods of this work compared with two approaches found in the literature: Rauber and Rünger \cite{ref47} and Spiliopoulos et al. \cite{ref67} methods. Both the simulation and real testbed results showed that the proposed methods gives better energy and performance trade-off ratios than these methods. +\item The proposed algorithms were evaluated over the SimGrid simulator \cite{ref66} which offers flexible and easy tools to built different types of parallel architectures. Furthermore, real experiments were conducted over Grid'5000 testbed \cite{ref21} and compared with the simulated ones. +The experiments were conducted over different number of nodes and different platform scenarios. +\item All the proposed methods were compared with either Rauber and Rünger \cite{ref47} method or Spiliopoulos et al. \cite{ref67} objective function. Both the simulation and real experiments showed that the proposed methods give better energy to performance trade-offs than the other methods. \end{enumerate} - \section*{4. Dissertation Outline} \addcontentsline{toc}{section}{4. Dissertation Outline} -The dissertation is organized as follows: chapter \ref{ch1} presents a scientific background about types of parallel architectures, the parallel iterative applications and the energy consumption model from the state of the art that can be used to measure the energy consumption of these applications. -Chapter \ref{ch2} describes the propose energy and performance optimization method for synchronous iterative applications running over homogeneous cluster. Chapter \ref{ch3} presents two algorithms for the energy and performance optimization of synchronous iterative applications running over heterogeneous cluster and grid. Chapter \ref{ch4} presents -the proposed energy and performance optimization method for asynchronous iterative applications running over grid. Finally, we conclude our work of this dissertation in chapter \ref{ch5}. +The dissertation is organized as follows: chapter \ref{ch1} presents different types of parallel architectures and parallel applications with iterations. It also presents an energy consumption model from the state of the art that can be used to measure the energy consumption of these applications. +Chapter \ref{ch2} describes the proposed energy and performance optimization method for synchronous applications with iterations running over homogeneous clusters. Chapter \ref{ch3} presents two algorithms for the energy and performance optimization of synchronous applications with iterations running over heterogeneous clusters and grids. In chapter \ref{ch4} the energy and performance models and the optimization method are adapted for asynchronous iterative applications running over grids. Finally, this dissertation ends with a summary and some perspective works. \ No newline at end of file diff --git a/PUBLICATIONS.tex b/PUBLICATIONS.tex index 1eb920f..411b2b1 100644 --- a/PUBLICATIONS.tex +++ b/PUBLICATIONS.tex @@ -1,12 +1,39 @@ - \chapter*{Publications} \label{chap:publications} \addcontentsline{toc}{part}{Publications} \section*{Journal Articles} +\begin{enumerate}[$\lbrack$1$\rbrack$] +\item Ahmed Fanfakh, Jean-Claude Charr, Raphaël Couturier, Arnaud Giersch. Optimizing Energy Consumption with + DVFS for Message Passing Iterative Applications on Grid Architectures. \textit{Journal of Computational + Science, 2016, ($1^{st}$ Revision Submitted)} +\item Ahmed Fanfakh, Jean-Claude Charr, Raphaël Couturier, Arnaud Giersch. Energy Consumption Reduction for + Asynchronous Message Passing Applications. \textit{Journal of Supercomputing, 2016, (Submitted)} +\end{enumerate} \section*{Conference Articles} +\begin{enumerate}[$\lbrack$1$\rbrack$] + +\item Jean-Claude Charr, Raphaël Couturier, Ahmed Fanfakh, Arnaud Giersch. Dynamic Frequency Scaling for + Energy Consumption Reduction in Distributed MPI Programs. \textit{ISPA 2014: The $12^{th}$ IEEE + International Symposium on Parallel and Distributed Processing with Applications}, pp. + 225-230. IEEE Computer Society, Milan, Italy (2014). + +\item Jean-Claude Charr, Raphaël Couturier, Ahmed Fanfakh, Arnaud Giersch. Energy Consumption Reduction + with DVFS for Message Passing Iterative Applications on Heterogeneous Architectures. + \textit{The $16^{th}$ IEEE International Workshop on Parallel and Distributed Scientific and + Engineering Computing}. pp. 922-931. IEEE Computer Society, INDIA (2015). + +\item Ahmed Fanfakh, Jean-Claude Charr, Raphaël Couturier, Arnaud Giersch. CPUs Energy Consumption + Reduction for Asynchronous Parallel Methods Running over Grids. \textit{The $19^{th}$ IEEE + International Conference on Computational Science and Engineering}. IEEE Computer Society, + Paris (2016). \textit{(Submitted)} + +\end{enumerate} + + +\cleardoublepage \ No newline at end of file diff --git a/Thesis.tex b/Thesis.tex index 0063bac..72b8c47 100644 --- a/Thesis.tex +++ b/Thesis.tex @@ -15,7 +15,7 @@ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \include{Abstruct} -\include{Resume} +%\include{Resume} %% Sommaire \tableofcontents @@ -35,12 +35,12 @@ \setlength{\parindent}{0.5cm} \addcontentsline{toc}{chapter}{List of Abbreviations} -%% tr + \include{ACRONYMS} -\include{Dedication} +%\include{Dedication} -\include{acknowledgements} +%\include{acknowledgements} %% Citation %\include{CITATION} @@ -57,20 +57,20 @@ \include{CHAPITRE_01} - \part{Contributions} \include{CHAPITRE_02} + \include{CHAPITRE_03} + \include{CHAPITRE_04} \part{Conclusion and Perspectives} %% Conclusion et perspectives \include{CONCLUSION} - \setlength{\parindent}{0cm} --%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% Publications \include{PUBLICATIONS} diff --git a/entete.tex b/entete.tex index 4461b86..79aaa66 100644 --- a/entete.tex +++ b/entete.tex @@ -36,6 +36,7 @@ \usepackage{afterpage} \usepackage{commath} \usepackage[autolanguage,np]{numprint} + %\usepackage{algorithm2e} \newcommand\blankpage{% \null @@ -77,7 +78,7 @@ %% The second mandatory parameter is the date of the PhD defense. %% The third mandatory parameter is the reference number given by the University Library after the PhD defense. %\declarethesis[Sous-titre]{Titre}{17 septembre 2012}{XXX} -\declarethesis{Energy Consumption Optimization of Parallel Iterative Applications using CPU Frequency Scaling}{Date}{21306697} +\declarethesis{Energy Consumption Optimization of Parallel Applications with Iterations using CPU Frequency Scaling}{Date}{21306697} %%-------------------- %% Set the author of the PhD thesis @@ -99,8 +100,8 @@ % \addjury {} {Prof Ye-Qiong SONG} {University of Lorraine} {Reviewer} %\addjury{} {Assoc Prof Hamida SEBA (HDR)} {University of Claude Bernard Lyon1} {Reviewer} %\addjury {}{Prof Sylvain CONTASSOT-VIVIER} {University of Lorraine} {Examiner} -%\addjury {} {Prof Raphaël Couturier} {University of Franche-Comt\'e} {Supervisor} -%\addjury {} {Asst Prof Karine Deschinkel} {University of Franche-Comt\'e} { Co-supervisor} +\addjury {} {Prof Raphaël Couturier} {University of Franche-Comt\'e} {Supervisor} +\addjury {} {Asst Prof Jean-Claude Charr} {University of Franche-Comt\'e} { Co-supervisor} %\addjury {} {Asst Prof Michel Salomon} {University of Franche-Comt\'e} {Co-supervisor} diff --git a/fig/ch3/dist-eps-converted-to.pdf b/fig/ch3/dist-eps-converted-to.pdf index 3d2fbbf..a0e5424 100644 Binary files a/fig/ch3/dist-eps-converted-to.pdf and b/fig/ch3/dist-eps-converted-to.pdf differ diff --git a/fig/ch3/dist.eps b/fig/ch3/dist.eps index 01ffcbc..8bc759b 100644 --- a/fig/ch3/dist.eps +++ b/fig/ch3/dist.eps @@ -1,6 +1,6 @@ %!PS-Adobe-2.0 %%Creator: gnuplot 4.6 patchlevel 6 -%%CreationDate: Thu Oct 22 12:39:41 2015 +%%CreationDate: Tue Jun 14 18:30:33 2016 %%DocumentFonts: (atend) %%BoundingBox: 50 50 554 410 %%Orientation: Portrait @@ -506,7 +506,7 @@ SDict begin [ /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Oct 22 12:39:41 2015) + /CreationDate (Tue Jun 14 18:30:33 2016) /DOCINFO pdfmark end } ifelse @@ -849,7 +849,7 @@ Z stroke LCb setrgbcolor 88 1910 M currentpoint gsave translate -270 rotate 0 0 moveto -[ [(Helvetica) 180.0 0.0 true true 0 (Tradeoff distance)] +[ [(Helvetica) 180.0 0.0 true true 0 (Tradeoff distance %)] ] -60.0 MCshow grestore LTb diff --git a/fig/ch3/dist_mc-eps-converted-to.pdf b/fig/ch3/dist_mc-eps-converted-to.pdf index 66a1560..304d804 100644 Binary files a/fig/ch3/dist_mc-eps-converted-to.pdf and b/fig/ch3/dist_mc-eps-converted-to.pdf differ diff --git a/fig/ch3/dist_mc.eps b/fig/ch3/dist_mc.eps index 240fa99..a922b1b 100644 --- a/fig/ch3/dist_mc.eps +++ b/fig/ch3/dist_mc.eps @@ -1,6 +1,6 @@ %!PS-Adobe-2.0 %%Creator: gnuplot 4.6 patchlevel 6 -%%CreationDate: Thu Oct 22 17:45:05 2015 +%%CreationDate: Tue Jun 14 18:34:45 2016 %%DocumentFonts: (atend) %%BoundingBox: 50 50 554 410 %%Orientation: Portrait @@ -506,7 +506,7 @@ SDict begin [ /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Oct 22 17:45:05 2015) + /CreationDate (Tue Jun 14 18:34:45 2016) /DOCINFO pdfmark end } ifelse @@ -823,7 +823,7 @@ Z stroke LCb setrgbcolor 88 1910 M currentpoint gsave translate -270 rotate 0 0 moveto -[ [(Helvetica) 180.0 0.0 true true 0 (Tradeoff distance)] +[ [(Helvetica) 180.0 0.0 true true 0 (Tradeoff distance %)] ] -60.0 MCshow grestore LTb diff --git a/fig/ch3/dist_pow-eps-converted-to.pdf b/fig/ch3/dist_pow-eps-converted-to.pdf index 7b069ab..4352d37 100644 Binary files a/fig/ch3/dist_pow-eps-converted-to.pdf and b/fig/ch3/dist_pow-eps-converted-to.pdf differ diff --git a/fig/ch3/dist_pow.eps b/fig/ch3/dist_pow.eps index 488d6ff..e8fc58f 100644 --- a/fig/ch3/dist_pow.eps +++ b/fig/ch3/dist_pow.eps @@ -1,6 +1,6 @@ %!PS-Adobe-2.0 %%Creator: gnuplot 4.6 patchlevel 6 -%%CreationDate: Thu Oct 22 12:45:35 2015 +%%CreationDate: Tue Jun 14 18:33:00 2016 %%DocumentFonts: (atend) %%BoundingBox: 50 50 554 410 %%Orientation: Portrait @@ -506,7 +506,7 @@ SDict begin [ /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Oct 22 12:45:35 2015) + /CreationDate (Tue Jun 14 18:33:00 2016) /DOCINFO pdfmark end } ifelse @@ -823,7 +823,7 @@ Z stroke LCb setrgbcolor 88 1910 M currentpoint gsave translate -270 rotate 0 0 moveto -[ [(Helvetica) 180.0 0.0 true true 0 (Tradeoff distance)] +[ [(Helvetica) 180.0 0.0 true true 0 (Tradeoff distance %)] ] -60.0 MCshow grestore LTb @@ -839,11 +839,11 @@ LTb 1.000 UL LT0 0.10 0.10 0.44 C LCb setrgbcolor -1761 3246 M +1761 3324 M [ [(Helvetica) 140.0 0.0 true true 0 (10% static power)] ] -46.7 MRshow LT0 -0.10 0.10 0.44 C 1.000 1827 3213 327 66 BoxColFill +0.10 0.10 0.44 C 1.000 1827 3291 327 66 BoxColFill 0.10 0.10 0.44 C 1.000 921 352 88 2178 BoxColFill 0.10 0.10 0.44 C 1.000 1467 352 88 2060 BoxColFill 0.10 0.10 0.44 C 1.000 2013 352 88 1905 BoxColFill @@ -856,11 +856,11 @@ LT0 1.000 UL LT1 0.24 0.70 0.44 C LCb setrgbcolor -1761 3114 M +1761 3192 M [ [(Helvetica) 140.0 0.0 true true 0 (20% static power)] ] -46.7 MRshow LT1 -0.24 0.70 0.44 C 1.000 1827 3081 327 66 BoxColFill +0.24 0.70 0.44 C 1.000 1827 3159 327 66 BoxColFill 0.24 0.70 0.44 C 1.000 1030 352 88 1855 BoxColFill 0.24 0.70 0.44 C 1.000 1576 352 88 2234 BoxColFill 0.24 0.70 0.44 C 1.000 2122 352 88 1856 BoxColFill @@ -873,11 +873,11 @@ LT1 1.000 UL LT0 1.00 0.71 0.76 C LCb setrgbcolor -1761 2982 M +1761 3060 M [ [(Helvetica) 140.0 0.0 true true 0 (30% static power)] ] -46.7 MRshow LT0 -1.00 0.71 0.76 C 1.000 1827 2949 327 66 BoxColFill +1.00 0.71 0.76 C 1.000 1827 3027 327 66 BoxColFill 1.00 0.71 0.76 C 1.000 1139 352 88 266 BoxColFill 1.00 0.71 0.76 C 1.000 1685 352 88 1018 BoxColFill 1.00 0.71 0.76 C 1.000 2231 352 88 1441 BoxColFill diff --git a/fig/ch3/edp_dist-eps-converted-to.pdf b/fig/ch3/edp_dist-eps-converted-to.pdf index afdfc82..88ec309 100644 Binary files a/fig/ch3/edp_dist-eps-converted-to.pdf and b/fig/ch3/edp_dist-eps-converted-to.pdf differ diff --git a/fig/ch3/edp_dist.eps b/fig/ch3/edp_dist.eps index 8389b5d..806c379 100644 --- a/fig/ch3/edp_dist.eps +++ b/fig/ch3/edp_dist.eps @@ -1,6 +1,6 @@ %!PS-Adobe-2.0 %%Creator: gnuplot 4.6 patchlevel 6 -%%CreationDate: Thu Apr 21 10:02:20 2016 +%%CreationDate: Tue Jun 14 18:36:20 2016 %%DocumentFonts: (atend) %%BoundingBox: 50 50 554 410 %%Orientation: Portrait @@ -506,7 +506,7 @@ SDict begin [ /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Apr 21 10:02:20 2016) + /CreationDate (Tue Jun 14 18:36:20 2016) /DOCINFO pdfmark end } ifelse @@ -875,7 +875,7 @@ Z stroke LCb setrgbcolor 88 1910 M currentpoint gsave translate -270 rotate 0 0 moveto -[ [(Helvetica) 180.0 0.0 true true 0 (Tradeoff distance)] +[ [(Helvetica) 180.0 0.0 true true 0 (Tradeoff distance %)] ] -60.0 MCshow grestore LTb diff --git a/fig/ch3/edp_eng-eps-converted-to.pdf b/fig/ch3/edp_eng-eps-converted-to.pdf index c6e22e3..174d079 100644 Binary files a/fig/ch3/edp_eng-eps-converted-to.pdf and b/fig/ch3/edp_eng-eps-converted-to.pdf differ diff --git a/fig/ch3/edp_eng.eps b/fig/ch3/edp_eng.eps index 850de41..7606196 100644 --- a/fig/ch3/edp_eng.eps +++ b/fig/ch3/edp_eng.eps @@ -1,6 +1,6 @@ %!PS-Adobe-2.0 %%Creator: gnuplot 4.6 patchlevel 6 -%%CreationDate: Thu Oct 22 12:36:19 2015 +%%CreationDate: Tue Jun 14 18:05:51 2016 %%DocumentFonts: (atend) %%BoundingBox: 50 50 554 410 %%Orientation: Portrait @@ -506,7 +506,7 @@ SDict begin [ /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Oct 22 12:36:19 2015) + /CreationDate (Tue Jun 14 18:05:51 2016) /DOCINFO pdfmark end } ifelse @@ -849,7 +849,7 @@ Z stroke LCb setrgbcolor 88 1910 M currentpoint gsave translate -270 rotate 0 0 moveto -[ [(Helvetica) 180.0 0.0 true true 0 (Energy saving)] +[ [(Helvetica) 180.0 0.0 true true 0 (Energy saving %)] ] -60.0 MCshow grestore LTb diff --git a/fig/ch3/edp_per-eps-converted-to.pdf b/fig/ch3/edp_per-eps-converted-to.pdf index 8da5afb..b2727bd 100644 Binary files a/fig/ch3/edp_per-eps-converted-to.pdf and b/fig/ch3/edp_per-eps-converted-to.pdf differ diff --git a/fig/ch3/edp_per.eps b/fig/ch3/edp_per.eps index 1fb48ec..85c760c 100644 --- a/fig/ch3/edp_per.eps +++ b/fig/ch3/edp_per.eps @@ -1,6 +1,6 @@ %!PS-Adobe-2.0 %%Creator: gnuplot 4.6 patchlevel 6 -%%CreationDate: Thu Oct 22 12:36:20 2015 +%%CreationDate: Tue Jun 14 18:25:02 2016 %%DocumentFonts: (atend) %%BoundingBox: 50 50 554 410 %%Orientation: Portrait @@ -506,7 +506,7 @@ SDict begin [ /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Oct 22 12:36:20 2015) + /CreationDate (Tue Jun 14 18:25:02 2016) /DOCINFO pdfmark end } ifelse @@ -797,7 +797,7 @@ Z stroke LCb setrgbcolor 88 1910 M currentpoint gsave translate -270 rotate 0 0 moveto -[ [(Helvetica) 180.0 0.0 true true 0 (Performance degradation)] +[ [(Helvetica) 180.0 0.0 true true 0 (Performance degradation %)] ] -60.0 MCshow grestore LTb diff --git a/fig/ch3/energy-eps-converted-to.pdf b/fig/ch3/energy-eps-converted-to.pdf index 6a37f89..5634eb5 100644 Binary files a/fig/ch3/energy-eps-converted-to.pdf and b/fig/ch3/energy-eps-converted-to.pdf differ diff --git a/fig/ch3/energy.eps b/fig/ch3/energy.eps index 2a54918..d97049e 100644 --- a/fig/ch3/energy.eps +++ b/fig/ch3/energy.eps @@ -1,7 +1,7 @@ %!PS-Adobe-2.0 EPSF-2.0 %%Title: energy.eps -%%Creator: gnuplot 4.6 patchlevel 0 -%%CreationDate: Thu Feb 19 16:44:03 2015 +%%Creator: gnuplot 4.6 patchlevel 6 +%%CreationDate: Tue Jun 14 17:35:21 2016 %%DocumentFonts: (atend) %%BoundingBox: 50 50 320 239 %%EndComments @@ -47,7 +47,7 @@ gnudict begin } if } def % -% Gnuplot Prolog Version 4.4 (August 2010) +% Gnuplot Prolog Version 4.6 (September 2012) % %/SuppressPDFMark true def % @@ -314,7 +314,8 @@ gnudict begin /PatternFill {gsave /PFa [ 9 2 roll ] def PFa 0 get PFa 2 get 2 div add PFa 1 get PFa 3 get 2 div add translate PFa 2 get -2 div PFa 3 get -2 div PFa 2 get PFa 3 get Rec - gsave 1 setgray fill grestore clip + TransparentPatterns {} {gsave 1 setgray fill grestore} ifelse + clip currentlinewidth 0.5 mul setlinewidth /PFs PFa 2 get dup mul PFa 3 get dup mul add sqrt def 0 0 M PFa 5 get rotate PFs -2 div dup translate @@ -428,11 +429,11 @@ systemdict /pdfmark known not { SDict begin [ /Title (energy.eps) /Subject (gnuplot plot) - /Creator (gnuplot 4.6 patchlevel 0) + /Creator (gnuplot 4.6 patchlevel 6) /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Feb 19 16:44:03 2015) + /CreationDate (Tue Jun 14 17:35:21 2016) /DOCINFO pdfmark end } ifelse @@ -450,223 +451,267 @@ newpath BackgroundColor 0 lt 3 1 roll 0 lt exch 0 lt or or not {BackgroundColor C 1.000 0 0 5400.00 3780.00 BoxColFill} if 1.000 UL LTb -602 588 M +602 448 M 63 0 V 4482 0 R -63 0 V /Helvetica findfont 190 scalefont setfont -518 588 M +/vshift -63 def +518 448 M ( 0) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -602 948 M +602 825 M 63 0 V 4482 0 R -63 0 V /Helvetica findfont 190 scalefont setfont -518 948 M +/vshift -63 def +518 825 M ( 5) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -602 1308 M +602 1201 M 63 0 V 4482 0 R -63 0 V /Helvetica findfont 190 scalefont setfont +/vshift -63 def -4566 0 R ( 10) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -602 1668 M +602 1578 M 63 0 V 4482 0 R -63 0 V /Helvetica findfont 190 scalefont setfont +/vshift -63 def -4566 0 R ( 15) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -602 2028 M +602 1954 M 63 0 V 4482 0 R -63 0 V /Helvetica findfont 190 scalefont setfont +/vshift -63 def -4566 0 R ( 20) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -602 2387 M +602 2331 M 63 0 V 4482 0 R -63 0 V /Helvetica findfont 190 scalefont setfont +/vshift -63 def -4566 0 R ( 25) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -602 2747 M +602 2707 M 63 0 V 4482 0 R -63 0 V /Helvetica findfont 190 scalefont setfont +/vshift -63 def -4566 0 R ( 30) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -602 3107 M +602 3084 M 63 0 V 4482 0 R -63 0 V /Helvetica findfont 190 scalefont setfont +/vshift -63 def -4566 0 R ( 35) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -602 3467 M +602 3460 M 63 0 V 4482 0 R -63 0 V /Helvetica findfont 190 scalefont setfont +/vshift -63 def -4566 0 R ( 40) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -604 588 M +604 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -604 448 M +/vshift -63 def +604 308 M ( 0) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -1088 588 M +1088 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 16) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -1572 588 M +1572 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 32) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -2057 588 M +2057 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 48) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -2541 588 M +2541 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 64) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -3026 588 M +3026 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 80) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -3510 588 M +3510 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 96) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -3995 588 M +3995 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 112) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -4479 588 M +4479 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 128) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -4964 588 M +4964 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 144) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb 1.000 UL LTb 602 3611 N -602 588 L +602 448 L 4545 0 V -0 3023 V +0 3163 V -4545 0 V Z stroke LCb setrgbcolor /Helvetica findfont 210 scalefont setfont -112 2099 M +/vshift -70 def +112 2029 M currentpoint gsave translate -270 rotate 0 0 M -(Energy saving) Cshow +(Energy saving %) Cshow grestore /Helvetica findfont 140 scalefont setfont +/vshift -46 def LTb LCb setrgbcolor /Helvetica findfont 210 scalefont setfont -2974 98 M +/vshift -70 def +2874 98 M ( Number of nodes) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LTb 1.000 UP /Helvetica findfont 190 scalefont setfont -649 620 M +/vshift -63 def +649 482 M ( ) Lshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb % Begin plot #1 @@ -675,24 +720,26 @@ LTb LT0 0.00 0.00 1.00 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont +/vshift -63 def 1259 3443 M (CG) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LT0 0.00 0.00 1.00 C 709 3443 M 298 0 V -725 3047 M -846 2838 L -242 -357 V -484 -730 V -969 -578 V -4479 876 L -725 3047 Box -846 2838 Box -1088 2481 Box -1572 1751 Box -2541 1173 Box -4479 876 Box +725 3020 M +846 2802 L +242 -374 V +484 -763 V +969 -605 V +4479 750 L +725 3020 Box +846 2802 Box +1088 2428 Box +1572 1665 Box +2541 1060 Box +4479 750 Box 858 3443 Box % End plot #1 % Begin plot #2 @@ -701,24 +748,26 @@ LT0 LT0 1.00 0.00 0.00 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont +/vshift -63 def 1893 3443 M (MG) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LT0 1.00 0.00 0.00 C 1343 3443 M 298 0 V -725 3134 M -846 3020 L -242 -93 V -484 -256 V -969 24 V -4479 1940 L -725 3134 TriD -846 3020 TriD -1088 2927 TriD -1572 2671 TriD -2541 2695 TriD -4479 1940 TriD +725 3112 M +846 2992 L +242 -97 V +484 -268 V +969 26 V +4479 1862 L +725 3112 TriD +846 2992 TriD +1088 2895 TriD +1572 2627 TriD +2541 2653 TriD +4479 1862 TriD 1492 3443 TriD % End plot #2 % Begin plot #3 @@ -727,24 +776,26 @@ LT0 LT0 0.50 0.00 0.50 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont +/vshift -63 def 2527 3443 M (EP) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LT0 0.50 0.00 0.50 C 1977 3443 M 298 0 V -725 2519 M -121 15 V -242 -13 V -484 9 V -969 10 V -1938 -2 V -725 2519 Star -846 2534 Star -1088 2521 Star -1572 2530 Star -2541 2540 Star -4479 2538 Star +725 2468 M +121 17 V +242 -15 V +484 10 V +969 11 V +1938 -3 V +725 2468 Star +846 2485 Star +1088 2470 Star +1572 2480 Star +2541 2491 Star +4479 2488 Star 2126 3443 Star % End plot #3 % Begin plot #4 @@ -753,24 +804,26 @@ LT0 LT0 0.18 0.31 0.31 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont +/vshift -63 def 3161 3443 M (LU) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LT0 0.18 0.31 0.31 C 2611 3443 M 298 0 V -725 3036 M -846 2620 L -242 41 V -484 -374 V -969 -118 V -4479 1788 L -725 3036 TriUF -846 2620 TriUF -1088 2661 TriUF -1572 2287 TriUF -2541 2169 TriUF -4479 1788 TriUF +725 3009 M +846 2574 L +242 43 V +484 -391 V +969 -124 V +4479 1704 L +725 3009 TriUF +846 2574 TriUF +1088 2617 TriUF +1572 2226 TriUF +2541 2102 TriUF +4479 1704 TriUF 2760 3443 TriUF % End plot #4 % Begin plot #5 @@ -779,24 +832,26 @@ LT0 LT0 0.18 0.55 0.34 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont +/vshift -63 def 3795 3443 M (BT) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LT0 0.18 0.55 0.34 C 3245 3443 M 298 0 V -725 3133 M -876 2915 L -212 189 V +725 3111 M +876 2882 L +212 198 V 606 -24 V -847 -187 V -4964 2257 L -725 3133 BoxF -876 2915 BoxF -1088 3104 BoxF -1694 3080 BoxF -2541 2893 BoxF -4964 2257 BoxF +847 -196 V +4964 2194 L +725 3111 BoxF +876 2882 BoxF +1088 3080 BoxF +1694 3056 BoxF +2541 2860 BoxF +4964 2194 BoxF 3394 3443 BoxF % End plot #5 % Begin plot #6 @@ -805,24 +860,26 @@ LT0 LT0 0.85 0.65 0.13 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont +/vshift -63 def 4429 3443 M (SP) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LT0 0.85 0.65 0.13 C 3879 3443 M 298 0 V -725 3123 M -876 2368 L -212 -161 V +725 3100 M +876 2311 L +212 -169 V 606 17 V -847 152 V -2423 136 V -725 3123 Circle -876 2368 Circle -1088 2207 Circle -1694 2224 Circle -2541 2376 Circle -4964 2512 Circle +847 160 V +2423 142 V +725 3100 Circle +876 2311 Circle +1088 2142 Circle +1694 2159 Circle +2541 2319 Circle +4964 2461 Circle 4028 3443 Circle % End plot #6 % Begin plot #7 @@ -831,32 +888,34 @@ LT0 LT0 0.55 0.00 0.00 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont +/vshift -63 def 5063 3443 M (FT) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LT0 0.55 0.00 0.00 C 4513 3443 M 298 0 V -725 3148 M -846 2821 L -242 41 V -484 -612 V -969 -212 V -4479 1513 L -725 3148 CircleF -846 2821 CircleF -1088 2862 CircleF -1572 2250 CircleF -2541 2038 CircleF -4479 1513 CircleF +725 3127 M +846 2784 L +242 43 V +484 -640 V +969 -222 V +4479 1416 L +725 3127 CircleF +846 2784 CircleF +1088 2827 CircleF +1572 2187 CircleF +2541 1965 CircleF +4479 1416 CircleF 4662 3443 CircleF % End plot #7 1.000 UL LTb 602 3611 N -602 588 L +602 448 L 4545 0 V -0 3023 V +0 3163 V -4545 0 V Z stroke 1.000 UP @@ -866,3 +925,5 @@ stroke grestore end showpage +%%Trailer +%%DocumentFonts: Helvetica diff --git a/fig/ch3/eng_pow-eps-converted-to.pdf b/fig/ch3/eng_pow-eps-converted-to.pdf index de6f852..1fd83fc 100644 Binary files a/fig/ch3/eng_pow-eps-converted-to.pdf and b/fig/ch3/eng_pow-eps-converted-to.pdf differ diff --git a/fig/ch3/eng_pow.eps b/fig/ch3/eng_pow.eps index d17c80a..b543059 100644 --- a/fig/ch3/eng_pow.eps +++ b/fig/ch3/eng_pow.eps @@ -1,6 +1,6 @@ %!PS-Adobe-2.0 %%Creator: gnuplot 4.6 patchlevel 6 -%%CreationDate: Thu Oct 22 12:43:52 2015 +%%CreationDate: Tue Jun 14 17:56:53 2016 %%DocumentFonts: (atend) %%BoundingBox: 50 50 554 410 %%Orientation: Portrait @@ -506,7 +506,7 @@ SDict begin [ /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Oct 22 12:43:52 2015) + /CreationDate (Tue Jun 14 17:56:53 2016) /DOCINFO pdfmark end } ifelse @@ -849,7 +849,7 @@ Z stroke LCb setrgbcolor 88 1910 M currentpoint gsave translate -270 rotate 0 0 moveto -[ [(Helvetica) 180.0 0.0 true true 0 (Energy saving)] +[ [(Helvetica) 180.0 0.0 true true 0 (Energy saving %)] ] -60.0 MCshow grestore LTb diff --git a/fig/ch3/eng_s-eps-converted-to.pdf b/fig/ch3/eng_s-eps-converted-to.pdf index eb9dd93..bab95b7 100644 Binary files a/fig/ch3/eng_s-eps-converted-to.pdf and b/fig/ch3/eng_s-eps-converted-to.pdf differ diff --git a/fig/ch3/eng_s.eps b/fig/ch3/eng_s.eps index f944e95..fccc602 100644 --- a/fig/ch3/eng_s.eps +++ b/fig/ch3/eng_s.eps @@ -1,6 +1,6 @@ %!PS-Adobe-2.0 %%Creator: gnuplot 4.6 patchlevel 6 -%%CreationDate: Thu Oct 22 12:40:02 2015 +%%CreationDate: Tue Jun 14 17:51:05 2016 %%DocumentFonts: (atend) %%BoundingBox: 50 50 554 410 %%Orientation: Portrait @@ -506,7 +506,7 @@ SDict begin [ /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Oct 22 12:40:02 2015) + /CreationDate (Tue Jun 14 17:51:05 2016) /DOCINFO pdfmark end } ifelse @@ -849,7 +849,7 @@ Z stroke LCb setrgbcolor 88 1910 M currentpoint gsave translate -270 rotate 0 0 moveto -[ [(Helvetica) 180.0 0.0 true true 0 (Energy saving)] +[ [(Helvetica) 180.0 0.0 true true 0 (Energy saving %)] ] -60.0 MCshow grestore LTb diff --git a/fig/ch3/eng_s_mc-eps-converted-to.pdf b/fig/ch3/eng_s_mc-eps-converted-to.pdf index 386f43d..95719d5 100644 Binary files a/fig/ch3/eng_s_mc-eps-converted-to.pdf and b/fig/ch3/eng_s_mc-eps-converted-to.pdf differ diff --git a/fig/ch3/eng_s_mc.eps b/fig/ch3/eng_s_mc.eps index aa7c105..3a98094 100644 --- a/fig/ch3/eng_s_mc.eps +++ b/fig/ch3/eng_s_mc.eps @@ -1,6 +1,6 @@ %!PS-Adobe-2.0 %%Creator: gnuplot 4.6 patchlevel 6 -%%CreationDate: Thu Oct 22 17:39:34 2015 +%%CreationDate: Tue Jun 14 18:01:58 2016 %%DocumentFonts: (atend) %%BoundingBox: 50 50 554 410 %%Orientation: Portrait @@ -506,7 +506,7 @@ SDict begin [ /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Oct 22 17:39:34 2015) + /CreationDate (Tue Jun 14 18:01:58 2016) /DOCINFO pdfmark end } ifelse @@ -823,7 +823,7 @@ Z stroke LCb setrgbcolor 88 1910 M currentpoint gsave translate -270 rotate 0 0 moveto -[ [(Helvetica) 180.0 0.0 true true 0 (Energy saving)] +[ [(Helvetica) 180.0 0.0 true true 0 (Energy saving %)] ] -60.0 MCshow grestore LTb diff --git a/fig/ch3/per_d-eps-converted-to.pdf b/fig/ch3/per_d-eps-converted-to.pdf index 2320504..65fa223 100644 Binary files a/fig/ch3/per_d-eps-converted-to.pdf and b/fig/ch3/per_d-eps-converted-to.pdf differ diff --git a/fig/ch3/per_d.eps b/fig/ch3/per_d.eps index 8fb93bb..2e22cae 100644 --- a/fig/ch3/per_d.eps +++ b/fig/ch3/per_d.eps @@ -1,6 +1,6 @@ %!PS-Adobe-2.0 %%Creator: gnuplot 4.6 patchlevel 6 -%%CreationDate: Thu Oct 22 12:39:50 2015 +%%CreationDate: Tue Jun 14 17:54:22 2016 %%DocumentFonts: (atend) %%BoundingBox: 50 50 554 410 %%Orientation: Portrait @@ -506,7 +506,7 @@ SDict begin [ /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Oct 22 12:39:50 2015) + /CreationDate (Tue Jun 14 17:54:22 2016) /DOCINFO pdfmark end } ifelse @@ -797,7 +797,7 @@ Z stroke LCb setrgbcolor 88 1910 M currentpoint gsave translate -270 rotate 0 0 moveto -[ [(Helvetica) 180.0 0.0 true true 0 (Performance degradation)] +[ [(Helvetica) 180.0 0.0 true true 0 (Performance degradation %)] ] -60.0 MCshow grestore LTb diff --git a/fig/ch3/per_d_mc-eps-converted-to.pdf b/fig/ch3/per_d_mc-eps-converted-to.pdf index 4bc6d6c..b2dbb1a 100644 Binary files a/fig/ch3/per_d_mc-eps-converted-to.pdf and b/fig/ch3/per_d_mc-eps-converted-to.pdf differ diff --git a/fig/ch3/per_d_mc.eps b/fig/ch3/per_d_mc.eps index 19d87f1..473c58c 100644 --- a/fig/ch3/per_d_mc.eps +++ b/fig/ch3/per_d_mc.eps @@ -1,6 +1,6 @@ %!PS-Adobe-2.0 %%Creator: gnuplot 4.6 patchlevel 6 -%%CreationDate: Thu Oct 22 17:36:36 2015 +%%CreationDate: Tue Jun 14 18:03:55 2016 %%DocumentFonts: (atend) %%BoundingBox: 50 50 554 410 %%Orientation: Portrait @@ -506,7 +506,7 @@ SDict begin [ /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Oct 22 17:36:36 2015) + /CreationDate (Tue Jun 14 18:03:55 2016) /DOCINFO pdfmark end } ifelse @@ -779,7 +779,7 @@ Z stroke LCb setrgbcolor 88 1910 M currentpoint gsave translate -270 rotate 0 0 moveto -[ [(Helvetica) 180.0 0.0 true true 0 (Performance degradation)] +[ [(Helvetica) 180.0 0.0 true true 0 (Performance degradation %)] ] -60.0 MCshow grestore LTb diff --git a/fig/ch3/per_deg-eps-converted-to.pdf b/fig/ch3/per_deg-eps-converted-to.pdf index 821f861..263870f 100644 Binary files a/fig/ch3/per_deg-eps-converted-to.pdf and b/fig/ch3/per_deg-eps-converted-to.pdf differ diff --git a/fig/ch3/per_deg.eps b/fig/ch3/per_deg.eps index e1ef706..3071f71 100644 --- a/fig/ch3/per_deg.eps +++ b/fig/ch3/per_deg.eps @@ -1,7 +1,7 @@ %!PS-Adobe-2.0 EPSF-2.0 %%Title: per_deg.eps -%%Creator: gnuplot 4.6 patchlevel 0 -%%CreationDate: Thu Feb 19 16:42:46 2015 +%%Creator: gnuplot 4.6 patchlevel 6 +%%CreationDate: Tue Jun 14 17:37:32 2016 %%DocumentFonts: (atend) %%BoundingBox: 50 50 320 239 %%EndComments @@ -47,7 +47,7 @@ gnudict begin } if } def % -% Gnuplot Prolog Version 4.4 (August 2010) +% Gnuplot Prolog Version 4.6 (September 2012) % %/SuppressPDFMark true def % @@ -314,7 +314,8 @@ gnudict begin /PatternFill {gsave /PFa [ 9 2 roll ] def PFa 0 get PFa 2 get 2 div add PFa 1 get PFa 3 get 2 div add translate PFa 2 get -2 div PFa 3 get -2 div PFa 2 get PFa 3 get Rec - gsave 1 setgray fill grestore clip + TransparentPatterns {} {gsave 1 setgray fill grestore} ifelse + clip currentlinewidth 0.5 mul setlinewidth /PFs PFa 2 get dup mul PFa 3 get dup mul add sqrt def 0 0 M PFa 5 get rotate PFs -2 div dup translate @@ -428,11 +429,11 @@ systemdict /pdfmark known not { SDict begin [ /Title (per_deg.eps) /Subject (gnuplot plot) - /Creator (gnuplot 4.6 patchlevel 0) + /Creator (gnuplot 4.6 patchlevel 6) /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Feb 19 16:42:46 2015) + /CreationDate (Tue Jun 14 17:37:32 2016) /DOCINFO pdfmark end } ifelse @@ -450,173 +451,207 @@ newpath BackgroundColor 0 lt 3 1 roll 0 lt exch 0 lt or or not {BackgroundColor C 1.000 0 0 5400.00 3780.00 BoxColFill} if 1.000 UL LTb -602 674 M +602 538 M 63 0 V 4482 0 R -63 0 V /Helvetica findfont 190 scalefont setfont -518 674 M +/vshift -63 def +518 538 M ( 0) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -602 1538 M +602 1442 M 63 0 V 4482 0 R -63 0 V /Helvetica findfont 190 scalefont setfont +/vshift -63 def -4566 0 R ( 5) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -602 2402 M +602 2346 M 63 0 V 4482 0 R -63 0 V /Helvetica findfont 190 scalefont setfont +/vshift -63 def -4566 0 R ( 10) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -602 3266 M +602 3250 M 63 0 V 4482 0 R -63 0 V /Helvetica findfont 190 scalefont setfont +/vshift -63 def -4566 0 R ( 15) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -604 588 M +604 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -604 448 M +/vshift -63 def +604 308 M ( 0) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -1088 588 M +1088 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 16) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -1572 588 M +1572 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 32) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -2057 588 M +2057 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 48) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -2541 588 M +2541 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 64) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -3026 588 M +3026 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 80) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -3510 588 M +3510 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 96) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -3995 588 M +3995 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 112) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -4479 588 M +4479 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 128) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb -4964 588 M +4964 448 M 0 63 V -0 2960 R +0 3100 R 0 -63 V /Helvetica findfont 190 scalefont setfont -0 -3100 R +/vshift -63 def +0 -3240 R ( 144) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb 1.000 UL LTb 602 3611 N -602 588 L +602 448 L 4545 0 V -0 3023 V +0 3163 V -4545 0 V Z stroke LCb setrgbcolor /Helvetica findfont 210 scalefont setfont -112 2099 M +/vshift -70 def +112 2029 M currentpoint gsave translate -270 rotate 0 0 M -(Performance degradation) Cshow +(Performance degradation %) Cshow grestore /Helvetica findfont 140 scalefont setfont +/vshift -46 def LTb LCb setrgbcolor /Helvetica findfont 210 scalefont setfont -2974 98 M +/vshift -70 def +2874 98 M ( Number of nodes) Cshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LTb 1.000 UP /Helvetica findfont 190 scalefont setfont -649 752 M +/vshift -63 def +649 620 M ( ) Lshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def 1.000 UL LTb % Begin plot #1 @@ -625,24 +660,26 @@ LTb LT0 0.00 0.00 1.00 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont +/vshift -63 def 1259 3443 M (CG) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LT0 0.00 0.00 1.00 C 709 3443 M 298 0 V -725 1835 M -121 70 V -242 423 V -484 -737 V -2541 965 L -4479 865 L -725 1835 Box -846 1905 Box -1088 2328 Box -1572 1591 Box -2541 965 Box -4479 865 Box +725 1753 M +121 73 V +242 443 V +484 -772 V +2541 843 L +4479 738 L +725 1753 Box +846 1826 Box +1088 2269 Box +1572 1497 Box +2541 843 Box +4479 738 Box 858 3443 Box % End plot #1 % Begin plot #2 @@ -651,24 +688,26 @@ LT0 LT0 1.00 0.00 0.00 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont +/vshift -63 def 1893 3443 M (MG) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LT0 1.00 0.00 0.00 C 1343 3443 M 298 0 V -725 1424 M -121 359 V -242 -63 V -484 483 V -969 951 V -4479 2469 L -725 1424 TriD -846 1783 TriD -1088 1720 TriD -1572 2203 TriD -2541 3154 TriD -4479 2469 TriD +725 1322 M +121 376 V +242 -66 V +484 506 V +969 995 V +4479 2416 L +725 1322 TriD +846 1698 TriD +1088 1632 TriD +1572 2138 TriD +2541 3133 TriD +4479 2416 TriD 1492 3443 TriD % End plot #2 % Begin plot #3 @@ -677,24 +716,26 @@ LT0 LT0 0.50 0.00 0.50 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont +/vshift -63 def 2527 3443 M (EP) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LT0 0.50 0.00 0.50 C 1977 3443 M 298 0 V -725 1199 M -846 760 L -242 12 V -484 -94 V -969 534 V -4479 680 L -725 1199 Star -846 760 Star -1088 772 Star -1572 678 Star -2541 1212 Star -4479 680 Star +725 1087 M +846 628 L +242 13 V +484 -99 V +969 559 V +4479 544 L +725 1087 Star +846 628 Star +1088 641 Star +1572 542 Star +2541 1101 Star +4479 544 Star 2126 3443 Star % End plot #3 % Begin plot #4 @@ -703,24 +744,26 @@ LT0 LT0 0.18 0.31 0.31 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont +/vshift -63 def 3161 3443 M (LU) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LT0 0.18 0.31 0.31 C 2611 3443 M 298 0 V -725 1739 M -846 676 L -242 1132 V -484 -709 V -969 211 V -4479 1082 L -725 1739 TriUF -846 676 TriUF -1088 1808 TriUF -1572 1099 TriUF -2541 1310 TriUF -4479 1082 TriUF +725 1653 M +846 540 L +242 1184 V +1572 983 L +969 220 V +4479 965 L +725 1653 TriUF +846 540 TriUF +1088 1724 TriUF +1572 983 TriUF +2541 1203 TriUF +4479 965 TriUF 2760 3443 TriUF % End plot #4 % Begin plot #5 @@ -729,24 +772,26 @@ LT0 LT0 0.18 0.55 0.34 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont +/vshift -63 def 3795 3443 M (BT) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LT0 0.18 0.55 0.34 C 3245 3443 M 298 0 V -725 2140 M -876 2038 L -212 -362 V +725 2072 M +876 1965 L +212 -378 V 606 6 V -847 1114 V -4964 896 L -725 2140 BoxF -876 2038 BoxF -1088 1676 BoxF -1694 1682 BoxF -2541 2796 BoxF -4964 896 BoxF +847 1166 V +4964 770 L +725 2072 BoxF +876 1965 BoxF +1088 1587 BoxF +1694 1593 BoxF +2541 2759 BoxF +4964 770 BoxF 3394 3443 BoxF % End plot #5 % Begin plot #6 @@ -755,24 +800,26 @@ LT0 LT0 0.85 0.65 0.13 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont +/vshift -63 def 4429 3443 M (SP) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LT0 0.85 0.65 0.13 C 3879 3443 M 298 0 V -725 1658 M -876 1155 L -212 180 V -606 -64 V -2541 680 L +725 1567 M +876 1042 L +212 187 V +606 -66 V +2541 544 L 2423 3 V -725 1658 Circle -876 1155 Circle -1088 1335 Circle -1694 1271 Circle -2541 680 Circle -4964 683 Circle +725 1567 Circle +876 1042 Circle +1088 1229 Circle +1694 1163 Circle +2541 544 Circle +4964 547 Circle 4028 3443 Circle % End plot #6 % Begin plot #7 @@ -781,32 +828,34 @@ LT0 LT0 0.55 0.00 0.00 C LCb setrgbcolor /Helvetica findfont 190 scalefont setfont +/vshift -63 def 5063 3443 M (FT) Rshow /Helvetica findfont 140 scalefont setfont +/vshift -46 def LT0 0.55 0.00 0.00 C 4513 3443 M 298 0 V -725 847 M -121 267 V -242 680 V -484 -617 V -969 339 V -4479 1166 L -725 847 CircleF -846 1114 CircleF -1088 1794 CircleF -1572 1177 CircleF -2541 1516 CircleF -4479 1166 CircleF +725 719 M +846 998 L +242 712 V +484 -646 V +969 355 V +4479 1052 L +725 719 CircleF +846 998 CircleF +1088 1710 CircleF +1572 1064 CircleF +2541 1419 CircleF +4479 1052 CircleF 4662 3443 CircleF % End plot #7 1.000 UL LTb 602 3611 N -602 588 L +602 448 L 4545 0 V -0 3023 V +0 3163 V -4545 0 V Z stroke 1.000 UP diff --git a/fig/ch3/per_pow-eps-converted-to.pdf b/fig/ch3/per_pow-eps-converted-to.pdf index 472848b..dc25bd6 100644 Binary files a/fig/ch3/per_pow-eps-converted-to.pdf and b/fig/ch3/per_pow-eps-converted-to.pdf differ diff --git a/fig/ch3/per_pow.eps b/fig/ch3/per_pow.eps index 6f42b28..0fb8db6 100644 --- a/fig/ch3/per_pow.eps +++ b/fig/ch3/per_pow.eps @@ -1,6 +1,6 @@ %!PS-Adobe-2.0 %%Creator: gnuplot 4.6 patchlevel 6 -%%CreationDate: Thu Oct 22 12:44:35 2015 +%%CreationDate: Tue Jun 14 17:58:31 2016 %%DocumentFonts: (atend) %%BoundingBox: 50 50 554 410 %%Orientation: Portrait @@ -506,7 +506,7 @@ SDict begin [ /Author (afanfakh) % /Producer (gnuplot) % /Keywords () - /CreationDate (Thu Oct 22 12:44:35 2015) + /CreationDate (Tue Jun 14 17:58:31 2016) /DOCINFO pdfmark end } ifelse @@ -745,7 +745,7 @@ Z stroke LCb setrgbcolor 88 1910 M currentpoint gsave translate -270 rotate 0 0 moveto -[ [(Helvetica) 180.0 0.0 true true 0 (Performance degradation)] +[ [(Helvetica) 180.0 0.0 true true 0 (Performance degradation %)] ] -60.0 MCshow grestore LTb diff --git a/upmext-spimufcphdthesis.cfg b/upmext-spimufcphdthesis.cfg index 5680d4c..81116ab 100644 --- a/upmext-spimufcphdthesis.cfg +++ b/upmext-spimufcphdthesis.cfg @@ -104,10 +104,10 @@ \begin{center}% -{\huge \textbf{Energy Consumption Optimization of Parallel Iterative Applications using -CPU \\ Frequency Scaling}} \\[.5cm] +{\huge \textbf{Energy Consumption Optimization of Parallel Applications with iterations using \\ +CPU Frequency Scaling}} \\[.5cm] {\large By}\\[.5cm]% -{\huge Ahmed Badri Muslim Fanfakh}\\[.5cm]% +{\huge Ahmed Badri Muslim FANFAKH}\\[.5cm]% \end{center} \begin{center}%