\chapter*{List of Abbreviations}
\markboth{List of Abbreviations}{List of Abbreviations}
\label{chap}
\addcontentsline{toc}{chapter}{List of Abbreviations}
\item[GPU] \textbf{G}raphics \textbf{P}rocessing \textbf{U}nit
\item[HSA] \textbf{H}eterogeneous \textbf{S}caling \textbf{A}lgorithm
\item[ILP] \textbf{I}nstruction \textbf{L}evel \textbf{P}arallelism
\item[LAN] \textbf{L}ocal \textbf{A}rea \textbf{N}etwork
\item[LLP] \textbf{L}oop \textbf{L}evel \textbf{P}arallelism
\item[LU] \textbf{L}ower-\textbf{U}pper
\item[MaxDist] \textbf{Max}imum \textbf{Dist}ance
\item[MG] \textbf{M}ulti-\textbf{G}rid
\item[MIMD] \textbf{M}ultiple \textbf{I}nstruction and \textbf{M}ultiple \textbf{D}ata
\item[MS] \textbf{M}ulti-\textbf{S}plitting
\item[NAS] \textbf{N}umerical \textbf{A}erodynamic \textbf{S}imulation
\item[NASA] \textbf{N}ational \textbf{A}eronautics and \textbf{S}pace \textbf{A}dministration
\item[OPENCL] \textbf{Open} \textbf{C}omputing \textbf{L}anguage
\item[OPENMP] \textbf{Open} \textbf{M}ulti-\textbf{P}rocessing
\item[RENATER] \textbf{Ré}seau \textbf{Na}tional de \textbf{T}élécommunications pour la \textbf{T}echnologie, \newline\hspace*{3em}
l'\textbf{E}nseignement et la \textbf{R}echerche
\item[SIAC] \textbf{S}ynchronous \textbf{I}terations and \textbf{A}synchronous \textbf{C}ommunications
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\emph{ \begin{center} \Large Energy Consumption Optimization of Parallel Applications with Iterations using CPU Frequency Scaling\end{center}}
%\emph{ \begin{center} \large By \end{center}}
\emph{ \begin{center} \large Ahmed Badri Muslim FANFAKH \\ University of Franche-Comt\'e, 2016 \end{center}}
%\emph{ \begin{center} \large The University of Franche-Comt\'e, 2015 \end{center}}
\emph{ \begin{center} \large Supervisors: Raphaël Couturier and Jean-Claude Charr \end{center}}
In recent years, green computing has become an important topic in the supercomputing
research domain. However, the computing platforms are still consuming more and more energy due to the increase in the number of nodes composing them.
To minimize the operating costs of these platforms many techniques have
been used. Dynamic voltage and frequency scaling (DVFS) is one of them. It
can be used to reduce the power consumption of the CPU while computing, by
lowering its frequency. However, lowering the frequency of a CPU may increase
the execution time of the application running on that processor. Therefore, the
frequency that gives the best trade-off between the energy consumption and the
performance of an application must be selected.
This thesis presents the algorithms developed to optimize the energy consumption
and the performance of synchronous and asynchronous message passing applications with iterations running over clusters or grids. The energy consumption and performance models for each type of parallel application predict its execution time and energy consumption for any selected frequency,
according to the characteristics of both the application and the architecture executing this application.
The contribution of this thesis can be divided into three parts: Firstly, optimizing the trade-off between
the energy consumption and the performance of the message passing applications with synchronous iterations
running over homogeneous clusters. Secondly, adapting the energy and performance models to heterogeneous platforms where each node can have different specifications such as computing power, energy consumption, available frequency gears or network latency and bandwidth. The frequency scaling algorithm was also modified to suit the heterogeneity of the platform. Thirdly, the models and the frequency scaling algorithm were completely rethought to take into consideration the asynchronism in the communications and computations.
All these models and algorithms were applied to message passing applications with iterations and evaluated
over either the SimGrid simulator or the Grid'5000 platform. The experiments showed that the proposed algorithms are
efficient and outperform existing methods such as the energy and delay product. They also introduce a small
runtime overhead and work online without any training or profiling.

\textbf{KEY WORDS:} Dynamic voltage and frequency scaling, grid computing, energy optimization, parallel applications with iterations, online frequency scaling algorithm.
\section{Iterative Methods}
\label{ch1:3}
In this work, we are interested in solving systems of linear equations, which are very common in the scientific field. A system of linear equations can be expressed as follows:
\begin{equation}
\label{eq:linear}
A \cdot x = b
\end{equation}
Where $A$ is a square matrix of size $N \times N$, $x$ is the unknown vector,
and $b$ is a vector of constants, each of size $N$. There are two types of solution methods to solve this linear system: the \textbf{direct} and the \textbf{iterative methods}.
A direct method executes a finite number of steps, depending on the
size of the linear system, and gives the exact solution of the system. If the problem is very big, this method is expensive or its
solution is impossible in some cases. On the other hand, methods with iterations execute the same block of instructions many times. The number of iterations can be predefined or the application iterates until a criterion is satisfied. Iterative methods are methods with iterations that start from an initial guess and
successively improve the solution until reaching an acceptable approximation of the exact solution.
These methods are well adapted for large systems and can be easily parallelized.
A sequential iterative algorithm is typically organized as a series of steps essentially of the form:
\begin{equation}
\label{eq:iter}
X_{i}^{(k+1)} = F_{i}\left(X_{1}^{(k)}, \dots, X_{N}^{(k)}\right), \quad i = 1, \dots, N
\end{equation}
Where $N$ is the size of the vector $X$. The iterative sequential algorithm stops iterating if the maximum error between the last two successive solution vectors, as in Equation~\ref{eq:res}, is less than or equal to a threshold value. Otherwise, it replaces the old vector $X^{(k)}$ with the new vector $X^{(k+1)}$ and computes a new iteration.
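As an illustration of such a scheme, a Jacobi-style iteration for $A \cdot x = b$ with the stopping criterion described above can be sketched in Python as follows (the function name, initial guess and default threshold are assumptions, not the thesis's code):

```python
# Illustrative Jacobi iteration for A.x = b (assumed sketch, not the thesis's code).
def jacobi(A, b, threshold=1e-10, max_iters=10000):
    n = len(b)
    x = [0.0] * n                                   # initial guess X^(0)
    for k in range(max_iters):
        # X^(k+1)_i = (b_i - sum_{j != i} A_ij * X^(k)_j) / A_ii
        x_new = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        # stop when the maximum error between two successive solutions is small enough
        if max(abs(x_new[i] - x[i]) for i in range(n)) <= threshold:
            return x_new, k + 1
        x = x_new                                   # the new vector becomes the old one
    return x, max_iters
```

For a diagonally dominant system such as $A = \left(\begin{smallmatrix}4 & 1\\ 1 & 3\end{smallmatrix}\right)$ and $b = (1, 2)^T$, this iteration converges to the exact solution $(1/11, 7/11)^T$.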
\subsection{Synchronous Parallel Iterative method}
\label{ch1:3:1}
The sequential iterative algorithm~\ref{sia} can be parallelized by executing it on many computing units. To execute this algorithm on $M$ computing units, the elements of the problem vector $X$ must first be subdivided into $M$ sub-vectors, $X^k=(X_1^k,\dots,X_M^k)$.
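The subdivision of the problem vector $X$ into $M$ sub-vectors can be sketched as a block decomposition; the helper below is hypothetical and only illustrates the idea:

```python
# Hypothetical block decomposition of a problem vector over M computing units.
def partition(X, M):
    n = len(X)
    base, extra = divmod(n, M)          # spread the remainder over the first units
    chunks, start = [], 0
    for rank in range(M):
        size = base + (1 if rank < extra else 0)
        chunks.append(X[start:start + size])
        start += size
    return chunks                       # chunks[k] plays the role of X_k
```

Each computing unit $k$ then iterates on its own sub-vector $X_k$ and exchanges the updated values with the other units.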
performance of MPI programs can be predicted. Section~\ref{ch2:4} presents
the energy-performance objective function that maximizes the reduction of energy
consumption while minimizing the degradation of the program's performance.
Section~\ref{ch2:5} details the algorithm that returns the scaling factor that gives the best energy-performance
trade-off for a parallel application with iterations.
Section~\ref{ch2:6} verifies the accuracy of the performance prediction model
and presents the results of the proposed algorithm. It also shows the
comparison results between our method and other existing methods.
\section{Performance evaluation of MPI programs}
\label{ch2:3}
The execution time of a parallel synchronous application with iterations is equal
to the execution time of its slowest task as in figure~(\ref{fig:homo}).
If there is no communication in the application and it is not data bounded, the execution time of this
parallel application is linearly proportional to the operational frequency. Any
\end{tabular}
\label{table:platform-homo}
\end{table}
\subsection{Performance prediction verification}
\label{ch2:6:1}
In this section, the precision of the proposed performance prediction method
In our cluster there are 18 available frequency states for each processor. This
leads to 18 run states for each program. Seven MPI programs of the NAS
parallel benchmarks were used: CG, MG, EP, FT, BT, LU and SP. Table \ref{table:NAS-dec} shows
the description of these seven benchmarks. Some of these benchmarks are MPI parallel applications with synchronous iterations, i.e. iterative applications that repeat the same block of instructions until convergence.
However, the proposed method can be applied to any application that executes the same block
of instructions many times and is not limited to iterative methods.
Figure~(\ref{fig:pred}) presents plots of the real execution times compared to the simulated ones. The maximum
normalized error between these two execution times varies between 0.0073 and
0.031 depending on the executed benchmark. The smallest prediction error
\begin{tabular}{|l|l|l|}
\hline
Benchmark & Full Name & Description \\ \hline
CG & Conjugate Gradient & \begin{tabular}[c]{@{}l@{}}It solves a system of linear equations by estimating\\ the smallest eigenvalue of a large sparse matrix\end{tabular} \\ \hline
MG & MultiGrid & \begin{tabular}[c]{@{}l@{}}It uses the multigrid method to approximate the solution\\ of a three-dimensional discrete Poisson equation\end{tabular} \\ \hline
EP & Embarrassingly Parallel & \begin{tabular}[c]{@{}l@{}}It applies the Marsaglia polar method to randomly\\ generate independent Gaussian variates\end{tabular} \\ \hline
FT & Fast Fourier Transform & \begin{tabular}[c]{@{}l@{}}It uses the fast Fourier transform to solve a\\ three-dimensional partial differential equation\end{tabular} \\ \hline
BT & Block Tridiagonal & \multirow{3}{*}{\begin{tabular}[c]{@{}l@{}}They solve nonlinear partial differential equations\end{tabular}} \\ \cline{1-2}
LU & \begin{tabular}[c]{@{}l@{}}Lower-Upper symmetric\\ Gauss-Seidel\end{tabular} & \\ \cline{1-2}
SP & Scalar Pentadiagonal & \\ \hline
\end{tabular}
\end{table}
\end{equation}
According to this model, the frequency scaling factor $S$ reduces the energy consumption of the homogeneous architecture by a factor of $S^{-2}$ and increases the execution time by a factor of $S$.
This model can be used to predict the energy consumption of message passing applications with synchronous iterations after gathering the computation and communication times of the first iteration.
Furthermore, it can be used to measure the energy consumption of the parallel application with iterations by multiplying the energy consumed by all tasks in one iteration by the number of iterations.
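This prediction can be sketched as follows in Python; the variable names mirror the text ($\Pd$, $\Ps$, computation and communication times of one iteration), but the simplified formulas are an assumed reading of the model, not its exact equations:

```python
# Assumed simplified sketch of the homogeneous model described above:
# computations slow down by S, dynamic energy is reduced by S^-2,
# static energy grows with the execution time.
def predict_one_iteration(S, Pd, Ps, Tcp, Tcm):
    time = Tcp * S + Tcm                 # communications are not slowed down
    e_dyn = Pd * (S ** -2) * Tcp         # dynamic energy
    e_static = Ps * time                 # static energy
    return time, e_dyn + e_static

def predict_application(S, Pd, Ps, Tcp, Tcm, iterations):
    # one iteration multiplied by the number of iterations
    t1, e1 = predict_one_iteration(S, Pd, Ps, Tcp, Tcm)
    return t1 * iterations, e1 * iterations
```

Evaluating these functions for a range of candidate scaling factors $S$ is what allows an algorithm to compare the predicted energy/time trade-offs before applying any DVFS operation.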
This model is used by the algorithm \ref{EPSA} to predict the energy consumption and to select the optimal frequency scaling factor. The new frequency $F_i$ can be computed as in \ref{eq:fi} while using the new selected frequency scaling factor.
In the next section, algorithm \ref{EPSA} is re-evaluated while using this new energy model and the new results are presented.
method while being executed on the SimGrid simulator. The results showed that
our method outperforms the Rauber and Rünger method in terms of energy-performance
ratio. Finally, this chapter presents a new energy consumption model for parallel
applications with synchronous iterations running on homogeneous clusters. This model takes into consideration
both the computation and communication times and their relation with the frequency scaling
factor. The results obtained using the new energy model have shown that different frequency scaling factors
were selected which gave new experimental results that are more accurate and realistic.
clusters (heterogeneous CPUs) and grid platforms are presented.
They select the frequencies that try to give the best
trade-off between energy saving and performance degradation, for each node
 computing the synchronous message passing application with iterations. These algorithms have a small
overhead and work without training or profiling. They use new energy models
 for message passing synchronous applications with iterations running on both the heterogeneous
local cluster and the grid platform. The first proposed algorithm for a heterogeneous local
cluster was evaluated on the SimGrid simulator while running the class C of the NAS parallel
benchmarks. The experiments conducted over 8 heterogeneous nodes show that it reduces on
average the energy consumption by 29.8\% while limiting the performance degradation to 3.8\%.
The second proposed algorithm for a grid platform was evaluated on the Grid'5000 testbed
platform while running the class D of the NAS parallel benchmarks.
 The experiments were run on 16 nodes, distributed over three clusters, and show that the algorithm reduces
on average the energy consumption by 30\% while the performance is on average only degraded
by 3.2\%.
Finally, both algorithms were compared to the EDP method. The comparison
selecting algorithm of synchronous message passing programs running over a grid platform.
Section~\ref{ch3:4} presents the results of applying the algorithm on the
NAS parallel benchmarks (class D) and executing them on the Grid'5000 testbed.
The algorithm is also evaluated over multi-core architectures and over three different power scenarios. Moreover, Section~\ref{ch3:4} shows the comparison results between the proposed method and the EDP method.
Finally, in Section~\ref{ch3:concl} the chapter ends with a summary.
\section{Related works}
\label{ch3:relwork}
The process of selecting the appropriate frequency for a
processor to satisfy some objectives, while taking into account all the
constraints, is not a trivial operation. Many researchers used different
platform, synchronous or asynchronous application, \dots{}
In this chapter, we are interested in reducing the energy consumption when running a message passing
synchronous applications with iterations over a heterogeneous platform. Some
works have already been done for such platforms which can be classified into
two types of heterogeneous platforms:
\begin{itemize}
overhead. In contrast to the above described works, the work of this chapter presents the
following contributions:
\begin{enumerate}
\item two new energy and two performance models for message passing
  synchronous applications with iterations running over a heterogeneous local cluster and a grid platform.
All the models take into account the communications and the slack times. The models can predict the
energy consumption and the execution time of the application.
local cluster and a grid platform. The algorithms have a very small overhead and do not need any
training or profiling. They use a new optimization function which
simultaneously maximizes the performance and minimizes the energy consumption
  of a message passing synchronous application with iterations.
\end{enumerate}
\section[The energy optimization of a heterogeneous cluster]{The energy optimization of parallel applications with iterations running over local heterogeneous
clusters}
\label{ch3:1}
\subsection{The execution time of message passing distributed
  applications with iterations on a heterogeneous local cluster}
\label{ch3:1:1}
In this section, we are interested in reducing the energy consumption of message
passing distributed synchronous applications with iterations running over heterogeneous local clusters.
In this work, a heterogeneous local cluster is defined as a collection of
heterogeneous computing nodes interconnected via a high speed homogeneous
network. Therefore, the nodes may have different characteristics such as computing
\label{fig:task-heter}
\end{figure}
The overall execution time of a distributed synchronous application with iterations
over a heterogeneous local cluster consists of the sum of the computation time and
the communication time for every iteration on a node. However, due to the
heterogeneous computation power of the computing nodes, slack times may occur
especially different frequency gears, when applying DVFS operations on these
nodes, they may get different scaling factors represented by a scaling vector:
$(S_1, S_2,\dots, S_N)$ where $S_i$ is the scaling factor of processor $i$. To
be able to predict the execution time of message passing synchronous
applications with iterations running over a heterogeneous local cluster, for different vectors of
scaling factors, the communication time and the computation time for all the
tasks must be measured during the first iteration before applying any DVFS
operation. Then the execution time for one iteration of the application with any
iteration. The model computes the maximum computation time with
scaling factor from each node added to the communication time of the slowest
node. It means only the communication time without any slack time is taken into
account. Therefore, the execution time of the application with iterations is equal to
the execution time of one iteration as in (\ref{eq:perf_heter}) multiplied by the
number of iterations of that application.
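A minimal sketch of this prediction, assuming the first-iteration measurements are stored per node (a simplified reading of the model, with hypothetical names):

```python
# Assumed sketch: execution time of an application with iterations over a
# heterogeneous local cluster, predicted from first-iteration measurements.
def predict_time_heter(Tcp, Tcm, S, iterations):
    # Tcp[i], Tcm[i]: computation/communication time of node i in the first iteration
    # S[i]: frequency scaling factor selected for node i
    slowest = max(range(len(Tcp)), key=lambda i: Tcp[i] * S[i])
    t_iter = Tcp[slowest] * S[slowest] + Tcm[slowest]   # no slack time counted
    return t_iter * iterations
```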
This prediction model is improved from the model that predicts the execution time
of message passing distributed applications for homogeneous
architectures presented in Chapter~\ref{ch2}, Section~\ref{ch2:3}. The execution time prediction model is
used in the method that optimizes both the energy consumption and the performance
of parallel applications with iterations, which is presented in the following sections.
\subsection{Energy model for heterogeneous local cluster}
\label{ch3:1:2}
In Chapter~\ref{ch2}, the dynamic and the static energy consumption of a
processor is computed according to Equations \ref{eq:Edyn_new} and \ref{eq:Estatic_new} respectively. Then, the total energy consumption of a processor is the sum of these two metrics.
Therefore, the overall energy consumption for the parallel tasks over a parallel cluster
is the summation of the energies consumed by all the processors.
In the considered heterogeneous platform, each processor $i$ may have
different dynamic and static powers, noted as $\Pd[i]$ and $\Ps[i]$
respectively. Therefore, even if the distributed message passing
application with iterations is load balanced, the computation time of each CPU $i$, noted
$\Tcp[i]$ may be different and different frequency scaling factors may be
computed in order to decrease the overall energy consumption of the application
and reduce the slack times. The communication time of a processor $i$ is noted as
(\ref{eq:Edyn_new}), the static energy is computed as the sum of the execution time
of one iteration as in \ref{eq:perf_heter} multiplied by the static power of each processor.
The overall energy consumption of a message passing distributed application executed over a
heterogeneous cluster during one iteration is the summation of the dynamic and
static energies for all the processors. It is computed as follows:
\begin{equation}
\label{eq:energy-heter}
factors $(S_1, S_2,\dots, S_N)$ may degrade the performance of the application
and thus, increase the consumed static energy because the execution time is
increased~\cite{ref78}. The overall energy consumption
for an application with iterations can be computed by measuring the energy
consumption of one iteration as in (\ref{eq:energy-heter}) and multiplying it by the number
of iterations of that application.
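This computation can be sketched as follows, with per-processor dynamic and static powers; again a simplified, assumed reading of the model rather than its exact equation:

```python
# Assumed simplified sketch of the heterogeneous energy model: dynamic energy
# scaled by S_i^-2 plus static power times the iteration time, summed over all
# processors and multiplied by the number of iterations.
def predict_energy_heter(Pd, Ps, Tcp, S, t_iter, iterations):
    # Pd[i], Ps[i]: dynamic/static power of processor i
    # Tcp[i]: computation time of processor i, S[i]: its scaling factor
    # t_iter: predicted execution time of one iteration
    e_dyn = sum(Pd[i] * (S[i] ** -2) * Tcp[i] for i in range(len(Pd)))
    e_static = sum(p * t_iter for p in Ps)
    return (e_dyn + e_static) * iterations
```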
appropriate frequency scaling factor for each processor while considering the
characteristics of each processor (computation power, range of frequencies,
dynamic and static powers) and the task it is executing (computation/communication
ratio). In Chapter~\ref{ch2}, we proposed a method that selects the optimal
frequency scaling factor for a homogeneous cluster executing a message passing
synchronous application with iterations while giving the best trade-off between the
energy consumption and the performance for such applications.
In this section, this optimization method is improved while considering heterogeneous clusters.
As described before, the relation between the energy consumption and the execution time for an
application is complex and nonlinear. Thus, to find the trade-off relation between the energy consumption computed in Equation~\ref{eq:energy-heter} and the performance with Equation~\ref{eq:perf_heter} for the message passing applications with iterations, first we need to normalize both terms as follows:
\begin{equation}
\begin{figure}[!t]
\centering
\includegraphics[width=.7\textwidth]{fig/ch3/heter}
  \caption{The energy and performance relation in a heterogeneous cluster}
\label{fig:rel-heter}
\end{figure}
In this section, Algorithm~\ref{HSA} is presented. It selects the frequency
scaling factors vector that gives the best trade-off between minimizing the
energy consumption and maximizing the performance of a message passing
synchronous application with iterations executed on a heterogeneous local cluster. It works
online during the execution time of the message passing program with iterations. It
uses information gathered during the first iteration such as the computation
time and the communication time in one iteration for each node. The algorithm is
executed after the first iteration and returns a vector of optimal frequency
program applies DVFS operations to change the frequencies of the CPUs according
to the computed scaling factors. This algorithm is called just once during the
execution of the program. Algorithm~\ref{dvfs-heter} shows where and when the proposed
scaling algorithm is called in the MPI program with iterations.
\begin{figure}[!t]
\centering
144 nodes and had nodes from the four types in equal proportions, for example if
a benchmark was executed on 8 nodes, 2 nodes from each type were used. Since the
constructors of CPUs do not specify the dynamic and the static power of their
CPUs, for each type of node they were chosen proportionally to their computing
powers (FLOPS). The dynamic power corresponds to 80\% of the overall power consumption while executing at
the higher frequency and the
remaining 20\% is the static power. The same assumption was made in Chapter~\ref{ch2} and
\cite{ref3}. Finally, these nodes were connected via an Ethernet network with 1 \textit{Gbit/s} bandwidth.
frequencies, thus, scaling down the frequency of the nodes results in higher
energy savings than in the 70\%-30\% scenario. On the other hand,
the performance degradation percentage is smaller in the 70\%-30\%
scenario compared to the 90\%-10\% scenario. This is due to the
higher static power percentage in the first scenario which makes it more
relevant in the overall consumed energy. Indeed, the static energy is related
to the execution time and if the performance is degraded the amount of consumed
decreases when the 70\%-30\% scenario is used because the dynamic
energy is less relevant in the overall consumed energy and lowering the
frequency does not return big energy savings. Moreover, the average of the
performance degradation is decreased when using a higher ratio for the static power
(e.g. 70\%-30\% scenario and 80\%-20\% scenario). Since the proposed
algorithm optimizes the energy consumption when
using a higher ratio for the dynamic power, the algorithm selects bigger frequency
scaling factors that result in more energy saving but degrade the performance, for
example see Figure~\ref{fig:powers-heter} (b). The opposite happens when using a
higher ratio for the static power, the algorithm proportionally selects smaller
scaling values which result in less energy saving but also less performance
degradation.
\begin{table}[!t]
because it maximizes the distance between the energy saving and the performance
degradation values while giving the same weight for both metrics.
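This equal weighting can be sketched as a maximum-distance objective. The display below is only an illustrative formulation, with $P_{\mathit{norm}}$ and $E_{\mathit{norm}}$ standing for the normalized performance and energy of a candidate scaling vector $S_k$; the authoritative definition is the optimization function used by the algorithm itself:

```latex
% Sketch of the trade-off objective: among the F candidate scaling
% vectors, select the one that maximizes the distance between the
% normalized performance and the normalized energy (equal weights).
\begin{equation*}
  \mathit{MaxDist} = \max_{k = 1,\dots,F}
    \big( P_{\mathit{norm}}(S_k) - E_{\mathit{norm}}(S_k) \big)
\end{equation*}
```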
-\section[The energy optimization of grid]{The energy optimization of parallel iterative applications running over grids}
+\section[The energy optimization of grid]{The energy optimization of parallel applications with iterations running over grids}
\label{ch3:3}
\subsection{The energy and performance models of grid platform}
\label{ch3:3:1}
In this section, we are interested in reducing the energy consumption of message
-passing iterative synchronous applications running over
+passing applications with synchronous iterations running over
heterogeneous grid platforms. A heterogeneous grid platform could be defined as a collection of
heterogeneous computing clusters interconnected via a long distance network which has a lower bandwidth
and a higher latency than the local networks of the clusters. Each computing cluster in the grid is composed of homogeneous nodes that are connected together via a high speed network. However, nodes from distinct clusters may have different characteristics such as computing power (FLOPS), energy consumption, CPU's frequency range, network bandwidth and latency.
Since in a heterogeneous grid each cluster has different characteristics,
when applying DVFS operations on the nodes
of these clusters, they may get different scaling factors represented by a scaling vector:
-$(S_{11}, S_{12},\dots, S_{NM})$ where $S_{ij}$ is the scaling factor of processor $j$ in cluster $i$ . To
-be able to predict the execution time of message passing synchronous iterative
-applications running over a heterogeneous grid, for different vectors of
+$(S_{11}, S_{12},\dots, S_{NM})$ where $S_{ij}$ is the scaling factor of processor $j$ in cluster $i$. To
+be able to predict the execution time of message passing
+applications with synchronous iterations running over a heterogeneous grid, for different vectors of
scaling factors, the communication time and the computation time for all the
tasks must be measured during the first iteration before applying any DVFS
operation. Then the execution time for one iteration of the application with any
and the slowest communication time without slack time during one iteration.
The latter is equal to the communication time of the slowest node in the slowest cluster $h$.
It means that only the communication time without any slack time is taken into account.
-Therefore, the execution time of the iterative application is equal to
+Therefore, the execution time of the parallel application with iterations is equal to
the execution time of one iteration as in Equation (\ref{eq:perf-grid}) multiplied by the
number of iterations of that application.
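The prediction described above can be summarized with the following sketch; it reuses the document's notation ($\Tcp[ij]$ for the computation time and $S_{ij}$ for the scaling factor of processor $j$ in cluster $i$) but is only an approximation of the exact model given in Equation (\ref{eq:perf-grid}):

```latex
% Illustrative sketch: one iteration lasts as long as the slowest
% scaled computation plus the slowest communication without slack
% time, and the total time grows linearly with the iteration count.
\begin{equation*}
  T_{\mathit{new}} \approx \max_{i,j} \big( \Tcp[ij] \cdot S_{ij} \big)
    + \mathit{Tcm}_{\mathit{slowest}},
  \qquad
  T_{\mathit{total}} = T_{\mathit{new}} \times \mathit{NbIter}
\end{equation*}
```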
In the considered heterogeneous grid platform, each node $j$ in cluster $i$ may have
different dynamic and static powers from the nodes of the other clusters,
noted as $\Pd[ij]$ and $\Ps[ij]$ respectively. Therefore, even if the distributed
-message passing iterative application is load balanced, the computation time of each CPU $j$
+message passing application with iterations is load balanced, the computation time of each CPU $j$
in cluster $i$ noted $\Tcp[ij]$ may be different and different frequency scaling factors may be
computed in order to decrease the overall energy consumption of the application
and reduce slack times. The communication time of a processor $j$ in cluster $i$ is noted as
To optimize both the energy consumption computed by Equation \ref{eq:energy-grid} and the performance computed by Equation \ref{eq:perf-grid},
-they must be normalized as in \ref{eq:enorm-heter} and \ref{eq:pnorm-heter} Equations respectively.
+they must be normalized as in Equation \ref{eq:enorm-heter} and Equation \ref{eq:pnorm-heter} respectively.
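As a sketch of what such a normalization may look like (the authoritative definitions remain Equations \ref{eq:enorm-heter} and \ref{eq:pnorm-heter}), both metrics can be expressed as ratios with respect to the execution at maximum frequency, so that they fall in comparable ranges:

```latex
% Illustrative normalization: the consumed energy is divided by the
% energy consumed at maximum frequency, and the performance is the
% ratio of the old execution time to the new (scaled) one.
\begin{equation*}
  E_{\mathit{norm}} = \frac{E_{\mathit{reduced}}}{E_{\mathit{original}}},
  \qquad
  P_{\mathit{norm}} = \frac{T_{\mathit{old}}}{T_{\mathit{new}}}
\end{equation*}
```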
The original energy consumption is the energy consumed while running with the
maximum frequency on all the nodes, computed as follows:
\begin{figure}[!t]
\centering
\includegraphics[scale=0.7]{fig/ch3/init_freq}
- \caption{Selecting the initial frequencies in grid}
+ \caption{Selecting the initial frequencies in the grid architecture}
\label{fig:st_freq-grid}
\end{figure}
is presented. It selects the vector of frequency
scaling factors that gives the best trade-off between minimizing the
energy consumption and maximizing the performance of a message passing
-synchronous iterative application executed on a grid.
+ application with synchronous iterations executed on a grid.
It is similar to the frequency selection algorithm for heterogeneous
-local clusters presented in section \ref{ch3:1:4}.
+local clusters presented in Section \ref{ch3:1:4}.
The value of the initial frequency scaling factor for each node is inversely proportional to its
computation time that was gathered in the first iteration. The initial
Therefore, the dynamic power of one core is computed as the difference between the maximum
measured value in the maximum powers vector and the minimum measured value in the idle powers vector.
-On the other hand, the static power consumption by one core is a part of the measured idle power consumption of the node. Since in Grid'5000 there is no way to measure precisely the consumed static power and it was assumed, as in sections \ref{ch3:2} and \ref{ch2:6}, that the static power represents a ratio of the dynamic power, the value of the static power is assumed to be equal to 20\% of the dynamic power consumption of the core.
+On the other hand, the static power consumption by one core is a part of the measured idle power consumption of the node. Since in Grid'5000 there is no way to measure precisely the consumed static power and it was assumed, as in Sections \ref{ch3:2} and \ref{ch2:6}, that the static power represents a ratio of the dynamic power, the value of the static power is assumed to be equal to 20\% of the dynamic power consumption of the core.
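To make the procedure concrete, the two-step estimation can be sketched in a few lines of Python. The helper name and the wattage values below are hypothetical; only the procedure itself, dynamic power as the max-minus-idle difference and static power as 20\% of the dynamic power, comes from the text:

```python
# Estimate the per-core dynamic and static power from node-level
# measurements, following the two-step procedure described above.
# The helper name and the wattage values are illustrative only.

def estimate_powers(max_power_watts, idle_power_watts, static_ratio=0.2):
    """Dynamic power = maximum measured power - minimum idle power.
    Static power is assumed to be a fixed ratio of the dynamic power."""
    dynamic = max_power_watts - idle_power_watts
    static = static_ratio * dynamic
    return dynamic, static

# Example: a node drawing 140 W at full load and 60 W when idle.
pd, ps = estimate_powers(140.0, 60.0)
print(pd, ps)  # 80.0 16.0
```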
In the experiments presented in the following sections, two sites of Grid'5000 were used: the Lyon and the Nancy sites. These two sites have in total seven different clusters, as shown in Figure~\ref{fig:grid5000}.
\end{tabular}
\label{table:grid5000-1}
\end{table}
-CPUs
\subsection{The experimental results of the scaling algorithm on a grid}
is exponentially related to the CPU's frequency value. On the other hand, the increase in the number of computing nodes can
increase the communication times and thus produce less energy saving depending on the
benchmarks being executed. The results of the benchmarks CG, MG, BT and FT show more
-energy saving percentage in the one site scenario when executed over 16 nodes than on 32 nodes. LU and SP consume more energy with 16 nodes than with 32 node on one site because their computations to communications ratio is not affected by the increase of the number of local communications.
+energy saving percentage in the one site scenario when executed over 16 nodes than on 32 nodes. LU and SP consume more energy with 16 nodes than with 32 nodes on one site because their computations to communications ratio is not affected by the increase of the number of local communications.
\begin{figure}[!h]
\centering
\centering
\begin{figure*}[!h]
\centering
\includegraphics[width=.7\textwidth]{fig/ch3/eng_s.eps}
-\caption{The energy reduction while executing the NAS benchmarks over different scenarios}
+\caption{The energy reduction percentages while executing the NAS benchmarks over different scenarios}
\label{fig:eng_s}
\end{figure*}
\begin{figure*}[!h]
\centering
\includegraphics[width=.7\textwidth]{fig/ch3/per_d.eps}
-\caption{The performance degradation of the NAS benchmarks over different scenarios}
+\caption{The performance degradation percentages of the NAS benchmarks over different scenarios}
\label{fig:per_d}
\end{figure*}
\begin{figure*}[!h]
\centering
\includegraphics[width=.7\textwidth]{fig/ch3/dist.eps}
-\caption{The trade-off distance between the energy reduction and the performance of the NAS benchmarks
+\caption{The trade-off distance percentages between the energy reduction and the performance of the NAS benchmarks
over different scenarios}
\label{fig:dist-grid}
\end{figure*}
\begin{figure}[!h]
\centering
\includegraphics[width=.7\textwidth]{fig/ch3/time.eps}
- \caption{The execution times of NAS benchmarks running over the one core and the multi-core scenarios}
+ \caption{The execution times of the NAS benchmarks running over the one core and the multi-core scenarios}
\label{fig:time-mc}
\end{figure}
\begin{figure}[!h]
\centering
\includegraphics[width=.7\textwidth]{fig/ch3/eng_con.eps}
- \caption{The energy consumptions and execution times of NAS benchmarks over one core and multi-core per node architectures}
+ \caption{The energy consumptions and execution times of the NAS benchmarks over one core and multi-core per node architectures}
\label{fig:eng-cons-mc}
\end{figure}
\begin{figure*}[!h]
\centering
\includegraphics[width=.7\textwidth]{fig/ch3/eng_s_mc.eps}
- \caption{The energy saving of running NAS benchmarks over one core and multi-core scenarios}
+ \caption{The energy saving percentages of running NAS benchmarks over one core and multi-core scenarios}
\label{fig:eng-s-mc}
\end{figure*}
\begin{figure*}[!h]
\centering
\includegraphics[width=.7\textwidth]{fig/ch3/per_d_mc.eps}
- \caption{The performance degradation of running NAS benchmarks over one core and multi-core scenarios}
+ \caption{The performance degradation percentages of running NAS benchmarks over one core and multi-core scenarios}
\label{fig:per-d-mc}
\end{figure*}
\begin{figure*}[!h]
\centering
\includegraphics[width=.7\textwidth]{fig/ch3/dist_mc.eps}
- \caption{The trade-off distance of running NAS benchmarks over one core and multi-core scenarios}
+ \caption{The trade-off distance percentages of running NAS benchmarks over one core and multi-core scenarios}
\label{fig:dist-mc}
\end{figure*}
The energy saving percentages of all the NAS benchmarks running over these two scenarios are presented in Figure~\ref{fig:eng-s-mc}.
\begin{figure}[!h]
\centering
\includegraphics[width=.7\textwidth]{fig/ch3/dist_pow.eps}
- \caption{The trade-off distance between the energy reduction and the performance of the NAS benchmarks over the three power scenarios}
+ \caption{The trade-off distance percentages between the energy reduction and the performance of the NAS benchmarks over the three power scenarios}
\label{fig:dist-pow}
\end{figure}
\begin{figure*}[!h]
\centering
\includegraphics[width=.7\textwidth]{fig/ch3/edp_eng}
-\caption{The energy reduction induced by the Maxdist method and the EDP method}
+\caption{The energy reduction percentages induced by the Maxdist method and the EDP method}
\label{fig:edp-eng}
\end{figure*}
\begin{figure*}[!h]
\centering
\includegraphics[width=.7\textwidth]{fig/ch3/edp_per}
-\caption{The performance degradation induced by the Maxdist method and the EDP method}
+\caption{The performance degradation percentages induced by the Maxdist method and the EDP method}
\label{fig:edp-perf}
\end{figure*}
\begin{figure*}[!h]
\centering
\includegraphics[width=.7\textwidth]{fig/ch3/edp_dist}
-\caption{The trade-off distance between the energy consumption reduction and the performance for the Maxdist method and the EDP method}
+\caption{The trade-off distance percentages between the energy consumption reduction and the performance for the Maxdist method and the EDP method}
\label{fig:edp-dist}
\end{figure*}
Whereas the EDP algorithm sometimes gives negative trade-off values for some benchmarks in the two sites scenarios.
These negative trade-off values mean that the performance degradation percentage is higher than the energy saving percentage.
The high positive values of the trade-off distance percentage mean that the energy saving percentage is much higher than the performance degradation percentage.
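The sign convention described above can be captured by a one-line helper (a sketch with hypothetical percentages, not measured results):

```python
# Trade-off distance: energy saving minus performance degradation,
# both in percent. A positive distance means the saving outweighs
# the slowdown; a negative one means the degradation dominates.

def tradeoff_distance(energy_saving_pct, perf_degradation_pct):
    return energy_saving_pct - perf_degradation_pct

# Hypothetical examples matching the two cases described above:
print(tradeoff_distance(30.0, 3.0))  # 27.0 -> favourable trade-off
print(tradeoff_distance(5.0, 9.0))   # -4.0 -> degradation dominates
```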
-The complexity of both algoriths, Maxdist and EDP, are of order $O(N \cdot M_i \cdot F_j)$ and
+The complexity of both algorithms, Maxdist and EDP, is of order $O(N \cdot M_i \cdot F_j)$ and
$O(N \cdot M_i \cdot F_j^2)$ respectively, where $N$ is the number of the clusters, $M_i$ is the number of nodes and $F_j$ is the
maximum number of available frequencies of node $j$. When Maxdist is applied to a benchmark that is being executed over 32 nodes distributed between the Nancy and Lyon sites, it takes on average $0.01$~ms to compute the best frequencies while the EDP method is on average ten times slower over the same architecture.
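The difference between the two stated complexities can be illustrated with simple operation counters; this is only a sketch of the growth rates quoted above, not of the algorithms themselves:

```python
# Operation-count sketch of the two search strategies: Maxdist visits
# each available frequency of each node once, while EDP's cost grows
# quadratically with the number of frequencies per node.

def maxdist_ops(n_clusters, nodes_per_cluster, freqs_per_node):
    return n_clusters * nodes_per_cluster * freqs_per_node        # O(N*M*F)

def edp_ops(n_clusters, nodes_per_cluster, freqs_per_node):
    return n_clusters * nodes_per_cluster * freqs_per_node ** 2   # O(N*M*F^2)

# With 10 frequencies per node, EDP does ten times more work, which is
# consistent with the roughly tenfold slowdown reported above.
print(edp_ops(2, 16, 10) // maxdist_ops(2, 16, 10))  # 10
```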
maximum distance (optimal trade-off) between the predicted energy and the
predicted performance curves for a heterogeneous cluster and a grid. Both algorithms use
new energy models for measuring and predicting the energy consumption of message passing
-iterative applications running over a heterogeneous local cluster and a grid platform.
-Firstly, the proposed scaling factors selection algorithm for a heterogeneous local cluster is applied to the class C of NAS parallel benchmarks and simulated by SimGrid.
+ applications with iterations running over a heterogeneous local cluster and a grid platform.
+Firstly, the proposed scaling factors selection algorithm for a heterogeneous local cluster is applied to the class C of the NAS parallel benchmarks and simulated by SimGrid.
The results of the simulations showed that the algorithm on average reduces by 29.8\% the energy
consumption of the NAS benchmarks executed over 8 nodes while limiting the degradation of the performance by 3.8\%. The algorithm also selects different scaling factors according to
the percentage of the computing and communication times, and according to the
Section~\ref{ch4:6} presents the iterative multi-splitting application which is a hybrid method and was used as a benchmark to evaluate the efficiency of the proposed algorithm.
Section~\ref{ch4:7} presents the simulation results of applying the algorithm on the multi-splitting application
and executing it on different grid scenarios. It also shows the results of running
-three different power scenarios and comparing them. Moreover, in the last subsection, the proposed algorithm is compared to the energy and delay product (EDP) method. Section \ref{ch4:8} presenting the results of real experiments executed over the Grid'5000 platform and compared to the EDP method. Finally, the chapter ends with a summary.
+three different power scenarios and comparing them. Moreover, in the last subsection, the proposed algorithm is compared to the energy and delay product (EDP) method. Section \ref{ch4:8} presents the results of real experiments executed over the Grid'5000 platform and compared to the EDP method. Finally, the chapter ends with a summary.
\label{fig:eng_time_dvfs}
\end{figure}
Figure \ref{fig:eng_time_dvfs} (a) shows that the energy
- consumption of all four versions of the method, running over the 8 grid scenarios described in Table \ref{table:comp}, are not affected by the increase in the number of computing nodes. MS without applying DVFS operations had the same behavior. On the other hand, Figure \ref{fig:eng_time_dvfs} (b) shows that the execution time of the MS application with DVFS operations
+ consumption of all four versions of the method, running over the 8 grid scenarios described in Table \ref{table:comp}, is not affected by the increase in the number of computing nodes. MS without applying DVFS operations had the same behaviour. On the other hand, Figure \ref{fig:eng_time_dvfs} (b) shows that the execution time of the MS application with DVFS operations
decreases in inverse proportion to the number of nodes. Moreover, it can be noticed that the asynchronous MS with synchronous DVFS consumes less energy when compared to the other versions of the method. Two reasons explain this energy consumption reduction:
\begin{enumerate}
- \item The asynchronous MS with synchronous DVFS version uses synchronous DVFS communications which allow it to apply the new computed frequencies at the begining of the second iteration. Thus, reducing the consumption of dynamic energy by the application from the second iteration until the end of the application. Whereas in
- asynchronous DVFS versions where the DVFS communications are asynchronous, the new frequencies cannot be computed at the end of the first iteration and consequently cannot be applied at the begining of the second iteration.
+ \item The asynchronous MS with synchronous DVFS version uses synchronous DVFS communications which allow it to apply the newly computed frequencies at the beginning of the second iteration. Thus, the consumption of dynamic energy by the application is reduced from the second iteration until the end of the application. Whereas in
+ the asynchronous DVFS versions, where the DVFS communications are asynchronous, the new frequencies cannot be computed at the end of the first iteration and consequently cannot be applied at the beginning of the second iteration.
Indeed, since the performance information gathered during the first iteration is not sent synchronously at the end of the first iteration, fast nodes might execute many iterations before receiving the performance information, computing the new frequencies based on this information and applying the new computed frequencies. Therefore, many iterations might be computed by CPUs running on their highest frequency and consuming more dynamic energy than scaled down processors.
\item As shown in Figure \ref{fig:eng_time_ms} (b), the execution time of the asynchronous MS version is lower than the execution time of the synchronous MS version because there is no idle time in the asynchronous version and the communications are overlapped by computations. Since the consumption of static energy is proportional to the execution time, the asynchronous MS version consumes less static energy than the synchronous version.
$16.9\%$. The worst case is the synchronous MS with synchronous DVFS where the performance is on average degraded by $2.9\%$ when compared to the reference method.
- The energy consumption and performance tradeoff between these five versions is presented in Figure \ref{fig:dist}.
+ The energy consumption and performance trade-off between these five versions is presented in Figure \ref{fig:dist}.
These distance values are computed as the differences between the energy saving
and the performance degradation percentages as in the optimization function
(\ref{eq:max-grid}). Thus, the best MS version is the one that has the maximum distance between the energy saving and performance degradation. The distance can be negative if the energy saving percentage is less than the performance degradation percentage.
Table \ref{table:exper} shows that there are positive and negative performance
degradation percentages. A negative value means that the new execution time of a given version of the application is less than the execution time of the synchronous MS without DVFS.
Therefore, the version with the smallest negative performance degradation percentage has actually the best speed up when compared to the other versions.
- The energy consumption and performance tradeoffs between these four versions can be computed as in the optimization Function
+ The energy consumption and performance trade-offs between these four versions can be computed as in the optimization function
(\ref{eq:max-grid}). The asynchronous MS applying synchronously the HSA algorithm gives the best distance which is equal to $48.41\%$.
This version saves up to $26.93\%$ of energy and even reduces the execution time of the application by
$21.48\%$. This overall improvement is due to combining asynchronous computing and the synchronous application of the HSA algorithm.
Finally, this section shows that the obtained results over Grid'5000 are comparable to the
-simulation results of section \ref{ch4:7:2}, the asynchronous MS applying synchronously the HSA algorithm is the best version in both of sections. Moreover, the results over Grid'5000 are better
+simulation results of Section \ref{ch4:7:2}, the asynchronous MS applying synchronously the HSA algorithm is the best version in both sections. Moreover, the results over Grid'5000 are better
than the simulation results because the computing clusters used in the Grid'5000 experiments are more heterogeneous in terms of computing power and network characteristics than the platform simulated with SimGrid. For example, the nodes of the StRemi cluster have lower computing powers than those of the three other used clusters of the Grid'5000 platform.
As a result, the increase in the heterogeneity between the clusters' computing nodes increases the idle times, which forces the proposed algorithm to select big scaling factors and thus save more energy.
\subsection{Comparing the HSA algorithm to the energy and delay product method}
\label{res-comp}
-The EDP algorithm, described in section \ref{ch4:7:5}, was applied synchronously and asynchronously to both the synchronous and asynchronous MS application of size $N=400^3$. The experiments were conducted over 4 distributed clusters, described in Table \ref{table:grid5000}, and 8 homogeneous nodes were used from each cluster.
+The EDP algorithm, described in Section \ref{ch4:7:5}, was applied synchronously and asynchronously to both the synchronous and asynchronous MS application of size $N=400^3$. The experiments were conducted over 4 distributed clusters, described in Table \ref{table:grid5000}, and 8 homogeneous nodes were used from each cluster.
Table \ref{table:comapre} presents the results of energy saving, performance degradation and distance percentages when applying the EDP method on four different MS versions.
Figure \ref{fig:compare} compares the distance percentages, computed as the difference between energy saving and performance degradation percentages, of the EDP and HSA
algorithms. This comparison shows that the proposed HSA algorithm gives better energy reduction and performance trade-off than the EDP method. EDP gives better results when evaluated over Grid'5000 than over the simulator because the nodes used from Grid'5000 are more heterogeneous than those simulated with SimGrid.
\caption{Comparing the trade-off percentages of the HSA and EDP methods over the Grid'5000 testbed}
\label{fig:compare}
\end{figure}
+
+
+
\section{Conclusions}
\label{ch4:9}
-
This chapter presents a new online frequency selection algorithm for applications with
asynchronous iterations running over a grid. It selects the best vector of frequencies that maximizes
the distance between the predicted energy consumption and the predicted execution time.
The proposed algorithm was evaluated twice over the SimGrid simulator and Grid'5000 testbed while running a multi-splitting (MS) application that solves 3D problems.
The experiments were executed over different
grid scenarios composed of different numbers of clusters and different numbers of nodes per cluster.
- The HSA algorithm was applied synchronously and asynchronously on a synchronous and an asynchronous version of the MS application. Both the simulation and real experiment results show that applying synchronous HSA algorithm on an asynchronous MS application gives the best tradeoff between energy consumption reduction and performance compared to other scenarios.
+ The HSA algorithm was applied synchronously and asynchronously on a synchronous and an asynchronous version of the MS application. Both the simulation and real experiment results show that applying the HSA algorithm synchronously on an asynchronous MS application gives the best trade-off between energy consumption reduction and performance compared to the other scenarios.
In the simulation results, this scenario reduces on average the energy consumption by 22\% and the execution time of the application by 5.72\%. This version optimizes both the dynamic energy consumption, by applying synchronously the HSA algorithm at the end of the first iteration, and the static energy consumption, by using asynchronous communications between nodes from different clusters which are overlapped by computations. The HSA algorithm was also evaluated over three power scenarios. As expected, the algorithm selects a different vector of frequencies for each power scenario. The highest energy consumption reduction was achieved in the power scenario with the highest dynamic power and the lowest performance degradation was obtained in the power scenario with the highest static power.
The proposed algorithm was compared to another method that
uses the well known energy and delay product as an objective function.
\section{Conclusion}
In this dissertation, we have proposed a method to optimize both the energy consumption and
-the performance of synchronous and asynchronous iterative applications running over
+the performance of synchronous and asynchronous applications with iterations running over
cluster and grid platforms. The dynamic voltage and frequency scaling (DVFS) technique was used to
lower the frequency of the processor in order to reduce its energy consumption while computing. Reducing
-the frequency of the processor reduces the computing power of a processor and thus the performance of the application running over that processor is decreased too. However, as the first step in this work, different energy consumption and performance models were developed to effectively used in the prediction and measurement processes for energy and performance of iterative applications. Depending on these models, an objective function was defined the best trade-off relation between the energy and the performance. Then, this objective function was used in algorithms which optimize both energy consumption and the performance of the iterative application at the same time.
+the frequency of the processor decreases its computing power which might increase the execution time.
+In this work, different energy consumption and performance models were developed to predict the energy consumption and performance of parallel applications with iterations. Depending on these models, an objective function was defined as the best trade-off relation between the energy consumption and the performance of the parallel application. This objective function was used in the frequency selecting algorithms which optimize at the same time both the energy consumption and the performance of the parallel application with iterations.
-The first part of this dissertation, chapter \ref{ch1}, presented different types of parallelism levels which have been categorized using some hardware and software techniques. Different parallel architectures have been
-demonstrated and classified according to how the processing units and the memory model connected to each others. Both shared and distributed platforms as well as their depending parallel programming models have been categorized. The two types of parallel iterative applications: synchronous and asynchronous ones have
-been discussed and compared to each others. It showed that applying the synchronous iterative methods are well adapted to local homogeneous clusters with a high speed network link, while the asynchronous iterative methods are more suited to the distributed heterogeneous clusters. At the end of this chapter, an energy consumption model proposed in the literature to measure the energy consumption of parallel applications was explained. This model does not consider the communication time of the parallel application being executed. Also, it is not well adapted to a heterogeneous architecture when there are different power consumption values for each type of processor.
+The first part of this dissertation, Chapter \ref{ch1}, presented different types of parallelism levels which have been classified according to the hardware and software techniques used. Different parallel architectures have also been
+described and classified according to the memory model used: shared or distributed memory. The two types of parallel applications with iterations, synchronous and asynchronous ones, have been discussed and compared to each other. Synchronous distributed applications are well adapted to local homogeneous clusters with a high speed network link, while the asynchronous ones are more suited to grids. At the end of this chapter, an energy consumption model proposed in the literature to estimate the energy consumption of parallel applications was explained. This model does not take into account the communication time of the parallel application being executed. Also, it is not well adapted to a heterogeneous architecture where each type of processor might have a different power consumption value.
-In the second part of the dissertation, we have developed a new energy and performance models for
-synchronous and asynchronous message passing applications running over a cluster and a grid. To simultaneously optimize the energy and performance of these applications, the trade-off relation between the two metrics have been defined as a maximum distance between the predicted energy and performance curves to select the best possible frequency scaling factors. We have proposed four different frequencies selecting algorithms to select the best frequencies that gives minimum energy reduction with maximum possible performance at the same time. They used the computation and communication times measured at the first iteration of the iterative application to predict the energy consumption and the performance of the parallel application at every available frequency. All these algorithms work online with a very small overhead on the execution time of a parallel application.
+In the second part of the dissertation, new energy and performance models for
+synchronous and asynchronous message passing applications with iterations running over clusters and grids were presented. To simultaneously optimize the energy and performance of these applications, the trade-off relation has been defined as the maximum distance between the predicted energy and performance curves. This objective function is used by the frequency selecting algorithms to select the frequency scaling factors that give the optimal energy consumption to performance trade-off. We have proposed four different frequency scaling algorithms, each one of them adapted to a different execution context, such as synchronous or asynchronous communications, homogeneous or heterogeneous nodes, and local or distributed architectures. They use the computation and communication times measured at the first iteration of the parallel application to predict its energy consumption and performance at every available frequency. All these algorithms work online and introduce a very small runtime overhead. They also do not require any profiling or training.
In Chapter \ref{ch2}, we proposed a new online scaling factor selection method that simultaneously optimizes the energy consumption and the performance of a distributed synchronous application running on a homogeneous cluster. The algorithm was applied to the class C of the NAS benchmarks and evaluated with the SimGrid simulator. Firstly, Rauber and Rünger's energy model was used by the proposed algorithm to select the best frequency gear, and the algorithm was compared to Rauber and Rünger's optimization method, obtaining better energy and performance trade-off ratios. Secondly, a new energy consumption model was developed to take into consideration both the computation and communication times and their relation with the frequency scaling factor. The new energy model was used by the proposed algorithm to select different frequencies, producing simulation results that are more accurate and realistic than those obtained with Rauber and Rünger's energy model.
-In chapter \ref{ch3}, we were proposed two new online frequency scaling factors selecting algorithms to select the best possible vectors of frequency scaling factors that give best trade-off between the predicted energy and the predicted performance values of synchronous iterative application running over a heterogeneous cluster and a grid. Each algorithm depends on a new energy and performance models, which takes into account the underline parallel platform being used. Firstly, the proposed
-scaling factors selection algorithm for a heterogeneous local cluster was implemented to the NAS parallel benchmarks of the class C and simulated by SimGrid. The results of the experiments showed that the algorithm on average reduces by 29.8\% the energy consumption of the NAS benchmarks executed over 8 nodes while limiting the degradation of the performance by 3.8\%.
-Different frequency scaling factors were selected by the algorithm according to the ratio between the computation and communication times when different number of nodes were used, and when different values have been used for static and dynamic powers of the CPU. Secondly, the proposed scaling factors selection algorithm for a grid was implemented to the NAS parallel benchmarks of the class D and executed over Grid5000 testbed platform. The experiments on 16 nodes, distributed over three clusters, showed that the algorithm on average reduces by 30\% the energy consumption for all the NAS benchmarks while on average only degrading by 3.2\% the performance.
-The algorithm was also evaluated in different scenarios that vary in the distribution of the computing nodes between different clusters’ sites or use multi-cores per node architecture or consume different static power values. The algorithm selects different vectors of frequencies according to the computations and communication times ratios, and the values of the static and measured dynamic powers of the CPUs.
-Both of the proposed algorithms were compared to another method that uses the well known energy and delay product as an objective function. The comparison results showed that the proposed algorithms outperform the latter by selecting a vectors of frequencies that give a better trade-off between energy consumption reduction and performance.
-In chapter \ref{ch4}, we were presented a new online frequency selection algorithm for asynchronous iterative applications running over a grid. The algorithm uses new energy and performance models to predict the energy consumption and the execution time of asynchronous or hybrid message passing
+In chapter \ref{ch2}, a new online scaling factor selection method that simultaneously optimizes the energy consumption and the performance of a distributed synchronous application with iterations running on a homogeneous cluster was proposed. This algorithm was applied to the class C of the NAS benchmarks and evaluated with the SimGrid simulator. Firstly, Rauber and Rünger's energy model was used by the proposed algorithm to select the best frequency gear. The proposed algorithm was compared to Rauber and Rünger's optimization method. The results of the comparison showed that the proposed algorithm gives better energy to performance trade-off ratios than their method while using the same energy model. Secondly, a new energy consumption model was developed to take into consideration both the computation and communication times and their relation with the frequency scaling factor. The new energy model was used by the proposed algorithm. The new simulation results demonstrated that the new model is more accurate and realistic than the previous one.
+
+In chapter \ref{ch3}, two new online frequency scaling factor selection algorithms adapted to synchronous applications with iterations running over a heterogeneous cluster and a grid were presented. Each algorithm uses new energy and performance models which take into account the characteristics of the parallel platform being used. Firstly, the proposed
+scaling factors selection algorithm for a heterogeneous local cluster was applied to the NAS parallel benchmarks and evaluated over SimGrid. The results of the experiments showed that the algorithm on average reduces by 29.8\% the energy consumption of the class C of the NAS benchmarks executed over 8 nodes while limiting the degradation of the performance to 3.8\%.
+Different frequency scaling factors were selected by the algorithm according to the ratio between the computation and communication times when different numbers of nodes were used, and when different static and dynamic CPU powers were considered. Secondly, the proposed scaling factors selection algorithm for a grid was applied to the class D of the NAS parallel benchmarks and executed over the Grid'5000 testbed platform. The experiments, conducted over 16 nodes distributed over three clusters, showed that the algorithm on average reduces by 30\% the energy consumption of all the NAS benchmarks while on average only degrading by 3.2\% their performance.
+The algorithm was also evaluated in different scenarios that vary in the distribution of the computing nodes between the clusters' sites, use multi-core per node architectures or consume different static power values. The algorithm selects different vectors of frequencies according to the computation to communication time ratios, and to the static and measured dynamic powers of the CPUs.
+Both of the proposed algorithms were compared to another method that uses the well known energy and delay product as an objective function. The comparison results showed that the proposed algorithms outperform the latter by selecting vectors of frequencies that give a better trade-off between energy consumption reduction and performance.
+
+In chapter \ref{ch4}, a new online frequency selection algorithm adapted to asynchronous applications with iterations running over a grid was presented. The algorithm uses new energy and performance models to predict the energy consumption and the execution time of asynchronous or hybrid message passing
iterative applications running over a grid. The proposed algorithm was evaluated twice
over the SimGrid simulator and Grid’5000 testbed while running a multi-splitting (MS)
application that solves 3D problems. The experiments were executed over different grid
scenarios composed of different numbers of clusters and different numbers of nodes
-per cluster. The proposed algorithm was applied synchronously and asynchronously on a
-synchronous and an asynchronous version of the MS application. Both the simulation
-and real experiment results show that applying synchronous frequency selecting algorithm on an
+per cluster. The proposed algorithm was applied synchronously and asynchronously to
+synchronous and asynchronous versions of the MS iterative application. Both the simulation
+and real experiment results showed that applying the frequency selecting algorithm synchronously on an
asynchronous MS application gives the best tradeoff between energy consumption reduction
-and performance compared to other scenarios. In the simulation results, this scenario
-saves on average the energy consumption by 22\% and reduces the execution time of
+and performance when compared to the other scenarios. In the simulation results, this scenario
+reduces on average the energy consumption by 22\% and decreases the execution time of
the application by 5.72\%. This version optimizes both the dynamic energy
-consumption by applying synchronously the HSA algorithm at the end of the first iteration and the
-static energy consumption by using asynchronous communications between nodes from
+consumption, by applying the HSA algorithm synchronously at the end of the first iteration of the application, and the static energy consumption, by using asynchronous communications between nodes from
different clusters which are overlapped by computations. The proposed algorithm was also
-evaluated over three power scenarios, which selects different vectors of frequencies proportionally to dynamic and static powers values. More energy reduction has been achieved when the ratio of the
-dynamic power have been increase and vice versa. whereas, the performance degradation percentages were decreased when the static power ratio has been increased.
-In the Grid’5000 results, this scenario saves the energy consumption by 26.93\% and
-reduces the execution time of the application by 21.48\%. The experiments executed over Grid'5000 give better results than those simulated with SimGrid because the nodes used in Grid'5000 were more heterogeneous than the ones simulated by SimGrid.
-In both of the Simulation and Grid'5000 testbed experiments, we have compared the proposed algorithm to a method that uses the well known energy and delay product as an objective function. The comparison results showed that the proposed algorithm outperforms the latter by selecting a vector of frequencies that gives
+evaluated over three power scenarios which select different vectors of frequencies proportionally to the dynamic and static power values. More energy reduction was achieved when the ratio of the
+dynamic power was increased and vice versa, whereas the performance degradation percentages decreased when the static power ratio was increased.
+In the Grid’5000 experiments, this scenario reduces the energy consumption by 26.93\% and
+decreases the execution time of the application by 21.48\%. The experiments executed over Grid'5000 give better results than those simulated with SimGrid because the nodes used in Grid'5000 were more heterogeneous than the ones simulated by SimGrid.
+In both the simulations and the real experiments, the proposed algorithm was compared to a method that uses the well known energy and delay product as an objective function. The comparison results showed that the proposed algorithm outperforms the latter by selecting a vector of frequencies that gives
a better trade-off between the energy consumption reduction and the performance.
-Finally, we outline some perspectives that will be applied to this work in the future as in the next section.
\section{Perspectives}
In the near future, we will adapt the proposed algorithms to take into consideration the variability between some iterations. For example, each proposed algorithm can be executed twice: after the first iteration the frequencies are scaled down according to the execution times measured in the first iteration, then after a fixed number of iterations, the frequencies are adjusted according to the execution times measured during the fixed number of iterations. If the computing power of the system is constantly changing, it would be interesting to implement a mechanism that detects this change and adjusts the frequencies according to the variability of the system.
+Also, it would be interesting to evaluate the scalability of the proposed algorithms by running them on large platforms composed of many thousands of cores. The scalability of the algorithms can be improved by distributing them in a hierarchical manner, where a leader is chosen for each cluster or group of nodes to compute their scaled frequencies, and by using asynchronous messages to exchange the data measured at the first iteration.
-Also, it would be interesting to evaluate the scalability of the proposed algorithm by running it on large platforms composed of many thousands of cores. The scalability of the algorithm can be improved by distributing it in a hierarchical manner where a leader is chosen for each cluster or a group of nodes to compute their scaled frequencies and by using asynchronous messages to exchange the the data measured at the first iteration.
-
-The proposed algorithms would evaluate to other message passing iterative methods in order to see how it adapts to the characteristics of the new method.
+The proposed algorithms should be applied to other message passing methods with iterations in order to see how they adapt to the characteristics of these methods.
Also, it would be interesting to explore whether a relation can be found between the number of asynchronous iterations required to reach global convergence and the frequencies applied to the
nodes. The number of iterations required by each node for global convergence
is not known in advance and it changes when the CPUs' frequencies are
modified.
-Furthermore, the proposed algorithms for a heterogeneous platforms, chapters \ref{ch3} and \ref{ch4}, should be applied to a heterogeneous platform composed of CPUs and GPUs, where all works of energy minimization in the literature over a collection of GPUs and CPUs platform have been shown more energy efficiency compared those composed from only CPUs.
+Furthermore, the proposed algorithms for heterogeneous platforms, in chapters \ref{ch3} and \ref{ch4}, should be applied to heterogeneous platforms composed of CPUs and GPUs. Indeed, most of the works in the
+green computing field showed that these mixed platforms of GPUs and CPUs are more energy efficient than those composed of only CPUs.
-Finally, it would be interesting to compare the accuracy of the
-results of the proposed energy models to the values given by instruments
+Finally, it would be interesting to verify the accuracy of the
+results returned by the energy models by comparing them to the values given by instruments
that measure the energy consumption of the CPUs during the execution, as in \cite{ref106}.
-
-
-
-
-
-
\addcontentsline{toc}{chapter}{Dedication}
%DEDICATION
-
-
\section*{1. General Introduction}
\addcontentsline{toc}{section}{1. General Introduction }
-
-The need and demand for more computing power have been increasing since the
-birth of the first computing unit and it is not expected to slow down in the
-coming years. To meet these demands, at first the frequency of the CPU was regularly increased until reaching the thermal limit. Also, researchers and supercomputers
+The need and the demand for more computing power have been increasing since the
+birth of the first computing unit and they are not expected to slow down in the
+coming years. To meet these demands, at first the frequency of the CPU was regularly increased until reaching the thermal limit. Then, researchers and supercomputers
constructors have been regularly increasing the number of computing cores and
processors in supercomputers. Many parallel and distributed architectures, such as multi-core, clusters and grids, were implemented in order to obtain more computing power. This approach consists in using at the same time many computing nodes to solve a big problem that cannot be solved on a single node.
-Therefore, these two common approaches are the most common up to now to get more computing power, but they are increasing the energy consumption of the resulting computing architecture.
-Indeed, the power consumed by a processor exponentially increases when its frequency increases and a platform consisting of $N$ computing nodes consumes as much as the sum of the power consumed by each computing node.
+These two approaches are the most common up to now to get more computing power, but they increase the energy consumption of the resulting computing architecture.
+Indeed, the power consumed by a processor exponentially increases when its frequency is increased and a platform consisting of $N$ computing nodes consumes as much as the sum of the power consumed by each computing node.
As an example, the Chinese supercomputer
Tianhe-2 had the highest FLOPS in November 2015 according to the Top500 list
\cite{ref101}. However, it was also the most power hungry
-platform with its over 3 million cores consuming around 17.8 megawatts.
+platform with more than 3 million cores consuming around 17.8 megawatts.
Moreover, according to the U.S. annual energy outlook 2015
\cite{ref102}, the price of one megawatt-hour of energy
was approximately \$70. Therefore, the price of the energy consumed by
the Tianhe-2 platform is more than \$10 million each year.
-Moreover, the heat generated by the platform and therefore a cooling infrastructure \cite{ref103} which also consumes a lot of energy, must be implemented to keep the platform from overheating. High CPU's temperatures can also drastically increase its energy consumption, see \cite{ref104} for more details.
-The computing platforms must be more energy efficient and offer the highest number
+Moreover, the platform generates a lot of heat and, to prevent it from overheating, a cooling
+infrastructure \cite{ref103}, which consumes a lot of energy, must be implemented.
+ High CPU temperatures can also drastically increase the energy consumption of the CPU, see \cite{ref104} for more details.
+An efficient computing platform must offer the highest number
of FLOPS per watt possible, such as the Shoubu-ExaScaler from RIKEN
which became the top of the Green500 list in November 2015 \cite{ref105}.
-This heterogeneous platform executes more than 7 GFlops per watt while consuming
+This heterogeneous platform executes more than 7 GFlops per watt while only consuming
50.32 kilowatts.
For all these reasons energy reduction has become an important topic in the high performance
reduction, the chosen frequency scaling factor must produce the most energy efficient
execution without considering the degradation of the performance. However, it is
important to notice that lowering the frequency to the minimum value does not always
-give the most energy efficient execution due to energy leakage that increases the total energy consumption of the CPU when the execution time increases. However, the more important question is how to select the best frequency gears that minimizes the total energy consumption and the maximizes the performance of parallel application running over a parallel platform at the same time?
+give the most energy efficient execution due to energy leakage that increases the total energy consumption of the CPU when the execution time increases. However, a more important question remains: how to select the best frequency gears that minimize the total energy consumption and maximize the performance of a parallel application, running over a parallel platform, at the same time?
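The claim that the lowest frequency is not always the most energy efficient can be illustrated with a generic model (an assumption for illustration only, not the model used in this work): for a compute-bound task, dynamic power grows roughly cubically with the frequency $f$ while the execution time shrinks as $1/f$, so the dynamic energy scales as $f^2$ while the static (leakage) energy scales as $1/f$. The total energy therefore has an interior minimum:

```python
# Generic illustrative model: total energy E(f) = a*f**2 (dynamic part)
# + b/f (static/leakage part, which grows as the execution lengthens).
# The constants a and b are arbitrary and chosen only for illustration.
a, b = 1.0, 2.0

def energy(f):
    return a * f**2 + b / f

freqs = [0.2 + 0.01 * i for i in range(181)]  # candidate frequencies, 0.2..2.0
best = min(freqs, key=energy)
# Analytical minimum at f* = (b / (2 * a)) ** (1 / 3) = 1.0,
# i.e. neither the lowest nor the highest available frequency.
print(round(best, 2))  # → 1.0
```

With these constants, both running at the lowest frequency (dominated by leakage over a long execution) and at the highest frequency (dominated by dynamic power) consume more energy than the intermediate optimum.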
\section*{2. Motivation of the Dissertation}
\addcontentsline{toc}{section}{2. Motivation of the Dissertation }
-The main objective of HPC systems is to execute as fast as possible the sequential application over a parallel architecture.
-Hence, using DVFS to scale down the frequencies of CPUs composing the parallel architecture for energy reduction process, it can also significantly degrade the performance of the executed program if it is compute bound, when the program depends mainly of the computing power of the processor, and if a low CPU frequency is selected.
+The main objective of an HPC system, such as a cluster, a grid or a supercomputer, is to execute a given task over that system as fast as possible.
+Hence, using DVFS to scale down the frequencies of the CPUs composing the system reduces their energy consumption, but it can also significantly degrade the performance of the executed program, especially if it is compute bound. A compute bound program contains a lot of computations and relatively few communications and Input/Output operations. The execution time of such a program directly depends on
+the computing powers of the CPUs and their selected frequencies.
Therefore, the chosen frequency scaling factor must give the best possible trade-off between the energy reduction and the performance of the parallel application.
-On the other hand, the relation between energy consumption and the execution time of parallel applications is complex and non-linear. It is very hard to optimize both the energy consumption and the performance of the parallel applications when scaling the frequency of processors executing them because one affects the other. There are a very few works in the state of the art have been dedicated for this problem. Therefore, mathematical models of both the energy consumption and performance of the parallel application running over a parallel platform are required and should be defined precisely to discover the best relation between them.
+On the other hand, the relation between energy consumption and the execution time of parallel applications is complex and non-linear. It is very hard to optimize both the energy consumption and the performance of parallel applications when scaling the frequency of the processors executing them because one affects the other. In order to evaluate the impact of scaling down the CPU's frequency on its energy consumption and computing power, mathematical models should be defined to predict them for different frequencies.
+
-Furthermore, researchers use different optimization strategies to select the frequencies that gives the best trade-off between the energy reduction and performance degradation ratio, which might be chosen during execution (online) or during a pre-execution phase (offline). Thus, the best optimization approach to optimize the energy consumption and the performance at the same time must be applied online with a very low overhead on the execution time of the parallel application.
+Furthermore, researchers use different optimization strategies to select the frequencies of the CPUs. They might be executed during the execution of the application (online) or during a pre-execution phase (offline). In our opinion, a good approach should minimize the energy consumption while preserving the performance. Finally, it should be applied to the application during its execution, without requiring any training or profiling and with a minimal overhead.
\section*{3. Main Contributions of this Dissertation}
\addcontentsline{toc}{section}{3. Main Contributions of this Dissertation}
-The main contributions of this dissertation focus on optimizing both the energy consumption and the performance of iterative parallel applications running over clusters and grids. The main contributions of this work summarize as follows:
+The main objective of this work is to minimize the energy consumption of parallel applications with iterations running over clusters and grids while preserving their performance. The main contributions of this work can be summarized as follows:
\begin{enumerate} [I)]
-\item We develop an energy consumption and performance models for synchronous and asynchronous message passing iterative applications. These models take into consideration both the computation and communications times of these application, in addition to their relation with frequency scaling factors.
+\item Energy consumption and performance models for synchronous and asynchronous message passing applications with iterations were developed. These models take into consideration both the computation and communication times of these applications in addition to their relation to the frequency scaling factors.
-\item The iterative parallel applications were executed over different parallel architectures such as: homogeneous local cluster, heterogeneous local cluster and distributed clusters (grid platform). The main goal behind using these different platforms is to study the effect of the heterogeneity in the computing powers of the the commuting nodes and the heterogeneity in the communication networks which connecting these nodes on the energy consumption and the performance of iterative applications.
+\item The parallel applications with iterations were executed over different parallel architectures such as a homogeneous local cluster, a heterogeneous local cluster and distributed clusters (a grid platform). The main goal behind using these different platforms is to study the effect of the heterogeneity of the computing powers of the nodes, and of the communication networks which connect them, on the energy consumption and the performance of parallel applications with iterations.
-\item Depending on the proposed energy consumption and the performance models, we define a new objective function to optimize both the energy consumption and the performance of the iterative parallel applications at the same. The proposed objective function compute the maximum distance between the predicted energy consumption and the predicted performance curves to define the best possible trade-off between them.
+\item Based on the proposed energy consumption and performance models, a new objective function to optimize both the energy consumption and the performance of parallel applications with iterations at the same time was defined. It computes the maximum distance between the predicted energy consumption and the predicted performance curves to define the best possible trade-off between them.
-\item a new online frequencies selecting algorithms for cluster and grid are developed which used the new objective function. These algorithms selected the frequency scaling factors that simultaneously optimize both the energy consumption and performance. They have a very small overhead when comparing them to other methods in the state of the art and they are working without training and profiling.
+\item New online frequency selection algorithms for clusters and grids were developed. They use the new objective function and select the frequency scaling factors that simultaneously optimize both the energy consumption and the performance. They have a very small overhead compared to other state of the art methods and they work without training or profiling.
-\item We conducted extensive simulation experiments over SimGrid simulator \cite{ref66}, which offers flexible and easy tools to built different types of parallel architectures. Furthermore, real experiments were conducted over Grid'5000 testbed \cite{ref21} and compared with simulation ones.
-The experimental results were executed over different number of nodes and different platform scenarios.
-\item In both the simulation and real experiments, NAS parallel benchmarks \cite{ref65} and Multi-splitting method solving 3D problem with different sizes used as a parallel applications were executed on clusters and grids. The final goal, is to evaluate the proposed methods over these applications and test their adaptation to these applications when different computation and communication ratios are existed.
+\item The proposed algorithms were applied to the NAS parallel benchmarks \cite{ref65} and the Multi-splitting method. These applications offer different computation to communication ratios and provide a good testbed to evaluate the proposed algorithms in different scenarios.
-\item All the proposed methods of this work compared with two approaches found in the literature: Rauber and Rünger \cite{ref47} and Spiliopoulos et al. \cite{ref67} methods. Both the simulation and real testbed results showed that the proposed methods gives better energy and performance trade-off ratios than these methods.
+\item The proposed algorithms were evaluated over the SimGrid simulator \cite{ref66}, which offers flexible and easy tools to build different types of parallel architectures. Furthermore, real experiments were conducted over the Grid'5000 testbed \cite{ref21} and compared with the simulated ones.
+The experiments were conducted over different numbers of nodes and different platform scenarios.
+\item All the proposed methods were compared with either Rauber and Rünger \cite{ref47} method or Spiliopoulos et al. \cite{ref67} objective function. Both the simulation and real experiments showed that the proposed methods give better energy to performance trade-offs than the other methods.
\end{enumerate}
-
\section*{4. Dissertation Outline}
\addcontentsline{toc}{section}{4. Dissertation Outline}
-The dissertation is organized as follows: chapter \ref{ch1} presents a scientific background about types of parallel architectures, the parallel iterative applications and the energy consumption model from the state of the art that can be used to measure the energy consumption of these applications.
-Chapter \ref{ch2} describes the propose energy and performance optimization method for synchronous iterative applications running over homogeneous cluster. Chapter \ref{ch3} presents two algorithms for the energy and performance optimization of synchronous iterative applications running over heterogeneous cluster and grid. Chapter \ref{ch4} presents
-the proposed energy and performance optimization method for asynchronous iterative applications running over grid. Finally, we conclude our work of this dissertation in chapter \ref{ch5}.
+The dissertation is organized as follows: chapter \ref{ch1} presents different types of parallel architectures and parallel applications with iterations. It also presents an energy consumption model from the state of the art that can be used to measure the energy consumption of these applications.
+Chapter \ref{ch2} describes the proposed energy and performance optimization method for synchronous applications with iterations running over homogeneous clusters. Chapter \ref{ch3} presents two algorithms for the energy and performance optimization of synchronous applications with iterations running over heterogeneous clusters and grids. In chapter \ref{ch4}, the energy and performance models and the optimization method are adapted to asynchronous applications with iterations running over grids. Finally, this dissertation ends with a summary and some perspectives.
\ No newline at end of file
-
\chapter*{Publications}
\label{chap:publications}
\addcontentsline{toc}{part}{Publications}
\section*{Journal Articles}
+\begin{enumerate}[$\lbrack$1$\rbrack$]
+\item Ahmed Fanfakh, Jean-Claude Charr, Raphaël Couturier, Arnaud Giersch. Optimizing Energy Consumption with
+ DVFS for Message Passing Iterative Applications on Grid Architectures. \textit{Journal of Computational
+ Science, 2016, ($1^{st}$ Revision Submitted)}
+\item Ahmed Fanfakh, Jean-Claude Charr, Raphaël Couturier, Arnaud Giersch. Energy Consumption Reduction for
+ Asynchronous Message Passing Applications. \textit{Journal of Supercomputing, 2016, (Submitted)}
+\end{enumerate}
\section*{Conference Articles}
+\begin{enumerate}[$\lbrack$1$\rbrack$]
+
+\item Jean-Claude Charr, Raphaël Couturier, Ahmed Fanfakh, Arnaud Giersch. Dynamic Frequency Scaling for
+ Energy Consumption Reduction in Distributed MPI Programs. \textit{ISPA 2014: The $12^{th}$ IEEE
+ International Symposium on Parallel and Distributed Processing with Applications}, pp.
+ 225-230. IEEE Computer Society, Milan, Italy (2014).
+
+\item Jean-Claude Charr, Raphaël Couturier, Ahmed Fanfakh, Arnaud Giersch. Energy Consumption Reduction
+ with DVFS for Message Passing Iterative Applications on Heterogeneous Architectures.
+ \textit{The $16^{th}$ IEEE International Workshop on Parallel and Distributed Scientific and
   Engineering Computing}. pp. 922-931. IEEE Computer Society, India (2015).
+
+\item Ahmed Fanfakh, Jean-Claude Charr, Raphaël Couturier, Arnaud Giersch. CPUs Energy Consumption
+ Reduction for Asynchronous Parallel Methods Running over Grids. \textit{The $19^{th}$ IEEE
+ International Conference on Computational Science and Engineering}. IEEE Computer Society,
+ Paris (2016). \textit{(Submitted)}
+
+\end{enumerate}
+
+
+\cleardoublepage
\ No newline at end of file
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\include{Abstruct}
-\include{Resume}
+%\include{Resume}
%% Sommaire
\tableofcontents
\setlength{\parindent}{0.5cm}
\addcontentsline{toc}{chapter}{List of Abbreviations}
-%% tr
+
\include{ACRONYMS}
-\include{Dedication}
+%\include{Dedication}
-\include{acknowledgements}
+%\include{acknowledgements}
%% Citation
%\include{CITATION}
\include{CHAPITRE_01}
-
\part{Contributions}
\include{CHAPITRE_02}
+
\include{CHAPITRE_03}
+
\include{CHAPITRE_04}
\part{Conclusion and Perspectives}
%% Conclusion et perspectives
\include{CONCLUSION}
-
\setlength{\parindent}{0cm}
--%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Publications
\include{PUBLICATIONS}
\usepackage{afterpage}
\usepackage{commath}
\usepackage[autolanguage,np]{numprint}
+
%\usepackage{algorithm2e}
\newcommand\blankpage{%
\null
%% The second mandatory parameter is the date of the PhD defense.
%% The third mandatory parameter is the reference number given by the University Library after the PhD defense.
%\declarethesis[Sous-titre]{Titre}{17 septembre 2012}{XXX}
-\declarethesis{Energy Consumption Optimization of Parallel Iterative Applications using CPU Frequency Scaling}{Date}{21306697}
+\declarethesis{Energy Consumption Optimization of Parallel Applications with Iterations using CPU Frequency Scaling}{Date}{21306697}
%%--------------------
%% Set the author of the PhD thesis
% \addjury {} {Prof Ye-Qiong SONG} {University of Lorraine} {Reviewer}
%\addjury{} {Assoc Prof Hamida SEBA (HDR)} {University of Claude Bernard Lyon1} {Reviewer}
%\addjury {}{Prof Sylvain CONTASSOT-VIVIER} {University of Lorraine} {Examiner}
-%\addjury {} {Prof Raphaël Couturier} {University of Franche-Comt\'e} {Supervisor}
-%\addjury {} {Asst Prof Karine Deschinkel} {University of Franche-Comt\'e} { Co-supervisor}
+\addjury {} {Prof Raphaël Couturier} {University of Franche-Comt\'e} {Supervisor}
+\addjury {} {Asst Prof Jean-Claude Charr} {University of Franche-Comt\'e} { Co-supervisor}
%\addjury {} {Asst Prof Michel Salomon} {University of Franche-Comt\'e} {Co-supervisor}
%% [Figure sources omitted: diffs of gnuplot 4.6 EPS files, including energy.eps and
%% per_deg.eps, regenerated on 14 June 2016. The changes append the unit "%" to the
%% axis labels "Tradeoff distance", "Energy saving" and "Performance degradation"
%% (plotted against "Number of nodes") for the NAS benchmarks CG, MG, EP, LU, BT, SP
%% and FT and for the 10%, 20% and 30% static power scenarios, updating the plotted
%% coordinates and creation dates accordingly.]
\begin{center}%
-{\huge \textbf{Energy Consumption Optimization of Parallel Iterative Applications using
-CPU \\ Frequency Scaling}} \\[.5cm]
+{\huge \textbf{Energy Consumption Optimization of Parallel Applications with Iterations using \\
+CPU Frequency Scaling}} \\[.5cm]
{\large By}\\[.5cm]%
-{\huge Ahmed Badri Muslim Fanfakh}\\[.5cm]%
+{\huge Ahmed Badri Muslim FANFAKH}\\[.5cm]%
\end{center}
\begin{center}%