X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/ThesisAhmed.git/blobdiff_plain/46e8710e38caa11a5ea1ac4d5878d02ebe1cb610..7b4f492a307bee60f4dfc90adb0aee0bad5f48ed:/CHAPITRE_03.tex diff --git a/CHAPITRE_03.tex b/CHAPITRE_03.tex index 27d95ec..953bcfc 100644 --- a/CHAPITRE_03.tex +++ b/CHAPITRE_03.tex @@ -68,16 +68,16 @@ clusters (heterogeneous CPUs) and grid platforms are presented. They select the frequencies that try to give the best trade-off between energy saving and performance degradation, for each node - computing the synchronous message passing iterative application. These algorithms have a small + computing the synchronous message passing application with iterations. These algorithms have a small overhead and work without training or profiling. They use new energy models - for message passing iterative synchronous applications running on both the heterogeneous + for message passing synchronous applications with iterations running on both the heterogeneous local cluster and the grid platform. The first proposed algorithm for a heterogeneous local cluster was evaluated on the SimGrid simulator while running the class C of the NAS parallel benchmarks. The experiments conducted over 8 heterogeneous nodes show that it reduces on average the energy consumption by 29.8\% while limiting the performance degradation to 3.8\%. The second proposed algorithm for a grid platform was evaluated on the Grid5000 testbed platform while running the class D of the NAS parallel benchmarks. - The experiments were run on 16 nodes, distributed on three clusters, and show that it reduces + The experiments were run on 16 nodes, distributed on three clusters, and show that the algorithm reduces on average the energy consumption by 30\% while the performance is on average only degraded by 3.2\%. Finally, both algorithms were compared to the EDP method. The comparison @@ -96,13 +96,12 @@ Section~\ref{ch3:3} shows the energy and performance models in addition to the f selecting algorithm of synchronous message passing programs running over a grid platform. Section~\ref{ch3:4} presents the results of applying the algorithm on the NAS parallel benchmarks (class D) and executing them on the Grid'5000 testbed. -The algorithm is also evaluated over multi-core architectures and over three different power scenarios. Moreover, section~\ref{ch3:4}, shows the comparison results between the proposed method and the EDP method. +The algorithm is also evaluated over multi-core architectures and over three different power scenarios. Moreover, Section~\ref{ch3:4}, shows the comparison results between the proposed method and the EDP method. Finally, in Section~\ref{ch3:concl} the chapter ends with a summary. \section{Related works} \label{ch3:relwork} -As same as in CPUs, DVFS is also allowed in GPUs to reduce their energy consumption. The process of selecting the appropriate frequency for a processor to satisfy some objectives, while taking into account all the constraints, is not a trivial operation. Many researchers used different @@ -119,7 +118,7 @@ sequential, parallel or distributed architecture, homogeneous or heterogeneous platform, synchronous or asynchronous application, \dots{} In this chapter, we are interested in reducing the energy consumption when running a message passing -iterative synchronous applications over a heterogeneous platform. Some +synchronous applications with iterations over a heterogeneous platform. Some works have already been done for such platforms which can be classified into two types of heterogeneous platforms: \begin{itemize} @@ -166,8 +165,8 @@ while respecting the given time constraint. This approach had considerable overhead. In contrast to the above described works, the work of this chapter presents the following contributions: \begin{enumerate} -\item two new energy and two performance models for message passing iterative - synchronous applications running over a heterogeneous local cluster and a grid platform. +\item two new energy and two performance models for message passing + synchronous applications with iterations running over a heterogeneous local cluster and a grid platform. All the models take into account the communications and the slack times. The models can predict the energy consumption and the execution time of the application. @@ -175,18 +174,18 @@ following contributions: local cluster and a grid platform. The algorithms have a very small overhead and do not need any training or profiling. They use a new optimization function which simultaneously maximizes the performance and minimizes the energy consumption - of a message passing iterative synchronous application. + of a message passing synchronous application with iterations. \end{enumerate} -\section[The energy optimization of a heterogeneous cluster]{The energy optimization of parallel iterative applications running over local heterogeneous +\section[The energy optimization of a heterogeneous cluster]{The energy optimization of parallel applications with iterations running over local heterogeneous clusters} \label{ch3:1} -\subsection{The execution time of message passing distributed iterative - applications on a heterogeneous local cluster} +\subsection{The execution time of message passing distributed + applications with iterations on a heterogeneous local cluster} \label{ch3:1:1} In this section, we are interested in reducing the energy consumption of message -passing distributed iterative synchronous applications running over heterogeneous local clusters. +passing distributed synchronous applications with iterations running over heterogeneous local clusters. In this work, a heterogeneous local cluster is defined as a collection of heterogeneous computing nodes interconnected via a high speed homogeneous network. Therefore, the nodes may have different characteristics such as computing @@ -200,7 +199,7 @@ have the same network bandwidth and latency. \label{fig:task-heter} \end{figure} -The overall execution time of a distributed iterative synchronous application +The overall execution time of a distributed synchronous application with iterations over a heterogeneous local cluster consists of the sum of the computation time and the communication time for every iteration on a node. However, due to the heterogeneous computation power of the computing nodes, slack times may occur @@ -227,8 +226,8 @@ Since in a heterogeneous cluster the nodes may have different characteristics, especially different frequency gears, when applying DVFS operations on these nodes, they may get different scaling factors represented by a scaling vector: $(S_1, S_2,\dots, S_N)$ where $S_i$ is the scaling factor of processor $i$. To -be able to predict the execution time of message passing synchronous iterative -applications running over a heterogeneous local cluster, for different vectors of +be able to predict the execution time of message passing synchronous +applications with iterations running over a heterogeneous local cluster, for different vectors of scaling factors, the communication time and the computation time for all the tasks must be measured during the first iteration before applying any DVFS operation. Then the execution time for one iteration of the application with any @@ -242,27 +241,27 @@ where $\TcpOld[i]$ is the computation time of processor $i$ during the first iteration. The model computes the maximum computation time with scaling factor from each node added to the communication time of the slowest node. It means only the communication time without any slack time is taken into -account. Therefore, the execution time of the iterative application is equal to +account. Therefore, the execution time of the application with iterations is equal to the execution time of one iteration as in (\ref{eq:perf_heter}) multiplied by the number of iterations of that application. This prediction model is improved from the model that predicts the execution time of message passing distributed applications for homogeneous -architectures presented in chapter \ref{ch2} section \ref{ch2:3}. The execution time prediction model is +architectures presented in Chapter \ref{ch2} Section \ref{ch2:3}. The execution time prediction model is used in the method that optimizes both the energy consumption and the performance -of iterative methods, which is presented in the following sections. +of parallel application with iterations, which is presented in the following sections. \subsection{Energy model for heterogeneous local cluster} \label{ch3:1:2} -In chapter \ref{ch2}, the dynamic and the static energy consumption of a +In Chapter \ref{ch2}, the dynamic and the static energy consumption of a processor is computed according to Equations \ref{eq:Edyn_new} and \ref{eq:Estatic_new} respectively. Then, the total energy consumption of a processor is the sum of these two metrics. Therefore, the overall energy consumption for the parallel tasks over a parallel cluster is the summation of the energies consumed by all the processors. In the considered heterogeneous platform, each processor $i$ may have different dynamic and static powers, noted as $\Pd[i]$ and $\Ps[i]$ -respectively. Therefore, even if the distributed message passing iterative -application is load balanced, the computation time of each CPU $i$ noted +respectively. Therefore, even if the distributed message passing +application with iterations is load balanced, the computation time of each CPU $i$ noted $\Tcp[i]$ may be different and different frequency scaling factors may be computed in order to decrease the overall energy consumption of the application and reduce the slack times. The communication time of a processor $i$ is noted as @@ -273,7 +272,7 @@ frequency scaling factor and the dynamic power of each node as in (\ref{eq:Edyn_new}), the static energy is computed as the sum of the execution time of one iteration as in \ref{eq:perf_heter} multiplied by the static power of each processor. The overall energy consumption of a message passing distributed application executed over a -heterogeneous cluster during one iteration is the summation of all the dynamic and +heterogeneous cluster during one iteration is the summation of the dynamic and static energies for all the processors. It is computed as follows: \begin{equation} \label{eq:energy-heter} @@ -286,7 +285,7 @@ Reducing the frequencies of the processors according to the vector of scaling factors $(S_1, S_2,\dots, S_N)$ may degrade the performance of the application and thus, increase the consumed static energy because the execution time is increased~\cite{ref78}. The overall energy consumption -for an iterative application can be measured by measuring the energy +for an application with iterations can be measured by measuring the energy consumption for one iteration as in (\ref{eq:energy-heter}) multiplied by the number of iterations of that application. @@ -303,14 +302,14 @@ the application might not be the optimal one. It is not trivial to select the appropriate frequency scaling factor for each processor while considering the characteristics of each processor (computation power, range of frequencies, dynamic and static powers) and the task it is executing (computation/communication -ratio). In chapter~\ref{ch2}, we proposed a method that selects the optimal +ratio). In Chapter~\ref{ch2}, we proposed a method that selects the optimal frequency scaling factor for a homogeneous cluster executing a message passing -iterative synchronous application while giving the best trade-off between the + synchronous application with iterations while giving the best trade-off between the energy consumption and the performance for such applications. In this section, this optimization method is improved while considering a heterogeneous clusters. As described before, the relation between the energy consumption and the execution time for an -application is complex and nonlinear. Thus, to find the trade-off relation between the energy consumption computed in Equation \ref{eq:energy-heter} and the performance with Equation \ref{eq:perf_heter} for the iterative message passing applications, first we need to normalize both term as follows: +application is complex and nonlinear. Thus, to find the trade-off relation between the energy consumption computed in Equation \ref{eq:energy-heter} and the performance with Equation \ref{eq:perf_heter} for the message passing applications with iterations, first we need to normalize both terms as follows: \begin{equation} @@ -334,7 +333,7 @@ application is complex and nonlinear. Thus, to find the trade-off relation betwe \begin{figure}[!t] \centering \includegraphics[width=.7\textwidth]{fig/ch3/heter} - \caption{The energy and performance relation in Heterogeneous cluster} + \caption{The energy and performance relation in heterogeneous cluster} \label{fig:rel-heter} \end{figure} @@ -431,8 +430,8 @@ for the node $i$. Then, the set of scaling factors that maximizes the objective In this section, Algorithm~\ref{HSA} is presented. It selects the frequency scaling factors vector that gives the best trade-off between minimizing the energy consumption and maximizing the performance of a message passing -synchronous iterative application executed on a heterogeneous local cluster. It works -online during the execution time of the iterative message passing program. It +synchronous application with iterations executed on a heterogeneous local cluster. It works +online during the execution time of the message passing program with iterations. It uses information gathered during the first iteration such as the computation time and the communication time in one iteration for each node. The algorithm is executed after the first iteration and returns a vector of optimal frequency @@ -440,7 +439,7 @@ scaling factors that satisfies the objective function (\ref{eq:max-heter}). The program applies DVFS operations to change the frequencies of the CPUs according to the computed scaling factors. This algorithm is called just once during the execution of the program. Algorithm~\ref{dvfs-heter} shows where and when the proposed -scaling algorithm is called in the iterative MPI program. +scaling algorithm is called in the MPI program with iterations. \begin{figure}[!t] \centering @@ -578,10 +577,10 @@ specifications of real Intel processors. The heterogeneous cluster had up to 144 nodes and had nodes from the four types in equal proportions, for example if a benchmark was executed on 8 nodes, 2 nodes from each type were used. Since the constructors of CPUs do not specify the dynamic and the static power of their -CPUs, for each type of node they were chosen proportionally to its computing +CPUs, for each type of node they were chosen proportionally to their computing powers (FLOPS). The dynamic power corresponds to 80\% of the overall power consumption while executing at the higher frequency and the -remaining 20\% is the static power. The same assumption was made in chapter \ref{ch2} and +remaining 20\% is the static power. The same assumption was made in Chapter \ref{ch2} and \cite{ref3}. Finally, These nodes were connected via an Ethernet network with 1 \textit{Gbit/s} bandwidth. @@ -824,7 +823,7 @@ more dynamic power is consumed when nodes are running on their maximum frequencies, thus, scaling down the frequency of the nodes results in higher energy savings than in the 70\%-30\% scenario. On the other hand, the performance degradation percentage is smaller in the 70\%-30\% -scenario compared to the 90\%-\%10 scenario. This is due to the +scenario compared to the 90\%-10\% scenario. This is due to the higher static power percentage in the first scenario which makes it more relevant in the overall consumed energy. Indeed, the static energy is related to the execution time and if the performance is degraded the amount of consumed @@ -844,14 +843,14 @@ lowering the frequency of some processors. On the other hand, the energy saving decreases when the 70\%-30\% scenario is used because the dynamic energy is less relevant in the overall consumed energy and lowering the frequency does not return big energy savings. Moreover, the average of the -performance degradation is decreased when using a higher ratio for static power +performance degradation is decreased when using a higher ratio for the static power (e.g. 70\%-30\% scenario and 80\%-20\% scenario). Since the proposed algorithm optimizes the energy consumption when using a higher ratio for the dynamic power, the algorithm selects bigger frequency scaling factors that results in more energy saving but degrade the performance, for example see Figure~\ref{fig:powers-heter} (b). The opposite happens when using a higher ratio for the static power, the algorithm proportionally selects smaller -scaling values which result in less energy saving but also less performance +scaling values which results in less energy saving but also less performance degradation. \begin{table}[!t] @@ -989,13 +988,13 @@ the energy reduction to performance trade-off, see Figure~\ref{fig:compare_ because it maximizes the distance between the energy saving and the performance degradation values while giving the same weight for both metrics. -\section[The energy optimization of grid]{The energy optimization of parallel iterative applications running over grids} +\section[The energy optimization of grid]{The energy optimization of parallel applications with iterations running over grids} \label{ch3:3} \subsection{The energy and performance models of grid platform} \label{ch3:3:1} In this section, we are interested in reducing the energy consumption of message -passing iterative synchronous applications running over +passing applications with synchronous iterations running over heterogeneous grid platforms. A heterogeneous grid platform could be defined as a collection of heterogeneous computing clusters interconnected via a long distance network which has a lower bandwidth and a higher latency than the local networks of the clusters. Each computing cluster in the grid is composed of homogeneous nodes that are connected together via a high speed network. However, nodes from distinct clusters may have different characteristics such as computing power (FLOPS), energy consumption, CPU's frequency range, network bandwidth and latency. @@ -1003,9 +1002,9 @@ and a higher latency than the local networks of the clusters. Each computing clu Since in a heterogeneous grid each cluster has different characteristics, when applying DVFS operations on the nodes of these clusters, they may get different scaling factors represented by a scaling vector: -$(S_{11}, S_{12},\dots, S_{NM})$ where $S_{ij}$ is the scaling factor of processor $j$ in cluster $i$ . To -be able to predict the execution time of message passing synchronous iterative -applications running over a heterogeneous grid, for different vectors of +$(S_{11}, S_{12},\dots, S_{NM})$ where $S_{ij}$ is the scaling factor of processor $j$ in cluster $i$. To +be able to predict the execution time of message passing +applications with synchronous iterations running over a heterogeneous grid, for different vectors of scaling factors, the communication time and the computation time for all the tasks must be measured during the first iteration before applying any DVFS operation. Then the execution time for one iteration of the application with any @@ -1024,7 +1023,7 @@ first iteration. The execution time for one iteration is equal to the sum of t and the slowest communication time without slack time during one iteration. The latter is equal to the communication time of the slowest node in the slowest cluster $h$. It means that only the communication time without any slack time is taken into account. -Therefore, the execution time of the iterative application is equal to +Therefore, the execution time of the parallel application with iterations is equal to the execution time of one iteration as in Equation (\ref{eq:perf-grid}) multiplied by the number of iterations of that application. @@ -1032,7 +1031,7 @@ number of iterations of that application. In the considered heterogeneous grid platform, each node $j$ in cluster $i$ may have different dynamic and static powers from the nodes of the other clusters, noted as $\Pd[ij]$ and $\Ps[ij]$ respectively. Therefore, even if the distributed -message passing iterative application is load balanced, the computation time of each CPU $j$ +message passing application with iterations is load balanced, the computation time of each CPU $j$ in cluster $i$ noted $\Tcp[ij]$ may be different and different frequency scaling factors may be computed in order to decrease the overall energy consumption of the application and reduce slack times. The communication time of a processor $j$ in cluster $i$ is noted as @@ -1055,7 +1054,7 @@ static energies for $M_i$ processors in $N$ clusters. It is computed as follows To optimize both of the energy consumption model computed by \ref{eq:energy-grid} and the performance model computed by \ref{eq:perf-grid}, -they must be normalized as in \ref{eq:enorm-heter} and \ref{eq:pnorm-heter} Equations respectively. +they must be normalized as in Equation \ref{eq:enorm-heter} and Equation \ref{eq:pnorm-heter} respectively. While the original energy consumption is the consumed energy with the maximum frequency for all the nodes computed as follows: @@ -1149,7 +1148,7 @@ of scaling factors that satisfies (\ref{eq:max-grid}) can be selected. \begin{figure}[!t] \centering \includegraphics[scale=0.7]{fig/ch3/init_freq} - \caption{Selecting the initial frequencies in grid} + \caption{Selecting the initial frequencies in the grid architecture} \label{fig:st_freq-grid} \end{figure} @@ -1165,9 +1164,9 @@ In this section, the scaling factors selection algorithm for a grid, Algorithm~\ is presented. It selects the vector of frequency scaling factors that gives the best trade-off between minimizing the energy consumption and maximizing the performance of a message passing -synchronous iterative application executed on a grid. + application with synchronous iterations executed on a grid. It is similar to the frequency selection algorithm for heterogeneous -local clusters presented in section \ref{ch3:1:4}. +local clusters presented in Section \ref{ch3:1:4}. The value of the initial frequency scaling factor for each node is inversely proportional to its computation time that was gathered in the first iteration. The initial @@ -1237,7 +1236,7 @@ $\lbrace\Theta_1,\Theta_2\rbrace$ is the time interval for the measured idle po Therefore, the dynamic power of one core is computed as the difference between the maximum measured value in maximum powers vector and the minimum measured value in the idle powers vector. -On the other hand, the static power consumption by one core is a part of the measured idle power consumption of the node. Since in Grid'5000 there is no way to measure precisely the consumed static power and it was assumed, as in sections \ref{ch3:2} and \ref{ch2:6}, that the static power represents a ratio of the dynamic power, the value of the static power is assumed to be equal to 20\% of the dynamic power consumption of the core. +On the other hand, the static power consumption by one core is a part of the measured idle power consumption of the node. Since in Grid'5000 there is no way to measure precisely the consumed static power and it was assumed, as in Sections \ref{ch3:2} and \ref{ch2:6}, that the static power represents a ratio of the dynamic power, the value of the static power is assumed to be equal to 20\% of the dynamic power consumption of the core. In the experiments presented in the following sections, two sites of Grid'5000 were used, Lyon and Nancy sites. These two sites have in total seven different clusters as shown in Figure~\ref{fig:grid5000}. @@ -1297,7 +1296,6 @@ The benchmarks have seven different classes, S, W, A, B, C, D and E, that repres \end{tabular} \label{table:grid5000-1} \end{table} -CPUs \subsection{The experimental results of the scaling algorithm on a Grid} @@ -1387,7 +1385,7 @@ results in a lower energy consumption. Indeed, the dynamic consumed power is exponentially related to the CPU's frequency value. On the other hand, the increase in the number of computing nodes can increase the communication times and thus produces less energy saving depending on the benchmarks being executed. The results of the benchmarks CG, MG, BT and FT show more -energy saving percentage in the one site scenario when executed over 16 nodes than on 32 nodes. LU and SP consume more energy with 16 nodes than with 32 node on one site because their computations to communications ratio is not affected by the increase of the number of local communications. +energy saving percentage in the one site scenario when executed over 16 nodes than on 32 nodes. LU and SP consume more energy with 16 nodes than with 32 nodes on one site because their computations to communications ratio is not affected by the increase of the number of local communications. \begin{figure}[!h] \centering \centering @@ -1400,21 +1398,21 @@ energy saving percentage in the one site scenario when executed over 16 nodes th \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/eng_s.eps} -\caption{The energy reduction while executing the NAS benchmarks over different scenarios} +\caption{The energy reduction percentages while executing the NAS benchmarks over different scenarios} \label{fig:eng_s} \end{figure*} \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/per_d.eps} -\caption{The performance degradation of the NAS benchmarks over different scenarios} +\caption{The performance degradation percentages of the NAS benchmarks over different scenarios} \label{fig:per_d} \end{figure*} \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/dist.eps} -\caption{The trade-off distance between the energy reduction and the performance of the NAS benchmarks +\caption{The trade-off distance percentages between the energy reduction and the performance of the NAS benchmarks over different scenarios} \label{fig:dist-grid} \end{figure*} @@ -1486,13 +1484,13 @@ Scenario name & Cluster name & Nodes per cluster & \begin{figure}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/time.eps} - \caption{The execution times of NAS benchmarks running over the one core and the multi-core scenarios} + \caption{The execution times of the NAS benchmarks running over the one core and the multi-core scenarios} \label{fig:time-mc} \end{figure} \begin{figure}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/eng_con.eps} - \caption{The energy consumptions and execution times of NAS benchmarks over one core and multi-core per node architectures} + \caption{The energy consumptions and execution times of the NAS benchmarks over one core and multi-core per node architectures} \label{fig:eng-cons-mc} \end{figure} @@ -1518,21 +1516,21 @@ scenarios because there are no or small communications. Contrary to EP and MG, \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/eng_s_mc.eps} - \caption{The energy saving of running NAS benchmarks over one core and multi-core scenarios} + \caption{The energy saving percentages of running NAS benchmarks over one core and multi-core scenarios} \label{fig:eng-s-mc} \end{figure*} \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/per_d_mc.eps} - \caption{The performance degradation of running NAS benchmarks over one core and multi-core scenarios} + \caption{The performance degradation percentages of running NAS benchmarks over one core and multi-core scenarios} \label{fig:per-d-mc} \end{figure*} \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/dist_mc.eps} - \caption{The trade-off distance of running NAS benchmarks over one core and multi-core scenarios} + \caption{The trade-off distance percentages of running NAS benchmarks over one core and multi-core scenarios} \label{fig:dist-mc} \end{figure*} The energy saving percentages of all the NAS benchmarks running over these two scenarios are presented in Figure~\ref{fig:eng-s-mc}. @@ -1577,7 +1575,7 @@ In these experiments, the class D of the NAS parallel benchmarks were executed o \begin{figure}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/dist_pow.eps} - \caption{The trade-off distance between the energy reduction and the performance of the NAS benchmarks over the three power scenarios} + \caption{The trade-off distance percentages between the energy reduction and the performance of the NAS benchmarks over the three power scenarios} \label{fig:dist-pow} \end{figure} @@ -1632,21 +1630,21 @@ presented in Figures~\ref{fig:edp-eng}, \ref{fig:edp-perf} and \ref{fig:edp-dis \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/edp_eng} -\caption{The energy reduction induced by the Maxdist method and the EDP method} +\caption{The energy reduction percentages induced by the Maxdist method and the EDP method} \label{fig:edp-eng} \end{figure*} \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/edp_per} -\caption{The performance degradation induced by the Maxdist method and the EDP method} +\caption{The performance degradation percentages induced by the Maxdist method and the EDP method} \label{fig:edp-perf} \end{figure*} \begin{figure*}[!h] \centering \includegraphics[width=.7\textwidth]{fig/ch3/edp_dist} -\caption{The trade-off distance between the energy consumption reduction and the performance for the Maxdist method and the EDP method} +\caption{The trade-off distance percentages between the energy consumption reduction and the performance for the Maxdist method and the EDP method} \label{fig:edp-dist} \end{figure*} @@ -1658,7 +1656,7 @@ Moreover, the proposed scaling algorithm gives the same weight for these two met Whereas, the EDP algorithm gives sometimes negative trade-off values for some benchmarks in the two sites scenarios. These negative trade-off values mean that the performance degradation percentage is higher than the energy saving percentage. The high positive values of the trade-off distance percentage mean that the energy saving percentage is much higher than the performance degradation percentage. -The complexity of both algoriths, Maxdist and EDP, are of order $O(N \cdot M_i \cdot F_j)$ and +The complexity of both algorithms, Maxdist and EDP, are of order $O(N \cdot M_i \cdot F_j)$ and $O(N \cdot M_i \cdot F_j^2)$ respectively, where $N$ is the number of the clusters, $M_i$ is the number of nodes and $F_j$ is the maximum number of available frequencies of node $j$. When Maxdist is applied to a benchmark that is being executed over 32 nodes distributed between Nancy and Lyon sites, it takes on average $0.01$ $ms$ to compute the best frequencies while the EDP method is on average ten times slower over the same architecture. @@ -1669,8 +1667,8 @@ In this chapter, two new online frequency scaling factors selecting algorithms maximum distance (optimal trade-off) between the predicted energy and the predicted performance curves for a heterogeneous cluster and grid. Both algorithms use a new energy models for measuring and predicting the energy consumption of message passing -iterative applications running over a heterogeneous local cluster and a grid platform. -Firstly, the proposed scaling factors selection algorithm for a heterogeneous local cluster is applied to the class C of NAS parallel benchmarks and simulated by SimGrid. + applications with iterations running over a heterogeneous local cluster and a grid platform. +Firstly, the proposed scaling factors selection algorithm for a heterogeneous local cluster is applied to the class C of the NAS parallel benchmarks and simulated by SimGrid. The results of the simulations showed that the algorithm on average reduces by 29.8\% the energy consumption of the NAS benchmarks executed over 8 nodes while limiting the degradation of the performance by 3.8\%. The algorithm also selects different scaling factors according to the percentage of the computing and communication times, and according to the