X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/mpi-energy2.git/blobdiff_plain/10207d5367e3906a00aea279701d9444e0ab7aef..fe79318beef87d419e3952d9a1140372bd71bf4e:/mpi-energy2-extension/Heter_paper.tex diff --git a/mpi-energy2-extension/Heter_paper.tex b/mpi-energy2-extension/Heter_paper.tex index 0906133..a7aa66d 100644 --- a/mpi-energy2-extension/Heter_paper.tex +++ b/mpi-energy2-extension/Heter_paper.tex @@ -183,7 +183,7 @@ used in the method to optimize both the energy consumption and the performance of iterative methods, which is presented in the following sections. -\subsection{Energy model for heterogeneous platform} +\subsection{Energy model for heterogeneous grid platform} Many researchers~\cite{Malkowski_energy.efficient.high.performance.computing, Rauber_Analytical.Modeling.for.Energy,Zhuo_Energy.efficient.Dynamic.Task.Scheduling, @@ -831,7 +831,8 @@ communication ratio. Moreover, as shown in the figure \ref{fig:time_sen}, the ex are less by approximately double, linear speed-up, for most of the benchmarks comparing to the one site with 16 nodes scenario. This leads to increased the number of the critical nodes which any one of them may increased the overall the execution time of the benchmarks. The EP benchmarks is gives the bigger performance degradation ratio, because there is no -communications and no slack times in this benchmarks which their performance govern +communications and no slack times in this benchmarks which their performance controlled by +the computing powers of the nodes. The tradeoff between these scenarios can be computed as in the tradeoff function \ref{eq:max}. Figure \ref{fig:dist}, presents the tradeoff distance for all benchmarks over all platform scenarios. The one site scenario with 16 and 32 nodes had the best tradeoff distance @@ -841,6 +842,9 @@ which on average is up 26\%. Therefore, the tradeoff distance is related linear percentage. Finally, the best energy and performance tradeoff depends on the all of the following: 1) the computations to communications ratio when there is a communications and slack times, 2) the differences in computing powers between the computing nodes and 3) the differences in static and the dynamic powers of the nodes.} + + + \subsection{The experimental results of multicores clusters} \label{sec.res-mc} The grid'5000 clusters have different number of cores embedded in their nodes @@ -993,17 +997,175 @@ Scenario name & Cluster name & \begin{tabular}[c]{@{}c@ \label{fig:dist-mc} \end{figure} -\subsection{The results for different power consumption scenarios} -\label{sec.compare} +\subsection{The results of using different static power consumption scenarios} +\label{sec.pow_sen} +The static power consumption for one core of the computing node is the leakage power +consumption when this core is in the idle state. The node's idle state power value that measured +as in section \ref{sec.grid5000} had many power consumptions embedded such as +all cores static powers in addition to the power consumption of the other devices. So, the static power for one core +can't measured precisely. On the other hand, while the static power consumption of +one core representing the core's power when there is no any computation, thus +the majority of ratio of the total power consumption is depends on the dynamic power consumption. +Despite that, the static power consumption is becomes more important when the execution time +increased using DVFS. Therefore, the objective of this section is to verify the ability of the proposed +frequencies selecting algorithm when the static power consumption is changed. + +All the results obtained in the previous sections depend on the measured dynamic power +consumptions as in table \ref{table:grid5000}. Moreover, the static power consumption is assumed for +one core represents 20\% of the measured dynamic power of that core. +This assumption is extended in this section to taking into account others ratios for the static power consumption. +In addition to the previous ratio of the static power consumption, two other scenarios are used which +all of these scenarios can be denoted as follow: +\begin{itemize} +\item 10\% of static power scenario +\item 20\% of static power scenario +\item 30\% of static power scenario +\end{itemize} +These three scenarios represented the ratio of the static power consumption that can be computed from +the dynamic power consumption of the core. The NAS benchmarks of class D are executed over 16 nodes +in the Nancy site using three clusters: Graphite, Graphene and Griffon. As same as used before, the one site 16 nodes +platform scenario explained in the last experiments, as in table \ref{tab:sc}, is uses to run +the NAS benchmarks with these static power scenarios. + \begin{figure} + \centering + \includegraphics[scale=0.5]{fig/eng_pow.eps} + \caption{The energy saving percentages for NAS benchmarks of the three power scenario} + \label{fig:eng-pow} +\end{figure} + +\begin{figure} + \centering + \includegraphics[scale=0.5]{fig/per_pow.eps} + \caption{The performance degradation percentages for NAS benchmarks of the three power scenario} + \label{fig:per-pow} +\end{figure} -\subsection{The comparison of the proposed scaling algorithm } +\begin{figure} + \centering + \includegraphics[scale=0.5]{fig/dist_pow.eps} + \caption{The tradeoff distance for NAS benchmarks of the three power scenario} + \label{fig:dist-pow} +\end{figure} + +\begin{figure} + \centering + \includegraphics[scale=0.47]{fig/three_scenarios.pdf} + \caption{Comparing the selected frequencies of MG benchmarks for three static power scenarios} + \label{fig:fre-pow} +\end{figure} + +The energy saving percentages of NAS benchmarks with these three static power scenarios are presented +in figure \ref{fig:eng_sen}. This figure showed the 10\% of static power scenario +gives the biggest energy saving percentage comparing to 20\% and 30\% static power +scenario. When using smaller ratio of static power consumption, the proposed +frequencies selecting algorithm selects smaller frequencies, bigger scaling factors, +because the static energy consumption not increased significantly the overall energy +consumption. Therefore, more energy reduction can be achieved when the frequencies are scaled down. +For example figure \ref{fig:fre-pow}, illustrated that the proposed algorithm +proportionally scaled down the new computed frequencies with the overall predicted energy +consumption. The results of 30\% static power scenario gives the smallest energy saving percentages +because the new selected frequencies produced smaller ratio in the reduced energy consumption. +Furthermore, The proposed algorithm tries to limit selecting smaller frequencies that increased +the static energy consumption if the static power consumption is increased. +The performance degradation percentages are presented in the figure \ref{fig:per-pow}, +the 30\% of static power scenario had less performance degradation percentage, because +bigger frequencies was selected due to the big ratio in the static power consumption. +The inverse was happens in the 20\% and 30\% scenario, the algorithm was selected +biggest frequencies, smaller scaling factors, according to this increased in the static power ratios. +The tradoff distance for the NAS benchmarks with these three static powers scenarios +are presented in the figure \ref{fig:dist}. The results showed that the tradeoff +distance is the best when the 10\% of static power scenario is used, and this percentage +is decreased for the other two scenarios propositionally to their static power ratios. +In EP benchmarks, the results of energy saving, performance degradation and tradeoff +distance are showed small differences when the these static power scenarios were used, +because this benchmark not has communications. The proposed algorithm is selected +same frequencies in this benchmark when all these static power scenarios are used. +The small differences in the results are due to the static power is consumed during the computation +times side by side to the dynamic power consumption, knowing that the dynamic power consumption +representing the highest ratio in the total power consumption of the core, then any change in +the static power during these times have less affect on the overall energy consumption. While the +inverse was happens for the rest of the benchmarks which have the communications +that increased the static energy consumption linearly to the mount of communications +in these benchmarks. + + + +\subsection{The comparison of the proposed frequencies selecting algorithm } \label{sec.compare_EDP} +The tradeoff between the energy consumption and the performance of the parallel +application had significant importance in the domain of the research. +Many researchers, \cite{EDP_for_multi_processors,Energy_aware_application_scheduling,Exploring_Energy_Performance_TradeOffs}, +are optimized the tradeoff between the energy and performance using the energy and delay product, $EDP=energy \times delay$. +This model is used by Spiliopoulos et al. algorithm \cite{Spiliopoulos_Green.governors.Adaptive.DVFS}, +the objective is to selects the suitable frequencies that minimized EDP product for the multicores +architecture when DVFS is used. Moreover, their algorithm is applied online which synchronously optimized the energy consumption +and the execution time. Both energy consumption and execution time of a processor are predicted by the their algorithm. +In this section the proposed frequency selection algorithm, called Maxdist is compared with Spiliopoulos et al. algorithm, called EDP. +To make both of the algorithms follow the same direction and fairly comparing them, the same energy model, equation \ref{eq:energy} and +the execution time model, equation \ref{eq:perf}, are used in the prediction process to select the best vector of the frequencies. +In contrast, the proposed algorithm starts the search space from the lower bound computed as in equation the \ref{eq:Fint}. Also, the algorithm +stops the search process when reaching to the lower bound as mentioned before. While, the EDP algorithm is developed to start from the +same upper bound until it reach to the minimum available frequencies. Finally, resulting the algorithm is an exhaustive search algorithm that +test all possible frequencies, starting from the initial frequencies, and selecting those minimized the EDP products. + +Both algorithms were applied to NAS benchmarks class D over 16 nodes selected from grid'5000 clusters. +The participating computing nodes are distributed between two sites to had two different scenarios. +These scenarios are two sites and one site scenarios that explained previously. +The experimental results of the energy saving, performance degradation and tradeoff distance are +presented in the figures \ref{fig:edp-eng}, \ref{fig:edp-perf} and \ref{fig:edp-dist} respectively. + +In one site scenario the proposed frequencies selection algorithm outperform the EDP algorithm +in term of energy and performance for all of the benchmarks. While, the compassion results from the two sites scenario +showed that the proposed algorithm outperform EDP algorithm for all benchmarks except MG benchmark. +In case of MG benchmark the are small communications and bigger frequencies selected in EDP algorithm +decreased the performance degradation more than the frequencies selected by Maxdist algorithm. +While the energy saving percentage are higher for Maxdist algorithm. + +Generally, the proposed algorithm gives better results for all benchmarks because it +optimized the distance between the energy saving and the performance degradation. +Whereas, in EDP algorithm gives negative tradeoff for some benchmarks in the two sites scenarios. +These negative tradeoffs mean the performance degradation percentage is higher than energy saving percentage. +The higher positive value for tradeoff distance is mean the best energy and performance tradeoff is achieved synchronously, when +the energy saving percentage is much higher than the performance degradation percentage +The time complexity of the proposed algorithm is $O(N \cdot M \cdot F)$, where $N$ is the number of the clusters, +$M$ is the number of nodes and $F$ is the maximum number of available frequencies. The algorithm is selected +the best frequencies in small execution time, on average is equal to 0.01 $ms$ when it works over 32 nodes. +While the EDP algorithm was slower than Maxdist algorithm by ten times, where their execution time on average +takes 0.1 $ms$ to selects the suitable frequencies over 32 nodes. +The time complexity of this algorithm is $O(N^2 \cdot M^2 \cdot F)$. + + + + + + + +\begin{figure} + \centering + \includegraphics[scale=0.5]{fig/edp_eng} + \caption{Comparing of the energy saving for the proposed method with EDP method} + \label{fig:edp-eng} +\end{figure} + +\begin{figure} + \centering + \includegraphics[scale=0.5]{fig/edp_per} + \caption{Comparing of the performance degradation for the proposed method with EDP method} + \label{fig:edp-perf} +\end{figure} +\begin{figure} + \centering + \includegraphics[scale=0.5]{fig/edp_dist} + \caption{Comparing of the tradeoff distance for the proposed method with EDP method} + \label{fig:edp-dist} +\end{figure} + \section{Conclusion} \label{sec.concl}