different dynamic and static powers from the nodes of the other clusters,
noted as $\Pd[ij]$ and $\Ps[ij]$ respectively. Therefore, even if the distributed
message passing iterative application is load balanced, the computation time of each CPU $j$
in cluster $i$, noted $\Tcp[ij]$, may be slightly different due to the delay caused by the scheduler of the operating system. Therefore, different frequency scaling factors may be
computed in order to decrease the overall energy consumption of the application
and reduce the slack times. The communication time of a processor $j$ in cluster $i$ is noted as
$\Tcm[ij]$ and could contain slack times when communicating with slower nodes,
see Figure~\ref{fig:heter}. Therefore, all nodes do not have equal
communication times. The initial frequency scaling factor $\Scp[ij]$ of each processor is computed as the ratio between the computation time
of the slowest node and the computation time of the node $(i,j)$ as follows:
\begin{equation}
\label{eq:Scp}
  \Scp[ij] = \frac{\max\limits_{\substack{i=1,\dots,N \\ j=1,\dots,M}} (\Tcp[ij])}{\Tcp[ij]}
\end{equation}
Using the initial frequency scaling factors computed in (\ref{eq:Scp}), the
algorithm computes the initial frequencies for all nodes as the ratio between the
maximum frequency of each node and its computed initial frequency scaling factor.
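To make these two steps concrete, the short Python sketch below reproduces the computation of Equation~(\ref{eq:Scp}) and of the initial frequencies. The arrays \texttt{Tcp} and \texttt{Fmax} hold hypothetical measured computation times and maximum frequencies; they are illustrative values only, not taken from the experiments reported below.
\begin{verbatim}
import numpy as np

# Hypothetical measured computation times Tcp[i][j] (seconds) and
# maximum frequencies Fmax[i][j] (GHz) for N = 2 clusters of M = 2 nodes.
Tcp  = np.array([[4.2, 4.0],
                 [6.3, 6.1]])   # slowest node: Tcp = 6.3
Fmax = np.array([[2.3, 2.3],
                 [2.5, 2.5]])

# Eq. (Scp): ratio between the computation time of the slowest node
# and the computation time of node (i, j); the slowest node gets 1.
Scp = Tcp.max() / Tcp

# Initial frequency of each node: its maximum frequency divided by its
# initial scaling factor, so the slowest node keeps its maximum
# frequency while faster nodes are scaled down to match it.
F_init = Fmax / Scp
print(F_init)
\end{verbatim}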
\begin{table}[!t]
  \caption{The characteristics of the CPUs in the selected clusters}
  \centering
\begin{tabular}{|*{7}{c|}}
\hline
          &         & Max   & Min   & Diff. &              &               \\
  Cluster & CPU     & Freq. & Freq. & Freq. & No. of cores & Dynamic power \\
  Name    & model   & GHz   & GHz   & GHz   & per CPU      & of one core   \\
\hline
           & Intel   &      &     &     &   &            \\
  Taurus   & Xeon    & 2.3  & 1.2 & 0.1 & 6 & \np[W]{35} \\
           & E5-2630 &      &     &     &   &            \\
\hline
           & Intel   &      &     &       &   &            \\
  Graphene & Xeon    & 2.53 & 1.2 & 0.133 & 4 & \np[W]{23} \\
           & X3440   &      &     &       &   &            \\
\hline
           & Intel   &      &     &     &   &            \\
  Griffon  & Xeon    & 2.5  & 2   & 0.5 & 4 & \np[W]{46} \\
           & L5420   &      &     &     &   &            \\
\hline
           & Intel   &      &     &     &   &            \\
  Graphite & Xeon    & 2    & 1.2 & 0.1 & 8 & \np[W]{35} \\
           & E5-2650 &      &     &     &   &            \\
\hline
\end{tabular}
  \label{table:grid5000}
\end{table}
The overall energy consumption of all the benchmarks solving the class D instance
with the proposed frequency selection algorithm is computed
using the reduced energy consumption model, Equation~\ref{eq:energy}. This model uses the measured dynamic power shown in Table~\ref{table:grid5000},
and the static
power is assumed to be equal to 20\% of the dynamic power. The execution
time is measured for all the benchmarks over these different scenarios.
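As a small numerical illustration of this assumption, the static power of one core in each cluster follows directly from the measured dynamic powers of Table~\ref{table:grid5000}; the Python sketch below simply restates those measured values and applies the assumed 20\% factor.
\begin{verbatim}
# Measured dynamic power of one core in watts (Table: grid5000).
dynamic_power = {"Taurus": 35, "Graphene": 23,
                 "Griffon": 46, "Graphite": 35}

# Static power is assumed to be equal to 20% of the dynamic power.
static_power = {c: 0.2 * p for c, p in dynamic_power.items()}
# -> {'Taurus': 7.0, 'Graphene': 4.6, 'Griffon': 9.2, 'Graphite': 7.0}
\end{verbatim}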
More energy reduction can be gained when the computations to communications ratio is high, because it pushes the proposed scaling algorithm to select smaller frequencies, which decrease the dynamic power consumption. These experiments also showed that the energy
consumption and the execution times of the EP and MG benchmarks do not change significantly over these two
scenarios because they involve little or no communication. Contrary to EP and MG, the energy consumption and the execution times of the rest of the benchmarks vary according to the communication times, which differ from one scenario to the other.
\begin{figure*}[t]
  \centering
  \subfloat[The energy saving of running NAS benchmarks over one core and multicores scenarios]{%
    \includegraphics[width=.48\textwidth]{fig/eng_s_mc.eps}\label{fig:eng-s-mc}} \hspace{0.4cm}%
  \subfloat[The performance degradation of running NAS benchmarks over one core and multicores scenarios]{%
    \includegraphics[width=.48\textwidth]{fig/per_d_mc.eps}\label{fig:per-d-mc}}\hspace{0.4cm}%
  \subfloat[The tradeoff distance of running NAS benchmarks over one core and multicores scenarios]{%
    \includegraphics[width=.48\textwidth]{fig/dist_mc.eps}\label{fig:dist-mc}}
  \caption{The experimental results of one core and multi-cores scenarios}
  \label{fig:exp-res}
\end{figure*}
The energy saving percentages of all NAS benchmarks running over these two scenarios are presented in Figure~\ref{fig:eng-s-mc}.
The figure shows that the energy saving percentages in the one
core scenario are, on average, higher than those in the multicores scenario, because the computations to communications times ratio is higher in the former.
In these experiments, class D of the NAS parallel benchmarks is executed over the Nancy site. Sixteen computing nodes from the three clusters, Graphite, Graphene and Griffon, were used in this experiment.
\begin{figure*}[t]
  \centering
  \subfloat[The energy saving percentages for the nodes executing the NAS benchmarks over the three power scenarios]{%
    \includegraphics[width=.48\textwidth]{fig/eng_pow.eps}\label{fig:eng-pow}} \hspace{0.4cm}%
  \subfloat[The tradeoff distance percentages for the nodes executing the NAS benchmarks over the three power scenarios]{%
    \includegraphics[width=.48\textwidth]{fig/dist_pow.eps}\label{fig:dist-pow}}
  \caption{The experimental results of different static power scenarios}
  \label{fig:exp-pow}
\end{figure*}
In the near future, we would like to develop a similar method that is adapted to
asynchronous iterative applications where iterations are not synchronized and communications are overlapped with computations.
The development of such a method might require a new energy model because the
number of iterations is not known in advance and depends on
the global convergence of the iterative system.