-\begin{algorithm}
- \begin{algorithmic}[1]
- % \footnotesize
- \Require ~
- \begin{description}
- \item[{$\Tcp[i]$}] array of all computation times for all nodes during one iteration and with highest frequency.
- \item[{$\Tcm[i]$}] array of all communication times for all nodes during one iteration and with highest frequency.
- \item[{$\Fmax[i]$}] array of the maximum frequencies for all nodes.
- \item[{$\Pd[i]$}] array of the dynamic powers for all nodes.
- \item[{$\Ps[i]$}] array of the static powers for all nodes.
- \item[{$\Fdiff[i]$}] array of the difference between two successive frequencies for all nodes.
- \end{description}
- \Ensure $\Sopt[1],\Sopt[2] \dots, \Sopt[N]$ is a vector of optimal scaling factors
-
- \State $\Scp[i] \gets \frac{\max_{i=1,2,\dots,N}(\Tcp[i])}{\Tcp[i]} $
- \State $F_{i} \gets \frac{\Fmax[i]}{\Scp[i]},~{i=1,2,\cdots,N}$
- \State Round the computed initial frequencies $F_i$ to the closest one available in each node.
- \If{(not the first frequency)}
- \State $F_i \gets F_i+\Fdiff[i],~i=1,\dots,N.$
- \EndIf
- \State $\Told \gets \max_{i=1,\dots,N} (\Tcp[i]+\Tcm[i])$
- % \State $\Eoriginal \gets \sum_{i=1}^{N}{( \Pd[i] \cdot \Tcp[i])} +\sum_{i=1}^{N} {(\Ps[i] \cdot \Told)}$
- \State $\Eoriginal \gets \sum_{i=1}^{N}{( \Pd[i] \cdot \Tcp[i] + \Ps[i] \cdot \Told)}$
- \State $\Sopt[i] \gets 1,~i=1,\dots,N. $
- \State $\Dist \gets 0 $
- \While {(all nodes not reach their minimum frequency)}
- \If{(not the last freq. \textbf{and} not the slowest node)}
- \State $F_i \gets F_i - \Fdiff[i],~i=1,\dots,N.$
- \State $S_i \gets \frac{\Fmax[i]}{F_i},~i=1,\dots,N.$
- \EndIf
- \State $\Tnew \gets \max_{i=1,\dots,N} (\Tcp[i] \cdot S_{i}) + \MinTcm $
-% \State $\Ereduced \gets \sum_{i=1}^{N}{(S_i^{-2} \cdot \Pd[i] \cdot \Tcp[i])} + \sum_{i=1}^{N} {(\Ps[i] \cdot \rlap{\Tnew)}} $
- \State $\Ereduced \gets \sum_{i=1}^{N}{(S_i^{-2} \cdot \Pd[i] \cdot \Tcp[i] + \Ps[i] \cdot \rlap{\Tnew)}} $
- \State $\Pnorm \gets \frac{\Told}{\Tnew}$
- \State $\Enorm\gets \frac{\Ereduced}{\Eoriginal}$
- \If{$(\Pnorm - \Enorm > \Dist)$}
- \State $\Sopt[i] \gets S_{i},~i=1,\dots,N. $
- \State $\Dist \gets \Pnorm - \Enorm$
- \EndIf
- \EndWhile
- \State Return $\Sopt[1],\Sopt[2],\dots,\Sopt[N]$
- \end{algorithmic}
- \caption{frequency scaling factors selection algorithm}
- \label{HSA}
-\end{algorithm}
-
-\begin{algorithm}
- \begin{algorithmic}[1]
- % \footnotesize
- \For {$k=1$ to \textit{some iterations}}
- \State Computations section.
- \State Communications section.
- \If {$(k=1)$}
- \State Gather all times of computation and\newline\hspace*{3em}%
- communication from each node.
- \State Call Algorithm \ref{HSA}.
- \State Compute the new frequencies from the\newline\hspace*{3em}%
- returned optimal scaling factors.
- \State Set the new frequencies to nodes.
- \EndIf
- \EndFor
- \end{algorithmic}
- \caption{DVFS algorithm}
- \label{dvfs}
-\end{algorithm}
-
-\subsection{The evaluation of the proposed algorithm}
-\label{sec.verif.algo}
-The precision of the proposed algorithm mainly depends on the execution time
-prediction model defined in (\ref{eq:perf}) and the energy model computed by
-(\ref{eq:energy}). The energy model is also significantly dependent on the
-execution time model because the static energy is linearly related to the
-execution time and the dynamic energy is related to the computation time. So,
-all the works presented in this paper are based on the execution time model. To
-verify this model, the predicted execution time was compared to the real
-execution time over SimGrid/SMPI simulator,
-v3.10~\cite{casanova+giersch+legrand+al.2014.versatile}, for all the NAS
-parallel benchmarks NPB v3.3 \cite{NAS.Parallel.Benchmarks}, running class B on
-8 or 9 nodes. The comparison showed that the proposed execution time model is
-very precise, the maximum normalized difference between the predicted execution
-time and the real execution time is equal to 0.03 for all the NAS benchmarks.
-
-Since the proposed algorithm is not an exact method it does not test all the
-possible solutions (vectors of scaling factors) in the search space. To prove
-its efficiency, it was compared on small instances to a brute force search
-algorithm that tests all the possible solutions. The brute force algorithm was
-applied to different NAS benchmarks classes with different number of nodes. The
-solutions returned by the brute force algorithm and the proposed algorithm were
-identical and the proposed algorithm was on average 10 times faster than the
-brute force algorithm. It has a small execution time: for a heterogeneous
-cluster composed of four different types of nodes having the characteristics
-presented in Table~\ref{table:platform}, it takes on average \np[ms]{0.04} for 4
-nodes and \np[ms]{0.15} on average for 144 nodes to compute the best scaling
-factors vector. The algorithm complexity is $O(F\cdot (N \cdot4) )$, where $F$
-is the number of iterations and $N$ is the number of computing nodes. The
-algorithm needs from 12 to 20 iterations to select the best vector of frequency
-scaling factors that gives the results of the next sections.
-