-\begin{algorithm}
- \begin{algorithmic}[1]
- % \footnotesize
- \Require ~
- \begin{description}
- \item[$\Tcp_i$] array of all computation times for all nodes during one iteration and with highest frequency.
- \item[$\Tcm_i$] array of all communication times for all nodes during one iteration and with highest frequency.
- \item[$\Fmax_i$] array of the maximum frequencies for all nodes.
- \item[$\Pd_i$] array of the dynamic powers for all nodes.
- \item[$\Ps_i$] array of the static powers for all nodes.
- \item[$\Fdiff_i$] array of the difference between two successive frequencies for all nodes.
- \end{description}
- \Ensure $\Sopt_1,\Sopt_2 \dots, \Sopt_N$ is a vector of optimal scaling factors
-
- \State $\Scp_i \gets \frac{\max_{i=1,2,\dots,N}(\Tcp_i)}{\Tcp_i} $
- \State $F_{i} \gets \frac{\Fmax_i}{\Scp_i},~{i=1,2,\cdots,N}$
- \State Round the computed initial frequencies $F_i$ to the closest one available in each node.
- \If{(not the first frequency)}
- \State $F_i \gets F_i+\Fdiff_i,~i=1,\dots,N.$
- \EndIf
- \State $\Told \gets max_{~i=1,\dots,N } (\Tcp_i+\Tcm_i)$
- % \State $\Eoriginal \gets \sum_{i=1}^{N}{( \Pd_i \cdot \Tcp_i)} +\sum_{i=1}^{N} {(\Ps_i \cdot \Told)}$
- \State $\Eoriginal \gets \sum_{i=1}^{N}{( \Pd_i \cdot \Tcp_i + \Ps_i \cdot \Told)}$
- \State $\Sopt_{i} \gets 1,~i=1,\dots,N. $
- \State $\Dist \gets 0 $
- \While {(all nodes not reach their minimum frequency)}
- \If{(not the last freq. \textbf{and} not the slowest node)}
- \State $F_i \gets F_i - \Fdiff_i,~i=1,\dots,N.$
- \State $S_i \gets \frac{\Fmax_i}{F_i},~i=1,\dots,N.$
- \EndIf
- \State $\Tnew \gets max_\textit{~i=1,\dots,N} (\Tcp_{i} \cdot S_{i}) + \MinTcm $
-% \State $\Ereduced \gets \sum_{i=1}^{N}{(S_i^{-2} \cdot \Pd_i \cdot \Tcp_i)} + \sum_{i=1}^{N} {(\Ps_i \cdot \rlap{\Tnew)}} $
- \State $\Ereduced \gets \sum_{i=1}^{N}{(S_i^{-2} \cdot \Pd_i \cdot \Tcp_i + \Ps_i \cdot \rlap{\Tnew)}} $
- \State $\Pnorm \gets \frac{\Told}{\Tnew}$
- \State $\Enorm\gets \frac{\Ereduced}{\Eoriginal}$
- \If{$(\Pnorm - \Enorm > \Dist)$}
- \State $\Sopt_{i} \gets S_{i},~i=1,\dots,N. $
- \State $\Dist \gets \Pnorm - \Enorm$
- \EndIf
- \EndWhile
- \State Return $\Sopt_1,\Sopt_2,\dots,\Sopt_N$
- \end{algorithmic}
- \caption{frequency scaling factors selection algorithm}
- \label{HSA}
-\end{algorithm}
-
-\begin{algorithm}
- \begin{algorithmic}[1]
- % \footnotesize
- \For {$k=1$ to \textit{some iterations}}
- \State Computations section.
- \State Communications section.
- \If {$(k=1)$}
- \State Gather all times of computation and\newline\hspace*{3em}%
- communication from each node.
- \State Call Algorithm \ref{HSA}.
- \State Compute the new frequencies from the\newline\hspace*{3em}%
- returned optimal scaling factors.
- \State Set the new frequencies to nodes.
- \EndIf
- \EndFor
- \end{algorithmic}
- \caption{DVFS algorithm}
- \label{dvfs}
-\end{algorithm}
-
-\subsection{The evaluation of the proposed algorithm}
-\label{sec.verif.algo}
-The precision of the proposed algorithm mainly depends on the execution time
-prediction model defined in (\ref{eq:perf}) and the energy model computed by
-(\ref{eq:energy}). The energy model is also significantly dependent on the
-execution time model because the static energy is linearly related to the
-execution time and the dynamic energy is related to the computation time. So,
-all the works presented in this paper are based on the execution time model. To
-verify this model, the predicted execution time was compared to the real
-execution time over SimGrid/SMPI simulator,
-v3.10~\cite{casanova+giersch+legrand+al.2014.versatile}, for all the NAS
-parallel benchmarks NPB v3.3 \cite{NAS.Parallel.Benchmarks}, running class B on
-8 or 9 nodes. The comparison showed that the proposed execution time model is
-very precise, the maximum normalized difference between the predicted execution
-time and the real execution time is equal to 0.03 for all the NAS benchmarks.
-
-Since the proposed algorithm is not an exact method it does not test all the
-possible solutions (vectors of scaling factors) in the search space. To prove
-its efficiency, it was compared on small instances to a brute force search
-algorithm that tests all the possible solutions. The brute force algorithm was
-applied to different NAS benchmarks classes with different number of nodes. The
-solutions returned by the brute force algorithm and the proposed algorithm were
-identical and the proposed algorithm was on average 10 times faster than the
-brute force algorithm. It has a small execution time: for a heterogeneous
-cluster composed of four different types of nodes having the characteristics
-presented in Table~\ref{table:platform}, it takes on average \np[ms]{0.04} for 4
-nodes and \np[ms]{0.15} on average for 144 nodes to compute the best scaling
-factors vector. The algorithm complexity is $O(F\cdot (N \cdot4) )$, where $F$
-is the number of iterations and $N$ is the number of computing nodes. The
-algorithm needs from 12 to 20 iterations to select the best vector of frequency
-scaling factors that gives the results of the next sections.
-