-\begin{algorithm}
- \begin{algorithmic}[1]
- % \footnotesize
- \Require ~
- \begin{description}
- \item[$Tcp_i$] array of all computation times for all nodes during one iteration and with highest frequency.
- \item[$Tcm_i$] array of all communication times for all nodes during one iteration and with highest frequency.
- \item[$Fmax_i$] array of the maximum frequencies for all nodes.
- \item[$Pd_i$] array of the dynamic powers for all nodes.
- \item[$Ps_i$] array of the static powers for all nodes.
- \item[$Fdiff_i$] array of the difference between two successive frequencies for all nodes.
- \end{description}
- \Ensure $Sopt_1,Sopt_2 \dots, Sopt_N$ is a vector of optimal scaling factors
-
- \State $ Scp_i \gets \frac{\max_{i=1,2,\dots,N}(Tcp_i)}{Tcp_i} $
- \State $F_{i} \gets \frac{Fmax_i}{Scp_i},~{i=1,2,\cdots,N}$
- \State Round the computed initial frequencies $F_i$ to the closest one available in each node.
- \If{(not the first frequency)}
- \State $F_i \gets F_i+Fdiff_i,~i=1,\dots,N.$
- \EndIf
- \State $T_\textit{Old} \gets max_{~i=1,\dots,N } (Tcp_i+Tcm_i)$
- \State $E_\textit{Original} \gets \sum_{i=1}^{N}{( Pd_i \cdot Tcp_i)} +\sum_{i=1}^{N} {(Ps_i \cdot T_{Old})}$
- \State $Sopt_{i} \gets 1,~i=1,\dots,N. $
- \State $Dist \gets 0 $
- \While {(all nodes not reach their minimum frequency)}
- \If{(not the last freq. \textbf{and} not the slowest node)}
- \State $F_i \gets F_i - Fdiff_i,~i=1,\dots,N.$
- \State $S_i \gets \frac{Fmax_i}{F_i},~i=1,\dots,N.$
- \EndIf
- \State $T_{New} \gets max_\textit{~i=1,\dots,N} (Tcp_{i} \cdot S_{i}) + MinTcm $
- \State $E_\textit{Reduced} \gets \sum_{i=1}^{N}{(S_i^{-2} \cdot Pd_i \cdot Tcp_i)} + $ \hspace*{43 mm}
- $\sum_{i=1}^{N} {(Ps_i \cdot T_{New})} $
- \State $ P_\textit{Norm} \gets \frac{T_\textit{Old}}{T_\textit{New}}$
- \State $E_\textit{Norm}\gets \frac{E_\textit{Reduced}}{E_\textit{Original}}$
- \If{$(\Pnorm - \Enorm > \Dist)$}
- \State $Sopt_{i} \gets S_{i},~i=1,\dots,N. $
- \State $\Dist \gets \Pnorm - \Enorm$
- \EndIf
- \EndWhile
- \State Return $Sopt_1,Sopt_2,\dots,Sopt_N$
- \end{algorithmic}
- \caption{frequency scaling factors selection algorithm}
- \label{HSA}
-\end{algorithm}
-
-\begin{algorithm}
- \begin{algorithmic}[1]
- % \footnotesize
- \For {$k=1$ to \textit{some iterations}}
- \State Computations section.
- \State Communications section.
- \If {$(k=1)$}
- \State Gather all times of computation and\newline\hspace*{3em}%
- communication from each node.
- \State Call Algorithm \ref{HSA}.
- \State Compute the new frequencies from the\newline\hspace*{3em}%
- returned optimal scaling factors.
- \State Set the new frequencies to nodes.
- \EndIf
- \EndFor
- \end{algorithmic}
- \caption{DVFS algorithm}
- \label{dvfs}
-\end{algorithm}
-
-\subsection{The evaluation of the proposed algorithm}
-\label{sec.verif.algo}
-The precision of the proposed algorithm mainly depends on the execution time
-prediction model defined in (\ref{eq:perf}) and the energy model computed by
-(\ref{eq:energy}). The energy model is also significantly dependent on the
-execution time model because the static energy is linearly related to the
-execution time and the dynamic energy is related to the computation time. So,
-all the works presented in this paper are based on the execution time model. To
-verify this model, the predicted execution time was compared to the real
-execution time over SimGrid/SMPI simulator,
-v3.10~\cite{casanova+giersch+legrand+al.2014.versatile}, for all the NAS
-parallel benchmarks NPB v3.3 \cite{NAS.Parallel.Benchmarks}, running class B on
-8 or 9 nodes. The comparison showed that the proposed execution time model is
-very precise, the maximum normalized difference between the predicted execution
-time and the real execution time is equal to 0.03 for all the NAS benchmarks.
-
-Since the proposed algorithm is not an exact method it does not test all the
-possible solutions (vectors of scaling factors) in the search space. To prove
-its efficiency, it was compared on small instances to a brute force search
-algorithm that tests all the possible solutions. The brute force algorithm was
-applied to different NAS benchmarks classes with different number of nodes. The
-solutions returned by the brute force algorithm and the proposed algorithm were
-identical and the proposed algorithm was on average 10 times faster than the
-brute force algorithm. It has a small execution time: for a heterogeneous
-cluster composed of four different types of nodes having the characteristics
-presented in Table~\ref{table:platform}, it takes on average \np[ms]{0.04} for 4
-nodes and \np[ms]{0.15} on average for 144 nodes to compute the best scaling
-factors vector. The algorithm complexity is $O(F\cdot (N \cdot4) )$, where $F$
-is the number of iterations and $N$ is the number of computing nodes. The
-algorithm needs from 12 to 20 iterations to select the best vector of frequency
-scaling factors that gives the results of the next sections.
-