-\subsection{The algorithm details}
-In this section, Algorithm~\ref{HSA} is presented. It selects the frequency
-scaling factors vector that gives the best trade-off between minimizing the
-energy consumption and maximizing the performance of a message passing
-synchronous iterative application executed on a heterogeneous platform. It works
-online during the execution time of the iterative message passing program. It
-uses information gathered during the first iteration such as the computation
-time and the communication time in one iteration for each node. The algorithm is
-executed after the first iteration and returns a vector of optimal frequency
-scaling factors that satisfies the objective function (\ref{eq:max}). The
-program applies DVFS operations to change the frequencies of the CPUs according
-to the computed scaling factors. This algorithm is called just once during the
-execution of the program. Algorithm~\ref{dvfs} shows where and when the proposed
-scaling algorithm is called in the iterative MPI program.
-
-The nodes in a heterogeneous platform have different computing powers, thus
-while executing message passing iterative synchronous applications, fast nodes
-have to wait for the slower ones to finish their computations before being able
-to synchronously communicate with them as in Figure~\ref{fig:heter}. These
-periods are called idle or slack times. The algorithm takes into account this
-problem and tries to reduce these slack times when selecting the frequency
-scaling factors vector. At first, it selects initial frequency scaling factors
-that increase the execution times of fast nodes and minimize the differences
-between the computation times of fast and slow nodes. The value of the initial
-frequency scaling factor for each node is inversely proportional to its
-computation time that was gathered from the first iteration. These initial
-frequency scaling factors are computed as a ratio between the computation time
-of the slowest node and the computation time of the node $i$ as follows:
-\begin{equation}
- \label{eq:Scp}
- \Scp[i] = \frac{\max_{i=1,2,\dots,N}(\Tcp[i])}{\Tcp[i]}
-\end{equation}
-Using the initial frequency scaling factors computed in (\ref{eq:Scp}), the algorithm computes
-the initial frequencies for all nodes as a ratio between the maximum frequency of node $i$
-and the computation scaling factor $\Scp[i]$ as follows:
-\begin{equation}
- \label{eq:Fint}
- F_{i} = \frac{\Fmax[i]}{\Scp[i]},~{i=1,2,\dots,N}
-\end{equation}
-If the computed initial frequency for a node is not available in the gears of
-that node, it is replaced by the nearest available frequency. In
-Figure~\ref{fig:st_freq}, the nodes are sorted by their computing power in
-ascending order and the frequencies of the faster nodes are scaled down
-according to the computed initial frequency scaling factors. The resulting new
-frequencies are highlighted in Figure~\ref{fig:st_freq}. This set of
-frequencies can be considered as a higher bound for the search space of the
-optimal vector of frequencies because selecting frequency scaling factors higher
-than the higher bound will not improve the performance of the application and it
-will increase its overall energy consumption. Therefore the algorithm that
-selects the frequency scaling factors starts the search method from these
-initial frequencies and takes a downward search direction toward lower
-frequencies. The algorithm iterates on all left frequencies, from the higher
-bound until all nodes reach their minimum frequencies, to compute their overall
-energy consumption and performance, and select the optimal frequency scaling
-factors vector. At each iteration the algorithm determines the slowest node
-according to the equation (\ref{eq:perf}) and keeps its frequency unchanged,
-while it lowers the frequency of all other nodes by one gear. The new overall
-energy consumption and execution time are computed according to the new scaling
-factors. The optimal set of frequency scaling factors is the set that gives the
-highest distance according to the objective function (\ref{eq:max}).
-
-Figures~\ref{fig:r1} and \ref{fig:r2} illustrate the normalized performance and
-consumed energy for an application running on a homogeneous platform and a
-heterogeneous platform respectively while increasing the scaling factors. It can
-be noticed that in a homogeneous platform the search for the optimal scaling
-factor should start from the maximum frequency because the performance and the
-consumed energy decrease from the beginning of the plot. On the other hand,
-in the heterogeneous platform the performance is maintained at the beginning of
-the plot even if the frequencies of the faster nodes decrease until the
-computing power of scaled down nodes are lower than the slowest node. In other
-words, until they reach the higher bound. It can also be noticed that the higher
-the difference between the faster nodes and the slower nodes is, the bigger the
-maximum distance between the energy curve and the performance curve is while
- the scaling factors are varying which results in bigger energy savings.
-\begin{figure}[!t]
- \centering
- \includegraphics[scale=0.5]{fig/start_freq}
- \caption{Selecting the initial frequencies}
- \label{fig:st_freq}
-\end{figure}
-
-
-
-