-In this section we are proposed a heterogeneous scaling algorithm,
-(figure~\ref{HSA}), that selects the optimal vector of the frequency scaling factors from each
-node. The algorithm is numerates the suitable range of available vectors of the frequency scaling
-factors for all node in a heterogeneous cluster, returns a vector of optimal
-frequency scaling factors define as $Sopt_1,Sopt_2,\dots,Sopt_N$. Using heterogeneous cluster
-has different computing powers is produces different workloads for each node. Therefore, the fastest nodes waiting at the
-synchronous barrier for the slowest nodes to finish there work as in figure
-(\ref{fig:heter}). Our algorithm is takes into account these imbalanced workloads
-when is starts to search for selecting the best vector of the frequency scaling factors. So, the
-algorithm is selects the initial frequencies values for each node proportional
-to the times of computations that gathered from the first iteration. As an
-example in figure (\ref{fig:st_freq}), the algorithm don't tests the first
-frequencies of the computing nodes until their frequencies is converge to the
-frequency of the slowest node. The operational frequency gear not surly related to computing power, therefore the algorithm
-rapprochement the frequencies according to the computing power of each frequency. Moreover, If the algorithm is starts to test change the
-frequency of the slowest node from the first gear, we are loosing the performance and
-then the best trade-off relation (the maximum distance) be not reachable. This case will be similar
-to a homogeneous cluster when all nodes scale down their frequencies together from
-the first gears. Therefore, there is a small distance between the energy and
-the performance curves in a homogeneous cluster compare to heterogeneous one, for example see the figure(\ref{fig:r1}) and figure(\ref{fig:r2}) . Then the
-algorithm starts to search for the optimal vector of the frequency scaling factors from the selected initial
-frequencies until all node reach their minimum frequencies.
-\begin{figure}[t]
+\subsection{The algorithm details}
+In this section, Algorithm~\ref{HSA} is presented. It selects the frequency
+scaling factors vector that gives the best trade-off between minimizing the
+energy consumption and maximizing the performance of a message passing
+synchronous iterative application executed on a heterogeneous platform. It works
+online during the execution time of the iterative message passing program. It
+uses information gathered during the first iteration such as the computation
+time and the communication time in one iteration for each node. The algorithm is
+executed after the first iteration and returns a vector of optimal frequency
+scaling factors that satisfies the objective function (\ref{eq:max}). The
+program applies DVFS operations to change the frequencies of the CPUs according
+to the computed scaling factors. This algorithm is called just once during the
+execution of the program. Algorithm~\ref{dvfs} shows where and when the proposed
+scaling algorithm is called in the iterative MPI program.
+
+The nodes in a heterogeneous platform have different computing powers, thus
+while executing message passing iterative synchronous applications, fast nodes
+have to wait for the slower ones to finish their computations before being able
+to synchronously communicate with them as in Figure~\ref{fig:heter}. These
+periods are called idle or slack times. The algorithm takes into account this
+problem and tries to reduce these slack times when selecting the frequency
+scaling factors vector. At first, it selects initial frequency scaling factors
+that increase the execution times of fast nodes and minimize the differences
+between the computation times of fast and slow nodes. The value of the initial
+frequency scaling factor for each node is inversely proportional to its
+computation time that was gathered from the first iteration. These initial
+frequency scaling factors are computed as a ratio between the computation time
+of the slowest node and the computation time of the node $i$ as follows:
+\begin{equation}
+ \label{eq:Scp}
+ \Scp[i] = \frac{\max_{i=1,2,\dots,N}(\Tcp[i])}{\Tcp[i]}
+\end{equation}
+Using the initial frequency scaling factors computed in (\ref{eq:Scp}), the algorithm computes
+the initial frequencies for all nodes as a ratio between the maximum frequency of node $i$
+and the computation scaling factor $\Scp[i]$ as follows:
+\begin{equation}
+ \label{eq:Fint}
+ F_{i} = \frac{\Fmax[i]}{\Scp[i]},~{i=1,2,\dots,N}
+\end{equation}
+If the computed initial frequency for a node is not available in the gears of
+that node, it is replaced by the nearest available frequency. In
+Figure~\ref{fig:st_freq}, the nodes are sorted by their computing power in
+ascending order and the frequencies of the faster nodes are scaled down
+according to the computed initial frequency scaling factors. The resulting new
+frequencies are colored in blue in Figure~\ref{fig:st_freq}. This set of
+frequencies can be considered as a higher bound for the search space of the
+optimal vector of frequencies because selecting frequency scaling factors higher
+than the higher bound will not improve the performance of the application and it
+will increase its overall energy consumption. Therefore the algorithm that
+selects the frequency scaling factors starts the search method from these
+initial frequencies and takes a downward search direction toward lower
+frequencies. The algorithm iterates on all left frequencies, from the higher
+bound until all nodes reach their minimum frequencies, to compute their overall
+energy consumption and performance, and select the optimal frequency scaling
+factors vector. At each iteration the algorithm determines the slowest node
+according to the equation (\ref{eq:perf}) and keeps its frequency unchanged,
+while it lowers the frequency of all other nodes by one gear. The new overall
+energy consumption and execution time are computed according to the new scaling
+factors. The optimal set of frequency scaling factors is the set that gives the
+highest distance according to the objective function (\ref{eq:max}).
+
+Figures~\ref{fig:r1} and \ref{fig:r2} illustrate the normalized performance and
+consumed energy for an application running on a homogeneous platform and a
+heterogeneous platform respectively while increasing the scaling factors. It can
+be noticed that in a homogeneous platform the search for the optimal scaling
+factor should start from the maximum frequency because the performance and the
+consumed energy decrease from the beginning of the plot. On the other hand,
+in the heterogeneous platform the performance is maintained at the beginning of
+the plot even if the frequencies of the faster nodes decrease until the
+computing power of scaled down nodes are lower than the slowest node. In other
+words, until they reach the higher bound. It can also be noticed that the higher
+the difference between the faster nodes and the slower nodes is, the bigger the
+maximum distance between the energy curve and the performance curve is while
+ the scaling factors are varying which results in bigger energy savings.
+\begin{figure}[!t]