+
+%\begin{minipage}{\textwidth}
+
+\begin{algorithm}[tp]
+ \caption{DVFS}
+ \label{dvfs}
+ \begin{algorithmic}[1]
+ \For {$k:=1$ to \textit{Some-Iterations}}
+ \State -Computations section.
+ \State -Communications section.
+ \If {$(k=1)$}
+ \State -Gather all times of computation and\par\hspace{13 pt} communication from each node.
+ \State -Call algorithm~\ref{EPSA} with these times.
+ \State -Compute the new frequency from the \par\hspace{13 pt} returned optimal scaling factor.
+ \State -Set the new frequency to the CPU.
+ \EndIf
+\EndFor
+\end{algorithmic}
+\end{algorithm}
+Once the optimal scaling factor is obtained, the program
+computes a new frequency $F_i$ for each task in proportion to its time
+value $T_i$. Substituting EQ~(\ref{eq:s}) into EQ~(\ref{eq:si}) gives
+the new frequency $F_i$:
+\begin{equation}
+ \label{eq:fi}
+ F_i = \frac{F_\textit{max} \cdot T_i}{S_\textit{optimal} \cdot T_\textit{max}}
+\end{equation}
+According to this equation, all nodes receive the same frequency when
+their workloads are balanced, and different frequencies when
+their workloads are imbalanced. Thus, EQ~(\ref{eq:fi}) adapts the CPU frequency of each node to its workload in order to maintain performance.
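+
+As a purely illustrative check of EQ~(\ref{eq:fi}), assume the following hypothetical values, which are not taken from our experiments: $F_\textit{max} = 2.5$ GHz, $S_\textit{optimal} = 1.25$, $T_\textit{max} = 10$ s, and a lighter task with $T_i = 8$ s. Then
+\[
+ F_i = \frac{2.5 \cdot 8}{1.25 \cdot 10} = \frac{20}{12.5} = 1.6\ \mathrm{GHz},
+\]
+whereas the slowest task ($T_i = T_\textit{max}$) obtains $F_i = F_\textit{max} / S_\textit{optimal} = 2.0$ GHz. Under these assumed values, the task with less work is therefore scaled down further than the task on the critical path.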
+
+\section{Experimental results}
+\label{sec.expe}
+Our experiments are executed on the SimGrid/SMPI simulator,
+v3.10. We configure the simulator to model a homogeneous cluster with one core per
+node. The
+detailed characteristics of our platform file are shown in
+Table~(\ref{table:platform}).
+Each node in the cluster has 18 frequency values,
+from 2.5 GHz down to 800 MHz, with a 100 MHz step between successive
+frequencies. The simulated network link is 1 Gbit/s Ethernet (TCP/IP).
+The backbone of the cluster simulates a high-performance switch.
+
+\subsection{Performance prediction verification}
+
+In this section we evaluate the precision of our performance prediction method, based on EQ~(\ref{eq:tnew}), by applying it to the NAS benchmarks. The NAS programs are executed with the class B option in order to compare the
+real execution time with the predicted execution time. Each program runs offline
+with all available scaling factors on 8 or 9 nodes (depending on the benchmark) to produce the real execution
+time values. These scaling factors are computed by dividing the maximum
+frequency by the new one (see EQ~(\ref{eq:s})).
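+For the platform described above, and assuming the available frequencies are indexed as $F_j = 2500 - 100\,j$ MHz for $j = 0, \dots, 17$ (the index $j$ is introduced here only for illustration), the scaling factors are
+\[
+ S_j = \frac{F_\textit{max}}{F_j} = \frac{2500}{2500 - 100\,j}, \qquad S_0 = 1, \quad S_{17} = \frac{2500}{800} = 3.125,
+\]
+so the offline runs span scaling factors from 1 (no scaling) to 3.125.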
+\begin{figure*}[t]
+ \centering
+ \includegraphics[width=.328\textwidth]{fig/cg_per}\hfill%
+ \includegraphics[width=.328\textwidth]{fig/mg_pre}\hfill%
+ % \includegraphics[width=.4\textwidth]{fig/bt_pre}\qquad%
+ \includegraphics[width=.328\textwidth]{fig/lu_pre}\hfill%
+ \caption{Comparing predicted to real execution time}
+ \label{fig:pred}
+\end{figure*}
+%see Figure~\ref{fig:pred}
+In our cluster there are 18 available frequency states for each processor,
+which leads to 18 runs of each program. We use seven MPI programs of the
+ NAS parallel benchmarks: CG, MG, EP, FT, BT, LU
+and SP. Figure~(\ref{fig:pred}) plots the real execution times against the predicted ones. The maximum normalized error between the predicted execution time and the real time (SimGrid time) over all programs is between 0.0073 and 0.031. The best case is CG and the worst case is LU.
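+
+To illustrate how these values can be read, assume that the normalized error is the absolute difference between the predicted and the real time divided by the real time. This definition and the symbols $T_\textit{pred}$ and $T_\textit{real}$ are an assumption introduced only for this example:
+\[
+ e = \frac{|T_\textit{pred} - T_\textit{real}|}{T_\textit{real}} .
+\]
+Under this assumption, an error of 0.031 on a hypothetical run of 100 s corresponds to a deviation of $0.031 \cdot 100 = 3.1$ s, and an error of 0.0073 to a deviation of $0.73$ s.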
+\subsection{The experimental results for the scaling algorithm}
+The proposed algorithm was applied to seven MPI programs of the NAS
+benchmarks (EP, CG, MG, FT, BT, LU and SP), which were run with three classes (A, B and
+C). The classes represent increasing problem sizes, from class A to class C, and
+each benchmark was executed on a number of processors
+proportional to the size of the class. Based on speed-up considerations
+for each class, we run classes A, B and C on 4, 8 or 9, and 16 nodes,
+respectively.
+Using EQ~(\ref{eq:energy}), we evaluate the energy consumption of all
+the NAS MPI programs, assuming that the dynamic power at the highest frequency is \np[W]{20} and
+that the static power is \np[W]{4} in all experiments. These power values were also
+used by Rauber and Rünger in~\cite{3}. The results showed that the algorithm selected