+
+%\begin{minipage}{\textwidth}
+
+\begin{algorithm}[tp]
+ \caption{DVFS}
+ \label{dvfs}
+ \begin{algorithmic}[1]
+ \For {$k:=1$ to \textit{some iterations}}
+ \State Computations section.
+ \State Communications section.
+ \If {$(k=1)$}
+ \State Gather all times of computation and\newline\hspace*{3em}%
+ communication from each node.
+ \State Call algorithm~\ref{EPSA} with these times.
+ \State Compute the new frequency from the\newline\hspace*{3em}%
+ returned optimal scaling factor.
+ \State Set the new frequency to the CPU.
+ \EndIf
+\EndFor
+\end{algorithmic}
+\end{algorithm}
+After obtaining the optimal scaling factor, the program
+calculates the new frequency $F_i$ for each task in proportion to its time
+value $T_i$. Substituting EQ~(\ref{eq:s}) into EQ~(\ref{eq:si}) gives
+the new frequency $F_i$:
+\begin{equation}
+ \label{eq:fi}
+ F_i = \frac{F_\textit{max} \cdot T_i}{S_\textit{optimal} \cdot T_\textit{max}}
+\end{equation}
+According to this equation, all the nodes obtain the same frequency value if
+their workloads are balanced; otherwise, they take different frequencies.
+Thus, EQ~(\ref{eq:fi}) adapts the frequency of each CPU to its node's workload to maintain performance.
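+
+As a worked illustration of EQ~(\ref{eq:fi}), consider hypothetical values that
+are not taken from our experiments: $F_\textit{max} = \np[GHz]{2.5}$,
+$S_\textit{optimal} = \np{1.25}$, $T_\textit{max} = \np[s]{10}$ and a node with
+$T_i = \np[s]{8}$. Under these assumptions the slowest node
+($T_i = T_\textit{max}$) and the faster node would be set to:
+\[
+ F_\textit{slowest} = \frac{\np{2.5} \cdot \np{10}}{\np{1.25} \cdot \np{10}} = \np[GHz]{2.0},
+ \qquad
+ F_i = \frac{\np{2.5} \cdot \np{8}}{\np{1.25} \cdot \np{10}} = \np[GHz]{1.6}
+\]
+The node that finishes earlier therefore receives a lower frequency, in
+proportion to its shorter time value.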
+
+\section{Experimental results}
+\label{sec.expe}
+Our experiments are executed on the SimGrid/SMPI simulator,
+v3.10. We configure the simulator to use a homogeneous cluster with one core per
+node. The
+detailed characteristics of our platform file are shown in
+Table~(\ref{table:platform}).
+Each node in the cluster has 18 frequency values
+from \np[GHz]{2.5} down to \np[MHz]{800}, with a \np[MHz]{100} difference between two successive
+frequencies. The simulated network link is \np[GB]{1} Ethernet (TCP/IP).
+The backbone of the cluster simulates a high performance switch.
+
+\subsection{Performance prediction verification}
+
+In this section we evaluate the precision of our performance prediction method, based on EQ~(\ref{eq:tnew}), by applying it to the NAS benchmarks. The NAS programs are executed with the class B option to compare the
+real execution time with the predicted execution time. Each program runs offline
+with all available scaling factors on 8 or 9 nodes (depending on the benchmark) to produce real execution
+time values. These scaling factors are computed by dividing the maximum
+frequency by the new one (see EQ~(\ref{eq:s})).
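+As a worked illustration, assuming that EQ~(\ref{eq:s}) has the form
+$S = F_\textit{max} / F_\textit{new}$ stated above, the frequencies of the
+simulated platform (\np[MHz]{2500} down to \np[MHz]{800} in steps of
+\np[MHz]{100}, that is $(\np{2500} - \np{800}) / \np{100} + 1 = 18$ values)
+give scaling factors between the following bounds, where $S_\textit{min}$ and
+$S_\textit{max}$ are labels introduced only for this example:
+\[
+ S_\textit{min} = \frac{\np{2500}}{\np{2500}} = 1,
+ \qquad
+ S_\textit{max} = \frac{\np{2500}}{\np{800}} = \np{3.125}
+\]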
+\begin{figure*}[t]
+ \centering
+ \includegraphics[width=.328\textwidth]{fig/cg_per}\hfill%
+ \includegraphics[width=.328\textwidth]{fig/mg_pre}\hfill%
+ % \includegraphics[width=.4\textwidth]{fig/bt_pre}\qquad%
+ \includegraphics[width=.328\textwidth]{fig/lu_pre}\hfill%
+ \caption{Comparing predicted to real execution time}
+ \label{fig:pred}
+\end{figure*}
+%see Figure~\ref{fig:pred}
+In our cluster there are 18 available frequency states for each processor.
+This leads to 18 run states for each program. We use seven MPI programs of the
+ NAS parallel benchmarks: CG, MG, EP, FT, BT, LU
+and SP. Figure~(\ref{fig:pred}) presents plots of the real execution times and the predicted ones. The maximum normalized error between these two execution times ranges from \np{0.0073}\AG[]{unit?} to \np{0.031}, depending on the executed benchmark. The smallest prediction error was for CG and the largest one was for LU.
+\subsection{The experimental results for the scaling algorithm}
+The proposed algorithm was applied to seven MPI programs of the NAS
+benchmarks (EP, CG, MG, FT, BT, LU and SP), which were run with three classes (A, B and
+C). The classes correspond to increasing problem sizes, from class A to class C,
+and each instance was executed on a number of processors
+proportional to the size of its class. Depending on the speed-up points
+of each class, we run classes A, B and C on 4, 8 or 9, and 16 nodes
+respectively.
+Using EQ~(\ref{eq:energy}), we measure the energy consumption of all
+the NAS MPI programs, assuming that the dynamic power at the highest frequency is equal to \np[W]{20} and
+that the static power is equal to \np[W]{4} for all experiments. These power values were also
+used by Rauber and Rünger in~\cite{3}. The results showed that the algorithm selected