SimGrid/SMPI v3.10~\cite{ref66} over a homogeneous
distributed memory architecture.
-This chapter is composed of two parts. The first part, the proposed frequency scaling selection algorithm uses the energy model of the Rauber and Rünger \cite{ref47} and is compared to Rauber and Rünger's method. The comparison results show that our algorithm gives better energy-time trade-off. In the second part, a new energy model that takes into account both the communication and computation times of the MPI programs running over a homogeneous cluster is developed.
+This chapter is composed of two parts. In the first part, the proposed frequency scaling selection algorithm uses the energy model of Rauber and Rünger \cite{ref47} and is compared to their method. The comparison results show that our algorithm gives a better energy-time trade-off. In the second part, a new energy model that takes into account both the communication and computation times of MPI programs running over a homogeneous cluster is developed.
It also shows the new results obtained using the new energy model. The results are compared to the ones given by Rauber and Rünger's energy model.
performance of MPI programs can be predicted. Section~\ref{ch2:4} presents
the energy-performance objective function that maximizes the reduction of energy
consumption while minimizing the degradation of the program's performance.
-Section~\ref{ch2:5} details the algorithm that returns the scaling factor that gives the best energy-performance
-trade-off for a parallel iterative application.
+Section~\ref{ch2:5} details the algorithm that returns the scaling factor that gives the best energy-performance
+trade-off for a parallel application with iterations.
Section~\ref{ch2:6} verifies the accuracy of the performance prediction model
and presents the results of the proposed algorithm. It also shows the
comparison results between our method and other existing methods.
\section{Performance evaluation of MPI programs}
\label{ch2:3}
-The execution time of a parallel synchronous iterative application is equal
+The execution time of a parallel synchronous application with iterations is equal
to the execution time of its slowest task as in figure~(\ref{fig:homo}).
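+In other words, if $T_{i}$ denotes the execution time of the task running on node $i$ among the $N$ nodes (notation used only for this illustration), the overall execution time can be written as
+\[
+  T_{exec} = \max_{i=1,\dots,N} T_{i}.
+\]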
If there is no communication in the application and it is not data bound, the execution time of this
parallel application is inversely proportional to the operational frequency (i.e., it is linearly proportional to the frequency scaling factor). Any
\end{tabular}
\label{table:platform-homo}
\end{table}
+
+
\subsection{Performance prediction verification}
\label{ch2:6:1}
In this section, the precision of the proposed performance prediction method
In our cluster there are 18 available frequency states for each processor. This
leads to 18 run states for each program. Seven MPI programs of the NAS
parallel benchmarks were used: CG, MG, EP, FT, BT, LU and SP. Table \ref{table:NAS-dec} shows
-the description of these seven benchmarks.
+the description of these seven benchmarks. Some of these benchmarks are MPI parallel applications with synchronous iterations, i.e., iterative applications that repeat the same block of instructions until convergence.
+However, the proposed method can be applied to any application that executes the same block
+of instructions many times; it is not limited to iterative methods.
Figure~(\ref{fig:pred}) presents plots of the real execution times compared to the predicted ones. The maximum
normalized error between these two execution times varies from 0.0073 to
0.031 depending on the executed benchmark. The smallest prediction error
\begin{tabular}{|l|l|l|}
\hline
Benchmark & Full Name & Description \\ \hline
-CG & Conjugate Gradiant & \begin{tabular}[c]{@{}l@{}}Estimate the smallest eigenvalue of a large \\ sparse symmetric positive-definite matrix \\ using the inverse iteration with the conjugate \\ gradient method as a subroutine for solving \\ systems of linear equations\end{tabular} \\ \hline
-MG & MultiGrid & \begin{tabular}[c]{@{}l@{}}Approximate the solution to a three-dimensional \\ discrete Poisson equation using the V-cycle \\ multigrid method\end{tabular} \\ \hline
-EP & Embarrassingly Parallel & \begin{tabular}[c]{@{}l@{}}Generate independent Gaussian random \\ variates using the Marsaglia polar method\end{tabular} \\ \hline
-FT & Fast Fourier Transform & \begin{tabular}[c]{@{}l@{}}Solve a three-dimensional partial differential\\ equation (PDE) using the fast Fourier transform \\ (FFT)\end{tabular} \\ \hline
-BT & Block Tridiagonal & \multirow{3}{*}{\begin{tabular}[c]{@{}l@{}}Solve a synthetic system of nonlinear PDEs \\ using three different algorithms involving \\ block tridiagonal, scalar pentadiagonal and \\ symmetric successive over-relaxation \\ (SSOR) solver kernels, respectively\end{tabular}} \\ \cline{1-2}
+CG & Conjugate Gradient & \begin{tabular}[c]{@{}l@{}}
+It estimates the smallest eigenvalue of a large sparse\\ matrix by solving systems of linear equations with the\\ conjugate gradient method \end{tabular}\\ \hline
+
+MG & MultiGrid & \begin{tabular}[c]{@{}l@{}}It uses the multigrid method to approximate the solution \\of a three-dimensional discrete Poisson equation\end{tabular}
+ \\ \hline
+EP & Embarrassingly Parallel & \begin{tabular}[c]{@{}l@{}} It applies the Marsaglia polar method to randomly\\ generate independent Gaussian variates
+\end{tabular} \\ \hline
+FT & Fast Fourier Transform & \begin{tabular}[c]{@{}l@{}}It uses the fast Fourier transform to solve a \\ three-dimensional partial differential equation
+\end{tabular} \\ \hline
+BT & Block Tridiagonal & \multirow{3}{*}{\begin{tabular}[c]{@{}l@{}}They solve a synthetic system of nonlinear\\ partial differential equations\end{tabular}} \\ \cline{1-2}
LU & \begin{tabular}[c]{@{}l@{}}Lower-Upper symmetric \\ Gauss-Seidel\end{tabular} & \\ \cline{1-2}
-SP & \begin{tabular}[c]{@{}l@{}}Scalar \\ Pentadiagonal\end{tabular} & \\ \hline
+SP & \begin{tabular}[c]{@{}l@{}}Scalar Pentadiagonal\end{tabular} & \\ \hline
\end{tabular}
\end{table}
\end{equation}
According to this model, the frequency scaling factor $S$ reduces the energy consumption of the homogeneous architecture by a factor of $S^{-2}$ and increases the execution time by a factor of $S$.
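+For illustration, assuming the usual dynamic power relation $P_{d} \propto C_{L} \cdot V^{2} \cdot f$ with the supply voltage $V$ roughly proportional to the frequency $f$ (where, for this sketch only, $P_{d}$ denotes the dynamic power at the maximum frequency $F_{max}$ and $T_{cp}$ the computation time), lowering the frequency by the factor $S$ divides the dynamic power by about $S^{3}$ while the computation time grows by $S$, which is where these two factors come from for the dynamic part of the consumption:
+\[
+  P_{d}\!\left(\frac{F_{max}}{S}\right) \approx \frac{P_{d}}{S^{3}},
+  \qquad
+  E_{d} \approx \frac{P_{d}}{S^{3}} \cdot \left(T_{cp} \cdot S\right) = P_{d} \cdot T_{cp} \cdot S^{-2}.
+\]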
-This model can be used to predict the energy consumption of the message passing synchronous iterative applications after gathering the computation and communication times of the first iteration.
-Furthermore, it can be used to measure the energy consumption of the iterative application by multiplying the energy consumed of all tasks in one iteration by the number of the iterations.
+This model can be used to predict the energy consumption of message passing applications with synchronous iterations after gathering the computation and communication times of the first iteration.
+Furthermore, it can be used to estimate the energy consumption of the parallel application with iterations by multiplying the energy consumed by all tasks in one iteration by the number of iterations.
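+As a minimal illustration of how such timings could be gathered, the following C/MPI sketch measures the computation and communication times of the first iteration of a synchronous loop. The loop body, the number of iterations and the reduction operation are placeholders and are not taken from the benchmarks used in this chapter.
+\begin{verbatim}
+/* Sketch: record the computation (Tcp) and communication (Tcm) times
+   of the first iteration of a synchronous MPI loop.                  */
+#include <mpi.h>
+#include <stdio.h>
+
+int main(int argc, char **argv) {
+    MPI_Init(&argc, &argv);
+    int rank;
+    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+
+    double Tcp = 0.0, Tcm = 0.0, local = 0.0, global = 0.0;
+
+    for (int iter = 0; iter < 100; iter++) {
+        double t0 = MPI_Wtime();
+        for (int k = 0; k < 1000000; k++)     /* placeholder computation */
+            local += 1e-6 * (double)k;
+        double t1 = MPI_Wtime();
+        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
+                      MPI_COMM_WORLD);        /* placeholder communication */
+        double t2 = MPI_Wtime();
+        if (iter == 0) {                      /* keep first-iteration times */
+            Tcp = t1 - t0;
+            Tcm = t2 - t1;
+        }
+    }
+    if (rank == 0)
+        printf("Tcp = %g s, Tcm = %g s\n", Tcp, Tcm);
+    MPI_Finalize();
+    return 0;
+}
+\end{verbatim}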
This model is used by the algorithm \ref{EPSA} to predict the energy consumption and to select the optimal frequency scaling factor. The new frequency $F_i$ can be computed as in \ref{eq:fi} while using the new selected frequency scaling factor.
In the next section, algorithm \ref{EPSA} is re-evaluated while using this new energy model and the new results are presented.
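+To give a concrete picture of such a selection step, the short C sketch below chooses a scaling factor from first-iteration timings by maximizing the gap between the normalized performance and the normalized energy. The frequency list, the timing and power values, and the simplified time and energy expressions are illustrative assumptions only, not the actual implementation of algorithm \ref{EPSA}.
+\begin{verbatim}
+/* Sketch: choose a frequency scaling factor S = Fmax / F that
+   maximizes (normalized performance - normalized energy).
+   All numerical inputs are illustrative placeholders.          */
+#include <stdio.h>
+
+int main(void) {
+    const double Tcp = 2.0, Tcm = 0.5;    /* first-iteration timings (s)  */
+    const double Pd  = 20.0, Ps = 4.0;    /* dynamic and static power (W) */
+    const double Fmax = 2.5;              /* highest available gear (GHz) */
+    const double freqs[] = {2.5, 2.2, 2.0, 1.8, 1.6, 1.4, 1.2, 1.0};
+    const int n = (int)(sizeof(freqs) / sizeof(freqs[0]));
+
+    const double T0 = Tcp + Tcm;          /* time at Fmax (S = 1)         */
+    const double E0 = Pd * Tcp + Ps * T0; /* energy at Fmax (S = 1)       */
+
+    double best_S = 1.0, best_dist = -1.0e9;
+    for (int i = 0; i < n; i++) {
+        double S    = Fmax / freqs[i];
+        double Tnew = Tcp * S + Tcm;            /* only computation scales */
+        double Enew = Pd * Tcp / (S * S)        /* dynamic part: S^-2      */
+                    + Ps * Tnew;                /* static part             */
+        double perf = T0 / Tnew;                /* normalized performance  */
+        double ener = Enew / E0;                /* normalized energy       */
+        double dist = perf - ener;              /* trade-off to maximize   */
+        if (dist > best_dist) { best_dist = dist; best_S = S; }
+    }
+    printf("selected scaling factor S = %.3f\n", best_S);
+    return 0;
+}
+\end{verbatim}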
method while being executed on the SimGrid simulator. The results showed that
our method outperforms Rauber and Rünger's method in terms of energy-performance
ratio. Finally, this chapter presents a new energy consumption model for parallel
-synchronous iterative methods running on homogeneous clusters. This model takes into consideration
+ applications with synchronous iterations running on homogeneous clusters. This model takes into consideration
both the computation and communication times and their relation with the frequency scaling
factor. The results obtained using the new energy model have shown that different frequency scaling factors
were selected, which gave new experimental results that are more accurate and realistic.
\ No newline at end of file