X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/ThesisAhmed.git/blobdiff_plain/24a72dfea558e479cbc7ff62cbe57f0af5050525..18280a326d3fbcc20a96c315709b73e844887f46:/CHAPITRE_02.tex diff --git a/CHAPITRE_02.tex b/CHAPITRE_02.tex index 48ea63c..c6a9a29 100644 --- a/CHAPITRE_02.tex +++ b/CHAPITRE_02.tex @@ -54,8 +54,8 @@ model for homogeneous platforms from other researchers. Section~\ref{ch2:3} des performance of MPI programs can be predicted. Section~\ref{ch2:4} presents the energy-performance objective function that maximizes the reduction of energy consumption while minimizing the degradation of the program's performance. -Section~\ref{ch2:5} details the algorithm that returns the scaling factor that gives the best energy-performance -trade-off for a parallel iterative application. +Section~\ref{ch2:5} details the algorithm that returns the scaling factor that gives the best energy-performance +trade-off for a parallel application with iterations Section~\ref{ch2:6} verifies the accuracy of the performance prediction model and presents the results of the proposed algorithm. It also shows the comparison results between our method and other existing methods. @@ -219,7 +219,7 @@ This model computes the frequency scaling factor which minimizes the energy con \section{Performance evaluation of MPI programs} \label{ch2:3} -The execution time of a parallel synchronous iterative application is equal +The execution time of a parallel synchronous application with iteration is equal to the execution time of its slowest task as in figure~(\ref{fig:homo}). If there is no communication in the application and it is not data bounded, the execution time of this parallel application is linearly proportional to the operational frequency. Any @@ -420,6 +420,8 @@ two successive frequencies. The simulated network link is 1 GB Ethernet \end{tabular} \label{table:platform-homo} \end{table} + + \subsection{Performance prediction verification} \label{ch2:6:1} In this section, the precision of the proposed performance prediction method @@ -444,7 +446,9 @@ maximum frequency by the new one see EQ~(\ref{eq:s}). In our cluster there are 18 available frequency states for each processor. This leads to 18 run states for each program. Seven MPI programs of the NAS parallel benchmarks were used: CG, MG, EP, FT, BT, LU and SP. Table \ref{table:NAS-dec} shows -the description of these seven benchmarks. +the description of these seven benchmarks. Some of these benchmarks are considered MPI parallel applications with synchronous iterations or iterative applications that repeat the same block of instructions until convergence. +However, the proposed method can be applied to any application that executes the same block +of instructions many times and it is not limited to iterative methods. Figure~(\ref{fig:pred}) presents plots of the real execution times compared to the simulated ones. The maximum normalized error between these two execution times varies between 0.0073 to 0.031 depending on the executed benchmark. The smallest prediction error @@ -457,13 +461,19 @@ was for CG and the worst one was for LU. \begin{tabular}{|l|l|l|} \hline Benchmark & Full Name & Description \\ \hline -CG & Conjugate Gradiant & \begin{tabular}[c]{@{}l@{}}Estimate the smallest eigenvalue of a large \\ sparse symmetric positive-definite matrix \\ using the inverse iteration with the conjugate \\ gradient method as a subroutine for solving \\ systems of linear equations\end{tabular} \\ \hline -MG & MultiGrid & \begin{tabular}[c]{@{}l@{}}Approximate the solution to a three-dimensional \\ discrete Poisson equation using the V-cycle \\ multigrid method\end{tabular} \\ \hline -EP & Embarrassingly Parallel & \begin{tabular}[c]{@{}l@{}}Generate independent Gaussian random \\ variates using the Marsaglia polar method\end{tabular} \\ \hline -FT & Fast Fourier Transform & \begin{tabular}[c]{@{}l@{}}Solve a three-dimensional partial differential\\ equation (PDE) using the fast Fourier transform \\ (FFT)\end{tabular} \\ \hline -BT & Block Tridiagonal & \multirow{3}{*}{\begin{tabular}[c]{@{}l@{}}Solve a synthetic system of nonlinear PDEs \\ using three different algorithms involving \\ block tridiagonal, scalar pentadiagonal and \\ symmetric successive over-relaxation \\ (SSOR) solver kernels, respectively\end{tabular}} \\ \cline{1-2} +CG & Conjugate Gradiant & \begin{tabular}[c]{@{}l@{}} +It solves a system of linear equations by estimating\\ the smallest eigenvalue of a large sparse matrix \end{tabular}\\ \hline + +MG & MultiGrid & \begin{tabular}[c]{@{}l@{}}It uses the multigrid method to approximate the solution \\of a three-dimensional discrete Poisson equation\end{tabular} + \\ \hline +EP & Embarrassingly Parallel & \begin{tabular}[c]{@{}l@{}} It applies the Marsaglia polar method to randomly \\generates independent Gaussian variates +\end{tabular} \\ \hline +FT & Fast Fourier Transform & \begin{tabular}[c]{@{}l@{}}It uses the fast Fourier transform to solve a \\ three-dimensional partial differential equation + +\end{tabular} \\ \hline +BT & Block Tridiagonal & \multirow{3}{*}{\begin{tabular}[c]{@{}l@{}} \\They solve nonlinear partial differential equations \end{tabular}} \\ \cline{1-2} LU & \begin{tabular}[c]{@{}l@{}}Lower-Upper symmetric \\ Gauss-Seidel\end{tabular} & \\ \cline{1-2} -SP & \begin{tabular}[c]{@{}l@{}}Scalar \\ Pentadiagonal\end{tabular} & \\ \hline +SP & \begin{tabular}[c]{@{}l@{}}Scalar Pentadiagonal\end{tabular} & \\ \hline \end{tabular} \end{table} @@ -669,8 +679,8 @@ Therefore, the energy consumption model of $N$ parallel task executed synchrono \end{equation} According to this model, the frequency scaling factor $S$ reduces the energy consumption of the homogeneous architecture by a factor of $S^{-2}$ and increases the execution time by a factor of $S$. -This model can be used to predict the energy consumption of the message passing synchronous iterative applications after gathering the computation and communication times of the first iteration. -Furthermore, it can be used to measure the energy consumption of the iterative application by multiplying the energy consumed of all tasks in one iteration by the number of the iterations. +This model can be used to predict the energy consumption of the message passing applications with synchronous iterations after gathering the computation and communication times of the first iteration. +Furthermore, it can be used to measure the energy consumption of the parallel application with iterations by multiplying the energy consumed of all tasks in one iteration by the number of the iterations. This model is used by the algorithm \ref{EPSA} to predict the energy consumption and to select the optimal frequency scaling factor. The new frequency $F_i$ can be computed as in \ref{eq:fi} while using the new selected frequency scaling factor. In the next section, algorithm \ref{EPSA} is re-evaluated while using this new energy model and the new results are presented. @@ -734,7 +744,7 @@ have applied it to the NAS benchmarks and it was compared to the Rauber and Rün method while being executed on the SimGrid simulator. The results showed that our method, outperforms the Rauber and Rünger's method in terms of energy-performance ratio. Finally, this chapter presents a new energy consumption model for parallel -synchronous iterative methods running on homogeneous clusters. This model takes into consideration + applications with synchronous iterations running on homogeneous clusters. This model takes into consideration both the computation and communication times and their relation with the frequency scaling factor. The results obtained using the new energy model have shown that different frequency scaling factors were selected which gave new experimental results that are more accurate and realistic. \ No newline at end of file