corrections

[ThesisAhmed.git] / CHAPITRE_02.tex
diff --git a/CHAPITRE_02.tex b/CHAPITRE_02.tex

index 48ea63c809cecf59d4bd3cc5d4615b30d5e81a66..c3b12450884555c322a234f1ba0e8e1cf77dec28 100644 (file)
--- a/CHAPITRE_02.tex
+++ b/CHAPITRE_02.tex
@@ -44,7 +44,7 @@ benchmarks (NPB v3.3) developed by the  NASA~\cite{ref65}.  Our experiments are
  SimGrid/SMPI v3.10~\cite{ref66} over an homogeneous
  distributed memory architecture. 
  
  SimGrid/SMPI v3.10~\cite{ref66} over an homogeneous
  distributed memory architecture. 
  
-This chapter is composed of two parts. The first part, the proposed frequency scaling selection algorithm  uses the energy model of the Rauber and Rünger \cite{ref47} and  is compared to Rauber and Rünger's method.  The comparison results show that our algorithm gives better energy-time trade-off. In the second part,  a new energy model  that takes into account both the communication and computation times of the MPI programs running over a homogeneous cluster is developed.
+This chapter is composed of two parts. In the first part, the proposed frequency scaling selection algorithm  uses the energy model of  Rauber and Rünger \cite{ref47} and  is compared to Rauber and Rünger's method.  The comparison results show that our algorithm gives better energy-time trade-off. In the second part,  a new energy model  that takes into account both the communication and computation times of the MPI programs running over a homogeneous cluster is developed.
  It  also shows the new results obtained using the new energy model. The results are compared to the  ones given by Rauber and Rünger's energy model.  
  
  
  It  also shows the new results obtained using the new energy model. The results are compared to the  ones given by Rauber and Rünger's energy model.  
  
  
@@ -54,8 +54,8 @@ model for homogeneous platforms from other researchers.  Section~\ref{ch2:3} des
  performance of MPI programs can be predicted.  Section~\ref{ch2:4} presents
  the energy-performance objective function that maximizes the reduction of energy
  consumption while minimizing the degradation of the program's performance.
  performance of MPI programs can be predicted.  Section~\ref{ch2:4} presents
  the energy-performance objective function that maximizes the reduction of energy
  consumption while minimizing the degradation of the program's performance.
-Section~\ref{ch2:5} details the algorithm that returns the scaling factor that gives the best energy-performance
-trade-off for a parallel iterative application.
+Section~\ref{ch2:5} details the algorithm that returns the scaling factor that gives the best energy-performance 
+trade-off for a parallel  application with iterations
  Section~\ref{ch2:6} verifies the accuracy of the performance prediction model
  and presents the results of the proposed algorithm.  It also shows the
  comparison results between our method and other existing methods.  
  Section~\ref{ch2:6} verifies the accuracy of the performance prediction model
  and presents the results of the proposed algorithm.  It also shows the
  comparison results between our method and other existing methods.  
@@ -219,7 +219,7 @@ This model  computes the frequency scaling factor which minimizes the energy con
  \section{Performance evaluation of MPI programs}
  \label{ch2:3}
  
  \section{Performance evaluation of MPI programs}
  \label{ch2:3}
  
-The execution time of a parallel synchronous iterative application is equal
+The execution time of a parallel synchronous  application with iteration is equal
  to the execution time of its slowest task as in figure~(\ref{fig:homo}).  
  If there is no communication in the application and it is not data bounded, the execution time of this 
  parallel application is linearly proportional to the operational frequency. Any 
  to the execution time of its slowest task as in figure~(\ref{fig:homo}).  
  If there is no communication in the application and it is not data bounded, the execution time of this 
  parallel application is linearly proportional to the operational frequency. Any 
@@ -420,6 +420,8 @@ two successive frequencies.  The simulated network link is 1 GB Ethernet
    \end{tabular}
    \label{table:platform-homo}
  \end{table}
    \end{tabular}
    \label{table:platform-homo}
  \end{table}
+
+
  \subsection{Performance prediction verification}
  \label{ch2:6:1}
  In this section, the precision of the proposed performance prediction method 
  \subsection{Performance prediction verification}
  \label{ch2:6:1}
  In this section, the precision of the proposed performance prediction method 
@@ -444,7 +446,9 @@ maximum frequency by the new one see EQ~(\ref{eq:s}).
  In our cluster there are 18 available frequency states for each processor.  This
  leads to 18 run states for each program.  Seven MPI programs of the NAS
  parallel benchmarks were used: CG, MG, EP, FT, BT, LU and SP. Table \ref{table:NAS-dec} shows 
  In our cluster there are 18 available frequency states for each processor.  This
  leads to 18 run states for each program.  Seven MPI programs of the NAS
  parallel benchmarks were used: CG, MG, EP, FT, BT, LU and SP. Table \ref{table:NAS-dec} shows 
-the description of these seven benchmarks. 
+the description of these seven benchmarks. Some of these benchmarks are considered  MPI parallel applications with synchronous iterations  or iterative applications that repeat the same block of instructions until convergence. 
+However,  the proposed method can be applied to any application that executes the same block
+of instructions many times and it is not limited to iterative methods. 
  Figure~(\ref{fig:pred}) presents plots of the real execution times compared to the simulated ones.  The maximum
  normalized error between these two execution times varies between 0.0073 to
  0.031 depending on the executed benchmark.  The smallest prediction error
  Figure~(\ref{fig:pred}) presents plots of the real execution times compared to the simulated ones.  The maximum
  normalized error between these two execution times varies between 0.0073 to
  0.031 depending on the executed benchmark.  The smallest prediction error
@@ -457,13 +461,19 @@ was for CG and the worst one was for LU.
  \begin{tabular}{|l|l|l|}
  \hline
  Benchmark & Full Name                                                                     & Description                                                                                                                                                                                                                                                                     \\ \hline
  \begin{tabular}{|l|l|l|}
  \hline
  Benchmark & Full Name                                                                     & Description                                                                                                                                                                                                                                                                     \\ \hline
-CG        & Conjugate Gradiant                                                            & \begin{tabular}[c]{@{}l@{}}Estimate the smallest eigenvalue of a large \\ sparse  symmetric positive-definite matrix \\ using the  inverse iteration with the conjugate \\ gradient method as a subroutine for solving \\ systems of linear equations\end{tabular}              \\ \hline
-MG        & MultiGrid                                                                     & \begin{tabular}[c]{@{}l@{}}Approximate the solution to a three-dimensional \\ discrete Poisson equation using the V-cycle \\ multigrid method\end{tabular}                                                                                                                      \\ \hline
-EP        & Embarrassingly Parallel                                                       & \begin{tabular}[c]{@{}l@{}}Generate independent Gaussian random \\ variates using the Marsaglia polar method\end{tabular}                                                                                                                                                       \\ \hline
-FT        & Fast Fourier Transform                                                        & \begin{tabular}[c]{@{}l@{}}Solve a three-dimensional partial differential\\ equation (PDE) using the fast Fourier transform \\ (FFT)\end{tabular}                                                                                                                               \\ \hline
-BT        & Block Tridiagonal                                                             & \multirow{3}{*}{\begin{tabular}[c]{@{}l@{}}Solve a synthetic system of nonlinear PDEs \\ using three different algorithms involving \\ block tridiagonal, scalar pentadiagonal and \\ symmetric successive over-relaxation \\ (SSOR) solver kernels, respectively\end{tabular}} \\ \cline{1-2}
+CG        & Conjugate Gradiant                                                            & \begin{tabular}[c]{@{}l@{}}
+It solves a system of linear equations by estimating\\ the smallest eigenvalue of a large sparse  matrix \end{tabular}\\ \hline
+
+MG        & MultiGrid                                                                     & \begin{tabular}[c]{@{}l@{}}It uses the multigrid method to approximate the solution \\of a three-dimensional discrete Poisson equation\end{tabular}                                                                                                                     
+ \\ \hline
+EP       & Embarrassingly Parallel                                                       & \begin{tabular}[c]{@{}l@{}} It applies the Marsaglia polar method to randomly \\generates independent Gaussian variates
+\end{tabular}                                                                                                                                                       \\ \hline
+FT        & Fast Fourier Transform                                                        & \begin{tabular}[c]{@{}l@{}}It uses the fast Fourier transform to solve a \\ three-dimensional partial differential equation
+
+\end{tabular}                                                                                                                               \\ \hline
+BT        & Block Tridiagonal                                                             & \multirow{3}{*}{\begin{tabular}[c]{@{}l@{}} \\They solve nonlinear partial differential equations \end{tabular}} \\ \cline{1-2}
  LU        & \begin{tabular}[c]{@{}l@{}}Lower-Upper symmetric \\ Gauss-Seidel\end{tabular} &                                                                                                                                                                                                                                                                                 \\ \cline{1-2}
  LU        & \begin{tabular}[c]{@{}l@{}}Lower-Upper symmetric \\ Gauss-Seidel\end{tabular} &                                                                                                                                                                                                                                                                                 \\ \cline{1-2}
-SP        & \begin{tabular}[c]{@{}l@{}}Scalar \\ Pentadiagonal\end{tabular}               &                                                                                                                                                                                                                                                                                 \\ \hline
+SP        & \begin{tabular}[c]{@{}l@{}}Scalar Pentadiagonal\end{tabular}               &                                                                                                                                                                                                                                                                                 \\ \hline
  \end{tabular}
  \end{table}
  
  \end{tabular}
  \end{table}
  
@@ -669,8 +679,8 @@ Therefore, the energy consumption model of $N$ parallel task executed  synchrono
  \end{equation}
   
  According to this model, the frequency scaling factor $S$ reduces the energy consumption of the homogeneous architecture by a factor of $S^{-2}$ and increases the execution time by a factor of  $S$.
  \end{equation}
   
  According to this model, the frequency scaling factor $S$ reduces the energy consumption of the homogeneous architecture by a factor of $S^{-2}$ and increases the execution time by a factor of  $S$.
-This model can be used to predict the energy consumption of the message passing synchronous iterative applications after gathering the computation and communication times of the first iteration.
-Furthermore, it can be used to measure the energy consumption of the iterative application by multiplying the energy consumed of all tasks in  one iteration  by the number of the iterations. 
+This model can be used to predict the energy consumption of the message passing   applications with synchronous iterations after gathering the computation and communication times of the first iteration.
+Furthermore, it can be used to measure the energy consumption of the parallel  application with iterations  by multiplying the energy consumed of all tasks in  one iteration  by the number of the iterations. 
  
  This model is used by the algorithm \ref{EPSA}  to predict the energy consumption and to select the optimal frequency scaling factor. The new frequency $F_i$ can be computed as in \ref{eq:fi} while using the new selected frequency scaling factor.         
  In the next section,  algorithm \ref{EPSA} is re-evaluated while using this new energy model and the new results are presented.
  
  This model is used by the algorithm \ref{EPSA}  to predict the energy consumption and to select the optimal frequency scaling factor. The new frequency $F_i$ can be computed as in \ref{eq:fi} while using the new selected frequency scaling factor.         
  In the next section,  algorithm \ref{EPSA} is re-evaluated while using this new energy model and the new results are presented.
@@ -734,7 +744,7 @@ have applied it to the NAS benchmarks and it was compared to the Rauber and Rün
  method while being executed on the SimGrid simulator.  The results showed that
  our method, outperforms the Rauber and Rünger's method in terms of energy-performance
  ratio. Finally, this chapter presents a new energy consumption model for  parallel
  method while being executed on the SimGrid simulator.  The results showed that
  our method, outperforms the Rauber and Rünger's method in terms of energy-performance
  ratio. Finally, this chapter presents a new energy consumption model for  parallel
-synchronous iterative methods running on homogeneous clusters. This model takes into consideration 
+  applications with synchronous iterations running on homogeneous clusters. This model takes into consideration 
  both  the computation and communication times and their relation with the frequency scaling 
  factor. The results obtained using the new energy model have shown that different frequency scaling factors 
  were  selected which gave new experimental results that are more accurate and realistic.
 \ No newline at end of file
  both  the computation and communication times and their relation with the frequency scaling 
  factor. The results obtained using the new energy model have shown that different frequency scaling factors 
  were  selected which gave new experimental results that are more accurate and realistic.
 \ No newline at end of file