X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/mpi-energy2.git/blobdiff_plain/880d3292a3cbcef0558f76eb03b22722e9e31db0..refs/heads/master:/mpi-energy2-extension/Heter_paper.tex?ds=inline

diff --git a/mpi-energy2-extension/Heter_paper.tex b/mpi-energy2-extension/Heter_paper.tex
index 329d526..9108cb1 100644
--- a/mpi-energy2-extension/Heter_paper.tex
+++ b/mpi-energy2-extension/Heter_paper.tex
@@ -392,7 +392,7 @@ where $N$ is the number of clusters in the grid, $M_i$ is the number of nodes
 and $\Tcm[hj]$ is the communication time of processor $j$ in the cluster $h$ during the first iteration.
 The execution time for one iteration is equal to the sum of the maximum computation time for all nodes with the new scaling factors
 and the communication time of the slowest node without slack time during one iteration.
- The slowest node $h$ is the node which takes the maximum execution time to execute an iteration before scaling down its frequency.
+The slowest node in cluster $h$ is the node which takes the maximum execution time to execute an iteration before scaling down its frequency.
 It means that only the communication time without any slack time is taken into account.
 Therefore, the execution time of the application is equal to
 the execution time of one iteration as in Equation (\ref{eq:perf}) multiplied by the
@@ -512,7 +512,7 @@ static energies for $M_i$ processors in $N$ clusters. It is computed as follows
 E = \sum_{i=1}^{N} \sum_{i=1}^{M_i} {(S_{ij}^{-2} \cdot \Pd[ij] \cdot \Tcp[ij])} +
 \sum_{i=1}^{N} \sum_{j=1}^{M_i} (\Ps[ij] \cdot {} \\
 (\mathop{\max_{i=1,\dots N}}_{j=1,\dots,M_i}({\Tcp[ij]} \cdot S_{ij})
- +\mathop{\min_{j=1,\dots M_i}} (\Tcm[hj]) ))
+ +\mathop{\min_{j=1,\dots M_h}} (\Tcm[hj]) ))
 \end{multline}
 
 
@@ -596,13 +596,13 @@ computed as in (\ref{eq:eorginal}).
 While the main goal is to optimize the energy and execution time at the same time,
 the normalized energy and execution time curves do not evolve (increase/decrease) in the same way.
 According to (\ref{eq:pnorm}) and (\ref{eq:enorm}), the
-vector of frequency scaling factors $S_1,S_2,\dots,S_N$ reduces both the energy
+vector of frequency scaling factors $S_{11},S_{12},\dots,S_{NM_i}$ reduces both the energy
 and the execution time, but the main objective is to produce maximum
 energy reduction with minimum execution time reduction.
 This problem can be solved by making the optimization process for
 energy and execution time follow the same evolution according to
 the vector of scaling factors
-$(S_{11}, S_{12},\dots, S_{NM})$. Therefore, the equation of the
+$(S_{11}, S_{12},\dots, S_{NM_i})$. Therefore, the equation of the
 normalized execution time is inverted which gives the normalized performance
 equation, as follows:
 \begin{equation}
@@ -1033,7 +1033,7 @@ nodes when the communications occur in high speed network does not decrease the
 communication ratio.
 The performance degradation percentage of the EP benchmark after applying the
 scaling factors selection algorithm is the highest in comparison to
-the other benchmarks. Indeed, in the EP benchmark, there are no communication and slack times and its
+the other benchmarks. Indeed, in the EP benchmark, there are no communication and no slack times and its
 performance degradation percentage only depends on the frequencies values selected by the algorithm for the computing nodes.
 The rest of the benchmarks showed different performance degradation percentages which decrease
 when the communication times increase and vice versa.
@@ -1098,7 +1098,7 @@ Scenario name & Cluster name & Nodes per cluster &
 
 The execution times for most of the NAS benchmarks are higher over the multi-core per node scenario
 than over the single core per node scenario. Indeed,
- the communication times are higher in the one site multi-core scenario than in the latter scenario because all the cores of a node share the same node network link which can be saturated when running communication bound applications. Moreover, the cores of a node share the memory bus which can be also saturated and become a bottleneck.
+ the communication times are higher in the multi-core scenario than in the latter scenario because all the cores of a node share the same node network link which can be saturated when running communication bound applications. Moreover, the cores of a node share the memory bus which can be also saturated and become a bottleneck.
 Moreover, the energy consumptions of the NAS benchmarks are lower over the one core scenario
 than over the multi-core scenario because the first scenario had less execution time than the latter
 which results in less static energy being consumed.
@@ -1266,7 +1266,8 @@ the global convergence of the iterative system. Finally, it would be interesting
 
 \section*{Acknowledgment}
 This work has been partially supported by the Labex ACTION project (contract
-``ANR-11-LABX-01-01''). Computations have been performed on the Grid'5000 platform. As a PhD student,
+``ANR-11-LABX-01-01''). Computations have been performed on the Grid'5000
+platform and on the mésocentre of Franche-Comté. As a PhD student,
 Mr. Ahmed Fanfakh, would like to thank the University of Babylon (Iraq) for
 supporting his work.
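The following Python sketch may help when reading the first two hunks. It covers the per-iteration time: the maximum scaled computation time plus the communication time of the slowest node's cluster. It also covers the energy model E. This is a hypothetical illustration, not code from the paper:

- The function names `slowest_cluster`, `iteration_time` and `energy`, and the sample values, are invented.
- Pd[i][j] and Ps[i][j] are assumed to be the dynamic and static powers of node j in cluster i.
- S[i][j] is assumed to be that node's frequency scaling factor.
- The slowest node is assumed to be the one with the largest Tcp + Tcm measured before any scaling.

```python
from typing import List

# Ragged matrix: one row per cluster i, one entry per node j (M_i entries).
Matrix = List[List[float]]


def slowest_cluster(tcp: Matrix, tcm: Matrix) -> int:
    """Return h, the cluster holding the node with the largest Tcp + Tcm
    (assumed: times measured before any frequency scaling)."""
    best_time, best_h = float("-inf"), -1
    for i, (tcp_i, tcm_i) in enumerate(zip(tcp, tcm)):
        for t_cp, t_cm in zip(tcp_i, tcm_i):
            if t_cp + t_cm > best_time:
                best_time, best_h = t_cp + t_cm, i
    return best_h


def iteration_time(tcp: Matrix, tcm: Matrix, s: Matrix, h: int) -> float:
    """max over all (i, j) of Tcp[i][j] * S[i][j], plus min over j = 1..M_h of Tcm[h][j]."""
    max_comp = max(
        t * f for tcp_i, s_i in zip(tcp, s) for t, f in zip(tcp_i, s_i)
    )
    return max_comp + min(tcm[h])


def energy(tcp: Matrix, tcm: Matrix, s: Matrix,
           pd: Matrix, ps: Matrix, h: int) -> float:
    """Dynamic part: sum of S^-2 * Pd * Tcp over all nodes.
    Static part: sum of Ps over all nodes, times one iteration time."""
    dynamic = sum(
        f ** -2 * p * t
        for s_i, pd_i, tcp_i in zip(s, pd, tcp)
        for f, p, t in zip(s_i, pd_i, tcp_i)
    )
    static = sum(sum(ps_i) for ps_i in ps) * iteration_time(tcp, tcm, s, h)
    return dynamic + static


if __name__ == "__main__":
    # Hypothetical grid: cluster 0 has two nodes, cluster 1 has one node.
    tcp = [[10.0, 8.0], [6.0]]
    tcm = [[2.0, 3.0], [1.0]]
    pd = [[20.0, 20.0], [30.0]]
    ps = [[4.0, 4.0], [5.0]]
    s = [[1.0, 1.25], [2.0]]
    h = slowest_cluster(tcp, tcm)
    print(f"h = {h}, T_iter = {iteration_time(tcp, tcm, s, h):.1f}, "
          f"E = {energy(tcp, tcm, s, pd, ps, h):.1f}")
```

With these sample values the script prints `h = 0, T_iter = 14.0, E = 529.4`. The static term multiplies the total static power by a single iteration time, which matches the factored form of E in the diff. The communication term takes its minimum over the nodes of cluster h only, which is the `M_i` to `M_h` correction made in the second hunk.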