\chapter{Conclusion and Perspectives}

In this dissertation, we have proposed a method to simultaneously optimize the energy consumption and the performance of synchronous and asynchronous applications with iterations running over cluster and grid platforms. The dynamic voltage and frequency scaling (DVFS) technique was used to lower the frequency of the processor in order to reduce its energy consumption while computing. However, reducing the frequency of the processor also decreases its computing power, which might increase the execution time. In this work, different energy consumption and performance models were developed to predict the energy consumption and the performance of parallel applications with iterations. Based on these models, an objective function was defined that represents the best trade-off between the energy consumption and the performance of the parallel application. This objective function was used in frequency selection algorithms that simultaneously optimize both the energy consumption and the performance of parallel applications with iterations.

The first part of this dissertation, chapter \ref{ch1}, presented the different levels of parallelism, classified according to the hardware and software techniques they rely on. Different parallel architectures were also described and classified according to their memory model: shared or distributed memory. The two types of parallel applications with iterations, synchronous and asynchronous ones, were discussed and compared to each other. Synchronous distributed applications are well adapted to local homogeneous clusters with a high-speed network, while asynchronous ones are better suited to grids. At the end of this chapter, an energy consumption model proposed in the literature to estimate the energy consumption of parallel applications was explained. This model does not take into account the communication time of the parallel application being executed. Moreover, it is not well adapted to heterogeneous architectures, where each type of processor might have a different power consumption value.

In the second part of the dissertation, new energy and performance models for synchronous and asynchronous message passing applications with iterations running over clusters and grids were presented. To simultaneously optimize the energy and performance of these applications, the trade-off has been defined as the maximum distance between the predicted energy and performance curves. This objective function is used by the frequency selection algorithms to choose, among the available frequency scaling factors, those that give the best trade-off between energy consumption and performance. We have proposed four different frequency scaling algorithms, each adapted to a different execution context: synchronous or asynchronous communications, homogeneous or heterogeneous nodes, and local or distributed architectures. They use the computation and communication times measured during the first iteration of the parallel application to predict its energy consumption and performance at every available frequency. All these algorithms work online and introduce only a very small runtime overhead. They also do not require any profiling or training.

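To make the trade-off objective more concrete, the following Python sketch computes, for each available scaling factor $S$, a normalized performance $T(1)/T(S)$ and a normalized energy $E(S)/E(1)$, and keeps the factor that maximizes their difference. It is one plausible reading of ``the maximum distance between the predicted energy and performance curves'', not the exact models of this dissertation: it assumes, hypothetically, that the computation time scales linearly with $S$, that the communication time does not depend on the frequency, that the dynamic power scales as $S^{-3}$, and that the static power is drawn during the whole execution. The names \texttt{Measurement} and \texttt{select\_scaling\_factor} and all input values are illustrative.

```python
"""Hypothetical sketch of trade-off-based scaling factor selection."""
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Measurement:
    # Times measured during the first iteration at the maximum frequency (seconds).
    t_comp: float
    t_comm: float


def predicted_time(m: Measurement, s: float) -> float:
    # Assumption: computation time grows linearly with the scaling factor
    # s = f_max / f, while communication time does not depend on the frequency.
    return m.t_comp * s + m.t_comm


def predicted_energy(m: Measurement, s: float, p_dyn: float, p_static: float) -> float:
    # Assumption: dynamic power scales as 1/s^3 and is consumed only while
    # computing; static power is consumed during the whole execution.
    dynamic = (p_dyn / s**3) * (m.t_comp * s)
    static = p_static * predicted_time(m, s)
    return dynamic + static


def select_scaling_factor(m: Measurement, factors: Sequence[float],
                          p_dyn: float, p_static: float) -> Tuple[float, float]:
    """Return the factor maximizing normalized performance minus normalized energy."""
    t_ref = predicted_time(m, 1.0)
    e_ref = predicted_energy(m, 1.0, p_dyn, p_static)
    # Default: keep the maximum frequency (s = 1), where the distance is 0.
    best_s, best_dist = 1.0, 0.0
    for s in factors:
        perf_norm = t_ref / predicted_time(m, s)
        energy_norm = predicted_energy(m, s, p_dyn, p_static) / e_ref
        dist = perf_norm - energy_norm
        if dist > best_dist:
            best_s, best_dist = s, dist
    return best_s, best_dist


if __name__ == "__main__":
    # Made-up measurements and powers, for illustration only.
    m = Measurement(t_comp=8.0, t_comm=2.0)
    s, dist = select_scaling_factor(m, [1.0, 1.1, 1.25, 1.5, 2.0], p_dyn=20.0, p_static=4.0)
    print(f"selected scaling factor: {s}, distance: {dist:.4f}")
```

With these made-up inputs the sketch selects $S = 1.25$: a larger factor such as 1.5 saves more energy, but the performance loss grows faster, so the distance between the two normalized curves shrinks.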
In chapter \ref{ch2}, a new online scaling factor selection method was proposed that simultaneously optimizes the energy and performance of a distributed synchronous application with iterations running on a homogeneous cluster. This algorithm was applied to the class C NAS parallel benchmarks executed over the SimGrid simulator. First, Rauber and Rünger's energy model was used in the proposed algorithm to select the best frequency gear, and the proposed algorithm was compared to Rauber and Rünger's optimization method. The comparison showed that, while using the same energy model, the proposed algorithm gives better energy-to-performance trade-off ratios than their method. Second, a new energy consumption model was developed that takes into consideration both the computation and communication times and their relation with the frequency scaling factor. When the proposed algorithm used this new model, the simulation results demonstrated that it is more accurate and realistic than the previous one.

In chapter \ref{ch3}, two new online frequency scaling factor selection algorithms were presented, adapted respectively to synchronous applications with iterations running over a heterogeneous cluster and over a grid. Each algorithm uses new energy and performance models that take into account the characteristics of the parallel platform being used. First, the scaling factor selection algorithm for a heterogeneous local cluster was applied to the NAS parallel benchmarks and evaluated over SimGrid. The experiments showed that the algorithm reduces the energy consumption of the class C NAS benchmarks executed over 8 nodes by 29.8\% on average, while limiting the performance degradation to 3.8\%.
The algorithm selected different frequency scaling factors according to the ratio between the computation and communication times when different numbers of nodes and different static and dynamic CPU powers were used. Second, the scaling factor selection algorithm for a grid was applied to the NAS parallel benchmarks, and the class D of these benchmarks was executed over the Grid'5000 testbed. The experiments, conducted over 16 nodes distributed over three clusters, showed that the algorithm reduces the energy consumption of all the NAS benchmarks by 30\% on average, while degrading their performance by only 3.2\% on average.
The algorithm was also evaluated in different scenarios that vary the distribution of the computing nodes among the clusters' sites, use multi-core nodes, or consume different static power values. The algorithm selects different vectors of frequencies according to the ratio between the computation and communication times and to the values of the static and measured dynamic powers of the CPUs.
Both proposed algorithms were compared to another method that uses the well-known energy and delay product as an objective function. The comparison showed that the proposed algorithms outperform the latter by selecting vectors of frequencies that give a better trade-off between energy consumption reduction and performance.

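On heterogeneous nodes, the algorithms select a vector of scaling factors, one per node. The Python sketch below illustrates why this helps in a synchronous execution: since every node waits for the slowest one at each iteration, a faster node can be slowed down until it matches the slowest node without lengthening the iteration. It relies on hypothetical models (computation time proportional to the scaling factor, dynamic power proportional to its inverse cube, static power drawn by every node for the whole iteration) and, for illustration only, searches all vectors exhaustively, which is exponential in the number of nodes and unlike the online algorithms summarized above. It compares the trade-off distance with the energy and delay product (EDP). The node parameters and the names \texttt{predict} and \texttt{select\_vectors} are hypothetical.

```python
"""Hypothetical sketch: per-node scaling factors for a synchronous
application on heterogeneous nodes, found by exhaustive search."""
from itertools import product


def predict(nodes, vec):
    """Predicted (time, energy) of one iteration for the scaling-factor vector vec."""
    # Assumption: every node waits for the slowest one, so the iteration time is
    # the maximum over the nodes of the scaled computation time plus communication.
    time = max(n["t_comp"] * s + n["t_comm"] for n, s in zip(nodes, vec))
    # Assumption: dynamic energy of a node is p_dyn * t_comp / s^2 (power ~ 1/s^3,
    # computation time ~ s); static power is drawn by every node for the whole iteration.
    e_dyn = sum(n["p_dyn"] * n["t_comp"] / s**2 for n, s in zip(nodes, vec))
    e_static = sum(n["p_static"] for n in nodes) * time
    return time, e_dyn + e_static


def select_vectors(nodes, factors):
    """Return (trade-off vector, EDP vector) chosen among all scaling-factor vectors."""
    t_ref, e_ref = predict(nodes, [1.0] * len(nodes))
    candidates = list(product(factors, repeat=len(nodes)))

    def distance(vec):
        # Normalized performance minus normalized energy (larger is better).
        t, e = predict(nodes, vec)
        return t_ref / t - e / e_ref

    def edp(vec):
        # Energy and delay product (smaller is better).
        t, e = predict(nodes, vec)
        return t * e

    return max(candidates, key=distance), min(candidates, key=edp)


if __name__ == "__main__":
    nodes = [
        # Made-up slower node: its computation dominates the iteration time.
        {"t_comp": 8.0, "t_comm": 2.0, "p_dyn": 20.0, "p_static": 4.0},
        # Made-up faster node: finishes early and then waits for the slower one.
        {"t_comp": 4.0, "t_comm": 2.0, "p_dyn": 20.0, "p_static": 4.0},
    ]
    tradeoff, edp_choice = select_vectors(nodes, [1.0, 1.25, 1.5, 2.0])
    print("trade-off vector:", tradeoff, "EDP vector:", edp_choice)
```

With these made-up values both objectives pick (1.0, 2.0): the slower node keeps its maximum frequency and the faster node is slowed down to absorb its waiting time. On other inputs the two objectives can select different vectors; the comparison reported in this dissertation rests on its own models and experiments, not on this toy example.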
In chapter \ref{ch4}, a new online frequency selection algorithm adapted to asynchronous iterative applications running over a grid was presented. The algorithm uses new energy and performance models to predict the energy consumption and the execution time of asynchronous or hybrid message passing iterative applications running over a grid. The proposed algorithm was evaluated both over the SimGrid simulator and over the Grid'5000 testbed while running a multi-splitting (MS) application that solves 3D problems. The experiments were executed over different grid scenarios composed of different numbers of clusters and different numbers of nodes per cluster. The proposed algorithm was applied synchronously and asynchronously to the synchronous and asynchronous versions of the MS iterative application. Both the simulation and the real experiment results showed that applying the frequency selection algorithm synchronously to the asynchronous MS application gives the best trade-off between energy consumption reduction and performance when compared to the other scenarios. In the simulations, this scenario reduces the energy consumption by 22\% on average and decreases the execution time of the application by 5.72\%. This version optimizes both the dynamic energy consumption, by applying the HSA algorithm synchronously at the end of the first iteration of the iterative application, and the static energy consumption, by using asynchronous communications, overlapped by computations, between nodes from different clusters. The proposed algorithm was also evaluated under three power scenarios, in which it selects different vectors of frequencies depending on the dynamic and static power values. Larger energy reductions were achieved when the ratio of the dynamic power was increased, and vice versa, whereas the performance degradation percentages decreased when the static power ratio was increased.
In the Grid'5000 experiments, this scenario reduces the energy consumption by 26.93\% and decreases the execution time of the application by 21.48\%. The experiments executed over Grid'5000 give better results than those simulated with SimGrid because the nodes used in Grid'5000 were more heterogeneous than the ones simulated by SimGrid.
In both the simulations and the real experiments, the proposed algorithm was compared to a method that uses the well-known energy and delay product as an objective function. The comparison showed that the proposed algorithm outperforms the latter by selecting a vector of frequencies that gives a better trade-off between energy consumption reduction and performance.

\section{Perspectives}
In the near future, we will adapt the proposed algorithms to take into consideration the variability between iterations. For example, each proposed algorithm could be executed twice: after the first iteration, the frequencies are scaled down according to the execution times measured during the first iteration; then, after a fixed number of iterations, the frequencies are adjusted according to the execution times measured during these iterations. If the computing power of the system changes constantly, it would be interesting to implement a mechanism that detects these changes and adjusts the frequencies according to the variability of the system.
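
Such a variability-detection mechanism could be prototyped as in the following Python sketch, which is hypothetical and not part of the proposed algorithms. It replays a trace of measured iteration times, assumes a first frequency selection after iteration 0 as in the proposed algorithms, takes as reference the mean time of the first \texttt{window} iterations executed after each selection, and reports the iterations where the moving mean drifts from that reference by more than a relative \texttt{threshold}; these are the points where the selection would be executed again. The function name \texttt{reselection\_points} and the default window size and threshold are arbitrary choices.

```python
"""Hypothetical sketch: detect iteration-time drift and trigger re-selection."""
from typing import List, Sequence


def reselection_points(times: Sequence[float], window: int = 3,
                       threshold: float = 0.10) -> List[int]:
    """Indices of the iterations after which the frequencies would be re-selected."""
    if not times:
        return []
    points = [0]       # first selection after iteration 0
    start = 1          # first iteration executed with the newly selected frequencies
    reference = None   # mean iteration time observed at the current frequencies
    for i in range(1, len(times)):
        if i - start + 1 < window:
            continue   # not enough iterations since the last selection
        mean = sum(times[i - window + 1 : i + 1]) / window
        if reference is None:
            reference = mean
        elif abs(mean - reference) / reference > threshold:
            points.append(i)   # drift detected: run the selection algorithm again
            start, reference = i + 1, None
    return points


if __name__ == "__main__":
    # Made-up trace in which the iteration time jumps from 12 s to 15 s.
    trace = [10, 12, 12, 12, 12, 12, 15, 15, 15, 15, 15, 15]
    print(reselection_points(trace))
```

On this trace the sketch returns [0, 7]: the moving mean first exceeds the 10 percent threshold at iteration 7, after which a new reference is built from the following iterations. An online version would run the same test after each iteration instead of on a recorded trace.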
It would also be interesting to evaluate the scalability of the proposed algorithms by running them on large platforms composed of several thousand cores. The scalability of the algorithms could be improved by distributing them hierarchically, where a leader is chosen for each cluster or group of nodes to compute their scaled frequencies, and by using asynchronous messages to exchange the data measured during the first iteration.

The proposed algorithms should be applied to other message passing methods with iterations in order to see how they adapt to the characteristics of these methods.
It would also be interesting to explore whether there is a relation between the number of asynchronous iterations required for global convergence and the frequencies applied to the nodes. The number of iterations required by each node for global convergence is not known in advance, and changing the CPU frequencies changes this number.

Furthermore, the algorithms proposed for heterogeneous platforms in chapters \ref{ch3} and \ref{ch4} should be applied to heterogeneous platforms composed of CPUs and GPUs. Indeed, many works in the green computing field have shown that such mixed platforms of CPUs and GPUs are more energy efficient than platforms composed only of CPUs.

Finally, it would be interesting to verify the accuracy of the values returned by the energy models by comparing them to the values given by instruments that measure the energy consumption of the CPUs during execution, as in \cite{ref106}.