With each new generation of microprocessors, users of sequential applications have expected these applications to run faster on the new hardware.
Nowadays, this expectation is no longer valid, because recent microprocessors embed many computing units in one chip, while a sequential program runs over only one of these computing units.
Consequently, traditional applications have not improved their performance much on the new architectures, whereas new applications run faster on them in parallel. A parallel application is executed over all available computing units at the same time to improve its performance. Furthermore, the term concurrency revolution refers to this drastic improvement in the performance of new applications running side by side with the new parallel architectures \cite{ref51}. Therefore, parallel applications and parallel architectures are closely tied together: it is hard to think about a parallel application without thinking of the parallel hardware that executes it.
For example, the energy consumption of a parallel system mainly depends on both the parallel application and the parallel architecture that executes it. Indeed, an energy consumption model, like any measurement system, depends on many specifications: some concern parallel hardware features, such as the frequency of the processor, the power consumption of the processor and the communication model, while others concern the parallel application, such as its computation and communication times.
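To make this dependence concrete, consider a generic illustrative formulation (a sketch based on the standard CMOS power model, not the specific model presented in section \ref{ch1:4}):
\[
E \approx (\alpha \cdot C_L \cdot V^2 \cdot f) \cdot T_{comp} + P_{static} \cdot T_{total}
\]
where the dynamic power term $\alpha \cdot C_L \cdot V^2 \cdot f$ depends on hardware features (the switching activity $\alpha$, the load capacitance $C_L$, the supply voltage $V$ and the frequency $f$), while the computation time $T_{comp}$ and the total execution time $T_{total}$ (computation plus communication) depend on the application.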
In this work, we are interested in iterative parallel applications, and our main goal is to run them over different parallel architectures while optimizing their energy consumption.
As a result, this chapter aims to give a brief overview of parallel hardware architectures and parallel iterative applications, and to present an energy model from the literature used to measure the energy consumption of these applications.
The remainder of this chapter is organized as follows: section \ref{ch1:2} is devoted
to describing the types of parallelism and the types of parallel platforms. It also gives some information about parallel programming models. Section \ref{ch1:3} explains both synchronous and asynchronous parallel iterative methods and compares them. Section \ref{ch1:4} presents a well-accepted energy model from the state of the art that can be used to measure the energy consumption of parallel iterative applications when changing the frequency of the processor. Finally, section \ref{ch1:5} summarizes this chapter.
\section{Parallel Computing Architectures}
Parallel computing is the simultaneous execution of calculations.
Its main principle is to divide a large problem into smaller sub-problems that can be solved at the same time \cite{ref2}.
In parallel computing, these sub-problems of the main problem are solved simultaneously on multiple processors.
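As a minimal illustration of this divide-and-solve principle (a hypothetical C/OpenMP sketch, not code from this work), the following program splits the summation of an array into chunks that are computed at the same time:
\begin{verbatim}
#include <stdio.h>

/* Sums N elements by dividing the array into chunks: each
   thread computes one partial sum, and the partial sums are
   combined by the reduction clause. Compile with -fopenmp. */
int main(void) {
    enum { N = 1000000 };
    static double a[N];
    for (int i = 0; i < N; i++) a[i] = 1.0;

    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f\n", sum);  /* expected: 1000000.0 */
    return 0;
}
\end{verbatim}
Without the pragma, the same code runs sequentially, which makes the relation between the sequential problem and its parallel decomposition easy to see.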
Indeed, a parallel architecture is a computer system composed of many processing elements connected via a network, in addition to the software tools required to make the processing units work together \cite{ref1}.
Consequently, a parallel computing architecture consists of software and hardware resources.
Hardware resources are the processing units and the memory model, in addition to the network system connecting them. Software resources include the specific operating system, the programming language and the compiler, or the runtime libraries. Furthermore, parallel computing can exhibit different levels of parallelism, performed at the software or the hardware level. There are five types of parallelism, as follows:
\begin{itemize}
\item \textbf{Bit-level parallelism (BLP)}: The appearance of very-large-scale integration (VLSI) in the 1970s is considered the first step towards parallel computing. It increases the number of bits in the word size processed by a processor, as in figure~\ref{fig:ch1:1}. Over many successive years, the number of bits has increased, from 4-bit microprocessors up to 64-bit microprocessors; for example, the recent x86-64 architecture is the most familiar architecture nowadays. Therefore, a bigger word size gives a higher level of parallelism and thus fewer instructions to be executed by the processor for the same amount of work, as the C sketch after figure~\ref{fig:ch1:1} illustrates.
\begin{figure}[h!]
\centering
\caption{Bit-level parallelism}
\label{fig:ch1:1}
\end{figure}
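As a rough, hypothetical C sketch of this idea (illustrative only, not code from this work), the same XOR operation over a buffer needs eight times fewer loop iterations when the processor handles 64-bit words instead of 8-bit bytes:
\begin{verbatim}
#include <stdint.h>
#include <stddef.h>

/* Byte-at-a-time: one XOR operation per 8-bit element. */
void xor_bytes(uint8_t *dst, const uint8_t *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] ^= src[i];
}

/* Word-at-a-time: the same total work in n/8 iterations on a
   64-bit processor (bit-level parallelism in action). */
void xor_words(uint64_t *dst, const uint64_t *src, size_t nwords) {
    for (size_t i = 0; i < nwords; i++)
        dst[i] ^= src[i];
}
\end{verbatim}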
\item \textbf{Thread-level parallelism (TLP)}: It is also known as task-level parallelism.
According to Moore's law \cite{ref9}, the number of transistors in a processor doubles
every two years, which has been used to increase the frequency of the processor and thus its performance. Besides, cache and main memory sizes must increase as well to keep up with this growth.
However, this trend has reached limits for two main reasons: first, drastically increasing the cache size leads to a longer access time; second, the large increase in the number of transistors per CPU significantly increases its heat dissipation. As a result, programmers subdivide their programs into multiple tasks which can be executed in parallel over distributed processors or shared multi-core processors to improve the performance of the program, see figure~\ref{fig:ch1:4}. Each processor can run multiple threads or an individual thread dedicated to each task. A thread can be defined as a part of the parallel program which shares processor resources with other threads, as in the sketch below.
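A minimal, hypothetical POSIX-threads sketch in C (illustrative only, not code from this work) shows a program subdivided into tasks, each executed by its own thread:
\begin{verbatim}
#include <pthread.h>
#include <stdio.h>

#define NTASKS 4

/* Each thread executes one task of the subdivided program;
   all threads share the process's address space. */
static void *task(void *arg) {
    int id = *(int *)arg;
    printf("task %d running\n", id);
    return NULL;
}

int main(void) {
    pthread_t th[NTASKS];
    int id[NTASKS];
    for (int i = 0; i < NTASKS; i++) {
        id[i] = i;
        pthread_create(&th[i], NULL, task, &id[i]);
    }
    for (int i = 0; i < NTASKS; i++)
        pthread_join(th[i], NULL);   /* wait for every task */
    return 0;
}
\end{verbatim}
Linked with -lpthread, the tasks may run at the same time on different cores of a multi-core processor.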
\begin{figure}[h!]
\centering
\caption{Thread-level parallelism}
\label{fig:ch1:4}
\end{figure}
Therefore, we can consider the execution time of a sequential program composed of $N$ tasks as the sum of the execution times of all its tasks, as follows:
\begin{equation}
\label{ch1:eq1}