Most software applications are traditionally written as sequential programs, following the Von Neumann report in 1993 \cite{ref50}. The structure of the
program code is easy for humans to follow as a series of instructions executed one after the other. For many years, users of sequential applications expected them to run faster with each new generation of microprocessors. This expectation no longer holds, because recent microprocessors embed many computing units in one chip, while these programs run sequentially on only one of them.
Consequently, traditional applications have not improved their performance much on the new architectures, whereas new applications run faster on them in parallel. A parallel application is executed over all the available computing units at the same time to improve its performance. Furthermore, the term concurrency revolution refers to the drastic improvement in the performance of new applications that goes hand in hand with the new parallel architectures \cite{ref51}. Therefore, parallel applications and parallel architectures are closely tied together. It is hard to think about any parallel application without thinking of the parallel hardware that executes it.
+For example, the energy consumption of a parallel system mainly depends on both the parallel application and the parallel architecture executing it. Indeed, an energy consumption model or any measurement system depends on many specifications. Some of them concern the parallel platform, such as the frequency of the processor, the power consumption of the processor and the communication model. The others concern the parallel application, such as its computation and communication times.
-In this work, the iterative parallel applications, which is the most popular type of the parallel applications, are interested and running them over different parallel architectures to optimize their energy consumptions is the goal.
+This work focuses on iterative parallel applications, which are the most popular type of parallel application, and its main goal is to optimize their energy consumption when running them over different parallel architectures.
Accordingly, this chapter gives a brief overview of parallel hardware architectures, parallel iterative applications and the energy model from other authors that is used to measure the energy consumption of these applications.
The remainder of this chapter is organized as follows: section \ref{ch1:2} describes
the types of parallelism and the types of parallel platforms. It also gives some information about the parallel programming models. Section \ref{ch1:3} explains both the synchronous and asynchronous parallel iterative methods and compares them. Section \ref{ch1:4} presents the well-accepted energy model from the state of the art that can be used to measure the energy consumption of parallel iterative applications when changing the frequency of the processor. Finally, section \ref{ch1:5} summarizes this chapter.
Performing calculations simultaneously is called parallel computing.
Its main principle is the ability to divide a large problem into smaller sub-problems that can be solved at the same time \cite{ref2}.
In parallel computing, the sub-problems of the main problem are mainly solved on multiple parallel processors.
-Indeed, the parallel processors architecture is a computer system composed from many processing elements connected via network model in addition to the software tools required to make the processing units work together \cite{ref1}.
+Indeed, a parallel processor architecture is a computer system composed of many processing elements connected via a network model, in addition to the software tools required to make the processing units work together \cite{ref1}.
Consequently, a parallel computing architecture consists of software and hardware resources.
The hardware resources are the processing units and the memory model, in addition to the network system connecting them. The software resources include the specific operating system, the programming language and the compiler, or the runtime libraries. Furthermore, parallel computing can have different levels of parallelism, which can be performed in software or hardware. There are five types of parallelism as follows:
\begin{itemize}
\subsection{Parallel programming Models}
-\label{ch1:2:2}.
+\label{ch1:2:2}
Many parallel programming languages and libraries have been developed
to exploit the computing power of parallel architectures. In this section,
-the parallel computing programming languages are divided into two main types,
-which is the shared and the distributed models. Moreover, these two types are divided into two subcategories according to the support level to the number of computing units composing them.
+the parallel programming languages are divided into two main types,
+which are the shared and the distributed programming models. Moreover, these two types are divided into two subcategories according to the level of support for the number of computing units composing them.
Figure \ref{fig:ch1:14} presents this classification hierarchy of the parallel programming
-paradigm. It is also show three parallel languages examples for each sub-category.
+models. It also shows three example parallel languages for each subcategory.
\begin{figure}[h!]
\begin{itemize}
\item \textbf{Local cluster programming models}
\begin{itemize}
- \item \textbf{MPI} \cite{ref23} is the Message Passing Interface, is a standardization
+ \item \textbf{MPI} \cite{ref23} is the Message Passing Interface, which is considered a
+ standard
dedicated to message passing in distributed memory environments.
The first version of MPI was designed by a group of researchers in
1991. It is a library, not a language and its subroutines
Its library functions are not limited to peer-to-peer operations through
sending and receiving messages, but also allow many other collective
operations such as gathering and reduction operations. MPI users are
- free form the network topology, synchronization, and communication
+ free from the network topology, synchronization and communication
functionality between groups of processes. Furthermore, it has
asynchronous point-to-point operations, which allow computations
to overlap with communications. While MPI is not devoted to a grid,
The difference between OpenMP and TBB is that the latter uses a task-based scheduling
mechanism. Furthermore, TBB is more popular with the C++ programming language than
with other languages. It is designed to work with any compiler environment, and thus
- it easily ported to a new platform. Consequently, TBB has been ported to a
+ it is easily ported to a new platform. Consequently, TBB has been ported to
different types of operating systems and processors. However, it has limited
support for vector processing architectures, and it is therefore combined with OpenMP
and Cilk to support this platform.
of core. To exploit this massive many-core parallelism, NVIDIA developed in 2007
a parallel programming language called CUDA, which stands for Compute Unified Device
Architecture. A CUDA program has two parts. The first one is called the host, which is a
- set of threads that executed sequentially over the CPU. The second part is called the
+ set of threads that execute sequentially over the CPU. The second part is called the
kernels, which are sets of threads that can be executed in parallel over the GPU.
\item \textbf{OpenCL} \cite{ref38} stands for Open Computing Language. It is a parallel
\item \textbf{HLSL} \cite{ref39} stands for High Level Shading Language. It is the shader
programming language for Direct3D, which is a part of
Microsoft’s DirectX API. It supports shader construction with
- C-like syntax, types, expressions, statements, and functions. It
+ C-like syntax, types, expressions, statements and functions, and it
provides graphical pipeline parallelism.
The last version of HLSL is v5.0 for DirectX 11, which adds new general-purpose GPU
functions, like CUDA. Recently, the new OpenCL version has started to replace CUDA
\section{Conclusion}
\label{ch1:5}
-In this chapter, we have presented in general different types of parallelism levels that can be implemented in a software and hardware techniques. Furthermore, the types of the parallel architectures are demonstrated and classified according to how the computing units are connected to a memory model.
-The two parallel systems are described, which are the shared and distributed platforms. Depending on these two types, we have categorized the parallel programming models. The parallel iterative methods are explained and their two types, the synchronous and asynchronous iterative methods, are described. The synchronous iterative methods are well implemented over local homogeneous cluster with a high speed network link, while the asynchronous iterative methods are more conventional to implement over the distributed heterogeneous clusters.
-Consequently, running these two types of the parallel iterative methods over distributed platforms are interested in this work. The energy consumption model for measuring the energy consumption of the parallel applications from the related literature is described. This model cannot be used for all types of parallel architectures. It is assumed to measure the dynamic power during both communication and computation times, while the processor involved remains idle during the communication times and only consumes the static power. Moreover, it is not well adapted to the heterogeneous architectures when there are different
-types of the processors, which are consumed different dynamic and static powers.
+In this chapter, we have presented the different levels of parallelism that can be implemented with software and hardware techniques. Furthermore, the types of parallel architectures have been presented and classified according to how the computing units are connected to the memory model.
+Both the shared and distributed platforms have been presented, and we have categorized the parallel programming models according to them.
+The two types of parallel iterative methods, synchronous and asynchronous, have been described. The synchronous iterative methods are well suited to local homogeneous clusters with a high-speed network link, while the asynchronous iterative methods are more suitable for distributed heterogeneous clusters.
+The objective of this work is to optimize the energy consumed when running these two types of parallel iterative methods over distributed platforms. Consequently, the energy consumption model from the related literature that is used to measure the energy consumption of parallel applications has been described. This model cannot be used for all types of parallel architectures. Indeed, it assumes that dynamic power is consumed during both communication and computation times, whereas the processor involved remains idle during the communication times and consumes only static power. Moreover, it is not well adapted to heterogeneous architectures with different types of processors, which consume different dynamic and static powers.
-However, in the coming chapters of this thesis a new energy consumption models are developed, use for modeling and measuring the energies consumed by a parallel iterative methods running on both homogeneous and heterogeneous architectures.
+However, in the next chapters of this thesis, new energy consumption models are developed and used for modeling and measuring the energy consumed by parallel iterative methods running on both homogeneous and heterogeneous architectures.