X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/kahina_paper2.git/blobdiff_plain/c126c1597363cb544b0a1df94f4b265068e835ea..be5857ec62afa86f798a638a499155c02bdd61e1:/paper.tex diff --git a/paper.tex b/paper.tex index ba790c1..b0e76b5 100644 --- a/paper.tex +++ b/paper.tex @@ -427,12 +427,14 @@ Polynomials are mathematical algebraic structures that play an important role in \begin{equation} p(x)=\sum_{i=0}^{n}{a_ix^i}. \end{equation} -\LZK{Dans ce cas le polynôme a $n+1$ coefficients !} +\LZK{Dans ce cas le polynôme a $n+1$ coefficients et non pas $n$!} The root-finding problem consists in finding the $n$ different values of the unknown variable $x$ for which $p(x)=0$. Such values are called zeros or roots of $p$. If zeros are $\{\alpha_{i}\}_{1\leq i\leq n}$, then $p(x)$ can be written as : \begin{equation} p(x)=a_n\prod_{i=1}^n(x-\alpha_i), a_0 a_n\neq 0. \end{equation} +\LZK{C'est $a_n\neq 0$ (polynôme de degré $n$) et non pas $a_0 a_n\neq 0$, non?} +\LZK{Est-ce $\alpha_i$ sont les $z_i$ définis dans la suite du papier?} The problem of finding the roots of polynomials can be encountered in numerous applications. \LZK{A mon avis on peut supprimer cette phrase} Most of the numerical methods that deal with the polynomial root-finding problem are simultaneous ones, \textit{i.e.} the iterative methods to find simultaneous approximations of the $n$ polynomial zeros. These methods start from the initial approximations of all $n$ polynomial roots and give a sequence of approximations that converge to the roots of the polynomial. The first method of this group is Durand-Kerner method: @@ -460,17 +462,20 @@ Very few work had been performed since then until the appearing of the Compute U Finding polynomial roots rapidly and accurately is the main objective of our work. In this paper we propose the parallelization of Ehrlich-Aberth method using two parallel programming paradigms OpenMP and MPI on multi-GPU platforms. We consider two architectures: shared memory and distributed memory computers. The first parallel algorithm is implemented on shared memory computers by using OpenMP API. It is based on threads created from the same system process, such that each thread is attached to one GPU. In this case the communications between GPUs are done by OpenMP threads through shared memory. The second parallel algorithm uses the MPI API, such that each GPU is attached and managed by a MPI process. The GPUs exchange their data by message-passing communications. This latter approach is more used on distributed memory clusters to solve very complex problems that are too large for traditional supercomputers, which are very expensive to build and run. \LZK{Cette partie est réécrite. \\ Sinon qu'est ce qui a été fait pour l'accuracy dans ce papier (Finding polynomial roots rapidly and accurately is the main objective of our work.)?} + +\LZK{Les contributions ne sont pas définies !!} -{\color{red}{This paper is organized as follows. In Section~\ref{sec2} we recall the Ehrlich-Aberth method. In section~\ref{sec3} we present EA algorithm on single GPU. In section~\ref{sec4} we propose the EA algorithm implementation on Multi-GPU for (OpenMP-CUDA) approach and (MPI-CUDA) approach. In sectioné\ref{sec5} we present our experiments and discus it. Finally, Section~\ref{sec6} concludes this paper and gives some hints for future research directions in this topic.}}\LZK{A revoir toute cette organization} +%This paper is organized as follows. In Section~\ref{sec2} we recall the Ehrlich-Aberth method. In section~\ref{sec3} we present EA algorithm on single GPU. In section~\ref{sec4} we propose the EA algorithm implementation on Multi-GPU for (OpenMP-CUDA) approach and (MPI-CUDA) approach. In sectioné\ref{sec5} we present our experiments and discus it. Finally, Section~\ref{sec6} concludes this paper and gives some hints for future research directions in this topic.} -The paper is organized as follows. In Section~\ref{sec2} we present three parallel programming models OpenMP, MPI and CUDA. +The paper is organized as follows. In Section~\ref{sec2} we present three different parallel programming models OpenMP, MPI and CUDA. +\LZK{A revoir toute cette organization} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Parallel programming models} \label{sec2} -In this section we present the parallel programming models OpenMP, MPI and CUDA. +In this section we present three different parallel programming models: OpenMP, MPI and CUDA. \subsection{OpenMP} %Open Multi-Processing (OpenMP) is a shared memory architecture API that provides multi thread capacity~\cite{openmp13}. OpenMP is a portable approach for parallel programming on shared memory systems based on compiler directives, that can be included in order to parallelize a loop. In this way, a set of loops can be distributed along the different threads that will access to different data allocated in local shared memory. One of the advantages of OpenMP is its global view of application memory address space that allows relatively fast development of parallel applications with easier maintenance. However, it is often difficult to get high rates of performance in large scale applications. Although usage of OpenMP threads and managed data explicitly done with MPI can be considered, this approcache undermines the advantages of OpenMP. @@ -489,7 +494,10 @@ OpenMP (Open Multi-processing) is an application programming interface for share \LZK{Cette partie est réécrite. A relire et à améliorer si possible.} \subsection{MPI} -The MPI (Message Passing Interface) library allows to create computer programs that run on a distributed memory architecture. The various processes have their own environment of execution and execute their code in a asynchronous way, according to the MIMD model (Multiple Instruction streams, Multiple Data streams); they communicate and synchronize by exchanging messages~\cite{Peter96}. MPI messages are explicitly sent, while the exchanges are implicit within the framework of a multi-thread programming environment like OpenMP or Pthreads. +%The MPI (Message Passing Interface) library allows to create computer programs that run on a distributed memory architecture. The various processes have their own environment of execution and execute their code in a asynchronous way, according to the MIMD model (Multiple Instruction streams, Multiple Data streams); they communicate and synchronize by exchanging messages~\cite{Peter96}. MPI messages are explicitly sent, while the exchanges are implicit within the framework of a multi-thread programming environment like OpenMP or Pthreads. + +MPI (Message Passing Interface) is a portable message passing style of the parallel programming designed especially for the distributed memory architectures~\cite{Peter96}. In most MPI implementations, a computation contains a fixed set of processes created at the initialization of the program in such way one process is created per processor. The processes synchronize their computations and communicate by sending/receiving messages to/from other processes. In this case, the data are explicitly exchanged by message passing while the data exchanges are implicit in a multithread programming model like OpenMP and Pthreads. However in the MPI programming model, the processes may either execute different programs referred to as multiple program multiple data (MPMD) or every process executes the same program (SPMD). The MPI approach is one of most used HPC programming model to solve large scale and complex applications. +\LZK{Cette partie est réécrite. A relire et à améliorer si possible.} \subsection{CUDA} CUDA (an acronym for Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA~\cite{CUDA10}. The