\begin{equation}
p(x)=\sum_{i=0}^{n} a_{i}\, x^{i}, \qquad a_{i}\in C,\; i\in N
\end{equation}

For our tests, a computing platform with Intel Xeon E5620 CPUs running
at 2.40~GHz and four Nvidia Tesla Kepler K40 GPU cards is used. In
order to evaluate both the single GPU and multi-GPU approaches, we
performed a set of experiments on a single GPU and on multiple GPUs,
using OpenMP or MPI with the EA algorithm, for both sparse and full
polynomials of different sizes. All experimental results are obtained
with double precision floating-point data and the convergence threshold
of the EA method is set to $10^{-7}$. The initial values of the
solution vector are given by the Guggenheimer method~\cite{Gugg86}.
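
In practice, this convergence threshold corresponds to a stopping test
on the successive approximations $z^{(k)}$ of the roots, for instance
of the form
\begin{equation}
\max_{1\le i\le n} \frac{\left|z_i^{(k+1)}-z_i^{(k)}\right|}{\left|z_i^{(k+1)}\right|} < 10^{-7},
\end{equation}
where the EA iterations are stopped as soon as the largest relative
change between two consecutive approximations falls below the
threshold.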

\subsection{Evaluation of the CUDA-OpenMP approach}

Here we report some experiments with full and sparse polynomials of
different degrees, executed on multiple GPUs with the CUDA-OpenMP
approach and compared with a single GPU execution.
\subsubsection{Execution times of the EA method to solve sparse polynomials on multiple GPUs}
In these experiments we report the execution times of the EA algorithm on a single GPU and on multiple GPUs (2, 3 and 4), for sparse polynomials of degrees ranging from 100,000 to 1,400,000.
\begin{figure}[htbp]
\centering
\includegraphics[angle=-90,width=0.5\textwidth]{Sparse_omp}
\caption{Execution time in seconds of the Ehrlich-Aberth method to solve sparse polynomials on multiple GPUs.}
\label{fig:01}
\end{figure}
Figure~\ref{fig:01} shows that the CUDA-OpenMP approach scales well
with multiple GPUs. Gathering the capabilities of several GPUs with the
OpenMP paradigm allows us to solve sparse polynomials of very high
degrees.
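
To illustrate how the OpenMP paradigm can be used to drive several
GPUs, the following sketch shows one possible organisation in which
each OpenMP thread is bound to one GPU and updates its own block of the
root vector; the kernel \texttt{ea\_iteration}, its arguments and the
host data layout are simplified placeholders and do not reproduce the
actual implementation.
\begin{verbatim}
// Minimal sketch of the CUDA-OpenMP organisation: one OpenMP thread
// per GPU, each GPU updates its own block of roots for one iteration,
// then the blocks are merged back into the shared host array.
// ea_iteration and the data layout are placeholders (assumptions).
#include <omp.h>
#include <cuda_runtime.h>
#include <cuComplex.h>

__global__ void ea_iteration(cuDoubleComplex *z, int n,
                             int first, int size);

void ea_step_multi_gpu(cuDoubleComplex *roots, int n, int ngpu)
{
  int chunk = (n + ngpu - 1) / ngpu;            // roots per GPU
  #pragma omp parallel num_threads(ngpu)
  {
    int g     = omp_get_thread_num();           // thread g drives GPU g
    int first = g * chunk;
    int size  = (first < n) ? ((first + chunk <= n) ? chunk
                                                    : n - first) : 0;
    cudaSetDevice(g);
    cuDoubleComplex *d_roots;
    cudaMalloc(&d_roots, n * sizeof(cuDoubleComplex));

    // every GPU needs the whole current vector of approximations
    cudaMemcpy(d_roots, roots, n * sizeof(cuDoubleComplex),
               cudaMemcpyHostToDevice);

    if (size > 0)   // update only the block [first, first+size)
      ea_iteration<<<(size + 255) / 256, 256>>>(d_roots, n, first, size);

    // copy back only the block computed by this GPU
    cudaMemcpy(roots + first, d_roots + first,
               size * sizeof(cuDoubleComplex), cudaMemcpyDeviceToHost);
    cudaFree(d_roots);
  }
}
\end{verbatim}
In the actual application such a step is of course repeated until the
convergence threshold discussed above is reached, and the device
buffers are allocated only once.
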
\subsubsection{Execution times of the EA method to solve full polynomials on multiple GPUs}
These experiments show the execution times of the EA algorithm on a single GPU and on multiple GPUs, using the CUDA-OpenMP approach, for full polynomials of degrees ranging from 100,000 to 1,400,000.
\begin{figure}[htbp]
\centering
\includegraphics[angle=-90,width=0.5\textwidth]{Full_omp}
\caption{Execution time in seconds of the Ehrlich-Aberth method to solve full polynomials on multiple GPUs.}
\label{fig:02}
\end{figure}
In Figure~\ref{fig:02}, we can observe that the EA version with
CUDA-OpenMP also scales well with full polynomials. Using 4 GPUs allows
us to achieve a quasi-linear speedup.
\subsection{Evaluation of the CUDA-MPI approach}
In this part we perform some experiments to evaluate the CUDA-MPI
approach to solve full and sparse polynomials of degrees ranging from
100,000 to 1,400,000.
\subsubsection{Execution times of the EA method to solve sparse polynomials on multiple GPUs}
\begin{figure}[htbp]
\centering
\includegraphics[angle=-90,width=0.5\textwidth]{Sparse_mpi}
\caption{Execution time in seconds of the Ehrlich-Aberth method to solve sparse polynomials on multiple GPUs.}
\label{fig:03}
\end{figure}
Figure~\ref{fig:03} shows the execution times of the EA algorithm for a
single GPU and for multiple GPUs (2, 3 and 4) with the CUDA-MPI
approach. The curve of the single GPU is clearly above the multi-GPU
curves, and the execution time decreases as more GPUs are used.
\subsubsection{Execution times of the EA method to solve full polynomials on multiple GPUs}
\begin{figure}[htbp]
\centering
\includegraphics[angle=-90,width=0.5\textwidth]{Full_mpi}
\caption{Execution time in seconds of the Ehrlich-Aberth method to solve full polynomials on multiple GPUs.}
\label{fig:04}
\end{figure}

Figure~\ref{fig:04} shows that the CUDA-MPI approach is also efficient
for solving full polynomials on multiple GPUs: the more GPUs we use,
the more the execution time decreases. Unlike the single GPU execution,
the MPI version introduces data communications between the nodes, but
the experiments show that the execution time is still highly reduced.

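To give an idea of how the work can be distributed with MPI, the
following sketch assigns one contiguous block of roots to each MPI
process (and hence to each GPU) and exchanges the updated blocks with
\texttt{MPI\_Allgatherv} after every iteration; as before,
\texttt{ea\_iteration} and the data layout are simplified placeholders
and not the actual code.
\begin{verbatim}
// Minimal sketch of the CUDA-MPI organisation: one MPI process per
// GPU (each rank is assumed to have selected its GPU with
// cudaSetDevice at startup). Each rank updates its own block of
// roots, then all blocks are exchanged so that every rank holds the
// full updated vector. ea_iteration is a placeholder (assumption).
#include <mpi.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cuComplex.h>

__global__ void ea_iteration(cuDoubleComplex *z, int n,
                             int first, int size);

void ea_step_mpi(cuDoubleComplex *roots, int n, cuDoubleComplex *d_roots)
{
  int rank, nproc;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nproc);

  int chunk = (n + nproc - 1) / nproc;
  int first = rank * chunk;
  int size  = (first < n) ? ((first + chunk <= n) ? chunk
                                                  : n - first) : 0;

  // send the current full vector to this rank's GPU, update its block
  cudaMemcpy(d_roots, roots, n * sizeof(cuDoubleComplex),
             cudaMemcpyHostToDevice);
  if (size > 0)
    ea_iteration<<<(size + 255) / 256, 256>>>(d_roots, n, first, size);
  cudaMemcpy(roots + first, d_roots + first,
             size * sizeof(cuDoubleComplex), cudaMemcpyDeviceToHost);

  // gather the updated blocks of all ranks into the full host vector
  int *counts = (int *) malloc(nproc * sizeof(int));
  int *displs = (int *) malloc(nproc * sizeof(int));
  for (int p = 0; p < nproc; p++) {
    displs[p] = p * chunk;
    counts[p] = (displs[p] < n) ? ((displs[p] + chunk <= n) ? chunk
                                    : n - displs[p]) : 0;
  }
  MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                 roots, counts, displs, MPI_C_DOUBLE_COMPLEX,
                 MPI_COMM_WORLD);
  free(counts);
  free(displs);
}
\end{verbatim}
This collective exchange of the blocks after each iteration is the kind
of communication that distinguishes the CUDA-MPI version from the
single GPU execution mentioned above.
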
\subsection{Comparing the CUDA-OpenMP approach and the CUDA-MPI approach}
In the previous section we saw that both approaches are very effective in reducing the execution time for sparse as well as full polynomials. At this stage, the interesting question is which approach is better. In the following, we present appropriate experiments comparing the two multi-GPU approaches to answer this question.