X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/hpcc2014.git/blobdiff_plain/5303b288b71ba1b55f6f784f6ab4aba3db64cf22..0aa640af7dcb1e33350b4ade113575ce81bb0d81:/hpcc.tex

diff --git a/hpcc.tex b/hpcc.tex
index 1c94f84..2c7d65d 100644
--- a/hpcc.tex
+++ b/hpcc.tex
@@ -179,7 +179,7 @@ convergence is generally greater than for the two former classes. But, and as de
 algorithms can significantly reduce overall execution times by suppressing idle
 times due to synchronizations, especially in a grid computing context.
 
-\begin{figure}[htbp]
+\begin{figure}[!t]
   \centering
     \includegraphics[width=8cm]{AIAC.pdf}
   \caption{The Asynchronous Iterations - Asynchronous Communications model}
@@ -269,7 +269,7 @@ Y_l = B_l - \displaystyle\sum_{\substack{m=1\\ m\neq l}}^{L}A_{lm}X_m
 \end{equation}
 is solved independently by a cluster, and communications are required to update the right-hand side sub-vector $Y_l$, such that the sub-vectors $X_m$ represent the data dependencies between the clusters. As each sub-system (\ref{eq:4.1}) is solved in parallel by a cluster of processors, our multisplitting method uses an iterative method as an inner solver, which is easier to parallelize and more scalable than a direct method.
 In this work, we use the parallel GMRES method~\cite{ref1}, which is one of the most widely used iterative methods.
-\begin{figure}
+\begin{figure}[!t]
 %%% IEEE instructions forbid the use of an algorithm environment here; use a
 %%% figure instead
 \begin{algorithmic}[1]
@@ -310,7 +310,7 @@ clusters (lines $6$ and $7$ in Figure~\ref{algo:01}). The shared vector
 elements of the solution $x$ are exchanged by message passing using MPI
 non-blocking communication routines.
 
-\begin{figure}
+\begin{figure}[!t]
   \centering
   \includegraphics[width=60mm,keepaspectratio]{clustering}
   \caption{Example of three clusters of processors interconnected by a virtual unidirectional ring network.}
@@ -363,9 +363,9 @@ Table~\ref{tab.cluster.2x50} with a matrix size ranging from $N_x = N_y = N_z =
 62 \text{ to } 171$ elements, i.e. from $62^{3} = \np{238328}$ to $171^{3} =
 \np{5000211}$ entries.
 
-\begin{table}
+\begin{table}[!t]
   \centering
-  \caption{2 Clusters x 50 nodes each}
+  \caption{2 clusters, each with 50 nodes}
   \label{tab.cluster.2x50}
 
   \tiny
@@ -388,9 +388,9 @@ clusters. In the same way as above, a judicious choice of key parameters has
 made it possible to obtain the results in Table~\ref{tab.cluster.3x33}, which
 shows speedups below 1 for matrix sizes from 62 to 100 elements.
 
-\begin{table}
+\begin{table}[!t]
   \centering
-  \caption{3 Clusters x 33 nodes each}
+  \caption{3 clusters, each with 33 nodes}
   \label{tab.cluster.3x33}
 
   \tiny
@@ -413,9 +413,9 @@ In a final step, results of an execution attempt to scale up the three
 cluster configuration by increasing the number of hosts to about two hundred
 are recorded in Table~\ref{tab.cluster.3x67}.
 
-\begin{table}
+\begin{table}[!t]
   \centering
-  \caption{3 Clusters x 66 nodes each}
+  \caption{3 clusters, each with 66 nodes}
   \label{tab.cluster.3x67}
 
   \tiny
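For concreteness, the multisplitting scheme touched by the hunks above works as follows: each cluster solves its block system $A_{ll}X_l = Y_l$ with an inner iterative solver (the parallel GMRES of the paper), and the shared sub-vectors $X_m$ that enter the right-hand side $Y_l$ are exchanged between clusters with MPI non-blocking routines over a unidirectional ring, as in the clustering figure. The C/MPI sketch below is a minimal illustration of that outer loop, not the authors' code: N_LOC, MAX_OUTER, inner_solve, and the right-hand-side update are assumed placeholders, and the paper's global convergence test is replaced here by a fixed iteration count.

/* Illustrative sketch only: outer multisplitting loop with non-blocking
 * exchange of the shared sub-vector on a unidirectional ring.  The inner
 * solver and the right-hand-side update are placeholders; the paper uses
 * a parallel GMRES and a global convergence test instead. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

enum { N_LOC = 1000, MAX_OUTER = 100 };   /* assumed sizes, not from the paper */

/* Placeholder for the inner parallel GMRES solve of A_ll X_l = Y_l. */
static void inner_solve(double *x_loc, const double *y_loc, int n)
{
    for (int i = 0; i < n; i++)
        x_loc[i] = y_loc[i];
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int next = (rank + 1) % size;           /* unidirectional ring */
    int prev = (rank + size - 1) % size;

    double *x_loc  = calloc(N_LOC, sizeof *x_loc);  /* local solution X_l    */
    double *x_send = calloc(N_LOC, sizeof *x_send); /* snapshot sent to next */
    double *x_dep  = calloc(N_LOC, sizeof *x_dep);  /* dependency from prev  */
    double *y_loc  = calloc(N_LOC, sizeof *y_loc);  /* right-hand side Y_l   */

    for (int k = 0; k < MAX_OUTER; k++) {
        MPI_Request reqs[2];

        /* Post the exchange of shared vector elements first, so that the
         * communication overlaps with the inner solve (the algorithm in the
         * paper exchanges these elements with non-blocking MPI routines). */
        MPI_Irecv(x_dep, N_LOC, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, &reqs[0]);
        memcpy(x_send, x_loc, N_LOC * sizeof *x_send); /* keep send buffer stable */
        MPI_Isend(x_send, N_LOC, MPI_DOUBLE, next, 0, MPI_COMM_WORLD, &reqs[1]);

        /* Inner solve of the local block system with the current RHS. */
        inner_solve(x_loc, y_loc, N_LOC);

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        /* Update Y_l = B_l - sum_{m != l} A_lm X_m from the received
         * dependencies; reduced here to a placeholder using x_dep. */
        for (int i = 0; i < N_LOC; i++)
            y_loc[i] = -x_dep[i];
    }

    free(x_loc); free(x_send); free(x_dep); free(y_loc);
    MPI_Finalize();
    return 0;
}

Posting MPI_Irecv/MPI_Isend before the inner solve lets communication overlap computation, which is the motivation for the asynchronous model discussed in the paper; the snapshot into x_send keeps the send buffer stable while the inner solver updates x_loc, as MPI requires for a pending non-blocking send.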