From: lilia Date: Fri, 11 Apr 2014 12:15:43 +0000 (+0200) Subject: 1111 X-Git-Tag: hpcc2014_submission~131 X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/hpcc2014.git/commitdiff_plain/7cbf551d8d0a15c170c8afb74b5b67f42e3b2065?ds=sidebyside 1111 --- diff --git a/clustering.eps b/clustering.eps new file mode 100644 index 0000000..7cc723e --- /dev/null +++ b/clustering.eps @@ -0,0 +1,324 @@ +%!PS-Adobe-3.0 EPSF-3.0 +%%Title: clustering.fig +%%Creator: fig2dev Version 3.2 Patchlevel 5e +%%CreationDate: Fri Apr 11 13:59:49 2014 +%%BoundingBox: 0 0 708 563 +%Magnification: 1.0000 +%%EndComments +%%BeginProlog +/$F2psDict 200 dict def +$F2psDict begin +$F2psDict /mtrx matrix put +/col-1 {0 setgray} bind def +/col0 {0.000 0.000 0.000 srgb} bind def +/col1 {0.000 0.000 1.000 srgb} bind def +/col2 {0.000 1.000 0.000 srgb} bind def +/col3 {0.000 1.000 1.000 srgb} bind def +/col4 {1.000 0.000 0.000 srgb} bind def +/col5 {1.000 0.000 1.000 srgb} bind def +/col6 {1.000 1.000 0.000 srgb} bind def +/col7 {1.000 1.000 1.000 srgb} bind def +/col8 {0.000 0.000 0.560 srgb} bind def +/col9 {0.000 0.000 0.690 srgb} bind def +/col10 {0.000 0.000 0.820 srgb} bind def +/col11 {0.530 0.810 1.000 srgb} bind def +/col12 {0.000 0.560 0.000 srgb} bind def +/col13 {0.000 0.690 0.000 srgb} bind def +/col14 {0.000 0.820 0.000 srgb} bind def +/col15 {0.000 0.560 0.560 srgb} bind def +/col16 {0.000 0.690 0.690 srgb} bind def +/col17 {0.000 0.820 0.820 srgb} bind def +/col18 {0.560 0.000 0.000 srgb} bind def +/col19 {0.690 0.000 0.000 srgb} bind def +/col20 {0.820 0.000 0.000 srgb} bind def +/col21 {0.560 0.000 0.560 srgb} bind def +/col22 {0.690 0.000 0.690 srgb} bind def +/col23 {0.820 0.000 0.820 srgb} bind def +/col24 {0.500 0.190 0.000 srgb} bind def +/col25 {0.630 0.250 0.000 srgb} bind def +/col26 {0.750 0.380 0.000 srgb} bind def +/col27 {1.000 0.500 0.500 srgb} bind def +/col28 {1.000 0.630 0.630 srgb} bind def +/col29 {1.000 0.750 0.750 srgb} bind def +/col30 {1.000 0.880 0.880 srgb} bind def +/col31 {1.000 0.840 0.000 srgb} bind def + +end + +/cp {closepath} bind def +/ef {eofill} bind def +/gr {grestore} bind def +/gs {gsave} bind def +/sa {save} bind def +/rs {restore} bind def +/l {lineto} bind def +/m {moveto} bind def +/rm {rmoveto} bind def +/n {newpath} bind def +/s {stroke} bind def +/sh {show} bind def +/slc {setlinecap} bind def +/slj {setlinejoin} bind def +/slw {setlinewidth} bind def +/srgb {setrgbcolor} bind def +/rot {rotate} bind def +/sc {scale} bind def +/sd {setdash} bind def +/ff {findfont} bind def +/sf {setfont} bind def +/scf {scalefont} bind def +/sw {stringwidth} bind def +/tr {translate} bind def +/tnt {dup dup currentrgbcolor + 4 -2 roll dup 1 exch sub 3 -1 roll mul add + 4 -2 roll dup 1 exch sub 3 -1 roll mul add + 4 -2 roll dup 1 exch sub 3 -1 roll mul add srgb} + bind def +/shd {dup dup currentrgbcolor 4 -2 roll mul 4 -2 roll mul + 4 -2 roll mul srgb} bind def + /DrawEllipse { + /endangle exch def + /startangle exch def + /yrad exch def + /xrad exch def + /y exch def + /x exch def + /savematrix mtrx currentmatrix def + x y tr xrad yrad sc 0 0 1 startangle endangle arc + closepath + savematrix setmatrix + } def + +/$F2psBegin {$F2psDict begin /$F2psEnteredState save def} def +/$F2psEnd {$F2psEnteredState restore end} def + +/pageheader { +save +newpath 0 563 moveto 0 0 lineto 708 0 lineto 708 563 lineto closepath clip newpath +-44.5 616.1 translate +1 -1 scale +$F2psBegin +10 setmiterlimit +0 slj 0 slc + 0.06299 0.06299 sc +} bind def +/pagefooter { +$F2psEnd +restore +} bind 
def +%%EndProlog +pageheader +% +% Fig objects follow +% +% +% here starts figure with depth 50 +% Ellipse +15.000 slw +n 8385 3109 402 402 0 360 DrawEllipse gs col6 0.90 tnt ef gr gs col0 s gr + +% Ellipse +n 9781 1945 402 402 0 360 DrawEllipse gs col6 0.90 tnt ef gr gs col0 s gr + +% Ellipse +n 11014 3095 402 402 0 360 DrawEllipse gs col6 0.90 tnt ef gr gs col0 s gr + +% Ellipse +n 10539 4611 402 402 0 360 DrawEllipse gs col6 0.90 tnt ef gr gs col0 s gr + +% Ellipse +n 8948 4658 402 402 0 360 DrawEllipse gs col6 0.90 tnt ef gr gs col0 s gr + +% Ellipse +n 9711 3495 2030 2030 0 360 DrawEllipse gs col0 s gr + +% Polyline +0 slj +0 slc +n 8640 2790 m + 9495 2205 l gs col0 s gr +% Polyline +n 10125 2160 m + 10800 2745 l gs col0 s gr +% Polyline +n 10980 3510 m + 10710 4230 l gs col0 s gr +% Polyline +n 9360 4590 m + 10125 4590 l gs col0 s gr +% Polyline +n 8685 4365 m + 8370 3510 l gs col0 s gr +% Polyline +n 8775 3105 m + 10620 3105 l gs col0 s gr +% Polyline +n 8775 3285 m + 10305 4275 l gs col0 s gr +% Polyline +n 9675 2340 m + 9000 4275 l gs col0 s gr +% Polyline +n 9990 2340 m + 10395 4275 l gs col0 s gr +% Polyline +n 9225 4365 m + 10665 3285 l gs col0 s gr +% Ellipse +n 4089 3179 402 402 0 360 DrawEllipse gs col4 0.90 tnt ef gr gs col0 s gr + +% Ellipse +n 1325 3147 402 402 0 360 DrawEllipse gs col4 0.90 tnt ef gr gs col0 s gr + +% Ellipse +n 2700 4905 402 402 0 360 DrawEllipse gs col4 0.90 tnt ef gr gs col0 s gr + +% Ellipse +n 2721 3678 1999 1999 0 360 DrawEllipse gs col0 s gr + +% Polyline +n 1395 3555 m + 2385 4635 l gs col0 s gr +% Polyline +n 4050 3600 m + 3060 4680 l gs col0 s gr +% Polyline +n 1755 3150 m + 3690 3150 l gs col0 s gr +% Ellipse +n 5444 8106 402 402 0 360 DrawEllipse gs col11 0.90 tnt ef gr gs col0 s gr + +% Ellipse +n 6411 7503 1734 1734 0 360 DrawEllipse gs col0 s gr + +% Ellipse +n 7406 8155 402 402 0 360 DrawEllipse gs col11 0.90 tnt ef gr gs col0 s gr + +% Ellipse +n 6430 6269 402 402 0 360 DrawEllipse gs col11 0.90 tnt ef gr gs col0 s gr + +% Polyline +n 5850 8145 m + 7020 8145 l gs col0 s gr +% Polyline +n 6165 6570 m + 5580 7740 l gs col0 s gr +% Polyline +n 6750 6525 m + 7470 7785 l gs col0 s gr +% Polyline +7.500 slw +gs clippath +5602 2973 m 5447 3128 l 5511 3192 l 5666 3037 l 5666 3037 l 5502 3138 l 5602 2973 l cp +eoclip +n 5490 3150 m + 6390 2250 l gs col9 s gr gr + +% arrowhead +n 5602 2973 m 5502 3138 l 5666 3037 l 5608 3031 l 5602 2973 l + cp gs col9 1.00 shd ef gr col9 s +% Polyline +gs clippath +7460 7299 m 7275 7417 l 7323 7493 l 7508 7375 l 7508 7375 l 7326 7438 l 7460 7299 l cp +eoclip +n 7312 7447 m + 8910 6435 l gs col9 s gr gr + +% arrowhead +n 7460 7299 m 7326 7438 l 7508 7375 l 7452 7357 l 7460 7299 l + cp gs col9 1.00 shd ef gr col9 s +% Polyline +30.000 slw + [120] 0 sd +gs clippath +8065 3731 m 8230 3452 l 8075 3361 l 7910 3640 l 7910 3640 l 8110 3479 l 8065 3731 l cp +eoclip +n 8145 3420 m + 6660 5940 l gs col4 s gr gr + [] 0 sd +% arrowhead +45.000 slw +n 8065 3731 m 8110 3479 l 7910 3640 l col4 s +% Polyline +30.000 slw + [120] 0 sd +gs clippath +4808 3060 m 4485 3060 l 4485 3240 l 4808 3240 l 4808 3240 l 4568 3150 l 4808 3060 l cp +eoclip +n 4500 3150 m + 8010 3150 l gs col4 s gr gr + [] 0 sd +% arrowhead +45.000 slw +n 4808 3060 m 4568 3150 l 4808 3240 l col4 s +% Polyline +30.000 slw + [120] 0 sd +gs clippath +5866 5788 m 6056 6050 l 6201 5944 l 6011 5682 l 6011 5682 l 6080 5930 l 5866 5788 l cp +eoclip +n 4320 3555 m 4320 3510 l + 6120 5985 l gs col4 s gr gr + [] 0 sd +% arrowhead +45.000 slw +n 5866 5788 m 6080 5930 l 6011 
5682 l col4 s +/Times-Roman ff 476.25 scf sf +8145 3285 m +gs 1 -1 sc (P1) col0 sh gr +/Times-Roman ff 476.25 scf sf +9585 2115 m +gs 1 -1 sc (P2) col0 sh gr +/Times-Roman ff 476.25 scf sf +10755 3285 m +gs 1 -1 sc (P3) col0 sh gr +/Times-Roman ff 476.25 scf sf +8730 4860 m +gs 1 -1 sc (P4) col0 sh gr +/Times-Roman ff 476.25 scf sf +10305 4815 m +gs 1 -1 sc (P5) col0 sh gr +/Times-Roman ff 476.25 scf sf +3870 3330 m +gs 1 -1 sc (P1) col0 sh gr +/Times-Roman ff 476.25 scf sf +1080 3330 m +gs 1 -1 sc (P2) col0 sh gr +/Times-Roman ff 476.25 scf sf +2475 5085 m +gs 1 -1 sc (P3) col0 sh gr +/Times-Roman ff 476.25 scf sf +5175 8280 m +gs 1 -1 sc (P2) col0 sh gr +/Times-Roman ff 476.25 scf sf +7155 8325 m +gs 1 -1 sc (P3) col0 sh gr +/Times-Roman ff 476.25 scf sf +6210 6435 m +gs 1 -1 sc (P1) col0 sh gr +/Times-Roman ff 476.25 scf sf +1845 1350 m +gs 1 -1 sc (Cluster 3) col0 sh gr +/Times-Roman ff 476.25 scf sf +8820 1215 m +gs 1 -1 sc (Cluster 2) col0 sh gr +/Times-Roman ff 476.25 scf sf +5535 9765 m +gs 1 -1 sc (Cluster 1) col0 sh gr +/Times-Roman ff 444.50 scf sf +5085 1755 m +gs 1 -1 sc (Asynchronous) col9 sh gr +/Times-Roman ff 444.50 scf sf +4950 2205 m +gs 1 -1 sc (communication) col9 sh gr +/Times-Roman ff 444.50 scf sf +8955 6750 m +gs 1 -1 sc (communication) col9 sh gr +/Times-Roman ff 444.50 scf sf +9225 6345 m +gs 1 -1 sc (Synchronous) col9 sh gr +% here ends figure; +pagefooter +showpage +%%Trailer +%EOF diff --git a/clustering.fig b/clustering.fig new file mode 100644 index 0000000..22dd814 --- /dev/null +++ b/clustering.fig @@ -0,0 +1,92 @@ +#FIG 3.2 Produced by xfig version 3.2.5c +Landscape +Center +Metric +A4 +100.00 +Single +-2 +1200 2 +6 7650 1440 11790 5580 +1 3 0 2 0 6 50 -1 38 0.000 1 0.0000 8385 3109 402 402 8385 3109 8787 3109 +1 3 0 2 0 6 50 -1 38 0.000 1 0.0000 9781 1945 402 402 9781 1945 10183 1945 +1 3 0 2 0 6 50 -1 38 0.000 1 0.0000 11014 3095 402 402 11014 3095 11416 3095 +1 3 0 2 0 6 50 -1 38 0.000 1 0.0000 10539 4611 402 402 10539 4611 10941 4611 +1 3 0 2 0 6 50 -1 38 0.000 1 0.0000 8948 4658 402 402 8948 4658 9350 4658 +1 3 0 2 0 7 50 -1 -1 0.000 1 0.0000 9711 3495 2030 2030 9711 3495 11741 3495 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 8640 2790 9495 2205 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 10125 2160 10800 2745 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 10980 3510 10710 4230 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 9360 4590 10125 4590 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 8685 4365 8370 3510 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 8775 3105 10620 3105 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 8775 3285 10305 4275 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 9675 2340 9000 4275 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 9990 2340 10395 4275 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 9225 4365 10665 3285 +4 0 0 50 -1 0 30 0.0000 4 345 510 8145 3285 P1\001 +4 0 0 50 -1 0 30 0.0000 4 345 510 9585 2115 P2\001 +4 0 0 50 -1 0 30 0.0000 4 345 510 10755 3285 P3\001 +4 0 0 50 -1 0 30 0.0000 4 345 510 8730 4860 P4\001 +4 0 0 50 -1 0 30 0.0000 4 345 510 10305 4815 P5\001 +-6 +6 675 1620 4770 5715 +1 3 0 2 0 4 50 -1 38 0.000 1 0.0000 4089 3179 402 402 4089 3179 4491 3179 +1 3 0 2 0 4 50 -1 38 0.000 1 0.0000 1325 3147 402 402 1325 3147 1727 3147 +1 3 0 2 0 4 50 -1 38 0.000 1 0.0000 2700 4905 402 402 2700 4905 3102 4905 +1 3 0 2 0 7 50 -1 -1 0.000 1 0.0000 2721 3678 1999 1999 2721 3678 4720 3678 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 1395 3555 2385 4635 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 4050 3600 3060 4680 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 
2 + 1755 3150 3690 3150 +4 0 0 50 -1 0 30 0.0000 4 345 510 3870 3330 P1\001 +4 0 0 50 -1 0 30 0.0000 4 345 510 1080 3330 P2\001 +4 0 0 50 -1 0 30 0.0000 4 345 510 2475 5085 P3\001 +-6 +1 3 0 2 0 11 50 -1 38 0.000 1 0.0000 5444 8106 402 402 5444 8106 5846 8106 +1 3 0 2 0 7 50 -1 -1 0.000 1 0.0000 6411 7503 1734 1734 6411 7503 8145 7515 +1 3 0 2 0 11 50 -1 38 0.000 1 0.0000 7406 8155 402 402 7406 8155 7808 8155 +1 3 0 2 0 11 50 -1 38 0.000 1 0.0000 6430 6269 402 402 6430 6269 6832 6269 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 5850 8145 7020 8145 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 6165 6570 5580 7740 +2 1 0 2 0 7 50 -1 -1 0.000 0 0 -1 0 0 2 + 6750 6525 7470 7785 +2 1 0 1 9 7 50 -1 -1 0.000 0 0 -1 0 1 2 + 2 1 1.00 90.00 150.00 + 5490 3150 6390 2250 +2 1 0 1 9 7 50 -1 -1 0.000 0 0 -1 0 1 2 + 2 1 1.00 90.00 150.00 + 7312 7447 8910 6435 +2 1 1 3 4 7 50 -1 -1 8.000 0 0 -1 0 1 2 + 0 0 4.00 180.00 240.00 + 8145 3420 6660 5940 +2 1 1 3 4 7 50 -1 -1 8.000 0 0 -1 0 1 2 + 0 0 4.00 180.00 240.00 + 4500 3150 8010 3150 +2 1 1 3 4 7 50 -1 -1 8.000 0 0 -1 1 0 3 + 0 0 4.00 180.00 240.00 + 4320 3555 4320 3510 6120 5985 +4 0 0 50 -1 0 30 0.0000 4 345 510 5175 8280 P2\001 +4 0 0 50 -1 0 30 0.0000 4 345 510 7155 8325 P3\001 +4 0 0 50 -1 0 30 0.0000 4 345 510 6210 6435 P1\001 +4 0 0 50 -1 0 30 0.0000 4 345 1800 1845 1350 Cluster 3\001 +4 0 0 50 -1 0 30 0.0000 4 345 1800 8820 1215 Cluster 2\001 +4 0 0 50 -1 0 30 0.0000 4 345 1800 5535 9765 Cluster 1\001 +4 0 9 50 -1 0 28 0.0000 4 435 2655 5085 1755 Asynchronous\001 +4 0 9 50 -1 0 28 0.0000 4 315 2895 4950 2205 communication\001 +4 0 9 50 -1 0 28 0.0000 4 315 2895 8955 6750 communication\001 +4 0 9 50 -1 0 28 0.0000 4 435 2400 9225 6345 Synchronous\001 diff --git a/clustering.pdf b/clustering.pdf new file mode 100644 index 0000000..5e83de2 Binary files /dev/null and b/clustering.pdf differ diff --git a/hpcc.tex b/hpcc.tex index 5e4bea6..ad829c0 100644 --- a/hpcc.tex +++ b/hpcc.tex @@ -32,6 +32,8 @@ \algnewcommand\algorithmicoutput{\textbf{Output:}} \algnewcommand\Output{\item[\algorithmicoutput]} +\newcommand{\MI}{\mathit{MaxIter}} + \begin{document} @@ -145,7 +147,7 @@ Décrire SimGrid (Arnaud) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Simulation of the multisplitting method} %Décrire le problème (algo) traité ainsi que le processus d'adaptation à SimGrid. -Let $Ax=b$ be a large sparse system of $n$ linear equations in $\mathbb{R}$, where $A$ is a sparse square and nonsingular matrix, $x$ is the solution vector and $y$ is the right-hand side vector. We use a multisplitting method based on the block Jacobi partitioning to solve this linear system on a large scale platform composed of $L$ clusters of processors. In this case, we apply a row-by-row splitting without overlapping +Let $Ax=b$ be a large sparse system of $n$ linear equations in $\mathbb{R}$, where $A$ is a sparse square and nonsingular matrix, $x$ is the solution vector and $b$ is the right-hand side vector. We use a multisplitting method based on the block Jacobi partitioning to solve this linear system on a large scale platform composed of $L$ clusters of processors. 
In this case, we apply a row-by-row splitting without overlapping
\[
\left(\begin{array}{ccc}
A_{11} & \cdots & A_{1L} \\
@@ -160,47 +162,61 @@ X_L
\end{array} \right)
=
\left(\begin{array}{c}
-Y_1 \\
+B_1 \\
\vdots\\
-Y_L
+B_L
\end{array} \right)\]
-in such a way that successive rows of matrix $A$ and both vectors $x$ and $b$ are assigned to one cluster, where for all $l,i\in\{1,\ldots,L\}$ $A_{li}$ is a rectangular block of $A$ of size $n_l\times n_i$, $X_l$ and $Y_l$ are sub-vectors of $x$ and $y$, respectively, each of size $n_l$ and $\sum_{l} n_l=\sum_{i} n_i=n$.
+in such a way that successive rows of matrix $A$ and both vectors $x$ and $b$ are assigned to one cluster, where for all $l,m\in\{1,\ldots,L\}$ $A_{lm}$ is a rectangular block of $A$ of size $n_l\times n_m$, $X_l$ and $B_l$ are sub-vectors of $x$ and $b$, respectively, each of size $n_l$ and $\sum_{l} n_l=\sum_{m} n_m=n$.

-The multisplitting method proceeds by iteration to solve in parallel the linear system by $L$ clusters of processors, in such a way each sub-system
+The multisplitting method proceeds by iteration to solve in parallel the linear system on $L$ clusters of processors, in such a way that each sub-system
\begin{equation}
\left\{
\begin{array}{l}
A_{ll}X_l = Y_l \mbox{,~such that}\\
-Y_l = B_l - \displaystyle\sum_{i=1,i\neq l}^{L}A_{li}X_i,
+Y_l = B_l - \displaystyle\sum_{\substack{m=1\\ m\neq l}}^{L}A_{lm}X_m
\end{array}
\right.
\label{eq:4.1}
\end{equation}
-is solved independently by a cluster and communication are required to update the right-hand side sub-vectors $Y_l$, such that the sub-vectors $X_i$ represent the data dependencies between the clusters. As each sub-system (\ref{eq:4.1}) is solved in parallel by a cluster of processors, our multisplitting method uses an iterative method as an inner solver which is easier to parallelize and more scalable than a direct method. In this work, we use the parallel GMRES method~\cite{ref1} which is one of the most used iterative method by many researchers.
+is solved independently by a cluster, and communications are required to update the right-hand side sub-vector $Y_l$, such that the sub-vectors $X_m$ represent the data dependencies between the clusters. As each sub-system (\ref{eq:4.1}) is solved in parallel by a cluster of processors, our multisplitting method uses an iterative method as an inner solver, which is easier to parallelize and more scalable than a direct method. In this work, we use the parallel algorithm of the GMRES method~\cite{ref1}, which is one of the most widely used iterative methods.
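For concreteness, the core computation in equation~(\ref{eq:4.1}) is the update of the local right-hand side $Y_l$ from the off-diagonal blocks. The following is a minimal C sketch of that step, assuming dense row-major blocks and purely illustrative names; the actual implementation works on sparse matrices.

/* Sketch of the local right-hand side update of equation (4.1):
 *   Y_l = B_l - sum_{m != l} A_lm * X_m
 * Dense row-major storage and all names are illustrative assumptions.
 * A_blk[m] points to the n_l x n[m] block A_lm; X[m] has length n[m]. */
void local_rhs(int L, int l, int n_l, const int *n,
               const double **A_blk, const double *B_l,
               const double **X, double *Y_l)
{
  int i, j, m;
  for (i = 0; i < n_l; i++)
    Y_l[i] = B_l[i];                      /* Y_l = B_l */
  for (m = 0; m < L; m++) {
    if (m == l) continue;                 /* skip the diagonal block A_ll */
    for (i = 0; i < n_l; i++)
      for (j = 0; j < n[m]; j++)
        Y_l[i] -= A_blk[m][i * n[m] + j] * X[m][j];   /* Y_l -= A_lm X_m */
  }
}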
\begin{algorithm}
-\caption{A multisplitting solver with inner iteration GMRES method}
+\caption{A multisplitting solver with the GMRES method}
\begin{algorithmic}[1]
-\Input $A_l$ (local sparse matrix), $B_l$ (local right-hand side), $x^0$ (initial guess)
-\Output $X_l$ (local solution vector)\vspace{0.2cm}
-\State Load $A_l$, $B_l$, $x^0$
-\State Initialize the shared vector $\hat{x}=x^0$
-\For {$k=1,2,3,\ldots$ until the global convergence}
-\State $x^0=\hat{x}$
-\State Inner iteration solver: \Call{InnerSolver}{$x^0$, $k$}
-\State Exchange the local solution ${X}_l^k$ with the neighboring clusters and copy the shared vector elements in $\hat{x}$
+\Input $A_l$ (sparse sub-matrix), $B_l$ (right-hand side sub-vector)
+\Output $X_l$ (solution sub-vector)\vspace{0.2cm}
+\State Load $A_l$, $B_l$
+\State Initialize the solution vector $x^0$
+\For {$k=0,1,2,\ldots$ until the global convergence}
+\State Restart the outer iteration with $x^0=x^k$
+\State Inner iteration: \Call{InnerSolver}{$x^0$, $k+1$}
+\State Send shared elements of $X_l^{k+1}$ to neighboring clusters
+\State Receive shared elements in $\{X_m^{k+1}\}_{m\neq l}$
\EndFor
\Statex
\Function {InnerSolver}{$x^0$, $k$}
-\State Compute the local right-hand side: $Y_l = B_l - \sum^L_{i=1,i\neq l}A_{li}X_i^0$
-\State Solving the local splitting $A_{ll}X_l^k=Y_l$ using the parallel GMRES method, such that $X_l^0$ is the local initial guess
+\State Compute the local right-hand side $Y_l$: \[Y_l = B_l - \sum\nolimits^L_{\substack{m=1 \\m\neq l}}A_{lm}X_m^0\]
+\State Solve the sub-system $A_{ll}X_l^k=Y_l$ with the parallel GMRES method
\State \Return $X_l^k$
\EndFunction
\end{algorithmic}
\label{algo:01}
\end{algorithm}
+
+Algorithm~\ref{algo:01} shows the key points of the multisplitting method for solving a large sparse linear system. This algorithm is based on an outer-inner iteration scheme in which the parallel GMRES method is used to solve the inner iterations. It is executed in parallel by each cluster of processors. For all $l,m\in\{1,\ldots,L\}$, the matrices and vectors with the subscript $l$ represent the local data for cluster $l$, while $\{A_{lm}\}_{m\neq l}$ are the off-diagonal blocks of the sparse matrix $A$ and $\{X_m\}_{m\neq l}$ contain the elements of the solution $x$ shared with neighboring clusters. At every outer iteration $k$, asynchronous communications are performed between processors of the local cluster and those of distant clusters (lines $6$ and $7$ in Algorithm~\ref{algo:01}). The shared vector elements of the solution $x$ are exchanged by message passing using MPI non-blocking communication routines.
+
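The exchange in lines $6$ and $7$ of Algorithm~\ref{algo:01} relies on MPI non-blocking routines, as stated above. The following is a minimal sketch of how such an exchange could be posted; buffer names, sizes and the neighbor list are illustrative assumptions, not the authors' code.

/* Sketch of the exchange in lines 6 and 7 of Algorithm 1 using MPI
 * non-blocking routines. All names and sizes are illustrative. */
#include <mpi.h>

void exchange_shared(double *x_l, int n_l,          /* local X_l */
                     double **x_m, const int *n_m,  /* buffers for X_m */
                     const int *neighbor, int nb,   /* neighbor ranks */
                     MPI_Comm comm, MPI_Request *req)
{
  int i, r = 0;
  for (i = 0; i < nb; i++) {
    /* Post the send of the local sub-vector X_l to neighbor i... */
    MPI_Isend(x_l, n_l, MPI_DOUBLE, neighbor[i], 0, comm, &req[r++]);
    /* ...and the receive of its sub-vector X_m. */
    MPI_Irecv(x_m[i], n_m[i], MPI_DOUBLE, neighbor[i], 0, comm, &req[r++]);
  }
  /* No MPI_Waitall here: in the asynchronous scheme the next outer
   * iteration can start with the most recently received elements, and
   * completion is only tested later (e.g. with MPI_Testall). */
}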
+\begin{figure}
+\centering
+  \includegraphics[width=60mm,keepaspectratio]{clustering}
+\caption{Example of three clusters of processors interconnected by a virtual unidirectional ring network.}
+\label{fig:4.1}
+\end{figure}
+
+The global convergence of the asynchronous multisplitting solver is detected when all the clusters of processors have converged locally. We implemented the global convergence detection process as follows. On each cluster a master processor is designated (for example the processor with rank $1$) and the masters of all clusters are interconnected by a virtual unidirectional ring network (see Figure~\ref{fig:4.1}). During the resolution, a Boolean token circulates around the virtual ring from one master processor to the next until the global convergence is achieved. So starting from the cluster with rank $1$, each master processor $i$ sets the token to {\it True} if the local convergence is achieved or to {\it False} otherwise, and sends it to master processor $i+1$. Finally, the global convergence is detected when the master of cluster $1$ receives from the master of cluster $L$ a token set to {\it True}. In this case, the master of cluster $1$ sends a stop message to the masters of the other clusters. In this work, the local convergence on each cluster $l$ is detected when the following condition is satisfied
+\[(k\geq \MI) \mbox{~or~} \|X_l^k - X_l^{k+1}\|_{\infty}\leq\epsilon\]
+where $\MI$ is the maximum number of outer iterations and $\epsilon$ is the tolerance threshold of the error computed between two successive local solutions $X_l^k$ and $X_l^{k+1}$.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
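To illustrate the token-based detection described above, here is a minimal sketch of the ring circulation among the masters. It assumes a communicator restricted to the $L$ master processors with $0$-based ranks (the paper numbers the clusters from $1$) and a flag local_cv holding the result of the local convergence test; all names are illustrative assumptions, not the authors' code.

/* Sketch of the Boolean token circulating on the virtual ring of
 * master processors. Assumes a communicator containing only the L
 * masters, with 0-based ranks; local_cv is 1 when the local test
 * holds. All names are illustrative assumptions. */
#include <mpi.h>

int global_convergence(int local_cv, MPI_Comm masters)
{
  int rank, L, token;
  MPI_Comm_rank(masters, &rank);
  MPI_Comm_size(masters, &L);

  if (rank == 0) {
    /* The first master starts the token with its own local state. */
    token = local_cv;
    MPI_Send(&token, 1, MPI_INT, 1 % L, 0, masters);
    MPI_Recv(&token, 1, MPI_INT, L - 1, 0, masters, MPI_STATUS_IGNORE);
    return token;            /* True only if it stayed True all around */
  } else {
    MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, masters, MPI_STATUS_IGNORE);
    token = token && local_cv;    /* reset to False if not yet converged */
    MPI_Send(&token, 1, MPI_INT, (rank + 1) % L, 0, masters);
    return 0;                /* only master 0 detects global convergence */
  }
}

In this sketch a master forwards {\it True} only when both the incoming token and its own local test hold, which is one way to realize the detection described above: master $0$ alone observes a token that circulated the whole ring as {\it True}, and it is then responsible for sending the stop message to the other masters.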