-The parallel domain decomposition solver has been validated against the sequential solvers with respect to algorithmic efficiency to establish that the code produce correct results. An analysis of the numerical efficiency have also been carried out on different GPU systems to identify comparative behaviors as both the problems sizes and number of compute nodes vary. For example, performance scalings on Test environment 1 and Test environment 2 are presented in figure \ref{ch7:fig:multigpuperformance}. The figure confirms that there is only a limited benefit from using multiple GPUs for small problem sizes, since the computational intensity is simply too low to efficiently hide the latency of message passing. A substantial speedup is achieved compared to the single GPU version, while being able to solve even larger systems.
-With the linear scaling of memory requirements and improved computational speed, the methodology based on multiple GPUs makes it possible to simulate water waves in very large numerical wave tanks with improved performance.
-