implementations of the LB method have been undertaken, typically using
a combination of distributed domain decomposition and the Message
Passing Interface (MPI). However, the potential
performance benefits offered by GPUs have motivated a new ``mixed-mode''
approach to address very large problems. Here, fine-grained
parallelism is implemented on the GPU, while MPI is reserved for