- $Z_{loc}$ = KernelUpdate($P,P',Z^{prev},n_{loc}$)\;
- $\Delta Z_{loc}$ = KernelComputeError($Z_{loc},Z^{prev}_{loc},n_{loc}$)\;
- $\Delta Z_{max}[id_{gpu}]$ = CudaMaxFunction($\Delta Z_{loc},n_{loc}$)\;
- Copy $Z_{loc}$ from GPU to $Z$ in CPU\;
- $max$ = MaxFunction($\Delta Z_{max},ngpu$)\;
- TestConvergence($max,\epsilon$)\;
+ $Z[offset]$ = KernelUpdate($P,P',Z,n_{loc}$)\;
+ $\Delta Z_{max}[id_{gpu}]$ = KernelComputeError($Z[offset],Z^{prev}[offset],n_{loc}$)\;
+ Copy $Z[offset]$ from GPU to $Z$ in CPU\;
+ $max$ = MaxFunction($\Delta Z_{max},ngpu$)\;