It is almost straightforward to parallelise this scheme for GPUs, by processing each subinterval $[x_i,x_{i+1}]$ independently in a separate thread. However, it is not known in advance whether an extra knot $t_i$ needs to be inserted or not, and therefore calculation of the position of the knot in the output sequence of knots ${t_i}$ is problematic for parallel implementation (for a sequential algorithm no such issue arises). To avoid serialisation, we decided to insert an additional knot in every interval $[x_i,x_{i+1}]$, but set $t_i=x_i$ when the extra knot is not actually needed. This way we know in advance the position of the output knots and the length of this sequence is $2(n-1)$, and therefore all calculations can now be performed independently. The price we pay is that some of the spline knots can coincide. However, this does not affect spline evaluation, as one of the coinciding knots is simply disregarded, and the spline coefficients are replicated (so for a double knot $t_i=t_{i+1}$, we have $\alpha_i=\alpha_{i+1}$, $\beta_i=\beta_{i+1}$, $\gamma_i=\gamma_{i+1}$). Our implementation is presented in Figures \ref{ch11:algcoef}-\ref{ch11:algcoef1}.
It is almost straightforward to parallelise this scheme for GPUs, by processing each subinterval $[x_i,x_{i+1}]$ independently in a separate thread. However, it is not known in advance whether an extra knot $t_i$ needs to be inserted or not, and therefore calculation of the position of the knot in the output sequence of knots ${t_i}$ is problematic for parallel implementation (for a sequential algorithm no such issue arises). To avoid serialisation, we decided to insert an additional knot in every interval $[x_i,x_{i+1}]$, but set $t_i=x_i$ when the extra knot is not actually needed. This way we know in advance the position of the output knots and the length of this sequence is $2(n-1)$, and therefore all calculations can now be performed independently. The price we pay is that some of the spline knots can coincide. However, this does not affect spline evaluation, as one of the coinciding knots is simply disregarded, and the spline coefficients are replicated (so for a double knot $t_i=t_{i+1}$, we have $\alpha_i=\alpha_{i+1}$, $\beta_i=\beta_{i+1}$, $\gamma_i=\gamma_{i+1}$). Our implementation is presented in Figures \ref{ch11:algcoef}-\ref{ch11:algcoef1}.