-From the LSQ algorithm, we have written a C program which uses only integer
-values that have been previously scaled. The quantization of doubles into
-integers has been performed in order to obtain a good trade-off between the
-number of bits used and the precision. Finally, we have compared the result of
-the LSQ version using integer and double. We have observed that the results of
-both versions were similar.
-
-Then we have built two versions of VHDL codes: one directly by hand coding and
-the other with Matlab using simulink HDL coder feature. Although the approach is
-completely different we have obtain VHDL codes that are quite comparable. Each
-approach has advantages and drawbacks. Roughly speaking, hand coding provides
-beautiful and much better structures code while HDL coder provides code faster.
-In terms of speed of code, we think that both approaches will be quite
-comparable. Real experiments will confirm that. In the LSQ algorithm, we have
-replaced all the divisions by multiplications by a constant since divisions are
-performed with constants depending of the number of pixels in the profile
-(i.e. $M$).
+From the LSQ algorithm, we have written a C program that uses only
+integer values. We use a very simple quantization by multiplying
+double precision values by a power of two, keeping the integer
+part. For example, all values stored in lut$_s$, lut$_c$, $\ldots$ are
+scaled by 1024. Since LSQ also computes the average, the variance,
+$\ldots$ to remove the slope, the Euclidean divisions involved may
+introduce significant rounding errors. To avoid that, we also scale
+the pixel intensities by a power of two. Furthermore, assuming $nb_s$
+is fixed, these divisions have a known denominator. Thus, they can be
+replaced by their multiplication/shift counterpart. Finally, all other
+multiplications or divisions by a power of two have been replaced by
+left or right bit shifts. As a result, the code only contains
+additions, subtractions and multiplications of signed integers, which
+is well suited to FPGAs.
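+
+As an illustration only, the scaling and the replacement of a division
+by a known denominator could be sketched in C as follows. The function
+names, the 16-bit reciprocal precision and the example denominator are
+assumptions made for this sketch and are not taken from our
+implementation.
+
```c
#include <stdint.h>

/* Quantize a double: scale by 2^shift and keep the integer part.
 * With shift = 10 this is the "scaled by 1024" case of the look-up tables. */
static int32_t quantize(double x, int shift)
{
    return (int32_t)(x * (double)(1 << shift)); /* the cast truncates toward zero */
}

/* Precompute round(2^shift / d) for a known positive denominator d
 * (for instance a fixed nb_s). Done once, offline. */
static int32_t reciprocal(int32_t d, int shift)
{
    return (int32_t)(((INT64_C(1) << shift) + d / 2) / d);
}

/* Approximate x / d by (x * recip) >> shift, using a 64-bit intermediate.
 * Note: in C, right-shifting a negative signed value is implementation-defined
 * (arithmetic shift on common compilers); the result is an approximation of
 * the exact quotient, not a bit-exact replacement. */
static int32_t div_by_const(int32_t x, int32_t recip, int shift)
{
    return (int32_t)(((int64_t)x * recip) >> shift);
}
```
+
+For a hypothetical denominator of 20 and a 16-bit reciprocal,
+\verb|div_by_const(x, reciprocal(20, 16), 16)| only needs one
+multiplication and one shift at run time.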
+
+As said above, hardware constraints have a great influence on the VHDL
+implementation. Consequently, we determined the maximum value of each
+variable as a function of the different scale factors and the size of
+profiles, which gives their maximum size in bits. That size determines
+the maximum scale factors that allow using the fewest possible RAMs
+and DSPs. For now, we have implemented our algorithm with this maximum
+size, while ongoing work studies the impact of quantization on the
+precision of the results and on design complexity. We have compared
+the results of the LSQ version using integers and doubles and observed
+that the precision of both was similar.
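+
+The bit-width reasoning above can be sketched as follows. This is a
+minimal example under stated assumptions: the helper names and the
+accumulator considered (a sum of products of a scaled pixel and a
+scaled look-up table entry) are hypothetical and only show how a bound
+on a variable translates into a size in bits.
+
```c
#include <stdint.h>

/* Number of bits needed to store, in two's complement, any signed integer
 * whose magnitude is at most max_abs: magnitude bits plus one sign bit. */
static int signed_bits(uint64_t max_abs)
{
    int bits = 1; /* sign bit */
    while (max_abs > 0) {
        bits++;
        max_abs >>= 1;
    }
    return bits;
}

/* Hypothetical bound for an accumulator summing m products of a pixel
 * (at most pix_max, scaled by 2^pix_shift) and a table entry of magnitude
 * at most 2^lut_shift. */
static uint64_t acc_bound(uint64_t m, uint64_t pix_max, int pix_shift, int lut_shift)
{
    return m * (pix_max << pix_shift) * (UINT64_C(1) << lut_shift);
}
```
+
+With the assumed values of 20 pixels per profile, 8-bit intensities
+scaled by $2^4$ and table entries scaled by $2^{10}$, the bound is
+$20 \times 4080 \times 1024 = 83\,558\,400$, which requires 28 bits
+including the sign.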
+
+Then we built two versions of the VHDL code: one written directly by
+hand and the other generated with Matlab using the Simulink HDL Coder
+feature~\cite{HDLCoder}. Although the approaches are completely
+different, we obtained VHDL codes that are quite comparable. Each
+approach has advantages and drawbacks. Roughly speaking, hand coding
+provides beautiful and much better structured code, while Simulink
+allows code to be produced faster. In terms of throughput and latency,
+simulations show that the two approaches are close, with a slight
+advantage for hand coding. We hope that real experiments will confirm
+that.