-From the LSQ algorithm, we have written a C program which uses only integer
-values that have been previously scaled. The quantization of doubles into
-integers has been performed in order to obtain a good trade-off between the
-number of bits used and the precision. We have compared the result of
-the LSQ version using integers and doubles. We have observed that the results of
-both versions were similar.
-
-Then we have built two versions of VHDL codes: one directly by hand coding and
-the other with Matlab using the Simulink HDL coder
-feature~\cite{HDLCoder}. Although the approach is completely different we have
-obtain VHDL codes that are quite comparable. Each approach has advantages and
-drawbacks. Roughly speaking, hand coding provides beautiful and much better
-structured code while HDL coder provides code faster. In terms of speed of
-code, we think that both approaches will be quite comparable with a slightly
-advantage for hand coding. We hope that real experiments will confirm that. In
-the LSQ algorithm, we have replaced all the divisions by multiplications by
-constants since divisions are performed with constants depending of the number
-of pixels in the profile (i.e. $M$).
+From the LSQ algorithm, we have written a C program that uses only
+integer values. We use a very simple quantization by multiplying
+double precision values by a power of two, keeping the integer
+part. For example, all values stored in lut$_s$, lut$_c$, $\ldots$ are
+scaled by 1024. Since LSQ also computes average, variance, ... to
+remove the slope, the result of implied euclidian divisions may be
+relatively wrong. To avoid that, we also scale the pixel intensities
+by a power of two. Futhermore, assuming $nb_s$ is fixed, these
+divisions have a knonw denominator. Thus, they can be replaced by
+their multiplication/shift counterpart. Finally, all other
+multiplications or divisions by a power of two have been replaced by
+left or right bit shifts. By the way, the code only contains
+additions, substractions and multiplications of signed integers, which
+is perfectly adapted to FGPAs.
+
+As said above, hardware constraints have a great influence on the VHDL
+implementation. Consequently, we searched the maximum value of each
+variable as a function of the different scale factors and the size of
+profiles, which gives their maximum size in bits. That size determines
+the maximum scale factors that allow to use the least possible RAMs
+and DSPs. Actually, we implemented our algorithm with this maximum
+size but current works study the impact of quantization on the results
+precision and design complexity. We have compared the result of the
+LSQ version using integers and doubles and observed that the precision
+of both were similar.
+
+Then we built two versions of VHDL codes: one directly by hand coding
+and the other with Matlab using the Simulink HDL coder
+feature~\cite{HDLCoder}. Although the approach is completely different
+we obtained VHDL codes that are quite comparable. Each approach has
+advantages and drawbacks. Roughly speaking, hand coding provides
+beautiful and much better structured code while Simulink allows to
+produce a code faster. In terms of throughput and latency,
+simulations shows that the two approaches are close with a slight
+advantage for hand coding. We hope that real experiments will confirm
+that.