\subsection{FPGAs}
A field-programmable gate array (FPGA) is an integrated circuit designed to be
configured by the customer. FPGAs are composed of programmable logic components,
called configurable logic blocks (CLBs). These blocks mainly contain look-up
tables (LUTs), flip-flops (F/Fs) and latches, organized in one or more slices
connected together. Each CLB can be configured to perform simple (AND, XOR, ...)
or complex combinational functions, and the blocks are interconnected by
reconfigurable links. Modern FPGAs also contain memory elements and multipliers,
which simplify designs and increase performance. Nevertheless, all other complex
operations, like division or trigonometric functions, are not available and must
be implemented by configuring a set of CLBs. Since this configuration is far
from obvious, it is done via a framework like ISE~\cite{ISE}. Such software can
synthesize a design written in a hardware description language (HDL), map it
onto CLBs, place and route them for a specific FPGA, and finally produce a
bitstream that is used to configure the FPGA. Thus, from the developer's point
of view, the main difficulty is to translate an algorithm into HDL code, taking
into account FPGA resources and constraints like the clock signals and I/O
values that drive the FPGA.
Indeed, HDL programming is very different from classic languages like C: a
program can be seen as a state machine manipulating signals that are updated at
each clock cycle.
From the LSQ algorithm, we have written a C program which uses only integer
values that have been previously scaled. The quantization of doubles into
integers has been performed in order to obtain a good trade-off between the
number of bits used and the precision. Finally, we have compared the integer and
double versions of LSQ, and we have observed that the results of
both versions were similar.
Then we have built two versions of the VHDL code: one directly by hand coding
and the other with Matlab, using the Simulink HDL Coder
feature~\cite{HDLCoder}. Although the two approaches are completely different,
we have obtained VHDL codes that are quite comparable. Each approach has
advantages and drawbacks. Roughly speaking, hand coding yields much better
structured code, while HDL Coder produces code faster. In terms of execution
speed, we think that both approaches will be quite comparable, with a slight
advantage for hand coding; we hope that real experiments will confirm this. In
the LSQ algorithm, we have replaced all the divisions by multiplications by
constants, since the divisors are constants that depend on the number of pixels
in the profile (i.e. $M$).
\subsection{Simulation}
Currently, we have only simulated our VHDL codes with GHDL and GTKWave (two
free tools under Linux). Both approaches led to correct results. In our first
simulations, our pipeline could compute a new phase every 33 cycles, with a
pipeline length of 95 cycles. However, when we tried to generate the
corresponding bitstream with the ISE environment, we ran into many problems:
several stages required more than the 10~ns imposed by the clock frequency. So
we needed to decompose some parts of the pipeline in order to add cycles and
reduce the work done between two clock ticks.
% ghdl + gtkwave
% at best: one phase every 33 cycles, latency of 95 cycles.
% but routing/placement impossible.
Currently, both approaches provide bitstreams that are synthesizable with ISE.
We expect the pipeline to have a latency of 112 cycles, i.e. 1.12~$\mu$s, and
to accept a new profile of pixels every 48 cycles, i.e. 480~ns.
% not done yet, but we expect one output every 480 ns with a latency of 1120 ns