X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/dmems12.git/blobdiff_plain/015bc351b18995b7727145984fd39381bcce9a5a..c6645d99e9614d4fe8c5d8699f94440ba339900f:/dmems12.tex

diff --git a/dmems12.tex b/dmems12.tex
index 94e96e4..ad693f6 100644
--- a/dmems12.tex
+++ b/dmems12.tex
@@ -289,25 +289,23 @@ computation, we give some general information about FPGAs and the board we use.
 
 \subsection{FPGAs}
 
-A field-programmable gate array (FPGA) is an integrated circuit
-designed to be configured by the customer. FGPAs are composed of
-programmable logic components, called configurable logic blocks
-(CLB). These blocks mainly contains look-up tables (LUT), flip/flops
-(F/F) and latches, organized in one or more slices connected
-together. Each CLB can be configured to perform simple (AND, XOR, ...)
-or complex combinational functions. They are interconnected by
-reconfigurable links. Modern FPGAs contain memory elements and
-multipliers which enable to simplify the design and to increase the
-performance. Nevertheless, all other complex operations, like
-division, trigonometric functions, $\ldots$ are not available and must
-be done by configuring a set of CLBs. Since this configuration is not
-obvious at all, it can be done via a framework, like ISE. Such a
-software can synthetize a design written in a hardware description
-language (HDL), map it onto CLBs, place/route them for a specific
-FPGA, and finally produce a bitstream that is used to configre the
-FPGA. Thus, from the developper point of view, the main difficulty is
-to translate an algorithm in HDL code, taking account FPGA resources
-and constraints like clock signals and I/O values that drive the FPGA.
+A field-programmable gate array (FPGA) is an integrated circuit designed to be
+configured by the customer. FPGAs are composed of programmable logic components,
+called configurable logic blocks (CLB).
These blocks mainly contain look-up
+tables (LUT), flip-flops (F/F) and latches, organized in one or more slices
+connected together. Each CLB can be configured to perform simple (AND, XOR, ...)
+or complex combinational functions. They are interconnected by reconfigurable
+links. Modern FPGAs contain memory elements and multipliers, which simplify
+the design and increase the performance. Nevertheless, all other
+complex operations, like division, trigonometric functions, $\ldots$ are not
+available and must be done by configuring a set of CLBs. Since this
+configuration is not obvious at all, it can be done via a framework, like
+ISE~\cite{ISE}. Such software can synthesize a design written in a hardware
+description language (HDL), map it onto CLBs, place/route them for a specific
+FPGA, and finally produce a bitstream that is used to configure the FPGA. Thus,
+from the developer's point of view, the main difficulty is to translate an
+algorithm into HDL code, taking into account FPGA resources and constraints like
+clock signals and I/O values that drive the FPGA.
 
 Indeed, HDL programming is very different from classic languages like C. A
 program can be seen as a state-machine, manipulating signals that
@@ -699,30 +697,32 @@ will include real experiments in the final version of this paper.
 From the LSQ algorithm, we have written a C program which uses only integer
 values that have been previously scaled. The quantization of doubles into
 integers has been performed in order to obtain a good trade-off between the
-number of bits used and the precision. Finally, we have compared the result of
-the LSQ version using integer and double. We have observed that the results of
+number of bits used and the precision. We have compared the results of
+the LSQ versions using integers and doubles. We have observed that the results of
 both versions were similar.
 
 Then we have built two versions of VHDL codes: one directly by hand coding and
-the other with Matlab using simulink HDL coder feature. Although the approach is
-completely different we have obtain VHDL codes that are quite comparable. Each
-approach has advantages and drawbacks. Roughly speaking, hand coding provides
-beautiful and much better structures code while HDL coder provides code faster.
-In terms of speed of code, we think that both approaches will be quite
-comparable. Real experiments will confirm that. In the LSQ algorithm, we have
-replaced all the divisions by multiplications by a constant since divisions are
-performed with constants depending of the number of pixels in the profile
-(i.e. $M$).
+the other with Matlab using the Simulink HDL Coder
+feature~\cite{HDLCoder}. Although the approaches are completely different, we have
+obtained VHDL codes that are quite comparable. Each approach has advantages and
+drawbacks. Roughly speaking, hand coding provides beautiful and much better
+structured code, while HDL Coder produces code more quickly. In terms of speed of
+the code, we think that both approaches will be quite comparable, with a slight
+advantage for hand coding. We hope that real experiments will confirm that. In
+the LSQ algorithm, we have replaced all the divisions by multiplications by
+constants, since the divisors are constants that depend on the number
+of pixels in the profile (i.e. $M$).
 
 \subsection{Simulation}
 
-Currently, we only have simulated our VHDL codes with GHDL and GTKWave (two free
-tools with linux). Both approaches led to correct results. At the beginning with
-simulations our pipiline could compute a new phase each 33 cycles and the length
-of the pipeline was equal to 95 cycles. When we tried to generate the bitsream
-with ISE environment we had many problems because many stages required more than
-the 10$n$s availabe. So we needed to decompose some part of the pipeline in order
-to add some cycles and siplify some parts.
+Currently, we have only simulated our VHDL codes with GHDL and GTKWave (two free
+tools available on Linux). Both approaches led to correct results. At the beginning of
+our simulations, our pipeline could compute a new phase every 33 cycles and the
+length of the pipeline was equal to 95 cycles. When we tried to generate the
+corresponding bitstream with the ISE environment, we had many problems because many
+stages required more than the 10$n$s allowed by the clock frequency. So we
+needed to decompose some parts of the pipeline in order to add some cycles and
+simplify some parts between two clock ticks.
 % ghdl + gtkwave
 % at best: one phase every 33 cycles, latency of 95 cycles.
 % but placement/routing impossible.
@@ -730,7 +730,7 @@ to add some cycles and siplify some parts.
 
 Currently both approaches provide synthesizable bitstreams with ISE. We expect
 that the pipeline will have a latency of 112 cycles, i.e. 1.12$\mu$s and it
-could accept new line of pixel each 48 cycles, i.e. 480$n$s.
+could accept a new profile of pixels every 48 cycles, i.e. 480$n$s.
 
 % not done, but forecast of one output every 480ns with a latency of 1120