X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/dmems12.git/blobdiff_plain/d343834383c4f75427f0864fa81669d0968e6aaf..5643f354956645b7d3592ac0e32af50cdf351155:/dmems12.tex
diff --git a/dmems12.tex b/dmems12.tex
index c3d00f4..314368e 100644
--- a/dmems12.tex
+++ b/dmems12.tex
@@ -245,14 +245,16 @@ CPU. But this is not the case for phase computation that used only few tenth
 of values.\\

 In order to evaluate the original algorithm, we translated it in C
-language. Profiles are read from a 1Mo file, as if it was an image
-stored in a device file representing the camera. The file contains 100
-profiles of 21 pixels, equally scattered in the file. We obtained an
-average of 10.5$\mu$s by profile (including I/O accesses). It is under
-are requirements but close to the limit. In case of an occasional load
-of the system, it could be largely overtaken. A solution would be to
-use a real-time operating system but another one to search for a more
-efficient algorithm.
+language. As detailed below, for 20 pixels, it performs about 1550
+operations, hence an estimated execution time of
+$1550/155 = 10\,\mu$s. For a more realistic evaluation, we built a
+1~MB file containing 200 profiles of 20 pixels, evenly spaced. This file
+is equivalent to an image stored in a device file representing the
+camera. We obtained an average of 10.5~$\mu$s per profile (including I/O
+accesses). This is within our requirements but close to the limit.
+Under an occasional system load, the limit could be largely
+exceeded. One solution would be to use a real-time operating system;
+another is to search for a more efficient algorithm.

 But the main drawback is the latency of such a solution : since each
 profile must be treated one after another, the deflection of 100
@@ -288,17 +290,24 @@ computation, we give some general information about FPGAs and the
 board we use.
 \subsection{FPGAs}
-A field-programmable gate array (FPGA) is an integrated circuit designed to be
-configured by the customer. A hardware description language (HDL) is used to
-configure a FPGA. FGPAs are composed of programmable logic components, called
-logic blocks. These blocks can be configured to perform simple (AND, XOR, ...)
-or complex combinational functions. Logic blocks are interconnected by
-reconfigurable links. Modern FPGAs contain memory elements and multipliers which
-enable to simplify the design and to increase the speed. As the most complex
-operation on FGPAs is the multiplier, design of FGPAs should use simple
-operations. For example, a divider is not an operation available and it should
-be programmed using simplest operations.
-
+A field-programmable gate array (FPGA) is an integrated circuit
+designed to be configured by the customer. FPGAs are composed of
+programmable logic components, called configurable logic blocks
+(CLB). These blocks mainly contain look-up tables (LUT), flip-flops
+(F/F) and latches, organized in one or more slices connected
+together. Each CLB can be configured to perform simple (AND, XOR, ...)
+or complex combinational functions. They are interconnected by
+reconfigurable links. Modern FPGAs contain memory elements and
+multipliers, which simplify the design and increase
+performance. Nevertheless, all other complex operations, such as
+division or trigonometric functions, are not available and must
+be implemented by configuring a set of CLBs.
+
+Since this configuration is far from obvious, it can be done via a
+framework that synthesizes a design written in a hardware description
+language (HDL), and after, that place and route
+
+ is used to configure a FPGA.
 FGPAs programming is very different from classic processors programming. When
 logic blocks are programmed and linked to perform an operation, they cannot be
 reused anymore.
FPGAs are cadenced more slowly than classic processors but they
@@ -595,7 +604,7 @@ largely beyond the worst experimental ones.

 \begin{figure}[ht]
 \begin{center}
- \includegraphics[width=9cm]{intens-noise20-spl}
+ \includegraphics[width=9cm]{intens-noise20}
 \end{center}
 \caption{Sample of worst profile for N=10}
 \label{fig:noise20}
@@ -603,7 +612,7 @@ largely beyond the worst experimental ones.

 \begin{figure}[ht]
 \begin{center}
- \includegraphics[width=9cm]{intens-noise60-lsq}
+ \includegraphics[width=9cm]{intens-noise60}
 \end{center}
 \caption{Sample of worst profile for N=30}
 \label{fig:noise60}
@@ -616,7 +625,7 @@ SPL on $N = k\times M$, i.e. the number of interpolated points.

 We assume that $M=20$, $nb_s=1024$, $k=4$, all possible parts are already in
 lookup tables and a limited set of operations (+, -, *, /,
-<, >) is taken account. Translating the two algorithms in C code, we
+$<$, $>$) is taken into account. Translating the two algorithms into C code, we
 obtain about 430 operations for LSQ and 1550 (plus few tenth for $atan$) for
 SPL. This result is largely in favor of LSQ. Nevertheless, considering the
 total number of operations is not really pertinent for