8th commit

[dmems12.git] / dmems12.tex
diff --git a/dmems12.tex b/dmems12.tex

index c3d00f4d063e8f9ade68cf2b30b05e5df39c73a2..314368e8ff8e9cad225c946751adb95aa857a125 100644 (file)
--- a/dmems12.tex
+++ b/dmems12.tex
@@ -245,14 +245,16 @@ CPU. But this is not the case for phase computation that used only few
  tens of values.\\
  
  In order to evaluate the original algorithm, we translated it into C
-language. Profiles are read from a 1Mo file, as if it was an image
-stored in a device file representing the camera. The file contains 100
-profiles of 21 pixels, equally scattered in the file. We obtained an
-average of 10.5$\mu$s by profile (including I/O accesses). It is under
-are requirements but close to the limit. In case of an occasional load
-of the system, it could be largely overtaken. A solution would be to
-use a real-time operating system but another one to search for a more
-efficient algorithm.
+language. As discussed later, for 20 pixels it performs about 1550
+operations, which gives an estimated execution time of $1550/155
+= 10$~$\mu$s. For a more realistic evaluation, we constructed a 1~MB
+file containing 200 profiles of 20 pixels, equally scattered. This file
+is equivalent to an image stored in a device file representing the
+camera. We obtained an average of 10.5$\mu$s per profile (including I/O
+accesses). This is within our requirements but close to the limit. In
+case of an occasional system load, the limit could be largely
+exceeded. One solution would be to use a real-time operating system;
+another is to search for a more efficient algorithm.
  
  But the main drawback is the latency of such a solution: since each
  profile must be treated one after another, the deflection of 100
@@ -288,17 +290,24 @@ computation, we give some general information about FPGAs and the board we use.
  
  \subsection{FPGAs}
  
-A field-programmable gate  array (FPGA) is an integrated  circuit designed to be
-configured by  the customer.  A hardware  description language (HDL)  is used to
-configure a  FPGA. FGPAs are  composed of programmable logic  components, called
-logic blocks.  These blocks can be  configured to perform simple (AND, XOR, ...)
-or  complex  combinational  functions.    Logic  blocks  are  interconnected  by
-reconfigurable links. Modern FPGAs contain memory elements and multipliers which
-enable to  simplify the design  and to increase  the speed. As the  most complex
-operation  on  FGPAs  is the  multiplier,  design  of  FGPAs should  use  simple
-operations. For example,  a divider is not an operation available and it should
-be programmed using simplest operations.
-
+A field-programmable gate array (FPGA) is an integrated circuit
+designed to be configured by the customer. FPGAs are composed of
+programmable logic components, called configurable logic blocks
+(CLB). These blocks mainly contain look-up tables (LUT), flip-flops
+(F/F) and latches, organized in one or more slices connected
+together. Each CLB can be configured to perform simple (AND, XOR, ...)
+or complex combinational functions. CLBs are interconnected by
+reconfigurable links. Modern FPGAs contain memory elements and
+multipliers, which simplify the design and increase
+performance. Nevertheless, all other complex operations, such as
+division or trigonometric functions, are not available and must
+be implemented by configuring a set of CLBs.
+
+Since this configuration is far from obvious, it can be done via a
+framework that synthesizes a design written in a hardware description
+language (HDL) and then places and routes it; the result is used to
+configure an FPGA.
+
  FPGA programming is very different from classic processor programming. When
  logic blocks are programmed and linked to perform an operation, they cannot be
  reused anymore. FPGAs are clocked more slowly than classic processors but they
@@ -595,7 +604,7 @@ largely beyond the worst experimental ones.
  
  \begin{figure}[ht]
  \begin{center}
-  \includegraphics[width=9cm]{intens-noise20-spl}
+  \includegraphics[width=9cm]{intens-noise20}
  \end{center}
  \caption{Sample of worst profile for N=10}
  \label{fig:noise20}
@@ -603,7 +612,7 @@ largely beyond the worst experimental ones.
  
  \begin{figure}[ht]
  \begin{center}
-  \includegraphics[width=9cm]{intens-noise60-lsq}
+  \includegraphics[width=9cm]{intens-noise60}
  \end{center}
  \caption{Sample of worst profile for N=30}
  \label{fig:noise60}
@@ -616,7 +625,7 @@ SPL on $N = k\times M$, i.e. the number of interpolated points.
  
  We assume that $M=20$, $nb_s=1024$, $k=4$, all possible parts are
  already in lookup tables and a limited set of operations (+, -, *, /,
-<, >) is taken account. Translating the two algorithms in C code, we
+$<$, $>$) is taken into account. Translating the two algorithms into C code, we
  obtain about 430 operations for LSQ and 1550 (plus a few tens for
  $atan$) for SPL. This result is largely in favor of LSQ. Nevertheless,
  considering the total number of operations is not really pertinent for