From c6645d99e9614d4fe8c5d8699f94440ba339900f Mon Sep 17 00:00:00 2001
From: couturie
Date: Thu, 20 Oct 2011 20:39:09 +0200
Subject: [PATCH] add a few references and a few corrections

---
 biblio.bib  | 18 +++++++++++++
 dmems12.tex | 76 ++++++++++++++++++++++++++---------------------------
 2 files changed, 56 insertions(+), 38 deletions(-)

diff --git a/biblio.bib b/biblio.bib
index a795337..46a265c 100644
--- a/biblio.bib
+++ b/biblio.bib
@@ -62,3 +62,21 @@
 year = {2005},
 }
 
+@Incollection{ISE,
+  author    = {Hitesh Patel},
+  title     = {Unlock New Levels of Productivity for Your Design Using {ISE} Design Suite 12},
+  publisher = {Xilinx White Paper},
+  month     = {May},
+  year      = {2010},
+  url       = {http://www.xilinx.com/support/documentation/white_papers/wp368_ISE_Design_Suite_v12.pdf},
+
+}
+
+@Incollection{HDLCoder,
+  title     = {Simulink {HDL} Coder 2.1},
+  publisher = {MathWorks Datasheet},
+  year      = {2011},
+  url       = {http://www.mathworks.fr/products/datasheets/pdf/simulink-hdl-coder.pdf},
+
+}
+
diff --git a/dmems12.tex b/dmems12.tex
index 94e96e4..ad693f6 100644
--- a/dmems12.tex
+++ b/dmems12.tex
@@ -289,25 +289,23 @@ computation, we give some general information about FPGAs and the board we use.
 \subsection{FPGAs}
 
-A field-programmable gate array (FPGA) is an integrated circuit
-designed to be configured by the customer. FGPAs are composed of
-programmable logic components, called configurable logic blocks
-(CLB). These blocks mainly contains look-up tables (LUT), flip/flops
-(F/F) and latches, organized in one or more slices connected
-together. Each CLB can be configured to perform simple (AND, XOR, ...)
-or complex combinational functions. They are interconnected by
-reconfigurable links. Modern FPGAs contain memory elements and
-multipliers which enable to simplify the design and to increase the
-performance. Nevertheless, all other complex operations, like
-division, trigonometric functions, $\ldots$ are not available and must
-be done by configuring a set of CLBs. Since this configuration is not
-obvious at all, it can be done via a framework, like ISE. Such a
-software can synthetize a design written in a hardware description
-language (HDL), map it onto CLBs, place/route them for a specific
-FPGA, and finally produce a bitstream that is used to configre the
-FPGA. Thus, from the developper point of view, the main difficulty is
-to translate an algorithm in HDL code, taking account FPGA resources
-and constraints like clock signals and I/O values that drive the FPGA.
+A field-programmable gate array (FPGA) is an integrated circuit designed to be
+configured by the customer. FPGAs are composed of programmable logic components,
+called configurable logic blocks (CLB). These blocks mainly contain look-up
+tables (LUT), flip-flops (F/F) and latches, organized in one or more slices
+connected together. Each CLB can be configured to perform simple (AND, XOR, ...)
+or complex combinational functions. They are interconnected by reconfigurable
+links. Modern FPGAs contain memory elements and multipliers which make it
+possible to simplify the design and to increase performance. Nevertheless, all
+other complex operations, like division, trigonometric functions, $\ldots$ are
+not available and must be implemented by configuring a set of CLBs. Since this
+configuration is far from obvious, it can be done via a framework like
+ISE~\cite{ISE}. Such software can synthesize a design written in a hardware
+description language (HDL), map it onto CLBs, place/route them for a specific
+FPGA, and finally produce a bitstream that is used to configure the FPGA. Thus,
+from the developer's point of view, the main difficulty is to translate an
+algorithm into HDL code, taking into account FPGA resources and constraints like
+clock signals and I/O values that drive the FPGA.
 
 Indeed, HDL programming is very different from classic languages like
 C. A program can be seen as a state-machine, manipulating signals that
@@ -699,30 +697,32 @@ will include real experiments in the final version of this paper.
 From the LSQ algorithm, we have written a C program which uses only integer
 values that have been previously scaled. The quantization of doubles into
 integers has been performed in order to obtain a good trade-off between the
-number of bits used and the precision. Finally, we have compared the result of
-the LSQ version using integer and double. We have observed that the results of
+number of bits used and the precision. We have compared the results of
+the LSQ version using integers and doubles. We have observed that the results of
 both versions were similar.
 
 Then we have built two versions of VHDL codes: one directly by hand coding and
-the other with Matlab using simulink HDL coder feature. Although the approach is
-completely different we have obtain VHDL codes that are quite comparable. Each
-approach has advantages and drawbacks. Roughly speaking, hand coding provides
-beautiful and much better structures code while HDL coder provides code faster.
-In terms of speed of code, we think that both approaches will be quite
-comparable. Real experiments will confirm that. In the LSQ algorithm, we have
-replaced all the divisions by multiplications by a constant since divisions are
-performed with constants depending of the number of pixels in the profile
-(i.e. $M$).
+the other with Matlab using the Simulink HDL Coder
+feature~\cite{HDLCoder}. Although the two approaches are completely different,
+we have obtained VHDL codes that are quite comparable. Each approach has
+advantages and drawbacks. Roughly speaking, hand coding produces cleaner and
+much better structured code, while HDL Coder produces code more quickly. In
+terms of the speed of the resulting design, we think that both approaches will
+be quite comparable, with a slight advantage for hand coding. We hope that real
+experiments will confirm that. In the LSQ algorithm, we have replaced all the
+divisions by multiplications by constants, since divisions are always performed
+with constants that depend on the number of pixels in the profile (i.e. $M$).
 
 \subsection{Simulation}
 
-Currently, we only have simulated our VHDL codes with GHDL and GTKWave (two free
-tools with linux). Both approaches led to correct results. At the beginning with
-simulations our pipiline could compute a new phase each 33 cycles and the length
-of the pipeline was equal to 95 cycles. When we tried to generate the bitsream
-with ISE environment we had many problems because many stages required more than
-the 10$n$s availabe. So we needed to decompose some part of the pipeline in order
-to add some cycles and siplify some parts.
+Currently, we have only simulated our VHDL codes with GHDL and GTKWave (two free
+tools available on Linux). Both approaches led to correct results. At the
+beginning of our simulations, our pipeline could compute a new phase every 33
+cycles and the latency of the pipeline was equal to 95 cycles. When we tried to
+generate the corresponding bitstream with the ISE environment we had many
+problems, because many stages required more than the 10$n$s imposed by the clock
+frequency. So we needed to decompose some parts of the pipeline in order to add
+some cycles and to simplify the work done within one clock period.
 % ghdl + gtkwave
 % at best: one phase every 33 cycles, latency of 95 cycles.
 % but place/route impossible.
@@ -730,7 +730,7 @@ to add some cycles and siplify some parts.
 
 Currently both approaches provide synthesable bitstreams with ISE. We expect
 that the pipeline will have a latency of 112 cycles, i.e. 1.12$\mu$s and it
-could accept new line of pixel each 48 cycles, i.e. 480$n$s.
+could accept a new profile of pixels every 48 cycles, i.e. 480$n$s.
 % not done yet, but we expect one output every 480ns with a latency of 1120
--
2.39.5
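The LSQ paragraph added in the second dmems12.tex hunk mentions quantizing doubles into previously scaled integers before comparing the two versions. Below is a minimal C sketch of such a fixed-point quantization, kept outside the patch; the 16 fractional bits, the 32-bit width and the rounding rule are illustrative assumptions, not the authors' actual choices (compile with `gcc quantize.c -lm`).

#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define SCALE_BITS 16                 /* assumed number of fractional bits */
#define SCALE      (1 << SCALE_BITS)

/* Quantize a double into a 32-bit fixed-point integer. */
static int32_t quantize(double x)
{
    return (int32_t)lround(x * SCALE);
}

/* Convert back to double, e.g. to compare against the double version of LSQ. */
static double dequantize(int32_t q)
{
    return (double)q / SCALE;
}

int main(void)
{
    double  x = 0.785398;             /* arbitrary test value */
    int32_t q = quantize(x);
    printf("x=%f  q=%d  back=%f  error=%g\n",
           x, q, dequantize(q), fabs(x - dequantize(q)));
    return 0;
}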
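The same hunk also states that every division in the LSQ algorithm is replaced by a multiplication, because the divisors are constants that depend on the profile length $M$. A minimal C sketch of this standard trick (multiply by a precomputed scaled reciprocal, then shift) follows; the value of M, the bit widths and the truncation are assumptions for illustration, not taken from the authors' VHDL.

#include <stdint.h>
#include <stdio.h>

#define M          640                /* assumed number of pixels in a profile */
#define RECIP_BITS 24                 /* assumed precision of the stored reciprocal */
#define RECIP_M    ((int64_t)(1LL << RECIP_BITS) / M)   /* precomputed 1/M, scaled */

/* x / M computed without a divider, as it could be mapped onto FPGA multipliers. */
static int32_t div_by_M(int32_t x)
{
    /* With a truncated reciprocal the result can differ from exact integer
     * division by one LSB for some inputs; widening RECIP_BITS reduces this. */
    return (int32_t)(((int64_t)x * RECIP_M) >> RECIP_BITS);
}

int main(void)
{
    int32_t x = 123456;
    printf("exact: %d   mul+shift: %d\n", x / M, div_by_M(x));
    return 0;
}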
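Finally, a quick sanity check of the figures quoted in the last hunk, assuming the 10$n$s clock period mentioned in the simulation paragraph (i.e. a 100 MHz clock): the latency and acceptance rate are consistent with the cycle counts.

\[
  112 \times 10\,\mathrm{ns} = 1.12\,\mu\mathrm{s} \quad\text{(latency)},\qquad
  48 \times 10\,\mathrm{ns} = 480\,\mathrm{ns}
  \;\Rightarrow\;
  \frac{1}{480\,\mathrm{ns}} \approx 2.08\times 10^{6}\ \text{profiles per second}.
\]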