A few modifications

author couturie <couturie@carcariass.(none)>

Fri, 28 Oct 2011 15:45:26 +0000 (17:45 +0200)

committer couturie <couturie@carcariass.(none)>

Fri, 28 Oct 2011 15:45:26 +0000 (17:45 +0200)
diff --git a/dmems12.tex b/dmems12.tex

index ef49def2b131e5bab73b7568073f5df5d307bddc..698f99ec3e8fd9da2e0b68877d3bf0e803f28ec7 100644
--- a/dmems12.tex
+++ b/dmems12.tex
@@ -268,7 +268,7 @@ obvious solution is to parallelize the computations, for example on a GPU.
  Nevertheless, the cost of transferring profile in GPU memory and of taking
  back results would be prohibitive compared to computation time.
  
-We remark that when possible, it is more efficient to pipeline the
+It should be noticed that when possible, it is more efficient to pipeline the
  computation. For example, supposing that 200 profiles of 20 pixels
  could be pushed sequentially in a pipelined unit cadenced at a 100MHz
  (i.e. a pixel enters in the unit each 10ns), all profiles would be
@@ -332,7 +332,7 @@ Furthermore, even if FPGAs are cadenced more slowly than classic processors,
  they can perform pipelines as well as parallel operations. A pipeline
  consists in cutting a process in a sequence of small tasks, taking the same
  execution time. It accepts a new data at each clock top, thus, after a known
-latency, it also provides a result at each clock top. We observe that the
+latency, it also provides a result at each clock top. The drawback is that the
  components of a task are not reusable by another one. Nevertheless, this is
  the most efficient technique on FPGAs. Because of their architecture, it is
  also very easy to process several data concurrently. Finally, the best
@@ -727,7 +727,7 @@ factors.
  
  Consequently, we have determined the maximum value of each variable as
  a function of the scale factors and the profile size involved in the
-algorithm. It gave us the the maximum number of bits necessary to code
+algorithm. It gave us the maximum number of bits necessary to code
  them. We have chosen the scale factors so that any variable (except
  the covariance) fits in 18 bits, which is the maximum input size of
  DSPs. In this way, all multiplications (except one with covariance)
@@ -757,7 +757,7 @@ coding and the other with Matlab using the Simulink HDL coder feature~\cite%
  {HDLCoder}. Although the approaches are completely different we obtained
  quite comparable VHDL codes. Each approach has advantages and drawbacks.
  Roughly speaking, hand coding provides beautiful and much better structured
-code while Simulink HDL coder allows for fast code production. In
+code while Simulink HDL coder allows  fast code production. In
  terms of throughput and latency, simulations show that the two approaches
  yield close results with a slight advantage for hand coding.
  
@@ -784,14 +784,14 @@ in order to "drive" signals to communicate between i.MX and other
  components. It is mainly used to start to flush profiles and to
  retrieve the computed phases in RAM. Unfortunately, the first designs
  could not be placed and routed with ISE on the Spartan6 with a 100MHz
-clock. The main problems were encountered with series of arthmetic
+clock. The main problems were encountered with series of arithmetic
  operations and more especially with RAM outputs used in DSPs. So, we
  needed to decompose some parts of the pipeline, which added few clock
  cycles. Finally, we obtained a bitstream that has been successfully
  tested on the board.
  
  Its latency is of 112 cycles and it computes a new phase every 40
-cycles. For 100 cantilevers, it takes $(112+200\times 40).10=81.12\mu
+cycles. For 100 cantilevers, it takes $(112+200\times 40)\times 10ns =81.12\mu
  $s to compute their deflection. It corresponds to about 12300 images
  per second, which is largely beyond the camera capacities and the
  possibility to extract a new profile from an image every 40
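
Note on the corrected timing expression in the last hunk: the arithmetic can be checked with a minimal sketch. The function name and parameter names below are hypothetical and are not part of dmems12.tex. The values (112 cycles of latency, 200 profiles, 40 cycles per phase, a 10 ns clock period at 100MHz) are taken as they appear in the changed line.

```python
def processing_time_ns(latency_cycles, n_profiles, cycles_per_phase, clock_period_ns):
    """Total time of a pipelined unit: initial latency, then one
    result every `cycles_per_phase` cycles for each profile."""
    total_cycles = latency_cycles + n_profiles * cycles_per_phase
    return total_cycles * clock_period_ns


# Values from the diff: (112 + 200 x 40) x 10 ns
total_ns = processing_time_ns(112, 200, 40, 10)
total_us = total_ns / 1000       # nanoseconds to microseconds
rate_per_s = 1e9 / total_ns      # complete computations per second

print(f"{total_us:.2f} us, about {rate_per_s:.0f} per second")
```

The result, 81.12 microseconds, agrees with the "+" line and with the figure of about 12300 images per second quoted in the following context lines.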