on splines to determine the phase of interference fringes, and thus
the deflection. Computations were performed on a PC with LabVIEW.
In this paper, we propose a new approach based on the least squares
method and its implementation that we developed on an FPGA, using
the pipelining technique. Simulations and real tests showed us that
this implementation is very efficient and should allow us to control
a cantilever array in real time.
\section{Introduction}
Cantilevers are used inside atomic force microscopes (AFM), which provide
high-resolution images of surfaces. Several techniques have been used in the
literature to measure the displacement of cantilevers. For example, it is
possible to determine the deflection accurately with different mechanisms.
In~\cite{CantiPiezzo01}, the authors used a piezoresistor integrated into the
cantilever. Nevertheless, this approach suffers from the complexity of the
microfabrication process needed to implement the sensor in the cantilever.
In~\cite{CantiCapacitive03}, the authors have presented a cantilever mechanism
based on capacitive sensing. This kind of technique also requires instrumenting
the cantilever, which results in a complex fabrication process.
In this paper, our attention is focused on a method based on interferometry to
measure cantilever displacements. In this method, cantilevers are illuminated
The overall process gives accurate results, but all the computations
are performed on a standard computer using LabVIEW. Consequently, the
main drawback of this implementation is that the computer is a
bottleneck. In this paper we propose to use a method based on least
squares and to implement all the computations on an FPGA.
The remainder of the paper is organized as follows. Section~\ref{sec:measure}
describes the measurement process more precisely. Our solution based on the
In order to develop simple, cost-effective and user-friendly cantilever arrays,
the authors of~\cite{AFMCSEM11} have developed a system based on
interferometry. In contrast to other optical systems, which use a laser beam
deflection scheme and are sensitive to the angular displacement of the
cantilever, interferometry is sensitive to the optical path difference induced
by the vertical displacement of the cantilever.
The system built by these authors is based on a Linnick
interferometer~\cite{Sinclair:05}. It is illustrated in
Figure~\ref{fig:AFM}. The beam of a laser diode is first split (by the
splitter) into a reference beam and a sample beam that reaches the cantilever
array. In order to be able to move the cantilever array, it is
mounted on a translation and rotational hexapod stage with five
degrees of freedom. The optical system is also fixed to the stage.
where $x$ is the position of a pixel in its associated segment.
The global method consists of two main sequences. The first one aims
to determine the frequency $f$ of each profile with an algorithm based
on spline interpolation (see section \ref{algo-spline}). It also
computes the coefficient used for unwrapping the phase. The second one
is the acquisition loop, during which images are taken at regular time
cantilever should take less than 25$\mu$s, thus 12.5$\mu$s per phase.\\
In fact, this timing is a very hard constraint. Let us consider a very
small program that initializes twenty million doubles in memory
and then performs 1,000,000 cumulated sums on 20 contiguous values
(experimental profiles have about this size). On an Intel Core 2 Duo
E6650 at 2.33GHz, this program reaches an average of 155Mflops.
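For illustration, a minimal C sketch of such a benchmark could look as follows
(only the sizes come from the text above; the names and the exact access
pattern are our own assumptions):
\begin{verbatim}
#include <stdio.h>
#include <stdlib.h>

#define NB_DOUBLES 20000000L  /* twenty million doubles       */
#define NB_SUMS     1000000L  /* number of cumulated sums     */
#define PROFILE_LEN      20   /* size of a profile segment    */

int main(void) {
  double *data = malloc(NB_DOUBLES * sizeof(double));
  double acc = 0.0;
  if (data == NULL) return EXIT_FAILURE;
  /* initialize the whole array */
  for (long i = 0; i < NB_DOUBLES; i++)
    data[i] = (double)i;
  /* repeatedly sum 20 contiguous values, moving through memory */
  for (long s = 0; s < NB_SUMS; s++) {
    long start = (s * PROFILE_LEN) % (NB_DOUBLES - PROFILE_LEN);
    for (int j = 0; j < PROFILE_LEN; j++)
      acc += data[start + j];
  }
  printf("%f\n", acc);  /* prevent dead-code elimination */
  free(data);
  return EXIT_SUCCESS;
}
\end{verbatim}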
configuration is not obvious at all, it can be done via a framework like
ISE~\cite{ISE}. Such software can synthesize a design written in a hardware
description language (HDL), map it onto CLBs, place/route them for a specific
FPGA, and finally produce a bitstream that is used to configure the FPGA. Thus,
from the developer's point of view, the main difficulty is to translate an
algorithm into HDL code, taking into account FPGA resources and constraints
like clock signals and I/O values that drive the FPGA.
Indeed, HDL programming is very different from classic languages like
C. A program can be seen as a state machine, manipulating signals that
evolve from state to state. Moreover, HDL instructions can execute
concurrently. Basic logic operations are used to aggregate signals to
produce new states, which are assigned to other signals. States are mainly
expressed as arrays of bits. Fortunately, libraries propose some
higher level representations like signed integers, and arithmetic
operations.
\subsection{The board}
The board we use is designed by the Armadeus company, under the name
SP Vision. It consists of a development board hosting an i.MX27 ARM
processor (from Freescale). The board includes all classical
connectors: USB, Ethernet, etc. A Flash memory contains a Linux kernel
\in [0,M[$.
At first, only $M$ values of $I$ are known, for $x = 0, 1,
\ldots, M-1$. A normalization allows scaling the known intensities into
$[-1,1]$. We compute splines that best fit these normalized
intensities. Splines are used to interpolate $N = k\times M$ points
(typically $k=4$ is sufficient), within $[0,M[$. Let us call $x^s$ the
coordinates of these $N$ points and $I^s$ their intensities.
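As an illustration, the normalization step could be sketched in C as follows
(the min/max scaling and all names are our assumptions; the spline fitting
itself is omitted):
\begin{verbatim}
#include <stddef.h>

/* Scale M raw intensities into [-1,1] (sketch; assumes a
   min/max normalization and that max > min). */
void normalize(const double *I, double *In, size_t M) {
  double min = I[0], max = I[0];
  for (size_t x = 1; x < M; x++) {
    if (I[x] < min) min = I[x];
    if (I[x] > max) max = I[x];
  }
  for (size_t x = 0; x < M; x++)
    In[x] = 2.0 * (I[x] - min) / (max - min) - 1.0;
}

/* Coordinates x^s of the N = k*M interpolation points in [0,M[ */
void grid(double *xs, size_t M, size_t k) {
  for (size_t i = 0; i < k * M; i++)
    xs[i] = (double)i / (double)k;
}
\end{verbatim}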
ending with a convergence criterion, it is obvious that it is not
particularly adapted to our design goals.
Fortunately, it is quite simple to reduce the number of parameters to
only $\theta$. Let $x^p$ be the coordinates of the pixels in a segment of
size $M$. Thus, $x^p = 0, 1, \ldots, M-1$. Let $I(x^p)$ be their
intensity. Firstly, we ``remove'' the slope by computing:
We compared the two algorithms on the basis of three criteria:
\begin{itemize}
\item precision of results on a cosine profile, distorted with noise,
\item number of operations,
\item complexity of implementing an FPGA version.
\end{itemize}
30 & 17.06 & 2.6 & 13.94 & 2.45 \\ \hline
\end{tabular}
\caption{Error (in \%) for cosine profiles, with noise.}
\label{tab:algo_prec}
\end{center}
\end{table}
These results show that the two algorithms are very close, with a
slight advantage for LSQ. Furthermore, both behave very well against
noise. Assuming the experimental ratio of 50 (see above), an error of
1 percent on the phase corresponds to an error of 0.5nm on the lever
deflection, which is very close to the best precision.
profile with $N=10$ that leads to the biggest error. It is a bit
distorted, with spikes and straight/rounded portions, and relatively
close to most of those that come from experiments. Figure \ref{fig:noise60}
shows a sample of the worst profile for $N=30$. It is completely distorted,
largely beyond the worst experimental ones.
\begin{figure}[ht]
several DSPs must be combined, increasing the latency.
Nevertheless, the hardest constraint does not come from the FPGA characteristics
but from the algorithms. Their VHDL implementation will be efficient only if they
can be fully (or nearly) pipelined. As a result, the choice is quickly made: only a
small part of SPL can be. Indeed, the computation of spline coefficients
implies solving a tridiagonal system $A.m = b$. Values in $A$ and $b$ can be
computed from the incoming pixel intensities, but afterwards the back-solve starts
with the last values, which breaks the pipeline. Moreover, SPL relies on
interpolating far more points than the profile size. Thus, the end of SPL works on a
larger amount of data than the beginning, which also breaks the pipeline.
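This limitation can be seen in the back-substitution step of the classical
Thomas algorithm for tridiagonal systems (a generic sketch, not the authors'
code): the loop runs from the last unknown down to the first, so no result
can be produced before the whole profile has been processed by the forward
sweep.
\begin{verbatim}
/* Back-substitution of the Thomas algorithm (sketch).
   cp[] and dp[] hold the coefficients produced by the
   forward sweep; m[] receives the spline coefficients. */
void back_solve(const double *cp, const double *dp,
                double *m, int n) {
  m[n - 1] = dp[n - 1];
  /* runs backwards: m[i] depends on m[i+1], so the first
     result is only available once all inputs have arrived */
  for (int i = n - 2; i >= 0; i--)
    m[i] = dp[i] - cp[i] * m[i + 1];
}
\end{verbatim}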
double precision values by a power of two, keeping the integer
part. For example, all values stored in lut$_s$, lut$_c$, $\ldots$ are
scaled by 1024. Since LSQ also computes averages, variances, ... to
remove the slope, the result of the implied Euclidean divisions may be
relatively wrong. To avoid that, we also scale the pixel intensities
by a power of two. Furthermore, assuming $nb_s$ is fixed, these
divisions have a known denominator. Thus, they can be replaced by
their multiplication/shift counterpart. Finally, all other
multiplications or divisions by a power of two have been replaced by
left or right bit shifts. Consequently, the code only contains
additions, subtractions and multiplications of signed integers, which
is perfectly adapted to FPGAs.
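As an illustration, the division by a known denominator such as $nb_s$ can be
sketched in C as follows (the names and the $2^{16}$ scaling factor are our
assumptions, not the authors' exact constants):
\begin{verbatim}
#include <stdint.h>

#define NB_S   20   /* known denominator (assumed profile size) */
/* rounded reciprocal of NB_S, scaled by 2^16 */
#define RECIP  ((int32_t)(((1 << 16) + NB_S / 2) / NB_S))

/* Average of NB_S scaled pixel intensities, computed without
   any division: the quotient sum/NB_S is approximated by one
   multiplication followed by a right shift. */
int32_t average(const int32_t *I) {
  int64_t sum = 0;
  for (int i = 0; i < NB_S; i++)
    sum += I[i];                        /* additions only */
  return (int32_t)((sum * RECIP) >> 16);
}
\end{verbatim}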
As said above, hardware constraints have a great influence on the VHDL
to communicate between the i.MX and the other components. It is mainly used
to start the flush of profiles and to retrieve the computed phases in RAM.
Unfortunately, the first designs could not be placed and routed with ISE
on the Spartan6 with a 100MHz clock. The main problems came from
routing values from RAMs to DSPs and obtaining a result under 10ns. As a
consequence, we needed to decompose some parts of the pipeline, which adds
profile image very quickly. So far, we have performed simulations
and real tests on a Spartan6 FPGA.
In future work, we plan to study the effects of quantization. Then we want to
couple our algorithm with a high-speed camera and to control the whole AFM system.
\bibliographystyle{plain}
\bibliography{biblio}