X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/dmems12.git/blobdiff_plain/c4f311710c2e02f619fac3efdb60cec44b770cf3..77fc759e3cccd43e2d9f6ee355069a0e80e5221f:/dmems12.tex?ds=sidebyside

diff --git a/dmems12.tex b/dmems12.tex
index 61bbdba..d8d592d 100644
--- a/dmems12.tex
+++ b/dmems12.tex
@@ -1,5 +1,447 @@
-\documentclass{article}
+
+\documentclass[10pt, conference, compsocconf]{IEEEtran}
+%\usepackage{latex8}
+%\usepackage{times}
+\usepackage[utf8]{inputenc}
+%\usepackage[cyr]{aeguill}
+%\usepackage{pstricks,pst-node,pst-text,pst-3d}
+%\usepackage{babel}
+\usepackage{amsmath}
+\usepackage{url}
+\usepackage{graphicx}
+\usepackage{thumbpdf}
+\usepackage{color}
+\usepackage{moreverb}
+\usepackage{commath}
+\usepackage{subfigure}
+%\input{psfig.sty}
+\usepackage{fullpage}
+\usepackage{fancybox}
+
+\usepackage[ruled,lined,linesnumbered]{algorithm2e}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.
+\newcommand{\noun}[1]{\textsc{#1}}
+
+\newcommand{\tab}{\ \ \ }
+
+
 
 \begin{document}
-blabla
+
+
+%% \author{\IEEEauthorblockN{Authors Name/s per 1st Affiliation (Author)}
+%% \IEEEauthorblockA{line 1 (of Affiliation): dept. name of organization\\
+%% line 2: name of organization, acronyms acceptable\\
+%% line 3: City, Country\\
+%% line 4: Email: name@xyz.com}
+%% \and
+%% \IEEEauthorblockN{Authors Name/s per 2nd Affiliation (Author)}
+%% \IEEEauthorblockA{line 1 (of Affiliation): dept. name of organization\\
+%% line 2: name of organization, acronyms acceptable\\
+%% line 3: City, Country\\
+%% line 4: Email: name@xyz.com}
+%% }
+
+
+
+\title{Using FPGAs for high speed and real time cantilever deflection estimation}
+\author{\IEEEauthorblockN{RaphaÃ«l Couturier\IEEEauthorrefmark{1}, StÃ©phane Domas\IEEEauthorrefmark{1}, GwenhaÃ«l Goavec-Merou\IEEEauthorrefmark{2} and Michel Lenczner\IEEEauthorrefmark{2}}
+\IEEEauthorblockA{\IEEEauthorrefmark{1}FEMTO-ST, DISC, University of Franche-Comte, Belfort, France\\
+\{raphael.couturier,stephane.domas\}@univ-fcomte.fr}
+\IEEEauthorblockA{\IEEEauthorrefmark{2}FEMTO-ST, Time-Frequency, University of Franche-Comte, BesanÃ§on, France\\
+\{michel.lenczner@utbm.fr,gwenhael.goavec@trabucayre.com}
+}
+
+
+
+
+
+
+\maketitle
+
+\thispagestyle{empty}
+
+\begin{abstract}
+
+  
+
+{\it keywords}: FPGA, cantilever, interferometry.
+\end{abstract}
+
+\section{Introduction}
+
+Cantilevers  are  used  inside  atomic  force  microscope (AFM) which  provides  high
+resolution images of  surfaces.  Several technics have been  used to measure the
+displacement  of cantilevers  in litterature.   For example,  it is  possible to
+determine  accurately  the  deflection  with different  mechanisms. 
+In~\cite{CantiPiezzo01},   authors  used   piezoresistor  integrated   into  the
+cantilever.   Nevertheless this  approach  suffers from  the  complexity of  the
+microfabrication  process needed  to  implement the  sensor  in the  cantilever.
+In~\cite{CantiCapacitive03},  authors  have  presented an  cantilever  mechanism
+based on  capacitive sensing. This kind  of technic also  involves to instrument
+the cantiliver which result in a complex fabrication process.
+
+In this  paper our attention is focused  on a method based  on interferometry to
+measure cantilevers' displacements.  In  this method cantilevers are illuminated
+by  an optic  source. The  interferometry produces  fringes on  each cantilevers
+which enables to  compute the cantilever displacement.  In  order to analyze the
+fringes a  high speed camera  is used. Images  need to be processed  quickly and
+then  a estimation  method is  required to  determine the  displacement  of each
+cantilever.  In~\cite{AFMCSEM11},  the authors have  used an algorithm  based on
+spline to estimate the cantilevers' positions.
+
+   The overall  process gives
+accurate results  but all the computation  are performed on  a standard computer
+using labview.  Consequently,  the main drawback of this  implementation is that
+the computer is a bootleneck in the overall process. In this paper we propose to
+use a  method based on least  square and to  implement all the computation  on a
+FGPA.
+
+The remainder  of the paper  is organized as  follows. Section~\ref{sec:measure}
+describes  more precisely  the measurement  process. Our  solution based  on the
+least  square   method  and   the  implementation  on   FPGA  is   presented  in
+Section~\ref{sec:solus}.       Experimentations      are       described      in
+Section~\ref{sec:results}.  Finally  a  conclusion  and  some  perspectives  are
+presented.
+
+
+
+%% quelques ref commentÃ©es sur les calculs basÃ©s sur l'interfÃ©romÃ©trie
+
+\section{Measurement principles}
+\label{sec:measure}
+
+
+
+
+
+
+
+
+\subsection{Architecture}
+\label{sec:archi}
+%% description de l'architecture gÃ©nÃ©rale de l'acquisition d'images
+%% avec au milieu une unitÃ© de traitement dont on ne prÃ©cise pas ce
+%% qu'elle est.
+
+In order to develop simple,  cost effective and user-friendly cantilever arrays,
+authors   of    ~\cite{AFMCSEM11}   have   developped   a    system   based   of
+interferometry. In opposition to other optical based systems, using a laser beam
+deflection scheme and  sentitive to the angular displacement  of the cantilever,
+interferometry  is sensitive  to  the  optical path  difference  induced by  the
+vertical displacement of the cantilever.
+
+The system build  by authors of~\cite{AFMCSEM11} has been  developped based on a
+Linnick     interferomter~\cite{Sinclair:05}.    It     is     illustrated    in
+Figure~\ref{fig:AFM}.  A  laser diode  is first split  (by the splitter)  into a
+reference beam and a sample beam  that reachs the cantilever array.  In order to
+be  able to  move  the cantilever  array, it  is  mounted on  a translation  and
+rotational hexapod  stage with  five degrees of  freedom. The optical  system is
+also fixed to the stage.  Thus,  the cantilever array is centered in the optical
+system which  can be adjusted accurately.   The beam illuminates the  array by a
+microscope objective  and the  light reflects on  the cantilevers.  Likewise the
+reference beam  reflects on a  movable mirror.  A  CMOS camera chip  records the
+reference and  sample beams which  are recombined in  the beam splitter  and the
+interferogram.   At the  beginning of  each  experiment, the  movable mirror  is
+fitted  manually in  order to  align the  interferometric  fringes approximately
+parallel  to the cantilevers.   When cantilevers  move due  to the  surface, the
+bending of  cantilevers produce  movements in the  fringes that can  be detected
+with    the    CMOS    camera.     Finally    the    fringes    need    to    be
+analyzed. In~\cite{AFMCSEM11}, the authors used a LabView program to compute the
+cantilevers' movements from the fringes.
+
+\begin{figure}    
+\begin{center}
+\includegraphics[width=\columnwidth]{AFM}
+\end{center}
+\caption{schema of the AFM}
+\label{fig:AFM}   
+\end{figure}
+
+
+%% image tirÃ©e des expÃ©riences.
+
+\subsection{Cantilever deflection estimation}
+\label{sec:deflest}
+
+As shown on image \ref{img:img-xp}, each cantilever is covered by
+interferometric fringes. The fringes will distort when cantilevers are
+deflected. Estimating the deflection is done by computing this
+distortion. For that, (ref A. Meister + M Favre) proposed a method
+based on computing the phase of the fringes, at the base of each
+cantilever, near the tip, and on the base of the array. They assume
+that a linear relation binds these phases, which can be use to
+"unwrap" the phase at the tip and to determine the deflection.\\
+
+More precisely, segment of pixels are extracted from images taken by a
+high-speed camera. These segments are large enough to cover several
+interferometric fringes and are placed at the base and near the tip of
+the cantilevers. They are called base profile and tip profile in the
+following. Furthermore, a reference profile is taken on the base of
+the cantilever array.
+
+The pixels intensity $I$ (in gray level) of each profile is modelized by :
+
+\begin{equation}
+\label{equ:profile}
+I(x) = ax+b+A.cos(2\pi f.x + \theta)
+\end{equation}
+
+where $x$ is the position of a pixel in its associated segment.
+
+The global method consists in two main sequences. The first one aims
+to determin the frequency $f$ of each profile with an algorithm based
+on spline interpolation (see section \ref{algo-spline}). It also
+computes the coefficient used for unwrapping the phase. The second one
+is the acquisition loop, while which images are taken at regular time
+steps. For each image, the phase $\theta$ of all profiles is computed
+to obtain, after unwrapping, the deflection of cantilevers.
+
+\subsection{Design goals}
+\label{sec:goals}
+
+If we put aside some hardware issues like the speed of the link
+between the camera and the computation unit, the time to deserialize
+pixels and to store them in memory, ... the phase computation is
+obviously the bottle-neck of the whole process. For example, if we
+consider the camera actually in use, an exposition time of 2.5ms for
+$1024\times 1204$ pixels seems the minimum that can be reached. For a
+$10\times 10$ cantilever array, if we neglect the time to extract
+pixels, it implies that computing the deflection of a single
+cantilever should take less than 25$\mu$s, thus 12.5$\mu$s by phase.\\
+
+In fact, this timing is a very hard constraint. Let consider a very
+small programm that initializes twenty million of doubles in memory
+and then does 1000000 cumulated sums on 20 contiguous values
+(experimental profiles have about this size). On an intel Core 2 Duo
+E6650 at 2.33GHz, this program reaches an average of 155Mflops. It
+implies that the phase computation algorithm should not take more than
+$240\times 12.5 = 1937$ floating operations. For integers, it gives
+$3000$ operations.
+
+%% to be continued ...
+
+%% ï¿½ faire : timing de l'algo spline en C avec atan et tout le bordel.
+
+
+
+
+\section{Proposed solution}
+\label{sec:solus}
+
+
+\subsection{FPGA constraints}
+
+A field-programmable gate  array (FPGA) is an integrated  circuit designed to be
+configured by  the customer.  A hardware  description language (HDL)  is used to
+configure a  FPGA. FGPAs are  composed of programmable logic  components, called
+logic blocks.  These blocks can be  configured to perform simple (AND, XOR, ...)
+or  complex  combinational  functions.    Logic  blocks  are  interconnected  by
+reconfigurable  links. Modern  FPGAs  contains memory  elements and  multipliers
+which enables to simplify the design and increase the speed. As the most complex
+operation operation on FGPAs is the  multiplier, design of FGPAs should not used
+complex operations. For example, a divider  is not an available operation and it
+should be programmed using simple components.
+
+FGPAs programming  is very different  from classic processors  programming. When
+logic block are programmed and linked  to performed an operation, they cannot be
+reused anymore.  FPGA  are cadenced slowly than classic  processors but they can
+performed pipelined as  well as pipelined operations. A  pipeline provides a way
+manipulate data quickly  since at each clock top to handle  a new data. However,
+using  a  pipeline  consomes more  logics  and  components  since they  are  not
+reusable,  nevertheless it  is probably  the most  efficient technique  on FPGA.
+Parallel  operations   can  be  used   in  order  to  manipulate   several  data
+simultaneously. When  it is  possible, using  a pipeline is  a good  solution to
+manipulate  new  data  at  each  clock  top  and  using  parallelism  to  handle
+simultaneously several data streams.
+
+%% contraintes imposÃ©es par le FPGA : algo pipeline/parallele, pas d'op math complexe, ...
+
+
+\subsection{Considered algorithms}
+
+Two solutions have been studied to achieve phase computation. The
+original one, proposed by A. Meister and M. Favre, is based on
+interpolation by splines. It allows to compute frequency and
+phase. The second one, detailed in this article, is based on a
+classical least square method but suppose that frequency is already
+known.
+
+\subsubsection{Spline algorithm}
+\label{sec:algo-spline}
+Let consider a profile $P$, that is a segment of $M$ pixels with an
+intensity in gray levels. Let call $I(x)$ the intensity of profile in $x
+\in [0,M[$. 
+
+At first, only $M$ values of $I$ are known, for $x = 0, 1,
+\ldots,M-1$. A normalisation allows to scale known intensities into
+$[-1,1]$. We compute splines that fit at best these normalised
+intensities. Splines are used to interpolate $N = k\times M$ points
+(typically $k=3$ is sufficient), within $[0,M[$. Let call $x^s$ the
+coordinates of these $N$ points and $I^s$ their intensities.
+
+In order to have the frequency, the mean line $a.x+b$ (see equation \ref{equ:profile}) of $I^s$ is
+computed. Finding intersections of $I^s$ and this line allow to obtain
+the period thus the frequency.
+
+The phase is computed via the equation :
+\begin{equation}
+\theta = atan \left[ \frac{\sum_{i=0}^{N-1} sin(2\pi f x^s_i) \times I^s(x^s_i)}{\sum_{i=0}^{N-1} cos(2\pi f x^s_i) \times I^s(x^s_i)} \right]
+\end{equation}
+
+Two things can be noticed. Firstly, the frequency could also be
+obtained using the derivates of spline equations, which only implies
+to solve quadratic equations. Secondly, frequency of each profile is
+computed a single time, before the acquisition loop. Thus, $sin(2\pi f
+x^s_i)$ and $cos(2\pi f x^s_i)$ could also be computed before the loop, which leads to a
+much faster computation of $\theta$.
+
+\subsubsection{Least square algorithm}
+
+Assuming that we compute the phase during the acquisition loop,
+equation \ref{equ:profile} has only 4 parameters :$a, b, A$, and
+$\theta$, $f$ and $x$ being already known. Since $I$ is non-linear, a
+least square method based an Gauss-newton algorithm must be used to
+determine these four parameters. Since it is an iterative process
+ending with a convergence criterion, it is obvious that it is not
+particularly adapted to our design goals.
+
+Fortunatly, it is quite simple to reduce the number of parameters to
+only $\theta$. Let $x^p$ be the coordinates of pixels in a segment of
+size $M$. Thus, $x^p = 0, 1, \ldots, M-1$. Let $I(x^p)$ be their
+intensity. Firstly, we "remove" the slope by computing :
+
+\[I^{corr}(x^p) = I(x^p) - a.x^p - b\]
+
+Since linear equation coefficients are searched, a classical least
+square method can be used to determine $a$ and $b$ :
+
+\[a = \frac{covar(x^p,I(x^p))}{var(x^p)} \]
+
+Assuming an overlined symbol means an average, then :
+
+\[b = \overline{I(x^p)} - a.\overline{{x^p}}\]
+
+Let $A$ be the amplitude of $I^{corr}$, i.e. 
+
+\[A = \frac{max(I^{corr}) - min(I^{corr})}{2}\]
+
+Then, the least square method to find $\theta$ is reduced to search the minimum of :
+
+\[\sum_{i=0}^{M-1} \left[ cos(2\pi f.i + \theta) - \frac{I^{corr}(i)}{A} \right]^2\]
+
+It is equivalent to derivate this expression and to solve the following equation :
+
+\begin{eqnarray*}
+2\left[ cos\theta \sum_{i=0}^{M-1} I^{corr}(i).sin(2\pi f.i) + sin\theta \sum_{i=0}^{M-1} I^{corr}(i).cos(2\pi f.i)\right] \\
+- A\left[ cos2\theta \sum_{i=0}^{M-1} sin(4\pi f.i) + sin2\theta \sum_{i=0}^{M-1} cos(4\pi f.i)\right]   = 0
+\end{eqnarray*}
+
+Several points can be noticed :
+\begin{itemize}
+\item As in the spline method, some parts of this equation can be
+  computed before the acquisition loop. It is the case of sums that do
+  not depend on $\theta$ :
+
+\[ \sum_{i=0}^{M-1} sin(4\pi f.i), \sum_{i=0}^{M-1} cos(4\pi f.i) \] 
+
+\item Lookup tables for $sin(2\pi f.i)$ and $cos(2\pi f.i)$ can also be
+computed.
+
+\item The simplest method to find the good $\theta$ is to discretize
+  $[-\pi,\pi]$ in $N$ steps, and to search which step leads to the
+  result closest to zero. By the way, three other lookup tables can
+  also be computed before the loop :
+
+\[ sin \theta, cos \theta, \left[ cos 2\theta \sum_{i=0}^{M-1} sin(4\pi f.i) + sin 2\theta \sum_{i=0}^{M-1} cos(4\pi f.i)\right] \]
+
+\item This search can be very fast using a dichotomous process in $log_2(N)$ 
+
+\end{itemize}
+
+Finally, the whole summarizes in an algorithm (called LSQ in the following) in two parts, one before and one during the acquisition loop :
+\begin{algorithm}[h]
+\caption{LSQ algorithm - before acquisition loop.}
+\label{alg:lsq-before}
+
+   $M \leftarrow $ number of pixels of the profile\\
+   I[] $\leftarrow $ intensities of pixels\\
+   $f \leftarrow $ frequency of the profile\\
+   $s4i \leftarrow \sum_{i=0}^{M-1} sin(4\pi f.i)$\\
+   $c4i \leftarrow \sum_{i=0}^{M-1} cos(4\pi f.i)$\\
+   $nb_s \leftarrow $ number of discretization steps of $[-\pi,\pi]$\\
+
+   \For{$i=0$ to $nb_s $}{
+     $\theta  \leftarrow -\pi + 2\pi\times \frac{i}{nb_s}$\\
+     lut\_sin[$i$] $\leftarrow sin \theta$\\
+     lut\_cos[$i$] $\leftarrow cos \theta$\\
+     lut\_A[$i$] $\leftarrow cos 2 \theta \times s4i + sin 2 \theta \times c4i$\\
+     lut\_sinfi[$i$] $\leftarrow sin (2\pi f.i)$\\
+     lut\_cosfi[$i$] $\leftarrow cos (2\pi f.i)$\\
+   }
+\end{algorithm}
+
+\begin{algorithm}[h]
+\caption{LSQ algorithm - during acquisition loop.}
+\label{alg:lsq-during}
+
+   $\bar{x} \leftarrow \frac{M-1}{2}$\\
+   $\bar{y} \leftarrow 0$, $x_{var} \leftarrow 0$, $xy_{covar} \leftarrow 0$\\
+   \For{$i=0$ to $M-1$}{
+     $\bar{y} \leftarrow \bar{y} + $ I[$i$]\\
+     $x_{var} \leftarrow x_{var} + (i-\bar{x})^2$\\
+   }
+   $\bar{y} \leftarrow \frac{\bar{y}}{M}$\\
+   \For{$i=0$ to $M-1$}{
+     $xy_{covar} \leftarrow xy_{covar} + (i-\bar{x}) \times (I[i]-\bar{y})$\\
+   }
+   $slope \leftarrow \frac{xy_{covar}}{x_{var}}$\\
+   $start \leftarrow y_{moy} - slope\times \bar{x}$\\
+   \For{$i=0$ to $M-1$}{
+     $I[i] \leftarrow I[i] - start - slope\times i$\tcc*[f]{slope removal}\\
+   }
+   
+   $I_{max} \leftarrow max_i(I[i])$, $I_{min} \leftarrow min_i(I[i])$\\
+   $amp \leftarrow \frac{I_{max}-I_{min}}{2}$\\
+
+   $Is \leftarrow 0$, $Ic \leftarrow 0$\\
+   \For{$i=0$ to $M-1$}{
+     $Is \leftarrow Is + I[i]\times $ lut\_sinfi[$i$]\\
+     $Ic \leftarrow Ic + I[i]\times $ lut\_cosfi[$i$]\\
+   }
+
+   $\theta \leftarrow -\pi$\\
+   $val_1 \leftarrow 2\times \left[ Is.\cos(\theta) + Ic.\sin(\theta) \right] - amp\times \left[ c4i.\sin(2\theta) + s4i.\cos(2\theta) \right]$\\
+   \For{$i=1-n_s$ to $n_s$}{
+     $\theta \leftarrow \frac{i.\pi}{n_s}$\\
+     $val_2 \leftarrow 2\times \left[ Is.\cos(\theta) + Ic.\sin(\theta) \right] - amp\times \left[ c4i.\sin(2\theta) + s4i.\cos(2\theta) \right]$\\
+
+     \lIf{$val_1 < 0$ et $val_2 >= 0$}{
+       $\theta_s \leftarrow \theta - \left[ \frac{val_2}{val_2-val_1}\times \frac{\pi}{n_s} \right]$\\
+     }
+     $val_1 \leftarrow val_2$\\
+   }
+
+\end{algorithm}
+
+
+\subsubsection{Comparison}
+
+\subsection{VHDL design paradigms}
+
+\subsection{VHDL implementation}
+
+\section{Experimental results}
+\label{sec:results}
+
+
+
+
+\section{Conclusion and perspectives}
+
+
+\bibliographystyle{plain}
+\bibliography{biblio}
+
 \end{document}