X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/dmems12.git/blobdiff_plain/c4f311710c2e02f619fac3efdb60cec44b770cf3..77fc759e3cccd43e2d9f6ee355069a0e80e5221f:/dmems12.tex?ds=sidebyside diff --git a/dmems12.tex b/dmems12.tex index 61bbdba..d8d592d 100644 --- a/dmems12.tex +++ b/dmems12.tex @@ -1,5 +1,447 @@ -\documentclass{article} + +\documentclass[10pt, conference, compsocconf]{IEEEtran} +%\usepackage{latex8} +%\usepackage{times} +\usepackage[utf8]{inputenc} +%\usepackage[cyr]{aeguill} +%\usepackage{pstricks,pst-node,pst-text,pst-3d} +%\usepackage{babel} +\usepackage{amsmath} +\usepackage{url} +\usepackage{graphicx} +\usepackage{thumbpdf} +\usepackage{color} +\usepackage{moreverb} +\usepackage{commath} +\usepackage{subfigure} +%\input{psfig.sty} +\usepackage{fullpage} +\usepackage{fancybox} + +\usepackage[ruled,lined,linesnumbered]{algorithm2e} + +%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands. +\newcommand{\noun}[1]{\textsc{#1}} + +\newcommand{\tab}{\ \ \ } + + \begin{document} -blabla + + +%% \author{\IEEEauthorblockN{Authors Name/s per 1st Affiliation (Author)} +%% \IEEEauthorblockA{line 1 (of Affiliation): dept. name of organization\\ +%% line 2: name of organization, acronyms acceptable\\ +%% line 3: City, Country\\ +%% line 4: Email: name@xyz.com} +%% \and +%% \IEEEauthorblockN{Authors Name/s per 2nd Affiliation (Author)} +%% \IEEEauthorblockA{line 1 (of Affiliation): dept. name of organization\\ +%% line 2: name of organization, acronyms acceptable\\ +%% line 3: City, Country\\ +%% line 4: Email: name@xyz.com} +%% } + + + +\title{Using FPGAs for high speed and real time cantilever deflection estimation} +\author{\IEEEauthorblockN{Raphaël Couturier\IEEEauthorrefmark{1}, Stéphane Domas\IEEEauthorrefmark{1}, Gwenhaël Goavec-Merou\IEEEauthorrefmark{2} and Michel Lenczner\IEEEauthorrefmark{2}} +\IEEEauthorblockA{\IEEEauthorrefmark{1}FEMTO-ST, DISC, University of Franche-Comte, Belfort, France\\ +\{raphael.couturier,stephane.domas\}@univ-fcomte.fr} +\IEEEauthorblockA{\IEEEauthorrefmark{2}FEMTO-ST, Time-Frequency, University of Franche-Comte, Besançon, France\\ +\{michel.lenczner@utbm.fr,gwenhael.goavec@trabucayre.com} +} + + + + + + +\maketitle + +\thispagestyle{empty} + +\begin{abstract} + + + +{\it keywords}: FPGA, cantilever, interferometry. +\end{abstract} + +\section{Introduction} + +Cantilevers are used inside atomic force microscope (AFM) which provides high +resolution images of surfaces. Several technics have been used to measure the +displacement of cantilevers in litterature. For example, it is possible to +determine accurately the deflection with different mechanisms. +In~\cite{CantiPiezzo01}, authors used piezoresistor integrated into the +cantilever. Nevertheless this approach suffers from the complexity of the +microfabrication process needed to implement the sensor in the cantilever. +In~\cite{CantiCapacitive03}, authors have presented an cantilever mechanism +based on capacitive sensing. This kind of technic also involves to instrument +the cantiliver which result in a complex fabrication process. + +In this paper our attention is focused on a method based on interferometry to +measure cantilevers' displacements. In this method cantilevers are illuminated +by an optic source. The interferometry produces fringes on each cantilevers +which enables to compute the cantilever displacement. In order to analyze the +fringes a high speed camera is used. Images need to be processed quickly and +then a estimation method is required to determine the displacement of each +cantilever. In~\cite{AFMCSEM11}, the authors have used an algorithm based on +spline to estimate the cantilevers' positions. + + The overall process gives +accurate results but all the computation are performed on a standard computer +using labview. Consequently, the main drawback of this implementation is that +the computer is a bootleneck in the overall process. In this paper we propose to +use a method based on least square and to implement all the computation on a +FGPA. + +The remainder of the paper is organized as follows. Section~\ref{sec:measure} +describes more precisely the measurement process. Our solution based on the +least square method and the implementation on FPGA is presented in +Section~\ref{sec:solus}. Experimentations are described in +Section~\ref{sec:results}. Finally a conclusion and some perspectives are +presented. + + + +%% quelques ref commentées sur les calculs basés sur l'interférométrie + +\section{Measurement principles} +\label{sec:measure} + + + + + + + + +\subsection{Architecture} +\label{sec:archi} +%% description de l'architecture générale de l'acquisition d'images +%% avec au milieu une unité de traitement dont on ne précise pas ce +%% qu'elle est. + +In order to develop simple, cost effective and user-friendly cantilever arrays, +authors of ~\cite{AFMCSEM11} have developped a system based of +interferometry. In opposition to other optical based systems, using a laser beam +deflection scheme and sentitive to the angular displacement of the cantilever, +interferometry is sensitive to the optical path difference induced by the +vertical displacement of the cantilever. + +The system build by authors of~\cite{AFMCSEM11} has been developped based on a +Linnick interferomter~\cite{Sinclair:05}. It is illustrated in +Figure~\ref{fig:AFM}. A laser diode is first split (by the splitter) into a +reference beam and a sample beam that reachs the cantilever array. In order to +be able to move the cantilever array, it is mounted on a translation and +rotational hexapod stage with five degrees of freedom. The optical system is +also fixed to the stage. Thus, the cantilever array is centered in the optical +system which can be adjusted accurately. The beam illuminates the array by a +microscope objective and the light reflects on the cantilevers. Likewise the +reference beam reflects on a movable mirror. A CMOS camera chip records the +reference and sample beams which are recombined in the beam splitter and the +interferogram. At the beginning of each experiment, the movable mirror is +fitted manually in order to align the interferometric fringes approximately +parallel to the cantilevers. When cantilevers move due to the surface, the +bending of cantilevers produce movements in the fringes that can be detected +with the CMOS camera. Finally the fringes need to be +analyzed. In~\cite{AFMCSEM11}, the authors used a LabView program to compute the +cantilevers' movements from the fringes. + +\begin{figure} +\begin{center} +\includegraphics[width=\columnwidth]{AFM} +\end{center} +\caption{schema of the AFM} +\label{fig:AFM} +\end{figure} + + +%% image tirée des expériences. + +\subsection{Cantilever deflection estimation} +\label{sec:deflest} + +As shown on image \ref{img:img-xp}, each cantilever is covered by +interferometric fringes. The fringes will distort when cantilevers are +deflected. Estimating the deflection is done by computing this +distortion. For that, (ref A. Meister + M Favre) proposed a method +based on computing the phase of the fringes, at the base of each +cantilever, near the tip, and on the base of the array. They assume +that a linear relation binds these phases, which can be use to +"unwrap" the phase at the tip and to determine the deflection.\\ + +More precisely, segment of pixels are extracted from images taken by a +high-speed camera. These segments are large enough to cover several +interferometric fringes and are placed at the base and near the tip of +the cantilevers. They are called base profile and tip profile in the +following. Furthermore, a reference profile is taken on the base of +the cantilever array. + +The pixels intensity $I$ (in gray level) of each profile is modelized by : + +\begin{equation} +\label{equ:profile} +I(x) = ax+b+A.cos(2\pi f.x + \theta) +\end{equation} + +where $x$ is the position of a pixel in its associated segment. + +The global method consists in two main sequences. The first one aims +to determin the frequency $f$ of each profile with an algorithm based +on spline interpolation (see section \ref{algo-spline}). It also +computes the coefficient used for unwrapping the phase. The second one +is the acquisition loop, while which images are taken at regular time +steps. For each image, the phase $\theta$ of all profiles is computed +to obtain, after unwrapping, the deflection of cantilevers. + +\subsection{Design goals} +\label{sec:goals} + +If we put aside some hardware issues like the speed of the link +between the camera and the computation unit, the time to deserialize +pixels and to store them in memory, ... the phase computation is +obviously the bottle-neck of the whole process. For example, if we +consider the camera actually in use, an exposition time of 2.5ms for +$1024\times 1204$ pixels seems the minimum that can be reached. For a +$10\times 10$ cantilever array, if we neglect the time to extract +pixels, it implies that computing the deflection of a single +cantilever should take less than 25$\mu$s, thus 12.5$\mu$s by phase.\\ + +In fact, this timing is a very hard constraint. Let consider a very +small programm that initializes twenty million of doubles in memory +and then does 1000000 cumulated sums on 20 contiguous values +(experimental profiles have about this size). On an intel Core 2 Duo +E6650 at 2.33GHz, this program reaches an average of 155Mflops. It +implies that the phase computation algorithm should not take more than +$240\times 12.5 = 1937$ floating operations. For integers, it gives +$3000$ operations. + +%% to be continued ... + +%% � faire : timing de l'algo spline en C avec atan et tout le bordel. + + + + +\section{Proposed solution} +\label{sec:solus} + + +\subsection{FPGA constraints} + +A field-programmable gate array (FPGA) is an integrated circuit designed to be +configured by the customer. A hardware description language (HDL) is used to +configure a FPGA. FGPAs are composed of programmable logic components, called +logic blocks. These blocks can be configured to perform simple (AND, XOR, ...) +or complex combinational functions. Logic blocks are interconnected by +reconfigurable links. Modern FPGAs contains memory elements and multipliers +which enables to simplify the design and increase the speed. As the most complex +operation operation on FGPAs is the multiplier, design of FGPAs should not used +complex operations. For example, a divider is not an available operation and it +should be programmed using simple components. + +FGPAs programming is very different from classic processors programming. When +logic block are programmed and linked to performed an operation, they cannot be +reused anymore. FPGA are cadenced slowly than classic processors but they can +performed pipelined as well as pipelined operations. A pipeline provides a way +manipulate data quickly since at each clock top to handle a new data. However, +using a pipeline consomes more logics and components since they are not +reusable, nevertheless it is probably the most efficient technique on FPGA. +Parallel operations can be used in order to manipulate several data +simultaneously. When it is possible, using a pipeline is a good solution to +manipulate new data at each clock top and using parallelism to handle +simultaneously several data streams. + +%% contraintes imposées par le FPGA : algo pipeline/parallele, pas d'op math complexe, ... + + +\subsection{Considered algorithms} + +Two solutions have been studied to achieve phase computation. The +original one, proposed by A. Meister and M. Favre, is based on +interpolation by splines. It allows to compute frequency and +phase. The second one, detailed in this article, is based on a +classical least square method but suppose that frequency is already +known. + +\subsubsection{Spline algorithm} +\label{sec:algo-spline} +Let consider a profile $P$, that is a segment of $M$ pixels with an +intensity in gray levels. Let call $I(x)$ the intensity of profile in $x +\in [0,M[$. + +At first, only $M$ values of $I$ are known, for $x = 0, 1, +\ldots,M-1$. A normalisation allows to scale known intensities into +$[-1,1]$. We compute splines that fit at best these normalised +intensities. Splines are used to interpolate $N = k\times M$ points +(typically $k=3$ is sufficient), within $[0,M[$. Let call $x^s$ the +coordinates of these $N$ points and $I^s$ their intensities. + +In order to have the frequency, the mean line $a.x+b$ (see equation \ref{equ:profile}) of $I^s$ is +computed. Finding intersections of $I^s$ and this line allow to obtain +the period thus the frequency. + +The phase is computed via the equation : +\begin{equation} +\theta = atan \left[ \frac{\sum_{i=0}^{N-1} sin(2\pi f x^s_i) \times I^s(x^s_i)}{\sum_{i=0}^{N-1} cos(2\pi f x^s_i) \times I^s(x^s_i)} \right] +\end{equation} + +Two things can be noticed. Firstly, the frequency could also be +obtained using the derivates of spline equations, which only implies +to solve quadratic equations. Secondly, frequency of each profile is +computed a single time, before the acquisition loop. Thus, $sin(2\pi f +x^s_i)$ and $cos(2\pi f x^s_i)$ could also be computed before the loop, which leads to a +much faster computation of $\theta$. + +\subsubsection{Least square algorithm} + +Assuming that we compute the phase during the acquisition loop, +equation \ref{equ:profile} has only 4 parameters :$a, b, A$, and +$\theta$, $f$ and $x$ being already known. Since $I$ is non-linear, a +least square method based an Gauss-newton algorithm must be used to +determine these four parameters. Since it is an iterative process +ending with a convergence criterion, it is obvious that it is not +particularly adapted to our design goals. + +Fortunatly, it is quite simple to reduce the number of parameters to +only $\theta$. Let $x^p$ be the coordinates of pixels in a segment of +size $M$. Thus, $x^p = 0, 1, \ldots, M-1$. Let $I(x^p)$ be their +intensity. Firstly, we "remove" the slope by computing : + +\[I^{corr}(x^p) = I(x^p) - a.x^p - b\] + +Since linear equation coefficients are searched, a classical least +square method can be used to determine $a$ and $b$ : + +\[a = \frac{covar(x^p,I(x^p))}{var(x^p)} \] + +Assuming an overlined symbol means an average, then : + +\[b = \overline{I(x^p)} - a.\overline{{x^p}}\] + +Let $A$ be the amplitude of $I^{corr}$, i.e. + +\[A = \frac{max(I^{corr}) - min(I^{corr})}{2}\] + +Then, the least square method to find $\theta$ is reduced to search the minimum of : + +\[\sum_{i=0}^{M-1} \left[ cos(2\pi f.i + \theta) - \frac{I^{corr}(i)}{A} \right]^2\] + +It is equivalent to derivate this expression and to solve the following equation : + +\begin{eqnarray*} +2\left[ cos\theta \sum_{i=0}^{M-1} I^{corr}(i).sin(2\pi f.i) + sin\theta \sum_{i=0}^{M-1} I^{corr}(i).cos(2\pi f.i)\right] \\ +- A\left[ cos2\theta \sum_{i=0}^{M-1} sin(4\pi f.i) + sin2\theta \sum_{i=0}^{M-1} cos(4\pi f.i)\right] = 0 +\end{eqnarray*} + +Several points can be noticed : +\begin{itemize} +\item As in the spline method, some parts of this equation can be + computed before the acquisition loop. It is the case of sums that do + not depend on $\theta$ : + +\[ \sum_{i=0}^{M-1} sin(4\pi f.i), \sum_{i=0}^{M-1} cos(4\pi f.i) \] + +\item Lookup tables for $sin(2\pi f.i)$ and $cos(2\pi f.i)$ can also be +computed. + +\item The simplest method to find the good $\theta$ is to discretize + $[-\pi,\pi]$ in $N$ steps, and to search which step leads to the + result closest to zero. By the way, three other lookup tables can + also be computed before the loop : + +\[ sin \theta, cos \theta, \left[ cos 2\theta \sum_{i=0}^{M-1} sin(4\pi f.i) + sin 2\theta \sum_{i=0}^{M-1} cos(4\pi f.i)\right] \] + +\item This search can be very fast using a dichotomous process in $log_2(N)$ + +\end{itemize} + +Finally, the whole summarizes in an algorithm (called LSQ in the following) in two parts, one before and one during the acquisition loop : +\begin{algorithm}[h] +\caption{LSQ algorithm - before acquisition loop.} +\label{alg:lsq-before} + + $M \leftarrow $ number of pixels of the profile\\ + I[] $\leftarrow $ intensities of pixels\\ + $f \leftarrow $ frequency of the profile\\ + $s4i \leftarrow \sum_{i=0}^{M-1} sin(4\pi f.i)$\\ + $c4i \leftarrow \sum_{i=0}^{M-1} cos(4\pi f.i)$\\ + $nb_s \leftarrow $ number of discretization steps of $[-\pi,\pi]$\\ + + \For{$i=0$ to $nb_s $}{ + $\theta \leftarrow -\pi + 2\pi\times \frac{i}{nb_s}$\\ + lut\_sin[$i$] $\leftarrow sin \theta$\\ + lut\_cos[$i$] $\leftarrow cos \theta$\\ + lut\_A[$i$] $\leftarrow cos 2 \theta \times s4i + sin 2 \theta \times c4i$\\ + lut\_sinfi[$i$] $\leftarrow sin (2\pi f.i)$\\ + lut\_cosfi[$i$] $\leftarrow cos (2\pi f.i)$\\ + } +\end{algorithm} + +\begin{algorithm}[h] +\caption{LSQ algorithm - during acquisition loop.} +\label{alg:lsq-during} + + $\bar{x} \leftarrow \frac{M-1}{2}$\\ + $\bar{y} \leftarrow 0$, $x_{var} \leftarrow 0$, $xy_{covar} \leftarrow 0$\\ + \For{$i=0$ to $M-1$}{ + $\bar{y} \leftarrow \bar{y} + $ I[$i$]\\ + $x_{var} \leftarrow x_{var} + (i-\bar{x})^2$\\ + } + $\bar{y} \leftarrow \frac{\bar{y}}{M}$\\ + \For{$i=0$ to $M-1$}{ + $xy_{covar} \leftarrow xy_{covar} + (i-\bar{x}) \times (I[i]-\bar{y})$\\ + } + $slope \leftarrow \frac{xy_{covar}}{x_{var}}$\\ + $start \leftarrow y_{moy} - slope\times \bar{x}$\\ + \For{$i=0$ to $M-1$}{ + $I[i] \leftarrow I[i] - start - slope\times i$\tcc*[f]{slope removal}\\ + } + + $I_{max} \leftarrow max_i(I[i])$, $I_{min} \leftarrow min_i(I[i])$\\ + $amp \leftarrow \frac{I_{max}-I_{min}}{2}$\\ + + $Is \leftarrow 0$, $Ic \leftarrow 0$\\ + \For{$i=0$ to $M-1$}{ + $Is \leftarrow Is + I[i]\times $ lut\_sinfi[$i$]\\ + $Ic \leftarrow Ic + I[i]\times $ lut\_cosfi[$i$]\\ + } + + $\theta \leftarrow -\pi$\\ + $val_1 \leftarrow 2\times \left[ Is.\cos(\theta) + Ic.\sin(\theta) \right] - amp\times \left[ c4i.\sin(2\theta) + s4i.\cos(2\theta) \right]$\\ + \For{$i=1-n_s$ to $n_s$}{ + $\theta \leftarrow \frac{i.\pi}{n_s}$\\ + $val_2 \leftarrow 2\times \left[ Is.\cos(\theta) + Ic.\sin(\theta) \right] - amp\times \left[ c4i.\sin(2\theta) + s4i.\cos(2\theta) \right]$\\ + + \lIf{$val_1 < 0$ et $val_2 >= 0$}{ + $\theta_s \leftarrow \theta - \left[ \frac{val_2}{val_2-val_1}\times \frac{\pi}{n_s} \right]$\\ + } + $val_1 \leftarrow val_2$\\ + } + +\end{algorithm} + + +\subsubsection{Comparison} + +\subsection{VHDL design paradigms} + +\subsection{VHDL implementation} + +\section{Experimental results} +\label{sec:results} + + + + +\section{Conclusion and perspectives} + + +\bibliographystyle{plain} +\bibliography{biblio} + \end{document}