X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/dmems12.git/blobdiff_plain/f020a3305e2463e8dfb0194cb01c5dccbcb327fe..d343834383c4f75427f0864fa81669d0968e6aaf:/dmems12.tex diff --git a/dmems12.tex b/dmems12.tex index 3fd1ca1..c3d00f4 100644 --- a/dmems12.tex +++ b/dmems12.tex @@ -1,5 +1,5 @@ -\documentclass[10pt, conference, compsocconf]{IEEEtran} +\documentclass[10pt, peerreview, compsocconf]{IEEEtran} %\usepackage{latex8} %\usepackage{times} \usepackage[utf8]{inputenc} @@ -58,7 +58,7 @@ -\maketitle +%\maketitle \thispagestyle{empty} @@ -66,9 +66,16 @@ -{\it keywords}: FPGA, cantilever, interferometry. + \end{abstract} +\begin{IEEEkeywords} +FPGA, cantilever, interferometry. +\end{IEEEkeywords} + + +\IEEEpeerreviewmaketitle + \section{Introduction} Cantilevers are used inside atomic force microscope (AFM) which provides high @@ -226,10 +233,12 @@ In fact, this timing is a very hard constraint. Let consider a very small programm that initializes twenty million of doubles in memory and then does 1000000 cumulated sums on 20 contiguous values (experimental profiles have about this size). On an intel Core 2 Duo -E6650 at 2.33GHz, this program reaches an average of 155Mflops. It -implies that the phase computation algorithm should not take more than -$155\times 12.5 = 1937$ floating operations. For integers, it gives -$3000$ operations. Obviously, some cache effects and optimizations on +E6650 at 2.33GHz, this program reaches an average of 155Mflops. + +%%Itimplies that the phase computation algorithm should not take more than +%%$155\times 12.5 = 1937$ floating operations. For integers, it gives $3000$ operations. + +Obviously, some cache effects and optimizations on huge amount of computations can drastically increase these performances : peak efficiency is about 2.5Gflops for the considered CPU. But this is not the case for phase computation that used only few @@ -269,15 +278,13 @@ some hardware constraints specific to FPGAs. \section{Proposed solution} \label{sec:solus} -Project Oscar aims to provide an hardware and software architecture to -estimate and control the deflection of cantilevers. The hardware part -consists in a high-speed camera, linked on an embedded board hosting -FPGAs. By the way, the camera output stream can be pushed directly -into the FPGA. The software part is mostly the VHDL code that -deserializes the camera stream, extracts profile and computes the -deflection. Before focusing on our work to implement the phase -computation, we give some general informations about FPGAs and the -board we use. +Project Oscar aims to provide a hardware and software architecture to estimate +and control the deflection of cantilevers. The hardware part consists in a +high-speed camera, linked on an embedded board hosting FPGAs. By the way, the +camera output stream can be pushed directly into the FPGA. The software part is +mostly the VHDL code that deserializes the camera stream, extracts profile and +computes the deflection. Before focusing on our work to implement the phase +computation, we give some general information about FPGAs and the board we use. \subsection{FPGAs} @@ -286,24 +293,25 @@ configured by the customer. A hardware description language (HDL) is used to configure a FPGA. FGPAs are composed of programmable logic components, called logic blocks. These blocks can be configured to perform simple (AND, XOR, ...) or complex combinational functions. Logic blocks are interconnected by -reconfigurable links. Modern FPGAs contains memory elements and multipliers -which enables to simplify the design and increase the speed. As the most complex -operation operation on FGPAs is the multiplier, design of FGPAs should not used -complex operations. For example, a divider is not an available operation and it -should be programmed using simple components. +reconfigurable links. Modern FPGAs contain memory elements and multipliers which +enable to simplify the design and to increase the speed. As the most complex +operation on FGPAs is the multiplier, design of FGPAs should use simple +operations. For example, a divider is not an operation available and it should +be programmed using simplest operations. FGPAs programming is very different from classic processors programming. When -logic block are programmed and linked to performed an operation, they cannot be -reused anymore. FPGA are cadenced more slowly than classic processors but they can -performed pipelined as well as parallel operations. A pipeline provides a way -manipulate data quickly since at each clock top to handle a new data. However, -using a pipeline consomes more logics and components since they are not -reusable, nevertheless it is probably the most efficient technique on FPGA. -Parallel operations can be used in order to manipulate several data +logic blocks are programmed and linked to perform an operation, they cannot be +reused anymore. FPGAs are cadenced more slowly than classic processors but they +can perform pipeline as well as parallel operations. A pipeline provides a way +to manipulate data quickly since at each clock top it handles a new +data. However, using a pipeline consumes more logics and components since they +are not reusable. Nevertheless it is probably the most efficient technique on +FPGA. Parallel operations can be used in order to manipulate several data simultaneously. When it is possible, using a pipeline is a good solution to manipulate new data at each clock top and using parallelism to handle -simultaneously several data streams. +simultaneously several pipelines in order to handle multiple data streams. +%% parler du VHDL, synthèse et bitstream \subsection{The board} The board we use is designed by the Armadeus compagny, under the name @@ -653,15 +661,30 @@ mapping and routing the design on the Spartan6. By the way, extra-latency is generated and there must be idle times between two profiles entering into the pipeline. -Before obtaining the least bitstream, the crucial question is : how to -translate the C code the LSQ into VHDL ? +%%Before obtaining the least bitstream, the crucial question is : how to +%%translate the C code the LSQ into VHDL ? + +%\subsection{VHDL design paradigms} -\subsection{VHDL design paradigms} +\section{Experimental tests} \subsection{VHDL implementation} -\section{Experimental results} +% - ecriture d'un code en C avec integer +% - calcul de la taille max en bit de chaque variable en fonction de la quantization. +% - tests de quantization : équilibre entre précision et contraintes FPGA +% - en parallèle : simulink et VHDL à la main +% +\subsection{Simulation} + +% ghdl + gtkwave +% au mieux : une phase tous les 33 cycles, latence de 95 cycles. +% mais routage/placement impossible. +\subsection{Bitstream creation} + +% pas fait mais prévision d'une sortie tous les 480ns avec une latence de 1120 + \label{sec:results}