-evolve from state to state. By the way, HDL instructions can execute
-concurrently. Basic logic operations are used to agregate signals to
-produce new states and assign it to another signal. States are mainly
-expressed as arrays of bits. Fortunaltely, libraries propose some
-higher levels representations like signed integers, and arithmetic
-operations.
-
-Furthermore, even if FPGAs are cadenced more slowly than classic
-processors, they can perform pipeline as well as parallel
-operations. A pipeline consists in cutting a process in sequence of
-small tasks, taking the same execution time. It accepts a new data at
-each clock top, thus, after a known latency, it also provides a result
-at each clock top. However, using a pipeline consumes more logics
-since the components of a task are not reusable by another
-one. Nevertheless it is probably the most efficient technique on
-FPGA. Because of its architecture, it is also very easy to process
-several data concurrently. When it is possible, the best performance
-is reached using parallelism to handle simultaneously several
-pipelines in order to handle multiple data streams.
-
-\subsection{The board}
-
-The board we use is designed by the Armadeus compagny, under the name
-SP Vision. It consists in a development board hosting a i.MX27 ARM
-processor (from Freescale). The board includes all classical
-connectors: USB, Ethernet, ... A Flash memory contains a Linux kernel
-that can be launched after booting the board via u-Boot.
-
-The processor is directly connected to a Spartan3A FPGA (from Xilinx)
-via its special interface called WEIM. The Spartan3A is itself
-connected to a Spartan6 FPGA. Thus, it is possible to develop programs
-that communicate between i.MX and Spartan6, using Spartan3 as a
-tunnel. By default, the WEIM interface provides a clock signal at
-100MHz that is connected to dedicated FPGA pins.
-
-The Spartan6 is an LX100 version. It has 15822 slices, equivalent to
-101261 logic cells. There are 268 internal block RAM of 18Kbits, and
-180 dedicated multiply-adders (named DSP48), which is largely enough
-for our project.
-
-Some I/O pins of Spartan6 are connected to two $2\times 17$ headers
-that can be used as user wants. For the project, they will be
-connected to the interface card of the camera.
-
-\subsection{Considered algorithms}
-
-Two solutions have been studied to achieve phase computation. The
-original one, proposed by A. Meister and M. Favre, is based on
-interpolation by splines. It allows to compute frequency and
-phase. The second one, detailed in this article, is based on a
-classical least square method but suppose that frequency is already
-known.
-
-\subsubsection{Spline algorithm}
+evolve from state to state. Moreover, HDL instructions can be executed
+concurrently. Signals may be combined with basic logic operations to
+produce new states that are assigned to another signal. States are mainly expressed as
+arrays of bits. Fortunately, libraries propose higher levels
+representations like signed integers, and arithmetic operations.
+
+Furthermore, even if FPGAs are cadenced more slowly than classic processors,
+they can perform pipelines as well as parallel operations. A pipeline
+consists in cutting a process in a sequence of small tasks, taking the same
+execution time. It accepts a new data at each clock top, thus, after a known
+latency, it also provides a result at each clock top. The drawback is that the
+components of a task are not reusable by another one. Nevertheless, this is
+the most efficient technique on FPGAs. Because of their architecture, it is
+also very easy to process several data concurrently. Finally, the best
+performance can be reached when several pipelines are operating on multiple
+data streams in parallel.
+
+\subsection{The FPGA board}
+
+The architecture we use is designed by the Armadeus Systems
+company. It consists in a development board called APF27 \textsuperscript{\textregistered}, hosting a
+i.MX27 ARM processor (from Freescale) and a Spartan3A (from
+Xilinx). This board includes all classical connectors as USB and
+Ethernet for instance. A Flash memory contains a Linux kernel that can
+be launched after booting the board via u-Boot. The processor is
+directly connected to the Spartan3A via its special interface called
+WEIM. The Spartan3A is itself connected to an extension board called
+SP Vision \textsuperscript{\textregistered}, that hosts a Spartan6 FPGA. Thus, it is
+possible to develop programs that communicate between i.MX and
+Spartan6, using Spartan3 as a tunnel. A clock signal at 100MHz (by
+default) is delivered to dedicated FPGA pins. The Spartan6 of our
+board is an LX100 version. It has 15,822 slices, each slice containing
+4 LUTs and 8 flip/flops. It is equivalent to 101,261 logic
+cells. There are 268 internal block RAM of 18Kbits, and 180 dedicated
+multiply-adders (named DSP48), which is largely enough for our
+project. Some I/O pins of Spartan6 are connected to two $2\times 17$
+headers that can be used for any purpose as to be connected to the
+interface of a camera.
+
+\subsection{Two algorithms for phase computation}
+
+As said in section \ref{sec:deflest}, $f$ is computed only once but
+the phase needs to be computed for each image. This is why, in this
+paper, we focus on its computation. The next section describes the
+original method, based on spline interpolation, and section
+\ref{sec:algo-square} presents the new one based on least
+squares. Finally, in section \ref{sec:algo-comp}, we compare the two
+algorithms from their FPGA implementation point of view.
+
+\subsubsection{Spline algorithm (SPL)}
+