ajout de perceptron.pdf

[chaos1.git] / main.tex
diff --git a/main.tex b/main.tex

index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..717ae3e688d7b36d00dcd58280383073825e8c46 100644 (file)
--- a/main.tex
+++ b/main.tex
@@ -0,0 +1,1061 @@
+% ****** Start of file chaos-paper.tex ******
+%
+\documentclass[%
+aip,
+cha,% long, 
+numerical bibliography, %(default)
+%jcp,% short, numerical bibliography, 
+%jmp,
+amsmath,amssymb,
+preprint,%
+%reprint,%
+%author-year,%
+%author-numerical,%
+]{revtex4-1}
+
+\usepackage{graphicx}% Include figure files
+\usepackage{dcolumn}% Align table columns on decimal point
+\usepackage{bm}% bold math
+\usepackage{color}
+\usepackage{ulem}
+\usepackage[mathlines]{lineno}% Enable numbering of text and display math
+\linenumbers\relax % Commence numbering lines
+% The amssymb package provides various useful mathematical symbols
+\usepackage{amssymb}
+% The amsthm package provides various useful mathematical fonts
+\usepackage{amsfonts}
+% Various mathematical packages
+\PassOptionsToPackage{cmex10draft}{graphics}
+\usepackage[standard]{ntheorem}
+% Various packages
+\usepackage{stmaryrd} % llbracket
+\usepackage{dsfont}   % mathds
+\usepackage{multirow} % tables
+
+\newcommand{\ie}{\textit{i.e.}}
+\newcommand{\Nats}[0]{\ensuremath{\mathds{N}}}
+\newcommand{\R}[0]{\ensuremath{\mathds{R}}}
+\newcommand{\Bool}[0]{\ensuremath{\mathds{B}}}
+
+\begin{document}
+
+\title[Recurrent Neural Networks and Chaos]{Recurrent Neural Networks 
+and Chaos: Construction, Evaluation, \\
+and Prediction Ability}
+
+\author{Jacques M. Bahi}
+\author{Jean-Fran\c{c}ois Couchot}
+\author{Christophe Guyeux}
+\email{christophe.guyeux@univ-fcomte.fr.}
+\author{Michel Salomon}
+\altaffiliation[Authors in ]{alphabetic order} 
+\affiliation{
+Computer Science Laboratory (LIFC), University of Franche-Comt\'e, \\
+IUT de Belfort-Montb\'eliard, BP 527, \\
+90016 Belfort Cedex, France
+}
+
+\date{\today}
+
+\newcommand{\CG}[1]{\begin{color}{red}\textit{#1}\end{color}}
+
+\begin{abstract}
+%% Text of abstract
+Many  research works  deal with  chaotic neural  networks  for various
+fields  of application. Unfortunately,  up to  now these  networks are
+usually  claimed to be  chaotic without  any mathematical  proof.  The
+purpose of this paper is to establish, based on a rigorous theoretical
+framework, an equivalence between  chaotic iterations according to the
+Devaney's  formulation  of chaos  and  a  particular  class of  neural
+networks. On the one hand we show  how to build such a network, on the
+other  hand we provide  a method  to check  if a  neural network  is a
+chaotic one.  Finally, the ability of classical feedforward multilayer
+perceptrons to  learn sets of  data obtained from a  chaotic dynamical
+system is regarded.  Various  Boolean functions are iterated on finite
+states, some  of them  are proven to  be chaotic  as it is  defined by
+Devaney.  In that context, important differences occur in the training
+process,  establishing  with  various  neural  networks  that  chaotic
+behaviors are far more difficult to learn.
+\end{abstract}
+
+%%\pacs{Valid PACS appear here}% PACS, the Physics and Astronomy
+                             % Classification Scheme.
+%%\keywords{Suggested keywords}%Use showkeys class option if keyword
+                              %display desired
+\maketitle
+
+\begin{quotation}
+Chaotic neural  networks have received a  lot of attention  due to the
+appealing   properties  of   deterministic   chaos  (unpredictability,
+sensitivity,  and so  on). However,  such networks  are  often claimed
+chaotic without  any rigorous mathematical proof.   Therefore, in this
+work  a theoretical  framework based  on the  Devaney's  definition of
+chaos  is introduced.   Starting with  a relationship  between chaotic
+iterations  and  Devaney's chaos,  we  firstly  show  how to  build  a
+recurrent  neural networks  that is  equivalent to  a chaotic  map and
+secondly  a way  to check  whether  an already  available network,  is
+chaotic  or not.  We also  study different  topological  properties of
+these  truly  chaotic neural  networks.   Finally,  we  show that  the
+learning,  with neural  networks  having a  feedforward structure,  of
+chaotic behaviors represented by data sets obtained from chaotic maps,
+is far more difficult than non chaotic behaviors.
+\end{quotation}
+
+%% Main text
+\section{Introduction}
+\label{S1}
+
+Several research  works have proposed  or run chaotic  neural networks
+these last  years.  The complex dynamics  of such a  networks leads to
+various       potential      application       areas:      associative
+memories~\cite{Crook2007267}  and  digital  security tools  like  hash
+functions~\cite{Xiao10},                                       digital
+watermarking~\cite{1309431,Zhang2005759},           or          cipher
+schemes~\cite{Lian20091296}.  In the  former case, the background idea
+is to  control chaotic dynamics in  order to store  patterns, with the
+key advantage  of offering a  large storage capacity.  For  the latter
+case,   the  use   of   chaotic  dynamics   is   motivated  by   their
+unpredictability and  random-like behaviors.  Thus,  investigating new
+concepts is crucial in this  field, because new threats are constantly
+emerging.   As an illustrative  example, the  former standard  in hash
+functions,  namely the  SHA-1  algorithm, has  been recently  weakened
+after flaws were discovered.
+
+Chaotic neural networks have  been built with different approaches. In
+the context of associative  memory, chaotic neurons like the nonlinear
+dynamic  state neuron  \cite{Crook2007267}  frequently constitute  the
+nodes of the network. These neurons have an inherent chaotic behavior,
+which  is usually  assessed through  the computation  of  the Lyapunov
+exponent.  An alternative approach  is to consider a well-known neural
+network architecture: the  MultiLayer Perceptron (MLP). These networks
+are  suitable to model  nonlinear relationships  between data,  due to
+their universal approximator capacity. Thus, this kind of networks can
+be trained to model a physical  phenomenon known to be chaotic such as
+Chua's circuit  \cite{dalkiran10}.  Sometimes, a  neural network which
+is build by combining  transfer functions and initial conditions that are both
+chaotic,      is      itself      claimed      to      be      chaotic
+\cite{springerlink:10.1007/s00521-010-0432-2}.
+
+What all of these chaotic neural  networks have in common is that they
+are claimed to be chaotic  despite a lack of any rigorous mathematical
+proof.   The first contribution of  this paper  is  to fill  this  gap, using  a
+theoretical  framework  based on  the  Devaney's  definition of  chaos
+\cite{Devaney}.   This  mathematical  theory  of chaos  provides  both
+qualitative and quantitative tools to evaluate the complex behavior of
+a  dynamical  system:  ergodicity,   expansivity,  and  so  on.   More
+precisely, in  this paper,  which is an  extension of a  previous work
+\cite{bgs11:ip},   we  establish   the  equivalence  between   chaotic
+iterations and  a class of globally  recurrent MLP.
+The investigation the  converse problem is the second contribution: 
+we indeed study the  ability for
+classical MultiLayer Perceptrons  to learn a particular family of
+discrete  chaotic  dynamical  systems.   This family,  called  chaotic
+iterations, is defined by a  Boolean vector, an update function, and a
+sequence giving  which component  to update at  each iteration.   It has
+been  previously established  that such  dynamical systems  can behave
+chaotically, as it is defined by Devaney, when the chosen function has
+a  strongly connected  iterations graph.   In this  document,  we 
+experiment several MLPs and try to learn some iterations of this kind.
+We  show that non-chaotic iterations can be learned, whereas it is
+far  more  difficult  for chaotic  ones.   That  is  to say,  we  have
+discovered at  least one  family of problems  with a  reasonable size,
+such  that artificial  neural networks  should not  be applied  
+due to their inability to  learn chaotic behaviors in this context.
+
+The remainder of this research  work is organized as follows. The next
+section is devoted  to the basics of chaotic  iterations and Devaney's
+chaos.   Section~\ref{S2} formally  describes  how to  build a  neural
+network  that operates  chaotically.  Section~\ref{S3} is 
+devoted to the dual case of  checking whether an existing neural network
+is chaotic  or not. 
+Topological properties  of   chaotic  neural  networks  
+are discussed in Sect.~\ref{S4}.   The
+Section~\ref{section:translation}   shows   how   to  translate   such
+iterations  into  an Artificial  Neural  Network  (ANN),  in order  to
+evaluate the  capability for this  latter to learn  chaotic behaviors.
+This  ability  is  studied in  Sect.~\ref{section:experiments},  where
+various ANNs try to learn two  sets of data: the first one is obtained
+by chaotic iterations while the  second one results from a non-chaotic
+system.  Prediction success rates are  given and discussed for the two
+sets.  The paper ends with a conclusion section where our contribution
+is summed up and intended future work is exposed.
+
+\section{Link between Chaotic Iterations and Devaney's Chaos}
+
+In this section, the well-established notion of Devaney's mathematical
+chaos is  firstly recalled.  Preservation of  the unpredictability of
+such dynamical  system when implemented  on a computer is  obtained by
+using  some discrete iterations  called ``chaotic  iterations'', which
+are thus introduced.  The result establishing the link between chaotic
+iterations and Devaney's chaos is  finally presented at the end of this
+section.
+
+In what follows and for  any function $f$, $f^n$ means the composition
+$f \circ f \circ \hdots \circ f$ ($n$ times).
+
+\subsection{Devaney's chaotic dynamical systems}
+
+Various domains such as  physics, biology, or economy, contain systems
+that exhibit a chaotic behavior,  a well-known example is the weather.
+These  systems   are  in   particular  highly  sensitive   to  initial
+conditions, a concept usually presented as the butterfly effect: small
+variations in the initial conditions possibly lead to widely different
+behaviors.  Theoretically speaking, a  system is sensitive if for each
+point  $x$ in  the  iteration space,  one  can find  a  point in  each
+neighborhood of $x$ having a significantly different future evolution.
+Conversely, a  system seeded with  the same initial  conditions always
+has  the same  evolution.   In  other words,  chaotic  systems have  a
+deterministic  behavior  defined through  a  physical or  mathematical
+model  and a  high  sensitivity to  the  initial conditions.   Besides
+mathematically this  kind of unpredictability  is also referred  to as
+deterministic chaos.  For example, many weather forecast models exist,
+but they give only suitable predictions for about a week, because they
+are initialized with conditions  that reflect only a partial knowledge
+of the current weather.  Even  the  differences are  initially small,
+they are  amplified in the course  of time, and thus  make difficult a
+long-term prediction.  In fact,  in a chaotic system, an approximation
+of  the current  state is  a  quite useless  indicator for  predicting
+future states.
+
+From  mathematical  point  of   view,  deterministic  chaos  has  been
+thoroughly studied  these last decades, with  different research works
+that  have   provide  various  definitions  of   chaos.   Among  these
+definitions,   the  one  given   by  Devaney~\cite{Devaney}   is  
+well-established.    This   definition   consists  of   three   conditions:
+topological  transitivity, density of  periodic points,  and sensitive
+point dependence on initial conditions.
+
+Topological  transitivity   is  checked  when,  for   any  point,  any
+neighborhood of its future evolution eventually overlap with any other
+given region. More precisely,
+
+\begin{definition} \label{def2}
+A continuous function $f$  on a topological space $(\mathcal{X},\tau)$
+is defined  to be  {\bf topologically transitive}  if for any  pair of
+open  sets $U$,  $V  \in  \mathcal{X}$ there  exists  
+$k \in
+\mathds{N}^{\ast}$
+ such  that
+$f^k(U) \cap V \neq \emptyset$.
+\end{definition}
+
+This property  implies that a  dynamical system cannot be  broken into
+simpler  subsystems.  
+Intuitively, its complexity does not allow any simplification.
+On  the contrary,  a dense set  of periodic points  is an
+element of regularity that a chaotic dynamical system has to exhibit.
+
+\begin{definition} \label{def3}
+A point $x$ is called a  {\bf periodic point} for $f$ of period~$n \in
+\mathds{N}^{\ast}$ if $f^{n}(x)=x$.
+\end{definition}
+
+\begin{definition} \label{def4}
+$f$ is said to be {\bf  regular} on $(\mathcal{X},\tau)$ if the set of
+  periodic points  for $f$ is  dense in $\mathcal{X}$ ( for any $x \in
+  \mathcal{X}$, we can find at least  one periodic point in any of its
+  neighborhood).
+\end{definition}
+
+This  regularity ``counteracts'' the  effects of  transitivity.  Thus,
+due to these two properties, two points close to each other can behave
+in a completely different  manner, leading to unpredictability for the
+whole system. Then,
+
+\begin{definition} \label{sensitivity}
+$f$  has {\bf  sensitive dependence  on initial  conditions}  if there
+  exists $\delta  >0$ such  that, for any  $x\in \mathcal{X}$  and any
+  neighborhood $V$ of $x$, there exist  $y\in V$ and $n > 0$ such that
+  $d\left(f^{n}(x), f^{n}(y)\right) >\delta  $. The value $\delta$ is called the
+  {\bf constant of sensitivity} of $f$.
+\end{definition}
+
+Finally,
+
+\begin{definition} \label{def5}
+$f$ is  {\bf chaotic according to Devaney}  on $(\mathcal{X},\tau)$ if
+  $f$  is   regular,  topologically  transitive,   and  has  sensitive
+  dependence to its initial conditions.
+\end{definition}
+
+Let us notice that for a  metric space the last condition follows from
+the two first ones~\cite{Banks92}.
+
+\subsection{Chaotic Iterations}
+
+This section  presents some basics on  topological chaotic iterations.
+Let us  firstly discuss about the  domain of iteration.  As  far as we
+know, no result rules that the chaotic behavior of a function that has
+been theoretically proven on  $\R$ remains valid on the floating-point
+numbers, which is  the implementation domain.  Thus, to  avoid loss of
+chaos this  work presents an  alternative, that is to  iterate Boolean
+maps:  results that  are  theoretically obtained  in  that domain  are
+preserved in implementations.
+
+Let us denote by $\llbracket  a ; b \rrbracket$ the following interval
+of integers:  $\{a, a+1,  \hdots, b\}$, where $a~<~b$.  A  {\it system}
+under   consideration    iteratively   modifies   a    collection   of
+$n$~components.   Each component  $i \in  \llbracket 1;  n \rrbracket$
+takes  its  value  $x_i$  among the  domain  $\Bool=\{0,1\}$.   A~{\it
+  configuration} of the system at discrete time $t$ (also said at {\it
+  iteration} $t$) is the vector
+%\begin{equation*}   
+$x^{t}=(x_1^{t},\ldots,x_{n}^{t}) \in \Bool^n$.
+%\end{equation*}
+The dynamics of the system is  described according to a  function $f :
+\Bool^n \rightarrow \Bool^n$ such that
+%\begin{equation*} 
+$f(x)=(f_1(x),\ldots,f_n(x))$.
+%\end{equation*}
+% Notice that $f^k$ denotes the 
+% $k-$th composition $f\circ \ldots \circ f$ of the function $f$.
+
+Let    be   given    a    configuration   $x$.    In   what    follows
+$N(i,x)=(x_1,\ldots,\overline{x_i},\ldots,x_n)$  is  the configuration
+obtained by switching the $i-$th component of $x$ ($\overline{x_i}$ is
+indeed  the negation  of $x_i$).   Intuitively, $x$  and  $N(i,x)$ are
+neighbors.   The discrete  iterations of  $f$ are  represented  by the
+oriented  {\it graph  of iterations}  $\Gamma(f)$.  In  such  a graph,
+vertices are configurations  of $\Bool^n$ and there is  an arc labeled
+$i$ from $x$ to $N(i,x)$ if and only if  $f_i(x)$ is $N(i,x)$.
+
+In the  sequel, the  {\it strategy} $S=(S^{t})^{t  \in \Nats}$  is the
+sequence  defining the  component to  update at  time $t$  and $S^{t}$
+denotes its  $t-$th term.  We  introduce the function $F_{f}$  that is
+defined  for  any given  application  $f:\Bool^{n}  \to \Bool^{n}$  by
+$F_{f}:   \llbracket1;n\rrbracket\times   \mathds{B}^{n}   \rightarrow
+\mathds{B}^{n}$, s.t.
+\begin{equation}
+\label{eq:CIs}
+  F_{f}(s,x)_j  =  
+    \left\{
+    \begin{array}{l}
+    f_j(x) \textrm{ if } j= s \enspace , \\ 
+    x_{j} \textrm{ otherwise} \enspace .
+    \end{array}
+    \right. 
+\end{equation}
+
+\noindent With such a notation, configurations are defined for times 
+$t=0,1,2,\ldots$ by:
+\begin{equation}\label{eq:sync}   
+\left\{\begin{array}{l}   
+  x^{0}\in \mathds{B}^{n} \textrm{ and}\\
+  x^{t+1}=F_{f}(S^t,x^{t}) \enspace .
+\end{array}\right.
+\end{equation}
+
+\noindent  Finally, iterations defined  in Eq.~(\ref{eq:sync})  can be
+described by the following system:
+\begin{equation} 
+\left\{
+\begin{array}{lll} 
+X^{0} & = &  ((S^t)^{t \in \Nats},x^0) \in 
+\llbracket1;n\rrbracket^\Nats \times \Bool^{n}\\ 
+X^{k+1}& = & G_{f}(X^{k})\\
+\multicolumn{3}{c}{\textrm{where } G_{f}\left(((S^t)^{t \in \Nats},x)\right)
+= \left(\sigma((S^t)^{t \in \Nats}),F_{f}(S^0,x)\right) \enspace ,}
+\end{array} 
+\right.
+\label{eq:Gf}
+\end{equation}
+where  $\sigma$ is the  function that  removes the  first term  of the
+strategy  ({\it i.e.},~$S^0$).   This definition  means that  only one
+component  of the  system is  updated  at an  iteration: the  $S^t$-th
+element.  But it can be extended by considering subsets for $S^t$.
+
+Let us finally remark that, despite the use of the term {\it chaotic},
+there is {\it  priori} no connection between these  iterations and the
+mathematical theory of chaos presented previously.
+
+\subsection{Chaotic Iterations and Devaney's Chaos}
+
+In  this subsection  we recall  the link  we have  established between
+chaotic iterations and Devaney's  chaos.  The theoretical framework is
+fully described in \cite{guyeux09}.
+
+The   {\bf   distance}   $d$    between   two   points   $(S,x)$   and
+$(\check{S},\check{x})\in  \mathcal{X} = \llbracket1;n\rrbracket^\Nats
+\times \Bool^{n}$ is defined by
+\begin{equation}
+d((S,x);(\check{S},\check{x}))=d_{e}(x,\check{x})+d_{s}(S,\check{S})
+\enspace ,
+\end{equation}
+where
+\begin{equation}
+d_{e}(x,\check{x})=\sum_{j=1}^{n}\Delta
+(x_{j},\check{x}_{j}) \in \llbracket  0 ; n \rrbracket 
+\end{equation}
+\noindent and   
+\begin{equation}
+d_{s}(S,\check{S})=\frac{9}{2n}\sum_{t=0}^{\infty
+}\frac{|S^{t}-\check{S}^{t}|}{10^{t+1}} \in [0 ; 1] \enspace .
+\end{equation}
+
+This    distance    is    defined    to    reflect    the    following
+information. Firstly, the more  two systems have different components,
+the  larger the  distance  between them is.  Secondly,  two systems  with
+similar components and strategies, which have the same starting terms,
+must  induce only  a small  distance.  The  proposed  distance fulfill
+these  requirements: on  the one  hand  its floor  value reflects  the
+difference between  the cells, on  the other hand its  fractional part
+measures the difference between the strategies.
+
+The relation  between $\Gamma(f)$ and  $G_f$ is clear: there  exists a
+path from  $x$ to $x'$  in $\Gamma(f)$ if  and only if there  exists a
+strategy  $s$ such  that  the  parallel iteration  of  $G_f$ from  the
+initial  point $(s,x)$  reaches  the configuration  $x'$.  Using  this
+link, Guyeux~\cite{GuyeuxThese10} has proven that,
+\begin{theorem}%[Characterization of $\mathcal{C}$]
+\label{Th:Caracterisation   des   IC   chaotiques}  
+Let $f:\Bool^n\to\Bool^n$. $G_f$ is chaotic  (according to  Devaney) 
+if and only if  $\Gamma(f)$ is strongly connected.
+\end{theorem}
+
+Checking  if a  graph  is  strongly connected  is  not difficult.  For
+example,  consider the  function $f_1\left(x_1,\dots,x_n\right)=\left(
+\overline{x_1},x_1,x_2,\dots,x_{n-1}\right)$. As $\Gamma(f_1)$ is
+obviously strongly connected, then $G_{f_1}$ is a chaotic map.
+
+With this  material, we are now  able to build a  first chaotic neural
+network, as defined in the Devaney's formulation.
+
+\section{A chaotic neural network in the sense of Devaney}
+\label{S2}
+
+Let     us     firstly     introduce    the     vectorial     negation
+$f_{0}(x_{1},\dots,x_{n}) =(\overline{x_{1}},\dots,\overline{x_{n}})$,
+which is  such that $\Gamma(f_0)$ is  strongly connected.  Considering
+the map  $F_{f_0}:\llbracket 1;  n \rrbracket \times  \mathds{B}^n \to
+\mathds{B}^n$  associated to the  vectorial negation,  we can  build a
+multilayer perceptron  neural network modeling  $F_{f_0}$.  Hence, for
+all inputs  $(s,x) \in \llbracket  1;n\rrbracket \times \mathds{B}^n$,
+the output layer will produce  $F_{f_0}(s,x)$.  It is then possible to
+link  the output  layer  and the  input  one, in  order  to model  the
+dependence between two successive iterations.  As a result we obtain a
+global  recurrent   neural  network  that  behaves   as  follows  (see
+Fig.~\ref{Fig:perceptron}).
+
+\begin{itemize}
+\item   The   network   is   initialized   with   the   input   vector
+  $\left(S^0,x^0\right)    \in    \llbracket   1;n\rrbracket    \times
+  \mathds{B}^n$      and      computes      the     output      vector
+  $x^1=F_{f_0}\left(S^0,x^0\right)$. This last  vector is published as
+  an output one of the chaotic  neural network and is sent back to the
+  input layer through the feedback links.
+\item When  the network  is activated at  the $t^{th}$  iteration, the
+  state of the system $x^t  \in \mathds{B}^n$ received from the output
+  layer and  the initial  term of the  sequence $(S^t)^{t  \in \Nats}$
+  ($S^0  \in \llbracket 1;n\rrbracket$)  are used  to compute  the new
+  output.   This new  output, which  represents the  new state  of the
+  dynamical system, satisfies:
+  \begin{equation}
+    x^{t+1}=F_{f_0}(S^0, x^t) \in \mathds{B}^n \enspace .
+  \end{equation}
+\end{itemize}
+
+\begin{figure}
+  \centering
+  \includegraphics[scale=0.625]{perceptron}
+  \caption{A perceptron equivalent to chaotic iterations}
+  \label{Fig:perceptron}
+\end{figure}
+
+The behavior of the neural network is such that when the initial state
+is  $x^0~\in~\mathds{B}^n$ and  a  sequence $(S^t)^{t  \in \Nats}$  is
+given  as outside  input, then  the sequence  of  successive published
+output vectors $\left(x^t\right)^{t \in \mathds{N}^{\ast}}$ is exactly
+the  one produced  by  the chaotic  iterations  formally described  in
+Eq.~(\ref{eq:CIs}).  It  means that  mathematically if we  use similar
+input  vectors   they  both  generate  the   same  successive  outputs
+$\left(x^t\right)^{t \in \mathds{N}^{\ast}}$,  and therefore that they
+are  equivalent  reformulations  of  the iterations  of  $G_{f_0}$  in
+$\mathcal{X}$. Finally, since the  proposed neural network is build to
+model  the  behavior  of  $G_{f_0}$,  which is  chaotic  according  to
+Devaney's definition  of chaos,  we can conclude  that the  network is
+also chaotic in this sense.
+
+The previous construction scheme  is not restricted to function $f_0$.
+It can  be extended to any function  $f$ such that $G_f$  is a chaotic
+map by training the network to model $F_{f}:\llbracket 1; n \rrbracket
+\times      \mathds{B}^n      \to      \mathds{B}^n$.      Due      to
+Theorem~\ref{Th:Caracterisation  des  IC  chaotiques},  we  can  find
+alternative functions  $f$ for $f_0$  through a simple check  of their
+graph of  iterations $\Gamma(f)$.  For  example, we can  build another
+chaotic neural network by using $f_1$ instead of $f_0$.
+
+\section{Checking whether a neural network is chaotic or not}
+\label{S3}
+
+We focus now on the case  where a neural network is already available,
+and for which we  want to know if it is chaotic. Typically, in
+many research papers neural network  are usually claimed to be chaotic
+without any  convincing mathematical proof. We propose  an approach to
+overcome  this  drawback  for  a  particular  category  of  multilayer
+perceptrons defined below, and for the Devaney's formulation of chaos.
+In  spite of  this restriction,  we think  that this  approach  can be
+extended to  a large variety  of neural networks.  We plan to  study a
+generalization of this approach in a future work.
+
+We consider a multilayer perceptron  of the following form: inputs
+are $n$ binary digits and  one integer value, while outputs are  $n$
+bits.   Moreover, each  binary  output is  connected  with a  feedback
+connection to an input one.
+
+\begin{itemize}
+\item During  initialization, the network if we seed it with $n$~bits denoted
+  $\left(x^0_1,\dots,x^0_n\right)$  and an  integer  value $S^0$  that
+  belongs to $\llbracket1;n\rrbracket$.
+\item     At     iteration~$t$,     the     last     output     vector
+  $\left(x^t_1,\dots,x^t_n\right)$   defines  the  $n$~bits   used  to
+  compute the new output one $\left(x^{t+1}_1,\dots,x^{t+1}_n\right)$.
+  While  the remaining  input receives  a new  integer value  $S^t \in
+  \llbracket1;n\rrbracket$, which is provided by the outside world.
+\end{itemize}
+
+The topological  behavior of these  particular neural networks  can be
+proven to be chaotic through the following process. Firstly, we denote
+by  $F:  \llbracket  1;n  \rrbracket \times  \mathds{B}^n  \rightarrow
+\mathds{B}^n$     the     function     that     maps     the     value
+$\left(s,\left(x_1,\dots,x_n\right)\right)    \in    \llbracket    1;n
+\rrbracket      \times      \mathds{B}^n$      into     the      value
+$\left(y_1,\dots,y_n\right)       \in       \mathds{B}^n$,       where
+$\left(y_1,\dots,y_n\right)$  is the  response of  the  neural network
+after    the    initialization     of    its    input    layer    with
+$\left(s,\left(x_1,\dots, x_n\right)\right)$.  Secondly, we define $f:
+\mathds{B}^n       \rightarrow      \mathds{B}^n$       such      that
+$f\left(x_1,x_2,\dots,x_n\right)$ is equal to
+\begin{equation}
+\left(F\left(1,\left(x_1,x_2,\dots,x_n\right)\right),\dots,
+  F\left(n,\left(x_1,x_2,\dots,x_n\right)\right)\right) \enspace .
+\end{equation}
+Then $F=F_f$  and this recurrent  neural network produces  exactly the
+same      output      vectors,      when     feeding      it      with
+$\left(x_1^0,\dots,x_n^0\right)$    and   $S   \in    \llbracket   1;n
+\rrbracket^{\mathds{N}}$, than  chaotic iterations $F_f$  with initial
+condition  $\left(S,(x_1^0,\dots,  x_n^0)\right)  \in  \llbracket  1;n
+\rrbracket^{\mathds{N}}  \times \mathds{B}^n$.   In the  rest  of this
+paper,  we will  call such  multilayer perceptrons  CI-MLP($f$), which
+stands for ``Chaotic Iterations based MultiLayer Perceptron''.
+
+Checking  if CI-MLP($f$)  behaves chaotically  according  to Devaney's
+definition  of  chaos  is  simple:  we  need just  to  verify  if  the
+associated graph  of iterations  $\Gamma(f)$ is strongly  connected or
+not. As  an incidental consequence,  we finally obtain  an equivalence
+between  chaotic  iterations   and  CI-MLP($f$).   Therefore,  we  can
+obviously  study such multilayer  perceptrons with  mathematical tools
+like  topology  to  establish,  for  example,  their  convergence  or,
+contrarily, their unpredictable behavior.   An example of such a study
+is given in the next section.
+
+\section{Topological properties of chaotic neural networks}
+\label{S4}
+
+Let us first recall  two fundamental definitions from the mathematical
+theory of chaos.
+
+\begin{definition} \label{def8}
+A  function   $f$  is   said  to  be   {\bf  expansive}   if  $\exists
+\varepsilon>0$, $\forall  x \neq y$,  $\exists n \in  \mathds{N}$ such
+that $d\left(f^n(x),f^n(y)\right) \geq \varepsilon$.
+\end{definition}
+
+\begin{definition} \label{def9}
+A discrete dynamical  system is said to be  {\bf topologically mixing}
+if  and only  if,  for any  pair  of disjoint  open  sets $U$,$V  \neq
+\emptyset$, we can find some $n_0  \in \mathds{N}$ such that  for any $n$, 
+$n\geq n_0$, we have $f^n(U) \cap V \neq \emptyset$.
+\end{definition}
+
+As  proven in Ref.~\cite{gfb10:ip},  chaotic iterations  are expansive
+and  topologically mixing when  $f$ is  the vectorial  negation $f_0$.
+Consequently,  these  properties are  inherited  by the  CI-MLP($f_0$)
+recurrent neural network previously presented, which induce a greater
+unpredictability.  Any  difference on the  initial value of  the input
+layer is  in particular  magnified up to  be equal to  the expansivity
+constant.
+
+Let us then focus on the consequences for  a neural  network to  be chaotic
+according  to  Devaney's definition.  First  of  all, the  topological
+transitivity property implies indecomposability.
+
+\begin{definition} \label{def10}
+A   dynamical   system   $\left(   \mathcal{X},  f\right)$   is   {\bf
+indecomposable}  if it  is not  the  union of  two closed  sets $A,  B
+\subset \mathcal{X}$ such that $f(A) \subset A, f(B) \subset B$.
+\end{definition}
+
+\noindent Hence, reducing the set of outputs generated by CI-MLP($f$),
+in order to  simplify its complexity, is impossible  if $\Gamma(f)$ is
+strongly connected.  Moreover, under this  hypothesis CI-MLPs($f$) are
+strongly transitive:
+
+\begin{definition} \label{def11}
+A  dynamical system  $\left( \mathcal{X},  f\right)$ is  {\bf strongly
+transitive} if $\forall x,y  \in \mathcal{X}$, $\forall r>0$, $\exists
+z  \in  \mathcal{X}$,  $d(z,x)~\leq~r  \Rightarrow  \exists  n  \in
+\mathds{N}^{\ast}$, $f^n(z)=y$.
+\end{definition}
+According to this definition, for all  pairs of points $(x, y)$ in the
+phase space, a point $z$ can  be found in the neighborhood of $x$ such
+that one of its iterates $f^n(z)$ is $y$. Indeed, this result has been
+established  during  the  proof   of  the  transitivity  presented  in
+Ref.~\cite{guyeux09}.  Among  other  things, the  strong  transitivity
+leads  to the fact  that without  the knowledge  of the  initial input
+layer, all outputs are possible.  Additionally, no point of the output
+space  can   be  discarded  when  studying  CI-MLPs:   this  space  is
+intrinsically complicated and it cannot be decomposed or simplified.
+
+Furthermore, those  recurrent neural networks  exhibit the instability
+property:
+\begin{definition}
+A dynamical  system $\left( \mathcal{X}, f\right)$ is  unstable if for
+all  $x  \in  \mathcal{X}$,   the  orbit  $\gamma_x:n  \in  \mathds{N}
+\longmapsto f^n(x)$  is unstable,  that means: $\exists  \varepsilon >
+0$, $\forall  \delta>0$, $\exists y  \in \mathcal{X}$, $\exists  n \in
+\mathds{N}$,        such        that        $d(x,y)<\delta$        and
+$d\left(\gamma_x(n),\gamma_y(n)\right) \geq \varepsilon$.
+\end{definition}
+This property, which  is implied by the sensitive  point dependence on
+initial conditions, leads to the fact that in all neighborhoods of any
+point $x$, there are points that  can be apart by $\varepsilon$ in the
+future through iterations of the  CI-MLP($f$). Thus, we can claim that
+the behavior of  these MLPs is unstable when  $\Gamma (f)$ is strongly
+connected.
+
+Let  us  now consider  a  compact  metric space  $(M,  d)$  and $f:  M
+\rightarrow M$  a continuous map. For  each natural number  $n$, a new
+metric $d_n$ is defined on $M$ by
+$$d_n(x,y)=\max\{d(f^i(x),f^i(y)): 0\leq i<n\} \enspace .$$
+
+Given any $\varepsilon > 0$ and $n \geqslant 1$, two points of $M$ are
+$\varepsilon$-close  with respect to  this metric  if their  first $n$
+iterates are $\varepsilon$-close.
+
+This metric  allows one to distinguish  in a neighborhood  of an orbit
+the points  that move away from  each other during  the iteration from
+the points  that travel together.  A subset  $E$ of $M$ is  said to be
+$(n, \varepsilon)$-separated if each pair of distinct points of $E$ is
+at  least $\varepsilon$  apart in  the metric  $d_n$. Denote  by $H(n,
+\varepsilon)$     the    maximum     cardinality     of    an     $(n,
+\varepsilon)$-separated set,
+\begin{definition}
+The {\it topological entropy} of the  map $f$ is defined by (see e.g.,
+Ref.~\cite{Adler65} or Ref.~\cite{Bowen})
+$$h(f)=\lim_{\varepsilon\to      0}      \left(\limsup_{n\to      \infty}
+\frac{1}{n}\log H(n,\varepsilon)\right) \enspace .$$
+\end{definition}
+
+Then we have the following result \cite{GuyeuxThese10},
+\begin{theorem}
+$\left( \mathcal{X},d\right)$  is compact and  the topological entropy
+of $(\mathcal{X},G_{f_0})$ is infinite.
+\end{theorem}
+
+We have explained how to  construct truly chaotic neural networks, how
+to check whether a  given MLP is chaotic or not, and  how to study its
+topological behavior.   The last thing to  investigate, when comparing
+neural  networks   and  Devaney's  chaos,  is   to  determine  whether
+artificial neural networks  are able to learn or  predict some chaotic
+behaviors, as  it is defined  in the Devaney's formulation  (when they
+are not specifically constructed for this purpose).  This statement is
+studied in the next section.
+
+\section{Suitability of Artificial Neural Networks 
+for Predicting Chaotic Behaviors}
+
+In  the context  of computer  science  different topic  areas have  an
+interest       in       chaos,       as       for       steganographic
+techniques~\cite{1309431,Zhang2005759}.    Steganography  consists  in
+embedding a  secret message within  an ordinary one, while  the secret
+extraction takes place once  at destination.  The reverse ({\it i.e.},
+automatically detecting the presence  of hidden messages inside media)
+is  called   steganalysis.   Among  the   deployed  strategies  inside
+detectors,          there         are          support         vectors
+machines~\cite{Qiao:2009:SM:1704555.1704664},                    neural
+networks~\cite{10.1109/ICME.2003.1221665,10.1109/CIMSiM.2010.36},  and
+Markov   chains~\cite{Sullivan06steganalysisfor}.    Most   of   these
+detectors  give quite  good results  and are  rather  competitive when
+facing steganographic  tools.  However, to  the best of  our knowledge
+none of the considered information hiding schemes fulfills the Devaney
+definition  of chaos~\cite{Devaney}.  Indeed,  one can  wonder whether
+detectors  continue to  give good  results when  facing  truly chaotic
+schemes.  More  generally, there remains the open  problem of deciding
+whether artificial intelligence is suitable for predicting topological
+chaotic behaviors.
+
+\subsection{Representing Chaotic Iterations for Neural Networks} 
+\label{section:translation}
+
+The  problem  of  deciding  whether  classical  feedforward  ANNs  are
+suitable  to approximate  topological chaotic  iterations may  then be
+reduced  to evaluate  ANNs on  iterations of  functions  with Strongly
+Connected  Component  (SCC)~graph  of  iterations.   To  compare  with
+non-chaotic  iterations,  the experiments  detailed  in the  following
+sections are  carried out  using both kinds  of function  (chaotic and
+non-chaotic). Let us emphasize on  the difference between this kind of
+neural   networks  and  the   Chaotic  Iterations   based  MultiLayer
+Perceptron.
+
+We are  then left to compute  two disjoint function  sets that contain
+either functions  with topological chaos properties  or not, depending
+on  the strong  connectivity of  their iterations graph.  This  can be
+achieved for  instance by removing a  set of edges  from the iteration
+graph $\Gamma(f_0)$ of the vectorial negation function~$f_0$.  One can
+deduce whether  a function verifies the topological  chaos property or
+not  by checking  the strong  connectivity of  the resulting  graph of
+iterations.
+
+For instance let us consider  the functions $f$ and $g$ from $\Bool^4$
+to $\Bool^4$ respectively defined by the following lists:
+$$[0,  0,  2,   3,  13,  13,  6,   3,  8,  9,  10,  11,   8,  13,  14,
+  15]$$ $$\mbox{and } [11, 14, 13, 14, 11, 10, 1, 8, 7, 6, 5, 4, 3, 2,
+  1, 0]  \enspace.$$ In  other words,  the image of  $0011$ by  $g$ is
+$1110$: it  is obtained as the  binary value of the  fourth element in
+the  second  list  (namely~14).   It   is  not  hard  to  verify  that
+$\Gamma(f)$ is  not SCC  (\textit{e.g.}, $f(1111)$ is  $1111$) whereas
+$\Gamma(g)$  is.  Next section  shows how  to translate  iterations of
+such functions into a model amenable to be learned by an ANN.
+  
+This  section  presents how  (not)  chaotic  iterations  of $G_f$  are
+translated  into  another  model  more  suited  to  artificial  neural
+networks.  Formally, input and output vectors are pairs~$((S^t)^{t \in
+\Nats},x)$ and $\left(\sigma((S^t)^{t \in \Nats}),F_{f}(S^0,x)\right)$
+as defined in~Eq.~(\ref{eq:Gf}).
+
+Firstly, let us focus on how to memorize configurations.  Two distinct
+translations are  proposed.  In the first  case, we take  one input in
+$\Bool$  per  component;  in   the  second  case,  configurations  are
+memorized  as   natural  numbers.    A  coarse  attempt   to  memorize
+configuration  as  natural  number  could  consist  in  labeling  each
+configuration  with  its  translation  into  decimal  numeral  system.
+However,  such a  representation induces  too many  changes  between a
+configuration  labeled by  a  power  of two  and  its direct  previous
+configuration: for instance, 16~(10000) and 15~(01111) are closed in a
+decimal ordering, but  their Hamming distance is 5.   This is why Gray
+codes~\cite{Gray47} have been preferred.
+
+Let us secondly detail how to deal  with strategies.   Obviously,  it is  not
+possible to  translate in a finite  way an infinite  strategy, even if
+both $(S^t)^{t \in \Nats}$ and $\sigma((S^t)^{t \in \Nats})$ belong to
+$\{1,\ldots,n\}^{\Nats}$.  Input strategies are then reduced to have a
+length  of size  $l  \in  \llbracket 2,k\rrbracket$,  where  $k$ is  a
+parameter of the evaluation. Notice  that $l$ is greater than or equal
+to $2$ since  we do not want the shift  $\sigma$~function to return an
+empty strategy.  Strategies are memorized as natural numbers expressed
+in base  $n+1$.  At  each iteration, either  none or one  component is
+modified  (among the  $n$ components)  leading to  a radix  with $n+1$
+entries.  Finally,  we give an  other input, namely $m  \in \llbracket
+1,l-1\rrbracket$, which  is the  number of successive  iterations that
+are applied starting  from $x$.  Outputs are translated  with the same
+rules.
+
+To address  the complexity  issue of the  problem, let us  compute the
+size of the data set an ANN has to deal with.  Each input vector of an
+input-output pair  is composed of a configuration~$x$,  an excerpt $S$
+of the strategy to iterate  of size $l \in \llbracket 2, k\rrbracket$,
+and a  number $m \in  \llbracket 1, l-1\rrbracket$ of  iterations that
+are executed.
+
+Firstly, there are $2^n$  configurations $x$, with $n^l$ strategies of
+size $l$ for  each of them. Secondly, for  a given configuration there
+are $\omega = 1 \times n^2 +  2 \times n^3 + \ldots+ (k-1) \times n^k$
+ways  of writing  the pair  $(m,S)$. Furthermore,  it is  not  hard to
+establish that
+\begin{equation}
+\displaystyle{(n-1) \times \omega = (k-1)\times n^{k+1} - \sum_{i=2}^k n^i} \nonumber
+\end{equation}
+then
+\begin{equation}
+\omega =
+\dfrac{(k-1)\times n^{k+1}}{n-1} - \dfrac{n^{k+1}-n^2}{(n-1)^2} \enspace . \nonumber
+\end{equation}
+\noindent And then, finally, the number of  input-output pairs for our 
+ANNs is 
+$$
+2^n \times \left(\dfrac{(k-1)\times n^{k+1}}{n-1} - \dfrac{n^{k+1}-n^2}{(n-1)^2}\right) \enspace .
+$$
+For  instance, for $4$  binary components  and a  strategy of  at most
+$3$~terms we obtain 2304~input-output pairs.
+
+\subsection{Experiments}
+\label{section:experiments}
+
+To study  if chaotic iterations can  be predicted, we  choose to train
+the MultiLayer Perceptron.  As stated  before, this kind of network is
+in   particular    well-known   for   its    universal   approximation
+property. Furthermore,  MLPs have been already  considered for chaotic
+time series prediction.  For example, in~\cite{dalkiran10} the authors
+have shown that a feedforward  MLP with two hidden layers, and trained
+with Bayesian  Regulation backpropagation, can  learn successfully the
+dynamics of Chua's circuit.
+
+In these experimentations we consider  MLPs having one hidden layer of
+sigmoidal  neurons  and  output   neurons  with  a  linear  activation
+function.     They    are    trained    using    the    Limited-memory
+Broyden-Fletcher-Goldfarb-Shanno quasi-newton algorithm in combination
+with the Wolfe linear search.  The training process is performed until
+a maximum number of epochs  is reached.  To prevent overfitting and to
+estimate the  generalization performance we use  holdout validation by
+splitting the  data set into  learning, validation, and  test subsets.
+These subsets  are obtained through  random selection such  that their
+respective size represents 65\%, 10\%, and 25\% of the whole data set.
+
+Several  neural  networks  are  trained  for  both  iterations  coding
+schemes.   In  both  cases   iterations  have  the  following  layout:
+configurations of  four components and  strategies with at  most three
+terms. Thus, for  the first coding scheme a data  set pair is composed
+of 6~inputs and 5~outputs, while for the second one it is respectively
+3~inputs and 2~outputs. As noticed at the end of the previous section,
+this  leads to  data sets  that  consist of  2304~pairs. The  networks
+differ  in the  size of  the hidden  layer and  the maximum  number of
+training epochs.  We remember that  to evaluate the ability  of neural
+networks to  predict a  chaotic behavior for  each coding  scheme, the
+trainings of two data sets, one of them describing chaotic iterations,
+are compared.
+
+Thereafter we give,  for the different learning setups  and data sets,
+the  mean prediction  success rate  obtained for  each  output.  These
+values  are  computed  considering  10~trainings with  random  subsets
+construction,  weights  and  biases initialization.   Firstly,  neural
+networks having 10  and 25~hidden neurons are trained,  with a maximum
+number  of  epochs that  takes  its  value  in $\{125,250,500\}$  (see
+Tables~\ref{tab1}  and \ref{tab2}).   Secondly, we  refine  the second
+coding scheme by splitting the  output vector such that each output is
+learned by a specific  neural network (Table~\ref{tab3}). In this last
+case, we increase  the size of the hidden layer  up to 40~neurons, and
+we consider larger number of epochs.
+
+\begin{table}[htbp!]
+\caption{Prediction success rates for configurations expressed as boolean vectors.}
+\label{tab1}
+\centering {\small
+\begin{tabular}{|c|c||c|c|c|}
+\hline 
+\multicolumn{5}{|c|}{Networks topology: 6~inputs, 5~outputs and one hidden layer} \\
+\hline
+\hline
+\multicolumn{2}{|c||}{Hidden neurons} & \multicolumn{3}{c|}{10 neurons} \\
+\cline{3-5} 
+\multicolumn{2}{|c||}{Epochs} & 125 & 250 & 500 \\ 
+\hline
+\multirow{6}{*}{\rotatebox{90}{Chaotic}}&Output~(1) & 90.92\% & 91.75\% & 91.82\% \\ 
+& Output~(2) & 69.32\% & 78.46\% & 82.15\% \\
+& Output~(3) & 68.47\% & 78.49\% & 82.22\% \\
+& Output~(4) & 91.53\% & 92.37\% & 93.4\% \\
+& Config. & 36.10\% & 51.35\% & 56.85\% \\
+& Strategy~(5) & 1.91\% & 3.38\% & 2.43\% \\
+\hline
+\multirow{6}{*}{\rotatebox{90}{Non-chaotic}}&Output~(1) & 97.64\% & 98.10\% & 98.20\% \\
+& Output~(2) & 95.15\% & 95.39\% & 95.46\% \\
+& Output~(3) & 100\% & 100\% & 100\% \\
+& Output~(4) & 97.47\% & 97.90\% & 97.99\% \\
+& Config. & 90.52\% & 91.59\% & 91.73\% \\
+& Strategy~(5) & 3.41\% & 3.40\% & 3.47\% \\
+\hline
+\hline
+\multicolumn{2}{|c||}{Hidden neurons} & \multicolumn{3}{c|}{25 neurons} \\ %& \multicolumn{3}{|c|}{40 neurons} \\
+\cline{3-5} 
+\multicolumn{2}{|c||}{Epochs} & 125 & 250 & 500 \\ %& 125 & 250 & 500 \\ 
+\hline
+\multirow{6}{*}{\rotatebox{90}{Chaotic}}&Output~(1) & 91.65\% & 92.69\% & 93.93\% \\ %& 91.94\% & 92.89\% & 94.00\% \\
+& Output~(2) & 72.06\% & 88.46\% & 90.5\% \\ %& 74.97\% & 89.83\% & 91.14\% \\
+& Output~(3) & 79.19\% & 89.83\% & 91.59\% \\ %& 76.69\% & 89.58\% & 91.84\% \\
+& Output~(4) & 91.61\% & 92.34\% & 93.47\% \\% & 82.77\% & 92.93\% & 93.48\% \\
+& Config. & 48.82\% & 67.80\% & 70.97\% \\%& 49.46\% & 68.94\% & 71.11\% \\
+& Strategy~(5) & 2.62\% & 3.43\% & 3.78\% \\% & 3.10\% & 3.10\% & 3.03\% \\
+\hline
+\multirow{6}{*}{\rotatebox{90}{Non-chaotic}}&Output~(1) & 97.87\% & 97.99\% & 98.03\% \\ %& 98.16\% \\
+& Output~(2) & 95.46\% & 95.84\% & 96.75\% \\ % & 97.4\% \\
+& Output~(3) & 100\% & 100\% & 100\% \\%& 100\% \\
+& Output~(4) & 97.77\% & 97.82\% & 98.06\% \\%& 98.31\% \\
+& Config. & 91.36\% & 91.99\% & 93.03\% \\%& 93.98\% \\
+& Strategy~(5) & 3.37\% & 3.44\% & 3.29\% \\%& 3.23\% \\
+\hline
+\end{tabular}
+}
+\end{table}
+
+Table~\ref{tab1}  presents the  rates  obtained for  the first  coding
+scheme.   For  the chaotic  data,  it can  be  seen  that as  expected
+configuration  prediction becomes  better  when the  number of  hidden
+neurons and maximum  epochs increases: an improvement by  a factor two
+is observed (from 36.10\% for 10~neurons and 125~epochs to 70.97\% for
+25~neurons  and  500~epochs). We  also  notice  that  the learning  of
+outputs~(2)   and~(3)  is   more  difficult.    Conversely,   for  the
+non-chaotic  case the  simplest training  setup is  enough  to predict
+configurations.   For  all  network  topologies and  all  outputs  the
+obtained  results  for the  non-chaotic  case  outperform the  chaotic
+ones. Finally,  the rates for  the strategies show that  the different
+networks are unable to learn them.
+
+For the second coding  scheme (\textit{i.e.}, with Gray Codes)
+Table~\ref{tab2} shows that any network
+learns about  five times more non-chaotic  configurations than chaotic
+ones. As in the previous scheme, the strategies cannot be predicted.
+
+Let us now compare the  two coding schemes. Firstly, the second scheme
+disturbs  the   learning  process.   In   fact  in  this   scheme  the
+configuration is always expressed as  a natural number, whereas in the
+first one  the number  of inputs follows  the increase of  the boolean
+vectors coding configurations. In this latter case, the coding gives a
+finer information on configuration evolution.
+
+\begin{table}[b]
+\caption{Prediction success rates for configurations expressed with Gray code}
+\label{tab2}
+\centering
+\begin{tabular}{|c|c||c|c|c|}
+\hline 
+\multicolumn{5}{|c|}{Networks topology: 3~inputs, 2~outputs and one hidden layer} \\
+\hline
+\hline
+& Hidden neurons & \multicolumn{3}{c|}{10 neurons} \\
+\cline{2-5}
+& Epochs & 125 & 250 & 500 \\ %& 1000 
+\hline
+\multirow{2}{*}{Chaotic}& Config.~(1) & 13.29\% & 13.55\% & 13.08\% \\ %& 12.5\%
+& Strategy~(2) & 0.50\% & 0.52\% & 1.32\% \\ %& 1.42\%
+\hline
+\multirow{2}{*}{Non-Chaotic}&Config.~(1) & 77.12\% & 74.00\% & 72.60\% \\ %& 75.81\% 
+& Strategy~(2) & 0.42\% & 0.80\% & 1.16\% \\ %& 1.42\% 
+\hline
+\hline
+& Hidden neurons & \multicolumn{3}{c|}{25 neurons} \\
+\cline{2-5}
+& Epochs & 125 & 250 & 500 \\ %& 1000 
+\hline
+\multirow{2}{*}{Chaotic}& Config.~(1) & 12.27\% & 13.15\% & 13.05\% \\ %& 15.44\%
+& Strategy~(2) & 0.71\% & 0.66\% & 0.88\% \\ %& 1.73\%
+\hline
+\multirow{2}{*}{Non-Chaotic}&Config.~(1) & 73.60\% & 74.70\% & 75.89\% \\ %& 68.32\%
+& Strategy~(2) & 0.64\% & 0.97\% & 1.23\% \\ %& 1.80\%
+\hline
+\end{tabular}
+\end{table}
+
+Unfortunately, in  practical applications the number  of components is
+usually  unknown.   Hence, the  first  coding  scheme  cannot be  used
+systematically.   Therefore, we  provide  a refinement  of the  second
+scheme: each  output is learned  by a different  ANN. Table~\ref{tab3}
+presents the  results for  this approach.  In  any case,  whatever the
+network  topologies,  the  maximum   epoch  number  and  the  kind  of
+iterations,  the  configuration  success  rate is  slightly  improved.
+Moreover, the strategies predictions  rates reach almost 12\%, whereas
+in  Table~\ref{tab2}  they  never   exceed  1.5\%.   Despite  of  this
+improvement, a long term prediction of chaotic iterations still 
+appear to be
+an open issue.
+
+\begin{table}
+\caption{Prediction success rates for split outputs.}
+\label{tab3}
+\centering
+\begin{tabular}{|c||c|c|c|}
+\hline 
+\multicolumn{4}{|c|}{Networks topology: 3~inputs, 1~output and one hidden layer} \\
+\hline
+\hline
+Epochs & 125 & 250 & 500 \\ 
+\hline
+\hline
+Chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
+\hline
+10~neurons & 12.39\% & 14.06\% & 14.32\% \\
+25~neurons & 13.00\% & 14.28\% & 14.58\% \\
+40~neurons & 11.58\% & 13.47\% & 14.23\% \\
+\hline
+\hline
+Non chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
+\cline{2-4}
+%Epochs & 125 & 250 & 500 \\ 
+\hline
+10~neurons & 76.01\% & 74.04\% & 78.16\% \\
+25~neurons & 76.60\% & 72.13\% & 75.96\% \\
+40~neurons & 76.34\% & 75.63\% & 77.50\% \\
+\hline
+\hline
+Chaotic/non chaotic & \multicolumn{3}{c|}{Output = Strategy} \\
+\cline{2-4}
+%Epochs & 125 & 250 & 500 \\ 
+\hline
+10~neurons & 0.76\% & 0.97\% & 1.21\% \\
+25~neurons & 1.09\% & 0.73\% & 1.79\% \\
+40~neurons & 0.90\% & 1.02\% & 2.15\% \\
+\hline
+\multicolumn{4}{c}{} \\
+\hline
+Epochs & 1000 & 2500 & 5000 \\ 
+\hline
+\hline
+Chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
+\hline
+10~neurons & 14.51\% & 15.22\% & 15.22\% \\
+25~neurons & 16.95\% & 17.57\% & 18.46\% \\
+40~neurons & 17.73\% & 20.75\% & 22.62\% \\
+\hline
+\hline
+Non chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
+\cline{2-4}
+%Epochs & 1000 & 2500 & 5000 \\ 
+\hline
+10~neurons & 78.98\% & 80.02\% & 79.97\% \\
+25~neurons & 79.19\% & 81.59\% & 81.53\% \\
+40~neurons & 79.64\% & 81.37\% & 81.37\% \\
+\hline
+\hline
+Chaotic/non chaotic & \multicolumn{3}{c|}{Output = Strategy} \\
+\cline{2-4}
+%Epochs & 1000 & 2500 & 5000 \\ 
+\hline
+10~neurons & 3.47\% & 9.98\% & 11.66\% \\
+25~neurons & 3.92\% & 8.63\% & 10.09\% \\
+40~neurons & 3.29\% & 7.19\% & 7.18\% \\
+\hline
+\end{tabular}
+\end{table}
+
+\section{Conclusion}
+
+In  this paper,  we have  established an  equivalence  between chaotic
+iterations,  according to  the Devaney's  definition of  chaos,  and a
+class  of  multilayer perceptron  neural  networks.  Firstly, we  have
+described how to build a neural network that can be trained to learn a
+given chaotic map function.  Then,  we found a condition that allow to
+check whether the iterations induced by a function are chaotic or not,
+and thus  if a chaotic map  is obtained. Thanks to  this condition our
+approach is not limited to a particular function. In the dual case, we
+show  that  checking  if  a  neural network  is  chaotic  consists  in
+verifying  a property  on an  associated  graph, called  the graph  of
+iterations.   These results  are valid  for recurrent  neural networks
+with a  particular architecture.  However,  we believe that  a similar
+work can be done for  other neural network architectures.  Finally, we
+have  discovered at  least one  family of  problems with  a reasonable
+size, such  that artificial neural  networks should not be  applied in
+the  presence  of chaos,  due  to  their  inability to  learn  chaotic
+behaviors in this  context.  Such a consideration is  not reduced to a
+theoretical detail:  this family of discrete  iterations is concretely
+implemented  in a  new steganographic  method  \cite{guyeux10ter}.  As
+steganographic   detectors  embed  tools   like  neural   networks  to
+distinguish between  original and stego contents, our  studies tend to
+prove that such  detectors might be unable to  tackle with chaos-based
+information hiding schemes.  Furthermore, iterations such that not all
+of  the  components  are updated  at  each  step  are very  common  in
+biological  and  physics mechanisms.   Therefore,  one can  reasonably
+wonder whether neural networks should be applied in these contexts.
+
+In  future  work we  intend  to  enlarge  the comparison  between  the
+learning   of  truly   chaotic  and   non-chaotic   behaviors.   Other
+computational intelligence tools such  as support vector machines will
+be investigated  too, to  discover which tools  are the  most relevant
+when facing a truly chaotic phenomenon.  A comparison between learning
+rate  success  and  prediction  quality will  be  realized.   Concrete
+consequences in biology, physics, and computer science security fields
+will be  stated.  Lastly,  thresholds separating systems  depending on
+the ability to learn their dynamics will be established.
+
+\bibliography{chaos-paper}% Produces the bibliography via BibTeX.
+
+\end{document}
+%
+% ****** End of file chaos-paper.tex ******