C'est de la merde ce git

[chaos1.git] / main.tex
diff --git a/main.tex b/main.tex

index 545524feed5c07a727395d780b39820755607c45..49564db89def8615a0ad19f0d6b20ab7b6a7bc4f 100644 (file)
--- a/main.tex
+++ b/main.tex
@@ -39,9 +39,10 @@ preprint,%
  
  \begin{document}
  
-\title[Recurrent Neural Networks and Chaos]{Recurrent Neural Networks 
-and Chaos: Construction, Evaluation, \\
-and Prediction Ability}
+\title[Neural Networks and Chaos]{Neural Networks and Chaos:
+Construction, Evaluation of Chaotic Networks, \\
+and Prediction of Chaos with Multilayer Feedforward Networks
+}
  
  \author{Jacques M. Bahi}
  \author{Jean-Fran\c{c}ois Couchot}
@@ -58,6 +59,8 @@ IUT de Belfort-Montb\'eliard, BP 527, \\
  \date{\today}
  
  \newcommand{\CG}[1]{\begin{color}{red}\textit{#1}\end{color}}
+\newcommand{\JFC}[1]{\begin{color}{blue}\textit{#1}\end{color}}
+\newcommand{\MS}[1]{\begin{color}{green}\textit{#1}\end{color}}
  
  \begin{abstract}
  %% Text of abstract
@@ -65,50 +68,48 @@ Many  research works  deal with  chaotic neural  networks  for various
  fields  of application. Unfortunately,  up to  now these  networks are
  usually  claimed to be  chaotic without  any mathematical  proof.  The
  purpose of this paper is to establish, based on a rigorous theoretical
-framework, an equivalence between  chaotic iterations according to 
-Devaney  and  a  particular  class of  neural
-networks. On the one hand we show  how to build such a network, on the
-other  hand we provide  a method  to check  if a  neural network  is a
-chaotic one.  Finally, the ability of classical feedforward multilayer
-perceptrons to  learn sets of  data obtained from a  chaotic dynamical
-system is regarded.  Various  Boolean functions are iterated on finite
-states. Iterations of some of them  are proven to  be chaotic 
- as it is  defined by
-Devaney.  In that context, important differences occur in the training
-process,  establishing  with  various  neural  networks  that  chaotic
-behaviors are far more difficult to learn.
+framework,  an  equivalence between  chaotic  iterations according  to
+Devaney and a particular class of  neural networks. On the one hand we
+show  how to build  such a  network, on  the other  hand we  provide a
+method to  check if a neural  network is a chaotic  one.  Finally, the
+ability of classical feedforward  multilayer perceptrons to learn sets
+of data obtained from a dynamical system is regarded.  Various Boolean
+functions are  iterated on finite  states. Iterations of some  of them
+are  proven to  be  chaotic as  it  is defined  by  Devaney.  In  that
+context,  important   differences  occur  in   the  training  process,
+establishing with  various neural networks that  chaotic behaviors are
+far more difficult to learn.
  \end{abstract}
  
  %%\pacs{Valid PACS appear here}% PACS, the Physics and Astronomy
-                             % Classification Scheme.
+% Classification Scheme.
  %%\keywords{Suggested keywords}%Use showkeys class option if keyword
-                              %display desired
+%display desired
  \maketitle
  
  \begin{quotation}
-
  Chaotic neural  networks have received a  lot of attention  due to the
  appealing   properties  of   deterministic   chaos  (unpredictability,
  sensitivity,  and so  on). However,  such networks  are  often claimed
  chaotic without  any rigorous mathematical proof.   Therefore, in this
  work  a theoretical  framework based  on the  Devaney's  definition of
-chaos  is introduced.   Starting with  a relationship  between chaotic
+chaos is  introduced.  Starting  with a relationship  between discrete
  iterations  and  Devaney's chaos,  we  firstly  show  how to  build  a
-recurrent  neural networks  that is  equivalent to  a chaotic  map and
-secondly  a way  to check  whether  an already  available network,  is
-chaotic  or not.  We also  study different  topological  properties of
+recurrent  neural network  that is  equivalent  to a  chaotic map  and
+secondly  a way  to  check  whether an  already  available network  is
+chaotic  or not.  We  also study  different topological  properties of
  these  truly  chaotic neural  networks.   Finally,  we  show that  the
-learning,  with neural  networks  having a  feedforward structure,  of
-chaotic behaviors represented by data sets obtained from chaotic maps,
-is far more difficult than non chaotic behaviors.
+learning,  with   neural  networks  having   a  classical  feedforward
+structure, of chaotic behaviors represented by data sets obtained from
+chaotic maps, is far more difficult than non chaotic behaviors.
  \end{quotation}
  
  %% Main text
  \section{Introduction}
  \label{S1}
  
-Several research  works have proposed  or run chaotic  neural networks
-these last  years.  The complex dynamics  of such a  networks leads to
+Several research  works have proposed or used  chaotic neural networks
+these  last years.   The complex  dynamics  of such  networks lead  to
  various       potential      application       areas:      associative
  memories~\cite{Crook2007267}  and  digital  security tools  like  hash
  functions~\cite{Xiao10},                                       digital
@@ -117,11 +118,11 @@ schemes~\cite{Lian20091296}.  In the  former case, the background idea
  is to  control chaotic dynamics in  order to store  patterns, with the
  key advantage  of offering a  large storage capacity.  For  the latter
  case,   the  use   of   chaotic  dynamics   is   motivated  by   their
-unpredictability and  random-like behaviors.  Thus,  investigating new
-concepts is crucial in this  field, because new threats are constantly
-emerging.   As an illustrative  example, the  former standard  in hash
-functions,  namely the  SHA-1  algorithm, has  been recently  weakened
-after flaws were discovered.
+unpredictability and random-like behaviors.  Indeed, investigating new
+concepts  is crucial  for  the computer  security  field, because  new
+threats  are constantly  emerging.   As an  illustrative example,  the
+former  standard in hash  functions, namely  the SHA-1  algorithm, has
+been recently weakened after flaws were discovered.
  
  Chaotic neural networks have  been built with different approaches. In
  the context of associative  memory, chaotic neurons like the nonlinear
@@ -131,48 +132,46 @@ which  is usually  assessed through  the computation  of  the Lyapunov
  exponent.  An alternative approach  is to consider a well-known neural
  network architecture: the  MultiLayer Perceptron (MLP). These networks
  are  suitable to model  nonlinear relationships  between data,  due to
-their universal approximator capacity. Thus, this kind of networks can
-be trained to model a physical  phenomenon known to be chaotic such as
-Chua's circuit  \cite{dalkiran10}.  Sometimes, a  neural network which
-is build by combining  transfer functions and initial conditions that are both
-chaotic,      is      itself      claimed      to      be      chaotic
+their            universal            approximator            capacity
+\cite{Cybenko89,DBLP:journals/nn/HornikSW89}.   Thus,   this  kind  of
+networks can  be trained  to model a  physical phenomenon known  to be
+chaotic such  as Chua's circuit \cite{dalkiran10}.   Sometime a neural
+network, which  is build by  combining transfer functions  and initial
+conditions  that are  both chaotic,  is itself  claimed to  be chaotic
  \cite{springerlink:10.1007/s00521-010-0432-2}.
  
  What all of these chaotic neural  networks have in common is that they
  are claimed to be chaotic  despite a lack of any rigorous mathematical
-proof.   The first contribution of  this paper  is  to fill  this  gap, 
-using  a theoretical  framework  based on  the  Devaney's  definition of  chaos
-\cite{Devaney}.   This  mathematical  theory  of chaos  provides  both
+proof.   The first contribution  of this  paper is  to fill  this gap,
+using  a theoretical framework  based on  the Devaney's  definition of
+chaos \cite{Devaney}.  This mathematical theory of chaos provides both
  qualitative and quantitative tools to evaluate the complex behavior of
  a  dynamical  system:  ergodicity,   expansivity,  and  so  on.   More
  precisely, in  this paper,  which is an  extension of a  previous work
-\cite{bgs11:ip},   we  establish   the  equivalence  between   chaotic
-iterations and  a class of globally  recurrent MLP.
-The investigation the  converse problem is the second contribution: 
-we indeed study the  ability for
-classical MultiLayer Perceptrons  to learn a particular family of
-discrete  chaotic  dynamical  systems.   This family,  called  chaotic
-iterations, is defined by a  Boolean vector, an update function, and a
-sequence giving  which component  to update at  each iteration.   It has
-been  previously established  that such  dynamical systems is 
-chaotically iterated (as it is defined by Devaney) when the chosen function has
-a  strongly connected  iterations graph.   In this  document,  we 
-experiment several MLPs and try to learn some iterations of this kind.
-We  show that non-chaotic iterations can be learned, whereas it is
-far  more  difficult  for chaotic  ones.   That  is  to say,  we  have
-discovered at  least one  family of problems  with a  reasonable size,
-such  that artificial  neural networks  should not  be applied  
-due to their inability to  learn chaotic behaviors in this context.
+\cite{bgs11:ip},   we  establish   the  equivalence   between  chaotic
+iterations  and  a  class  of  globally  recurrent  MLP.   The  second
+contribution is a study of the converse problem, indeed we investigate
+the ability of classical  multilayer perceptrons to learn a particular
+family of discrete chaotic  dynamical systems.  This family is defined
+by a Boolean vector, an update function, and a sequence defining the
+component  to  update  at  each  iteration.  It  has  been  previously
+established that  such dynamical systems are  chaotically iterated (as
+it  is defined by  Devaney) when  the chosen  function has  a strongly
+connected iterations  graph.  In this document,  we experiment several
+MLPs and  try to  learn some  iterations of this  kind.  We  show that
+non-chaotic  iterations  can  be  learned,  whereas  it  is  far  more
+difficult for  chaotic ones.   That is to  say, we have  discovered at
+least  one  family of  problems  with  a  reasonable size,  such  that
+artificial  neural  networks  should  not  be  applied  due  to  their
+inability to learn chaotic behaviors in this context.
  
  The remainder of this research  work is organized as follows. The next
-section is devoted  to the basics of Devaney's
-chaos.   Section~\ref{S2} formally  describes  how to  build a  neural
-network  that operates  chaotically.  Section~\ref{S3} is 
-devoted to the dual case of  checking whether an existing neural network
-is chaotic  or not. 
-Topological properties  of   chaotic  neural  networks  
-are discussed in Sect.~\ref{S4}.   The
-Section~\ref{section:translation}   shows   how   to  translate   such
+section is devoted to the basics of Devaney's chaos.  Section~\ref{S2}
+formally  describes  how  to  build  a neural  network  that  operates
+chaotically.  Section~\ref{S3} is devoted to the dual case of checking
+whether  an existing neural  network is  chaotic or  not.  Topological
+properties of chaotic neural networks are discussed in Sect.~\ref{S4}.
+The  Section~\ref{section:translation}  shows  how to  translate  such
  iterations  into  an Artificial  Neural  Network  (ANN),  in order  to
  evaluate the  capability for this  latter to learn  chaotic behaviors.
  This  ability  is  studied in  Sect.~\ref{section:experiments},  where
@@ -185,15 +184,18 @@ is summed up and intended future work is exposed.
  \section{Chaotic Iterations according to  Devaney}
  
  In this section, the well-established notion of Devaney's mathematical
-chaos is  firstly recalled.  Preservation of  the unpredictability of
+chaos is  firstly recalled.   Preservation of the  unpredictability of
  such dynamical  system when implemented  on a computer is  obtained by
-using  some discrete iterations  called ``asynchronous  iterations'', which
-are thus introduced.  The result establishing the link between such
-iterations and Devaney's chaos is  finally presented at the end of this
-section.
+using  some discrete  iterations  called ``asynchronous  iterations'',
+which are  thus introduced.  The result establishing  the link between
+such iterations and Devaney's chaos is finally presented at the end of
+this section.
  
  In what follows and for  any function $f$, $f^n$ means the composition
-$f \circ f \circ \hdots \circ f$ ($n$ times).
+$f \circ f \circ \hdots \circ f$ ($n$ times) and an {\bf iteration} of
+a {\bf  dynamical system}  is the step  that consists in  updating the
+global  state  $x^t$  at time  $t$  with  respect  to a  function  $f$
+s.t. $x^{t+1} = f(x^t)$.
  
  \subsection{Devaney's chaotic dynamical systems}
  
@@ -213,7 +215,7 @@ mathematically this  kind of unpredictability  is also referred  to as
  deterministic chaos.  For example, many weather forecast models exist,
  but they give only suitable predictions for about a week, because they
  are initialized with conditions  that reflect only a partial knowledge
-of the current weather.  Even  the  differences are  initially small,
+of the current weather.  Even  if the differences are initially small,
  they are  amplified in the course  of time, and thus  make difficult a
  long-term prediction.  In fact,  in a chaotic system, an approximation
  of  the current  state is  a  quite useless  indicator for  predicting
@@ -222,86 +224,70 @@ future states.
  From  mathematical  point  of   view,  deterministic  chaos  has  been
  thoroughly studied  these last decades, with  different research works
  that  have   provide  various  definitions  of   chaos.   Among  these
-definitions,   the  one  given   by  Devaney~\cite{Devaney}   is  
-well-established.    This   definition   consists  of   three   conditions:
+definitions,    the   one    given   by    Devaney~\cite{Devaney}   is
+well-established.   This  definition  consists  of  three  conditions:
  topological  transitivity, density of  periodic points,  and sensitive
  point dependence on initial conditions.
  
-Topological  transitivity   is  checked  when,  for   any  point,  any
+{\bf  Topological transitivity} is  checked when,  for any  point, any
  neighborhood of its future evolution eventually overlap with any other
-given region. More precisely,
-
-\begin{definition} \label{def2}
-A continuous function $f$  on a topological space $(\mathcal{X},\tau)$
-is defined  to be  {\bf topologically transitive}  if for any  pair of
-open  sets $U$,  $V  \in  \mathcal{X}$ there  exists  
-$k \in
-\mathds{N}^{\ast}$
- such  that
-$f^k(U) \cap V \neq \emptyset$.
-\end{definition}
-
-This property  implies that a  dynamical system cannot be  broken into
-simpler  subsystems.  
-Intuitively, its complexity does not allow any simplification.
-On  the contrary,  a dense set  of periodic points  is an
-element of regularity that a chaotic dynamical system has to exhibit.
-
-\begin{definition} \label{def3}
-A point $x$ is called a  {\bf periodic point} for $f$ of period~$n \in
-\mathds{N}^{\ast}$ if $f^{n}(x)=x$.
-\end{definition}
-
-\begin{definition} \label{def4}
-$f$ is said to be {\bf  regular} on $(\mathcal{X},\tau)$ if the set of
-  periodic points  for $f$ is  dense in $\mathcal{X}$ ( for any $x \in
-  \mathcal{X}$, we can find at least  one periodic point in any of its
-  neighborhood).
-\end{definition}
-
-This  regularity ``counteracts'' the  effects of  transitivity.  Thus,
-due to these two properties, two points close to each other can behave
-in a completely different  manner, leading to unpredictability for the
-whole system. Then,
-
-\begin{definition} \label{sensitivity}
-$f$  has {\bf  sensitive dependence  on initial  conditions}  if there
-  exists $\delta  >0$ such  that, for any  $x\in \mathcal{X}$  and any
-  neighborhood $V$ of $x$, there exist  $y\in V$ and $n > 0$ such that
-  $d\left(f^{n}(x), f^{n}(y)\right) >\delta  $. The value $\delta$ is called the
-  {\bf constant of sensitivity} of $f$.
-\end{definition}
-
-Finally,
-
-\begin{definition} \label{def5}
-The dynamical system that iterates $f$ is {\bf chaotic according to Devaney}  
-on $(\mathcal{X},\tau)$ if $f$  is   regular,  topologically  transitive,   
-and  has  sensitive dependence to its initial conditions.
-\end{definition}
-
-Let us notice that for a  metric space the last condition follows from
-the two first ones~\cite{Banks92}.
+given region.  This property implies that a dynamical system cannot be
+broken into simpler subsystems.   Intuitively, its complexity does not
+allow any  simplification.  
+
+However, chaos needs some regularity to ``counteracts'' the effects of
+transitivity. % Thus, two points close to each other can behave in a completely different manner, the first one visiting the whole space whereas the second one has a regular orbit.
+In the Devaney's formulation, a dense set  of periodic
+points is the element of regularity that a chaotic dynamical system has
+to exhibit.
+%\begin{definition} \label{def3}
+We recall that a  point $x$ is a {\bf periodic point} for $f$ of 
+period~$n \in \mathds{N}^{\ast}$ if $f^{n}(x)=x$.
+%\end{definition}
+Then, the map 
+%\begin{definition} \label{def4}
+$f$ is {\bf regular}  on the topological space $(\mathcal{X},\tau)$ if
+the set of periodic points for  $f$ is dense in $\mathcal{X}$ (for any
+$x \in \mathcal{X}$, we can find at least one periodic point in any of
+its neighborhood).
+%\end{definition}
+Thus, due to these two properties,  two points close to each other can
+behave in  a completely different manner,  leading to unpredictability
+for the whole system.
+
+Let  us recall  that  $f$  has {\bf  sensitive  dependence on  initial
+  conditions} if  there exists  $\delta >0$ such  that, for  any $x\in
+\mathcal{X}$ and any neighborhood $V$ of $x$, there exist $y\in V$ and
+$n  > 0$ such  that $d\left(f^{n}(x),  f^{n}(y)\right) >\delta  $. The
+value $\delta$ is called the {\bf constant of sensitivity} of $f$.
+
+Finally,  the  dynamical system  that  iterates  $f$  is {\bf  chaotic
+  according  to Devaney}  on $(\mathcal{X},\tau)$  if $f$  is regular,
+topologically transitive, and has  sensitive dependence to its initial
+conditions.   In  what follows,  iterations  are  said  to be  chaotic
+(according  to Devaney)  when  the corresponding  dynamical system  is
+chaotic, as it is defined in the Devaney's formulation.
+
+%Let us notice that for a  metric space the last condition follows from
+%the two first ones~\cite{Banks92}.
  
  \subsection{Asynchronous Iterations}
  
  %This section  presents some basics on  topological chaotic iterations.
  Let us  firstly discuss about the  domain of iteration.  As  far as we
-know, no result rules that the chaotic behavior of a dynamical system 
-that has been theoretically proven on  $\R$ remains valid on the
-floating-point
-numbers, which is  the implementation domain.  Thus, to  avoid loss of
-chaos this  work presents an  alternative, that is to  iterate Boolean
-maps:  results that  are  theoretically obtained  in  that domain  are
-preserved in implementations.
+know, no result rules that  the chaotic behavior of a dynamical system
+that  has been  theoretically  proven  on $\R$  remains  valid on  the
+floating-point numbers, which is  the implementation domain.  Thus, to
+avoid  loss of chaos  this work  presents an  alternative, that  is to
+iterate Boolean maps: results  that are theoretically obtained in that
+domain are preserved in implementations.
  
  Let us denote by $\llbracket  a ; b \rrbracket$ the following interval
-of integers:  $\{a, a+1,  \hdots, b\}$, where $a~<~b$.  
-In that section,  a  system 
-under   consideration    iteratively   modifies   a    collection   of
-$n$~components.   Each component  $i \in  \llbracket 1;  n \rrbracket$
-takes  its  value  $x_i$  among the  domain  $\Bool=\{0,1\}$.   A~{\it
-  configuration} of the system at discrete time $t$  is the vector
+of integers: $\{a, a+1, \hdots, b\}$, where $a~<~b$.  In this section,
+a {\it  system} under consideration iteratively  modifies a collection
+of $n$~components.  Each component  $i \in \llbracket 1; n \rrbracket$
+takes  its  value $x_i$  among  the  domain  $\Bool=\{0,1\}$.  A  {\it
+  configuration} of the system at discrete time $t$ is the vector
  %\begin{equation*}   
  $x^{t}=(x_1^{t},\ldots,x_{n}^{t}) \in \Bool^n$.
  %\end{equation*}
@@ -313,33 +299,45 @@ $f(x)=(f_1(x),\ldots,f_n(x))$.
  % Notice that $f^k$ denotes the 
  % $k-$th composition $f\circ \ldots \circ f$ of the function $f$.
  
-Let    be   given    a    configuration   $x$.    In   what    follows
+Let    be   given    a   configuration    $x$.    In    what   follows
  $N(i,x)=(x_1,\ldots,\overline{x_i},\ldots,x_n)$  is  the configuration
  obtained by switching the $i-$th component of $x$ ($\overline{x_i}$ is
  indeed  the negation  of $x_i$).   Intuitively, $x$  and  $N(i,x)$ are
  neighbors.   The discrete  iterations of  $f$ are  represented  by the
  oriented  {\it graph  of iterations}  $\Gamma(f)$.  In  such  a graph,
  vertices are configurations  of $\Bool^n$ and there is  an arc labeled
-$i$ from $x$ to $N(i,x)$ if and only if  $f_i(x)$ is $N(i,x)$.
+$i$ from $x$ to $N(i,x)$ if and only if $f_i(x)$ is $N(i,x)$. 
  
  In the  sequel, the  {\it strategy} $S=(S^{t})^{t  \in \Nats}$  is the
-sequence  defining which  component to  update at  time $t$  and $S^{t}$
-denotes its  $t-$th term. 
-This iteration scheme that only modifies one element at each iteration 
-is clasically refered as \emph{asynchronous iterations}. 
-Next section shows the link between asynchronous iterations and 
-Devaney's Chaos.
+sequence defining  which component to  update at time $t$  and $S^{t}$
+denotes its $t-$th term.  This iteration scheme that only modifies one
+element at each iteration is usually referred as {\it asynchronous
+  iterations}.  More precisely, we have for any $i$, $1\le i \le n$,
+\begin{equation}
+\left\{ \begin{array}{l}
+x^{0}  \in \Bool^n \\ 
+x^{t+1}_i = \left\{ 
+\begin{array}{l}
+  f_i(x^t) \textrm{ if $S^t = i$}\enspace , \\
+  x_i^t \textrm{ otherwise}\enspace . 
+ \end{array}
+\right.
+\end{array} \right.
+\end{equation}
+
+Next  section  shows  the  link between  asynchronous  iterations  and
+Devaney's chaos.
  
  \subsection{On the link between asynchronous iterations and
    Devaney's Chaos}
  
  In  this subsection  we recall  the link  we have  established between
-asynchronous iterations and Devaney's  chaos.  The theoretical framework is
-fully described in \cite{guyeux09}.
+asynchronous   iterations  and   Devaney's  chaos.    The  theoretical
+framework is fully described in \cite{guyeux09}.
  
-We  introduce the function $F_{f}$  that is
-defined  for  any given  application  $f:\Bool^{n}  \to \Bool^{n}$  by
-$F_{f}:   \llbracket1;n\rrbracket\times   \mathds{B}^{n}   \rightarrow
+We  introduce the  function  $F_{f}$  that is  defined  for any  given
+application     $f:\Bool^{n}     \to     \Bool^{n}$     by     $F_{f}:
+\llbracket1;n\rrbracket\times        \mathds{B}^{n}        \rightarrow
  \mathds{B}^{n}$, s.t.
  \begin{equation}
  \label{eq:CIs}
@@ -352,8 +350,8 @@ $F_{f}:   \llbracket1;n\rrbracket\times   \mathds{B}^{n}   \rightarrow
      \right. 
  \end{equation}
  
-\noindent With such a notation, configurations are defined for times 
-$t=0,1,2,\ldots$ by:
+\noindent With such a notation, asynchronously obtained configurations
+are defined for times \linebreak $t=0,1,2,\ldots$ by:
  \begin{equation}\label{eq:sync}   
  \left\{\begin{array}{l}   
    x^{0}\in \mathds{B}^{n} \textrm{ and}\\
@@ -375,17 +373,17 @@ X^{k+1}& = & G_{f}(X^{k})\\
  \right.
  \label{eq:Gf}
  \end{equation}
-where  $\sigma$ is the  function that  removes the  first term  of the
-strategy  ({\it i.e.},~$S^0$).   This definition  means that  only one
-component  of the  system is  updated  at an  iteration: the  $S^t$-th
-element. But it can be extended by considering subsets for $S^t$.
-
-
-
-
-The   {\bf   distance}   $d$    between   two   points   $(S,x)$   and
-$(\check{S},\check{x})\in  \mathcal{X} = \llbracket1;n\rrbracket^\Nats
-\times \Bool^{n}$ is defined by
+where $\sigma$ is the so-called  shift function that removes the first
+term of  the strategy ({\it i.e.},~$S^0$).  This  definition allows to
+link asynchronous iterations with  classical iterations of a dynamical
+system. Note that it can be extended by considering subsets for $S^t$.
+
+To study topological properties of  these iterations, we are then left
+to  introduce a  {\bf distance}  $d$  between two  points $(S,x)$  and
+$(\check{S},\check{x})$           in           $\mathcal{X}          =
+\llbracket1;n\rrbracket^\Nats \times \Bool^{n}$. Let $\Delta(x,y) = 0$
+if   $x=y$,  and   $\Delta(x,y)  =   1$   else,  be   a  distance   on
+$\mathds{B}$. The distance $d$ is defined by
  \begin{equation}
  d((S,x);(\check{S},\check{x}))=d_{e}(x,\check{x})+d_{s}(S,\check{S})
  \enspace ,
@@ -403,28 +401,40 @@ d_{s}(S,\check{S})=\frac{9}{2n}\sum_{t=0}^{\infty
  
  This    distance    is    defined    to    reflect    the    following
  information. Firstly, the more  two systems have different components,
-the  larger the  distance  between them is.  Secondly,  two systems  with
+the  larger the  distance between  them.  Secondly,  two  systems with
  similar components and strategies, which have the same starting terms,
-must  induce only  a small  distance.  The  proposed  distance fulfill
+must  induce only a  small distance.   The proposed  distance fulfills
  these  requirements: on  the one  hand  its floor  value reflects  the
  difference between  the cells, on  the other hand its  fractional part
  measures the difference between the strategies.
  
-The relation  between $\Gamma(f)$ and  $G_f$ is clear: there  exists a
+The relation  between $\Gamma(f)$ and  $G_f$ is obvious: there  exists a
  path from  $x$ to $x'$  in $\Gamma(f)$ if  and only if there  exists a
-strategy  $s$ such  that  the  parallel iteration  of  $G_f$ from  the
-initial  point $(s,x)$  reaches  the configuration  $x'$.  Using  this
-link, Guyeux~\cite{GuyeuxThese10} has proven that,
+strategy  $s$ such  that iterations  of $G_f$  from the  initial point
+$(s,x)$   reach   the   configuration   $x'$.   Using   this   link,
+Guyeux~\cite{GuyeuxThese10} has proven that,
  \begin{theorem}%[Characterization of $\mathcal{C}$]
  \label{Th:Caracterisation   des   IC   chaotiques}  
  Let $f:\Bool^n\to\Bool^n$. Iterations of $G_f$ are chaotic  according 
  to  Devaney if and only if  $\Gamma(f)$ is strongly connected.
  \end{theorem}
  
-Checking  if a  graph  is  strongly connected  is  not difficult.  For
-example,  consider the  function $f_1\left(x_1,\dots,x_n\right)=\left(
-\overline{x_1},x_1,x_2,\dots,x_{n-1}\right)$. As $\Gamma(f_1)$ is
-obviously strongly connected, then $G_{f_1}$ is a chaotic map.
+Checking if  a graph  is strongly connected  is not difficult  (by the
+Tarjan's algorithm for  instance).  Let be given a  strategy $S$ and a
+function  $f$ such that  $\Gamma(f)$ is  strongly connected.   In that
+case, iterations of the function $G_f$ as defined in Eq.~(\ref{eq:Gf})
+are chaotic according to Devaney.
+
+
+Let   us  then   define  two   functions  $f_0$   and  $f_1$   both  in
+$\Bool^n\to\Bool^n$ that are used all along this paper.  The former is
+the   vectorial  negation,   \textit{i.e.},  $f_{0}(x_{1},\dots,x_{n})
+=(\overline{x_{1}},\dots,\overline{x_{n}})$.      The     latter    is
+$f_1\left(x_1,\dots,x_n\right)=\left(
+\overline{x_1},x_1,x_2,\dots,x_{n-1}\right)$.  It is not hard to check
+that $\Gamma(f_0)$ and $\Gamma(f_1)$ are both strongly connected, then
+iterations  of $G_{f_0}$  and of  $G_{f_1}$ are  chaotic  according to
+Devaney.
  
  With this  material, we are now  able to build a  first chaotic neural
  network, as defined in the Devaney's formulation.
@@ -432,18 +442,15 @@ network, as defined in the Devaney's formulation.
  \section{A chaotic neural network in the sense of Devaney}
  \label{S2}
  
-Let     us     firstly     introduce    the     vectorial     negation
-$f_{0}(x_{1},\dots,x_{n}) =(\overline{x_{1}},\dots,\overline{x_{n}})$,
-which is  such that $\Gamma(f_0)$ is  strongly connected.  Considering
-the map  $F_{f_0}:\llbracket 1;  n \rrbracket \times  \mathds{B}^n \to
-\mathds{B}^n$  associated to the  vectorial negation,  we can  build a
-multilayer perceptron  neural network modeling  $F_{f_0}$.  Hence, for
-all inputs  $(s,x) \in \llbracket  1;n\rrbracket \times \mathds{B}^n$,
-the output layer will produce  $F_{f_0}(s,x)$.  It is then possible to
-link  the output  layer  and the  input  one, in  order  to model  the
-dependence between two successive iterations.  As a result we obtain a
-global  recurrent   neural  network  that  behaves   as  follows  (see
-Fig.~\ref{Fig:perceptron}).
+Let  us   build  a  multilayer  perceptron   neural  network  modeling
+$F_{f_0}:\llbracket   1;   n   \rrbracket  \times   \mathds{B}^n   \to
+\mathds{B}^n$ associated  to the vectorial  negation.  More precisely,
+for   all   inputs   $(s,x)   \in  \llbracket   1;n\rrbracket   \times
+\mathds{B}^n$, the  output layer  will produce $F_{f_0}(s,x)$.   It is
+then possible to link the output  layer and the input one, in order to
+model the  dependence between two successive iterations.   As a result
+we obtain  a global recurrent  neural network that behaves  as follows
+(see Fig.~\ref{Fig:perceptron}).
  
  \begin{itemize}
  \item   The   network   is   initialized   with   the   input   vector
@@ -455,9 +462,9 @@ Fig.~\ref{Fig:perceptron}).
  \item When  the network  is activated at  the $t^{th}$  iteration, the
    state of the system $x^t  \in \mathds{B}^n$ received from the output
    layer and  the initial  term of the  sequence $(S^t)^{t  \in \Nats}$
-  ($S^0  \in \llbracket 1;n\rrbracket$)  are used  to compute  the new
-  output.   This new  output, which  represents the  new state  of the
-  dynamical system, satisfies:
+  (\textit{i.e.},  $S^0  \in \llbracket  1;n\rrbracket$)  are used  to
+  compute the  new output vector.   This new vector,  which represents
+  the new state of the dynamical system, satisfies:
    \begin{equation}
      x^{t+1}=F_{f_0}(S^0, x^t) \in \mathds{B}^n \enspace .
    \end{equation}
@@ -475,14 +482,14 @@ is  $x^0~\in~\mathds{B}^n$ and  a  sequence $(S^t)^{t  \in \Nats}$  is
  given  as outside  input, then  the sequence  of  successive published
  output vectors $\left(x^t\right)^{t \in \mathds{N}^{\ast}}$ is exactly
  the  one produced  by  the chaotic  iterations  formally described  in
-Eq.~(\ref{eq:CIs}).  It  means that  mathematically if we  use similar
+Eq.~(\ref{eq:Gf}).   It means  that mathematically  if we  use similar
  input  vectors   they  both  generate  the   same  successive  outputs
  $\left(x^t\right)^{t \in \mathds{N}^{\ast}}$,  and therefore that they
  are  equivalent  reformulations  of  the iterations  of  $G_{f_0}$  in
-$\mathcal{X}$. Finally, since the  proposed neural network is build to
-model  the  behavior  of  $G_{f_0}$,  which is  chaotic  according  to
-Devaney's definition  of chaos,  we can conclude  that the  network is
-also chaotic in this sense.
+$\mathcal{X}$. Finally, since the  proposed neural network is built to
+model  the  behavior  of   $G_{f_0}$,  whose  iterations  are  chaotic
+according to the  Devaney's definition of chaos, we  can conclude that
+the network is also chaotic in this sense.
  
  The previous construction scheme  is not restricted to function $f_0$.
  It can  be extended to any function  $f$ such that $G_f$  is a chaotic
@@ -497,24 +504,23 @@ chaotic neural network by using $f_1$ instead of $f_0$.
  \label{S3}
  
  We focus now on the case  where a neural network is already available,
-and for which we  want to know if it is chaotic. Typically, in
-many research papers neural network  are usually claimed to be chaotic
+and for  which we want  to know if  it is chaotic. Typically,  in many
+research  papers neural  network  are usually  claimed  to be  chaotic
  without any  convincing mathematical proof. We propose  an approach to
  overcome  this  drawback  for  a  particular  category  of  multilayer
  perceptrons defined below, and for the Devaney's formulation of chaos.
  In  spite of  this restriction,  we think  that this  approach  can be
-extended to  a large variety  of neural networks.  We plan to  study a
-generalization of this approach in a future work.
+extended to a large variety of neural networks.
  
-We consider a multilayer perceptron  of the following form: inputs
-are $n$ binary digits and  one integer value, while outputs are  $n$
-bits.   Moreover, each  binary  output is  connected  with a  feedback
-connection to an input one.
+We consider a multilayer perceptron  of the following form: inputs are
+$n$ binary digits  and one integer value, while  outputs are $n$ bits.
+Moreover, each  binary output is connected with  a feedback connection
+to an input one.
  
  \begin{itemize}
-\item During  initialization, the network is seeded with $n$~bits denoted
-  $\left(x^0_1,\dots,x^0_n\right)$  and an  integer  value $S^0$  that
-  belongs to $\llbracket1;n\rrbracket$.
+\item  During  initialization, the  network  is  seeded with  $n$~bits
+  denoted $\left(x^0_1,\dots,x^0_n\right)$ and  an integer value $S^0$
+  that belongs to $\llbracket1;n\rrbracket$.
  \item     At     iteration~$t$,     the     last     output     vector
    $\left(x^t_1,\dots,x^t_n\right)$   defines  the  $n$~bits   used  to
    compute the new output one $\left(x^{t+1}_1,\dots,x^{t+1}_n\right)$.
@@ -538,16 +544,19 @@ $f\left(x_1,x_2,\dots,x_n\right)$ is equal to
  \left(F\left(1,\left(x_1,x_2,\dots,x_n\right)\right),\dots,
    F\left(n,\left(x_1,x_2,\dots,x_n\right)\right)\right) \enspace .
  \end{equation}
-Then $F=F_f$. If this recurrent  neural network is seeded with 
+Thus, for any $j$, $1 \le j \le n$, we have 
+$f_j\left(x_1,x_2,\dots,x_n\right) = 
+F\left(j,\left(x_1,x_2,\dots,x_n\right)\right)$.
+If this recurrent  neural network is seeded with 
  $\left(x_1^0,\dots,x_n^0\right)$    and   $S   \in    \llbracket   1;n
  \rrbracket^{\mathds{N}}$, it produces  exactly the
  same      output      vectors  than the 
  chaotic iterations of $F_f$  with initial
  condition  $\left(S,(x_1^0,\dots,  x_n^0)\right)  \in  \llbracket  1;n
  \rrbracket^{\mathds{N}}  \times \mathds{B}^n$.
-Theoretically speaking, such iterations of $F_f$ are thus a formal model of  
-these kind of recurrent neural networks. In the  rest  of this
-paper,  we will  call such  multilayer perceptrons  CI-MLP($f$), which
+Theoretically  speaking, such iterations  of $F_f$  are thus  a formal
+model of these kind of recurrent  neural networks. In the rest of this
+paper, we will call such multilayer perceptrons ``CI-MLP($f$)'', which
  stands for ``Chaotic Iterations based MultiLayer Perceptron''.
  
  Checking  if CI-MLP($f$)  behaves chaotically  according  to Devaney's
@@ -572,28 +581,37 @@ A  function   $f$  is   said  to  be   {\bf  expansive}   if  $\exists
  that $d\left(f^n(x),f^n(y)\right) \geq \varepsilon$.
  \end{definition}
  
+\noindent In  other words, a small  error on any  initial condition is
+always amplified  until $\varepsilon$,  which denotes the  constant of
+expansivity of $f$.
+
  \begin{definition} \label{def9}
  A discrete dynamical  system is said to be  {\bf topologically mixing}
  if  and only  if,  for any  pair  of disjoint  open  sets $U$,$V  \neq
-\emptyset$, we can find some $n_0  \in \mathds{N}$ such that  for any $n$, 
-$n\geq n_0$, we have $f^n(U) \cap V \neq \emptyset$.
+\emptyset$, we  can find some $n_0  \in \mathds{N}$ such  that for any
+$n$, $n\geq n_0$, we have $f^n(U) \cap V \neq \emptyset$.
  \end{definition}
  
-As  proven in Ref.~\cite{gfb10:ip},  chaotic iterations  are expansive
-and  topologically mixing when  $f$ is  the vectorial  negation $f_0$.
-Consequently,  these  properties are  inherited  by the  CI-MLP($f_0$)
-recurrent neural network previously presented, which induce a greater
-unpredictability.  Any  difference on the  initial value of  the input
-layer is  in particular  magnified up to  be equal to  the expansivity
-constant.
+\noindent Topologically mixing means that the dynamical system evolves
+in  time such that  any given  region of  its topological  space might
+overlap with any other region.
+
+It has been proven in Ref.~\cite{gfb10:ip} that chaotic iterations are
+expansive and topologically mixing  when $f$ is the vectorial negation
+$f_0$.    Consequently,  these   properties  are   inherited   by  the
+CI-MLP($f_0$)  recurrent neural  network  previously presented,  which
+induce  a greater  unpredictability.   Any difference  on the  initial
+value of the input layer is  in particular magnified up to be equal to
+the expansivity constant.
  
-Let us then focus on the consequences for  a neural  network to  be chaotic
-according  to  Devaney's definition.  First  of  all, the  topological
-transitivity property implies indecomposability.
+Let  us then  focus on  the consequences  for a  neural network  to be
+chaotic   according  to   Devaney's   definition.   Intuitively,   the
+topological transitivity property  implies indecomposability, which is
+formally defined as follows:
  
  \begin{definition} \label{def10}
-A   dynamical   system   $\left(   \mathcal{X},  f\right)$   is   {\bf
-not decomposable}  if it  is not  the  union of  two closed  sets $A,  B
+A  dynamical  system  $\left(   \mathcal{X},  f\right)$  is  {\bf  not
+  decomposable}  if it  is not  the  union of  two closed  sets $A,  B
  \subset \mathcal{X}$ such that $f(A) \subset A, f(B) \subset B$.
  \end{definition}
  
@@ -604,41 +622,47 @@ strongly transitive:
  
  \begin{definition} \label{def11}
  A  dynamical system  $\left( \mathcal{X},  f\right)$ is  {\bf strongly
-transitive} if $\forall x,y  \in \mathcal{X}$, $\forall r>0$, $\exists
-z  \in  \mathcal{X}$,  $d(z,x)~\leq~r  \Rightarrow  \exists  n  \in
+  transitive}  if  $\forall   x,y  \in  \mathcal{X}$,  $\forall  r>0$,
+$\exists z \in \mathcal{X}$,  $d(z,x)~\leq~r \Rightarrow \exists n \in
  \mathds{N}^{\ast}$, $f^n(z)=y$.
  \end{definition}
-According to this definition, for all  pairs of points $(x, y)$ in the
-phase space, a point $z$ can  be found in the neighborhood of $x$ such
-that one of its iterates $f^n(z)$ is $y$. Indeed, this result has been
-established  during  the  proof   of  the  transitivity  presented  in
-Ref.~\cite{guyeux09}.  Among  other  things, the  strong  transitivity
-leads  to the fact  that without  the knowledge  of the  initial input
-layer, all outputs are possible.  Additionally, no point of the output
-space  can   be  discarded  when  studying  CI-MLPs:   this  space  is
-intrinsically complicated and it cannot be decomposed or simplified.
-
-Furthermore, those  recurrent neural networks  exhibit the instability
+
+\noindent According to  this definition, for all pairs  of points $(x,
+y)$ in the  phase space, a point $z$ can be  found in the neighborhood
+of $x$  such that one  of its iterates  $f^n(z)$ is $y$.  Indeed, this
+result  has been  established  during the  proof  of the  transitivity
+presented  in Ref.~\cite{guyeux09}.   Among other  things,  the strong
+transitivity  leads to  the fact  that  without the  knowledge of  the
+initial input layer, all outputs are possible.  Additionally, no point
+of the output space can be discarded when studying CI-MLPs: this space
+is  intrinsically   complicated  and   it  cannot  be   decomposed  or
+simplified.
+
+Furthermore, these  recurrent neural networks  exhibit the instability
  property:
  \begin{definition}
-A dynamical  system $\left( \mathcal{X}, f\right)$ is  unstable if for
+A dynamical  system $\left( \mathcal{X}, f\right)$ is {\bf unstable}
+if for
  all  $x  \in  \mathcal{X}$,   the  orbit  $\gamma_x:n  \in  \mathds{N}
  \longmapsto f^n(x)$  is unstable,  that means: $\exists  \varepsilon >
  0$, $\forall  \delta>0$, $\exists y  \in \mathcal{X}$, $\exists  n \in
  \mathds{N}$,        such        that        $d(x,y)<\delta$        and
  $d\left(\gamma_x(n),\gamma_y(n)\right) \geq \varepsilon$.
  \end{definition}
-This property, which  is implied by the sensitive  point dependence on
-initial conditions, leads to the fact that in all neighborhoods of any
-point $x$, there are points that  can be apart by $\varepsilon$ in the
-future through iterations of the  CI-MLP($f$). Thus, we can claim that
-the behavior of  these MLPs is unstable when  $\Gamma (f)$ is strongly
-connected.
+
+\noindent  This property,  which  is implied  by  the sensitive  point
+dependence  on initial  conditions,  leads  to the  fact  that in  all
+neighborhoods of any point $x$, there  are points that can be apart by
+$\varepsilon$    in   the   future    through   iterations    of   the
+CI-MLP($f$). Thus,  we can  claim that the  behavior of these  MLPs is
+unstable when $\Gamma (f)$ is strongly connected.
  
  Let  us  now consider  a  compact  metric space  $(M,  d)$  and $f:  M
  \rightarrow M$  a continuous map. For  each natural number  $n$, a new
  metric $d_n$ is defined on $M$ by
-$$d_n(x,y)=\max\{d(f^i(x),f^i(y)): 0\leq i<n\} \enspace .$$
+\begin{equation}
+d_n(x,y)=\max\{d(f^i(x),f^i(y)): 0\leq i<n\} \enspace .
+\end{equation}
  
  Given any $\varepsilon > 0$ and $n \geqslant 1$, two points of $M$ are
  $\varepsilon$-close  with respect to  this metric  if their  first $n$
@@ -652,7 +676,7 @@ at  least $\varepsilon$  apart in  the metric  $d_n$. Denote  by $H(n,
  \varepsilon)$     the    maximum     cardinality     of    an     $(n,
  \varepsilon)$-separated set,
  \begin{definition}
-The {\it topological entropy} of the  map $f$ is defined by (see e.g.,
+The {\bf topological entropy} of the  map $f$ is defined by (see e.g.,
  Ref.~\cite{Adler65} or Ref.~\cite{Bowen})
  $$h(f)=\lim_{\varepsilon\to      0}      \left(\limsup_{n\to      \infty}
  \frac{1}{n}\log H(n,\varepsilon)\right) \enspace .$$
@@ -666,32 +690,25 @@ of $(\mathcal{X},G_{f_0})$ is infinite.
  
  \begin{figure}
    \centering
-  \includegraphics[scale=0.625]{scheme}
+  \includegraphics[scale=0.5]{scheme}
    \caption{Summary of addressed neural networks and chaos problems}
    \label{Fig:scheme}
  \end{figure}
  
-The Figure~\ref{Fig:scheme} is a summary of addressed neural networks and chaos problems.
-Section~\ref{S2} has explained how to  construct a truly chaotic neural
-networks $A$ for instance.
-Section~\ref{S3} has shown how to check whether a  given MLP
-$A$ or $C$ is chaotic or not in the sens of Devaney.
-%, and  how to study its topological behavior. 
-The last thing to  investigate, when comparing
-neural  networks   and  Devaney's  chaos,  is to  determine  whether
-an artificial neural network $A$  is able to learn or  predict some chaotic
-behaviors of $B$, as  it is defined  in the Devaney's formulation  (when they
-are not specifically constructed for this purpose).  This statement is
-studied in the next section.
-
-
-
-
-
-
-
-\section{Suitability of Artificial Neural Networks 
-for Predicting Chaotic Behaviors}
+Figure~\ref{Fig:scheme} is a summary  of addressed neural networks and
+chaos  problems.   In  Section~\ref{S2}   we  have  explained  how  to
+construct    a    truly    chaotic    neural   networks,    $A$    for
+instance. Section~\ref{S3} has shown how  to check whether a given MLP
+$A$ or $C$ is chaotic or not in the sense of Devaney, and how to study
+its  topological  behavior.   The  last  thing  to  investigate,  when
+comparing neural networks and Devaney's chaos, is to determine whether
+an  artificial neural network  $C$ is  able to  learn or  predict some
+chaotic  behaviors  of  $B$,  as   it  is  defined  in  the  Devaney's
+formulation  (when  they are  not  specifically  constructed for  this
+purpose).  This statement is studied in the next section.
+
+\section{Suitability of Feedforward Neural Networks 
+for Predicting Chaotic and Non-chaotic Behaviors}
  
  In  the context  of computer  science  different topic  areas have  an
  interest       in       chaos,       as       for       steganographic
@@ -701,7 +718,7 @@ extraction takes place once  at destination.  The reverse ({\it i.e.},
  automatically detecting the presence  of hidden messages inside media)
  is  called   steganalysis.   Among  the   deployed  strategies  inside
  detectors,          there         are          support         vectors
-machines~\cite{Qiao:2009:SM:1704555.1704664},                    neural
+machines~\cite{Qiao:2009:SM:1704555.1704664},                   neural
  networks~\cite{10.1109/ICME.2003.1221665,10.1109/CIMSiM.2010.36},  and
  Markov   chains~\cite{Sullivan06steganalysisfor}.    Most   of   these
  detectors  give quite  good results  and are  rather  competitive when
@@ -718,13 +735,13 @@ chaotic behaviors.
  
  The  problem  of  deciding  whether  classical  feedforward  ANNs  are
  suitable  to approximate  topological chaotic  iterations may  then be
-reduced  to evaluate  ANNs on  iterations of  functions  with Strongly
-Connected  Component  (SCC)~graph  of  iterations.   To  compare  with
-non-chaotic  iterations,  the experiments  detailed  in the  following
-sections are  carried out  using both kinds  of function  (chaotic and
-non-chaotic). Let us emphasize on  the difference between this kind of
-neural   networks  and  the   Chaotic  Iterations   based  MultiLayer
-Perceptron.
+reduced to  evaluate such neural  networks on iterations  of functions
+with  Strongly  Connected  Component  (SCC)~graph of  iterations.   To
+compare with  non-chaotic iterations, the experiments  detailed in the
+following  sections  are carried  out  using  both  kinds of  function
+(chaotic and non-chaotic). Let  us emphasize on the difference between
+this  kind  of  neural  networks  and  the  Chaotic  Iterations  based
+multilayer peceptron.
  
  We are  then left to compute  two disjoint function  sets that contain
  either functions  with topological chaos properties  or not, depending
@@ -743,14 +760,11 @@ $$[0,  0,  2,   3,  13,  13,  6,   3,  8,  9,  10,  11,   8,  13,  14,
  $1110$: it  is obtained as the  binary value of the  fourth element in
  the  second  list  (namely~14).   It   is  not  hard  to  verify  that
  $\Gamma(f)$ is  not SCC  (\textit{e.g.}, $f(1111)$ is  $1111$) whereas
-$\Gamma(g)$  is.  Next section  shows how  to translate  iterations of
-such functions into a model amenable to be learned by an ANN.
-  
-This  section  presents how  (not)  chaotic  iterations  of $G_f$  are
-translated  into  another  model  more  suited  to  artificial  neural
-networks.  Formally, input and output vectors are pairs~$((S^t)^{t \in
-\Nats},x)$ and $\left(\sigma((S^t)^{t \in \Nats}),F_{f}(S^0,x)\right)$
-as defined in~Eq.~(\ref{eq:Gf}).
+$\Gamma(g)$ is. The  remaining of this section shows  how to translate
+iterations of such functions into a model amenable to be learned by an
+ANN.   Formally, input  and  output vectors  are pairs~$((S^t)^{t  \in
+  \Nats},x)$          and          $\left(\sigma((S^t)^{t          \in
+  \Nats}),F_{f}(S^0,x)\right)$ as defined in~Eq.~(\ref{eq:Gf}).
  
  Firstly, let us focus on how to memorize configurations.  Two distinct
  translations are  proposed.  In the first  case, we take  one input in
@@ -760,15 +774,15 @@ configuration  as  natural  number  could  consist  in  labeling  each
  configuration  with  its  translation  into  decimal  numeral  system.
  However,  such a  representation induces  too many  changes  between a
  configuration  labeled by  a  power  of two  and  its direct  previous
-configuration: for instance, 16~(10000) and 15~(01111) are closed in a
+configuration: for instance, 16~(10000)  and 15~(01111) are close in a
  decimal ordering, but  their Hamming distance is 5.   This is why Gray
  codes~\cite{Gray47} have been preferred.
  
-Let us secondly detail how to deal  with strategies.   Obviously,  it is  not
-possible to  translate in a finite  way an infinite  strategy, even if
-both $(S^t)^{t \in \Nats}$ and $\sigma((S^t)^{t \in \Nats})$ belong to
-$\{1,\ldots,n\}^{\Nats}$.  Input strategies are then reduced to have a
-length  of size  $l  \in  \llbracket 2,k\rrbracket$,  where  $k$ is  a
+Secondly, let us detail how to deal with strategies.  Obviously, it is
+not possible to  translate in a finite way  an infinite strategy, even
+if both $(S^t)^{t \in \Nats}$ and $\sigma((S^t)^{t \in \Nats})$ belong
+to  $\{1,\ldots,n\}^{\Nats}$.  Input  strategies are  then  reduced to
+have a length of size $l \in \llbracket 2,k\rrbracket$, where $k$ is a
  parameter of the evaluation. Notice  that $l$ is greater than or equal
  to $2$ since  we do not want the shift  $\sigma$~function to return an
  empty strategy.  Strategies are memorized as natural numbers expressed
@@ -811,15 +825,16 @@ $3$~terms we obtain 2304~input-output pairs.
  \label{section:experiments}
  
  To study  if chaotic iterations can  be predicted, we  choose to train
-the MultiLayer Perceptron.  As stated  before, this kind of network is
-in   particular    well-known   for   its    universal   approximation
-property. Furthermore,  MLPs have been already  considered for chaotic
-time series prediction.  For example, in~\cite{dalkiran10} the authors
-have shown that a feedforward  MLP with two hidden layers, and trained
-with Bayesian  Regulation back-propagation, can  learn successfully the
-dynamics of Chua's circuit.
-
-In these experiments  we consider  MLPs having one hidden layer of
+the multilayer perceptron.  As stated  before, this kind of network is
+in  particular  well-known for  its  universal approximation  property
+\cite{Cybenko89,DBLP:journals/nn/HornikSW89}.  Furthermore,  MLPs have
+been  already  considered for  chaotic  time  series prediction.   For
+example,   in~\cite{dalkiran10}  the   authors  have   shown   that  a
+feedforward  MLP with  two hidden  layers, and  trained  with Bayesian
+Regulation  back-propagation, can learn  successfully the  dynamics of
+Chua's circuit.
+
+In  these experiments  we consider  MLPs  having one  hidden layer  of
  sigmoidal  neurons  and  output   neurons  with  a  linear  activation
  function.     They    are    trained    using    the    Limited-memory
  Broyden-Fletcher-Goldfarb-Shanno quasi-newton algorithm in combination
@@ -844,16 +859,18 @@ trainings of two data sets, one of them describing chaotic iterations,
  are compared.
  
  Thereafter we give,  for the different learning setups  and data sets,
-the  mean prediction  success rate  obtained for  each  output.  These
-values  are  computed  considering  10~trainings with  random  subsets
-construction,  weights  and  biases initialization.   Firstly,  neural
-networks having 10  and 25~hidden neurons are trained,  with a maximum
-number  of  epochs that  takes  its  value  in $\{125,250,500\}$  (see
-Tables~\ref{tab1}  and \ref{tab2}).   Secondly, we  refine  the second
-coding scheme by splitting the  output vector such that each output is
-learned by a specific  neural network (Table~\ref{tab3}). In this last
-case, we increase  the size of the hidden layer  up to 40~neurons, and
-we consider larger number of epochs.
+the mean prediction success rate obtained for each output. Such a rate
+represents the percentage of  input-output pairs belonging to the test
+subset  for  which  the   corresponding  output  value  was  correctly
+predicted.   These values are  computed considering  10~trainings with
+random  subsets  construction,   weights  and  biases  initialization.
+Firstly, neural networks having  10 and 25~hidden neurons are trained,
+with   a  maximum   number  of   epochs  that   takes  its   value  in
+$\{125,250,500\}$  (see Tables~\ref{tab1} and  \ref{tab2}).  Secondly,
+we refine the second coding scheme by splitting the output vector such
+that   each  output   is  learned   by  a   specific   neural  network
+(Table~\ref{tab3}). In  this last  case, we increase  the size  of the
+hidden layer up to 40~neurons and we consider larger number of epochs.
  
  \begin{table}[htbp!]
  \caption{Prediction success rates for configurations expressed as boolean vectors.}
@@ -861,7 +878,7 @@ we consider larger number of epochs.
  \centering {\small
  \begin{tabular}{|c|c||c|c|c|}
  \hline 
-\multicolumn{5}{|c|}{Networks topology: 6~inputs, 5~outputs and one hidden layer} \\
+\multicolumn{5}{|c|}{Networks topology: 6~inputs, 5~outputs, and one hidden layer} \\
  \hline
  \hline
  \multicolumn{2}{|c||}{Hidden neurons} & \multicolumn{3}{c|}{10 neurons} \\
@@ -913,30 +930,39 @@ is observed (from 36.10\% for 10~neurons and 125~epochs to 70.97\% for
  25~neurons  and  500~epochs). We  also  notice  that  the learning  of
  outputs~(2)   and~(3)  is   more  difficult.    Conversely,   for  the
  non-chaotic  case the  simplest training  setup is  enough  to predict
-configurations.   For  all  network  topologies and  all  outputs  the
-obtained  results  for the  non-chaotic  case  outperform the  chaotic
-ones. Finally,  the rates for  the strategies show that  the different
-networks are unable to learn them.
-
-For the second coding  scheme (\textit{i.e.}, with Gray Codes)
-Table~\ref{tab2} shows that any network
-learns about  five times more non-chaotic  configurations than chaotic
-ones. As in the previous scheme, the strategies cannot be predicted.
+configurations.  For all these  feedforward network topologies and all
+outputs the  obtained results for the non-chaotic  case outperform the
+chaotic  ones. Finally,  the rates  for the  strategies show  that the
+different feedforward networks are unable to learn them.
+
+For  the  second  coding   scheme  (\textit{i.e.},  with  Gray  Codes)
+Table~\ref{tab2} shows  that any network learns about  five times more
+non-chaotic  configurations than  chaotic  ones.  As  in the  previous
+scheme,       the      strategies      cannot       be      predicted.
+Figures~\ref{Fig:chaotic_predictions}                              and
+\ref{Fig:non-chaotic_predictions} present the predictions given by two
+feedforward multilayer  perceptrons that were  respectively trained to
+learn chaotic  and non-chaotic data,  using the second  coding scheme.
+Each figure  shows for  each sample of  the test  subset (577~samples,
+representing 25\%  of the 2304~samples) the  configuration that should
+have been predicted and the one given by the multilayer perceptron. It
+can be  seen that for  the chaotic data  the predictions are  far away
+from the  expected configurations.  Obviously,  the better predictions
+for the non-chaotic data reflect their regularity.
  
  Let us now compare the  two coding schemes. Firstly, the second scheme
  disturbs  the   learning  process.   In   fact  in  this   scheme  the
  configuration is always expressed as  a natural number, whereas in the
-first one  the number  of inputs follows  the increase of  the boolean
+first one  the number  of inputs follows  the increase of  the Boolean
  vectors coding configurations. In this latter case, the coding gives a
  finer information on configuration evolution.
-
  \begin{table}[b]
  \caption{Prediction success rates for configurations expressed with Gray code}
  \label{tab2}
  \centering
  \begin{tabular}{|c|c||c|c|c|}
  \hline 
-\multicolumn{5}{|c|}{Networks topology: 3~inputs, 2~outputs and one hidden layer} \\
+\multicolumn{5}{|c|}{Networks topology: 3~inputs, 2~outputs, and one hidden layer} \\
  \hline
  \hline
  & Hidden neurons & \multicolumn{3}{c|}{10 neurons} \\
@@ -963,18 +989,31 @@ finer information on configuration evolution.
  \end{tabular}
  \end{table}
  
+\begin{figure}
+  \centering
+  \includegraphics[scale=0.5]{chaotic_trace2}
+  \caption {Second coding scheme - Predictions obtained for a chaotic test subset.}
+  \label{Fig:chaotic_predictions}
+\end{figure}
+
+\begin{figure}
+  \centering
+  \includegraphics[scale=0.5]{non-chaotic_trace2} 
+  \caption{Second coding scheme - Predictions obtained for a non-chaotic test subset.}
+  \label{Fig:non-chaotic_predictions}
+\end{figure}
+
  Unfortunately, in  practical applications the number  of components is
  usually  unknown.   Hence, the  first  coding  scheme  cannot be  used
  systematically.   Therefore, we  provide  a refinement  of the  second
  scheme: each  output is learned  by a different  ANN. Table~\ref{tab3}
  presents the  results for  this approach.  In  any case,  whatever the
-network  topologies,  the  maximum   epoch  number  and  the  kind  of
-iterations,  the  configuration  success  rate is  slightly  improved.
-Moreover, the strategies predictions  rates reach almost 12\%, whereas
-in  Table~\ref{tab2}  they  never   exceed  1.5\%.   Despite  of  this
-improvement, a long term prediction of chaotic iterations still 
-appear to be
-an open issue.
+considered feedforward  network topologies, the  maximum epoch number,
+and the kind of iterations, the configuration success rate is slightly
+improved.   Moreover, the  strategies predictions  rates  reach almost
+12\%, whereas in Table~\ref{tab2} they never exceed 1.5\%.  Despite of
+this improvement,  a long term prediction of  chaotic iterations still
+appear to be an open issue.
  
  \begin{table}
  \caption{Prediction success rates for split outputs.}
@@ -982,7 +1021,7 @@ an open issue.
  \centering
  \begin{tabular}{|c||c|c|c|}
  \hline 
-\multicolumn{4}{|c|}{Networks topology: 3~inputs, 1~output and one hidden layer} \\
+\multicolumn{4}{|c|}{Networks topology: 3~inputs, 1~output, and one hidden layer} \\
  \hline
  \hline
  Epochs & 125 & 250 & 500 \\ 
@@ -1048,13 +1087,13 @@ Chaotic/non chaotic & \multicolumn{3}{c|}{Output = Strategy} \\
  
  In  this paper,  we have  established an  equivalence  between chaotic
  iterations,  according to  the Devaney's  definition of  chaos,  and a
-class  of  multilayer perceptron  neural  networks.  Firstly, we  have
+class  of multilayer  perceptron  neural networks.   Firstly, we  have
  described how to build a neural network that can be trained to learn a
-given chaotic map function.  Then,  we found a condition that allow to
-check whether the iterations induced by a function are chaotic or not,
-and thus  if a chaotic map  is obtained. Thanks to  this condition our
-approach is not limited to a particular function. In the dual case, we
-show  that  checking  if  a  neural network  is  chaotic  consists  in
+given chaotic map function. Secondly,  we found a condition that allow
+to check whether  the iterations induced by a  function are chaotic or
+not, and thus  if a chaotic map is obtained.  Thanks to this condition
+our  approach is not  limited to  a particular  function. In  the dual
+case, we show that checking if a neural network is chaotic consists in
  verifying  a property  on an  associated  graph, called  the graph  of
  iterations.   These results  are valid  for recurrent  neural networks
  with a  particular architecture.  However,  we believe that  a similar
@@ -1068,10 +1107,7 @@ implemented  in a  new steganographic  method  \cite{guyeux10ter}.  As
  steganographic   detectors  embed  tools   like  neural   networks  to
  distinguish between  original and stego contents, our  studies tend to
  prove that such  detectors might be unable to  tackle with chaos-based
-information hiding schemes.  Furthermore, iterations such that not all
-of  the  components  are updated  at  each  step  are very  common  in
-biological  and  physics mechanisms.   Therefore,  one can  reasonably
-wonder whether neural networks should be applied in these contexts.
+information  hiding  schemes.
  
  In  future  work we  intend  to  enlarge  the comparison  between  the
  learning   of  truly   chaotic  and   non-chaotic   behaviors.   Other
@@ -1080,8 +1116,19 @@ be investigated  too, to  discover which tools  are the  most relevant
  when facing a truly chaotic phenomenon.  A comparison between learning
  rate  success  and  prediction  quality will  be  realized.   Concrete
  consequences in biology, physics, and computer science security fields
-will be  stated.  Lastly,  thresholds separating systems  depending on
-the ability to learn their dynamics will be established.
+will then be stated.
+
+% \appendix{}
+
+% \begin{definition} \label{def2}
+% A continuous function $f$  on a topological space $(\mathcal{X},\tau)$
+% is defined  to be  {\emph{topologically transitive}}  if for any  pair of
+% open  sets $U$,  $V  \in  \mathcal{X}$ there  exists  
+% $k \in
+% \mathds{N}^{\ast}$
+%  such  that
+% $f^k(U) \cap V \neq \emptyset$.
+% \end{definition}
  
  \bibliography{chaos-paper}% Produces the bibliography via BibTeX.