main.tex

   1 % ****** Start of file chaos-paper.tex ******
   2 %
   3 \documentclass[%
   4 aip,
   5 cha,% long,
   6 numerical bibliography, %(default)
   7 %jcp,% short, numerical bibliography,
   8 %jmp,
   9 amsmath,amssymb,
  10 preprint,%
  11 %reprint,%
  12 %author-year,%
  13 %author-numerical,%
  14 ]{revtex4-1}
  15
  16 \usepackage{graphicx}% Include figure files
  17 \usepackage{dcolumn}% Align table columns on decimal point
  18 \usepackage{bm}% bold math
  19 \usepackage{color}
  20 \usepackage{ulem}
  21 \usepackage[mathlines]{lineno}% Enable numbering of text and display math
  22 \linenumbers\relax % Commence numbering lines
  23 % The amssymb package provides various useful mathematical symbols
  24 \usepackage{amssymb}
  25 % The amsthm package provides various useful mathematical fonts
  26 \usepackage{amsfonts}
  27 % Various mathematical packages
  28 \PassOptionsToPackage{cmex10draft}{graphics}
  29 \usepackage[standard]{ntheorem}
  30 % Various packages
  31 \usepackage{stmaryrd} % llbracket
  32 \usepackage{dsfont}   % mathds
  33 \usepackage{multirow} % tables
  34
  35 \newcommand{\ie}{\textit{i.e.}}
  36 \newcommand{\Nats}[0]{\ensuremath{\mathds{N}}}
  37 \newcommand{\R}[0]{\ensuremath{\mathds{R}}}
  38 \newcommand{\Bool}[0]{\ensuremath{\mathds{B}}}
  39
  40 \begin{document}
  41
  42 \title[Recurrent Neural Networks and Chaos]{Recurrent Neural Networks
  43 and Chaos: Construction, Evaluation, \\
  44 and Prediction Ability}
  45
  46 \author{Jacques M. Bahi}
  47 \author{Jean-Fran\c{c}ois Couchot}
  48 \author{Christophe Guyeux}
  49 \email{christophe.guyeux@univ-fcomte.fr.}
  50 \author{Michel Salomon}
  51 \altaffiliation[Authors in ]{alphabetic order}
  52 \affiliation{
  53 Computer Science Laboratory (LIFC), University of Franche-Comt\'e, \\
  54 IUT de Belfort-Montb\'eliard, BP 527, \\
  55 90016 Belfort Cedex, France
  56 }
  57
  58 \date{\today}
  59
  60 \newcommand{\CG}[1]{\begin{color}{red}\textit{#1}\end{color}}
  61 \newcommand{\JFC}[1]{\begin{color}{blue}\textit{#1}\end{color}}
  62
  63
  64
  65
  66
  67 \begin{abstract}
  68 %% Text of abstract
  69 Many  research works  deal with  chaotic neural  networks  for various
  70 fields  of application. Unfortunately,  up to  now these  networks are
  71 usually  claimed to be  chaotic without  any mathematical  proof.  The
  72 purpose of this paper is to establish, based on a rigorous theoretical
  73 framework, an equivalence between  chaotic iterations according to
  74 Devaney  and  a  particular  class of  neural
  75 networks. On the one hand we show  how to build such a network, on the
  76 other  hand we provide  a method  to check  if a  neural network  is a
  77 chaotic one.  Finally, the ability of classical feedforward multilayer
  78 perceptrons to  learn sets of  data obtained from a  dynamical
  79 system is regarded.  Various  Boolean functions are iterated on finite
  80 states. Iterations of some of them  are proven to  be chaotic
  81  as it is  defined by
  82 Devaney.  In that context, important differences occur in the training
  83 process,  establishing  with  various  neural  networks  that  chaotic
  84 behaviors are far more difficult to learn.
  85 \end{abstract}
  86
  87 %%\pacs{Valid PACS appear here}% PACS, the Physics and Astronomy
  88                              % Classification Scheme.
  89 %%\keywords{Suggested keywords}%Use showkeys class option if keyword
  90                               %display desired
  91 \maketitle
  92
  93 \begin{quotation}
  94
  95 Chaotic neural  networks have received a  lot of attention  due to the
  96 appealing   properties  of   deterministic   chaos  (unpredictability,
  97 sensitivity,  and so  on). However,  such networks  are  often claimed
  98 chaotic without  any rigorous mathematical proof.   Therefore, in this
  99 work  a theoretical  framework based  on the  Devaney's  definition of
 100 chaos  is introduced.   Starting with  a relationship  between discrete
 101 iterations  and  Devaney's chaos,  we  firstly  show  how to  build  a
 102 recurrent  neural networks  that is  equivalent to  a chaotic  map and
 103 secondly  a way  to check  whether  an already  available network,  is
 104 chaotic  or not.  We also  study different  topological  properties of
 105 these  truly  chaotic neural  networks.   Finally,  we  show that  the
 106 learning,  with neural  networks  having a  feedforward structure,  of
 107 chaotic behaviors represented by data sets obtained from chaotic maps,
 108 is far more difficult than non chaotic behaviors.
 109 \end{quotation}
 110
 111 %% Main text
 112 \section{Introduction}
 113 \label{S1}
 114
 115 Several research  works have proposed  or run chaotic  neural networks
 116 these last  years.  The complex dynamics  of such a  networks leads to
 117 various       potential      application       areas:      associative
 118 memories~\cite{Crook2007267}  and  digital  security tools  like  hash
 119 functions~\cite{Xiao10},                                       digital
 120 watermarking~\cite{1309431,Zhang2005759},           or          cipher
 121 schemes~\cite{Lian20091296}.  In the  former case, the background idea
 122 is to  control chaotic dynamics in  order to store  patterns, with the
 123 key advantage  of offering a  large storage capacity.  For  the latter
 124 case,   the  use   of   chaotic  dynamics   is   motivated  by   their
 125 unpredictability and  random-like behaviors.  Thus,  investigating new
 126 concepts is crucial in this  field, because new threats are constantly
 127 emerging.   As an illustrative  example, the  former standard  in hash
 128 functions,  namely the  SHA-1  algorithm, has  been recently  weakened
 129 after flaws were discovered.
 130
 131 Chaotic neural networks have  been built with different approaches. In
 132 the context of associative  memory, chaotic neurons like the nonlinear
 133 dynamic  state neuron  \cite{Crook2007267}  frequently constitute  the
 134 nodes of the network. These neurons have an inherent chaotic behavior,
 135 which  is usually  assessed through  the computation  of  the Lyapunov
 136 exponent.  An alternative approach  is to consider a well-known neural
 137 network architecture: the  MultiLayer Perceptron (MLP). These networks
 138 are  suitable to model  nonlinear relationships  between data,  due to
 139 their universal approximator capacity.
 140 \JFC{Michel, peux-tu donner une ref la dessus}
 141 Thus, this kind of networks can
 142 be trained to model a physical  phenomenon known to be chaotic such as
 143 Chua's circuit  \cite{dalkiran10}.  Sometimes, a  neural network which
 144 is build by combining  transfer functions and initial conditions that are both
 145 chaotic,      is      itself      claimed      to      be      chaotic
 146 \cite{springerlink:10.1007/s00521-010-0432-2}.
 147
 148 What all of these chaotic neural  networks have in common is that they
 149 are claimed to be chaotic  despite a lack of any rigorous mathematical
 150 proof.   The first contribution of  this paper  is  to fill  this  gap,
 151 using  a theoretical  framework  based on  the  Devaney's  definition of  chaos
 152 \cite{Devaney}.   This  mathematical  theory  of chaos  provides  both
 153 qualitative and quantitative tools to evaluate the complex behavior of
 154 a  dynamical  system:  ergodicity,   expansivity,  and  so  on.   More
 155 precisely, in  this paper,  which is an  extension of a  previous work
 156 \cite{bgs11:ip},   we  establish   the  equivalence  between chaotic
 157 iterations and  a class of globally  recurrent MLP.
 158 The investigation the  converse problem is the second contribution:
 159 we indeed study the  ability for
 160 classical MultiLayer Perceptrons  to learn a particular family of
 161 discrete  chaotic  dynamical  systems.   This family
 162 is defined by a  Boolean vector, an update function, and a
 163 sequence giving  which component  to update at  each iteration.   It has
 164 been  previously established  that such  dynamical systems is
 165 chaotically iterated (as it is defined by Devaney) when the chosen function has
 166 a  strongly connected  iterations graph.   In this  document,  we
 167 experiment several MLPs and try to learn some iterations of this kind.
 168 We  show that non-chaotic iterations can be learned, whereas it is
 169 far  more  difficult  for chaotic  ones.   That  is  to say,  we  have
 170 discovered at  least one  family of problems  with a  reasonable size,
 171 such  that artificial  neural networks  should not  be applied
 172 due to their inability to  learn chaotic behaviors in this context.
 173
 174 The remainder of this research  work is organized as follows. The next
 175 section is devoted  to the basics of Devaney's
 176 chaos.   Section~\ref{S2} formally  describes  how to  build a  neural
 177 network  that operates  chaotically.  Section~\ref{S3} is
 178 devoted to the dual case of  checking whether an existing neural network
 179 is chaotic  or not.
 180 Topological properties  of   chaotic  neural  networks
 181 are discussed in Sect.~\ref{S4}.   The
 182 Section~\ref{section:translation}   shows   how   to  translate   such
 183 iterations  into  an Artificial  Neural  Network  (ANN),  in order  to
 184 evaluate the  capability for this  latter to learn  chaotic behaviors.
 185 This  ability  is  studied in  Sect.~\ref{section:experiments},  where
 186 various ANNs try to learn two  sets of data: the first one is obtained
 187 by chaotic iterations while the  second one results from a non-chaotic
 188 system.  Prediction success rates are  given and discussed for the two
 189 sets.  The paper ends with a conclusion section where our contribution
 190 is summed up and intended future work is exposed.
 191
 192 \section{Chaotic Iterations according to  Devaney}
 193
 194 In this section, the well-established notion of Devaney's mathematical
 195 chaos is  firstly recalled.  Preservation of  the unpredictability of
 196 such dynamical  system when implemented  on a computer is  obtained by
 197 using  some discrete iterations  called ``asynchronous  iterations'', which
 198 are thus introduced.  The result establishing the link between such
 199 iterations and Devaney's chaos is  finally presented at the end of this
 200 section.
 201
 202 In what follows and for  any function $f$, $f^n$ means the composition
 203 $f \circ f \circ \hdots \circ f$ ($n$ times) and an \emph{iteration}
 204 of a \emph{dynamical system} is the step that consists in
 205 updating the global state $x^t$ at time $t$ with respect to a function $f$
 206 s.t. $x^{t+1} = f(x^t)$.
 207
 208 \subsection{Devaney's chaotic dynamical systems}
 209
 210 Various domains such as  physics, biology, or economy, contain systems
 211 that exhibit a chaotic behavior,  a well-known example is the weather.
 212 These  systems   are  in   particular  highly  sensitive   to  initial
 213 conditions, a concept usually presented as the butterfly effect: small
 214 variations in the initial conditions possibly lead to widely different
 215 behaviors.  Theoretically speaking, a  system is sensitive if for each
 216 point  $x$ in  the  iteration space,  one  can find  a  point in  each
 217 neighborhood of $x$ having a significantly different future evolution.
 218 Conversely, a  system seeded with  the same initial  conditions always
 219 has  the same  evolution.   In  other words,  chaotic  systems have  a
 220 deterministic  behavior  defined through  a  physical or  mathematical
 221 model  and a  high  sensitivity to  the  initial conditions.   Besides
 222 mathematically this  kind of unpredictability  is also referred  to as
 223 deterministic chaos.  For example, many weather forecast models exist,
 224 but they give only suitable predictions for about a week, because they
 225 are initialized with conditions  that reflect only a partial knowledge
 226 of the current weather.  Even  the  differences are  initially small,
 227 they are  amplified in the course  of time, and thus  make difficult a
 228 long-term prediction.  In fact,  in a chaotic system, an approximation
 229 of  the current  state is  a  quite useless  indicator for  predicting
 230 future states.
 231
 232 From  mathematical  point  of   view,  deterministic  chaos  has  been
 233 thoroughly studied  these last decades, with  different research works
 234 that  have   provide  various  definitions  of   chaos.   Among  these
 235 definitions,   the  one  given   by  Devaney~\cite{Devaney}   is
 236 well-established.    This   definition   consists  of   three   conditions:
 237 topological  transitivity, density of  periodic points,  and sensitive
 238 point dependence on initial conditions.
 239
 240 Topological  transitivity   is  checked  when,  for   any  point,  any
 241 neighborhood of its future evolution eventually overlap with any other
 242 given region.  This property  implies that a  dynamical system
 243 cannot be  broken into simpler  subsystems.
 244 Intuitively, its complexity does not allow any simplification.
 245 On  the contrary,  a dense set  of periodic points  is an
 246 element of regularity that a chaotic dynamical system has to exhibit.
 247
 248 However, chaos need some regularity to ``counteracts''
 249 the  effects of  transitivity.
 250 %\begin{definition} \label{def3}
 251 We recall that a  point $x$ is  {\emph{periodic point}} for $f$ of
 252 period~$n \in \mathds{N}^{\ast}$ if $f^{n}(x)=x$.
 253 %\end{definition}
 254 Then, the map
 255 %\begin{definition} \label{def4}
 256 $f$ is {\emph{ regular}} on $(\mathcal{X},\tau)$ if the set of
 257   periodic points  for $f$ is  dense in $\mathcal{X}$ (for any $x \in
 258   \mathcal{X}$, we can find at least  one periodic point in any of its
 259   neighborhood).
 260 %\end{definition}
 261   Thus,
 262   due to these two properties, two points close to each other can behave
 263   in a completely different  manner, leading to unpredictability for the
 264   whole system.
 265
 266 Let we recall that $f$
 267 has {\emph{  sensitive dependence  on initial  conditions}}  if there
 268 exists $\delta  >0$ such  that, for any  $x\in \mathcal{X}$  and any
 269 neighborhood $V$ of $x$, there exist  $y\in V$ and $n > 0$ such that
 270 $d\left(f^{n}(x), f^{n}(y)\right) >\delta  $. The value $\delta$ is called the
 271   {\emph{constant of sensitivity}} of $f$.
 272
 273 Finally, The dynamical system that iterates $f$ is {\emph{ chaotic according to Devaney}} on $(\mathcal{X},\tau)$ if $f$  is   regular,  topologically  transitive,
 274 and  has  sensitive dependence to its initial conditions.
 275 In what follows, iterations are said to be \emph{chaotic according Devaney}
 276 when corresponding dynamical system  is chaotic according Devaney.
 277
 278
 279 %Let us notice that for a  metric space the last condition follows from
 280 %the two first ones~\cite{Banks92}.
 281
 282 \subsection{Asynchronous Iterations}
 283
 284 %This section  presents some basics on  topological chaotic iterations.
 285 Let us  firstly discuss about the  domain of iteration.  As  far as we
 286 know, no result rules that the chaotic behavior of a dynamical system
 287 that has been theoretically proven on  $\R$ remains valid on the
 288 floating-point
 289 numbers, which is  the implementation domain.  Thus, to  avoid loss of
 290 chaos this  work presents an  alternative, that is to  iterate Boolean
 291 maps:  results that  are  theoretically obtained  in  that domain  are
 292 preserved in implementations.
 293
 294 Let us denote by $\llbracket  a ; b \rrbracket$ the following interval
 295 of integers:  $\{a, a+1,  \hdots, b\}$, where $a~<~b$.
 296 In that section,  a  system
 297 under   consideration    iteratively   modifies   a    collection   of
 298 $n$~components.   Each component  $i \in  \llbracket 1;  n \rrbracket$
 299 takes  its  value  $x_i$  among the  domain  $\Bool=\{0,1\}$.
 300 A \emph{configuration} of the system at discrete time $t$  is the vector
 301 %\begin{equation*}
 302 $x^{t}=(x_1^{t},\ldots,x_{n}^{t}) \in \Bool^n$.
 303 %\end{equation*}
 304 The dynamics of the system is  described according to a  function $f :
 305 \Bool^n \rightarrow \Bool^n$ such that
 306 %\begin{equation*}
 307 $f(x)=(f_1(x),\ldots,f_n(x))$.
 308 %\end{equation*}
 309 % Notice that $f^k$ denotes the
 310 % $k-$th composition $f\circ \ldots \circ f$ of the function $f$.
 311
 312 Let    be   given    a    configuration   $x$.    In   what    follows
 313 $N(i,x)=(x_1,\ldots,\overline{x_i},\ldots,x_n)$  is  the configuration
 314 obtained by switching the $i-$th component of $x$ ($\overline{x_i}$ is
 315 indeed  the negation  of $x_i$).   Intuitively, $x$  and  $N(i,x)$ are
 316 neighbors.   The discrete  iterations of  $f$ are  represented  by the
 317 oriented  \emph{graph  of iterations}  $\Gamma(f)$.  In  such  a graph,
 318 vertices are configurations  of $\Bool^n$ and there is  an arc labeled
 319 $i$ from $x$ to $N(i,x)$ if and only if  $f_i(x)$ is $N(i,x)$.
 320
 321 In the  sequel, the  \emph{strategy} $S=(S^{t})^{t  \in \Nats}$  is the
 322 sequence  defining which  component to  update at  time $t$  and $S^{t}$
 323 denotes its  $t-$th term.
 324 This iteration scheme that only modifies one element at each iteration
 325 is classically referred as \emph{asynchronous iterations}.
 326 More precisely, we have here for any $i$, $1\le i \le n$,
 327 $$
 328 \left\{ \begin{array}{l}
 329 x^{0}  \in \Bool^n \\
 330 x^{t+1}_i = \left\{
 331 \begin{array}{l}
 332   f_i(x^t) \textrm{ if $S^t = i$} \\
 333   x_i^t \textrm{ otherwise}
 334  \end{array}
 335 \right.
 336 \end{array} \right.
 337 $$
 338
 339 Next section shows the link between asynchronous iterations and
 340 Devaney's Chaos.
 341
 342 \subsection{On the link between asynchronous iterations and
 343   Devaney's Chaos}
 344
 345 In  this subsection  we recall  the link  we have  established between
 346 asynchronous iterations and Devaney's  chaos.  The theoretical framework is
 347 fully described in \cite{guyeux09}.
 348
 349 We  introduce the function $F_{f}$  that is
 350 defined  for  any given  application  $f:\Bool^{n}  \to \Bool^{n}$  by
 351 $F_{f}:   \llbracket1;n\rrbracket\times   \mathds{B}^{n}   \rightarrow
 352 \mathds{B}^{n}$, s.t.
 353 \begin{equation}
 354 \label{eq:CIs}
 355   F_{f}(s,x)_j  =
 356     \left\{
 357     \begin{array}{l}
 358     f_j(x) \textrm{ if } j= s \enspace , \\
 359     x_{j} \textrm{ otherwise} \enspace .
 360     \end{array}
 361     \right.
 362 \end{equation}
 363
 364 \noindent With such a notation, configurations
 365 asynchronously obtained are defined for times
 366 $t=0,1,2,\ldots$ by:
 367 \begin{equation}\label{eq:sync}
 368 \left\{\begin{array}{l}
 369   x^{0}\in \mathds{B}^{n} \textrm{ and}\\
 370   x^{t+1}=F_{f}(S^t,x^{t}) \enspace .
 371 \end{array}\right.
 372 \end{equation}
 373
 374 \noindent  Finally, iterations defined  in Eq.~(\ref{eq:sync})  can be
 375 described by the following system:
 376 \begin{equation}
 377 \left\{
 378 \begin{array}{lll}
 379 X^{0} & = &  ((S^t)^{t \in \Nats},x^0) \in
 380 \llbracket1;n\rrbracket^\Nats \times \Bool^{n}\\
 381 X^{k+1}& = & G_{f}(X^{k})\\
 382 \multicolumn{3}{c}{\textrm{where } G_{f}\left(((S^t)^{t \in \Nats},x)\right)
 383 = \left(\sigma((S^t)^{t \in \Nats}),F_{f}(S^0,x)\right) \enspace ,}
 384 \end{array}
 385 \right.
 386 \label{eq:Gf}
 387 \end{equation}
 388 where  $\sigma$ is the  function that  removes the  first term  of the
 389 strategy  ({\it i.e.},~$S^0$).
 390 This definition  allows to links asynchronous iterations with
 391 classical iterations of a dynamical system.
 392
 393
 394 %means that  only one
 395 %component  of the  system is  updated  at an  iteration: the  $S^t$-th
 396 %element. But it can be extended by considering subsets for $S^t$.
 397
 398
 399 To study topological properties of these iterations, we are then left to
 400 introduce a  {\emph{  distance}}   $d$    between   two   points   $(S,x)$   and
 401 $(\check{S},\check{x})\in  \mathcal{X} = \llbracket1;n\rrbracket^\Nats.
 402 \times \Bool^{n}$. It is defined by
 403 \begin{equation}
 404 d((S,x);(\check{S},\check{x}))=d_{e}(x,\check{x})+d_{s}(S,\check{S})
 405 \enspace ,
 406 \end{equation}
 407 where
 408 \begin{equation}
 409 d_{e}(x,\check{x})=\sum_{j=1}^{n}\Delta
 410 (x_{j},\check{x}_{j}) \in \llbracket  0 ; n \rrbracket
 411 \end{equation}
 412 \noindent and
 413 \begin{equation}
 414 d_{s}(S,\check{S})=\frac{9}{2n}\sum_{t=0}^{\infty
 415 }\frac{|S^{t}-\check{S}^{t}|}{10^{t+1}} \in [0 ; 1] \enspace .
 416 \end{equation}
 417
 418 Notice that the more  two systems have different components,
 419 the  larger the  distance  between them is.  Secondly,  two systems  with
 420 similar components and strategies, which have the same starting terms,
 421 must  induce only  a small  distance.  The  proposed  distance fulfill
 422 these  requirements: on  the one  hand  its floor  value reflects  the
 423 difference between  the cells, on  the other hand its  fractional part
 424 measures the difference between the strategies.
 425
 426 The relation  between $\Gamma(f)$ and  $G_f$ is clear: there  exists a
 427 path from  $x$ to $x'$  in $\Gamma(f)$ if  and only if there  exists a
 428 strategy  $s$ such  that   iterations  of  $G_f$ from  the
 429 initial  point $(s,x)$  reaches  the configuration  $x'$.  Using  this
 430 link, Guyeux~\cite{GuyeuxThese10} has proven that,
 431 \begin{theorem}%[Characterization of $\mathcal{C}$]
 432 \label{Th:Caracterisation   des   IC   chaotiques}
 433 Let $f:\Bool^n\to\Bool^n$. Iterations of $G_f$ are chaotic  according
 434 to  Devaney if and only if  $\Gamma(f)$ is strongly connected.
 435 \end{theorem}
 436
 437 Checking  if a  graph  is  strongly connected  is  not difficult
 438 (by the Tarjan's algorithm for instance).
 439 Let be given  a strategy $S$ and a function $f$ such that
 440 $\Gamma(f)$ is strongly connected.
 441 In that case, iterations of the function $G_f$ as defined in
 442 Eq.~(\ref{eq:Gf}) are chaotic according to Devaney.
 443
 444
 445 Let us then define two function $f_0$ and $f_1$ both in
 446 $\Bool^n\to\Bool^n$ that are used all along this article.
 447 The former is the  vectorial negation, \textit{i.e.},
 448 $f_{0}(x_{1},\dots,x_{n}) =(\overline{x_{1}},\dots,\overline{x_{n}})$.
 449 The latter is  $f_1\left(x_1,\dots,x_n\right)=\left(
 450 \overline{x_1},x_1,x_2,\dots,x_{n-1}\right)$.
 451 It is not hard to see that  $\Gamma(f_0)$ and $\Gamma(f_1)$ are
 452 both strongly connected, then iterations of $G_{f_0}$ and of
 453 $G_{f_1}$ are chaotic according to Devaney.
 454
 455 With this  material, we are now  able to build a  first chaotic neural
 456 network, as defined in the Devaney's formulation.
 457
 458 \section{A chaotic neural network in the sense of Devaney}
 459 \label{S2}
 460
 461 Firstly, let us build a
 462 multilayer perceptron  neural network modeling
 463 $F_{f_0}:\llbracket 1;  n \rrbracket \times  \mathds{B}^n \to
 464 \mathds{B}^n$  associated to the  vectorial negation.
 465 More precisely, for all inputs
 466 $(s,x) \in \llbracket  1;n\rrbracket \times \mathds{B}^n$,
 467 the output layer  produces  $F_{f_0}(s,x)$.  It is then possible to
 468 link  the output  layer  and the  input  one, in  order  to model  the
 469 dependence between two successive iterations.  As a result we obtain a
 470 global  recurrent   neural  network  that  behaves   as  follows  (see
 471 Fig.~\ref{Fig:perceptron}).
 472
 473 \begin{itemize}
 474 \item   The   network   is   initialized   with   the   input   vector
 475   $\left(S^0,x^0\right)    \in    \llbracket   1;n\rrbracket    \times
 476   \mathds{B}^n$      and      computes      the     output      vector
 477   $x^1=F_{f_0}\left(S^0,x^0\right)$. This last  vector is published as
 478   an output one of the chaotic  neural network and is sent back to the
 479   input layer through the feedback links.
 480 \item When  the network  is activated at  the $t^{th}$  iteration, the
 481   state of the system $x^t  \in \mathds{B}^n$ received from the output
 482   layer and  the initial  term of the  sequence $(S^t)^{t  \in \Nats}$
 483   ($S^0  \in \llbracket 1;n\rrbracket$)  are used  to compute  the new
 484   output.   This new  output, which  represents the  new state  of the
 485   dynamical system, satisfies:
 486   \begin{equation}
 487     x^{t+1}=F_{f_0}(S^0, x^t) \in \mathds{B}^n \enspace .
 488   \end{equation}
 489 \end{itemize}
 490
 491 \begin{figure}
 492   \centering
 493   \includegraphics[scale=0.625]{perceptron}
 494   \caption{A perceptron equivalent to chaotic iterations}
 495   \label{Fig:perceptron}
 496 \end{figure}
 497
 498 The behavior of the neural network is such that when the initial state
 499 is  $x^0~\in~\mathds{B}^n$ and  a  sequence $(S^t)^{t  \in \Nats}$  is
 500 given  as outside  input,
 501 \JFC{en dire davantage sur l'outside world}
 502  then  the sequence  of  successive published
 503 output vectors $\left(x^t\right)^{t \in \mathds{N}^{\ast}}$ is exactly
 504 the  one produced  by  the chaotic  iterations  formally described  in
 505 Eq.~(\ref{eq:Gf}).  It  means that  mathematically if we  use similar
 506 input  vectors   they  both  generate  the   same  successive  outputs
 507 $\left(x^t\right)^{t \in \mathds{N}^{\ast}}$,  and therefore that they
 508 are  equivalent  reformulations  of  the iterations  of  $G_{f_0}$  in
 509 $\mathcal{X}$. Finally, since the  proposed neural network is built to
 510 model  the  behavior  of  $G_{f_0}$,  whose iterations are
 511   chaotic  according  to
 512 Devaney's definition  of chaos,  we can conclude  that the  network is
 513 also chaotic in this sense.
 514
 515 The previous construction scheme  is not restricted to function $f_0$.
 516 It can  be extended to any function  $f$ such that $G_f$  is a chaotic
 517 map by training the network to model $F_{f}:\llbracket 1; n \rrbracket
 518 \times      \mathds{B}^n      \to      \mathds{B}^n$.      Due      to
 519 Theorem~\ref{Th:Caracterisation  des  IC  chaotiques},  we  can  find
 520 alternative functions  $f$ for $f_0$  through a simple check  of their
 521 graph of  iterations $\Gamma(f)$.  For  example, we can  build another
 522 chaotic neural network by using $f_1$ instead of $f_0$.
 523
 524 \section{Checking whether a neural network is chaotic or not}
 525 \label{S3}
 526
 527 We focus now on the case  where a neural network is already available,
 528 and for which we  want to know if it is chaotic. Typically, in
 529 many research papers neural network  are usually claimed to be chaotic
 530 without any  convincing mathematical proof. We propose  an approach to
 531 overcome  this  drawback  for  a  particular  category  of  multilayer
 532 perceptrons defined below, and for the Devaney's formulation of chaos.
 533 In  spite of  this restriction,  we think  that this  approach  can be
 534 extended to  a large variety  of neural networks.
 535
 536
 537 We consider a multilayer perceptron  of the following form: inputs
 538 are $n$ binary digits and  one integer value, while outputs are  $n$
 539 bits.   Moreover, each  binary  output is  connected  with a  feedback
 540 connection to an input one.
 541
 542 \begin{itemize}
 543 \item During  initialization, the network is seeded with $n$~bits denoted
 544   $\left(x^0_1,\dots,x^0_n\right)$  and an  integer  value $S^0$  that
 545   belongs to $\llbracket1;n\rrbracket$.
 546 \item     At     iteration~$t$,     the     last     output     vector
 547   $\left(x^t_1,\dots,x^t_n\right)$   defines  the  $n$~bits   used  to
 548   compute the new output one $\left(x^{t+1}_1,\dots,x^{t+1}_n\right)$.
 549   While  the remaining  input receives  a new  integer value  $S^t \in
 550   \llbracket1;n\rrbracket$, which is provided by the outside world.
 551 \JFC{en dire davantage sur l'outside world}
 552 \end{itemize}
 553
 554 The topological  behavior of these  particular neural networks  can be
 555 proven to be chaotic through the following process. Firstly, we denote
 556 by  $F:  \llbracket  1;n  \rrbracket \times  \mathds{B}^n  \rightarrow
 557 \mathds{B}^n$     the     function     that     maps     the     value
 558 $\left(s,\left(x_1,\dots,x_n\right)\right)    \in    \llbracket    1;n
 559 \rrbracket      \times      \mathds{B}^n$
 560 \JFC{ici, cela devait etre $S^t$ et pas $s$, nn ?}
 561    into     the      value
 562 $\left(y_1,\dots,y_n\right)       \in       \mathds{B}^n$,       where
 563 $\left(y_1,\dots,y_n\right)$  is the  response of  the  neural network
 564 after    the    initialization     of    its    input    layer    with
 565 $\left(s,\left(x_1,\dots, x_n\right)\right)$.
 566 \JFC{ici, cela devait etre $S^t$ et pas $s$, nn ?}
 567 Secondly, we define $f:
 568 \mathds{B}^n       \rightarrow      \mathds{B}^n$       such      that
 569 $f\left(x_1,x_2,\dots,x_n\right)$ is equal to
 570 \begin{equation}
 571 \left(F\left(1,\left(x_1,x_2,\dots,x_n\right)\right),\dots,
 572   F\left(n,\left(x_1,x_2,\dots,x_n\right)\right)\right) \enspace .
 573 \end{equation}
 574 Thus, for any $j$, $1 \le j \le n$, we have
 575 $f_j\left(x_1,x_2,\dots,x_n\right) =
 576 F\left(j,\left(x_1,x_2,\dots,x_n\right)\right)$.
 577 If this recurrent  neural network is seeded with
 578 $\left(x_1^0,\dots,x_n^0\right)$    and   $S   \in    \llbracket   1;n
 579 \rrbracket^{\mathds{N}}$, it produces  exactly the
 580 same      output      vectors  than the
 581 chaotic iterations of $F_f$  with initial
 582 condition  $\left(S,(x_1^0,\dots,  x_n^0)\right)  \in  \llbracket  1;n
 583 \rrbracket^{\mathds{N}}  \times \mathds{B}^n$.
 584 Theoretically speaking, such iterations of $F_f$ are thus a formal model of
 585 these kind of recurrent neural networks. In the  rest  of this
 586 paper,  we will  call such  multilayer perceptrons  CI-MLP($f$), which
 587 stands for ``Chaotic Iterations based MultiLayer Perceptron''.
 588
 589 Checking  if CI-MLP($f$)  behaves chaotically  according  to Devaney's
 590 definition  of  chaos  is  simple:  we  need just  to  verify  if  the
 591 associated graph  of iterations  $\Gamma(f)$ is strongly  connected or
 592 not. As  an incidental consequence,  we finally obtain  an equivalence
 593 between  chaotic  iterations   and  CI-MLP($f$).   Therefore,  we  can
 594 obviously  study such multilayer  perceptrons with  mathematical tools
 595 like  topology  to  establish,  for  example,  their  convergence  or,
 596 contrarily, their unpredictable behavior.   An example of such a study
 597 is given in the next section.
 598
 599 \JFC{Ce qui suit est davantage qu'un exemple.Il faudrait
 600 motiver davantage, non?}
 601
 602
 603 \section{Topological properties of chaotic neural networks}
 604 \label{S4}
 605
 606 Let us first recall  two fundamental definitions from the mathematical
 607 theory of chaos.
 608
 609 \begin{definition} \label{def8}
 610 A  function   $f$  is   said  to  be   {\emph{  expansive}}   if  $\exists
 611 \varepsilon>0$, $\forall  x \neq y$,  $\exists n \in  \mathds{N}$ such
 612 that $d\left(f^n(x),f^n(y)\right) \geq \varepsilon$.
 613 \end{definition}
 614
 615 \begin{definition} \label{def9}
 616 A discrete dynamical  system is said to be  {\emph{ topologically mixing}}
 617 if  and only  if,  for any  pair  of disjoint  open  sets $U$,$V  \neq
 618 \emptyset$, we can find some $n_0  \in \mathds{N}$ such that  for any $n$,
 619 $n\geq n_0$, we have $f^n(U) \cap V \neq \emptyset$.
 620 \end{definition}
 621 \JFC{Donner un sens à ces definitions}
 622
 623
 624 It has been proven in Ref.~\cite{gfb10:ip},  that chaotic iterations
 625 are expansive and  topologically mixing when  $f$ is  the
 626 vectorial  negation $f_0$.
 627 Consequently,  these  properties are  inherited  by the  CI-MLP($f_0$)
 628 recurrent neural network previously presented, which induce a greater
 629 unpredictability.  Any  difference on the  initial value of  the input
 630 layer is  in particular  magnified up to  be equal to  the expansivity
 631 constant.
 632
 633 Let us then focus on the consequences for  a neural  network to  be chaotic
 634 according  to  Devaney's definition.  Intuitively, the  topological
 635 transitivity property implies indecomposability, which is formally defined
 636 as follows:
 637
 638
 639 \begin{definition} \label{def10}
 640 A   dynamical   system   $\left(   \mathcal{X},  f\right)$   is
 641 {\emph{not decomposable}}  if it  is not  the  union of  two closed  sets $A,  B
 642 \subset \mathcal{X}$ such that $f(A) \subset A, f(B) \subset B$.
 643 \end{definition}
 644
 645 \noindent Hence, reducing the set of outputs generated by CI-MLP($f$),
 646 in order to  simplify its complexity, is impossible  if $\Gamma(f)$ is
 647 strongly connected.  Moreover, under this  hypothesis CI-MLPs($f$) are
 648 strongly transitive:
 649
 650 \begin{definition} \label{def11}
 651 A  dynamical system  $\left( \mathcal{X},  f\right)$ is  {\emph{ strongly
 652 transitive}} if $\forall x,y  \in \mathcal{X}$, $\forall r>0$, $\exists
 653 z  \in  \mathcal{X}$,  $d(z,x)~\leq~r  \Rightarrow  \exists  n  \in
 654 \mathds{N}^{\ast}$, $f^n(z)=y$.
 655 \end{definition}
 656 According to this definition, for all  pairs of points $(x, y)$ in the
 657 phase space, a point $z$ can  be found in the neighborhood of $x$ such
 658 that one of its iterates $f^n(z)$ is $y$. Indeed, this result has been
 659 established  during  the  proof   of  the  transitivity  presented  in
 660 Ref.~\cite{guyeux09}.  Among  other  things, the  strong  transitivity
 661 leads  to the fact  that without  the knowledge  of the  initial input
 662 layer, all outputs are possible.  Additionally, no point of the output
 663 space  can   be  discarded  when  studying  CI-MLPs:   this  space  is
 664 intrinsically complicated and it cannot be decomposed or simplified.
 665
 666 Furthermore, those  recurrent neural networks  exhibit the instability
 667 property:
 668 \begin{definition}
 669 A dynamical  system $\left( \mathcal{X}, f\right)$ is  \emph{unstable}
 670 if for
 671 all  $x  \in  \mathcal{X}$,   the  orbit  $\gamma_x:n  \in  \mathds{N}
 672 \longmapsto f^n(x)$  is unstable,  that means: $\exists  \varepsilon >
 673 0$, $\forall  \delta>0$, $\exists y  \in \mathcal{X}$, $\exists  n \in
 674 \mathds{N}$,        such        that        $d(x,y)<\delta$        and
 675 $d\left(\gamma_x(n),\gamma_y(n)\right) \geq \varepsilon$.
 676 \end{definition}
 677 This property, which  is implied by the sensitive  point dependence on
 678 initial conditions, leads to the fact that in all neighborhoods of any
 679 point $x$, there are points that  can be apart by $\varepsilon$ in the
 680 future through iterations of the  CI-MLP($f$). Thus, we can claim that
 681 the behavior of  these MLPs is unstable when  $\Gamma (f)$ is strongly
 682 connected.
 683
 684 Let  us  now consider  a  compact  metric space  $(M,  d)$  and $f:  M
 685 \rightarrow M$  a continuous map. For  each natural number  $n$, a new
 686 metric $d_n$ is defined on $M$ by
 687 $$d_n(x,y)=\max\{d(f^i(x),f^i(y)): 0\leq i<n\} \enspace .$$
 688
 689 Given any $\varepsilon > 0$ and $n \geqslant 1$, two points of $M$ are
 690 $\varepsilon$-close  with respect to  this metric  if their  first $n$
 691 iterates are $\varepsilon$-close.
 692
 693 This metric  allows one to distinguish  in a neighborhood  of an orbit
 694 the points  that move away from  each other during  the iteration from
 695 the points  that travel together.  A subset  $E$ of $M$ is  said to be
 696 $(n, \varepsilon)$-separated if each pair of distinct points of $E$ is
 697 at  least $\varepsilon$  apart in  the metric  $d_n$. Denote  by $H(n,
 698 \varepsilon)$     the    maximum     cardinality     of    an     $(n,
 699 \varepsilon)$-separated set,
 700 \begin{definition}
 701 The \emph{topological entropy} of the  map $f$ is defined by (see e.g.,
 702 Ref.~\cite{Adler65} or Ref.~\cite{Bowen})
 703 $$h(f)=\lim_{\varepsilon\to      0}      \left(\limsup_{n\to      \infty}
 704 \frac{1}{n}\log H(n,\varepsilon)\right) \enspace .$$
 705 \end{definition}
 706
 707 Then we have the following result \cite{GuyeuxThese10},
 708 \begin{theorem}
 709 $\left( \mathcal{X},d\right)$  is compact and  the topological entropy
 710 of $(\mathcal{X},G_{f_0})$ is infinite.
 711 \end{theorem}
 712
 713 \begin{figure}
 714   \centering
 715   \includegraphics[scale=0.625]{scheme}
 716   \caption{Summary of addressed neural networks and chaos problems}
 717   \label{Fig:scheme}
 718 \end{figure}
 719
 720 The Figure~\ref{Fig:scheme} is a summary of addressed neural networks and chaos problems.
 721 Section~\ref{S2} has explained how to  construct a truly chaotic neural
 722 networks $A$ for instance.
 723 Section~\ref{S3} has shown how to check whether a  given MLP
 724 $A$ or $C$ is chaotic or not in the sens of Devaney.
 725 %, and  how to study its topological behavior.
 726 The last thing to  investigate, when comparing
 727 neural  networks   and  Devaney's  chaos,  is to  determine  whether
 728 an artificial neural network $A$  is able to learn or  predict some chaotic
 729 behaviors of $B$, as  it is defined  in the Devaney's formulation  (when they
 730 are not specifically constructed for this purpose).  This statement is
 731 studied in the next section.
 732
 733
 734
 735
 736
 737
 738
 739 \section{Suitability of Artificial Neural Networks
 740 for Predicting Chaotic Behaviors}
 741
 742 In  the context  of computer  science  different topic  areas have  an
 743 interest       in       chaos,       as       for       steganographic
 744 techniques~\cite{1309431,Zhang2005759}.    Steganography  consists  in
 745 embedding a  secret message within  an ordinary one, while  the secret
 746 extraction takes place once  at destination.  The reverse ({\it i.e.},
 747 automatically detecting the presence  of hidden messages inside media)
 748 is  called   steganalysis.   Among  the   deployed  strategies  inside
 749 detectors,          there         are          support         vectors
 750 machines~\cite{Qiao:2009:SM:1704555.1704664},                    neural
 751 networks~\cite{10.1109/ICME.2003.1221665,10.1109/CIMSiM.2010.36},  and
 752 Markov   chains~\cite{Sullivan06steganalysisfor}.    Most   of   these
 753 detectors  give quite  good results  and are  rather  competitive when
 754 facing steganographic  tools.  However, to  the best of  our knowledge
 755 none of the considered information hiding schemes fulfills the Devaney
 756 definition  of chaos~\cite{Devaney}.  Indeed,  one can  wonder whether
 757 detectors  continue to  give good  results when  facing  truly chaotic
 758 schemes.  More  generally, there remains the open  problem of deciding
 759 whether artificial intelligence is suitable for predicting topological
 760 chaotic behaviors.
 761
 762 \subsection{Representing Chaotic Iterations for Neural Networks}
 763 \label{section:translation}
 764
 765 The  problem  of  deciding  whether  classical  feedforward  ANNs  are
 766 suitable  to approximate  topological chaotic  iterations may  then be
 767 reduced  to evaluate  ANNs on  iterations of  functions  with Strongly
 768 Connected  Component  (SCC)~graph  of  iterations.   To  compare  with
 769 non-chaotic  iterations,  the experiments  detailed  in the  following
 770 sections are  carried out  using both kinds  of function  (chaotic and
 771 non-chaotic). Let us emphasize on  the difference between this kind of
 772 neural   networks  and  the   Chaotic  Iterations   based  MultiLayer
 773 Perceptron.
 774
 775 We are  then left to compute  two disjoint function  sets that contain
 776 either functions  with topological chaos properties  or not, depending
 777 on  the strong  connectivity of  their iterations graph.  This  can be
 778 achieved for  instance by removing a  set of edges  from the iteration
 779 graph $\Gamma(f_0)$ of the vectorial negation function~$f_0$.  One can
 780 deduce whether  a function verifies the topological  chaos property or
 781 not  by checking  the strong  connectivity of  the resulting  graph of
 782 iterations.
 783
 784 For instance let us consider  the functions $f$ and $g$ from $\Bool^4$
 785 to $\Bool^4$ respectively defined by the following lists:
 786 $$[0,  0,  2,   3,  13,  13,  6,   3,  8,  9,  10,  11,   8,  13,  14,
 787   15]$$ $$\mbox{and } [11, 14, 13, 14, 11, 10, 1, 8, 7, 6, 5, 4, 3, 2,
 788   1, 0]  \enspace.$$ In  other words,  the image of  $0011$ by  $g$ is
 789 $1110$: it  is obtained as the  binary value of the  fourth element in
 790 the  second  list  (namely~14).   It   is  not  hard  to  verify  that
 791 $\Gamma(f)$ is  not SCC  (\textit{e.g.}, $f(1111)$ is  $1111$) whereas
 792 $\Gamma(g)$  is.  Next section  shows how  to translate  iterations of
 793 such functions into a model amenable to be learned by an ANN.
 794
 795 This  section  presents how  (not)  chaotic  iterations  of $G_f$  are
 796 translated  into  another  model  more  suited  to  artificial  neural
 797 networks.
 798 \JFC{détailler le more suited}
 799 Formally, input and output vectors are pairs~$((S^t)^{t \in
 800 \Nats},x)$ and $\left(\sigma((S^t)^{t \in \Nats}),F_{f}(S^0,x)\right)$
 801 as defined in~Eq.~(\ref{eq:Gf}).
 802
 803 Firstly, let us focus on how to memorize configurations.  Two distinct
 804 translations are  proposed.  In the first  case, we take  one input in
 805 $\Bool$  per  component;  in   the  second  case,  configurations  are
 806 memorized  as   natural  numbers.    A  coarse  attempt   to  memorize
 807 configuration  as  natural  number  could  consist  in  labeling  each
 808 configuration  with  its  translation  into  decimal  numeral  system.
 809 However,  such a  representation induces  too many  changes  between a
 810 configuration  labeled by  a  power  of two  and  its direct  previous
 811 configuration: for instance, 16~(10000) and 15~(01111) are closed in a
 812 decimal ordering, but  their Hamming distance is 5.   This is why Gray
 813 codes~\cite{Gray47} have been preferred.
 814
 815 Let us secondly detail how to deal  with strategies.   Obviously,  it is  not
 816 possible to  translate in a finite  way an infinite  strategy, even if
 817 both $(S^t)^{t \in \Nats}$ and $\sigma((S^t)^{t \in \Nats})$ belong to
 818 $\{1,\ldots,n\}^{\Nats}$.  Input strategies are then reduced to have a
 819 length  of size  $l  \in  \llbracket 2,k\rrbracket$,  where  $k$ is  a
 820 parameter of the evaluation. Notice  that $l$ is greater than or equal
 821 to $2$ since  we do not want the shift  $\sigma$~function to return an
 822 empty strategy.  Strategies are memorized as natural numbers expressed
 823 in base  $n+1$.  At  each iteration, either  none or one  component is
 824 modified  (among the  $n$ components)  leading to  a radix  with $n+1$
 825 entries.  Finally,  we give an  other input, namely $m  \in \llbracket
 826 1,l-1\rrbracket$, which  is the  number of successive  iterations that
 827 are applied starting  from $x$.  Outputs are translated  with the same
 828 rules.
 829
 830 To address  the complexity  issue of the  problem, let us  compute the
 831 size of the data set an ANN has to deal with.  Each input vector of an
 832 input-output pair  is composed of a configuration~$x$,  an excerpt $S$
 833 of the strategy to iterate  of size $l \in \llbracket 2, k\rrbracket$,
 834 and a  number $m \in  \llbracket 1, l-1\rrbracket$ of  iterations that
 835 are executed.
 836
 837 Firstly, there are $2^n$  configurations $x$, with $n^l$ strategies of
 838 size $l$ for  each of them. Secondly, for  a given configuration there
 839 are $\omega = 1 \times n^2 +  2 \times n^3 + \ldots+ (k-1) \times n^k$
 840 ways  of writing  the pair  $(m,S)$. Furthermore,  it is  not  hard to
 841 establish that
 842 \begin{equation}
 843 \displaystyle{(n-1) \times \omega = (k-1)\times n^{k+1} - \sum_{i=2}^k n^i} \nonumber
 844 \end{equation}
 845 then
 846 \begin{equation}
 847 \omega =
 848 \dfrac{(k-1)\times n^{k+1}}{n-1} - \dfrac{n^{k+1}-n^2}{(n-1)^2} \enspace . \nonumber
 849 \end{equation}
 850 \noindent And then, finally, the number of  input-output pairs for our
 851 ANNs is
 852 $$
 853 2^n \times \left(\dfrac{(k-1)\times n^{k+1}}{n-1} - \dfrac{n^{k+1}-n^2}{(n-1)^2}\right) \enspace .
 854 $$
 855 For  instance, for $4$  binary components  and a  strategy of  at most
 856 $3$~terms we obtain 2304~input-output pairs.
 857
 858 \subsection{Experiments}
 859 \label{section:experiments}
 860
 861 To study  if chaotic iterations can  be predicted, we  choose to train
 862 the MultiLayer Perceptron.  As stated  before, this kind of network is
 863 in   particular    well-known   for   its    universal   approximation
 864 property. Furthermore,  MLPs have been already  considered for chaotic
 865 time series prediction.  For example, in~\cite{dalkiran10} the authors
 866 have shown that a feedforward  MLP with two hidden layers, and trained
 867 with Bayesian  Regulation back-propagation, can  learn successfully the
 868 dynamics of Chua's circuit.
 869
 870 In these experiments  we consider  MLPs having one hidden layer of
 871 sigmoidal  neurons  and  output   neurons  with  a  linear  activation
 872 function.     They    are    trained    using    the    Limited-memory
 873 Broyden-Fletcher-Goldfarb-Shanno quasi-newton algorithm in combination
 874 with the Wolfe linear search.  The training process is performed until
 875 a maximum number of epochs  is reached.  To prevent overfitting and to
 876 estimate the  generalization performance we use  holdout validation by
 877 splitting the  data set into  learning, validation, and  test subsets.
 878 These subsets  are obtained through  random selection such  that their
 879 respective size represents 65\%, 10\%, and 25\% of the whole data set.
 880
 881 Several  neural  networks  are  trained  for  both  iterations  coding
 882 schemes.   In  both  cases   iterations  have  the  following  layout:
 883 configurations of  four components and  strategies with at  most three
 884 terms. Thus, for  the first coding scheme a data  set pair is composed
 885 of 6~inputs and 5~outputs, while for the second one it is respectively
 886 3~inputs and 2~outputs. As noticed at the end of the previous section,
 887 this  leads to  data sets  that  consist of  2304~pairs. The  networks
 888 differ  in the  size of  the hidden  layer and  the maximum  number of
 889 training epochs.  We remember that  to evaluate the ability  of neural
 890 networks to  predict a  chaotic behavior for  each coding  scheme, the
 891 trainings of two data sets, one of them describing chaotic iterations,
 892 are compared.
 893
 894 Thereafter we give,  for the different learning setups  and data sets,
 895 the  mean prediction  success rate  obtained for  each  output.  These
 896 values  are  computed  considering  10~trainings with  random  subsets
 897 construction,  weights  and  biases initialization.   Firstly,  neural
 898 networks having 10  and 25~hidden neurons are trained,  with a maximum
 899 number  of  epochs that  takes  its  value  in $\{125,250,500\}$  (see
 900 Tables~\ref{tab1}  and \ref{tab2}).   Secondly, we  refine  the second
 901 coding scheme by splitting the  output vector such that each output is
 902 learned by a specific  neural network (Table~\ref{tab3}). In this last
 903 case, we increase  the size of the hidden layer  up to 40~neurons, and
 904 we consider larger number of epochs.
 905
 906 \begin{table}[htbp!]
 907 \caption{Prediction success rates for configurations expressed as boolean vectors.}
 908 \label{tab1}
 909 \centering {\small
 910 \begin{tabular}{|c|c||c|c|c|}
 911 \hline
 912 \multicolumn{5}{|c|}{Networks topology: 6~inputs, 5~outputs and one hidden layer} \\
 913 \hline
 914 \hline
 915 \multicolumn{2}{|c||}{Hidden neurons} & \multicolumn{3}{c|}{10 neurons} \\
 916 \cline{3-5}
 917 \multicolumn{2}{|c||}{Epochs} & 125 & 250 & 500 \\
 918 \hline
 919 \multirow{6}{*}{\rotatebox{90}{Chaotic}}&Output~(1) & 90.92\% & 91.75\% & 91.82\% \\
 920 & Output~(2) & 69.32\% & 78.46\% & 82.15\% \\
 921 & Output~(3) & 68.47\% & 78.49\% & 82.22\% \\
 922 & Output~(4) & 91.53\% & 92.37\% & 93.4\% \\
 923 & Config. & 36.10\% & 51.35\% & 56.85\% \\
 924 & Strategy~(5) & 1.91\% & 3.38\% & 2.43\% \\
 925 \hline
 926 \multirow{6}{*}{\rotatebox{90}{Non-chaotic}}&Output~(1) & 97.64\% & 98.10\% & 98.20\% \\
 927 & Output~(2) & 95.15\% & 95.39\% & 95.46\% \\
 928 & Output~(3) & 100\% & 100\% & 100\% \\
 929 & Output~(4) & 97.47\% & 97.90\% & 97.99\% \\
 930 & Config. & 90.52\% & 91.59\% & 91.73\% \\
 931 & Strategy~(5) & 3.41\% & 3.40\% & 3.47\% \\
 932 \hline
 933 \hline
 934 \multicolumn{2}{|c||}{Hidden neurons} & \multicolumn{3}{c|}{25 neurons} \\ %& \multicolumn{3}{|c|}{40 neurons} \\
 935 \cline{3-5}
 936 \multicolumn{2}{|c||}{Epochs} & 125 & 250 & 500 \\ %& 125 & 250 & 500 \\
 937 \hline
 938 \multirow{6}{*}{\rotatebox{90}{Chaotic}}&Output~(1) & 91.65\% & 92.69\% & 93.93\% \\ %& 91.94\% & 92.89\% & 94.00\% \\
 939 & Output~(2) & 72.06\% & 88.46\% & 90.5\% \\ %& 74.97\% & 89.83\% & 91.14\% \\
 940 & Output~(3) & 79.19\% & 89.83\% & 91.59\% \\ %& 76.69\% & 89.58\% & 91.84\% \\
 941 & Output~(4) & 91.61\% & 92.34\% & 93.47\% \\% & 82.77\% & 92.93\% & 93.48\% \\
 942 & Config. & 48.82\% & 67.80\% & 70.97\% \\%& 49.46\% & 68.94\% & 71.11\% \\
 943 & Strategy~(5) & 2.62\% & 3.43\% & 3.78\% \\% & 3.10\% & 3.10\% & 3.03\% \\
 944 \hline
 945 \multirow{6}{*}{\rotatebox{90}{Non-chaotic}}&Output~(1) & 97.87\% & 97.99\% & 98.03\% \\ %& 98.16\% \\
 946 & Output~(2) & 95.46\% & 95.84\% & 96.75\% \\ % & 97.4\% \\
 947 & Output~(3) & 100\% & 100\% & 100\% \\%& 100\% \\
 948 & Output~(4) & 97.77\% & 97.82\% & 98.06\% \\%& 98.31\% \\
 949 & Config. & 91.36\% & 91.99\% & 93.03\% \\%& 93.98\% \\
 950 & Strategy~(5) & 3.37\% & 3.44\% & 3.29\% \\%& 3.23\% \\
 951 \hline
 952 \end{tabular}
 953 }
 954 \end{table}
 955
 956 Table~\ref{tab1}  presents the  rates  obtained for  the first  coding
 957 scheme.   For  the chaotic  data,  it can  be  seen  that as  expected
 958 configuration  prediction becomes  better  when the  number of  hidden
 959 neurons and maximum  epochs increases: an improvement by  a factor two
 960 is observed (from 36.10\% for 10~neurons and 125~epochs to 70.97\% for
 961 25~neurons  and  500~epochs). We  also  notice  that  the learning  of
 962 outputs~(2)   and~(3)  is   more  difficult.    Conversely,   for  the
 963 non-chaotic  case the  simplest training  setup is  enough  to predict
 964 configurations.   For  all  network  topologies and  all  outputs  the
 965 obtained  results  for the  non-chaotic  case  outperform the  chaotic
 966 ones. Finally,  the rates for  the strategies show that  the different
 967 networks are unable to learn them.
 968
 969 For the second coding  scheme (\textit{i.e.}, with Gray Codes)
 970 Table~\ref{tab2} shows that any network
 971 learns about  five times more non-chaotic  configurations than chaotic
 972 ones. As in the previous scheme, the strategies cannot be predicted.
 973
 974 Let us now compare the  two coding schemes. Firstly, the second scheme
 975 disturbs  the   learning  process.   In   fact  in  this   scheme  the
 976 configuration is always expressed as  a natural number, whereas in the
 977 first one  the number  of inputs follows  the increase of  the boolean
 978 vectors coding configurations. In this latter case, the coding gives a
 979 finer information on configuration evolution.
 980 \JFC{Je n'ai pas compris le paragraphe precedent. Devrait être repris}
 981 \begin{table}[b]
 982 \caption{Prediction success rates for configurations expressed with Gray code}
 983 \label{tab2}
 984 \centering
 985 \begin{tabular}{|c|c||c|c|c|}
 986 \hline
 987 \multicolumn{5}{|c|}{Networks topology: 3~inputs, 2~outputs and one hidden layer} \\
 988 \hline
 989 \hline
 990 & Hidden neurons & \multicolumn{3}{c|}{10 neurons} \\
 991 \cline{2-5}
 992 & Epochs & 125 & 250 & 500 \\ %& 1000
 993 \hline
 994 \multirow{2}{*}{Chaotic}& Config.~(1) & 13.29\% & 13.55\% & 13.08\% \\ %& 12.5\%
 995 & Strategy~(2) & 0.50\% & 0.52\% & 1.32\% \\ %& 1.42\%
 996 \hline
 997 \multirow{2}{*}{Non-Chaotic}&Config.~(1) & 77.12\% & 74.00\% & 72.60\% \\ %& 75.81\%
 998 & Strategy~(2) & 0.42\% & 0.80\% & 1.16\% \\ %& 1.42\%
 999 \hline
1000 \hline
1001 & Hidden neurons & \multicolumn{3}{c|}{25 neurons} \\
1002 \cline{2-5}
1003 & Epochs & 125 & 250 & 500 \\ %& 1000
1004 \hline
1005 \multirow{2}{*}{Chaotic}& Config.~(1) & 12.27\% & 13.15\% & 13.05\% \\ %& 15.44\%
1006 & Strategy~(2) & 0.71\% & 0.66\% & 0.88\% \\ %& 1.73\%
1007 \hline
1008 \multirow{2}{*}{Non-Chaotic}&Config.~(1) & 73.60\% & 74.70\% & 75.89\% \\ %& 68.32\%
1009 & Strategy~(2) & 0.64\% & 0.97\% & 1.23\% \\ %& 1.80\%
1010 \hline
1011 \end{tabular}
1012 \end{table}
1013
1014 Unfortunately, in  practical applications the number  of components is
1015 usually  unknown.   Hence, the  first  coding  scheme  cannot be  used
1016 systematically.   Therefore, we  provide  a refinement  of the  second
1017 scheme: each  output is learned  by a different  ANN. Table~\ref{tab3}
1018 presents the  results for  this approach.  In  any case,  whatever the
1019 network  topologies,  the  maximum   epoch  number  and  the  kind  of
1020 iterations,  the  configuration  success  rate is  slightly  improved.
1021 Moreover, the strategies predictions  rates reach almost 12\%, whereas
1022 in  Table~\ref{tab2}  they  never   exceed  1.5\%.   Despite  of  this
1023 improvement, a long term prediction of chaotic iterations still
1024 appear to be
1025 an open issue.
1026
1027 \begin{table}
1028 \caption{Prediction success rates for split outputs.}
1029 \label{tab3}
1030 \centering
1031 \begin{tabular}{|c||c|c|c|}
1032 \hline
1033 \multicolumn{4}{|c|}{Networks topology: 3~inputs, 1~output and one hidden layer} \\
1034 \hline
1035 \hline
1036 Epochs & 125 & 250 & 500 \\
1037 \hline
1038 \hline
1039 Chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
1040 \hline
1041 10~neurons & 12.39\% & 14.06\% & 14.32\% \\
1042 25~neurons & 13.00\% & 14.28\% & 14.58\% \\
1043 40~neurons & 11.58\% & 13.47\% & 14.23\% \\
1044 \hline
1045 \hline
1046 Non chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
1047 \cline{2-4}
1048 %Epochs & 125 & 250 & 500 \\
1049 \hline
1050 10~neurons & 76.01\% & 74.04\% & 78.16\% \\
1051 25~neurons & 76.60\% & 72.13\% & 75.96\% \\
1052 40~neurons & 76.34\% & 75.63\% & 77.50\% \\
1053 \hline
1054 \hline
1055 Chaotic/non chaotic & \multicolumn{3}{c|}{Output = Strategy} \\
1056 \cline{2-4}
1057 %Epochs & 125 & 250 & 500 \\
1058 \hline
1059 10~neurons & 0.76\% & 0.97\% & 1.21\% \\
1060 25~neurons & 1.09\% & 0.73\% & 1.79\% \\
1061 40~neurons & 0.90\% & 1.02\% & 2.15\% \\
1062 \hline
1063 \multicolumn{4}{c}{} \\
1064 \hline
1065 Epochs & 1000 & 2500 & 5000 \\
1066 \hline
1067 \hline
1068 Chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
1069 \hline
1070 10~neurons & 14.51\% & 15.22\% & 15.22\% \\
1071 25~neurons & 16.95\% & 17.57\% & 18.46\% \\
1072 40~neurons & 17.73\% & 20.75\% & 22.62\% \\
1073 \hline
1074 \hline
1075 Non chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
1076 \cline{2-4}
1077 %Epochs & 1000 & 2500 & 5000 \\
1078 \hline
1079 10~neurons & 78.98\% & 80.02\% & 79.97\% \\
1080 25~neurons & 79.19\% & 81.59\% & 81.53\% \\
1081 40~neurons & 79.64\% & 81.37\% & 81.37\% \\
1082 \hline
1083 \hline
1084 Chaotic/non chaotic & \multicolumn{3}{c|}{Output = Strategy} \\
1085 \cline{2-4}
1086 %Epochs & 1000 & 2500 & 5000 \\
1087 \hline
1088 10~neurons & 3.47\% & 9.98\% & 11.66\% \\
1089 25~neurons & 3.92\% & 8.63\% & 10.09\% \\
1090 40~neurons & 3.29\% & 7.19\% & 7.18\% \\
1091 \hline
1092 \end{tabular}
1093 \end{table}
1094
1095 \section{Conclusion}
1096
1097 In  this paper,  we have  established an  equivalence  between chaotic
1098 iterations,  according to  the Devaney's  definition of  chaos,  and a
1099 class  of  multilayer perceptron  neural  networks.  Firstly, we  have
1100 described how to build a neural network that can be trained to learn a
1101 given chaotic map function.  Then,  we found a condition that allow to
1102 check whether the iterations induced by a function are chaotic or not,
1103 and thus  if a chaotic map  is obtained. Thanks to  this condition our
1104 approach is not limited to a particular function. In the dual case, we
1105 show  that  checking  if  a  neural network  is  chaotic  consists  in
1106 verifying  a property  on an  associated  graph, called  the graph  of
1107 iterations.   These results  are valid  for recurrent  neural networks
1108 with a  particular architecture.  However,  we believe that  a similar
1109 work can be done for  other neural network architectures.  Finally, we
1110 have  discovered at  least one  family of  problems with  a reasonable
1111 size, such  that artificial neural  networks should not be  applied in
1112 the  presence  of chaos,  due  to  their  inability to  learn  chaotic
1113 behaviors in this  context.  Such a consideration is  not reduced to a
1114 theoretical detail:  this family of discrete  iterations is concretely
1115 implemented  in a  new steganographic  method  \cite{guyeux10ter}.  As
1116 steganographic   detectors  embed  tools   like  neural   networks  to
1117 distinguish between  original and stego contents, our  studies tend to
1118 prove that such  detectors might be unable to  tackle with chaos-based
1119 information hiding schemes.  Furthermore, iterations such that not all
1120 of  the  components  are updated  at  each  step  are very  common  in
1121 biological  and  physics mechanisms.   Therefore,  one can  reasonably
1122 wonder whether neural networks should be applied in these contexts.
1123
1124 In  future  work we  intend  to  enlarge  the comparison  between  the
1125 learning   of  truly   chaotic  and   non-chaotic   behaviors.   Other
1126 computational intelligence tools such  as support vector machines will
1127 be investigated  too, to  discover which tools  are the  most relevant
1128 when facing a truly chaotic phenomenon.  A comparison between learning
1129 rate  success  and  prediction  quality will  be  realized.   Concrete
1130 consequences in biology, physics, and computer science security fields
1131 will be  stated.  Lastly,  thresholds separating systems  depending on
1132 the ability to learn their dynamics will be established.
1133
1134 % \appendix{}
1135
1136
1137
1138 % \begin{definition} \label{def2}
1139 % A continuous function $f$  on a topological space $(\mathcal{X},\tau)$
1140 % is defined  to be  {\emph{topologically transitive}}  if for any  pair of
1141 % open  sets $U$,  $V  \in  \mathcal{X}$ there  exists
1142 % $k \in
1143 % \mathds{N}^{\ast}$
1144 %  such  that
1145 % $f^k(U) \cap V \neq \emptyset$.
1146 % \end{definition}
1147
1148
1149 \bibliography{chaos-paper}% Produces the bibliography via BibTeX.
1150
1151 \end{document}
1152 %
1153 % ****** End of file chaos-paper.tex ******