main.tex

   1 % ****** Start of file chaos-paper.tex ******
   2 %
   3 \documentclass[%
   4 aip,
   5 cha,% long,
   6 numerical bibliography, %(default)
   7 %jcp,% short, numerical bibliography,
   8 %jmp,
   9 amsmath,amssymb,
  10 preprint,%
  11 %reprint,%
  12 %author-year,%
  13 %author-numerical,%
  14 ]{revtex4-1}
  15
  16 \usepackage{graphicx}% Include figure files
  17 \usepackage{dcolumn}% Align table columns on decimal point
  18 \usepackage{bm}% bold math
  19 \usepackage{color}
  20 \usepackage{ulem}
  21 \usepackage[mathlines]{lineno}% Enable numbering of text and display math
  22 \linenumbers\relax % Commence numbering lines
  23 % The amssymb package provides various useful mathematical symbols
  24 \usepackage{amssymb}
  25 % The amsthm package provides various useful mathematical fonts
  26 \usepackage{amsfonts}
  27 % Various mathematical packages
  28 \PassOptionsToPackage{cmex10draft}{graphics}
  29 \usepackage[standard]{ntheorem}
  30 % Various packages
  31 \usepackage{stmaryrd} % llbracket
  32 \usepackage{dsfont}   % mathds
  33 \usepackage{multirow} % tables
  34
  35 \newcommand{\ie}{\textit{i.e.}}
  36 \newcommand{\Nats}[0]{\ensuremath{\mathds{N}}}
  37 \newcommand{\R}[0]{\ensuremath{\mathds{R}}}
  38 \newcommand{\Bool}[0]{\ensuremath{\mathds{B}}}
  39
  40 \begin{document}
  41
  42 \title[Recurrent Neural Networks and Chaos]{Recurrent Neural Networks
  43 and Chaos: Construction, Evaluation, \\
  44 and Prediction Ability}
  45
  46 \author{Jacques M. Bahi}
  47 \author{Jean-Fran\c{c}ois Couchot}
  48 \author{Christophe Guyeux}
  49 \email{christophe.guyeux@univ-fcomte.fr.}
  50 \author{Michel Salomon}
  51 \altaffiliation[Authors in ]{alphabetic order}
  52 \affiliation{
  53 Computer Science Laboratory (LIFC), University of Franche-Comt\'e, \\
  54 IUT de Belfort-Montb\'eliard, BP 527, \\
  55 90016 Belfort Cedex, France
  56 }
  57
  58 \date{\today}
  59
  60 \newcommand{\CG}[1]{\begin{color}{red}\textit{#1}\end{color}}
  61
  62 \begin{abstract}
  63 %% Text of abstract
  64 Many  research works  deal with  chaotic neural  networks  for various
  65 fields  of application. Unfortunately,  up to  now these  networks are
  66 usually  claimed to be  chaotic without  any mathematical  proof.  The
  67 purpose of this paper is to establish, based on a rigorous theoretical
  68 framework, an equivalence between  chaotic iterations according to
  69 Devaney  and  a  particular  class of  neural
  70 networks. On the one hand we show  how to build such a network, on the
  71 other  hand we provide  a method  to check  if a  neural network  is a
  72 chaotic one.  Finally, the ability of classical feedforward multilayer
  73 perceptrons to  learn sets of  data obtained from a  chaotic dynamical
  74 system is regarded.  Various  Boolean functions are iterated on finite
  75 states. Iterations of some of them  are proven to  be chaotic
  76  as it is  defined by
  77 Devaney.  In that context, important differences occur in the training
  78 process,  establishing  with  various  neural  networks  that  chaotic
  79 behaviors are far more difficult to learn.
  80 \end{abstract}
  81
  82 %%\pacs{Valid PACS appear here}% PACS, the Physics and Astronomy
  83                              % Classification Scheme.
  84 %%\keywords{Suggested keywords}%Use showkeys class option if keyword
  85                               %display desired
  86 \maketitle
  87
  88 \begin{quotation}
  89
  90 Chaotic neural  networks have received a  lot of attention  due to the
  91 appealing   properties  of   deterministic   chaos  (unpredictability,
  92 sensitivity,  and so  on). However,  such networks  are  often claimed
  93 chaotic without  any rigorous mathematical proof.   Therefore, in this
  94 work  a theoretical  framework based  on the  Devaney's  definition of
  95 chaos  is introduced.   Starting with  a relationship  between chaotic
  96 iterations  and  Devaney's chaos,  we  firstly  show  how to  build  a
  97 recurrent  neural networks  that is  equivalent to  a chaotic  map and
  98 secondly  a way  to check  whether  an already  available network,  is
  99 chaotic  or not.  We also  study different  topological  properties of
 100 these  truly  chaotic neural  networks.   Finally,  we  show that  the
 101 learning,  with neural  networks  having a  feedforward structure,  of
 102 chaotic behaviors represented by data sets obtained from chaotic maps,
 103 is far more difficult than non chaotic behaviors.
 104 \end{quotation}
 105
 106 %% Main text
 107 \section{Introduction}
 108 \label{S1}
 109
 110 Several research  works have proposed  or run chaotic  neural networks
 111 these last  years.  The complex dynamics  of such a  networks leads to
 112 various       potential      application       areas:      associative
 113 memories~\cite{Crook2007267}  and  digital  security tools  like  hash
 114 functions~\cite{Xiao10},                                       digital
 115 watermarking~\cite{1309431,Zhang2005759},           or          cipher
 116 schemes~\cite{Lian20091296}.  In the  former case, the background idea
 117 is to  control chaotic dynamics in  order to store  patterns, with the
 118 key advantage  of offering a  large storage capacity.  For  the latter
 119 case,   the  use   of   chaotic  dynamics   is   motivated  by   their
 120 unpredictability and  random-like behaviors.  Thus,  investigating new
 121 concepts is crucial in this  field, because new threats are constantly
 122 emerging.   As an illustrative  example, the  former standard  in hash
 123 functions,  namely the  SHA-1  algorithm, has  been recently  weakened
 124 after flaws were discovered.
 125
 126 Chaotic neural networks have  been built with different approaches. In
 127 the context of associative  memory, chaotic neurons like the nonlinear
 128 dynamic  state neuron  \cite{Crook2007267}  frequently constitute  the
 129 nodes of the network. These neurons have an inherent chaotic behavior,
 130 which  is usually  assessed through  the computation  of  the Lyapunov
 131 exponent.  An alternative approach  is to consider a well-known neural
 132 network architecture: the  MultiLayer Perceptron (MLP). These networks
 133 are  suitable to model  nonlinear relationships  between data,  due to
 134 their universal approximator capacity. Thus, this kind of networks can
 135 be trained to model a physical  phenomenon known to be chaotic such as
 136 Chua's circuit  \cite{dalkiran10}.  Sometimes, a  neural network which
 137 is build by combining  transfer functions and initial conditions that are both
 138 chaotic,      is      itself      claimed      to      be      chaotic
 139 \cite{springerlink:10.1007/s00521-010-0432-2}.
 140
 141 What all of these chaotic neural  networks have in common is that they
 142 are claimed to be chaotic  despite a lack of any rigorous mathematical
 143 proof.   The first contribution of  this paper  is  to fill  this  gap,
 144 using  a theoretical  framework  based on  the  Devaney's  definition of  chaos
 145 \cite{Devaney}.   This  mathematical  theory  of chaos  provides  both
 146 qualitative and quantitative tools to evaluate the complex behavior of
 147 a  dynamical  system:  ergodicity,   expansivity,  and  so  on.   More
 148 precisely, in  this paper,  which is an  extension of a  previous work
 149 \cite{bgs11:ip},   we  establish   the  equivalence  between   chaotic
 150 iterations and  a class of globally  recurrent MLP.
 151 The investigation the  converse problem is the second contribution:
 152 we indeed study the  ability for
 153 classical MultiLayer Perceptrons  to learn a particular family of
 154 discrete  chaotic  dynamical  systems.   This family,  called  chaotic
 155 iterations, is defined by a  Boolean vector, an update function, and a
 156 sequence giving  which component  to update at  each iteration.   It has
 157 been  previously established  that such  dynamical systems is
 158 chaotically iterated (as it is defined by Devaney) when the chosen function has
 159 a  strongly connected  iterations graph.   In this  document,  we
 160 experiment several MLPs and try to learn some iterations of this kind.
 161 We  show that non-chaotic iterations can be learned, whereas it is
 162 far  more  difficult  for chaotic  ones.   That  is  to say,  we  have
 163 discovered at  least one  family of problems  with a  reasonable size,
 164 such  that artificial  neural networks  should not  be applied
 165 due to their inability to  learn chaotic behaviors in this context.
 166
 167 The remainder of this research  work is organized as follows. The next
 168 section is devoted  to the basics of Devaney's
 169 chaos.   Section~\ref{S2} formally  describes  how to  build a  neural
 170 network  that operates  chaotically.  Section~\ref{S3} is
 171 devoted to the dual case of  checking whether an existing neural network
 172 is chaotic  or not.
 173 Topological properties  of   chaotic  neural  networks
 174 are discussed in Sect.~\ref{S4}.   The
 175 Section~\ref{section:translation}   shows   how   to  translate   such
 176 iterations  into  an Artificial  Neural  Network  (ANN),  in order  to
 177 evaluate the  capability for this  latter to learn  chaotic behaviors.
 178 This  ability  is  studied in  Sect.~\ref{section:experiments},  where
 179 various ANNs try to learn two  sets of data: the first one is obtained
 180 by chaotic iterations while the  second one results from a non-chaotic
 181 system.  Prediction success rates are  given and discussed for the two
 182 sets.  The paper ends with a conclusion section where our contribution
 183 is summed up and intended future work is exposed.
 184
 185 \section{Chaotic Iterations according to  Devaney}
 186
 187 In this section, the well-established notion of Devaney's mathematical
 188 chaos is  firstly recalled.  Preservation of  the unpredictability of
 189 such dynamical  system when implemented  on a computer is  obtained by
 190 using  some discrete iterations  called ``asynchronous  iterations'', which
 191 are thus introduced.  The result establishing the link between such
 192 iterations and Devaney's chaos is  finally presented at the end of this
 193 section.
 194
 195 In what follows and for  any function $f$, $f^n$ means the composition
 196 $f \circ f \circ \hdots \circ f$ ($n$ times).
 197
 198 \subsection{Devaney's chaotic dynamical systems}
 199
 200 Various domains such as  physics, biology, or economy, contain systems
 201 that exhibit a chaotic behavior,  a well-known example is the weather.
 202 These  systems   are  in   particular  highly  sensitive   to  initial
 203 conditions, a concept usually presented as the butterfly effect: small
 204 variations in the initial conditions possibly lead to widely different
 205 behaviors.  Theoretically speaking, a  system is sensitive if for each
 206 point  $x$ in  the  iteration space,  one  can find  a  point in  each
 207 neighborhood of $x$ having a significantly different future evolution.
 208 Conversely, a  system seeded with  the same initial  conditions always
 209 has  the same  evolution.   In  other words,  chaotic  systems have  a
 210 deterministic  behavior  defined through  a  physical or  mathematical
 211 model  and a  high  sensitivity to  the  initial conditions.   Besides
 212 mathematically this  kind of unpredictability  is also referred  to as
 213 deterministic chaos.  For example, many weather forecast models exist,
 214 but they give only suitable predictions for about a week, because they
 215 are initialized with conditions  that reflect only a partial knowledge
 216 of the current weather.  Even  the  differences are  initially small,
 217 they are  amplified in the course  of time, and thus  make difficult a
 218 long-term prediction.  In fact,  in a chaotic system, an approximation
 219 of  the current  state is  a  quite useless  indicator for  predicting
 220 future states.
 221
 222 From  mathematical  point  of   view,  deterministic  chaos  has  been
 223 thoroughly studied  these last decades, with  different research works
 224 that  have   provide  various  definitions  of   chaos.   Among  these
 225 definitions,   the  one  given   by  Devaney~\cite{Devaney}   is
 226 well-established.    This   definition   consists  of   three   conditions:
 227 topological  transitivity, density of  periodic points,  and sensitive
 228 point dependence on initial conditions.
 229
 230 Topological  transitivity   is  checked  when,  for   any  point,  any
 231 neighborhood of its future evolution eventually overlap with any other
 232 given region. More precisely,
 233
 234 \begin{definition} \label{def2}
 235 A continuous function $f$  on a topological space $(\mathcal{X},\tau)$
 236 is defined  to be  {\bf topologically transitive}  if for any  pair of
 237 open  sets $U$,  $V  \in  \mathcal{X}$ there  exists
 238 $k \in
 239 \mathds{N}^{\ast}$
 240  such  that
 241 $f^k(U) \cap V \neq \emptyset$.
 242 \end{definition}
 243
 244 This property  implies that a  dynamical system cannot be  broken into
 245 simpler  subsystems.
 246 Intuitively, its complexity does not allow any simplification.
 247 On  the contrary,  a dense set  of periodic points  is an
 248 element of regularity that a chaotic dynamical system has to exhibit.
 249
 250 \begin{definition} \label{def3}
 251 A point $x$ is called a  {\bf periodic point} for $f$ of period~$n \in
 252 \mathds{N}^{\ast}$ if $f^{n}(x)=x$.
 253 \end{definition}
 254
 255 \begin{definition} \label{def4}
 256 $f$ is said to be {\bf  regular} on $(\mathcal{X},\tau)$ if the set of
 257   periodic points  for $f$ is  dense in $\mathcal{X}$ ( for any $x \in
 258   \mathcal{X}$, we can find at least  one periodic point in any of its
 259   neighborhood).
 260 \end{definition}
 261
 262 This  regularity ``counteracts'' the  effects of  transitivity.  Thus,
 263 due to these two properties, two points close to each other can behave
 264 in a completely different  manner, leading to unpredictability for the
 265 whole system. Then,
 266
 267 \begin{definition} \label{sensitivity}
 268 $f$  has {\bf  sensitive dependence  on initial  conditions}  if there
 269   exists $\delta  >0$ such  that, for any  $x\in \mathcal{X}$  and any
 270   neighborhood $V$ of $x$, there exist  $y\in V$ and $n > 0$ such that
 271   $d\left(f^{n}(x), f^{n}(y)\right) >\delta  $. The value $\delta$ is called the
 272   {\bf constant of sensitivity} of $f$.
 273 \end{definition}
 274
 275 Finally,
 276
 277 \begin{definition} \label{def5}
 278 The dynamical system that iterates $f$ is {\bf chaotic according to Devaney}
 279 on $(\mathcal{X},\tau)$ if $f$  is   regular,  topologically  transitive,
 280 and  has  sensitive dependence to its initial conditions.
 281 \end{definition}
 282
 283 Let us notice that for a  metric space the last condition follows from
 284 the two first ones~\cite{Banks92}.
 285
 286 \subsection{Asynchronous Iterations}
 287
 288 %This section  presents some basics on  topological chaotic iterations.
 289 Let us  firstly discuss about the  domain of iteration.  As  far as we
 290 know, no result rules that the chaotic behavior of a dynamical system
 291 that has been theoretically proven on  $\R$ remains valid on the
 292 floating-point
 293 numbers, which is  the implementation domain.  Thus, to  avoid loss of
 294 chaos this  work presents an  alternative, that is to  iterate Boolean
 295 maps:  results that  are  theoretically obtained  in  that domain  are
 296 preserved in implementations.
 297
 298 Let us denote by $\llbracket  a ; b \rrbracket$ the following interval
 299 of integers:  $\{a, a+1,  \hdots, b\}$, where $a~<~b$.
 300 In that section,  a  system
 301 under   consideration    iteratively   modifies   a    collection   of
 302 $n$~components.   Each component  $i \in  \llbracket 1;  n \rrbracket$
 303 takes  its  value  $x_i$  among the  domain  $\Bool=\{0,1\}$.   A~{\it
 304   configuration} of the system at discrete time $t$  is the vector
 305 %\begin{equation*}
 306 $x^{t}=(x_1^{t},\ldots,x_{n}^{t}) \in \Bool^n$.
 307 %\end{equation*}
 308 The dynamics of the system is  described according to a  function $f :
 309 \Bool^n \rightarrow \Bool^n$ such that
 310 %\begin{equation*}
 311 $f(x)=(f_1(x),\ldots,f_n(x))$.
 312 %\end{equation*}
 313 % Notice that $f^k$ denotes the
 314 % $k-$th composition $f\circ \ldots \circ f$ of the function $f$.
 315
 316 Let    be   given    a    configuration   $x$.    In   what    follows
 317 $N(i,x)=(x_1,\ldots,\overline{x_i},\ldots,x_n)$  is  the configuration
 318 obtained by switching the $i-$th component of $x$ ($\overline{x_i}$ is
 319 indeed  the negation  of $x_i$).   Intuitively, $x$  and  $N(i,x)$ are
 320 neighbors.   The discrete  iterations of  $f$ are  represented  by the
 321 oriented  {\it graph  of iterations}  $\Gamma(f)$.  In  such  a graph,
 322 vertices are configurations  of $\Bool^n$ and there is  an arc labeled
 323 $i$ from $x$ to $N(i,x)$ if and only if  $f_i(x)$ is $N(i,x)$.
 324
 325 In the  sequel, the  {\it strategy} $S=(S^{t})^{t  \in \Nats}$  is the
 326 sequence  defining which  component to  update at  time $t$  and $S^{t}$
 327 denotes its  $t-$th term.
 328 This iteration scheme that only modifies one element at each iteration
 329 is clasically refered as \emph{asynchronous iterations}.
 330 Next section shows the link between asynchronous iterations and
 331 Devaney's Chaos.
 332
 333 \subsection{On the link between asynchronous iterations and
 334   Devaney's Chaos}
 335
 336 In  this subsection  we recall  the link  we have  established between
 337 asynchronous iterations and Devaney's  chaos.  The theoretical framework is
 338 fully described in \cite{guyeux09}.
 339
 340 We  introduce the function $F_{f}$  that is
 341 defined  for  any given  application  $f:\Bool^{n}  \to \Bool^{n}$  by
 342 $F_{f}:   \llbracket1;n\rrbracket\times   \mathds{B}^{n}   \rightarrow
 343 \mathds{B}^{n}$, s.t.
 344 \begin{equation}
 345 \label{eq:CIs}
 346   F_{f}(s,x)_j  =
 347     \left\{
 348     \begin{array}{l}
 349     f_j(x) \textrm{ if } j= s \enspace , \\
 350     x_{j} \textrm{ otherwise} \enspace .
 351     \end{array}
 352     \right.
 353 \end{equation}
 354
 355 \noindent With such a notation, configurations are defined for times
 356 $t=0,1,2,\ldots$ by:
 357 \begin{equation}\label{eq:sync}
 358 \left\{\begin{array}{l}
 359   x^{0}\in \mathds{B}^{n} \textrm{ and}\\
 360   x^{t+1}=F_{f}(S^t,x^{t}) \enspace .
 361 \end{array}\right.
 362 \end{equation}
 363
 364 \noindent  Finally, iterations defined  in Eq.~(\ref{eq:sync})  can be
 365 described by the following system:
 366 \begin{equation}
 367 \left\{
 368 \begin{array}{lll}
 369 X^{0} & = &  ((S^t)^{t \in \Nats},x^0) \in
 370 \llbracket1;n\rrbracket^\Nats \times \Bool^{n}\\
 371 X^{k+1}& = & G_{f}(X^{k})\\
 372 \multicolumn{3}{c}{\textrm{where } G_{f}\left(((S^t)^{t \in \Nats},x)\right)
 373 = \left(\sigma((S^t)^{t \in \Nats}),F_{f}(S^0,x)\right) \enspace ,}
 374 \end{array}
 375 \right.
 376 \label{eq:Gf}
 377 \end{equation}
 378 where  $\sigma$ is the  function that  removes the  first term  of the
 379 strategy  ({\it i.e.},~$S^0$).   This definition  means that  only one
 380 component  of the  system is  updated  at an  iteration: the  $S^t$-th
 381 element. But it can be extended by considering subsets for $S^t$.
 382
 383
 384
 385
 386 The   {\bf   distance}   $d$    between   two   points   $(S,x)$   and
 387 $(\check{S},\check{x})\in  \mathcal{X} = \llbracket1;n\rrbracket^\Nats
 388 \times \Bool^{n}$ is defined by
 389 \begin{equation}
 390 d((S,x);(\check{S},\check{x}))=d_{e}(x,\check{x})+d_{s}(S,\check{S})
 391 \enspace ,
 392 \end{equation}
 393 where
 394 \begin{equation}
 395 d_{e}(x,\check{x})=\sum_{j=1}^{n}\Delta
 396 (x_{j},\check{x}_{j}) \in \llbracket  0 ; n \rrbracket
 397 \end{equation}
 398 \noindent and
 399 \begin{equation}
 400 d_{s}(S,\check{S})=\frac{9}{2n}\sum_{t=0}^{\infty
 401 }\frac{|S^{t}-\check{S}^{t}|}{10^{t+1}} \in [0 ; 1] \enspace .
 402 \end{equation}
 403
 404 This    distance    is    defined    to    reflect    the    following
 405 information. Firstly, the more  two systems have different components,
 406 the  larger the  distance  between them is.  Secondly,  two systems  with
 407 similar components and strategies, which have the same starting terms,
 408 must  induce only  a small  distance.  The  proposed  distance fulfill
 409 these  requirements: on  the one  hand  its floor  value reflects  the
 410 difference between  the cells, on  the other hand its  fractional part
 411 measures the difference between the strategies.
 412
 413 The relation  between $\Gamma(f)$ and  $G_f$ is clear: there  exists a
 414 path from  $x$ to $x'$  in $\Gamma(f)$ if  and only if there  exists a
 415 strategy  $s$ such  that  the  parallel iteration  of  $G_f$ from  the
 416 initial  point $(s,x)$  reaches  the configuration  $x'$.  Using  this
 417 link, Guyeux~\cite{GuyeuxThese10} has proven that,
 418 \begin{theorem}%[Characterization of $\mathcal{C}$]
 419 \label{Th:Caracterisation   des   IC   chaotiques}
 420 Let $f:\Bool^n\to\Bool^n$. Iterations of $G_f$ are chaotic  according
 421 to  Devaney if and only if  $\Gamma(f)$ is strongly connected.
 422 \end{theorem}
 423
 424 Checking  if a  graph  is  strongly connected  is  not difficult.  For
 425 example,  consider the  function $f_1\left(x_1,\dots,x_n\right)=\left(
 426 \overline{x_1},x_1,x_2,\dots,x_{n-1}\right)$. As $\Gamma(f_1)$ is
 427 obviously strongly connected, then $G_{f_1}$ is a chaotic map.
 428
 429 With this  material, we are now  able to build a  first chaotic neural
 430 network, as defined in the Devaney's formulation.
 431
 432 \section{A chaotic neural network in the sense of Devaney}
 433 \label{S2}
 434
 435 Let     us     firstly     introduce    the     vectorial     negation
 436 $f_{0}(x_{1},\dots,x_{n}) =(\overline{x_{1}},\dots,\overline{x_{n}})$,
 437 which is  such that $\Gamma(f_0)$ is  strongly connected.  Considering
 438 the map  $F_{f_0}:\llbracket 1;  n \rrbracket \times  \mathds{B}^n \to
 439 \mathds{B}^n$  associated to the  vectorial negation,  we can  build a
 440 multilayer perceptron  neural network modeling  $F_{f_0}$.  Hence, for
 441 all inputs  $(s,x) \in \llbracket  1;n\rrbracket \times \mathds{B}^n$,
 442 the output layer will produce  $F_{f_0}(s,x)$.  It is then possible to
 443 link  the output  layer  and the  input  one, in  order  to model  the
 444 dependence between two successive iterations.  As a result we obtain a
 445 global  recurrent   neural  network  that  behaves   as  follows  (see
 446 Fig.~\ref{Fig:perceptron}).
 447
 448 \begin{itemize}
 449 \item   The   network   is   initialized   with   the   input   vector
 450   $\left(S^0,x^0\right)    \in    \llbracket   1;n\rrbracket    \times
 451   \mathds{B}^n$      and      computes      the     output      vector
 452   $x^1=F_{f_0}\left(S^0,x^0\right)$. This last  vector is published as
 453   an output one of the chaotic  neural network and is sent back to the
 454   input layer through the feedback links.
 455 \item When  the network  is activated at  the $t^{th}$  iteration, the
 456   state of the system $x^t  \in \mathds{B}^n$ received from the output
 457   layer and  the initial  term of the  sequence $(S^t)^{t  \in \Nats}$
 458   ($S^0  \in \llbracket 1;n\rrbracket$)  are used  to compute  the new
 459   output.   This new  output, which  represents the  new state  of the
 460   dynamical system, satisfies:
 461   \begin{equation}
 462     x^{t+1}=F_{f_0}(S^0, x^t) \in \mathds{B}^n \enspace .
 463   \end{equation}
 464 \end{itemize}
 465
 466 \begin{figure}
 467   \centering
 468   \includegraphics[scale=0.625]{perceptron}
 469   \caption{A perceptron equivalent to chaotic iterations}
 470   \label{Fig:perceptron}
 471 \end{figure}
 472
 473 The behavior of the neural network is such that when the initial state
 474 is  $x^0~\in~\mathds{B}^n$ and  a  sequence $(S^t)^{t  \in \Nats}$  is
 475 given  as outside  input, then  the sequence  of  successive published
 476 output vectors $\left(x^t\right)^{t \in \mathds{N}^{\ast}}$ is exactly
 477 the  one produced  by  the chaotic  iterations  formally described  in
 478 Eq.~(\ref{eq:CIs}).  It  means that  mathematically if we  use similar
 479 input  vectors   they  both  generate  the   same  successive  outputs
 480 $\left(x^t\right)^{t \in \mathds{N}^{\ast}}$,  and therefore that they
 481 are  equivalent  reformulations  of  the iterations  of  $G_{f_0}$  in
 482 $\mathcal{X}$. Finally, since the  proposed neural network is build to
 483 model  the  behavior  of  $G_{f_0}$,  which is  chaotic  according  to
 484 Devaney's definition  of chaos,  we can conclude  that the  network is
 485 also chaotic in this sense.
 486
 487 The previous construction scheme  is not restricted to function $f_0$.
 488 It can  be extended to any function  $f$ such that $G_f$  is a chaotic
 489 map by training the network to model $F_{f}:\llbracket 1; n \rrbracket
 490 \times      \mathds{B}^n      \to      \mathds{B}^n$.      Due      to
 491 Theorem~\ref{Th:Caracterisation  des  IC  chaotiques},  we  can  find
 492 alternative functions  $f$ for $f_0$  through a simple check  of their
 493 graph of  iterations $\Gamma(f)$.  For  example, we can  build another
 494 chaotic neural network by using $f_1$ instead of $f_0$.
 495
 496 \section{Checking whether a neural network is chaotic or not}
 497 \label{S3}
 498
 499 We focus now on the case  where a neural network is already available,
 500 and for which we  want to know if it is chaotic. Typically, in
 501 many research papers neural network  are usually claimed to be chaotic
 502 without any  convincing mathematical proof. We propose  an approach to
 503 overcome  this  drawback  for  a  particular  category  of  multilayer
 504 perceptrons defined below, and for the Devaney's formulation of chaos.
 505 In  spite of  this restriction,  we think  that this  approach  can be
 506 extended to  a large variety  of neural networks.  We plan to  study a
 507 generalization of this approach in a future work.
 508
 509 We consider a multilayer perceptron  of the following form: inputs
 510 are $n$ binary digits and  one integer value, while outputs are  $n$
 511 bits.   Moreover, each  binary  output is  connected  with a  feedback
 512 connection to an input one.
 513
 514 \begin{itemize}
 515 \item During  initialization, the network is seeded with $n$~bits denoted
 516   $\left(x^0_1,\dots,x^0_n\right)$  and an  integer  value $S^0$  that
 517   belongs to $\llbracket1;n\rrbracket$.
 518 \item     At     iteration~$t$,     the     last     output     vector
 519   $\left(x^t_1,\dots,x^t_n\right)$   defines  the  $n$~bits   used  to
 520   compute the new output one $\left(x^{t+1}_1,\dots,x^{t+1}_n\right)$.
 521   While  the remaining  input receives  a new  integer value  $S^t \in
 522   \llbracket1;n\rrbracket$, which is provided by the outside world.
 523 \end{itemize}
 524
 525 The topological  behavior of these  particular neural networks  can be
 526 proven to be chaotic through the following process. Firstly, we denote
 527 by  $F:  \llbracket  1;n  \rrbracket \times  \mathds{B}^n  \rightarrow
 528 \mathds{B}^n$     the     function     that     maps     the     value
 529 $\left(s,\left(x_1,\dots,x_n\right)\right)    \in    \llbracket    1;n
 530 \rrbracket      \times      \mathds{B}^n$      into     the      value
 531 $\left(y_1,\dots,y_n\right)       \in       \mathds{B}^n$,       where
 532 $\left(y_1,\dots,y_n\right)$  is the  response of  the  neural network
 533 after    the    initialization     of    its    input    layer    with
 534 $\left(s,\left(x_1,\dots, x_n\right)\right)$.  Secondly, we define $f:
 535 \mathds{B}^n       \rightarrow      \mathds{B}^n$       such      that
 536 $f\left(x_1,x_2,\dots,x_n\right)$ is equal to
 537 \begin{equation}
 538 \left(F\left(1,\left(x_1,x_2,\dots,x_n\right)\right),\dots,
 539   F\left(n,\left(x_1,x_2,\dots,x_n\right)\right)\right) \enspace .
 540 \end{equation}
 541 Then $F=F_f$. If this recurrent  neural network is seeded with
 542 $\left(x_1^0,\dots,x_n^0\right)$    and   $S   \in    \llbracket   1;n
 543 \rrbracket^{\mathds{N}}$, it produces  exactly the
 544 same      output      vectors  than the
 545 chaotic iterations of $F_f$  with initial
 546 condition  $\left(S,(x_1^0,\dots,  x_n^0)\right)  \in  \llbracket  1;n
 547 \rrbracket^{\mathds{N}}  \times \mathds{B}^n$.
 548 Theoretically speaking, such iterations of $F_f$ are thus a formal model of
 549 these kind of recurrent neural networks. In the  rest  of this
 550 paper,  we will  call such  multilayer perceptrons  CI-MLP($f$), which
 551 stands for ``Chaotic Iterations based MultiLayer Perceptron''.
 552
 553 Checking  if CI-MLP($f$)  behaves chaotically  according  to Devaney's
 554 definition  of  chaos  is  simple:  we  need just  to  verify  if  the
 555 associated graph  of iterations  $\Gamma(f)$ is strongly  connected or
 556 not. As  an incidental consequence,  we finally obtain  an equivalence
 557 between  chaotic  iterations   and  CI-MLP($f$).   Therefore,  we  can
 558 obviously  study such multilayer  perceptrons with  mathematical tools
 559 like  topology  to  establish,  for  example,  their  convergence  or,
 560 contrarily, their unpredictable behavior.   An example of such a study
 561 is given in the next section.
 562
 563 \section{Topological properties of chaotic neural networks}
 564 \label{S4}
 565
 566 Let us first recall  two fundamental definitions from the mathematical
 567 theory of chaos.
 568
 569 \begin{definition} \label{def8}
 570 A  function   $f$  is   said  to  be   {\bf  expansive}   if  $\exists
 571 \varepsilon>0$, $\forall  x \neq y$,  $\exists n \in  \mathds{N}$ such
 572 that $d\left(f^n(x),f^n(y)\right) \geq \varepsilon$.
 573 \end{definition}
 574
 575 \begin{definition} \label{def9}
 576 A discrete dynamical  system is said to be  {\bf topologically mixing}
 577 if  and only  if,  for any  pair  of disjoint  open  sets $U$,$V  \neq
 578 \emptyset$, we can find some $n_0  \in \mathds{N}$ such that  for any $n$,
 579 $n\geq n_0$, we have $f^n(U) \cap V \neq \emptyset$.
 580 \end{definition}
 581
 582 As  proven in Ref.~\cite{gfb10:ip},  chaotic iterations  are expansive
 583 and  topologically mixing when  $f$ is  the vectorial  negation $f_0$.
 584 Consequently,  these  properties are  inherited  by the  CI-MLP($f_0$)
 585 recurrent neural network previously presented, which induce a greater
 586 unpredictability.  Any  difference on the  initial value of  the input
 587 layer is  in particular  magnified up to  be equal to  the expansivity
 588 constant.
 589
 590 Let us then focus on the consequences for  a neural  network to  be chaotic
 591 according  to  Devaney's definition.  First  of  all, the  topological
 592 transitivity property implies indecomposability.
 593
 594 \begin{definition} \label{def10}
 595 A   dynamical   system   $\left(   \mathcal{X},  f\right)$   is   {\bf
 596 not decomposable}  if it  is not  the  union of  two closed  sets $A,  B
 597 \subset \mathcal{X}$ such that $f(A) \subset A, f(B) \subset B$.
 598 \end{definition}
 599
 600 \noindent Hence, reducing the set of outputs generated by CI-MLP($f$),
 601 in order to  simplify its complexity, is impossible  if $\Gamma(f)$ is
 602 strongly connected.  Moreover, under this  hypothesis CI-MLPs($f$) are
 603 strongly transitive:
 604
 605 \begin{definition} \label{def11}
 606 A  dynamical system  $\left( \mathcal{X},  f\right)$ is  {\bf strongly
 607 transitive} if $\forall x,y  \in \mathcal{X}$, $\forall r>0$, $\exists
 608 z  \in  \mathcal{X}$,  $d(z,x)~\leq~r  \Rightarrow  \exists  n  \in
 609 \mathds{N}^{\ast}$, $f^n(z)=y$.
 610 \end{definition}
 611 According to this definition, for all  pairs of points $(x, y)$ in the
 612 phase space, a point $z$ can  be found in the neighborhood of $x$ such
 613 that one of its iterates $f^n(z)$ is $y$. Indeed, this result has been
 614 established  during  the  proof   of  the  transitivity  presented  in
 615 Ref.~\cite{guyeux09}.  Among  other  things, the  strong  transitivity
 616 leads  to the fact  that without  the knowledge  of the  initial input
 617 layer, all outputs are possible.  Additionally, no point of the output
 618 space  can   be  discarded  when  studying  CI-MLPs:   this  space  is
 619 intrinsically complicated and it cannot be decomposed or simplified.
 620
 621 Furthermore, those  recurrent neural networks  exhibit the instability
 622 property:
 623 \begin{definition}
 624 A dynamical  system $\left( \mathcal{X}, f\right)$ is  unstable if for
 625 all  $x  \in  \mathcal{X}$,   the  orbit  $\gamma_x:n  \in  \mathds{N}
 626 \longmapsto f^n(x)$  is unstable,  that means: $\exists  \varepsilon >
 627 0$, $\forall  \delta>0$, $\exists y  \in \mathcal{X}$, $\exists  n \in
 628 \mathds{N}$,        such        that        $d(x,y)<\delta$        and
 629 $d\left(\gamma_x(n),\gamma_y(n)\right) \geq \varepsilon$.
 630 \end{definition}
 631 This property, which  is implied by the sensitive  point dependence on
 632 initial conditions, leads to the fact that in all neighborhoods of any
 633 point $x$, there are points that  can be apart by $\varepsilon$ in the
 634 future through iterations of the  CI-MLP($f$). Thus, we can claim that
 635 the behavior of  these MLPs is unstable when  $\Gamma (f)$ is strongly
 636 connected.
 637
 638 Let  us  now consider  a  compact  metric space  $(M,  d)$  and $f:  M
 639 \rightarrow M$  a continuous map. For  each natural number  $n$, a new
 640 metric $d_n$ is defined on $M$ by
 641 $$d_n(x,y)=\max\{d(f^i(x),f^i(y)): 0\leq i<n\} \enspace .$$
 642
 643 Given any $\varepsilon > 0$ and $n \geqslant 1$, two points of $M$ are
 644 $\varepsilon$-close  with respect to  this metric  if their  first $n$
 645 iterates are $\varepsilon$-close.
 646
 647 This metric  allows one to distinguish  in a neighborhood  of an orbit
 648 the points  that move away from  each other during  the iteration from
 649 the points  that travel together.  A subset  $E$ of $M$ is  said to be
 650 $(n, \varepsilon)$-separated if each pair of distinct points of $E$ is
 651 at  least $\varepsilon$  apart in  the metric  $d_n$. Denote  by $H(n,
 652 \varepsilon)$     the    maximum     cardinality     of    an     $(n,
 653 \varepsilon)$-separated set,
 654 \begin{definition}
 655 The {\it topological entropy} of the  map $f$ is defined by (see e.g.,
 656 Ref.~\cite{Adler65} or Ref.~\cite{Bowen})
 657 $$h(f)=\lim_{\varepsilon\to      0}      \left(\limsup_{n\to      \infty}
 658 \frac{1}{n}\log H(n,\varepsilon)\right) \enspace .$$
 659 \end{definition}
 660
 661 Then we have the following result \cite{GuyeuxThese10},
 662 \begin{theorem}
 663 $\left( \mathcal{X},d\right)$  is compact and  the topological entropy
 664 of $(\mathcal{X},G_{f_0})$ is infinite.
 665 \end{theorem}
 666
 667 \begin{figure}
 668   \centering
 669   \includegraphics[scale=0.625]{scheme}
 670   \caption{Summary of addressed neural networks and chaos problems}
 671   \label{Fig:scheme}
 672 \end{figure}
 673
 674 The Figure~\ref{Fig:scheme} is a summary of addressed neural networks and chaos problems.
 675 Section~\ref{S2} has explained how to  construct a truly chaotic neural
 676 networks $A$ for instance.
 677 Section~\ref{S3} has shown how to check whether a  given MLP
 678 $A$ or $C$ is chaotic or not in the sens of Devaney.
 679 %, and  how to study its topological behavior.
 680 The last thing to  investigate, when comparing
 681 neural  networks   and  Devaney's  chaos,  is to  determine  whether
 682 an artificial neural network $A$  is able to learn or  predict some chaotic
 683 behaviors of $B$, as  it is defined  in the Devaney's formulation  (when they
 684 are not specifically constructed for this purpose).  This statement is
 685 studied in the next section.
 686
 687
 688
 689
 690
 691
 692
 693 \section{Suitability of Artificial Neural Networks
 694 for Predicting Chaotic Behaviors}
 695
 696 In  the context  of computer  science  different topic  areas have  an
 697 interest       in       chaos,       as       for       steganographic
 698 techniques~\cite{1309431,Zhang2005759}.    Steganography  consists  in
 699 embedding a  secret message within  an ordinary one, while  the secret
 700 extraction takes place once  at destination.  The reverse ({\it i.e.},
 701 automatically detecting the presence  of hidden messages inside media)
 702 is  called   steganalysis.   Among  the   deployed  strategies  inside
 703 detectors,          there         are          support         vectors
 704 machines~\cite{Qiao:2009:SM:1704555.1704664},                    neural
 705 networks~\cite{10.1109/ICME.2003.1221665,10.1109/CIMSiM.2010.36},  and
 706 Markov   chains~\cite{Sullivan06steganalysisfor}.    Most   of   these
 707 detectors  give quite  good results  and are  rather  competitive when
 708 facing steganographic  tools.  However, to  the best of  our knowledge
 709 none of the considered information hiding schemes fulfills the Devaney
 710 definition  of chaos~\cite{Devaney}.  Indeed,  one can  wonder whether
 711 detectors  continue to  give good  results when  facing  truly chaotic
 712 schemes.  More  generally, there remains the open  problem of deciding
 713 whether artificial intelligence is suitable for predicting topological
 714 chaotic behaviors.
 715
 716 \subsection{Representing Chaotic Iterations for Neural Networks}
 717 \label{section:translation}
 718
 719 The  problem  of  deciding  whether  classical  feedforward  ANNs  are
 720 suitable  to approximate  topological chaotic  iterations may  then be
 721 reduced  to evaluate  ANNs on  iterations of  functions  with Strongly
 722 Connected  Component  (SCC)~graph  of  iterations.   To  compare  with
 723 non-chaotic  iterations,  the experiments  detailed  in the  following
 724 sections are  carried out  using both kinds  of function  (chaotic and
 725 non-chaotic). Let us emphasize on  the difference between this kind of
 726 neural   networks  and  the   Chaotic  Iterations   based  MultiLayer
 727 Perceptron.
 728
 729 We are  then left to compute  two disjoint function  sets that contain
 730 either functions  with topological chaos properties  or not, depending
 731 on  the strong  connectivity of  their iterations graph.  This  can be
 732 achieved for  instance by removing a  set of edges  from the iteration
 733 graph $\Gamma(f_0)$ of the vectorial negation function~$f_0$.  One can
 734 deduce whether  a function verifies the topological  chaos property or
 735 not  by checking  the strong  connectivity of  the resulting  graph of
 736 iterations.
 737
 738 For instance let us consider  the functions $f$ and $g$ from $\Bool^4$
 739 to $\Bool^4$ respectively defined by the following lists:
 740 $$[0,  0,  2,   3,  13,  13,  6,   3,  8,  9,  10,  11,   8,  13,  14,
 741   15]$$ $$\mbox{and } [11, 14, 13, 14, 11, 10, 1, 8, 7, 6, 5, 4, 3, 2,
 742   1, 0]  \enspace.$$ In  other words,  the image of  $0011$ by  $g$ is
 743 $1110$: it  is obtained as the  binary value of the  fourth element in
 744 the  second  list  (namely~14).   It   is  not  hard  to  verify  that
 745 $\Gamma(f)$ is  not SCC  (\textit{e.g.}, $f(1111)$ is  $1111$) whereas
 746 $\Gamma(g)$  is.  Next section  shows how  to translate  iterations of
 747 such functions into a model amenable to be learned by an ANN.
 748
 749 This  section  presents how  (not)  chaotic  iterations  of $G_f$  are
 750 translated  into  another  model  more  suited  to  artificial  neural
 751 networks.  Formally, input and output vectors are pairs~$((S^t)^{t \in
 752 \Nats},x)$ and $\left(\sigma((S^t)^{t \in \Nats}),F_{f}(S^0,x)\right)$
 753 as defined in~Eq.~(\ref{eq:Gf}).
 754
 755 Firstly, let us focus on how to memorize configurations.  Two distinct
 756 translations are  proposed.  In the first  case, we take  one input in
 757 $\Bool$  per  component;  in   the  second  case,  configurations  are
 758 memorized  as   natural  numbers.    A  coarse  attempt   to  memorize
 759 configuration  as  natural  number  could  consist  in  labeling  each
 760 configuration  with  its  translation  into  decimal  numeral  system.
 761 However,  such a  representation induces  too many  changes  between a
 762 configuration  labeled by  a  power  of two  and  its direct  previous
 763 configuration: for instance, 16~(10000) and 15~(01111) are closed in a
 764 decimal ordering, but  their Hamming distance is 5.   This is why Gray
 765 codes~\cite{Gray47} have been preferred.
 766
 767 Let us secondly detail how to deal  with strategies.   Obviously,  it is  not
 768 possible to  translate in a finite  way an infinite  strategy, even if
 769 both $(S^t)^{t \in \Nats}$ and $\sigma((S^t)^{t \in \Nats})$ belong to
 770 $\{1,\ldots,n\}^{\Nats}$.  Input strategies are then reduced to have a
 771 length  of size  $l  \in  \llbracket 2,k\rrbracket$,  where  $k$ is  a
 772 parameter of the evaluation. Notice  that $l$ is greater than or equal
 773 to $2$ since  we do not want the shift  $\sigma$~function to return an
 774 empty strategy.  Strategies are memorized as natural numbers expressed
 775 in base  $n+1$.  At  each iteration, either  none or one  component is
 776 modified  (among the  $n$ components)  leading to  a radix  with $n+1$
 777 entries.  Finally,  we give an  other input, namely $m  \in \llbracket
 778 1,l-1\rrbracket$, which  is the  number of successive  iterations that
 779 are applied starting  from $x$.  Outputs are translated  with the same
 780 rules.
 781
 782 To address  the complexity  issue of the  problem, let us  compute the
 783 size of the data set an ANN has to deal with.  Each input vector of an
 784 input-output pair  is composed of a configuration~$x$,  an excerpt $S$
 785 of the strategy to iterate  of size $l \in \llbracket 2, k\rrbracket$,
 786 and a  number $m \in  \llbracket 1, l-1\rrbracket$ of  iterations that
 787 are executed.
 788
 789 Firstly, there are $2^n$  configurations $x$, with $n^l$ strategies of
 790 size $l$ for  each of them. Secondly, for  a given configuration there
 791 are $\omega = 1 \times n^2 +  2 \times n^3 + \ldots+ (k-1) \times n^k$
 792 ways  of writing  the pair  $(m,S)$. Furthermore,  it is  not  hard to
 793 establish that
 794 \begin{equation}
 795 \displaystyle{(n-1) \times \omega = (k-1)\times n^{k+1} - \sum_{i=2}^k n^i} \nonumber
 796 \end{equation}
 797 then
 798 \begin{equation}
 799 \omega =
 800 \dfrac{(k-1)\times n^{k+1}}{n-1} - \dfrac{n^{k+1}-n^2}{(n-1)^2} \enspace . \nonumber
 801 \end{equation}
 802 \noindent And then, finally, the number of  input-output pairs for our
 803 ANNs is
 804 $$
 805 2^n \times \left(\dfrac{(k-1)\times n^{k+1}}{n-1} - \dfrac{n^{k+1}-n^2}{(n-1)^2}\right) \enspace .
 806 $$
 807 For  instance, for $4$  binary components  and a  strategy of  at most
 808 $3$~terms we obtain 2304~input-output pairs.
 809
 810 \subsection{Experiments}
 811 \label{section:experiments}
 812
 813 To study  if chaotic iterations can  be predicted, we  choose to train
 814 the MultiLayer Perceptron.  As stated  before, this kind of network is
 815 in   particular    well-known   for   its    universal   approximation
 816 property. Furthermore,  MLPs have been already  considered for chaotic
 817 time series prediction.  For example, in~\cite{dalkiran10} the authors
 818 have shown that a feedforward  MLP with two hidden layers, and trained
 819 with Bayesian  Regulation back-propagation, can  learn successfully the
 820 dynamics of Chua's circuit.
 821
 822 In these experiments  we consider  MLPs having one hidden layer of
 823 sigmoidal  neurons  and  output   neurons  with  a  linear  activation
 824 function.     They    are    trained    using    the    Limited-memory
 825 Broyden-Fletcher-Goldfarb-Shanno quasi-newton algorithm in combination
 826 with the Wolfe linear search.  The training process is performed until
 827 a maximum number of epochs  is reached.  To prevent overfitting and to
 828 estimate the  generalization performance we use  holdout validation by
 829 splitting the  data set into  learning, validation, and  test subsets.
 830 These subsets  are obtained through  random selection such  that their
 831 respective size represents 65\%, 10\%, and 25\% of the whole data set.
 832
 833 Several  neural  networks  are  trained  for  both  iterations  coding
 834 schemes.   In  both  cases   iterations  have  the  following  layout:
 835 configurations of  four components and  strategies with at  most three
 836 terms. Thus, for  the first coding scheme a data  set pair is composed
 837 of 6~inputs and 5~outputs, while for the second one it is respectively
 838 3~inputs and 2~outputs. As noticed at the end of the previous section,
 839 this  leads to  data sets  that  consist of  2304~pairs. The  networks
 840 differ  in the  size of  the hidden  layer and  the maximum  number of
 841 training epochs.  We remember that  to evaluate the ability  of neural
 842 networks to  predict a  chaotic behavior for  each coding  scheme, the
 843 trainings of two data sets, one of them describing chaotic iterations,
 844 are compared.
 845
 846 Thereafter we give,  for the different learning setups  and data sets,
 847 the  mean prediction  success rate  obtained for  each  output.  These
 848 values  are  computed  considering  10~trainings with  random  subsets
 849 construction,  weights  and  biases initialization.   Firstly,  neural
 850 networks having 10  and 25~hidden neurons are trained,  with a maximum
 851 number  of  epochs that  takes  its  value  in $\{125,250,500\}$  (see
 852 Tables~\ref{tab1}  and \ref{tab2}).   Secondly, we  refine  the second
 853 coding scheme by splitting the  output vector such that each output is
 854 learned by a specific  neural network (Table~\ref{tab3}). In this last
 855 case, we increase  the size of the hidden layer  up to 40~neurons, and
 856 we consider larger number of epochs.
 857
 858 \begin{table}[htbp!]
 859 \caption{Prediction success rates for configurations expressed as boolean vectors.}
 860 \label{tab1}
 861 \centering {\small
 862 \begin{tabular}{|c|c||c|c|c|}
 863 \hline
 864 \multicolumn{5}{|c|}{Networks topology: 6~inputs, 5~outputs and one hidden layer} \\
 865 \hline
 866 \hline
 867 \multicolumn{2}{|c||}{Hidden neurons} & \multicolumn{3}{c|}{10 neurons} \\
 868 \cline{3-5}
 869 \multicolumn{2}{|c||}{Epochs} & 125 & 250 & 500 \\
 870 \hline
 871 \multirow{6}{*}{\rotatebox{90}{Chaotic}}&Output~(1) & 90.92\% & 91.75\% & 91.82\% \\
 872 & Output~(2) & 69.32\% & 78.46\% & 82.15\% \\
 873 & Output~(3) & 68.47\% & 78.49\% & 82.22\% \\
 874 & Output~(4) & 91.53\% & 92.37\% & 93.4\% \\
 875 & Config. & 36.10\% & 51.35\% & 56.85\% \\
 876 & Strategy~(5) & 1.91\% & 3.38\% & 2.43\% \\
 877 \hline
 878 \multirow{6}{*}{\rotatebox{90}{Non-chaotic}}&Output~(1) & 97.64\% & 98.10\% & 98.20\% \\
 879 & Output~(2) & 95.15\% & 95.39\% & 95.46\% \\
 880 & Output~(3) & 100\% & 100\% & 100\% \\
 881 & Output~(4) & 97.47\% & 97.90\% & 97.99\% \\
 882 & Config. & 90.52\% & 91.59\% & 91.73\% \\
 883 & Strategy~(5) & 3.41\% & 3.40\% & 3.47\% \\
 884 \hline
 885 \hline
 886 \multicolumn{2}{|c||}{Hidden neurons} & \multicolumn{3}{c|}{25 neurons} \\ %& \multicolumn{3}{|c|}{40 neurons} \\
 887 \cline{3-5}
 888 \multicolumn{2}{|c||}{Epochs} & 125 & 250 & 500 \\ %& 125 & 250 & 500 \\
 889 \hline
 890 \multirow{6}{*}{\rotatebox{90}{Chaotic}}&Output~(1) & 91.65\% & 92.69\% & 93.93\% \\ %& 91.94\% & 92.89\% & 94.00\% \\
 891 & Output~(2) & 72.06\% & 88.46\% & 90.5\% \\ %& 74.97\% & 89.83\% & 91.14\% \\
 892 & Output~(3) & 79.19\% & 89.83\% & 91.59\% \\ %& 76.69\% & 89.58\% & 91.84\% \\
 893 & Output~(4) & 91.61\% & 92.34\% & 93.47\% \\% & 82.77\% & 92.93\% & 93.48\% \\
 894 & Config. & 48.82\% & 67.80\% & 70.97\% \\%& 49.46\% & 68.94\% & 71.11\% \\
 895 & Strategy~(5) & 2.62\% & 3.43\% & 3.78\% \\% & 3.10\% & 3.10\% & 3.03\% \\
 896 \hline
 897 \multirow{6}{*}{\rotatebox{90}{Non-chaotic}}&Output~(1) & 97.87\% & 97.99\% & 98.03\% \\ %& 98.16\% \\
 898 & Output~(2) & 95.46\% & 95.84\% & 96.75\% \\ % & 97.4\% \\
 899 & Output~(3) & 100\% & 100\% & 100\% \\%& 100\% \\
 900 & Output~(4) & 97.77\% & 97.82\% & 98.06\% \\%& 98.31\% \\
 901 & Config. & 91.36\% & 91.99\% & 93.03\% \\%& 93.98\% \\
 902 & Strategy~(5) & 3.37\% & 3.44\% & 3.29\% \\%& 3.23\% \\
 903 \hline
 904 \end{tabular}
 905 }
 906 \end{table}
 907
 908 Table~\ref{tab1}  presents the  rates  obtained for  the first  coding
 909 scheme.   For  the chaotic  data,  it can  be  seen  that as  expected
 910 configuration  prediction becomes  better  when the  number of  hidden
 911 neurons and maximum  epochs increases: an improvement by  a factor two
 912 is observed (from 36.10\% for 10~neurons and 125~epochs to 70.97\% for
 913 25~neurons  and  500~epochs). We  also  notice  that  the learning  of
 914 outputs~(2)   and~(3)  is   more  difficult.    Conversely,   for  the
 915 non-chaotic  case the  simplest training  setup is  enough  to predict
 916 configurations.   For  all  network  topologies and  all  outputs  the
 917 obtained  results  for the  non-chaotic  case  outperform the  chaotic
 918 ones. Finally,  the rates for  the strategies show that  the different
 919 networks are unable to learn them.
 920
 921 For the second coding  scheme (\textit{i.e.}, with Gray Codes)
 922 Table~\ref{tab2} shows that any network
 923 learns about  five times more non-chaotic  configurations than chaotic
 924 ones. As in the previous scheme, the strategies cannot be predicted.
 925
 926 Let us now compare the  two coding schemes. Firstly, the second scheme
 927 disturbs  the   learning  process.   In   fact  in  this   scheme  the
 928 configuration is always expressed as  a natural number, whereas in the
 929 first one  the number  of inputs follows  the increase of  the boolean
 930 vectors coding configurations. In this latter case, the coding gives a
 931 finer information on configuration evolution.
 932
 933 \begin{table}[b]
 934 \caption{Prediction success rates for configurations expressed with Gray code}
 935 \label{tab2}
 936 \centering
 937 \begin{tabular}{|c|c||c|c|c|}
 938 \hline
 939 \multicolumn{5}{|c|}{Networks topology: 3~inputs, 2~outputs and one hidden layer} \\
 940 \hline
 941 \hline
 942 & Hidden neurons & \multicolumn{3}{c|}{10 neurons} \\
 943 \cline{2-5}
 944 & Epochs & 125 & 250 & 500 \\ %& 1000
 945 \hline
 946 \multirow{2}{*}{Chaotic}& Config.~(1) & 13.29\% & 13.55\% & 13.08\% \\ %& 12.5\%
 947 & Strategy~(2) & 0.50\% & 0.52\% & 1.32\% \\ %& 1.42\%
 948 \hline
 949 \multirow{2}{*}{Non-Chaotic}&Config.~(1) & 77.12\% & 74.00\% & 72.60\% \\ %& 75.81\%
 950 & Strategy~(2) & 0.42\% & 0.80\% & 1.16\% \\ %& 1.42\%
 951 \hline
 952 \hline
 953 & Hidden neurons & \multicolumn{3}{c|}{25 neurons} \\
 954 \cline{2-5}
 955 & Epochs & 125 & 250 & 500 \\ %& 1000
 956 \hline
 957 \multirow{2}{*}{Chaotic}& Config.~(1) & 12.27\% & 13.15\% & 13.05\% \\ %& 15.44\%
 958 & Strategy~(2) & 0.71\% & 0.66\% & 0.88\% \\ %& 1.73\%
 959 \hline
 960 \multirow{2}{*}{Non-Chaotic}&Config.~(1) & 73.60\% & 74.70\% & 75.89\% \\ %& 68.32\%
 961 & Strategy~(2) & 0.64\% & 0.97\% & 1.23\% \\ %& 1.80\%
 962 \hline
 963 \end{tabular}
 964 \end{table}
 965
 966 Unfortunately, in  practical applications the number  of components is
 967 usually  unknown.   Hence, the  first  coding  scheme  cannot be  used
 968 systematically.   Therefore, we  provide  a refinement  of the  second
 969 scheme: each  output is learned  by a different  ANN. Table~\ref{tab3}
 970 presents the  results for  this approach.  In  any case,  whatever the
 971 network  topologies,  the  maximum   epoch  number  and  the  kind  of
 972 iterations,  the  configuration  success  rate is  slightly  improved.
 973 Moreover, the strategies predictions  rates reach almost 12\%, whereas
 974 in  Table~\ref{tab2}  they  never   exceed  1.5\%.   Despite  of  this
 975 improvement, a long term prediction of chaotic iterations still
 976 appear to be
 977 an open issue.
 978
 979 \begin{table}
 980 \caption{Prediction success rates for split outputs.}
 981 \label{tab3}
 982 \centering
 983 \begin{tabular}{|c||c|c|c|}
 984 \hline
 985 \multicolumn{4}{|c|}{Networks topology: 3~inputs, 1~output and one hidden layer} \\
 986 \hline
 987 \hline
 988 Epochs & 125 & 250 & 500 \\
 989 \hline
 990 \hline
 991 Chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
 992 \hline
 993 10~neurons & 12.39\% & 14.06\% & 14.32\% \\
 994 25~neurons & 13.00\% & 14.28\% & 14.58\% \\
 995 40~neurons & 11.58\% & 13.47\% & 14.23\% \\
 996 \hline
 997 \hline
 998 Non chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
 999 \cline{2-4}
1000 %Epochs & 125 & 250 & 500 \\
1001 \hline
1002 10~neurons & 76.01\% & 74.04\% & 78.16\% \\
1003 25~neurons & 76.60\% & 72.13\% & 75.96\% \\
1004 40~neurons & 76.34\% & 75.63\% & 77.50\% \\
1005 \hline
1006 \hline
1007 Chaotic/non chaotic & \multicolumn{3}{c|}{Output = Strategy} \\
1008 \cline{2-4}
1009 %Epochs & 125 & 250 & 500 \\
1010 \hline
1011 10~neurons & 0.76\% & 0.97\% & 1.21\% \\
1012 25~neurons & 1.09\% & 0.73\% & 1.79\% \\
1013 40~neurons & 0.90\% & 1.02\% & 2.15\% \\
1014 \hline
1015 \multicolumn{4}{c}{} \\
1016 \hline
1017 Epochs & 1000 & 2500 & 5000 \\
1018 \hline
1019 \hline
1020 Chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
1021 \hline
1022 10~neurons & 14.51\% & 15.22\% & 15.22\% \\
1023 25~neurons & 16.95\% & 17.57\% & 18.46\% \\
1024 40~neurons & 17.73\% & 20.75\% & 22.62\% \\
1025 \hline
1026 \hline
1027 Non chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
1028 \cline{2-4}
1029 %Epochs & 1000 & 2500 & 5000 \\
1030 \hline
1031 10~neurons & 78.98\% & 80.02\% & 79.97\% \\
1032 25~neurons & 79.19\% & 81.59\% & 81.53\% \\
1033 40~neurons & 79.64\% & 81.37\% & 81.37\% \\
1034 \hline
1035 \hline
1036 Chaotic/non chaotic & \multicolumn{3}{c|}{Output = Strategy} \\
1037 \cline{2-4}
1038 %Epochs & 1000 & 2500 & 5000 \\
1039 \hline
1040 10~neurons & 3.47\% & 9.98\% & 11.66\% \\
1041 25~neurons & 3.92\% & 8.63\% & 10.09\% \\
1042 40~neurons & 3.29\% & 7.19\% & 7.18\% \\
1043 \hline
1044 \end{tabular}
1045 \end{table}
1046
1047 \section{Conclusion}
1048
1049 In  this paper,  we have  established an  equivalence  between chaotic
1050 iterations,  according to  the Devaney's  definition of  chaos,  and a
1051 class  of  multilayer perceptron  neural  networks.  Firstly, we  have
1052 described how to build a neural network that can be trained to learn a
1053 given chaotic map function.  Then,  we found a condition that allow to
1054 check whether the iterations induced by a function are chaotic or not,
1055 and thus  if a chaotic map  is obtained. Thanks to  this condition our
1056 approach is not limited to a particular function. In the dual case, we
1057 show  that  checking  if  a  neural network  is  chaotic  consists  in
1058 verifying  a property  on an  associated  graph, called  the graph  of
1059 iterations.   These results  are valid  for recurrent  neural networks
1060 with a  particular architecture.  However,  we believe that  a similar
1061 work can be done for  other neural network architectures.  Finally, we
1062 have  discovered at  least one  family of  problems with  a reasonable
1063 size, such  that artificial neural  networks should not be  applied in
1064 the  presence  of chaos,  due  to  their  inability to  learn  chaotic
1065 behaviors in this  context.  Such a consideration is  not reduced to a
1066 theoretical detail:  this family of discrete  iterations is concretely
1067 implemented  in a  new steganographic  method  \cite{guyeux10ter}.  As
1068 steganographic   detectors  embed  tools   like  neural   networks  to
1069 distinguish between  original and stego contents, our  studies tend to
1070 prove that such  detectors might be unable to  tackle with chaos-based
1071 information hiding schemes.  Furthermore, iterations such that not all
1072 of  the  components  are updated  at  each  step  are very  common  in
1073 biological  and  physics mechanisms.   Therefore,  one can  reasonably
1074 wonder whether neural networks should be applied in these contexts.
1075
1076 In  future  work we  intend  to  enlarge  the comparison  between  the
1077 learning   of  truly   chaotic  and   non-chaotic   behaviors.   Other
1078 computational intelligence tools such  as support vector machines will
1079 be investigated  too, to  discover which tools  are the  most relevant
1080 when facing a truly chaotic phenomenon.  A comparison between learning
1081 rate  success  and  prediction  quality will  be  realized.   Concrete
1082 consequences in biology, physics, and computer science security fields
1083 will be  stated.  Lastly,  thresholds separating systems  depending on
1084 the ability to learn their dynamics will be established.
1085
1086 \bibliography{chaos-paper}% Produces the bibliography via BibTeX.
1087
1088 \end{document}
1089 %
1090 % ****** End of file chaos-paper.tex ******