main.tex

   1 % ****** Start of file chaos-paper.tex ******
   2 %
   3 \documentclass[%
   4 aip,
   5 cha,% long,
   6 numerical bibliography, %(default)
   7 %jcp,% short, numerical bibliography,
   8 %jmp,
   9 amsmath,amssymb,
  10 preprint,%
  11 %reprint,%
  12 %author-year,%
  13 %author-numerical,%
  14 ]{revtex4-1}
  15
  16 \usepackage{graphicx}% Include figure files
  17 \usepackage{dcolumn}% Align table columns on decimal point
  18 \usepackage{bm}% bold math
  19 \usepackage{color}
  20 \usepackage{ulem}
  21 \usepackage[mathlines]{lineno}% Enable numbering of text and display math
  22 \linenumbers\relax % Commence numbering lines
  23 % The amssymb package provides various useful mathematical symbols
  24 \usepackage{amssymb}
  25 % The amsthm package provides various useful mathematical fonts
  26 \usepackage{amsfonts}
  27 % Various mathematical packages
  28 \PassOptionsToPackage{cmex10draft}{graphics}
  29 \usepackage[standard]{ntheorem}
  30 % Various packages
  31 \usepackage{stmaryrd} % llbracket
  32 \usepackage{dsfont}   % mathds
  33 \usepackage{multirow} % tables
  34
  35 \newcommand{\ie}{\textit{i.e.}}
  36 \newcommand{\Nats}[0]{\ensuremath{\mathds{N}}}
  37 \newcommand{\R}[0]{\ensuremath{\mathds{R}}}
  38 \newcommand{\Bool}[0]{\ensuremath{\mathds{B}}}
  39
  40 \begin{document}
  41
  42 \title[Recurrent Neural Networks and Chaos]{Recurrent Neural Networks
  43 and Chaos: Construction, Evaluation, \\
  44 and Prediction Ability}
  45
  46 \author{Jacques M. Bahi}
  47 \author{Jean-Fran\c{c}ois Couchot}
  48 \author{Christophe Guyeux}
  49 \email{christophe.guyeux@univ-fcomte.fr.}
  50 \author{Michel Salomon}
  51 \altaffiliation[Authors in ]{alphabetic order}
  52 \affiliation{
  53 Computer Science Laboratory (LIFC), University of Franche-Comt\'e, \\
  54 IUT de Belfort-Montb\'eliard, BP 527, \\
  55 90016 Belfort Cedex, France
  56 }
  57
  58 \date{\today}
  59
  60 \newcommand{\CG}[1]{\begin{color}{red}\textit{#1}\end{color}}
  61
  62 \begin{abstract}
  63 %% Text of abstract
  64 Many  research works  deal with  chaotic neural  networks  for various
  65 fields  of application. Unfortunately,  up to  now these  networks are
  66 usually  claimed to be  chaotic without  any mathematical  proof.  The
  67 purpose of this paper is to establish, based on a rigorous theoretical
  68 framework, an equivalence between  chaotic iterations according to the
  69 Devaney's  formulation  of chaos  and  a  particular  class of  neural
  70 networks. On the one hand we show  how to build such a network, on the
  71 other  hand we provide  a method  to check  if a  neural network  is a
  72 chaotic one.  Finally, the ability of classical feedforward multilayer
  73 perceptrons to  learn sets of  data obtained from a  chaotic dynamical
  74 system is regarded.  Various  Boolean functions are iterated on finite
  75 states, some  of them  are proven to  be chaotic  as it is  defined by
  76 Devaney.  In that context, important differences occur in the training
  77 process,  establishing  with  various  neural  networks  that  chaotic
  78 behaviors are far more difficult to learn.
  79 \end{abstract}
  80
  81 %%\pacs{Valid PACS appear here}% PACS, the Physics and Astronomy
  82                              % Classification Scheme.
  83 %%\keywords{Suggested keywords}%Use showkeys class option if keyword
  84                               %display desired
  85 \maketitle
  86
  87 \begin{quotation}
  88 Chaotic neural  networks have received a  lot of attention  due to the
  89 appealing   properties  of   deterministic   chaos  (unpredictability,
  90 sensitivity,  and so  on). However,  such networks  are  often claimed
  91 chaotic without  any rigorous mathematical proof.   Therefore, in this
  92 work  a theoretical  framework based  on the  Devaney's  definition of
  93 chaos  is introduced.   Starting with  a relationship  between chaotic
  94 iterations  and  Devaney's chaos,  we  firstly  show  how to  build  a
  95 recurrent  neural networks  that is  equivalent to  a chaotic  map and
  96 secondly  a way  to check  whether  an already  available network,  is
  97 chaotic  or not.  We also  study different  topological  properties of
  98 these  truly  chaotic neural  networks.   Finally,  we  show that  the
  99 learning,  with neural  networks  having a  feedforward structure,  of
 100 chaotic behaviors represented by data sets obtained from chaotic maps,
 101 is far more difficult than non chaotic behaviors.
 102 \end{quotation}
 103
 104 %% Main text
 105 \section{Introduction}
 106 \label{S1}
 107
 108 Several research  works have proposed or used  chaotic neural networks
 109 these last years. This interest  comes from their complex dynamics and
 110 the various  potential application areas. Chaotic  neural networks are
 111 particularly     considered    to    build     associative    memories
 112 \cite{Crook2007267}  and digital  security tools  like  hash functions
 113 \cite{Xiao10},  digital  watermarking \cite{1309431,Zhang2005759},  or
 114 cipher schemes \cite{Lian20091296}.  In the first case, the background
 115 idea is to  control chaotic dynamics in order  to store patterns, with
 116 the  key advantage  of offering  a  large storage  capacity.  For  the
 117 computer security field,  the use of chaotic dynamics  is motivated by
 118 their    unpredictability   and    random-like    behaviors.   Indeed,
 119 investigating  new concepts  is  crucial in  this  field, because  new
 120 threats  are constantly  emerging.   As an  illustrative example,  the
 121 former  standard in hash  functions, namely  the SHA-1  algorithm, has
 122 been recently weakened after flaws were discovered.
 123
 124 Chaotic neural networks have  been built with different approaches. In
 125 the context of associative  memory, chaotic neurons like the nonlinear
 126 dynamic state neuron \cite{Crook2007267}  are frequently used to build
 127 the network. These neurons have an inherent chaotic behavior, which is
 128 usually assessed through the computation of the Lyapunov exponent.  An
 129 alternative  approach  is  to  consider a  well-known  neural  network
 130 architecture: the  MultiLayer Perceptron.  MLP  networks are basically
 131 used  to model  nonlinear  relationships between  data,  due to  their
 132 universal approximator  capacity. Thus, this  kind of networks  can be
 133 trained to  model a  physical phenomenon known  to be chaotic  such as
 134 Chua's circuit  \cite{dalkiran10}.  Sometimes, a  neural network which
 135 is build using transfer functions and initial conditions that are both
 136 chaotic,      is      itself      claimed      to      be      chaotic
 137 \cite{springerlink:10.1007/s00521-010-0432-2}.
 138
 139 What all of these chaotic neural  networks have in common is that they
 140 are claimed to be chaotic  despite a lack of any rigorous mathematical
 141 proof.   The purpose  of  this paper  is  to fill  this  gap, using  a
 142 theoretical  framework  based on  the  Devaney's  definition of  chaos
 143 \cite{Devaney}.   This  mathematical  theory  of chaos  provides  both
 144 qualitative and quantitative tools to evaluate the complex behavior of
 145 a  dynamical  system:  ergodicity,   expansivity,  and  so  on.   More
 146 precisely, in  this paper,  which is an  extension of a  previous work
 147 \cite{bgs11:ip},   we  establish   an   equivalence  between   chaotic
 148 iterations and  a class of globally  recurrent multilayer perceptrons.
 149 We  investigate the  converse problem  too, that  is, the  ability for
 150 classical MultiLayer Perceptrons (MLP) to learn a particular family of
 151 discrete  chaotic  dynamical  systems.   This family,  called  chaotic
 152 iterations, is defined by a  Boolean vector, an update function, and a
 153 sequence giving  the component  to update at  each iteration.   It has
 154 been  previously established  that such  dynamical systems  can behave
 155 chaotically, as it is defined by Devaney, when the chosen function has
 156 a  strongly connected  iterations graph.   In this  document,  we will
 157 experiment several MLPs and try to learn some iterations of this kind.
 158 We will show that non-chaotic iterations can be learned, whereas it is
 159 far  more  difficult  for chaotic  ones.   That  is  to say,  we  have
 160 discovered at  least one  family of problems  with a  reasonable size,
 161 such  that artificial  neural networks  should not  be applied  in the
 162 presence of chaos,  due to their inability to  learn chaotic behaviors
 163 in this context.
 164
 165 The remainder of this research  work is organized as follows. The next
 166 section is devoted  to the basics of chaotic  iterations and Devaney's
 167 chaos.   Section~\ref{S2} formally  describes  how to  build a  neural
 168 network  that operates  chaotically.  The  following  two~sections are
 169 devoted to the dual case:  checking whether an existing neural network
 170 is chaotic  or not (Section  \ref{S3}), and discussion  on topological
 171 properties  of   chaotic  neural  networks   (Section~\ref{S4}).   The
 172 Section~\ref{section:translation}   shows   how   to  translate   such
 173 iterations  into  an Artificial  Neural  Network  (ANN),  in order  to
 174 evaluate the  capability for this  latter to learn  chaotic behaviors.
 175 This  ability  is  studied in  Sect.~\ref{section:experiments},  where
 176 various ANNs try to learn two  sets of data: the first one is obtained
 177 by chaotic iterations while the  second one results from a non-chaotic
 178 system.  Prediction success rates are  given and discussed for the two
 179 sets.  The paper ends with a conclusion section where our contribution
 180 is summed up and intended future work is exposed.
 181
 182 \section{Link between Chaotic Iterations and Devaney's Chaos}
 183
 184 In this section, the well-established notion of Devaney's mathematical
 185 chaos is  firstly presented.  Preservation of  the unpredictability of
 186 such dynamical  system when implemented  on a computer is  obtained by
 187 using  some discrete iterations  called ``chaotic  iterations'', which
 188 are thus introduced.  The result establishing the link between chaotic
 189 iterations and Devaney's chaos is  finally recalled at the end of this
 190 section.
 191
 192 In what follows and for  any function $f$, $f^n$ means the composition
 193 $f \circ f \circ \hdots \circ f$ ($n$ times).
 194
 195 \subsection{Devaney's chaotic dynamical systems}
 196
 197 Various domains such as  physics, biology, or economy, contain systems
 198 that exhibit a chaotic behavior,  a well-known example is the weather.
 199 These  systems   are  in   particular  highly  sensitive   to  initial
 200 conditions, a concept usually presented as the butterfly effect: small
 201 variations in the initial conditions possibly lead to widely different
 202 behaviors.  Theoretically speaking, a  system is sensitive if for each
 203 point  $x$ in  the  iteration space,  one  can find  a  point in  each
 204 neighborhood of $x$ having a significantly different future evolution.
 205 Conversely, a  system seeded with  the same initial  conditions always
 206 has  the same  evolution.   In  other words,  chaotic  systems have  a
 207 deterministic  behavior  defined through  a  physical or  mathematical
 208 model  and a  high  sensitivity to  the  initial conditions.   Besides
 209 mathematically this  kind of unpredictability  is also referred  to as
 210 deterministic chaos.  For example, many weather forecast models exist,
 211 but they give only suitable predictions for about a week, because they
 212 are initialized with conditions  that reflect only a partial knowledge
 213 of the current weather.  Even  if initially the differences are small,
 214 they are  amplified in the course  of time, and thus  make difficult a
 215 long-term prediction.  In fact,  in a chaotic system, an approximation
 216 of  the current  state is  a  quite useless  indicator for  predicting
 217 future states.
 218
 219 From  mathematical  point  of   view,  deterministic  chaos  has  been
 220 thoroughly studied  these last decades, with  different research works
 221 that  have   provide  various  definitions  of   chaos.   Among  these
 222 definitions,   the  one  given   by  Devaney~\cite{Devaney}   is  well
 223 established.    This   definition   consists  of   three   conditions:
 224 topological  transitivity, density of  periodic points,  and sensitive
 225 point dependence on initial conditions.
 226
 227 Topological  transitivity   is  checked  when,  for   any  point,  any
 228 neighborhood of its future evolution eventually overlap with any other
 229 given region. More precisely,
 230
 231 \begin{definition} \label{def2}
 232 A continuous function $f$  on a topological space $(\mathcal{X},\tau)$
 233 is defined  to be  {\bf topologically transitive}  if for any  pair of
 234 open  sets $U$,  $V  \in  \mathcal{X}$ there  exists  $k>0$ such  that
 235 $f^k(U) \cap V \neq \emptyset$.
 236 \end{definition}
 237
 238 This property  implies that a  dynamical system cannot be  broken into
 239 simpler  subsystems.  It  is intrinsically  complicated and  cannot be
 240 simplified.  On  the contrary,  a dense set  of periodic points  is an
 241 element of regularity that a chaotic dynamical system has to exhibit.
 242
 243 \begin{definition} \label{def3}
 244 A point $x$ is called a  {\bf periodic point} for $f$ of period~$n \in
 245 \mathds{N}^{\ast}$ if $f^{n}(x)=x$.
 246 \end{definition}
 247
 248 \begin{definition} \label{def4}
 249 $f$ is said to be {\bf  regular} on $(\mathcal{X},\tau)$ if the set of
 250   periodic points  for $f$ is  dense in $\mathcal{X}$ ($\forall  x \in
 251   \mathcal{X}$, we can find at least  one periodic point in any of its
 252   neighborhood).
 253 \end{definition}
 254
 255 This  regularity ``counteracts'' the  effects of  transitivity.  Thus,
 256 due to these two properties, two points close to each other can behave
 257 in a completely different  manner, leading to unpredictability for the
 258 whole system. Then,
 259
 260 \begin{definition} \label{sensitivity}
 261 $f$  has {\bf  sensitive dependence  on initial  conditions}  if there
 262   exists $\delta  >0$ such  that, for any  $x\in \mathcal{X}$  and any
 263   neighborhood $V$ of $x$, there exist  $y\in V$ and $n > 0$ such that
 264   $d\left(f^{n}(x), f^{n}(y)\right) >\delta  $. $\delta$ is called the
 265   {\bf constant of sensitivity} of $f$.
 266 \end{definition}
 267
 268 Finally,
 269
 270 \begin{definition} \label{def5}
 271 $f$ is  {\bf chaotic according to Devaney}  on $(\mathcal{X},\tau)$ if
 272   $f$  is   regular,  topologically  transitive,   and  has  sensitive
 273   dependence to its initial conditions.
 274 \end{definition}
 275
 276 Let us notice that for a  metric space the last condition follows from
 277 the two first ones~\cite{Banks92}.
 278
 279 \subsection{Chaotic Iterations}
 280
 281 This section  presents some basics on  topological chaotic iterations.
 282 Let us  firstly discuss about the  domain of iteration.  As  far as we
 283 know, no result rules that the chaotic behavior of a function that has
 284 been theoretically proven on  $\R$ remains valid on the floating-point
 285 numbers, which is  the implementation domain.  Thus, to  avoid loss of
 286 chaos this  work presents an  alternative, that is to  iterate Boolean
 287 maps:  results that  are  theoretically obtained  in  that domain  are
 288 preserved in implementations.
 289
 290 Let us denote by $\llbracket  a ; b \rrbracket$ the following interval
 291 of integers:  $\{a, a+1,  \hdots, b\}$, where $a~<~b$.  A  {\it system}
 292 under   consideration    iteratively   modifies   a    collection   of
 293 $n$~components.   Each component  $i \in  \llbracket 1;  n \rrbracket$
 294 takes  its  value  $x_i$  among the  domain  $\Bool=\{0,1\}$.   A~{\it
 295   configuration} of the system at discrete time $t$ (also said at {\it
 296   iteration} $t$) is the vector
 297 %\begin{equation*}
 298 $x^{t}=(x_1^{t},\ldots,x_{n}^{t}) \in \Bool^n$.
 299 %\end{equation*}
 300 The dynamics of the system is  described according to a  function $f :
 301 \Bool^n \rightarrow \Bool^n$ such that
 302 %\begin{equation*}
 303 $f(x)=(f_1(x),\ldots,f_n(x))$.
 304 %\end{equation*}
 305 % Notice that $f^k$ denotes the
 306 % $k-$th composition $f\circ \ldots \circ f$ of the function $f$.
 307
 308 Let    be   given    a    configuration   $x$.    In   what    follows
 309 $N(i,x)=(x_1,\ldots,\overline{x_i},\ldots,x_n)$  is  the configuration
 310 obtained by switching the $i-$th component of $x$ ($\overline{x_i}$ is
 311 indeed  the negation  of $x_i$).   Intuitively, $x$  and  $N(i,x)$ are
 312 neighbors.   The discrete  iterations of  $f$ are  represented  by the
 313 oriented  {\it graph  of iterations}  $\Gamma(f)$.  In  such  a graph,
 314 vertices are configurations  of $\Bool^n$ and there is  an arc labeled
 315 $i$ from $x$ to $N(i,x)$ iff $f_i(x)$ is $N(i,x)$.
 316
 317 In the  sequel, the  {\it strategy} $S=(S^{t})^{t  \in \Nats}$  is the
 318 sequence  defining the  component to  update at  time $t$  and $S^{t}$
 319 denotes its  $t-$th term.  We  introduce the function $F_{f}$  that is
 320 defined  for  any given  application  $f:\Bool^{n}  \to \Bool^{n}$  by
 321 $F_{f}:   \llbracket1;n\rrbracket\times   \mathds{B}^{n}   \rightarrow
 322 \mathds{B}^{n}$, s.t.
 323 \begin{equation}
 324 \label{eq:CIs}
 325   F_{f}(s,x)_j  =
 326     \left\{
 327     \begin{array}{l}
 328     f_j(x) \textrm{ if } j= s \enspace , \\
 329     x_{j} \textrm{ otherwise} \enspace .
 330     \end{array}
 331     \right.
 332 \end{equation}
 333
 334 \noindent With such a notation, configurations are defined for times
 335 $t=0,1,2,\ldots$ by:
 336 \begin{equation}\label{eq:sync}
 337 \left\{\begin{array}{l}
 338   x^{0}\in \mathds{B}^{n} \textrm{ and}\\
 339   x^{t+1}=F_{f}(S^t,x^{t}) \enspace .
 340 \end{array}\right.
 341 \end{equation}
 342
 343 \noindent  Finally, iterations defined  in Eq.~(\ref{eq:sync})  can be
 344 described by the following system:
 345 \begin{equation}
 346 \left\{
 347 \begin{array}{lll}
 348 X^{0} & = &  ((S^t)^{t \in \Nats},x^0) \in
 349 \llbracket1;n\rrbracket^\Nats \times \Bool^{n}\\
 350 X^{k+1}& = & G_{f}(X^{k})\\
 351 \multicolumn{3}{c}{\textrm{where } G_{f}\left(((S^t)^{t \in \Nats},x)\right)
 352 = \left(\sigma((S^t)^{t \in \Nats}),F_{f}(S^0,x)\right) \enspace ,}
 353 \end{array}
 354 \right.
 355 \label{eq:Gf}
 356 \end{equation}
 357 where  $\sigma$ is the  function that  removes the  first term  of the
 358 strategy  ({\it i.e.},~$S^0$).   This definition  means that  only one
 359 component  of the  system is  updated  at an  iteration: the  $S^t$-th
 360 element.  But it can be extended by considering subsets for $S^t$.
 361
 362 Let us finally remark that, despite the use of the term {\it chaotic},
 363 there is {\it  priori} no connection between these  iterations and the
 364 mathematical theory of chaos presented previously.
 365
 366 \subsection{Chaotic Iterations and Devaney's Chaos}
 367
 368 In  this subsection  we recall  the link  we have  established between
 369 chaotic iterations and Devaney's  chaos.  The theoretical framework is
 370 fully described in \cite{guyeux09}.
 371
 372 The   {\bf   distance}   $d$    between   two   points   $(S,x)$   and
 373 $(\check{S},\check{x})\in  \mathcal{X} = \llbracket1;n\rrbracket^\Nats
 374 \times \Bool^{n}$ is defined by
 375 \begin{equation}
 376 d((S,x);(\check{S},\check{x}))=d_{e}(x,\check{x})+d_{s}(S,\check{S})
 377 \enspace ,
 378 \end{equation}
 379 where
 380 \begin{equation}
 381 d_{e}(x,\check{x})=\sum_{j=1}^{n}\Delta
 382 (x_{j},\check{x}_{j}) \in \llbracket  0 ; n \rrbracket
 383 \end{equation}
 384 \noindent and
 385 \begin{equation}
 386 d_{s}(S,\check{S})=\frac{9}{2n}\sum_{t=0}^{\infty
 387 }\frac{|S^{t}-\check{S}^{t}|}{10^{t+1}} \in [0 ; 1] \enspace .
 388 \end{equation}
 389
 390 This    distance    is    defined    to    reflect    the    following
 391 information. Firstly, the more  two systems have different components,
 392 the  larger the  distance  between them.  Secondly,  two systems  with
 393 similar components and strategies, which have the same starting terms,
 394 must  induce only  a small  distance.  The  proposed  distance fulfill
 395 these  requirements: on  the one  hand  its floor  value reflects  the
 396 difference between  the cells, on  the other hand its  fractional part
 397 measure the difference between the strategies.
 398
 399 The relation  between $\Gamma(f)$ and  $G_f$ is clear: there  exists a
 400 path from  $x$ to $x'$  in $\Gamma(f)$ if  and only if there  exists a
 401 strategy  $s$ such  that  the  parallel iteration  of  $G_f$ from  the
 402 initial  point $(s,x)$  reaches  the configuration  $x'$.  Using  this
 403 link, Guyeux~\cite{GuyeuxThese10} has proven that,
 404 \begin{theorem}%[Characterization of $\mathcal{C}$]
 405 \label{Th:Caracterisation   des   IC   chaotiques}
 406 Let $f:\Bool^n\to\Bool^n$. $G_f$ is chaotic  (according to  Devaney)
 407 if and only if  $\Gamma(f)$ is strongly connected.
 408 \end{theorem}
 409
 410 Checking  if a  graph  is  strongly connected  is  not difficult.  For
 411 example,  consider the  function $f_1\left(x_1,\dots,x_n\right)=\left(
 412 \overline{x_1},x_1,x_2,\dots,x_\mathsf{n}\right)$. As $\Gamma(f_1)$ is
 413 obviously strongly connected, then $G_{f_1}$ is a chaotic map.
 414
 415 With this  material, we are now  able to build a  first chaotic neural
 416 network, as defined in the Devaney's formulation.
 417
 418 \section{A chaotic neural network in the sense of Devaney}
 419 \label{S2}
 420
 421 Let     us     firstly     introduce    the     vectorial     negation
 422 $f_{0}(x_{1},\dots,x_{n}) =(\overline{x_{1}},\dots,\overline{x_{n}})$,
 423 which is  such that $\Gamma(f_0)$ is  strongly connected.  Considering
 424 the map  $F_{f_0}:\llbracket 1;  n \rrbracket \times  \mathds{B}^n \to
 425 \mathds{B}^n$  associated to the  vectorial negation,  we can  build a
 426 multilayer perceptron  neural network modeling  $F_{f_0}$.  Hence, for
 427 all inputs  $(s,x) \in \llbracket  1;n\rrbracket \times \mathds{B}^n$,
 428 the output layer will produce  $F_{f_0}(s,x)$.  It is then possible to
 429 link  the output  layer  and the  input  one, in  order  to model  the
 430 dependence between two successive iterations.  As a result we obtain a
 431 global  recurrent   neural  network  that  behaves   as  follows  (see
 432 Fig.~\ref{Fig:perceptron}).
 433
 434 \begin{itemize}
 435 \item   The   network   is   initialized   with   the   input   vector
 436   $\left(S^0,x^0\right)    \in    \llbracket   1;n\rrbracket    \times
 437   \mathds{B}^n$      and      computes      the     output      vector
 438   $x^1=F_{f_0}\left(S^0,x^0\right)$. This last  vector is published as
 439   an output one of the chaotic  neural network and is sent back to the
 440   input layer through the feedback links.
 441 \item When  the network  is activated at  the $t^{th}$  iteration, the
 442   state of the system $x^t  \in \mathds{B}^n$ received from the output
 443   layer and  the initial  term of the  sequence $(S^t)^{t  \in \Nats}$
 444   ($S^0  \in \llbracket 1;n\rrbracket$)  are used  to compute  the new
 445   output.   This new  output, which  represents the  new state  of the
 446   dynamical system, satisfies:
 447   \begin{equation}
 448     x^{t+1}=F_{f_0}(S^0, x^t) \in \mathds{B}^n \enspace .
 449   \end{equation}
 450 \end{itemize}
 451
 452 \begin{figure}
 453   \centering
 454   \includegraphics[scale=0.625]{perceptron}
 455   \caption{A perceptron equivalent to chaotic iterations}
 456   \label{Fig:perceptron}
 457 \end{figure}
 458
 459 The behavior of the neural network is such that when the initial state
 460 is  $x^0~\in~\mathds{B}^n$ and  a  sequence $(S^t)^{t  \in \Nats}$  is
 461 given  as outside  input, then  the sequence  of  successive published
 462 output vectors $\left(x^t\right)^{t \in \mathds{N}^{\ast}}$ is exactly
 463 the  one produced  by  the chaotic  iterations  formally described  in
 464 Eq.~(\ref{eq:CIs}).  It  means that  mathematically if we  use similar
 465 input  vectors   they  both  generate  the   same  successive  outputs
 466 $\left(x^t\right)^{t \in \mathds{N}^{\ast}}$,  and therefore that they
 467 are  equivalent  reformulations  of  the iterations  of  $G_{f_0}$  in
 468 $\mathcal{X}$. Finally, since the  proposed neural network is build to
 469 model  the  behavior  of  $G_{f_0}$,  which is  chaotic  according  to
 470 Devaney's definition  of chaos,  we can conclude  that the  network is
 471 also chaotic in this sense.
 472
 473 The previous construction scheme  is not restricted to function $f_0$.
 474 It can  be extended to any function  $f$ such that $G_f$  is a chaotic
 475 map by training the network to model $F_{f}:\llbracket 1; n \rrbracket
 476 \times      \mathds{B}^n      \to      \mathds{B}^n$.      Due      to
 477 Theorem~\ref{Th:Caracterisation  des  IC  chaotiques},  we  can  find
 478 alternative functions  $f$ for $f_0$  through a simple check  of their
 479 graph of  iterations $\Gamma(f)$.  For  example, we can  build another
 480 chaotic neural network by using $f_1$ instead of $f_0$.
 481
 482 \section{Checking if a neural network is chaotic or not}
 483 \label{S3}
 484
 485 We focus now on the case  where a neural network is already available,
 486 and for which we  want to know if it is chaotic  or not. Typically, in
 487 many research papers neural network  are usually claimed to be chaotic
 488 without any  convincing mathematical proof. We propose  an approach to
 489 overcome  this  drawback  for  a  particular  category  of  multilayer
 490 perceptrons defined below, and for the Devaney's formulation of chaos.
 491 In  spite of  this restriction,  we think  that this  approach  can be
 492 extended to  a large variety  of neural networks.  We plan to  study a
 493 generalization of this approach in a future work.
 494
 495 We consider a multilayer perceptron  of the following form: as inputs
 496 it has $n$ binary digits and  one integer value, while it produces $n$
 497 bits.   Moreover, each  binary  output is  connected  with a  feedback
 498 connection to an input one.
 499
 500 \begin{itemize}
 501 \item At  initialization, the network is feeded  with $n$~bits denoted
 502   $\left(x^0_1,\dots,x^0_n\right)$  and an  integer  value $S^0$  that
 503   belongs to $\llbracket1;n\rrbracket$.
 504 \item     At     iteration~$t$,     the     last     output     vector
 505   $\left(x^t_1,\dots,x^t_n\right)$   defines  the  $n$~bits   used  to
 506   compute the new output one $\left(x^{t+1}_1,\dots,x^{t+1}_n\right)$.
 507   While  the remaining  input receives  a new  integer value  $S^t \in
 508   \llbracket1;n\rrbracket$, which is provided by the outside world.
 509 \end{itemize}
 510
 511 The topological  behavior of these  particular neural networks  can be
 512 proven to be chaotic through the following process. Firstly, we denote
 513 by  $F:  \llbracket  1;n  \rrbracket \times  \mathds{B}^n  \rightarrow
 514 \mathds{B}^n$     the     function     that     maps     the     value
 515 $\left(s,\left(x_1,\dots,x_n\right)\right)    \in    \llbracket    1;n
 516 \rrbracket      \times      \mathds{B}^n$      into     the      value
 517 $\left(y_1,\dots,y_n\right)       \in       \mathds{B}^n$,       where
 518 $\left(y_1,\dots,y_n\right)$  is the  response of  the  neural network
 519 after    the    initialization     of    its    input    layer    with
 520 $\left(s,\left(x_1,\dots, x_n\right)\right)$.  Secondly, we define $f:
 521 \mathds{B}^n       \rightarrow      \mathds{B}^n$       such      that
 522 $f\left(x_1,x_2,\dots,x_n\right)$ is equal to
 523 \begin{equation}
 524 \left(F\left(1,\left(x_1,x_2,\dots,x_n\right)\right),\dots,
 525   F\left(n,\left(x_1,x_2,\dots,x_n\right)\right)\right) \enspace .
 526 \end{equation}
 527 Then $F=F_f$  and this recurrent  neural network produces  exactly the
 528 same      output      vectors,      when     feeding      it      with
 529 $\left(x_1^0,\dots,x_n^0\right)$    and   $S   \in    \llbracket   1;n
 530 \rrbracket^{\mathds{N}}$, than  chaotic iterations $F_f$  with initial
 531 condition  $\left(S,(x_1^0,\dots,  x_n^0)\right)  \in  \llbracket  1;n
 532 \rrbracket^{\mathds{N}}  \times \mathds{B}^n$.   In the  rest  of this
 533 paper,  we will  call such  multilayer perceptrons  CI-MLP($f$), which
 534 stands for ``Chaotic Iterations based MultiLayer Perceptron''.
 535
 536 Checking  if CI-MLP($f$)  behaves chaotically  according  to Devaney's
 537 definition  of  chaos  is  simple:  we  need just  to  verify  if  the
 538 associated graph  of iterations  $\Gamma(f)$ is strongly  connected or
 539 not. As  an incidental consequence,  we finally obtain  an equivalence
 540 between  chaotic  iterations   and  CI-MLP($f$).   Therefore,  we  can
 541 obviously  study such multilayer  perceptrons with  mathematical tools
 542 like  topology  to  establish,  for  example,  their  convergence  or,
 543 contrarily, their unpredictable behavior.   An example of such a study
 544 is given in the next section.
 545
 546 \section{Topological properties of chaotic neural networks}
 547 \label{S4}
 548
 549 Let us first recall  two fundamental definitions from the mathematical
 550 theory of chaos.
 551
 552 \begin{definition} \label{def8}
 553 A  function   $f$  is   said  to  be   {\bf  expansive}   if  $\exists
 554 \varepsilon>0$, $\forall  x \neq y$,  $\exists n \in  \mathds{N}$ such
 555 that $d\left(f^n(x),f^n(y)\right) \geq \varepsilon$.
 556 \end{definition}
 557
 558 \begin{definition} \label{def9}
 559 A discrete dynamical  system is said to be  {\bf topologically mixing}
 560 if  and only  if,  for any  pair  of disjoint  open  sets $U$,$V  \neq
 561 \emptyset$, $n_0  \in \mathds{N}$  can be found  such that  $\forall n
 562 \geq n_0$, $f^n(U) \cap V \neq \emptyset$.
 563 \end{definition}
 564
 565 As  proven in Ref.~\cite{gfb10:ip},  chaotic iterations  are expansive
 566 and  topologically mixing when  $f$ is  the vectorial  negation $f_0$.
 567 Consequently,  these  properties are  inherited  by the  CI-MLP($f_0$)
 568 recurrent neural network presented  previously, which induce a greater
 569 unpredictability.  Any  difference on the  initial value of  the input
 570 layer is  in particular  magnified up to  be equal to  the expansivity
 571 constant.
 572
 573 Now,  what are the  consequences for  a neural  network to  be chaotic
 574 according  to  Devaney's definition?  First  of  all, the  topological
 575 transitivity property implies indecomposability.
 576
 577 \begin{definition} \label{def10}
 578 A   dynamical   system   $\left(   \mathcal{X},  f\right)$   is   {\bf
 579 indecomposable}  if it  is not  the  union of  two closed  sets $A,  B
 580 \subset \mathcal{X}$ such that $f(A) \subset A, f(B) \subset B$.
 581 \end{definition}
 582
 583 \noindent Hence, reducing the set of outputs generated by CI-MLP($f$),
 584 in order to  simplify its complexity, is impossible  if $\Gamma(f)$ is
 585 strongly connected.  Moreover, under this  hypothesis CI-MLPs($f$) are
 586 strongly transitive:
 587
 588 \begin{definition} \label{def11}
 589 A  dynamical system  $\left( \mathcal{X},  f\right)$ is  {\bf strongly
 590 transitive} if $\forall x,y  \in \mathcal{X}$, $\forall r>0$, $\exists
 591 z  \in  \mathcal{X}$,  $d(z,x)~\leq~r  \Rightarrow  \exists  n  \in
 592 \mathds{N}^{\ast}$, $f^n(z)=y$.
 593 \end{definition}
 594 According to this definition, for all  pairs of points $(x, y)$ in the
 595 phase space, a point $z$ can  be found in the neighborhood of $x$ such
 596 that one of its iterates $f^n(z)$ is $y$. Indeed, this result has been
 597 established  during  the  proof   of  the  transitivity  presented  in
 598 Ref.~\cite{guyeux09}.  Among  other  things, the  strong  transitivity
 599 leads  to the fact  that without  the knowledge  of the  initial input
 600 layer, all outputs are possible.  Additionally, no point of the output
 601 space  can   be  discarded  when  studying  CI-MLPs:   this  space  is
 602 intrinsically complicated and it cannot be decomposed or simplified.
 603
 604 Furthermore, those  recurrent neural networks  exhibit the instability
 605 property:
 606 \begin{definition}
 607 A dynamical  system $\left( \mathcal{X}, f\right)$ is  unstable if for
 608 all  $x  \in  \mathcal{X}$,   the  orbit  $\gamma_x:n  \in  \mathds{N}
 609 \longmapsto f^n(x)$  is unstable,  that means: $\exists  \varepsilon >
 610 0$, $\forall  \delta>0$, $\exists y  \in \mathcal{X}$, $\exists  n \in
 611 \mathds{N}$,        such        that        $d(x,y)<\delta$        and
 612 $d\left(\gamma_x(n),\gamma_y(n)\right) \geq \varepsilon$.
 613 \end{definition}
 614 This property, which  is implied by the sensitive  point dependence on
 615 initial conditions, leads to the fact that in all neighborhoods of any
 616 point $x$, there are points that  can be apart by $\varepsilon$ in the
 617 future through iterations of the  CI-MLP($f$). Thus, we can claim that
 618 the behavior of  these MLPs is unstable when  $\Gamma (f)$ is strongly
 619 connected.
 620
 621 Let  us  now consider  a  compact  metric space  $(M,  d)$  and $f:  M
 622 \rightarrow M$  a continuous map. For  each natural number  $n$, a new
 623 metric $d_n$ is defined on $M$ by
 624 $$d_n(x,y)=\max\{d(f^i(x),f^i(y)): 0\leq i<n\} \enspace .$$
 625
 626 Given any $\varepsilon > 0$ and $n \geqslant 1$, two points of $M$ are
 627 $\varepsilon$-close  with respect to  this metric  if their  first $n$
 628 iterates are $\varepsilon$-close.
 629
 630 This metric  allows one to distinguish  in a neighborhood  of an orbit
 631 the points  that move away from  each other during  the iteration from
 632 the points  that travel together.  A subset  $E$ of $M$ is  said to be
 633 $(n, \varepsilon)$-separated if each pair of distinct points of $E$ is
 634 at  least $\varepsilon$  apart in  the metric  $d_n$. Denote  by $H(n,
 635 \varepsilon)$     the    maximum     cardinality     of    an     $(n,
 636 \varepsilon)$-separated set,
 637 \begin{definition}
 638 The {\it topological entropy} of the  map $f$ is defined by (see e.g.,
 639 Ref.~\cite{Adler65} or Ref.~\cite{Bowen})
 640 $$h(f)=\lim_{\varepsilon\to      0}      \left(\limsup_{n\to      \infty}
 641 \frac{1}{n}\log H(n,\varepsilon)\right) \enspace .$$
 642 \end{definition}
 643
 644 Then we have the following result \cite{GuyeuxThese10},
 645 \begin{theorem}
 646 $\left( \mathcal{X},d\right)$  is compact and  the topological entropy
 647 of $(\mathcal{X},G_{f_0})$ is infinite.
 648 \end{theorem}
 649
 650 We have explained how to  construct truly chaotic neural networks, how
 651 to check whether a  given MLP is chaotic or not, and  how to study its
 652 topological behavior.   The last thing to  investigate, when comparing
 653 neural  networks   and  Devaney's  chaos,  is   to  determine  whether
 654 artificial neural networks  are able to learn or  predict some chaotic
 655 behaviors, as  it is defined  in the Devaney's formulation  (when they
 656 are not specifically constructed for this purpose).  This statement is
 657 studied in the next section.
 658
 659 \section{Suitability of Artificial Neural Networks
 660 for Predicting Chaotic Behaviors}
 661
 662 In  the context  of computer  science  different topic  areas have  an
 663 interest       in       chaos,       as       for       steganographic
 664 techniques~\cite{1309431,Zhang2005759}.    Steganography  consists  in
 665 embedding a  secret message within  an ordinary one, while  the secret
 666 extraction takes place once  at destination.  The reverse ({\it i.e.},
 667 automatically detecting the presence  of hidden messages inside media)
 668 is  called   steganalysis.   Among  the   deployed  strategies  inside
 669 detectors,          there         are          support         vectors
 670 machines~\cite{Qiao:2009:SM:1704555.1704664},                    neural
 671 networks~\cite{10.1109/ICME.2003.1221665,10.1109/CIMSiM.2010.36},  and
 672 Markov   chains~\cite{Sullivan06steganalysisfor}.    Most   of   these
 673 detectors  give quite  good results  and are  rather  competitive when
 674 facing steganographic  tools.  However, to  the best of  our knowledge
 675 none of the considered information hiding schemes fulfills the Devaney
 676 definition  of chaos~\cite{Devaney}.  Indeed,  one can  wonder whether
 677 detectors  continue to  give good  results when  facing  truly chaotic
 678 schemes.  More  generally, there remains the open  problem of deciding
 679 whether artificial intelligence is suitable for predicting topological
 680 chaotic behaviors.
 681
 682 \subsection{Representing Chaotic Iterations for Neural Networks}
 683 \label{section:translation}
 684
 685 The  problem  of  deciding  whether  classical  feedforward  ANNs  are
 686 suitable  to approximate  topological chaotic  iterations may  then be
 687 reduced  to evaluate  ANNs on  iterations of  functions  with Strongly
 688 Connected  Component  (SCC)~graph  of  iterations.   To  compare  with
 689 non-chaotic  iterations,  the experiments  detailed  in the  following
 690 sections are  carried out  using both kinds  of function  (chaotic and
 691 non-chaotic). Let us emphasize on  the difference between this kind of
 692 neural   networks  and  the   Chaotic  Iterations   based  MultiLayer
 693 Perceptron.
 694
 695 We are  then left to compute  two disjoint function  sets that contain
 696 either functions  with topological chaos properties  or not, depending
 697 on  the strong  connectivity of  their iteration  graph.  This  can be
 698 achieved for  instance by removing a  set of edges  from the iteration
 699 graph $\Gamma(f_0)$ of the vectorial negation function~$f_0$.  One can
 700 deduce whether  a function verifies the topological  chaos property or
 701 not  by checking  the strong  connectivity of  the resulting  graph of
 702 iterations.
 703
 704 For instance let us consider  the functions $f$ and $g$ from $\Bool^4$
 705 to $\Bool^4$ respectively defined by the following lists:
 706 $$[0,  0,  2,   3,  13,  13,  6,   3,  8,  9,  10,  11,   8,  13,  14,
 707   15]$$ $$\mbox{and } [11, 14, 13, 14, 11, 10, 1, 8, 7, 6, 5, 4, 3, 2,
 708   1, 0]  \enspace.$$ In  other words,  the image of  $0011$ by  $g$ is
 709 $1110$: it  is obtained as the  binary value of the  fourth element in
 710 the  second  list  (namely~14).   It   is  not  hard  to  verify  that
 711 $\Gamma(f)$ is  not SCC  (\textit{e.g.}, $f(1111)$ is  $1111$) whereas
 712 $\Gamma(g)$  is.  Next section  shows how  to translate  iterations of
 713 such functions into a model amenable to be learned by an ANN.
 714
 715 This  section  presents how  (not)  chaotic  iterations  of $G_f$  are
 716 translated  into  another  model  more  suited  to  artificial  neural
 717 networks.  Formally, input and output vectors are pairs~$((S^t)^{t \in
 718 \Nats},x)$ and $\left(\sigma((S^t)^{t \in \Nats}),F_{f}(S^0,x)\right)$
 719 as defined in~Eq.~(\ref{eq:Gf}).
 720
 721 Firstly, let us focus on how to memorize configurations.  Two distinct
 722 translations are  proposed.  In the first  case, we take  one input in
 723 $\Bool$  per  component;  in   the  second  case,  configurations  are
 724 memorized  as   natural  numbers.    A  coarse  attempt   to  memorize
 725 configuration  as  natural  number  could  consist  in  labeling  each
 726 configuration  with  its  translation  into  decimal  numeral  system.
 727 However,  such a  representation induces  too many  changes  between a
 728 configuration  labeled by  a  power  of two  and  its direct  previous
 729 configuration: for instance, 16~(10000) and 15~(01111) are closed in a
 730 decimal ordering, but  their Hamming distance is 5.   This is why Gray
 731 codes~\cite{Gray47} have been preferred.
 732
 733 Secondly,  how do  we  deal  with strategies.   Obviously,  it is  not
 734 possible to  translate in a finite  way an infinite  strategy, even if
 735 both $(S^t)^{t \in \Nats}$ and $\sigma((S^t)^{t \in \Nats})$ belong to
 736 $\{1,\ldots,n\}^{\Nats}$.  Input strategies are then reduced to have a
 737 length  of size  $l  \in  \llbracket 2,k\rrbracket$,  where  $k$ is  a
 738 parameter of the evaluation. Notice  that $l$ is greater than or equal
 739 to $2$ since  we do not want the shift  $\sigma$~function to return an
 740 empty strategy.  Strategies are memorized as natural numbers expressed
 741 in base  $n+1$.  At  each iteration, either  none or one  component is
 742 modified  (among the  $n$ components)  leading to  a radix  with $n+1$
 743 entries.  Finally,  we give an  other input, namely $m  \in \llbracket
 744 1,l-1\rrbracket$, which  is the  number of successive  iterations that
 745 are applied starting  from $x$.  Outputs are translated  with the same
 746 rules.
 747
 748 To address  the complexity  issue of the  problem, let us  compute the
 749 size of the data set an ANN has to deal with.  Each input vector of an
 750 input-output pair  is composed of a configuration~$x$,  an excerpt $S$
 751 of the strategy to iterate  of size $l \in \llbracket 2, k\rrbracket$,
 752 and a  number $m \in  \llbracket 1, l-1\rrbracket$ of  iterations that
 753 are executed.
 754
 755 Firstly, there are $2^n$  configurations $x$, with $n^l$ strategies of
 756 size $l$ for  each of them. Secondly, for  a given configuration there
 757 are $\omega = 1 \times n^2 +  2 \times n^3 + \ldots+ (k-1) \times n^k$
 758 ways  of writing  the pair  $(m,S)$. Furthermore,  it is  not  hard to
 759 establish that
 760 \begin{equation}
 761 \displaystyle{(n-1) \times \omega = (k-1)\times n^{k+1} - \sum_{i=2}^k n^i} \nonumber
 762 \end{equation}
 763 then
 764 \begin{equation}
 765 \omega =
 766 \dfrac{(k-1)\times n^{k+1}}{n-1} - \dfrac{n^{k+1}-n^2}{(n-1)^2} \enspace . \nonumber
 767 \end{equation}
 768 \noindent And then, finally, the number of  input-output pairs for our
 769 ANNs is
 770 $$
 771 2^n \times \left(\dfrac{(k-1)\times n^{k+1}}{n-1} - \dfrac{n^{k+1}-n^2}{(n-1)^2}\right) \enspace .
 772 $$
 773 For  instance, for $4$  binary components  and a  strategy of  at most
 774 $3$~terms we obtain 2304~input-output pairs.
 775
 776 \subsection{Experiments}
 777 \label{section:experiments}
 778
 779 To study  if chaotic iterations can  be predicted, we  choose to train
 780 the MultiLayer Perceptron.  As stated  before, this kind of network is
 781 in   particular    well-known   for   its    universal   approximation
 782 property. Furthermore,  MLPs have been already  considered for chaotic
 783 time series prediction.  For example, in~\cite{dalkiran10} the authors
 784 have shown that a feedforward  MLP with two hidden layers, and trained
 785 with Bayesian  Regulation backpropagation, can  learn successfully the
 786 dynamics of Chua's circuit.
 787
 788 In these experimentations we consider  MLPs having one hidden layer of
 789 sigmoidal  neurons  and  output   neurons  with  a  linear  activation
 790 function.     They    are    trained    using    the    Limited-memory
 791 Broyden-Fletcher-Goldfarb-Shanno quasi-newton algorithm in combination
 792 with the Wolfe linear search.  The training process is performed until
 793 a maximum number of epochs  is reached.  To prevent overfitting and to
 794 estimate the  generalization performance we use  holdout validation by
 795 splitting the  data set into  learning, validation, and  test subsets.
 796 These subsets  are obtained through  random selection such  that their
 797 respective size represents 65\%, 10\%, and 25\% of the whole data set.
 798
 799 Several  neural  networks  are  trained  for  both  iterations  coding
 800 schemes.   In  both  cases   iterations  have  the  following  layout:
 801 configurations of  four components and  strategies with at  most three
 802 terms. Thus, for  the first coding scheme a data  set pair is composed
 803 of 6~inputs and 5~outputs, while for the second one it is respectively
 804 3~inputs and 2~outputs. As noticed at the end of the previous section,
 805 this  leads to  data sets  that  consist of  2304~pairs. The  networks
 806 differ  in the  size of  the hidden  layer and  the maximum  number of
 807 training epochs.  We remember that  to evaluate the ability  of neural
 808 networks to  predict a  chaotic behavior for  each coding  scheme, the
 809 trainings of two data sets, one of them describing chaotic iterations,
 810 are compared.
 811
 812 Thereafter we give,  for the different learning setups  and data sets,
 813 the  mean prediction  success rate  obtained for  each  output.  These
 814 values  are  computed  considering  10~trainings with  random  subsets
 815 construction,  weights  and  biases initialization.   Firstly,  neural
 816 networks having 10  and 25~hidden neurons are trained,  with a maximum
 817 number  of  epochs that  takes  its  value  in $\{125,250,500\}$  (see
 818 Tables~\ref{tab1}  and \ref{tab2}).   Secondly, we  refine  the second
 819 coding scheme by splitting the  output vector such that each output is
 820 learned by a specific  neural network (Table~\ref{tab3}). In this last
 821 case, we increase  the size of the hidden layer  up to 40~neurons, and
 822 we consider larger number of epochs.
 823
 824 \begin{table}[htbp!]
 825 \caption{Prediction success rates for configurations expressed as boolean vectors.}
 826 \label{tab1}
 827 \centering {\small
 828 \begin{tabular}{|c|c||c|c|c|}
 829 \hline
 830 \multicolumn{5}{|c|}{Networks topology: 6~inputs, 5~outputs and one hidden layer} \\
 831 \hline
 832 \hline
 833 \multicolumn{2}{|c||}{Hidden neurons} & \multicolumn{3}{c|}{10 neurons} \\
 834 \cline{3-5}
 835 \multicolumn{2}{|c||}{Epochs} & 125 & 250 & 500 \\
 836 \hline
 837 \multirow{6}{*}{\rotatebox{90}{Chaotic}}&Output~(1) & 90.92\% & 91.75\% & 91.82\% \\
 838 & Output~(2) & 69.32\% & 78.46\% & 82.15\% \\
 839 & Output~(3) & 68.47\% & 78.49\% & 82.22\% \\
 840 & Output~(4) & 91.53\% & 92.37\% & 93.4\% \\
 841 & Config. & 36.10\% & 51.35\% & 56.85\% \\
 842 & Strategy~(5) & 1.91\% & 3.38\% & 2.43\% \\
 843 \hline
 844 \multirow{6}{*}{\rotatebox{90}{Non-chaotic}}&Output~(1) & 97.64\% & 98.10\% & 98.20\% \\
 845 & Output~(2) & 95.15\% & 95.39\% & 95.46\% \\
 846 & Output~(3) & 100\% & 100\% & 100\% \\
 847 & Output~(4) & 97.47\% & 97.90\% & 97.99\% \\
 848 & Config. & 90.52\% & 91.59\% & 91.73\% \\
 849 & Strategy~(5) & 3.41\% & 3.40\% & 3.47\% \\
 850 \hline
 851 \hline
 852 \multicolumn{2}{|c||}{Hidden neurons} & \multicolumn{3}{c|}{25 neurons} \\ %& \multicolumn{3}{|c|}{40 neurons} \\
 853 \cline{3-5}
 854 \multicolumn{2}{|c||}{Epochs} & 125 & 250 & 500 \\ %& 125 & 250 & 500 \\
 855 \hline
 856 \multirow{6}{*}{\rotatebox{90}{Chaotic}}&Output~(1) & 91.65\% & 92.69\% & 93.93\% \\ %& 91.94\% & 92.89\% & 94.00\% \\
 857 & Output~(2) & 72.06\% & 88.46\% & 90.5\% \\ %& 74.97\% & 89.83\% & 91.14\% \\
 858 & Output~(3) & 79.19\% & 89.83\% & 91.59\% \\ %& 76.69\% & 89.58\% & 91.84\% \\
 859 & Output~(4) & 91.61\% & 92.34\% & 93.47\% \\% & 82.77\% & 92.93\% & 93.48\% \\
 860 & Config. & 48.82\% & 67.80\% & 70.97\% \\%& 49.46\% & 68.94\% & 71.11\% \\
 861 & Strategy~(5) & 2.62\% & 3.43\% & 3.78\% \\% & 3.10\% & 3.10\% & 3.03\% \\
 862 \hline
 863 \multirow{6}{*}{\rotatebox{90}{Non-chaotic}}&Output~(1) & 97.87\% & 97.99\% & 98.03\% \\ %& 98.16\% \\
 864 & Output~(2) & 95.46\% & 95.84\% & 96.75\% \\ % & 97.4\% \\
 865 & Output~(3) & 100\% & 100\% & 100\% \\%& 100\% \\
 866 & Output~(4) & 97.77\% & 97.82\% & 98.06\% \\%& 98.31\% \\
 867 & Config. & 91.36\% & 91.99\% & 93.03\% \\%& 93.98\% \\
 868 & Strategy~(5) & 3.37\% & 3.44\% & 3.29\% \\%& 3.23\% \\
 869 \hline
 870 \end{tabular}
 871 }
 872 \end{table}
 873
 874 Table~\ref{tab1}  presents the  rates  obtained for  the first  coding
 875 scheme.   For  the chaotic  data,  it can  be  seen  that as  expected
 876 configuration  prediction becomes  better  when the  number of  hidden
 877 neurons and maximum  epochs increases: an improvement by  a factor two
 878 is observed (from 36.10\% for 10~neurons and 125~epochs to 70.97\% for
 879 25~neurons  and  500~epochs). We  also  notice  that  the learning  of
 880 outputs~(2)   and~(3)  is   more  difficult.    Conversely,   for  the
 881 non-chaotic  case the  simplest training  setup is  enough  to predict
 882 configurations.   For  all  network  topologies and  all  outputs  the
 883 obtained  results  for the  non-chaotic  case  outperform the  chaotic
 884 ones. Finally,  the rates for  the strategies show that  the different
 885 networks are unable to learn them.
 886
 887 For the second coding  scheme, Table~\ref{tab2} shows that any network
 888 learns about  five times more non-chaotic  configurations than chaotic
 889 ones. As in the previous scheme, the strategies cannot be predicted.
 890
 891 Let us now compare the  two coding schemes. Firstly, the second scheme
 892 disturbs  the   learning  process.   In   fact  in  this   scheme  the
 893 configuration is always expressed as  a natural number, whereas in the
 894 first one  the number  of inputs follows  the increase of  the boolean
 895 vectors coding configurations. In this latter case, the coding gives a
 896 finer information on configuration evolution.
 897
 898 \begin{table}[b]
 899 \caption{Prediction success rates for configurations expressed with Gray code}
 900 \label{tab2}
 901 \centering
 902 \begin{tabular}{|c|c||c|c|c|}
 903 \hline
 904 \multicolumn{5}{|c|}{Networks topology: 3~inputs, 2~outputs and one hidden layer} \\
 905 \hline
 906 \hline
 907 & Hidden neurons & \multicolumn{3}{c|}{10 neurons} \\
 908 \cline{2-5}
 909 & Epochs & 125 & 250 & 500 \\ %& 1000
 910 \hline
 911 \multirow{2}{*}{Chaotic}& Config.~(1) & 13.29\% & 13.55\% & 13.08\% \\ %& 12.5\%
 912 & Strategy~(2) & 0.50\% & 0.52\% & 1.32\% \\ %& 1.42\%
 913 \hline
 914 \multirow{2}{*}{Non-Chaotic}&Config.~(1) & 77.12\% & 74.00\% & 72.60\% \\ %& 75.81\%
 915 & Strategy~(2) & 0.42\% & 0.80\% & 1.16\% \\ %& 1.42\%
 916 \hline
 917 \hline
 918 & Hidden neurons & \multicolumn{3}{c|}{25 neurons} \\
 919 \cline{2-5}
 920 & Epochs & 125 & 250 & 500 \\ %& 1000
 921 \hline
 922 \multirow{2}{*}{Chaotic}& Config.~(1) & 12.27\% & 13.15\% & 13.05\% \\ %& 15.44\%
 923 & Strategy~(2) & 0.71\% & 0.66\% & 0.88\% \\ %& 1.73\%
 924 \hline
 925 \multirow{2}{*}{Non-Chaotic}&Config.~(1) & 73.60\% & 74.70\% & 75.89\% \\ %& 68.32\%
 926 & Strategy~(2) & 0.64\% & 0.97\% & 1.23\% \\ %& 1.80\%
 927 \hline
 928 \end{tabular}
 929 \end{table}
 930
 931 Unfortunately, in  practical applications the number  of components is
 932 usually  unknown.   Hence, the  first  coding  scheme  cannot be  used
 933 systematically.   Therefore, we  provide  a refinement  of the  second
 934 scheme: each  output is learned  by a different  ANN. Table~\ref{tab3}
 935 presents the  results for  this approach.  In  any case,  whatever the
 936 network  topologies,  the  maximum   epoch  number  and  the  kind  of
 937 iterations,  the  configuration  success  rate is  slightly  improved.
 938 Moreover, the strategies predictions  rates reach almost 12\%, whereas
 939 in  Table~\ref{tab2}  they  never   exceed  1.5\%.   Despite  of  this
 940 improvement, a long term prediction of chaotic iterations appear to be
 941 an open issue.
 942
 943 \begin{table}
 944 \caption{Prediction success rates for split outputs.}
 945 \label{tab3}
 946 \centering
 947 \begin{tabular}{|c||c|c|c|}
 948 \hline
 949 \multicolumn{4}{|c|}{Networks topology: 3~inputs, 1~output and one hidden layer} \\
 950 \hline
 951 \hline
 952 Epochs & 125 & 250 & 500 \\
 953 \hline
 954 \hline
 955 Chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
 956 \hline
 957 10~neurons & 12.39\% & 14.06\% & 14.32\% \\
 958 25~neurons & 13.00\% & 14.28\% & 14.58\% \\
 959 40~neurons & 11.58\% & 13.47\% & 14.23\% \\
 960 \hline
 961 \hline
 962 Non chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
 963 \cline{2-4}
 964 %Epochs & 125 & 250 & 500 \\
 965 \hline
 966 10~neurons & 76.01\% & 74.04\% & 78.16\% \\
 967 25~neurons & 76.60\% & 72.13\% & 75.96\% \\
 968 40~neurons & 76.34\% & 75.63\% & 77.50\% \\
 969 \hline
 970 \hline
 971 Chaotic/non chaotic & \multicolumn{3}{c|}{Output = Strategy} \\
 972 \cline{2-4}
 973 %Epochs & 125 & 250 & 500 \\
 974 \hline
 975 10~neurons & 0.76\% & 0.97\% & 1.21\% \\
 976 25~neurons & 1.09\% & 0.73\% & 1.79\% \\
 977 40~neurons & 0.90\% & 1.02\% & 2.15\% \\
 978 \hline
 979 \multicolumn{4}{c}{} \\
 980 \hline
 981 Epochs & 1000 & 2500 & 5000 \\
 982 \hline
 983 \hline
 984 Chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
 985 \hline
 986 10~neurons & 14.51\% & 15.22\% & 15.22\% \\
 987 25~neurons & 16.95\% & 17.57\% & 18.46\% \\
 988 40~neurons & 17.73\% & 20.75\% & 22.62\% \\
 989 \hline
 990 \hline
 991 Non chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
 992 \cline{2-4}
 993 %Epochs & 1000 & 2500 & 5000 \\
 994 \hline
 995 10~neurons & 78.98\% & 80.02\% & 79.97\% \\
 996 25~neurons & 79.19\% & 81.59\% & 81.53\% \\
 997 40~neurons & 79.64\% & 81.37\% & 81.37\% \\
 998 \hline
 999 \hline
1000 Chaotic/non chaotic & \multicolumn{3}{c|}{Output = Strategy} \\
1001 \cline{2-4}
1002 %Epochs & 1000 & 2500 & 5000 \\
1003 \hline
1004 10~neurons & 3.47\% & 9.98\% & 11.66\% \\
1005 25~neurons & 3.92\% & 8.63\% & 10.09\% \\
1006 40~neurons & 3.29\% & 7.19\% & 7.18\% \\
1007 \hline
1008 \end{tabular}
1009 \end{table}
1010
1011 \section{Conclusion}
1012
1013 In  this paper,  we have  established an  equivalence  between chaotic
1014 iterations,  according to  the Devaney's  definition of  chaos,  and a
1015 class  of  multilayer perceptron  neural  networks.  Firstly, we  have
1016 described how to build a neural network that can be trained to learn a
1017 given chaotic map function.  Then,  we found a condition that allow to
1018 check whether the iterations induced by a function are chaotic or not,
1019 and thus  if a chaotic map  is obtained. Thanks to  this condition our
1020 approach is not limited to a particular function. In the dual case, we
1021 show  that  checking  if  a  neural network  is  chaotic  consists  in
1022 verifying  a property  on an  associated  graph, called  the graph  of
1023 iterations.   These results  are valid  for recurrent  neural networks
1024 with a  particular architecture.  However,  we believe that  a similar
1025 work can be done for  other neural network architectures.  Finally, we
1026 have  discovered at  least one  family of  problems with  a reasonable
1027 size, such  that artificial neural  networks should not be  applied in
1028 the  presence  of chaos,  due  to  their  inability to  learn  chaotic
1029 behaviors in this  context.  Such a consideration is  not reduced to a
1030 theoretical detail:  this family of discrete  iterations is concretely
1031 implemented  in a  new steganographic  method  \cite{guyeux10ter}.  As
1032 steganographic   detectors  embed  tools   like  neural   networks  to
1033 distinguish between  original and stego contents, our  studies tend to
1034 prove that such  detectors might be unable to  tackle with chaos-based
1035 information hiding schemes.  Furthermore, iterations such that not all
1036 of  the  components  are updated  at  each  step  are very  common  in
1037 biological  and  physics mechanisms.   Therefore,  one can  reasonably
1038 wonder whether neural networks should be applied in these contexts.
1039
1040 In  future  work we  intend  to  enlarge  the comparison  between  the
1041 learning   of  truly   chaotic  and   non-chaotic   behaviors.   Other
1042 computational intelligence tools such  as support vector machines will
1043 be investigated  too, to  discover which tools  are the  most relevant
1044 when facing a truly chaotic phenomenon.  A comparison between learning
1045 rate  success  and  prediction  quality will  be  realized.   Concrete
1046 consequences in biology, physics, and computer science security fields
1047 will be  stated.  Lastly,  thresholds separating systems  depending on
1048 the ability to learn their dynamics will be established.
1049
1050 \bibliography{chaos-paper}% Produces the bibliography via BibTeX.
1051
1052 \end{document}
1053 %
1054 % ****** End of file chaos-paper.tex ******