main.tex

   1 % ****** Start of file chaos-paper.tex ******
   2 %
   3 \documentclass[%
   4 aip,
   5 cha,% long,
   6 numerical bibliography, %(default)
   7 %jcp,% short, numerical bibliography,
   8 %jmp,
   9 amsmath,amssymb,
  10 preprint,%
  11 %reprint,%
  12 %author-year,%
  13 %author-numerical,%
  14 ]{revtex4-1}
  15
  16 \usepackage{graphicx}% Include figure files
  17 \usepackage{dcolumn}% Align table columns on decimal point
  18 \usepackage{bm}% bold math
  19 \usepackage{color}
  20 \usepackage{ulem}
  21 \usepackage[mathlines]{lineno}% Enable numbering of text and display math
  22 \linenumbers\relax % Commence numbering lines
  23 % The amssymb package provides various useful mathematical symbols
  24 \usepackage{amssymb}
  25 % The amsthm package provides various useful mathematical fonts
  26 \usepackage{amsfonts}
  27 % Various mathematical packages
  28 \PassOptionsToPackage{cmex10draft}{graphics}
  29 \usepackage[standard]{ntheorem}
  30 % Various packages
  31 \usepackage{stmaryrd} % llbracket
  32 \usepackage{dsfont}   % mathds
  33 \usepackage{multirow} % tables
  34
  35 \newcommand{\ie}{\textit{i.e.}}
  36 \newcommand{\Nats}[0]{\ensuremath{\mathds{N}}}
  37 \newcommand{\R}[0]{\ensuremath{\mathds{R}}}
  38 \newcommand{\Bool}[0]{\ensuremath{\mathds{B}}}
  39
  40 \begin{document}
  41
  42 \title[Neural Networks and Chaos]{Neural Networks and Chaos:
  43 Construction, Evaluation of Chaotic Networks, \\
  44 and Prediction of Chaos with Multilayer Feedforward Networks
  45 }
  46
  47 \author{Jacques M. Bahi}
  48 \author{Jean-Fran\c{c}ois Couchot}
  49 \author{Christophe Guyeux}
  50 \email{christophe.guyeux@univ-fcomte.fr.}
  51 \author{Michel Salomon}
  52 \altaffiliation[Authors in ]{alphabetic order}
  53 \affiliation{
  54 Computer Science Laboratory (LIFC), University of Franche-Comt\'e, \\
  55 IUT de Belfort-Montb\'eliard, BP 527, \\
  56 90016 Belfort Cedex, France
  57 }
  58
  59 \date{\today}
  60
  61 \newcommand{\CG}[1]{\begin{color}{red}\textit{#1}\end{color}}
  62 \newcommand{\JFC}[1]{\begin{color}{blue}\textit{#1}\end{color}}
  63 \newcommand{\MS}[1]{\begin{color}{green}\textit{#1}\end{color}}
  64
  65 \begin{abstract}
  66 %% Text of abstract
  67 Many  research works  deal with  chaotic neural  networks  for various
  68 fields  of application. Unfortunately,  up to  now these  networks are
  69 usually  claimed to be  chaotic without  any mathematical  proof.  The
  70 purpose of this paper is to establish, based on a rigorous theoretical
  71 framework,  an  equivalence between  chaotic  iterations according  to
  72 Devaney and a particular class of  neural networks. On the one hand we
  73 show  how to build  such a  network, on  the other  hand we  provide a
  74 method to  check if a neural  network is a chaotic  one.  Finally, the
  75 ability of classical feedforward  multilayer perceptrons to learn sets
  76 of data obtained from a dynamical system is regarded.  Various Boolean
  77 functions are  iterated on finite  states. Iterations of some  of them
  78 are  proven to  be  chaotic as  it  is defined  by  Devaney.  In  that
  79 context,  important   differences  occur  in   the  training  process,
  80 establishing with  various neural networks that  chaotic behaviors are
  81 far more difficult to learn.
  82 \end{abstract}
  83
  84 %%\pacs{Valid PACS appear here}% PACS, the Physics and Astronomy
  85 % Classification Scheme.
  86 %%\keywords{Suggested keywords}%Use showkeys class option if keyword
  87 %display desired
  88 \maketitle
  89
  90 \begin{quotation}
  91 Chaotic neural  networks have received a  lot of attention  due to the
  92 appealing   properties  of   deterministic   chaos  (unpredictability,
  93 sensitivity,  and so  on). However,  such networks  are  often claimed
  94 chaotic without  any rigorous mathematical proof.   Therefore, in this
  95 work  a theoretical  framework based  on the  Devaney's  definition of
  96 chaos is  introduced.  Starting  with a relationship  between discrete
  97 iterations  and  Devaney's chaos,  we  firstly  show  how to  build  a
  98 recurrent  neural network  that is  equivalent  to a  chaotic map  and
  99 secondly  a way  to  check  whether an  already  available network  is
 100 chaotic  or not.  We  also study  different topological  properties of
 101 these  truly  chaotic neural  networks.   Finally,  we  show that  the
 102 learning,  with   neural  networks  having   a  classical  feedforward
 103 structure, of chaotic behaviors represented by data sets obtained from
 104 chaotic maps, is far more difficult than non chaotic behaviors.
 105 \end{quotation}
 106
 107 %% Main text
 108 \section{Introduction}
 109 \label{S1}
 110
 111 Several research  works have proposed or used  chaotic neural networks
 112 these  last years.   The complex  dynamics  of such  networks lead  to
 113 various       potential      application       areas:      associative
 114 memories~\cite{Crook2007267}  and  digital  security tools  like  hash
 115 functions~\cite{Xiao10},                                       digital
 116 watermarking~\cite{1309431,Zhang2005759},           or          cipher
 117 schemes~\cite{Lian20091296}.  In the  former case, the background idea
 118 is to  control chaotic dynamics in  order to store  patterns, with the
 119 key advantage  of offering a  large storage capacity.  For  the latter
 120 case,   the  use   of   chaotic  dynamics   is   motivated  by   their
 121 unpredictability and random-like behaviors.  Indeed, investigating new
 122 concepts  is crucial  for  the computer  security  field, because  new
 123 threats  are constantly  emerging.   As an  illustrative example,  the
 124 former  standard in hash  functions, namely  the SHA-1  algorithm, has
 125 been recently weakened after flaws were discovered.
 126
 127 Chaotic neural networks have  been built with different approaches. In
 128 the context of associative  memory, chaotic neurons like the nonlinear
 129 dynamic  state neuron  \cite{Crook2007267}  frequently constitute  the
 130 nodes of the network. These neurons have an inherent chaotic behavior,
 131 which  is usually  assessed through  the computation  of  the Lyapunov
 132 exponent.  An alternative approach  is to consider a well-known neural
 133 network architecture: the  MultiLayer Perceptron (MLP). These networks
 134 are  suitable to model  nonlinear relationships  between data,  due to
 135 their            universal            approximator            capacity
 136 \cite{Cybenko89,DBLP:journals/nn/HornikSW89}.   Thus,   this  kind  of
 137 networks can  be trained  to model a  physical phenomenon known  to be
 138 chaotic such  as Chua's circuit \cite{dalkiran10}.   Sometime a neural
 139 network, which  is build by  combining transfer functions  and initial
 140 conditions  that are  both chaotic,  is itself  claimed to  be chaotic
 141 \cite{springerlink:10.1007/s00521-010-0432-2}.
 142
 143 What all of these chaotic neural  networks have in common is that they
 144 are claimed to be chaotic  despite a lack of any rigorous mathematical
 145 proof.   The first contribution  of this  paper is  to fill  this gap,
 146 using  a theoretical framework  based on  the Devaney's  definition of
 147 chaos \cite{Devaney}.  This mathematical theory of chaos provides both
 148 qualitative and quantitative tools to evaluate the complex behavior of
 149 a  dynamical  system:  ergodicity,   expansivity,  and  so  on.   More
 150 precisely, in  this paper,  which is an  extension of a  previous work
 151 \cite{bgs11:ip},   we  establish   the  equivalence   between  chaotic
 152 iterations  and  a  class  of  globally  recurrent  MLP.   The  second
 153 contribution is a study of the converse problem, indeed we investigate
 154 the ability of classical  multilayer perceptrons to learn a particular
 155 family of discrete chaotic  dynamical systems.  This family is defined
 156 by a Boolean vector, an update function, and a sequence defining the
 157 component  to  update  at  each  iteration.  It  has  been  previously
 158 established that  such dynamical systems are  chaotically iterated (as
 159 it  is defined by  Devaney) when  the chosen  function has  a strongly
 160 connected iterations  graph.  In this document,  we experiment several
 161 MLPs and  try to  learn some  iterations of this  kind.  We  show that
 162 non-chaotic  iterations  can  be  learned,  whereas  it  is  far  more
 163 difficult for  chaotic ones.   That is to  say, we have  discovered at
 164 least  one  family of  problems  with  a  reasonable size,  such  that
 165 artificial  neural  networks  should  not  be  applied  due  to  their
 166 inability to learn chaotic behaviors in this context.
 167
 168 The remainder of this research  work is organized as follows. The next
 169 section is devoted to the basics of Devaney's chaos.  Section~\ref{S2}
 170 formally  describes  how  to  build  a neural  network  that  operates
 171 chaotically.  Section~\ref{S3} is devoted to the dual case of checking
 172 whether  an existing neural  network is  chaotic or  not.  Topological
 173 properties of chaotic neural networks are discussed in Sect.~\ref{S4}.
 174 The  Section~\ref{section:translation}  shows  how to  translate  such
 175 iterations  into  an Artificial  Neural  Network  (ANN),  in order  to
 176 evaluate the  capability for this  latter to learn  chaotic behaviors.
 177 This  ability  is  studied in  Sect.~\ref{section:experiments},  where
 178 various ANNs try to learn two  sets of data: the first one is obtained
 179 by chaotic iterations while the  second one results from a non-chaotic
 180 system.  Prediction success rates are  given and discussed for the two
 181 sets.  The paper ends with a conclusion section where our contribution
 182 is summed up and intended future work is exposed.
 183
 184 \section{Chaotic Iterations according to  Devaney}
 185
 186 In this section, the well-established notion of Devaney's mathematical
 187 chaos is  firstly recalled.   Preservation of the  unpredictability of
 188 such dynamical  system when implemented  on a computer is  obtained by
 189 using  some discrete  iterations  called ``asynchronous  iterations'',
 190 which are  thus introduced.  The result establishing  the link between
 191 such iterations and Devaney's chaos is finally presented at the end of
 192 this section.
 193
 194 In what follows and for  any function $f$, $f^n$ means the composition
 195 $f \circ f \circ \hdots \circ f$ ($n$ times) and an {\bf iteration} of
 196 a {\bf  dynamical system}  is the step  that consists in  updating the
 197 global  state  $x^t$  at time  $t$  with  respect  to a  function  $f$
 198 s.t. $x^{t+1} = f(x^t)$.
 199
 200 \subsection{Devaney's chaotic dynamical systems}
 201
 202 Various domains such as  physics, biology, or economy, contain systems
 203 that exhibit a chaotic behavior,  a well-known example is the weather.
 204 These  systems   are  in   particular  highly  sensitive   to  initial
 205 conditions, a concept usually presented as the butterfly effect: small
 206 variations in the initial conditions possibly lead to widely different
 207 behaviors.  Theoretically speaking, a  system is sensitive if for each
 208 point  $x$ in  the  iteration space,  one  can find  a  point in  each
 209 neighborhood of $x$ having a significantly different future evolution.
 210 Conversely, a  system seeded with  the same initial  conditions always
 211 has  the same  evolution.   In  other words,  chaotic  systems have  a
 212 deterministic  behavior  defined through  a  physical or  mathematical
 213 model  and a  high  sensitivity to  the  initial conditions.   Besides
 214 mathematically this  kind of unpredictability  is also referred  to as
 215 deterministic chaos.  For example, many weather forecast models exist,
 216 but they give only suitable predictions for about a week, because they
 217 are initialized with conditions  that reflect only a partial knowledge
 218 of the current weather.  Even  if the differences are initially small,
 219 they are  amplified in the course  of time, and thus  make difficult a
 220 long-term prediction.  In fact,  in a chaotic system, an approximation
 221 of  the current  state is  a  quite useless  indicator for  predicting
 222 future states.
 223
 224 From  mathematical  point  of   view,  deterministic  chaos  has  been
 225 thoroughly studied  these last decades, with  different research works
 226 that  have   provide  various  definitions  of   chaos.   Among  these
 227 definitions,    the   one    given   by    Devaney~\cite{Devaney}   is
 228 well-established.   This  definition  consists  of  three  conditions:
 229 topological  transitivity, density of  periodic points,  and sensitive
 230 point dependence on initial conditions.
 231
 232 {\bf  Topological transitivity} is  checked when,  for any  point, any
 233 neighborhood of its future evolution eventually overlap with any other
 234 given region.  This property implies that a dynamical system cannot be
 235 broken into simpler subsystems.   Intuitively, its complexity does not
 236 allow any  simplification.
 237
 238 However, chaos needs some regularity to ``counteracts'' the effects of
 239 transitivity. % Thus, two points close to each other can behave in a completely different manner, the first one visiting the whole space whereas the second one has a regular orbit.
 240 In the Devaney's formulation, a dense set  of periodic
 241 points is the element of regularity that a chaotic dynamical system has
 242 to exhibit.
 243 %\begin{definition} \label{def3}
 244 We recall that a  point $x$ is a {\bf periodic point} for $f$ of
 245 period~$n \in \mathds{N}^{\ast}$ if $f^{n}(x)=x$.
 246 %\end{definition}
 247 Then, the map
 248 %\begin{definition} \label{def4}
 249 $f$ is {\bf regular}  on the topological space $(\mathcal{X},\tau)$ if
 250 the set of periodic points for  $f$ is dense in $\mathcal{X}$ (for any
 251 $x \in \mathcal{X}$, we can find at least one periodic point in any of
 252 its neighborhood).
 253 %\end{definition}
 254 Thus, due to these two properties,  two points close to each other can
 255 behave in  a completely different manner,  leading to unpredictability
 256 for the whole system.
 257
 258 Let  us recall  that  $f$  has {\bf  sensitive  dependence on  initial
 259   conditions} if  there exists  $\delta >0$ such  that, for  any $x\in
 260 \mathcal{X}$ and any neighborhood $V$ of $x$, there exist $y\in V$ and
 261 $n  > 0$ such  that $d\left(f^{n}(x),  f^{n}(y)\right) >\delta  $. The
 262 value $\delta$ is called the {\bf constant of sensitivity} of $f$.
 263
 264 Finally,  the  dynamical system  that  iterates  $f$  is {\bf  chaotic
 265   according  to Devaney}  on $(\mathcal{X},\tau)$  if $f$  is regular,
 266 topologically transitive, and has  sensitive dependence to its initial
 267 conditions.   In  what follows,  iterations  are  said  to be  chaotic
 268 (according  to Devaney)  when  the corresponding  dynamical system  is
 269 chaotic, as it is defined in the Devaney's formulation.
 270
 271 %Let us notice that for a  metric space the last condition follows from
 272 %the two first ones~\cite{Banks92}.
 273
 274 \subsection{Asynchronous Iterations}
 275
 276 %This section  presents some basics on  topological chaotic iterations.
 277 Let us  firstly discuss about the  domain of iteration.  As  far as we
 278 know, no result rules that  the chaotic behavior of a dynamical system
 279 that  has been  theoretically  proven  on $\R$  remains  valid on  the
 280 floating-point numbers, which is  the implementation domain.  Thus, to
 281 avoid  loss of chaos  this work  presents an  alternative, that  is to
 282 iterate Boolean maps: results  that are theoretically obtained in that
 283 domain are preserved in implementations.
 284
 285 Let us denote by $\llbracket  a ; b \rrbracket$ the following interval
 286 of integers: $\{a, a+1, \hdots, b\}$, where $a~<~b$.  In this section,
 287 a {\it  system} under consideration iteratively  modifies a collection
 288 of $n$~components.  Each component  $i \in \llbracket 1; n \rrbracket$
 289 takes  its  value $x_i$  among  the  domain  $\Bool=\{0,1\}$.  A  {\it
 290   configuration} of the system at discrete time $t$ is the vector
 291 %\begin{equation*}
 292 $x^{t}=(x_1^{t},\ldots,x_{n}^{t}) \in \Bool^n$.
 293 %\end{equation*}
 294 The dynamics of the system is  described according to a  function $f :
 295 \Bool^n \rightarrow \Bool^n$ such that
 296 %\begin{equation*}
 297 $f(x)=(f_1(x),\ldots,f_n(x))$.
 298 %\end{equation*}
 299 % Notice that $f^k$ denotes the
 300 % $k-$th composition $f\circ \ldots \circ f$ of the function $f$.
 301
 302 Let    be   given    a   configuration    $x$.    In    what   follows
 303 $N(i,x)=(x_1,\ldots,\overline{x_i},\ldots,x_n)$  is  the configuration
 304 obtained by switching the $i-$th component of $x$ ($\overline{x_i}$ is
 305 indeed  the negation  of $x_i$).   Intuitively, $x$  and  $N(i,x)$ are
 306 neighbors.   The discrete  iterations of  $f$ are  represented  by the
 307 oriented  {\it graph  of iterations}  $\Gamma(f)$.  In  such  a graph,
 308 vertices are configurations  of $\Bool^n$ and there is  an arc labeled
 309 $i$ from $x$ to $N(i,x)$ if and only if $f_i(x)$ is $N(i,x)$.
 310
 311 In the  sequel, the  {\it strategy} $S=(S^{t})^{t  \in \Nats}$  is the
 312 sequence defining  which component to  update at time $t$  and $S^{t}$
 313 denotes its $t-$th term.  This iteration scheme that only modifies one
 314 element at each iteration is usually referred as {\it asynchronous
 315   iterations}.  More precisely, we have for any $i$, $1\le i \le n$,
 316 \begin{equation}
 317 \left\{ \begin{array}{l}
 318 x^{0}  \in \Bool^n \\
 319 x^{t+1}_i = \left\{
 320 \begin{array}{l}
 321   f_i(x^t) \textrm{ if $S^t = i$}\enspace , \\
 322   x_i^t \textrm{ otherwise}\enspace .
 323  \end{array}
 324 \right.
 325 \end{array} \right.
 326 \end{equation}
 327
 328 Next  section  shows  the  link between  asynchronous  iterations  and
 329 Devaney's chaos.
 330
 331 \subsection{On the link between asynchronous iterations and
 332   Devaney's Chaos}
 333
 334 In  this subsection  we recall  the link  we have  established between
 335 asynchronous   iterations  and   Devaney's  chaos.    The  theoretical
 336 framework is fully described in \cite{guyeux09}.
 337
 338 We  introduce the  function  $F_{f}$  that is  defined  for any  given
 339 application     $f:\Bool^{n}     \to     \Bool^{n}$     by     $F_{f}:
 340 \llbracket1;n\rrbracket\times        \mathds{B}^{n}        \rightarrow
 341 \mathds{B}^{n}$, s.t.
 342 \begin{equation}
 343 \label{eq:CIs}
 344   F_{f}(s,x)_j  =
 345     \left\{
 346     \begin{array}{l}
 347     f_j(x) \textrm{ if } j= s \enspace , \\
 348     x_{j} \textrm{ otherwise} \enspace .
 349     \end{array}
 350     \right.
 351 \end{equation}
 352
 353 \noindent With such a notation, asynchronously obtained configurations
 354 are defined for times \linebreak $t=0,1,2,\ldots$ by:
 355 \begin{equation}\label{eq:sync}
 356 \left\{\begin{array}{l}
 357   x^{0}\in \mathds{B}^{n} \textrm{ and}\\
 358   x^{t+1}=F_{f}(S^t,x^{t}) \enspace .
 359 \end{array}\right.
 360 \end{equation}
 361
 362 \noindent  Finally, iterations defined  in Eq.~(\ref{eq:sync})  can be
 363 described by the following system:
 364 \begin{equation}
 365 \left\{
 366 \begin{array}{lll}
 367 X^{0} & = &  ((S^t)^{t \in \Nats},x^0) \in
 368 \llbracket1;n\rrbracket^\Nats \times \Bool^{n}\\
 369 X^{k+1}& = & G_{f}(X^{k})\\
 370 \multicolumn{3}{c}{\textrm{where } G_{f}\left(((S^t)^{t \in \Nats},x)\right)
 371 = \left(\sigma((S^t)^{t \in \Nats}),F_{f}(S^0,x)\right) \enspace ,}
 372 \end{array}
 373 \right.
 374 \label{eq:Gf}
 375 \end{equation}
 376 where $\sigma$ is the so-called  shift function that removes the first
 377 term of  the strategy ({\it i.e.},~$S^0$).  This  definition allows to
 378 link asynchronous iterations with  classical iterations of a dynamical
 379 system. Note that it can be extended by considering subsets for $S^t$.
 380
 381 To study topological properties of  these iterations, we are then left
 382 to  introduce a  {\bf distance}  $d$  between two  points $(S,x)$  and
 383 $(\check{S},\check{x})$           in           $\mathcal{X}          =
 384 \llbracket1;n\rrbracket^\Nats \times \Bool^{n}$. Let $\Delta(x,y) = 0$
 385 if   $x=y$,  and   $\Delta(x,y)  =   1$   else,  be   a  distance   on
 386 $\mathds{B}$. The distance $d$ is defined by
 387 \begin{equation}
 388 d((S,x);(\check{S},\check{x}))=d_{e}(x,\check{x})+d_{s}(S,\check{S})
 389 \enspace ,
 390 \end{equation}
 391 where
 392 \begin{equation}
 393 d_{e}(x,\check{x})=\sum_{j=1}^{n}\Delta
 394 (x_{j},\check{x}_{j}) \in \llbracket  0 ; n \rrbracket
 395 \end{equation}
 396 \noindent and
 397 \begin{equation}
 398 d_{s}(S,\check{S})=\frac{9}{2n}\sum_{t=0}^{\infty
 399 }\frac{|S^{t}-\check{S}^{t}|}{10^{t+1}} \in [0 ; 1] \enspace .
 400 \end{equation}
 401
 402 This    distance    is    defined    to    reflect    the    following
 403 information. Firstly, the more  two systems have different components,
 404 the  larger the  distance between  them.  Secondly,  two  systems with
 405 similar components and strategies, which have the same starting terms,
 406 must  induce only a  small distance.   The proposed  distance fulfills
 407 these  requirements: on  the one  hand  its floor  value reflects  the
 408 difference between  the cells, on  the other hand its  fractional part
 409 measures the difference between the strategies.
 410
 411 The relation  between $\Gamma(f)$ and  $G_f$ is obvious: there  exists a
 412 path from  $x$ to $x'$  in $\Gamma(f)$ if  and only if there  exists a
 413 strategy  $s$ such  that iterations  of $G_f$  from the  initial point
 414 $(s,x)$   reach   the   configuration   $x'$.   Using   this   link,
 415 Guyeux~\cite{GuyeuxThese10} has proven that,
 416 \begin{theorem}%[Characterization of $\mathcal{C}$]
 417 \label{Th:Caracterisation   des   IC   chaotiques}
 418 Let $f:\Bool^n\to\Bool^n$. Iterations of $G_f$ are chaotic  according
 419 to  Devaney if and only if  $\Gamma(f)$ is strongly connected.
 420 \end{theorem}
 421
 422 Checking if  a graph  is strongly connected  is not difficult  (by the
 423 Tarjan's algorithm for  instance).  Let be given a  strategy $S$ and a
 424 function  $f$ such that  $\Gamma(f)$ is  strongly connected.   In that
 425 case, iterations of the function $G_f$ as defined in Eq.~(\ref{eq:Gf})
 426 are chaotic according to Devaney.
 427
 428
 429 Let   us  then   define  two   functions  $f_0$   and  $f_1$   both  in
 430 $\Bool^n\to\Bool^n$ that are used all along this paper.  The former is
 431 the   vectorial  negation,   \textit{i.e.},  $f_{0}(x_{1},\dots,x_{n})
 432 =(\overline{x_{1}},\dots,\overline{x_{n}})$.      The     latter    is
 433 $f_1\left(x_1,\dots,x_n\right)=\left(
 434 \overline{x_1},x_1,x_2,\dots,x_{n-1}\right)$.  It is not hard to check
 435 that $\Gamma(f_0)$ and $\Gamma(f_1)$ are both strongly connected, then
 436 iterations  of $G_{f_0}$  and of  $G_{f_1}$ are  chaotic  according to
 437 Devaney.
 438
 439 With this  material, we are now  able to build a  first chaotic neural
 440 network, as defined in the Devaney's formulation.
 441
 442 \section{A chaotic neural network in the sense of Devaney}
 443 \label{S2}
 444
 445 Let  us   build  a  multilayer  perceptron   neural  network  modeling
 446 $F_{f_0}:\llbracket   1;   n   \rrbracket  \times   \mathds{B}^n   \to
 447 \mathds{B}^n$ associated  to the vectorial  negation.  More precisely,
 448 for   all   inputs   $(s,x)   \in  \llbracket   1;n\rrbracket   \times
 449 \mathds{B}^n$, the  output layer  will produce $F_{f_0}(s,x)$.   It is
 450 then possible to link the output  layer and the input one, in order to
 451 model the  dependence between two successive iterations.   As a result
 452 we obtain  a global recurrent  neural network that behaves  as follows
 453 (see Fig.~\ref{Fig:perceptron}).
 454
 455 \begin{itemize}
 456 \item   The   network   is   initialized   with   the   input   vector
 457   $\left(S^0,x^0\right)    \in    \llbracket   1;n\rrbracket    \times
 458   \mathds{B}^n$      and      computes      the     output      vector
 459   $x^1=F_{f_0}\left(S^0,x^0\right)$. This last  vector is published as
 460   an output one of the chaotic  neural network and is sent back to the
 461   input layer through the feedback links.
 462 \item When  the network  is activated at  the $t^{th}$  iteration, the
 463   state of the system $x^t  \in \mathds{B}^n$ received from the output
 464   layer and  the initial  term of the  sequence $(S^t)^{t  \in \Nats}$
 465   (\textit{i.e.},  $S^0  \in \llbracket  1;n\rrbracket$)  are used  to
 466   compute the  new output vector.   This new vector,  which represents
 467   the new state of the dynamical system, satisfies:
 468   \begin{equation}
 469     x^{t+1}=F_{f_0}(S^0, x^t) \in \mathds{B}^n \enspace .
 470   \end{equation}
 471 \end{itemize}
 472
 473 \begin{figure}
 474   \centering
 475   \includegraphics[scale=0.625]{perceptron}
 476   \caption{A perceptron equivalent to chaotic iterations}
 477   \label{Fig:perceptron}
 478 \end{figure}
 479
 480 The behavior of the neural network is such that when the initial state
 481 is  $x^0~\in~\mathds{B}^n$ and  a  sequence $(S^t)^{t  \in \Nats}$  is
 482 given  as outside  input, then  the sequence  of  successive published
 483 output vectors $\left(x^t\right)^{t \in \mathds{N}^{\ast}}$ is exactly
 484 the  one produced  by  the chaotic  iterations  formally described  in
 485 Eq.~(\ref{eq:Gf}).   It means  that mathematically  if we  use similar
 486 input  vectors   they  both  generate  the   same  successive  outputs
 487 $\left(x^t\right)^{t \in \mathds{N}^{\ast}}$,  and therefore that they
 488 are  equivalent  reformulations  of  the iterations  of  $G_{f_0}$  in
 489 $\mathcal{X}$. Finally, since the  proposed neural network is built to
 490 model  the  behavior  of   $G_{f_0}$,  whose  iterations  are  chaotic
 491 according to the  Devaney's definition of chaos, we  can conclude that
 492 the network is also chaotic in this sense.
 493
 494 The previous construction scheme  is not restricted to function $f_0$.
 495 It can  be extended to any function  $f$ such that $G_f$  is a chaotic
 496 map by training the network to model $F_{f}:\llbracket 1; n \rrbracket
 497 \times      \mathds{B}^n      \to      \mathds{B}^n$.      Due      to
 498 Theorem~\ref{Th:Caracterisation  des  IC  chaotiques},  we  can  find
 499 alternative functions  $f$ for $f_0$  through a simple check  of their
 500 graph of  iterations $\Gamma(f)$.  For  example, we can  build another
 501 chaotic neural network by using $f_1$ instead of $f_0$.
 502
 503 \section{Checking whether a neural network is chaotic or not}
 504 \label{S3}
 505
 506 We focus now on the case  where a neural network is already available,
 507 and for  which we want  to know if  it is chaotic. Typically,  in many
 508 research  papers neural  network  are usually  claimed  to be  chaotic
 509 without any  convincing mathematical proof. We propose  an approach to
 510 overcome  this  drawback  for  a  particular  category  of  multilayer
 511 perceptrons defined below, and for the Devaney's formulation of chaos.
 512 In  spite of  this restriction,  we think  that this  approach  can be
 513 extended to a large variety of neural networks.
 514
 515 We consider a multilayer perceptron  of the following form: inputs are
 516 $n$ binary digits  and one integer value, while  outputs are $n$ bits.
 517 Moreover, each  binary output is connected with  a feedback connection
 518 to an input one.
 519
 520 \begin{itemize}
 521 \item  During  initialization, the  network  is  seeded with  $n$~bits
 522   denoted $\left(x^0_1,\dots,x^0_n\right)$ and  an integer value $S^0$
 523   that belongs to $\llbracket1;n\rrbracket$.
 524 \item     At     iteration~$t$,     the     last     output     vector
 525   $\left(x^t_1,\dots,x^t_n\right)$   defines  the  $n$~bits   used  to
 526   compute the new output one $\left(x^{t+1}_1,\dots,x^{t+1}_n\right)$.
 527   While  the remaining  input receives  a new  integer value  $S^t \in
 528   \llbracket1;n\rrbracket$, which is provided by the outside world.
 529 \end{itemize}
 530
 531 The topological  behavior of these  particular neural networks  can be
 532 proven to be chaotic through the following process. Firstly, we denote
 533 by  $F:  \llbracket  1;n  \rrbracket \times  \mathds{B}^n  \rightarrow
 534 \mathds{B}^n$     the     function     that     maps     the     value
 535 $\left(s,\left(x_1,\dots,x_n\right)\right)    \in    \llbracket    1;n
 536 \rrbracket      \times      \mathds{B}^n$      into     the      value
 537 $\left(y_1,\dots,y_n\right)       \in       \mathds{B}^n$,       where
 538 $\left(y_1,\dots,y_n\right)$  is the  response of  the  neural network
 539 after    the    initialization     of    its    input    layer    with
 540 $\left(s,\left(x_1,\dots, x_n\right)\right)$.  Secondly, we define $f:
 541 \mathds{B}^n       \rightarrow      \mathds{B}^n$       such      that
 542 $f\left(x_1,x_2,\dots,x_n\right)$ is equal to
 543 \begin{equation}
 544 \left(F\left(1,\left(x_1,x_2,\dots,x_n\right)\right),\dots,
 545   F\left(n,\left(x_1,x_2,\dots,x_n\right)\right)\right) \enspace .
 546 \end{equation}
 547 Thus, for any $j$, $1 \le j \le n$, we have
 548 $f_j\left(x_1,x_2,\dots,x_n\right) =
 549 F\left(j,\left(x_1,x_2,\dots,x_n\right)\right)$.
 550 If this recurrent  neural network is seeded with
 551 $\left(x_1^0,\dots,x_n^0\right)$    and   $S   \in    \llbracket   1;n
 552 \rrbracket^{\mathds{N}}$, it produces  exactly the
 553 same      output      vectors  than the
 554 chaotic iterations of $F_f$  with initial
 555 condition  $\left(S,(x_1^0,\dots,  x_n^0)\right)  \in  \llbracket  1;n
 556 \rrbracket^{\mathds{N}}  \times \mathds{B}^n$.
 557 Theoretically  speaking, such iterations  of $F_f$  are thus  a formal
 558 model of these kind of recurrent  neural networks. In the rest of this
 559 <<<<<<< HEAD
 560 paper, we will call such multilayer perceptrons ``CI-MLP($f$)'', which
 561 =======
 562 paper,  we will  call such  multilayer perceptrons  ``CI-MLP($f$)'', which
 563 >>>>>>> 3df586d673bc4f3b32fa0dd1cb46796256744772
 564 stands for ``Chaotic Iterations based MultiLayer Perceptron''.
 565
 566 Checking  if CI-MLP($f$)  behaves chaotically  according  to Devaney's
 567 definition  of  chaos  is  simple:  we  need just  to  verify  if  the
 568 associated graph  of iterations  $\Gamma(f)$ is strongly  connected or
 569 not. As  an incidental consequence,  we finally obtain  an equivalence
 570 between  chaotic  iterations   and  CI-MLP($f$).   Therefore,  we  can
 571 obviously  study such multilayer  perceptrons with  mathematical tools
 572 like  topology  to  establish,  for  example,  their  convergence  or,
 573 contrarily, their unpredictable behavior.   An example of such a study
 574 is given in the next section.
 575
 576 \section{Topological properties of chaotic neural networks}
 577 \label{S4}
 578
 579 Let us first recall  two fundamental definitions from the mathematical
 580 theory of chaos.
 581
 582 \begin{definition} \label{def8}
 583 A  function   $f$  is   said  to  be   {\bf  expansive}   if  $\exists
 584 \varepsilon>0$, $\forall  x \neq y$,  $\exists n \in  \mathds{N}$ such
 585 that $d\left(f^n(x),f^n(y)\right) \geq \varepsilon$.
 586 \end{definition}
 587
 588 \noindent In  other words, a small  error on any  initial condition is
 589 always amplified  until $\varepsilon$,  which denotes the  constant of
 590 expansivity of $f$.
 591
 592 \begin{definition} \label{def9}
 593 A discrete dynamical  system is said to be  {\bf topologically mixing}
 594 if  and only  if,  for any  pair  of disjoint  open  sets $U$,$V  \neq
 595 \emptyset$, we  can find some $n_0  \in \mathds{N}$ such  that for any
 596 $n$, $n\geq n_0$, we have $f^n(U) \cap V \neq \emptyset$.
 597 \end{definition}
 598
 599 \noindent Topologically mixing means that the dynamical system evolves
 600 in  time such that  any given  region of  its topological  space might
 601 overlap with any other region.
 602
 603 It has been proven in Ref.~\cite{gfb10:ip} that chaotic iterations are
 604 expansive and topologically mixing  when $f$ is the vectorial negation
 605 $f_0$.    Consequently,  these   properties  are   inherited   by  the
 606 CI-MLP($f_0$)  recurrent neural  network  previously presented,  which
 607 induce  a greater  unpredictability.   Any difference  on the  initial
 608 value of the input layer is  in particular magnified up to be equal to
 609 the expansivity constant.
 610
 611 Let  us then  focus on  the consequences  for a  neural network  to be
 612 chaotic   according  to   Devaney's   definition.   Intuitively,   the
 613 topological transitivity property  implies indecomposability, which is
 614 formally defined as follows:
 615
 616 \begin{definition} \label{def10}
 617 A  dynamical  system  $\left(   \mathcal{X},  f\right)$  is  {\bf  not
 618   decomposable}  if it  is not  the  union of  two closed  sets $A,  B
 619 \subset \mathcal{X}$ such that $f(A) \subset A, f(B) \subset B$.
 620 \end{definition}
 621
 622 \noindent Hence, reducing the set of outputs generated by CI-MLP($f$),
 623 in order to  simplify its complexity, is impossible  if $\Gamma(f)$ is
 624 strongly connected.  Moreover, under this  hypothesis CI-MLPs($f$) are
 625 strongly transitive:
 626
 627 \begin{definition} \label{def11}
 628 A  dynamical system  $\left( \mathcal{X},  f\right)$ is  {\bf strongly
 629   transitive}  if  $\forall   x,y  \in  \mathcal{X}$,  $\forall  r>0$,
 630 $\exists z \in \mathcal{X}$,  $d(z,x)~\leq~r \Rightarrow \exists n \in
 631 \mathds{N}^{\ast}$, $f^n(z)=y$.
 632 \end{definition}
 633
 634 \noindent According to  this definition, for all pairs  of points $(x,
 635 y)$ in the  phase space, a point $z$ can be  found in the neighborhood
 636 of $x$  such that one  of its iterates  $f^n(z)$ is $y$.  Indeed, this
 637 result  has been  established  during the  proof  of the  transitivity
 638 presented  in Ref.~\cite{guyeux09}.   Among other  things,  the strong
 639 transitivity  leads to  the fact  that  without the  knowledge of  the
 640 initial input layer, all outputs are possible.  Additionally, no point
 641 of the output space can be discarded when studying CI-MLPs: this space
 642 is  intrinsically   complicated  and   it  cannot  be   decomposed  or
 643 simplified.
 644
 645 Furthermore, these  recurrent neural networks  exhibit the instability
 646 property:
 647 \begin{definition}
 648 A dynamical  system $\left( \mathcal{X}, f\right)$ is {\bf unstable}
 649 if for
 650 all  $x  \in  \mathcal{X}$,   the  orbit  $\gamma_x:n  \in  \mathds{N}
 651 \longmapsto f^n(x)$  is unstable,  that means: $\exists  \varepsilon >
 652 0$, $\forall  \delta>0$, $\exists y  \in \mathcal{X}$, $\exists  n \in
 653 \mathds{N}$,        such        that        $d(x,y)<\delta$        and
 654 $d\left(\gamma_x(n),\gamma_y(n)\right) \geq \varepsilon$.
 655 \end{definition}
 656
 657 \noindent  This property,  which  is implied  by  the sensitive  point
 658 dependence  on initial  conditions,  leads  to the  fact  that in  all
 659 neighborhoods of any point $x$, there  are points that can be apart by
 660 $\varepsilon$    in   the   future    through   iterations    of   the
 661 CI-MLP($f$). Thus,  we can  claim that the  behavior of these  MLPs is
 662 unstable when $\Gamma (f)$ is strongly connected.
 663
 664 Let  us  now consider  a  compact  metric space  $(M,  d)$  and $f:  M
 665 \rightarrow M$  a continuous map. For  each natural number  $n$, a new
 666 metric $d_n$ is defined on $M$ by
 667 \begin{equation}
 668 d_n(x,y)=\max\{d(f^i(x),f^i(y)): 0\leq i<n\} \enspace .
 669 \end{equation}
 670
 671 Given any $\varepsilon > 0$ and $n \geqslant 1$, two points of $M$ are
 672 $\varepsilon$-close  with respect to  this metric  if their  first $n$
 673 iterates are $\varepsilon$-close.
 674
 675 This metric  allows one to distinguish  in a neighborhood  of an orbit
 676 the points  that move away from  each other during  the iteration from
 677 the points  that travel together.  A subset  $E$ of $M$ is  said to be
 678 $(n, \varepsilon)$-separated if each pair of distinct points of $E$ is
 679 at  least $\varepsilon$  apart in  the metric  $d_n$. Denote  by $H(n,
 680 \varepsilon)$     the    maximum     cardinality     of    an     $(n,
 681 \varepsilon)$-separated set,
 682 \begin{definition}
 683 The {\bf topological entropy} of the  map $f$ is defined by (see e.g.,
 684 Ref.~\cite{Adler65} or Ref.~\cite{Bowen})
 685 $$h(f)=\lim_{\varepsilon\to      0}      \left(\limsup_{n\to      \infty}
 686 \frac{1}{n}\log H(n,\varepsilon)\right) \enspace .$$
 687 \end{definition}
 688
 689 Then we have the following result \cite{GuyeuxThese10},
 690 \begin{theorem}
 691 $\left( \mathcal{X},d\right)$  is compact and  the topological entropy
 692 of $(\mathcal{X},G_{f_0})$ is infinite.
 693 \end{theorem}
 694
 695 \begin{figure}
 696   \centering
 697   \includegraphics[scale=0.5]{scheme}
 698   \caption{Summary of addressed neural networks and chaos problems}
 699   \label{Fig:scheme}
 700 \end{figure}
 701
 702 Figure~\ref{Fig:scheme} is a summary  of addressed neural networks and
 703 chaos  problems.   In  Section~\ref{S2}   we  have  explained  how  to
 704 construct    a    truly    chaotic    neural   networks,    $A$    for
 705 instance. Section~\ref{S3} has shown how  to check whether a given MLP
 706 $A$ or $C$ is chaotic or not in the sense of Devaney, and how to study
 707 its  topological  behavior.   The  last  thing  to  investigate,  when
 708 comparing neural networks and Devaney's chaos, is to determine whether
 709 an  artificial neural network  $C$ is  able to  learn or  predict some
 710 chaotic  behaviors  of  $B$,  as   it  is  defined  in  the  Devaney's
 711 formulation  (when  they are  not  specifically  constructed for  this
 712 purpose).  This statement is studied in the next section.
 713
 714 \section{Suitability of Feedforward Neural Networks
 715 for Predicting Chaotic and Non-chaotic Behaviors}
 716
 717 In  the context  of computer  science  different topic  areas have  an
 718 interest       in       chaos,       as       for       steganographic
 719 techniques~\cite{1309431,Zhang2005759}.    Steganography  consists  in
 720 embedding a  secret message within  an ordinary one, while  the secret
 721 extraction takes place once  at destination.  The reverse ({\it i.e.},
 722 automatically detecting the presence  of hidden messages inside media)
 723 is  called   steganalysis.   Among  the   deployed  strategies  inside
 724 detectors,          there         are          support         vectors
 725 machines~\cite{Qiao:2009:SM:1704555.1704664},                   neural
 726 networks~\cite{10.1109/ICME.2003.1221665,10.1109/CIMSiM.2010.36},  and
 727 Markov   chains~\cite{Sullivan06steganalysisfor}.    Most   of   these
 728 detectors  give quite  good results  and are  rather  competitive when
 729 facing steganographic  tools.  However, to  the best of  our knowledge
 730 none of the considered information hiding schemes fulfills the Devaney
 731 definition  of chaos~\cite{Devaney}.  Indeed,  one can  wonder whether
 732 detectors  continue to  give good  results when  facing  truly chaotic
 733 schemes.  More  generally, there remains the open  problem of deciding
 734 whether artificial intelligence is suitable for predicting topological
 735 chaotic behaviors.
 736
 737 \subsection{Representing Chaotic Iterations for Neural Networks}
 738 \label{section:translation}
 739
 740 The  problem  of  deciding  whether  classical  feedforward  ANNs  are
 741 suitable  to approximate  topological chaotic  iterations may  then be
 742 reduced to  evaluate such neural  networks on iterations  of functions
 743 with  Strongly  Connected  Component  (SCC)~graph of  iterations.   To
 744 compare with  non-chaotic iterations, the experiments  detailed in the
 745 following  sections  are carried  out  using  both  kinds of  function
 746 (chaotic and non-chaotic). Let  us emphasize on the difference between
 747 this  kind  of  neural  networks  and  the  Chaotic  Iterations  based
 748 multilayer peceptron.
 749
 750 We are  then left to compute  two disjoint function  sets that contain
 751 either functions  with topological chaos properties  or not, depending
 752 on  the strong  connectivity of  their iterations graph.  This  can be
 753 achieved for  instance by removing a  set of edges  from the iteration
 754 graph $\Gamma(f_0)$ of the vectorial negation function~$f_0$.  One can
 755 deduce whether  a function verifies the topological  chaos property or
 756 not  by checking  the strong  connectivity of  the resulting  graph of
 757 iterations.
 758
 759 For instance let us consider  the functions $f$ and $g$ from $\Bool^4$
 760 to $\Bool^4$ respectively defined by the following lists:
 761 $$[0,  0,  2,   3,  13,  13,  6,   3,  8,  9,  10,  11,   8,  13,  14,
 762   15]$$ $$\mbox{and } [11, 14, 13, 14, 11, 10, 1, 8, 7, 6, 5, 4, 3, 2,
 763   1, 0]  \enspace.$$ In  other words,  the image of  $0011$ by  $g$ is
 764 $1110$: it  is obtained as the  binary value of the  fourth element in
 765 the  second  list  (namely~14).   It   is  not  hard  to  verify  that
 766 $\Gamma(f)$ is  not SCC  (\textit{e.g.}, $f(1111)$ is  $1111$) whereas
 767 $\Gamma(g)$ is. The  remaining of this section shows  how to translate
 768 iterations of such functions into a model amenable to be learned by an
 769 ANN.   Formally, input  and  output vectors  are pairs~$((S^t)^{t  \in
 770   \Nats},x)$          and          $\left(\sigma((S^t)^{t          \in
 771   \Nats}),F_{f}(S^0,x)\right)$ as defined in~Eq.~(\ref{eq:Gf}).
 772
 773 Firstly, let us focus on how to memorize configurations.  Two distinct
 774 translations are  proposed.  In the first  case, we take  one input in
 775 $\Bool$  per  component;  in   the  second  case,  configurations  are
 776 memorized  as   natural  numbers.    A  coarse  attempt   to  memorize
 777 configuration  as  natural  number  could  consist  in  labeling  each
 778 configuration  with  its  translation  into  decimal  numeral  system.
 779 However,  such a  representation induces  too many  changes  between a
 780 configuration  labeled by  a  power  of two  and  its direct  previous
 781 configuration: for instance, 16~(10000)  and 15~(01111) are close in a
 782 decimal ordering, but  their Hamming distance is 5.   This is why Gray
 783 codes~\cite{Gray47} have been preferred.
 784
 785 Secondly, let us detail how to deal with strategies.  Obviously, it is
 786 not possible to  translate in a finite way  an infinite strategy, even
 787 if both $(S^t)^{t \in \Nats}$ and $\sigma((S^t)^{t \in \Nats})$ belong
 788 to  $\{1,\ldots,n\}^{\Nats}$.  Input  strategies are  then  reduced to
 789 have a length of size $l \in \llbracket 2,k\rrbracket$, where $k$ is a
 790 parameter of the evaluation. Notice  that $l$ is greater than or equal
 791 to $2$ since  we do not want the shift  $\sigma$~function to return an
 792 empty strategy.  Strategies are memorized as natural numbers expressed
 793 in base  $n+1$.  At  each iteration, either  none or one  component is
 794 modified  (among the  $n$ components)  leading to  a radix  with $n+1$
 795 entries.  Finally,  we give an  other input, namely $m  \in \llbracket
 796 1,l-1\rrbracket$, which  is the  number of successive  iterations that
 797 are applied starting  from $x$.  Outputs are translated  with the same
 798 rules.
 799
 800 To address  the complexity  issue of the  problem, let us  compute the
 801 size of the data set an ANN has to deal with.  Each input vector of an
 802 input-output pair  is composed of a configuration~$x$,  an excerpt $S$
 803 of the strategy to iterate  of size $l \in \llbracket 2, k\rrbracket$,
 804 and a  number $m \in  \llbracket 1, l-1\rrbracket$ of  iterations that
 805 are executed.
 806
 807 Firstly, there are $2^n$  configurations $x$, with $n^l$ strategies of
 808 size $l$ for  each of them. Secondly, for  a given configuration there
 809 are $\omega = 1 \times n^2 +  2 \times n^3 + \ldots+ (k-1) \times n^k$
 810 ways  of writing  the pair  $(m,S)$. Furthermore,  it is  not  hard to
 811 establish that
 812 \begin{equation}
 813 \displaystyle{(n-1) \times \omega = (k-1)\times n^{k+1} - \sum_{i=2}^k n^i} \nonumber
 814 \end{equation}
 815 then
 816 \begin{equation}
 817 \omega =
 818 \dfrac{(k-1)\times n^{k+1}}{n-1} - \dfrac{n^{k+1}-n^2}{(n-1)^2} \enspace . \nonumber
 819 \end{equation}
 820 \noindent And then, finally, the number of  input-output pairs for our
 821 ANNs is
 822 $$
 823 2^n \times \left(\dfrac{(k-1)\times n^{k+1}}{n-1} - \dfrac{n^{k+1}-n^2}{(n-1)^2}\right) \enspace .
 824 $$
 825 For  instance, for $4$  binary components  and a  strategy of  at most
 826 $3$~terms we obtain 2304~input-output pairs.
 827
 828 \subsection{Experiments}
 829 \label{section:experiments}
 830
 831 To study  if chaotic iterations can  be predicted, we  choose to train
 832 the multilayer perceptron.  As stated  before, this kind of network is
 833 in  particular  well-known for  its  universal approximation  property
 834 \cite{Cybenko89,DBLP:journals/nn/HornikSW89}.  Furthermore,  MLPs have
 835 been  already  considered for  chaotic  time  series prediction.   For
 836 example,   in~\cite{dalkiran10}  the   authors  have   shown   that  a
 837 feedforward  MLP with  two hidden  layers, and  trained  with Bayesian
 838 Regulation  back-propagation, can learn  successfully the  dynamics of
 839 Chua's circuit.
 840
 841 In  these experiments  we consider  MLPs  having one  hidden layer  of
 842 sigmoidal  neurons  and  output   neurons  with  a  linear  activation
 843 function.     They    are    trained    using    the    Limited-memory
 844 Broyden-Fletcher-Goldfarb-Shanno quasi-newton algorithm in combination
 845 with the Wolfe linear search.  The training process is performed until
 846 a maximum number of epochs  is reached.  To prevent overfitting and to
 847 estimate the  generalization performance we use  holdout validation by
 848 splitting the  data set into  learning, validation, and  test subsets.
 849 These subsets  are obtained through  random selection such  that their
 850 respective size represents 65\%, 10\%, and 25\% of the whole data set.
 851
 852 Several  neural  networks  are  trained  for  both  iterations  coding
 853 schemes.   In  both  cases   iterations  have  the  following  layout:
 854 configurations of  four components and  strategies with at  most three
 855 terms. Thus, for  the first coding scheme a data  set pair is composed
 856 of 6~inputs and 5~outputs, while for the second one it is respectively
 857 3~inputs and 2~outputs. As noticed at the end of the previous section,
 858 this  leads to  data sets  that  consist of  2304~pairs. The  networks
 859 differ  in the  size of  the hidden  layer and  the maximum  number of
 860 training epochs.  We remember that  to evaluate the ability  of neural
 861 networks to  predict a  chaotic behavior for  each coding  scheme, the
 862 trainings of two data sets, one of them describing chaotic iterations,
 863 are compared.
 864
 865 Thereafter we give,  for the different learning setups  and data sets,
 866 the mean prediction success rate obtained for each output. Such a rate
 867 <<<<<<< HEAD
 868 represents the percentage of  input-output pairs belonging to the test
 869 =======
 870 represents the  percentage of input-output pairs belonging  to the test
 871 >>>>>>> 3df586d673bc4f3b32fa0dd1cb46796256744772
 872 subset  for  which  the   corresponding  output  value  was  correctly
 873 predicted.   These values are  computed considering  10~trainings with
 874 random  subsets  construction,   weights  and  biases  initialization.
 875 Firstly, neural networks having  10 and 25~hidden neurons are trained,
 876 with   a  maximum   number  of   epochs  that   takes  its   value  in
 877 $\{125,250,500\}$  (see Tables~\ref{tab1} and  \ref{tab2}).  Secondly,
 878 we refine the second coding scheme by splitting the output vector such
 879 that   each  output   is  learned   by  a   specific   neural  network
 880 (Table~\ref{tab3}). In  this last  case, we increase  the size  of the
 881 hidden layer up to 40~neurons and we consider larger number of epochs.
 882
 883 \begin{table}[htbp!]
 884 \caption{Prediction success rates for configurations expressed as boolean vectors.}
 885 \label{tab1}
 886 \centering {\small
 887 \begin{tabular}{|c|c||c|c|c|}
 888 \hline
 889 \multicolumn{5}{|c|}{Networks topology: 6~inputs, 5~outputs, and one hidden layer} \\
 890 \hline
 891 \hline
 892 \multicolumn{2}{|c||}{Hidden neurons} & \multicolumn{3}{c|}{10 neurons} \\
 893 \cline{3-5}
 894 \multicolumn{2}{|c||}{Epochs} & 125 & 250 & 500 \\
 895 \hline
 896 \multirow{6}{*}{\rotatebox{90}{Chaotic}}&Output~(1) & 90.92\% & 91.75\% & 91.82\% \\
 897 & Output~(2) & 69.32\% & 78.46\% & 82.15\% \\
 898 & Output~(3) & 68.47\% & 78.49\% & 82.22\% \\
 899 & Output~(4) & 91.53\% & 92.37\% & 93.4\% \\
 900 & Config. & 36.10\% & 51.35\% & 56.85\% \\
 901 & Strategy~(5) & 1.91\% & 3.38\% & 2.43\% \\
 902 \hline
 903 \multirow{6}{*}{\rotatebox{90}{Non-chaotic}}&Output~(1) & 97.64\% & 98.10\% & 98.20\% \\
 904 & Output~(2) & 95.15\% & 95.39\% & 95.46\% \\
 905 & Output~(3) & 100\% & 100\% & 100\% \\
 906 & Output~(4) & 97.47\% & 97.90\% & 97.99\% \\
 907 & Config. & 90.52\% & 91.59\% & 91.73\% \\
 908 & Strategy~(5) & 3.41\% & 3.40\% & 3.47\% \\
 909 \hline
 910 \hline
 911 \multicolumn{2}{|c||}{Hidden neurons} & \multicolumn{3}{c|}{25 neurons} \\ %& \multicolumn{3}{|c|}{40 neurons} \\
 912 \cline{3-5}
 913 \multicolumn{2}{|c||}{Epochs} & 125 & 250 & 500 \\ %& 125 & 250 & 500 \\
 914 \hline
 915 \multirow{6}{*}{\rotatebox{90}{Chaotic}}&Output~(1) & 91.65\% & 92.69\% & 93.93\% \\ %& 91.94\% & 92.89\% & 94.00\% \\
 916 & Output~(2) & 72.06\% & 88.46\% & 90.5\% \\ %& 74.97\% & 89.83\% & 91.14\% \\
 917 & Output~(3) & 79.19\% & 89.83\% & 91.59\% \\ %& 76.69\% & 89.58\% & 91.84\% \\
 918 & Output~(4) & 91.61\% & 92.34\% & 93.47\% \\% & 82.77\% & 92.93\% & 93.48\% \\
 919 & Config. & 48.82\% & 67.80\% & 70.97\% \\%& 49.46\% & 68.94\% & 71.11\% \\
 920 & Strategy~(5) & 2.62\% & 3.43\% & 3.78\% \\% & 3.10\% & 3.10\% & 3.03\% \\
 921 \hline
 922 \multirow{6}{*}{\rotatebox{90}{Non-chaotic}}&Output~(1) & 97.87\% & 97.99\% & 98.03\% \\ %& 98.16\% \\
 923 & Output~(2) & 95.46\% & 95.84\% & 96.75\% \\ % & 97.4\% \\
 924 & Output~(3) & 100\% & 100\% & 100\% \\%& 100\% \\
 925 & Output~(4) & 97.77\% & 97.82\% & 98.06\% \\%& 98.31\% \\
 926 & Config. & 91.36\% & 91.99\% & 93.03\% \\%& 93.98\% \\
 927 & Strategy~(5) & 3.37\% & 3.44\% & 3.29\% \\%& 3.23\% \\
 928 \hline
 929 \end{tabular}
 930 }
 931 \end{table}
 932
 933 Table~\ref{tab1}  presents the  rates  obtained for  the first  coding
 934 scheme.   For  the chaotic  data,  it can  be  seen  that as  expected
 935 configuration  prediction becomes  better  when the  number of  hidden
 936 neurons and maximum  epochs increases: an improvement by  a factor two
 937 is observed (from 36.10\% for 10~neurons and 125~epochs to 70.97\% for
 938 25~neurons  and  500~epochs). We  also  notice  that  the learning  of
 939 outputs~(2)   and~(3)  is   more  difficult.    Conversely,   for  the
 940 non-chaotic  case the  simplest training  setup is  enough  to predict
 941 configurations.  For all these  feedforward network topologies and all
 942 outputs the  obtained results for the non-chaotic  case outperform the
 943 chaotic  ones. Finally,  the rates  for the  strategies show  that the
 944 different feedforward networks are unable to learn them.
 945
 946 For  the  second  coding   scheme  (\textit{i.e.},  with  Gray  Codes)
 947 Table~\ref{tab2} shows  that any network learns about  five times more
 948 non-chaotic  configurations than  chaotic  ones.  As  in the  previous
 949 scheme,       the      strategies      cannot       be      predicted.
 950 Figures~\ref{Fig:chaotic_predictions}                              and
 951 \ref{Fig:non-chaotic_predictions} present the predictions given by two
 952 feedforward multilayer  perceptrons that were  respectively trained to
 953 learn chaotic  and non-chaotic data,  using the second  coding scheme.
 954 Each figure  shows for  each sample of  the test  subset (577~samples,
 955 representing 25\%  of the 2304~samples) the  configuration that should
 956 have been predicted and the one given by the multilayer perceptron. It
 957 can be  seen that for  the chaotic data  the predictions are  far away
 958 from the  expected configurations.  Obviously,  the better predictions
 959 for the non-chaotic data reflect their regularity.
 960
 961 Let us now compare the  two coding schemes. Firstly, the second scheme
 962 disturbs  the   learning  process.   In   fact  in  this   scheme  the
 963 configuration is always expressed as  a natural number, whereas in the
 964 first one  the number  of inputs follows  the increase of  the Boolean
 965 vectors coding configurations. In this latter case, the coding gives a
 966 finer information on configuration evolution.
 967 <<<<<<< HEAD
 968 =======
 969
 970 >>>>>>> 3df586d673bc4f3b32fa0dd1cb46796256744772
 971 \begin{table}[b]
 972 \caption{Prediction success rates for configurations expressed with Gray code}
 973 \label{tab2}
 974 \centering
 975 \begin{tabular}{|c|c||c|c|c|}
 976 \hline
 977 \multicolumn{5}{|c|}{Networks topology: 3~inputs, 2~outputs, and one hidden layer} \\
 978 \hline
 979 \hline
 980 & Hidden neurons & \multicolumn{3}{c|}{10 neurons} \\
 981 \cline{2-5}
 982 & Epochs & 125 & 250 & 500 \\ %& 1000
 983 \hline
 984 \multirow{2}{*}{Chaotic}& Config.~(1) & 13.29\% & 13.55\% & 13.08\% \\ %& 12.5\%
 985 & Strategy~(2) & 0.50\% & 0.52\% & 1.32\% \\ %& 1.42\%
 986 \hline
 987 \multirow{2}{*}{Non-Chaotic}&Config.~(1) & 77.12\% & 74.00\% & 72.60\% \\ %& 75.81\%
 988 & Strategy~(2) & 0.42\% & 0.80\% & 1.16\% \\ %& 1.42\%
 989 \hline
 990 \hline
 991 & Hidden neurons & \multicolumn{3}{c|}{25 neurons} \\
 992 \cline{2-5}
 993 & Epochs & 125 & 250 & 500 \\ %& 1000
 994 \hline
 995 \multirow{2}{*}{Chaotic}& Config.~(1) & 12.27\% & 13.15\% & 13.05\% \\ %& 15.44\%
 996 & Strategy~(2) & 0.71\% & 0.66\% & 0.88\% \\ %& 1.73\%
 997 \hline
 998 \multirow{2}{*}{Non-Chaotic}&Config.~(1) & 73.60\% & 74.70\% & 75.89\% \\ %& 68.32\%
 999 & Strategy~(2) & 0.64\% & 0.97\% & 1.23\% \\ %& 1.80\%
1000 \hline
1001 \end{tabular}
1002 \end{table}
1003
1004 \begin{figure}
1005   \centering
1006   \includegraphics[scale=0.5]{chaotic_trace2}
1007   \caption {Second coding scheme - Predictions obtained for a chaotic test subset.}
1008   \label{Fig:chaotic_predictions}
1009 \end{figure}
1010
1011 \begin{figure}
1012   \centering
1013   \includegraphics[scale=0.5]{non-chaotic_trace2}
1014   \caption{Second coding scheme - Predictions obtained for a non-chaotic test subset.}
1015   \label{Fig:non-chaotic_predictions}
1016 \end{figure}
1017
1018 Unfortunately, in  practical applications the number  of components is
1019 usually  unknown.   Hence, the  first  coding  scheme  cannot be  used
1020 systematically.   Therefore, we  provide  a refinement  of the  second
1021 scheme: each  output is learned  by a different  ANN. Table~\ref{tab3}
1022 presents the  results for  this approach.  In  any case,  whatever the
1023 <<<<<<< HEAD
1024 considered feedforward  network topologies, the  maximum epoch number,
1025 =======
1026 considered  feedforward network topologies,  the maximum  epoch number,
1027 >>>>>>> 3df586d673bc4f3b32fa0dd1cb46796256744772
1028 and the kind of iterations, the configuration success rate is slightly
1029 improved.   Moreover, the  strategies predictions  rates  reach almost
1030 12\%, whereas in Table~\ref{tab2} they never exceed 1.5\%.  Despite of
1031 this improvement,  a long term prediction of  chaotic iterations still
1032 appear to be an open issue.
1033
1034 \begin{table}
1035 \caption{Prediction success rates for split outputs.}
1036 \label{tab3}
1037 \centering
1038 \begin{tabular}{|c||c|c|c|}
1039 \hline
1040 \multicolumn{4}{|c|}{Networks topology: 3~inputs, 1~output, and one hidden layer} \\
1041 \hline
1042 \hline
1043 Epochs & 125 & 250 & 500 \\
1044 \hline
1045 \hline
1046 Chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
1047 \hline
1048 10~neurons & 12.39\% & 14.06\% & 14.32\% \\
1049 25~neurons & 13.00\% & 14.28\% & 14.58\% \\
1050 40~neurons & 11.58\% & 13.47\% & 14.23\% \\
1051 \hline
1052 \hline
1053 Non chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
1054 \cline{2-4}
1055 %Epochs & 125 & 250 & 500 \\
1056 \hline
1057 10~neurons & 76.01\% & 74.04\% & 78.16\% \\
1058 25~neurons & 76.60\% & 72.13\% & 75.96\% \\
1059 40~neurons & 76.34\% & 75.63\% & 77.50\% \\
1060 \hline
1061 \hline
1062 Chaotic/non chaotic & \multicolumn{3}{c|}{Output = Strategy} \\
1063 \cline{2-4}
1064 %Epochs & 125 & 250 & 500 \\
1065 \hline
1066 10~neurons & 0.76\% & 0.97\% & 1.21\% \\
1067 25~neurons & 1.09\% & 0.73\% & 1.79\% \\
1068 40~neurons & 0.90\% & 1.02\% & 2.15\% \\
1069 \hline
1070 \multicolumn{4}{c}{} \\
1071 \hline
1072 Epochs & 1000 & 2500 & 5000 \\
1073 \hline
1074 \hline
1075 Chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
1076 \hline
1077 10~neurons & 14.51\% & 15.22\% & 15.22\% \\
1078 25~neurons & 16.95\% & 17.57\% & 18.46\% \\
1079 40~neurons & 17.73\% & 20.75\% & 22.62\% \\
1080 \hline
1081 \hline
1082 Non chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
1083 \cline{2-4}
1084 %Epochs & 1000 & 2500 & 5000 \\
1085 \hline
1086 10~neurons & 78.98\% & 80.02\% & 79.97\% \\
1087 25~neurons & 79.19\% & 81.59\% & 81.53\% \\
1088 40~neurons & 79.64\% & 81.37\% & 81.37\% \\
1089 \hline
1090 \hline
1091 Chaotic/non chaotic & \multicolumn{3}{c|}{Output = Strategy} \\
1092 \cline{2-4}
1093 %Epochs & 1000 & 2500 & 5000 \\
1094 \hline
1095 10~neurons & 3.47\% & 9.98\% & 11.66\% \\
1096 25~neurons & 3.92\% & 8.63\% & 10.09\% \\
1097 40~neurons & 3.29\% & 7.19\% & 7.18\% \\
1098 \hline
1099 \end{tabular}
1100 \end{table}
1101
1102 \section{Conclusion}
1103
1104 In  this paper,  we have  established an  equivalence  between chaotic
1105 iterations,  according to  the Devaney's  definition of  chaos,  and a
1106 class  of multilayer  perceptron  neural networks.   Firstly, we  have
1107 described how to build a neural network that can be trained to learn a
1108 given chaotic map function. Secondly,  we found a condition that allow
1109 to check whether  the iterations induced by a  function are chaotic or
1110 not, and thus  if a chaotic map is obtained.  Thanks to this condition
1111 our  approach is not  limited to  a particular  function. In  the dual
1112 case, we show that checking if a neural network is chaotic consists in
1113 verifying  a property  on an  associated  graph, called  the graph  of
1114 iterations.   These results  are valid  for recurrent  neural networks
1115 with a  particular architecture.  However,  we believe that  a similar
1116 work can be done for  other neural network architectures.  Finally, we
1117 have  discovered at  least one  family of  problems with  a reasonable
1118 size, such  that artificial neural  networks should not be  applied in
1119 the  presence  of chaos,  due  to  their  inability to  learn  chaotic
1120 behaviors in this  context.  Such a consideration is  not reduced to a
1121 theoretical detail:  this family of discrete  iterations is concretely
1122 implemented  in a  new steganographic  method  \cite{guyeux10ter}.  As
1123 steganographic   detectors  embed  tools   like  neural   networks  to
1124 distinguish between  original and stego contents, our  studies tend to
1125 prove that such  detectors might be unable to  tackle with chaos-based
1126 information  hiding  schemes.
1127
1128 In  future  work we  intend  to  enlarge  the comparison  between  the
1129 learning   of  truly   chaotic  and   non-chaotic   behaviors.   Other
1130 computational intelligence tools such  as support vector machines will
1131 be investigated  too, to  discover which tools  are the  most relevant
1132 when facing a truly chaotic phenomenon.  A comparison between learning
1133 rate  success  and  prediction  quality will  be  realized.   Concrete
1134 consequences in biology, physics, and computer science security fields
1135 <<<<<<< HEAD
1136 will then be stated.
1137 =======
1138 will then be  stated.
1139 >>>>>>> 3df586d673bc4f3b32fa0dd1cb46796256744772
1140
1141 % \appendix{}
1142
1143 % \begin{definition} \label{def2}
1144 % A continuous function $f$  on a topological space $(\mathcal{X},\tau)$
1145 % is defined  to be  {\emph{topologically transitive}}  if for any  pair of
1146 % open  sets $U$,  $V  \in  \mathcal{X}$ there  exists
1147 % $k \in
1148 % \mathds{N}^{\ast}$
1149 %  such  that
1150 % $f^k(U) \cap V \neq \emptyset$.
1151 % \end{definition}
1152
1153 \bibliography{chaos-paper}% Produces the bibliography via BibTeX.
1154
1155 \end{document}
1156 %
1157 % ****** End of file chaos-paper.tex ******