1 % ****** Start of file chaos-paper.tex ******
6 numerical bibliography, %(default)
7 %jcp,% short, numerical bibliography,
16 \usepackage{graphicx}% Include figure files
17 \usepackage{dcolumn}% Align table columns on decimal point
18 \usepackage{bm}% bold math
21 \usepackage[mathlines]{lineno}% Enable numbering of text and display math
22 \linenumbers\relax % Commence numbering lines
23 % The amssymb package provides various useful mathematical symbols
25 % The amsthm package provides various useful mathematical fonts
27 % Various mathematical packages
28 \PassOptionsToPackage{cmex10draft}{graphics}
29 \usepackage[standard]{ntheorem}
31 \usepackage{stmaryrd} % llbracket
32 \usepackage{dsfont} % mathds
33 \usepackage{multirow} % tables
35 \newcommand{\ie}{\textit{i.e.}}
36 \newcommand{\Nats}[0]{\ensuremath{\mathds{N}}}
37 \newcommand{\R}[0]{\ensuremath{\mathds{R}}}
38 \newcommand{\Bool}[0]{\ensuremath{\mathds{B}}}
42 \title[Recurrent Neural Networks and Chaos]{Recurrent Neural Networks
43 and Chaos: Construction, Evaluation, \\
44 and Prediction Ability}
46 \author{Jacques M. Bahi}
47 \author{Jean-Fran\c{c}ois Couchot}
48 \author{Christophe Guyeux}
\email{christophe.guyeux@univ-fcomte.fr}
50 \author{Michel Salomon}
51 \altaffiliation[Authors in ]{alphabetic order}
53 Computer Science Laboratory (LIFC), University of Franche-Comt\'e, \\
54 IUT de Belfort-Montb\'eliard, BP 527, \\
55 90016 Belfort Cedex, France
60 \newcommand{\CG}[1]{\begin{color}{red}\textit{#1}\end{color}}
64 Many research works deal with chaotic neural networks for various
65 fields of application. Unfortunately, up to now these networks are
66 usually claimed to be chaotic without any mathematical proof. The
67 purpose of this paper is to establish, based on a rigorous theoretical
68 framework, an equivalence between chaotic iterations according to
69 Devaney and a particular class of neural
70 networks. On the one hand we show how to build such a network, on the
71 other hand we provide a method to check if a neural network is a
chaotic one. Finally, the ability of classical feedforward multilayer
perceptrons to learn sets of data obtained from a chaotic dynamical
system is investigated. Various Boolean functions are iterated on finite
states, and the iterations of some of them are proven to be chaotic
Devaney. In that context, important differences appear in the training
process: experiments with various neural networks establish that chaotic
behaviors are far more difficult to learn.
82 %%\pacs{Valid PACS appear here}% PACS, the Physics and Astronomy
83 % Classification Scheme.
84 %%\keywords{Suggested keywords}%Use showkeys class option if keyword
90 Chaotic neural networks have received a lot of attention due to the
91 appealing properties of deterministic chaos (unpredictability,
sensitivity, and so on). However, such networks are often claimed
to be chaotic without any rigorous mathematical proof. Therefore, in this
work a theoretical framework based on Devaney's definition of
chaos is introduced. Starting with a relationship between chaotic
iterations and Devaney's chaos, we firstly show how to build a
recurrent neural network that is equivalent to a chaotic map and
secondly a way to check whether an already available network is
chaotic or not. We also study different topological properties of
these truly chaotic neural networks. Finally, we show that the
learning, by neural networks having a feedforward structure, of
chaotic behaviors represented by data sets obtained from chaotic maps
is far more difficult than the learning of non-chaotic behaviors.
107 \section{Introduction}
Several research works have proposed or run chaotic neural networks
in recent years. The complex dynamics of such networks lead to
various potential application areas: associative
116 memories~\cite{Crook2007267} and digital security tools like hash
117 functions~\cite{Xiao10}, digital
118 watermarking~\cite{1309431,Zhang2005759}, or cipher
119 schemes~\cite{Lian20091296}. In the former case, the background idea
120 is to control chaotic dynamics in order to store patterns, with the
121 key advantage of offering a large storage capacity. For the latter
122 case, the use of chaotic dynamics is motivated by their
123 unpredictability and random-like behaviors. Thus, investigating new
124 concepts is crucial in this field, because new threats are constantly
125 emerging. As an illustrative example, the former standard in hash
126 functions, namely the SHA-1 algorithm, has been recently weakened
127 after flaws were discovered.
129 Chaotic neural networks have been built with different approaches. In
130 the context of associative memory, chaotic neurons like the nonlinear
131 dynamic state neuron \cite{Crook2007267} frequently constitute the
132 nodes of the network. These neurons have an inherent chaotic behavior,
133 which is usually assessed through the computation of the Lyapunov
134 exponent. An alternative approach is to consider a well-known neural
135 network architecture: the MultiLayer Perceptron (MLP). These networks
136 are suitable to model nonlinear relationships between data, due to
137 their universal approximator capacity. Thus, this kind of networks can
138 be trained to model a physical phenomenon known to be chaotic such as
Chua's circuit \cite{dalkiran10}. Sometimes, a neural network that
is built by combining transfer functions and initial conditions that are both
chaotic is itself claimed to be chaotic
142 \cite{springerlink:10.1007/s00521-010-0432-2}.
144 What all of these chaotic neural networks have in common is that they
145 are claimed to be chaotic despite a lack of any rigorous mathematical
146 proof. The first contribution of this paper is to fill this gap,
using a theoretical framework based on Devaney's definition of chaos
148 \cite{Devaney}. This mathematical theory of chaos provides both
149 qualitative and quantitative tools to evaluate the complex behavior of
150 a dynamical system: ergodicity, expansivity, and so on. More
151 precisely, in this paper, which is an extension of a previous work
152 \cite{bgs11:ip}, we establish the equivalence between asynchronous
153 iterations and a class of globally recurrent MLP.
The investigation of the converse problem is our second contribution:
we study the ability of
classical MultiLayer Perceptrons to learn a particular family of
157 discrete chaotic dynamical systems. This family, called chaotic
158 iterations, is defined by a Boolean vector, an update function, and a
sequence giving which component to update at each iteration. It has
been previously established that such a dynamical system
is chaotic (as it is defined by Devaney) when the chosen function has
a strongly connected graph of iterations. In this document, we
experiment with several MLPs and try to learn some iterations of this kind.
164 We show that non-chaotic iterations can be learned, whereas it is
165 far more difficult for chaotic ones. That is to say, we have
166 discovered at least one family of problems with a reasonable size,
167 such that artificial neural networks should not be applied
168 due to their inability to learn chaotic behaviors in this context.
170 The remainder of this research work is organized as follows. The next
171 section is devoted to the basics of Devaney's
172 chaos. Section~\ref{S2} formally describes how to build a neural
173 network that operates chaotically. Section~\ref{S3} is
174 devoted to the dual case of checking whether an existing neural network
176 Topological properties of chaotic neural networks
are discussed in Sect.~\ref{S4}.
Section~\ref{section:translation} shows how to translate such
179 iterations into an Artificial Neural Network (ANN), in order to
evaluate the capability of the latter to learn chaotic behaviors.
181 This ability is studied in Sect.~\ref{section:experiments}, where
182 various ANNs try to learn two sets of data: the first one is obtained
183 by chaotic iterations while the second one results from a non-chaotic
184 system. Prediction success rates are given and discussed for the two
sets. The paper ends with a conclusion section where our contribution
is summed up and planned future work is outlined.
188 \section{Chaotic Iterations according to Devaney}
190 In this section, the well-established notion of Devaney's mathematical
chaos is firstly recalled. Preservation of the unpredictability of
such dynamical systems when implemented on a computer is obtained by
using some discrete iterations called ``asynchronous iterations'', which
194 are thus introduced. The result establishing the link between such
195 iterations and Devaney's chaos is finally presented at the end of this
198 In what follows and for any function $f$, $f^n$ means the composition
199 $f \circ f \circ \hdots \circ f$ ($n$ times) and an \emph{iteration}
of a \emph{dynamical system} is the step that consists in
updating the global state $x$ with respect to a function $f$ s.t.
204 \subsection{Devaney's chaotic dynamical systems}
Various domains such as physics, biology, and economics contain systems
that exhibit a chaotic behavior; a well-known example is the weather.
208 These systems are in particular highly sensitive to initial
209 conditions, a concept usually presented as the butterfly effect: small
210 variations in the initial conditions possibly lead to widely different
211 behaviors. Theoretically speaking, a system is sensitive if for each
212 point $x$ in the iteration space, one can find a point in each
213 neighborhood of $x$ having a significantly different future evolution.
214 Conversely, a system seeded with the same initial conditions always
215 has the same evolution. In other words, chaotic systems have a
216 deterministic behavior defined through a physical or mathematical
model and a high sensitivity to the initial conditions. Besides,
mathematically speaking, this kind of unpredictability is also referred to as
deterministic chaos. For example, many weather forecast models exist,
220 but they give only suitable predictions for about a week, because they
221 are initialized with conditions that reflect only a partial knowledge
of the current weather. Even if the differences are initially small,
they are amplified in the course of time, thus making
long-term prediction difficult. In fact, in a chaotic system, an approximation
225 of the current state is a quite useless indicator for predicting
From a mathematical point of view, deterministic chaos has been
thoroughly studied over the last decades, and different research works
have provided various definitions of chaos. Among these
231 definitions, the one given by Devaney~\cite{Devaney} is
232 well-established. This definition consists of three conditions:
233 topological transitivity, density of periodic points, and sensitive
234 point dependence on initial conditions.
Topological transitivity is checked when, for any point, any
neighborhood of its future evolution eventually overlaps with any other
given region. More precisely,
240 \begin{definition} \label{def2}
241 A continuous function $f$ on a topological space $(\mathcal{X},\tau)$
242 is defined to be {\emph{topologically transitive}} if for any pair of
open sets $U$, $V \subset \mathcal{X}$ there exists
247 $f^k(U) \cap V \neq \emptyset$.
250 This property implies that a dynamical system cannot be broken into
252 Intuitively, its complexity does not allow any simplification.
253 On the contrary, a dense set of periodic points is an
254 element of regularity that a chaotic dynamical system has to exhibit.
256 \begin{definition} \label{def3}
257 A point $x$ is called a {\emph{periodic point}} for $f$ of period~$n \in
258 \mathds{N}^{\ast}$ if $f^{n}(x)=x$.
261 \begin{definition} \label{def4}
$f$ is said to be {\emph{regular}} on $(\mathcal{X},\tau)$ if the set of
periodic points for $f$ is dense in $\mathcal{X}$ (for any $x \in
\mathcal{X}$, we can find at least one periodic point in any of its
268 This regularity ``counteracts'' the effects of transitivity. Thus,
269 due to these two properties, two points close to each other can behave
270 in a completely different manner, leading to unpredictability for the
273 \begin{definition} \label{sensitivity}
274 $f$ has {\emph{ sensitive dependence on initial conditions}} if there
275 exists $\delta >0$ such that, for any $x\in \mathcal{X}$ and any
276 neighborhood $V$ of $x$, there exist $y\in V$ and $n > 0$ such that
277 $d\left(f^{n}(x), f^{n}(y)\right) >\delta $. The value $\delta$ is called the
278 {\emph{constant of sensitivity}} of $f$.
283 \begin{definition} \label{def5}
284 The dynamical system that iterates $f$ is {\emph{ chaotic according to Devaney}}
285 on $(\mathcal{X},\tau)$ if $f$ is regular, topologically transitive,
286 and has sensitive dependence to its initial conditions.
In what follows, iterations are said to be \emph{chaotic according to Devaney}
when the corresponding dynamical system is chaotic according to Devaney.
293 %Let us notice that for a metric space the last condition follows from
294 %the two first ones~\cite{Banks92}.
296 \subsection{Asynchronous Iterations}
298 %This section presents some basics on topological chaotic iterations.
Let us firstly discuss the domain of iteration. As far as we
know, no result establishes that the chaotic behavior of a dynamical system
that has been theoretically proven on $\R$ remains valid on the
numbers, which is the implementation domain. Thus, to avoid any loss of
chaos, this work presents an alternative, which is to iterate Boolean
maps: results that are theoretically obtained in that domain are
306 preserved in implementations.
308 Let us denote by $\llbracket a ; b \rrbracket$ the following interval
309 of integers: $\{a, a+1, \hdots, b\}$, where $a~<~b$.
In this section, a system
under consideration iteratively modifies a collection of
312 $n$~components. Each component $i \in \llbracket 1; n \rrbracket$
313 takes its value $x_i$ among the domain $\Bool=\{0,1\}$. A~{\it
314 configuration} of the system at discrete time $t$ is the vector
316 $x^{t}=(x_1^{t},\ldots,x_{n}^{t}) \in \Bool^n$.
318 The dynamics of the system is described according to a function $f :
319 \Bool^n \rightarrow \Bool^n$ such that
321 $f(x)=(f_1(x),\ldots,f_n(x))$.
323 % Notice that $f^k$ denotes the
324 % $k-$th composition $f\circ \ldots \circ f$ of the function $f$.
Let a configuration $x$ be given. In what follows
327 $N(i,x)=(x_1,\ldots,\overline{x_i},\ldots,x_n)$ is the configuration
328 obtained by switching the $i-$th component of $x$ ($\overline{x_i}$ is
329 indeed the negation of $x_i$). Intuitively, $x$ and $N(i,x)$ are
330 neighbors. The discrete iterations of $f$ are represented by the
331 oriented {\it graph of iterations} $\Gamma(f)$. In such a graph,
332 vertices are configurations of $\Bool^n$ and there is an arc labeled
$i$ from $x$ to $N(i,x)$ if and only if $f_i(x)$ is $\overline{x_i}$.
335 In the sequel, the {\it strategy} $S=(S^{t})^{t \in \Nats}$ is the
336 sequence defining which component to update at time $t$ and $S^{t}$
337 denotes its $t-$th term.
This iteration scheme, which modifies only one element at each iteration,
is classically referred to as \emph{asynchronous iterations}.
More precisely, we have here
344 f_i(x^t) \textrm{ if $S^t = i$} \\
345 x_i^t \textrm{ otherwise}
The next section shows the link between asynchronous iterations and
353 \subsection{On the link between asynchronous iterations and
356 In this subsection we recall the link we have established between
357 asynchronous iterations and Devaney's chaos. The theoretical framework is
358 fully described in \cite{guyeux09}.
We introduce the function $F_{f}$ that is
defined for any given map $f:\Bool^{n} \to \Bool^{n}$ by
362 $F_{f}: \llbracket1;n\rrbracket\times \mathds{B}^{n} \rightarrow
363 \mathds{B}^{n}$, s.t.
369 f_j(x) \textrm{ if } j= s \enspace , \\
370 x_{j} \textrm{ otherwise} \enspace .
375 \noindent With such a notation, configurations
376 asynchronously obtained are defined for times
378 \begin{equation}\label{eq:sync}
379 \left\{\begin{array}{l}
380 x^{0}\in \mathds{B}^{n} \textrm{ and}\\
381 x^{t+1}=F_{f}(S^t,x^{t}) \enspace .
385 \noindent Finally, iterations defined in Eq.~(\ref{eq:sync}) can be
386 described by the following system:
390 X^{0} & = & ((S^t)^{t \in \Nats},x^0) \in
391 \llbracket1;n\rrbracket^\Nats \times \Bool^{n}\\
392 X^{k+1}& = & G_{f}(X^{k})\\
393 \multicolumn{3}{c}{\textrm{where } G_{f}\left(((S^t)^{t \in \Nats},x)\right)
394 = \left(\sigma((S^t)^{t \in \Nats}),F_{f}(S^0,x)\right) \enspace ,}
399 where $\sigma$ is the function that removes the first term of the
400 strategy ({\it i.e.},~$S^0$).
This definition makes it possible to link asynchronous iterations with
classical iterations of a dynamical system.
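
To fix ideas, the following minimal Python sketch (ours, not part of the
original study; all names are illustrative) implements the asynchronous
update $x^{t+1}=F_{f}(S^t,x^{t})$ of Eq.~(\ref{eq:sync}) for an arbitrary
Boolean map:
\begin{verbatim}
# Minimal sketch: asynchronous iterations x^{t+1} = F_f(S^t, x^t)
# for a Boolean map f on {0,1}^n.
from typing import Callable, List, Sequence

def F(f: Callable[[List[int]], List[int]], s: int, x: List[int]) -> List[int]:
    """Update only the s-th component (1-indexed) of x according to f."""
    y = list(x)
    y[s - 1] = f(x)[s - 1]
    return y

def iterate(f, x0: Sequence[int], strategy: Sequence[int]) -> List[List[int]]:
    """Successive configurations x^1, x^2, ... produced by the strategy."""
    xs, x = [], list(x0)
    for s in strategy:
        x = F(f, s, x)
        xs.append(list(x))
    return xs

# Vectorial negation f_0: a component is flipped whenever it is chosen.
f0 = lambda x: [1 - xi for xi in x]
print(iterate(f0, [0, 0, 1, 1], [1, 3, 1, 2]))
# [[1,0,1,1], [1,0,0,1], [0,0,0,1], [0,1,0,1]]
\end{verbatim}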
406 %component of the system is updated at an iteration: the $S^t$-th
407 %element. But it can be extended by considering subsets for $S^t$.
To study topological properties of these iterations, we are then left to
introduce a {\emph{distance}} $d$ between two points $(S,x)$ and
$(\check{S},\check{x})\in \mathcal{X} = \llbracket1;n\rrbracket^\Nats
\times \Bool^{n}$. It is defined by
415 d((S,x);(\check{S},\check{x}))=d_{e}(x,\check{x})+d_{s}(S,\check{S})
420 d_{e}(x,\check{x})=\sum_{j=1}^{n}\Delta
421 (x_{j},\check{x}_{j}) \in \llbracket 0 ; n \rrbracket
425 d_{s}(S,\check{S})=\frac{9}{2n}\sum_{t=0}^{\infty
426 }\frac{|S^{t}-\check{S}^{t}|}{10^{t+1}} \in [0 ; 1] \enspace .
Notice firstly that the more different components two systems have,
the larger the distance between them should be. Secondly, two systems with
similar components and strategies sharing the same starting terms
must induce only a small distance. The proposed distance fulfills
these requirements: on the one hand its integral (floor) value reflects the
difference between the cells, on the other hand its fractional part
measures the difference between the strategies.
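
The following small sketch (ours) computes this distance, with the infinite
sum of $d_s$ truncated to the first $T$ terms of the strategies:
\begin{verbatim}
# Sketch: d((S,x),(S',x')) = d_e(x,x') + d_s(S,S'), with d_s truncated to T terms.
def d_e(x, xp):
    """Hamming distance between two configurations (integer in [0, n])."""
    return sum(int(a != b) for a, b in zip(x, xp))

def d_s(S, Sp, n, T=50):
    """Truncated (9 / 2n) * sum_t |S^t - S'^t| / 10^(t+1), a value in [0, 1]."""
    return (9.0 / (2 * n)) * sum(abs(S[t] - Sp[t]) / 10 ** (t + 1) for t in range(T))

def d(p, q, n, T=50):
    (S, x), (Sp, xp) = p, q
    return d_e(x, xp) + d_s(S, Sp, n, T)

# Two points differing in one cell and in the second strategy term only.
n = 4
S = [1, 2, 3, 4] * 25
Sp = [1, 4, 3, 4] + [1, 2, 3, 4] * 24
x, xp = [0, 1, 1, 0], [0, 1, 1, 1]
print(d((S, x), (Sp, xp), n))
# about 1.0225: the integer part counts the differing cells, the
# fractional part reflects the divergence of the strategies.
\end{verbatim}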
437 The relation between $\Gamma(f)$ and $G_f$ is clear: there exists a
438 path from $x$ to $x'$ in $\Gamma(f)$ if and only if there exists a
439 strategy $s$ such that the parallel iteration of $G_f$ from the
440 initial point $(s,x)$ reaches the configuration $x'$. Using this
441 link, Guyeux~\cite{GuyeuxThese10} has proven that,
442 \begin{theorem}%[Characterization of $\mathcal{C}$]
443 \label{Th:Caracterisation des IC chaotiques}
444 Let $f:\Bool^n\to\Bool^n$. Iterations of $G_f$ are chaotic according
445 to Devaney if and only if $\Gamma(f)$ is strongly connected.
Checking whether a graph is strongly connected is not difficult
(using Tarjan's algorithm, for instance).
Let a strategy $S$ and a function $f$ be given such that
$\Gamma(f)$ is strongly connected.
452 In that case, iterations of the function $G_f$ as defined in
453 Eq.~(\ref{eq:Gf}) are chaotic according to Devaney.
Let us then consider two functions $f_0$ and $f_1$, both in
$\Bool^n\to\Bool^n$, defined as follows and used throughout this article.
458 The former is the vectorial negation, \textit{i.e.},
459 $f_{0}(x_{1},\dots,x_{n}) =(\overline{x_{1}},\dots,\overline{x_{n}})$.
460 The latter is $f_1\left(x_1,\dots,x_n\right)=\left(
461 \overline{x_1},x_1,x_2,\dots,x_{n-1}\right)$.
It is not hard to see that $\Gamma(f_0)$ and $\Gamma(f_1)$ are
both strongly connected; consequently, iterations of $G_{f_0}$ and of
$G_{f_1}$ are chaotic according to Devaney.
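
This check is easy to automate. The sketch below (ours) builds $\Gamma(f)$
for a map on $\Bool^n$, putting an arc labeled $i$ from $x$ to $N(i,x)$
whenever $f_i(x)=\overline{x_i}$, and tests strong connectivity with two
reachability passes; Tarjan's algorithm could of course be used instead:
\begin{verbatim}
# Sketch: iteration graph Gamma(f) of f on {0,1}^n and its strong connectivity.
from itertools import product

def iteration_graph(f, n):
    """Adjacency lists of Gamma(f): arc x -> N(i,x) iff f_i(x) != x_i."""
    adj = {}
    for x in product((0, 1), repeat=n):
        fx, succ = f(list(x)), []
        for i in range(n):
            if fx[i] != x[i]:
                y = list(x)
                y[i] = 1 - y[i]
                succ.append(tuple(y))
        adj[x] = succ
    return adj

def reachable(adj, src):
    seen, stack = {src}, [src]
    while stack:
        for y in adj[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def strongly_connected(adj):
    src = next(iter(adj))
    rev = {x: [] for x in adj}
    for x, succ in adj.items():
        for y in succ:
            rev[y].append(x)
    return len(reachable(adj, src)) == len(adj) == len(reachable(rev, src))

n = 4
f0 = lambda x: [1 - v for v in x]        # vectorial negation
f1 = lambda x: [1 - x[0]] + x[:-1]       # the map f_1 above
print(strongly_connected(iteration_graph(f0, n)),
      strongly_connected(iteration_graph(f1, n)))
# True True: both graphs are strongly connected, as claimed.
\end{verbatim}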
With this material, we are now able to build a first chaotic neural
network, as defined in Devaney's formulation.
469 \section{A chaotic neural network in the sense of Devaney}
472 Firstly, let us build a
473 multilayer perceptron neural network modeling
474 $F_{f_0}:\llbracket 1; n \rrbracket \times \mathds{B}^n \to
475 \mathds{B}^n$ associated to the vectorial negation.
476 More precisely, for all inputs
477 $(s,x) \in \llbracket 1;n\rrbracket \times \mathds{B}^n$,
478 the output layer produces $F_{f_0}(s,x)$. It is then possible to
479 link the output layer and the input one, in order to model the
dependence between two successive iterations. As a result we obtain a
globally recurrent neural network that behaves as follows (see
482 Fig.~\ref{Fig:perceptron}).
485 \item The network is initialized with the input vector
486 $\left(S^0,x^0\right) \in \llbracket 1;n\rrbracket \times
487 \mathds{B}^n$ and computes the output vector
$x^1=F_{f_0}\left(S^0,x^0\right)$. This last vector is published as
an output of the chaotic neural network and is sent back to the
input layer through the feedback links.
491 \item When the network is activated at the $t^{th}$ iteration, the
492 state of the system $x^t \in \mathds{B}^n$ received from the output
493 layer and the initial term of the sequence $(S^t)^{t \in \Nats}$
494 ($S^0 \in \llbracket 1;n\rrbracket$) are used to compute the new
495 output. This new output, which represents the new state of the
496 dynamical system, satisfies:
498 x^{t+1}=F_{f_0}(S^0, x^t) \in \mathds{B}^n \enspace .
504 \includegraphics[scale=0.625]{perceptron}
\caption{A perceptron equivalent to chaotic iterations.}
506 \label{Fig:perceptron}
509 The behavior of the neural network is such that when the initial state
510 is $x^0~\in~\mathds{B}^n$ and a sequence $(S^t)^{t \in \Nats}$ is
511 given as outside input, then the sequence of successive published
512 output vectors $\left(x^t\right)^{t \in \mathds{N}^{\ast}}$ is exactly
513 the one produced by the chaotic iterations formally described in
Eq.~(\ref{eq:CIs}). Mathematically, this means that when fed with the same
input vectors both produce the same successive outputs
$\left(x^t\right)^{t \in \mathds{N}^{\ast}}$, and therefore that they
are equivalent reformulations of the iterations of $G_{f_0}$ in
518 $\mathcal{X}$. Finally, since the proposed neural network is built to
519 model the behavior of $G_{f_0}$, which is chaotic according to
520 Devaney's definition of chaos, we can conclude that the network is
521 also chaotic in this sense.
523 The previous construction scheme is not restricted to function $f_0$.
524 It can be extended to any function $f$ such that $G_f$ is a chaotic
525 map by training the network to model $F_{f}:\llbracket 1; n \rrbracket
526 \times \mathds{B}^n \to \mathds{B}^n$. Due to
527 Theorem~\ref{Th:Caracterisation des IC chaotiques}, we can find
528 alternative functions $f$ for $f_0$ through a simple check of their
529 graph of iterations $\Gamma(f)$. For example, we can build another
530 chaotic neural network by using $f_1$ instead of $f_0$.
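
As an illustration, and assuming the scikit-learn library, the construction
can be sketched as follows (this is ours, not the network used above): a
perceptron is trained on the full truth table of $F_{f_0}$ and then iterated
with its outputs fed back to its inputs. If the training is accurate enough,
the printed configurations coincide with the chaotic iterations of $G_{f_0}$.
\begin{verbatim}
# Sketch (assumes scikit-learn): train an MLP on the truth table of F_{f0}
# and iterate it with output-to-input feedback.
from itertools import product
from sklearn.neural_network import MLPClassifier

n = 4
X, Y = [], []
for s in range(1, n + 1):                      # strategy term s in [1, n]
    for x in product((0, 1), repeat=n):
        y = list(x)
        y[s - 1] = 1 - y[s - 1]                # F_{f0}(s, x): negate component s
        X.append([s] + list(x))
        Y.append(y)

net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)
net.fit(X, Y)                                  # multilabel: one binary output per bit

# Recurrent use: each output becomes the next input configuration.
x, strategy = [0, 1, 1, 0], [2, 1, 4, 4, 3]
for s in strategy:
    x = [int(b) for b in net.predict([[s] + x])[0]]
    print(s, x)
\end{verbatim}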
532 \section{Checking whether a neural network is chaotic or not}
535 We focus now on the case where a neural network is already available,
and for which we want to know if it is chaotic. Typically, in
many research papers neural networks are claimed to be chaotic
without any convincing mathematical proof. We propose an approach to
overcome this drawback for a particular category of multilayer
perceptrons defined below, and for Devaney's formulation of chaos.
541 In spite of this restriction, we think that this approach can be
542 extended to a large variety of neural networks. We plan to study a
543 generalization of this approach in a future work.
545 We consider a multilayer perceptron of the following form: inputs
546 are $n$ binary digits and one integer value, while outputs are $n$
547 bits. Moreover, each binary output is connected with a feedback
548 connection to an input one.
551 \item During initialization, the network is seeded with $n$~bits denoted
552 $\left(x^0_1,\dots,x^0_n\right)$ and an integer value $S^0$ that
553 belongs to $\llbracket1;n\rrbracket$.
554 \item At iteration~$t$, the last output vector
555 $\left(x^t_1,\dots,x^t_n\right)$ defines the $n$~bits used to
compute the new output one $\left(x^{t+1}_1,\dots,x^{t+1}_n\right)$,
while the remaining input receives a new integer value $S^t \in
\llbracket1;n\rrbracket$, which is provided by the outside world.
561 The topological behavior of these particular neural networks can be
562 proven to be chaotic through the following process. Firstly, we denote
563 by $F: \llbracket 1;n \rrbracket \times \mathds{B}^n \rightarrow
564 \mathds{B}^n$ the function that maps the value
565 $\left(s,\left(x_1,\dots,x_n\right)\right) \in \llbracket 1;n
566 \rrbracket \times \mathds{B}^n$ into the value
567 $\left(y_1,\dots,y_n\right) \in \mathds{B}^n$, where
568 $\left(y_1,\dots,y_n\right)$ is the response of the neural network
569 after the initialization of its input layer with
570 $\left(s,\left(x_1,\dots, x_n\right)\right)$. Secondly, we define $f:
571 \mathds{B}^n \rightarrow \mathds{B}^n$ such that
572 $f\left(x_1,x_2,\dots,x_n\right)$ is equal to
\left(F\left(1,\left(x_1,x_2,\dots,x_n\right)\right)_1,\dots,
F\left(n,\left(x_1,x_2,\dots,x_n\right)\right)_n\right) \enspace .
577 Then $F=F_f$. If this recurrent neural network is seeded with
578 $\left(x_1^0,\dots,x_n^0\right)$ and $S \in \llbracket 1;n
579 \rrbracket^{\mathds{N}}$, it produces exactly the
580 same output vectors than the
581 chaotic iterations of $F_f$ with initial
582 condition $\left(S,(x_1^0,\dots, x_n^0)\right) \in \llbracket 1;n
583 \rrbracket^{\mathds{N}} \times \mathds{B}^n$.
Theoretically speaking, such iterations of $F_f$ are thus a formal model of
this kind of recurrent neural network. In the rest of this
586 paper, we will call such multilayer perceptrons CI-MLP($f$), which
587 stands for ``Chaotic Iterations based MultiLayer Perceptron''.
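
In practice, recovering $f$ only requires querying the network once per pair
$(i,x)$, as in the following sketch (ours; the \texttt{network} function below
is a stand-in for a trained CI-MLP, here chosen to implement $F_{f_1}$ exactly):
\begin{verbatim}
# Sketch: recover f from a black-box network via f_i(x) = F(i, x)_i.
from itertools import product

def network(s, x):
    """Stand-in for a trained network's response F(s, x); here it happens
    to implement F_{f1}, but any black box with this signature works."""
    f1 = [1 - x[0]] + list(x[:-1])
    y = list(x)
    y[s - 1] = f1[s - 1]
    return y

def recover_f(network, n):
    table = {}
    for x in product((0, 1), repeat=n):
        table[x] = tuple(network(i + 1, list(x))[i] for i in range(n))
    return lambda x: list(table[tuple(x)])

n = 4
f = recover_f(network, n)
print(f([1, 1, 1, 1]))   # [0, 1, 1, 1] with this stand-in
# Feeding f to the strong-connectivity check of a previous sketch then
# decides whether CI-MLP(f) is chaotic in the sense of Devaney.
\end{verbatim}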
Checking if CI-MLP($f$) behaves chaotically according to Devaney's
definition of chaos is simple: we just need to verify whether the
associated graph of iterations $\Gamma(f)$ is strongly connected or
592 not. As an incidental consequence, we finally obtain an equivalence
593 between chaotic iterations and CI-MLP($f$). Therefore, we can
594 obviously study such multilayer perceptrons with mathematical tools
595 like topology to establish, for example, their convergence or,
596 contrarily, their unpredictable behavior. An example of such a study
597 is given in the next section.
599 \section{Topological properties of chaotic neural networks}
602 Let us first recall two fundamental definitions from the mathematical
605 \begin{definition} \label{def8}
606 A function $f$ is said to be {\emph{ expansive}} if $\exists
607 \varepsilon>0$, $\forall x \neq y$, $\exists n \in \mathds{N}$ such
608 that $d\left(f^n(x),f^n(y)\right) \geq \varepsilon$.
611 \begin{definition} \label{def9}
612 A discrete dynamical system is said to be {\emph{ topologically mixing}}
613 if and only if, for any pair of disjoint open sets $U$,$V \neq
614 \emptyset$, we can find some $n_0 \in \mathds{N}$ such that for any $n$,
615 $n\geq n_0$, we have $f^n(U) \cap V \neq \emptyset$.
618 As proven in Ref.~\cite{gfb10:ip}, chaotic iterations are expansive
619 and topologically mixing when $f$ is the vectorial negation $f_0$.
Consequently, these properties are inherited by the CI-MLP($f_0$)
recurrent neural network previously presented, which induces a greater
unpredictability. Any difference in the initial value of the input
layer is in particular magnified until it becomes equal to the expansivity
626 Let us then focus on the consequences for a neural network to be chaotic
627 according to Devaney's definition. First of all, the topological
628 transitivity property implies indecomposability.
630 \begin{definition} \label{def10}
631 A dynamical system $\left( \mathcal{X}, f\right)$ is
632 {\emph{not decomposable}} if it is not the union of two closed sets $A, B
633 \subset \mathcal{X}$ such that $f(A) \subset A, f(B) \subset B$.
636 \noindent Hence, reducing the set of outputs generated by CI-MLP($f$),
637 in order to simplify its complexity, is impossible if $\Gamma(f)$ is
638 strongly connected. Moreover, under this hypothesis CI-MLPs($f$) are
641 \begin{definition} \label{def11}
642 A dynamical system $\left( \mathcal{X}, f\right)$ is {\emph{ strongly
643 transitive}} if $\forall x,y \in \mathcal{X}$, $\forall r>0$, $\exists
644 z \in \mathcal{X}$, $d(z,x)~\leq~r \Rightarrow \exists n \in
645 \mathds{N}^{\ast}$, $f^n(z)=y$.
647 According to this definition, for all pairs of points $(x, y)$ in the
648 phase space, a point $z$ can be found in the neighborhood of $x$ such
649 that one of its iterates $f^n(z)$ is $y$. Indeed, this result has been
650 established during the proof of the transitivity presented in
651 Ref.~\cite{guyeux09}. Among other things, the strong transitivity
652 leads to the fact that without the knowledge of the initial input
653 layer, all outputs are possible. Additionally, no point of the output
654 space can be discarded when studying CI-MLPs: this space is
655 intrinsically complicated and it cannot be decomposed or simplified.
657 Furthermore, those recurrent neural networks exhibit the instability
660 A dynamical system $\left( \mathcal{X}, f\right)$ is unstable if for
661 all $x \in \mathcal{X}$, the orbit $\gamma_x:n \in \mathds{N}
662 \longmapsto f^n(x)$ is unstable, that means: $\exists \varepsilon >
663 0$, $\forall \delta>0$, $\exists y \in \mathcal{X}$, $\exists n \in
664 \mathds{N}$, such that $d(x,y)<\delta$ and
665 $d\left(\gamma_x(n),\gamma_y(n)\right) \geq \varepsilon$.
667 This property, which is implied by the sensitive point dependence on
initial conditions, leads to the fact that in all neighborhoods of any
point $x$, there are points that will be separated from $x$ by at least
$\varepsilon$ in the future through iterations of the CI-MLP($f$). Thus, we can claim that
671 the behavior of these MLPs is unstable when $\Gamma (f)$ is strongly
674 Let us now consider a compact metric space $(M, d)$ and $f: M
675 \rightarrow M$ a continuous map. For each natural number $n$, a new
676 metric $d_n$ is defined on $M$ by
677 $$d_n(x,y)=\max\{d(f^i(x),f^i(y)): 0\leq i<n\} \enspace .$$
679 Given any $\varepsilon > 0$ and $n \geqslant 1$, two points of $M$ are
680 $\varepsilon$-close with respect to this metric if their first $n$
681 iterates are $\varepsilon$-close.
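
This metric can be written directly from its definition; the toy sketch below
(ours) takes the map $f$ and the ground distance $d$ as parameters and
illustrates it on the doubling map of the unit circle:
\begin{verbatim}
# Sketch: Bowen metric d_n(x, y) = max_{0 <= i < n} d(f^i(x), f^i(y)).
def d_n(f, d, x, y, n):
    value = 0.0
    for _ in range(n):
        value = max(value, d(x, y))
        x, y = f(x), f(y)
    return value

# Toy usage: doubling map on the circle, with the circle distance as d.
double = lambda t: (2 * t) % 1.0
circle = lambda a, b: min(abs(a - b), 1 - abs(a - b))
print(d_n(double, circle, 0.1000, 0.1001, 10))
# about 0.05: two points initially 1e-4 apart separate after a few iterates
\end{verbatim}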
683 This metric allows one to distinguish in a neighborhood of an orbit
684 the points that move away from each other during the iteration from
685 the points that travel together. A subset $E$ of $M$ is said to be
686 $(n, \varepsilon)$-separated if each pair of distinct points of $E$ is
687 at least $\varepsilon$ apart in the metric $d_n$. Denote by $H(n,
688 \varepsilon)$ the maximum cardinality of an $(n,
\varepsilon)$-separated set.
691 The {\it topological entropy} of the map $f$ is defined by (see e.g.,
692 Ref.~\cite{Adler65} or Ref.~\cite{Bowen})
693 $$h(f)=\lim_{\varepsilon\to 0} \left(\limsup_{n\to \infty}
694 \frac{1}{n}\log H(n,\varepsilon)\right) \enspace .$$
697 Then we have the following result \cite{GuyeuxThese10},
699 $\left( \mathcal{X},d\right)$ is compact and the topological entropy
700 of $(\mathcal{X},G_{f_0})$ is infinite.
705 \includegraphics[scale=0.625]{scheme}
\caption{Summary of addressed neural networks and chaos problems.}
Figure~\ref{Fig:scheme} summarizes the addressed neural networks and chaos problems.
Section~\ref{S2} has explained how to construct a truly chaotic neural
network such as $A$, for instance.
Section~\ref{S3} has shown how to check whether a given MLP, such as
$A$ or $C$, is chaotic or not in the sense of Devaney.
715 %, and how to study its topological behavior.
The last thing to investigate, when comparing
neural networks and Devaney's chaos, is to determine whether
an artificial neural network $A$ is able to learn or predict some chaotic
behaviors of $B$, as defined in Devaney's formulation (when they
are not specifically constructed for this purpose). This question is
studied in the next section.
729 \section{Suitability of Artificial Neural Networks
730 for Predicting Chaotic Behaviors}
In the context of computer science, different topic areas have an
interest in chaos, for instance steganographic
techniques~\cite{1309431,Zhang2005759}. Steganography consists in
embedding a secret message within an ordinary one, while the
extraction of the secret message takes place at the destination. The reverse ({\it i.e.},
737 automatically detecting the presence of hidden messages inside media)
738 is called steganalysis. Among the deployed strategies inside
detectors, there are support vector
740 machines~\cite{Qiao:2009:SM:1704555.1704664}, neural
741 networks~\cite{10.1109/ICME.2003.1221665,10.1109/CIMSiM.2010.36}, and
742 Markov chains~\cite{Sullivan06steganalysisfor}. Most of these
743 detectors give quite good results and are rather competitive when
facing steganographic tools. However, to the best of our knowledge
none of the considered information hiding schemes fulfills Devaney's
definition of chaos~\cite{Devaney}. Hence, one can wonder whether
747 detectors continue to give good results when facing truly chaotic
748 schemes. More generally, there remains the open problem of deciding
749 whether artificial intelligence is suitable for predicting topological
752 \subsection{Representing Chaotic Iterations for Neural Networks}
753 \label{section:translation}
755 The problem of deciding whether classical feedforward ANNs are
suitable to approximate topological chaotic iterations may then be
reduced to evaluating ANNs on iterations of functions with a Strongly
Connected Component (SCC)~graph of iterations. To compare with
759 non-chaotic iterations, the experiments detailed in the following
760 sections are carried out using both kinds of function (chaotic and
non-chaotic). Let us emphasize the difference between this kind of
762 neural networks and the Chaotic Iterations based MultiLayer
We are then left to construct two disjoint function sets that contain
either functions with topological chaos properties or functions without
them, depending on the strong connectivity of their iteration graph. This can be
768 achieved for instance by removing a set of edges from the iteration
769 graph $\Gamma(f_0)$ of the vectorial negation function~$f_0$. One can
770 deduce whether a function verifies the topological chaos property or
771 not by checking the strong connectivity of the resulting graph of
774 For instance let us consider the functions $f$ and $g$ from $\Bool^4$
775 to $\Bool^4$ respectively defined by the following lists:
776 $$[0, 0, 2, 3, 13, 13, 6, 3, 8, 9, 10, 11, 8, 13, 14,
777 15]$$ $$\mbox{and } [11, 14, 13, 14, 11, 10, 1, 8, 7, 6, 5, 4, 3, 2,
778 1, 0] \enspace.$$ In other words, the image of $0011$ by $g$ is
779 $1110$: it is obtained as the binary value of the fourth element in
780 the second list (namely~14). It is not hard to verify that
781 $\Gamma(f)$ is not SCC (\textit{e.g.}, $f(1111)$ is $1111$) whereas
782 $\Gamma(g)$ is. Next section shows how to translate iterations of
783 such functions into a model amenable to be learned by an ANN.
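
The list representation can be decoded as in the short sketch below (ours),
which checks the two facts just mentioned; applying the strong-connectivity
test sketched earlier to the recovered maps then separates the two cases:
\begin{verbatim}
# Sketch: turn the list representations of f and g (maps on B^4) into functions.
f_list = [0, 0, 2, 3, 13, 13, 6, 3, 8, 9, 10, 11, 8, 13, 14, 15]
g_list = [11, 14, 13, 14, 11, 10, 1, 8, 7, 6, 5, 4, 3, 2, 1, 0]

def as_map(table, n=4):
    """Entry number int(x) of the list, read as an n-bit word, is the image of x."""
    return lambda x: [int(b)
                      for b in format(table[int("".join(map(str, x)), 2)], "0%db" % n)]

f, g = as_map(f_list), as_map(g_list)
print(g([0, 0, 1, 1]))   # [1, 1, 1, 0]: the fourth entry of the list, 14, in binary
print(f([1, 1, 1, 1]))   # [1, 1, 1, 1]: a fixed point, so Gamma(f) is not
                         # strongly connected
\end{verbatim}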
We now present how (non-)chaotic iterations of $G_f$ are
translated into another model more suited to artificial neural
787 networks. Formally, input and output vectors are pairs~$((S^t)^{t \in
788 \Nats},x)$ and $\left(\sigma((S^t)^{t \in \Nats}),F_{f}(S^0,x)\right)$
789 as defined in~Eq.~(\ref{eq:Gf}).
791 Firstly, let us focus on how to memorize configurations. Two distinct
792 translations are proposed. In the first case, we take one input in
793 $\Bool$ per component; in the second case, configurations are
memorized as natural numbers. A coarse attempt to memorize a
configuration as a natural number could consist in labeling each
configuration with its translation into the decimal numeral system.
However, such a representation induces too many changes between a
configuration labeled by a power of two and its direct previous
configuration: for instance, 16~(10000) and 15~(01111) are close in a
decimal ordering, but their Hamming distance is 5. This is why Gray
801 codes~\cite{Gray47} have been preferred.
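
A binary-reflected Gray code gives consecutive integers codes at Hamming
distance~1; in Python this is a one-liner (sketch, ours):
\begin{verbatim}
# Sketch: binary-reflected Gray code; consecutive integers always differ by
# one bit, unlike the plain binary labels 15 -> 01111 and 16 -> 10000.
def gray(i: int) -> int:
    return i ^ (i >> 1)

print(format(gray(15), "05b"), format(gray(16), "05b"))  # 01000 11000
\end{verbatim}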
803 Let us secondly detail how to deal with strategies. Obviously, it is not
804 possible to translate in a finite way an infinite strategy, even if
805 both $(S^t)^{t \in \Nats}$ and $\sigma((S^t)^{t \in \Nats})$ belong to
$\{1,\ldots,n\}^{\Nats}$. Input strategies are then reduced to a
length $l \in \llbracket 2,k\rrbracket$, where $k$ is a
parameter of the evaluation. Notice that $l$ is greater than or equal
809 to $2$ since we do not want the shift $\sigma$~function to return an
empty strategy. Strategies are memorized as natural numbers expressed
in base $n+1$: at each iteration, either none or one component (among
the $n$ components) is modified, leading to $n+1$ possible digit
values. Finally, we give another input, namely $m \in \llbracket
814 1,l-1\rrbracket$, which is the number of successive iterations that
815 are applied starting from $x$. Outputs are translated with the same
818 To address the complexity issue of the problem, let us compute the
819 size of the data set an ANN has to deal with. Each input vector of an
820 input-output pair is composed of a configuration~$x$, an excerpt $S$
821 of the strategy to iterate of size $l \in \llbracket 2, k\rrbracket$,
822 and a number $m \in \llbracket 1, l-1\rrbracket$ of iterations that
825 Firstly, there are $2^n$ configurations $x$, with $n^l$ strategies of
826 size $l$ for each of them. Secondly, for a given configuration there
827 are $\omega = 1 \times n^2 + 2 \times n^3 + \ldots+ (k-1) \times n^k$
828 ways of writing the pair $(m,S)$. Furthermore, it is not hard to
831 \displaystyle{(n-1) \times \omega = (k-1)\times n^{k+1} - \sum_{i=2}^k n^i} \nonumber
836 \dfrac{(k-1)\times n^{k+1}}{n-1} - \dfrac{n^{k+1}-n^2}{(n-1)^2} \enspace . \nonumber
838 \noindent And then, finally, the number of input-output pairs for our
841 2^n \times \left(\dfrac{(k-1)\times n^{k+1}}{n-1} - \dfrac{n^{k+1}-n^2}{(n-1)^2}\right) \enspace .
843 For instance, for $4$ binary components and a strategy of at most
844 $3$~terms we obtain 2304~input-output pairs.
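
This count can be cross-checked by direct enumeration, as in the following
sketch (ours):
\begin{verbatim}
# Sketch: count of input-output pairs for n components and strategy lengths
# l in [2, k], compared with the closed formula above.
def count_pairs(n, k):
    # per configuration: n^l strategies of length l and m in [1, l-1] iterations
    return 2 ** n * sum((l - 1) * n ** l for l in range(2, k + 1))

def closed_formula(n, k):
    return 2 ** n * ((k - 1) * n ** (k + 1) / (n - 1)
                     - (n ** (k + 1) - n ** 2) / (n - 1) ** 2)

print(count_pairs(4, 3), closed_formula(4, 3))  # 2304 2304.0
\end{verbatim}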
846 \subsection{Experiments}
847 \label{section:experiments}
To study if chaotic iterations can be predicted, we choose to train
MultiLayer Perceptrons. As stated before, this kind of network is
851 in particular well-known for its universal approximation
852 property. Furthermore, MLPs have been already considered for chaotic
853 time series prediction. For example, in~\cite{dalkiran10} the authors
have shown that a feedforward MLP with two hidden layers, trained
with Bayesian regularization back-propagation, can successfully learn the
dynamics of Chua's circuit.
858 In these experiments we consider MLPs having one hidden layer of
859 sigmoidal neurons and output neurons with a linear activation
function. They are trained using the Limited-memory
Broyden-Fletcher-Goldfarb-Shanno quasi-Newton algorithm in combination
with the Wolfe line search. The training process is performed until
863 a maximum number of epochs is reached. To prevent overfitting and to
864 estimate the generalization performance we use holdout validation by
865 splitting the data set into learning, validation, and test subsets.
866 These subsets are obtained through random selection such that their
867 respective size represents 65\%, 10\%, and 25\% of the whole data set.
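
An approximation of this setup with off-the-shelf tools is sketched below
(ours, assuming scikit-learn, whose L-BFGS solver is used as a rough stand-in
for the training algorithm described above since the Wolfe line search is not
exposed; \texttt{X} and \texttt{Y} denote the encoded input-output pairs of
the previous subsection):
\begin{verbatim}
# Sketch (assumes scikit-learn): one hidden layer of sigmoidal units, linear
# outputs, L-BFGS training, 65/10/25 learning/validation/test split.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

def train_and_score(X, Y, hidden=10, epochs=500, seed=0):
    X_learn, X_rest, Y_learn, Y_rest = train_test_split(
        X, Y, train_size=0.65, random_state=seed)
    X_val, X_test, Y_val, Y_test = train_test_split(
        X_rest, Y_rest, test_size=25 / 35, random_state=seed)
    # the validation subset is held out here only to mirror the proportions
    net = MLPRegressor(hidden_layer_sizes=(hidden,), activation="logistic",
                       solver="lbfgs", max_iter=epochs, random_state=seed)
    net.fit(X_learn, Y_learn)
    # per-output success rate: fraction of test pairs predicted exactly
    pred = np.rint(net.predict(X_test))
    return (pred == np.asarray(Y_test)).mean(axis=0)
\end{verbatim}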
Several neural networks are trained for both iteration coding
schemes. In both cases iterations have the following layout:
871 configurations of four components and strategies with at most three
872 terms. Thus, for the first coding scheme a data set pair is composed
873 of 6~inputs and 5~outputs, while for the second one it is respectively
874 3~inputs and 2~outputs. As noticed at the end of the previous section,
875 this leads to data sets that consist of 2304~pairs. The networks
876 differ in the size of the hidden layer and the maximum number of
877 training epochs. We remember that to evaluate the ability of neural
878 networks to predict a chaotic behavior for each coding scheme, the
879 trainings of two data sets, one of them describing chaotic iterations,
882 Thereafter we give, for the different learning setups and data sets,
883 the mean prediction success rate obtained for each output. These
values are computed over 10~trainings with random subset
construction and random initialization of weights and biases. Firstly, neural
886 networks having 10 and 25~hidden neurons are trained, with a maximum
887 number of epochs that takes its value in $\{125,250,500\}$ (see
888 Tables~\ref{tab1} and \ref{tab2}). Secondly, we refine the second
889 coding scheme by splitting the output vector such that each output is
890 learned by a specific neural network (Table~\ref{tab3}). In this last
case, we increase the size of the hidden layer up to 40~neurons, and
we consider a larger number of epochs.
\caption{Prediction success rates for configurations expressed as Boolean vectors.}
898 \begin{tabular}{|c|c||c|c|c|}
\multicolumn{5}{|c|}{Network topology: 6~inputs, 5~outputs, and one hidden layer} \\
903 \multicolumn{2}{|c||}{Hidden neurons} & \multicolumn{3}{c|}{10 neurons} \\
905 \multicolumn{2}{|c||}{Epochs} & 125 & 250 & 500 \\
907 \multirow{6}{*}{\rotatebox{90}{Chaotic}}&Output~(1) & 90.92\% & 91.75\% & 91.82\% \\
908 & Output~(2) & 69.32\% & 78.46\% & 82.15\% \\
909 & Output~(3) & 68.47\% & 78.49\% & 82.22\% \\
910 & Output~(4) & 91.53\% & 92.37\% & 93.4\% \\
911 & Config. & 36.10\% & 51.35\% & 56.85\% \\
912 & Strategy~(5) & 1.91\% & 3.38\% & 2.43\% \\
914 \multirow{6}{*}{\rotatebox{90}{Non-chaotic}}&Output~(1) & 97.64\% & 98.10\% & 98.20\% \\
915 & Output~(2) & 95.15\% & 95.39\% & 95.46\% \\
916 & Output~(3) & 100\% & 100\% & 100\% \\
917 & Output~(4) & 97.47\% & 97.90\% & 97.99\% \\
918 & Config. & 90.52\% & 91.59\% & 91.73\% \\
919 & Strategy~(5) & 3.41\% & 3.40\% & 3.47\% \\
922 \multicolumn{2}{|c||}{Hidden neurons} & \multicolumn{3}{c|}{25 neurons} \\ %& \multicolumn{3}{|c|}{40 neurons} \\
924 \multicolumn{2}{|c||}{Epochs} & 125 & 250 & 500 \\ %& 125 & 250 & 500 \\
926 \multirow{6}{*}{\rotatebox{90}{Chaotic}}&Output~(1) & 91.65\% & 92.69\% & 93.93\% \\ %& 91.94\% & 92.89\% & 94.00\% \\
927 & Output~(2) & 72.06\% & 88.46\% & 90.5\% \\ %& 74.97\% & 89.83\% & 91.14\% \\
928 & Output~(3) & 79.19\% & 89.83\% & 91.59\% \\ %& 76.69\% & 89.58\% & 91.84\% \\
929 & Output~(4) & 91.61\% & 92.34\% & 93.47\% \\% & 82.77\% & 92.93\% & 93.48\% \\
930 & Config. & 48.82\% & 67.80\% & 70.97\% \\%& 49.46\% & 68.94\% & 71.11\% \\
931 & Strategy~(5) & 2.62\% & 3.43\% & 3.78\% \\% & 3.10\% & 3.10\% & 3.03\% \\
933 \multirow{6}{*}{\rotatebox{90}{Non-chaotic}}&Output~(1) & 97.87\% & 97.99\% & 98.03\% \\ %& 98.16\% \\
934 & Output~(2) & 95.46\% & 95.84\% & 96.75\% \\ % & 97.4\% \\
935 & Output~(3) & 100\% & 100\% & 100\% \\%& 100\% \\
936 & Output~(4) & 97.77\% & 97.82\% & 98.06\% \\%& 98.31\% \\
937 & Config. & 91.36\% & 91.99\% & 93.03\% \\%& 93.98\% \\
938 & Strategy~(5) & 3.37\% & 3.44\% & 3.29\% \\%& 3.23\% \\
944 Table~\ref{tab1} presents the rates obtained for the first coding
945 scheme. For the chaotic data, it can be seen that as expected
configuration prediction becomes better when the number of hidden
neurons and the maximum number of epochs increase: an improvement by a factor of two
is observed (from 36.10\% for 10~neurons and 125~epochs to 70.97\% for
949 25~neurons and 500~epochs). We also notice that the learning of
950 outputs~(2) and~(3) is more difficult. Conversely, for the
951 non-chaotic case the simplest training setup is enough to predict
952 configurations. For all network topologies and all outputs the
953 obtained results for the non-chaotic case outperform the chaotic
954 ones. Finally, the rates for the strategies show that the different
955 networks are unable to learn them.
957 For the second coding scheme (\textit{i.e.}, with Gray Codes)
958 Table~\ref{tab2} shows that any network
959 learns about five times more non-chaotic configurations than chaotic
960 ones. As in the previous scheme, the strategies cannot be predicted.
962 Let us now compare the two coding schemes. Firstly, the second scheme
disturbs the learning process. In fact, in this scheme the
configuration is always expressed as a natural number, whereas in the
first one the number of inputs grows with the size of the Boolean
vectors coding the configurations. In the latter case, the coding gives
finer information on the configuration evolution.
\caption{Prediction success rates for configurations expressed with Gray code.}
973 \begin{tabular}{|c|c||c|c|c|}
\multicolumn{5}{|c|}{Network topology: 3~inputs, 2~outputs, and one hidden layer} \\
978 & Hidden neurons & \multicolumn{3}{c|}{10 neurons} \\
980 & Epochs & 125 & 250 & 500 \\ %& 1000
982 \multirow{2}{*}{Chaotic}& Config.~(1) & 13.29\% & 13.55\% & 13.08\% \\ %& 12.5\%
983 & Strategy~(2) & 0.50\% & 0.52\% & 1.32\% \\ %& 1.42\%
985 \multirow{2}{*}{Non-Chaotic}&Config.~(1) & 77.12\% & 74.00\% & 72.60\% \\ %& 75.81\%
986 & Strategy~(2) & 0.42\% & 0.80\% & 1.16\% \\ %& 1.42\%
989 & Hidden neurons & \multicolumn{3}{c|}{25 neurons} \\
991 & Epochs & 125 & 250 & 500 \\ %& 1000
993 \multirow{2}{*}{Chaotic}& Config.~(1) & 12.27\% & 13.15\% & 13.05\% \\ %& 15.44\%
994 & Strategy~(2) & 0.71\% & 0.66\% & 0.88\% \\ %& 1.73\%
996 \multirow{2}{*}{Non-Chaotic}&Config.~(1) & 73.60\% & 74.70\% & 75.89\% \\ %& 68.32\%
997 & Strategy~(2) & 0.64\% & 0.97\% & 1.23\% \\ %& 1.80\%
1002 Unfortunately, in practical applications the number of components is
1003 usually unknown. Hence, the first coding scheme cannot be used
1004 systematically. Therefore, we provide a refinement of the second
1005 scheme: each output is learned by a different ANN. Table~\ref{tab3}
presents the results for this approach. In any case, whatever the
network topology, the maximum epoch number, and the kind of
iterations, the configuration success rate is slightly improved.
Moreover, the strategy prediction rates reach almost 12\%, whereas
in Table~\ref{tab2} they never exceed 1.5\%. Despite this
improvement, a long-term prediction of chaotic iterations still
1016 \caption{Prediction success rates for split outputs.}
1019 \begin{tabular}{|c||c|c|c|}
\multicolumn{4}{|c|}{Network topology: 3~inputs, 1~output, and one hidden layer} \\
1024 Epochs & 125 & 250 & 500 \\
1027 Chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
1029 10~neurons & 12.39\% & 14.06\% & 14.32\% \\
1030 25~neurons & 13.00\% & 14.28\% & 14.58\% \\
1031 40~neurons & 11.58\% & 13.47\% & 14.23\% \\
1034 Non chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
1036 %Epochs & 125 & 250 & 500 \\
1038 10~neurons & 76.01\% & 74.04\% & 78.16\% \\
1039 25~neurons & 76.60\% & 72.13\% & 75.96\% \\
1040 40~neurons & 76.34\% & 75.63\% & 77.50\% \\
1043 Chaotic/non chaotic & \multicolumn{3}{c|}{Output = Strategy} \\
1045 %Epochs & 125 & 250 & 500 \\
1047 10~neurons & 0.76\% & 0.97\% & 1.21\% \\
1048 25~neurons & 1.09\% & 0.73\% & 1.79\% \\
1049 40~neurons & 0.90\% & 1.02\% & 2.15\% \\
1051 \multicolumn{4}{c}{} \\
1053 Epochs & 1000 & 2500 & 5000 \\
1056 Chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
1058 10~neurons & 14.51\% & 15.22\% & 15.22\% \\
1059 25~neurons & 16.95\% & 17.57\% & 18.46\% \\
1060 40~neurons & 17.73\% & 20.75\% & 22.62\% \\
1063 Non chaotic & \multicolumn{3}{c|}{Output = Configuration} \\
1065 %Epochs & 1000 & 2500 & 5000 \\
1067 10~neurons & 78.98\% & 80.02\% & 79.97\% \\
1068 25~neurons & 79.19\% & 81.59\% & 81.53\% \\
1069 40~neurons & 79.64\% & 81.37\% & 81.37\% \\
1072 Chaotic/non chaotic & \multicolumn{3}{c|}{Output = Strategy} \\
1074 %Epochs & 1000 & 2500 & 5000 \\
1076 10~neurons & 3.47\% & 9.98\% & 11.66\% \\
1077 25~neurons & 3.92\% & 8.63\% & 10.09\% \\
1078 40~neurons & 3.29\% & 7.19\% & 7.18\% \\
1083 \section{Conclusion}
1085 In this paper, we have established an equivalence between chaotic
iterations, according to Devaney's definition of chaos, and a
1087 class of multilayer perceptron neural networks. Firstly, we have
1088 described how to build a neural network that can be trained to learn a
given chaotic map function. Then, we found a condition that allows one to
check whether the iterations induced by a function are chaotic or not,
1091 and thus if a chaotic map is obtained. Thanks to this condition our
approach is not limited to a particular function. In the dual case, we
have shown that checking if a neural network is chaotic consists in
1094 verifying a property on an associated graph, called the graph of
1095 iterations. These results are valid for recurrent neural networks
1096 with a particular architecture. However, we believe that a similar
1097 work can be done for other neural network architectures. Finally, we
1098 have discovered at least one family of problems with a reasonable
1099 size, such that artificial neural networks should not be applied in
1100 the presence of chaos, due to their inability to learn chaotic
1101 behaviors in this context. Such a consideration is not reduced to a
1102 theoretical detail: this family of discrete iterations is concretely
1103 implemented in a new steganographic method \cite{guyeux10ter}. As
1104 steganographic detectors embed tools like neural networks to
distinguish between original and stego contents, our studies tend to
prove that such detectors might be unable to tackle
chaos-based information hiding schemes. Furthermore, iterations such that not all
of the components are updated at each step are very common in
biological and physical mechanisms. Therefore, one can reasonably
1110 wonder whether neural networks should be applied in these contexts.
1112 In future work we intend to enlarge the comparison between the
1113 learning of truly chaotic and non-chaotic behaviors. Other
1114 computational intelligence tools such as support vector machines will
1115 be investigated too, to discover which tools are the most relevant
when facing a truly chaotic phenomenon. A comparison between learning
success rate and prediction quality will be carried out. Concrete
1118 consequences in biology, physics, and computer science security fields
1119 will be stated. Lastly, thresholds separating systems depending on
1120 the ability to learn their dynamics will be established.
1122 \bibliography{chaos-paper}% Produces the bibliography via BibTeX.
1126 % ****** End of file chaos-paper.tex ******