X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/chaos1.git/blobdiff_plain/9e6c76e39059d7f227f7ed48ad71195374a2f2a5..refs/heads/master:/main.tex?ds=inline

diff --git a/main.tex b/main.tex
index de92eaa..49564db 100644
--- a/main.tex
+++ b/main.tex
@@ -40,7 +40,7 @@ preprint,%
 \begin{document}
 
 \title[Neural Networks and Chaos]{Neural Networks and Chaos:
-Construction, Evaluation of Chaotic Networks \\
+Construction, Evaluation of Chaotic Networks, \\
 and Prediction of Chaos with Multilayer Feedforward Networks
 }
 
@@ -88,7 +88,6 @@ far more difficult to learn.
 \maketitle
 
 \begin{quotation}
-
 Chaotic neural networks have received a lot of attention due to the
 appealing properties of deterministic chaos (unpredictability,
 sensitivity, and so on). However, such networks are often claimed
@@ -97,7 +96,7 @@ work a theoretical framework based on the Devaney's definition of
 chaos is introduced. Starting with a relationship between discrete
 iterations and Devaney's chaos, we firstly show how to build a
 recurrent neural network that is equivalent to a chaotic map and
-secondly a way to check whether an already available network, is
+secondly a way to check whether an already available network is
 chaotic or not. We also study different topological properties of
 these truly chaotic neural networks. Finally, we show that the
 learning, with neural networks having a classical feedforward
@@ -110,7 +109,7 @@ chaotic maps, is far more difficult than non chaotic behaviors.
 \label{S1}
 
 Several research works have proposed or used chaotic neural networks
-these last years. The complex dynamics of such a network leads to
+in recent years. The complex dynamics of such networks lead to
 various potential application areas: associative
 memories~\cite{Crook2007267} and digital security tools like hash
 functions~\cite{Xiao10}, digital
@@ -136,8 +135,8 @@ are suitable to model nonlinear relationships between data, due to
 their universal approximator capacity
 \cite{Cybenko89,DBLP:journals/nn/HornikSW89}. Thus, this kind of
 network can be trained to model a physical phenomenon known to be
-chaotic such as Chua's circuit \cite{dalkiran10}. Sometimes, a neural
-network which is build by combining transfer functions and initial
+chaotic such as Chua's circuit \cite{dalkiran10}. Sometimes a neural
+network, which is built by combining transfer functions and initial
 conditions that are both chaotic, is itself claimed to be chaotic
 \cite{springerlink:10.1007/s00521-010-0432-2}.
 
@@ -151,10 +150,10 @@ a dynamical system: ergodicity, expansivity, and so on. More
 precisely, in this paper, which is an extension of a previous work
 \cite{bgs11:ip}, we establish the equivalence between chaotic
 iterations and a class of globally recurrent MLP. The second
-contribution is a study of the converse problem, indeed we study the
-ability of classical multiLayer perceptrons to learn a particular
+contribution is a study of the converse problem: we investigate
+the ability of classical multilayer perceptrons to learn a particular
 family of discrete chaotic dynamical systems. This family is defined
-by a Boolean vector, an update function, and a sequence defining which
+by a Boolean vector, an update function, and a sequence defining the
 component to update at each iteration. It has been previously
 established that such dynamical systems are chaotically iterated (as
 it is defined by Devaney) when the chosen function has a strongly
 connected iteration graph.
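As a concrete illustration of this family of systems, the following sketch (written for this revision, not code from the chaos1 repository) iterates a Boolean vector by updating, at each time step, the single component designated by the strategy sequence:

    # Illustrative sketch: a discrete dynamical system defined by a Boolean
    # vector x, an update function f on B^n, and a strategy S giving the
    # component to update at each iteration (0-based here).
    from typing import Callable, List, Sequence

    def asynchronous_iterations(f: Callable[[List[bool]], List[bool]],
                                x0: List[bool],
                                strategy: Sequence[int]) -> List[List[bool]]:
        """Return the successive states x^0, x^1, ..., x^T."""
        x, states = list(x0), [list(x0)]
        for s in strategy:
            x[s] = f(x)[s]            # only component S^t is updated
            states.append(list(x))
        return states

    # Example with the vectorial negation, the function f_0 used later on.
    f0 = lambda x: [not b for b in x]
    print(asynchronous_iterations(f0, [True, False, True], [0, 2, 1]))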
@@ -234,14 +233,15 @@ point dependence on initial conditions.
 neighborhood of its future evolution eventually overlap with any other
 given region. This property implies that a dynamical system cannot be
 broken into simpler subsystems. Intuitively, its complexity does not
-allow any simplification. On the contrary, a dense set of periodic
-points is an element of regularity that a chaotic dynamical system has
-to exhibit.
+allow any simplification.
 
 However, chaos needs some regularity to ``counteract'' the effects of
-transitivity.
+transitivity. %
 Thus, two points close to each other can behave in a completely
 different manner, the first one visiting the whole space whereas the
 second one has a regular orbit.
+In Devaney's formulation, a dense set of periodic
+points is the element of regularity that a chaotic dynamical system has
+to exhibit.
 %\begin{definition} \label{def3}
-We recall that a point $x$ is {\bf periodic point} for $f$ of
+We recall that a point $x$ is a {\bf periodic point} for $f$ of
 period~$n \in \mathds{N}^{\ast}$ if $f^{n}(x)=x$.
 %\end{definition}
 Then, the map
@@ -261,12 +261,12 @@ Let us recall that $f$ has {\bf sensitive dependence on initial
 $n > 0$ such that $d\left(f^{n}(x), f^{n}(y)\right) >\delta $. The
 value $\delta$ is called the {\bf constant of sensitivity} of $f$.
 
-Finally, The dynamical system that iterates $f$ is {\bf chaotic
+Finally, the dynamical system that iterates $f$ is {\bf chaotic
 according to Devaney} on $(\mathcal{X},\tau)$ if $f$ is regular,
 topologically transitive, and has sensitive dependence on its initial
 conditions. In what follows, iterations are said to be chaotic
-according Devaney when corresponding dynamical system is chaotic
-according Devaney.
+(according to Devaney) when the corresponding dynamical system is
+chaotic, as defined in Devaney's formulation.
 
 %Let us notice that for a metric space the last condition follows from
 %the two first ones~\cite{Banks92}.
@@ -311,15 +311,15 @@ $i$ from $x$ to $N(i,x)$ if and only if $f_i(x)$ is $N(i,x)$. In the
 sequel, the {\it strategy} $S=(S^{t})^{t \in \Nats}$ is the sequence
 defining which component to update at time $t$ and $S^{t}$ denotes its
 $t$-th term. This iteration scheme, which modifies only one
-element at each iteration is classically referred as {\it asynchronous
+element at each iteration, is usually referred to as {\it asynchronous
 iterations}. More precisely, we have for any $i$, $1\le i \le n$,
 \begin{equation}
 \left\{ \begin{array}{l}
 x^{0} \in \Bool^n \\
 x^{t+1}_i = \left\{
 \begin{array}{l}
-  f_i(x^t) \textrm{ if $S^t = i$} \\
-  x_i^t \textrm{ otherwise}
+  f_i(x^t) \textrm{ if $S^t = i$}\enspace , \\
+  x_i^t \textrm{ otherwise}\enspace .
 \end{array}
 \right.
 \end{array}
 \right.
@@ -373,15 +373,17 @@ X^{k+1}& = & G_{f}(X^{k})\\
 \right.
 \label{eq:Gf}
 \end{equation}
-where $\sigma$ is the function that removes the first term of the
-strategy ({\it i.e.},~$S^0$). This definition allows to links
-asynchronous iterations with classical iterations of a dynamical
+where $\sigma$ is the so-called shift function that removes the first
+term of the strategy ({\it i.e.},~$S^0$). This definition allows us to
+link asynchronous iterations with classical iterations of a dynamical
 system. Note that it can be extended by considering subsets for $S^t$.
 
 To study topological properties of these iterations, we are then left
 to introduce a {\bf distance} $d$ between two points $(S,x)$ and
-$(\check{S},\check{x})\in \mathcal{X} = \llbracket1;n\rrbracket^\Nats
-\times \Bool^{n}$. It is defined by
+$(\check{S},\check{x})$ in $\mathcal{X} =
+\llbracket1;n\rrbracket^\Nats \times \Bool^{n}$. Let $\Delta(x,y) = 0$
+if $x=y$, and $\Delta(x,y) = 1$ otherwise, be a distance on
+$\mathds{B}$. The distance $d$ is defined by
 \begin{equation}
 d((S,x);(\check{S},\check{x}))=d_{e}(x,\check{x})+d_{s}(S,\check{S})
 \enspace ,
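The terms $d_e$ and $d_s$ are defined in the paper outside the hunks shown here. As a rough numerical sketch (the exact formulas are assumptions of this sketch): taking $d_e$ as the Hamming distance, i.e., the sum of the $\Delta(x_j,\check{x}_j)$, and $d_s$ as a geometrically weighted sum lying in $[0,1)$ reproduces the floor/fractional-part behavior described just below.

    # Sketch of the distance d on X = [1..n]^N x B^n. The exact d_e and
    # d_s are not visible in this diff; the Hamming distance and a
    # geometric weighting (assumptions) give the floor/fractional split.
    def d_e(x, y):
        return sum(a != b for a, b in zip(x, y))      # integer part: cells

    def d_s(S, T, n, terms=16):
        return (9 / n) * sum(abs(S[k] - T[k]) / 10 ** (k + 1)
                             for k in range(min(len(S), len(T), terms)))

    def d(p, q, n):
        (S, x), (T, y) = p, q
        return d_e(x, y) + d_s(S, T, n)               # fractional: strategies

    print(d(([1, 2, 1], [0, 1, 1]), ([1, 3, 1], [0, 0, 1]), n=3))  # 1.03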
@@ -406,10 +408,10 @@ these requirements: on the one hand its floor value reflects the
 difference between the cells, on the other hand its fractional part
 measures the difference between the strategies.
 
-The relation between $\Gamma(f)$ and $G_f$ is clear: there exists a
+The relation between $\Gamma(f)$ and $G_f$ is obvious: there exists a
 path from $x$ to $x'$ in $\Gamma(f)$ if and only if there exists a
 strategy $s$ such that iterations of $G_f$ from the initial point
-$(s,x)$ reaches the configuration $x'$. Using this link,
+$(s,x)$ reach the configuration $x'$. Using this link,
 Guyeux~\cite{GuyeuxThese10} has proven that:
 \begin{theorem}%[Characterization of $\mathcal{C}$]
 \label{Th:Caracterisation des IC chaotiques}
@@ -424,7 +426,7 @@ case, iterations of the function $G_f$ as defined in
 Eq.~(\ref{eq:Gf}) are chaotic according to Devaney.
 
-Let us then define two function $f_0$ and $f_1$ both in
+Let us then define two functions $f_0$ and $f_1$, both in
 $\Bool^n\to\Bool^n$, that are used throughout this paper. The former
 is the vectorial negation, \textit{i.e.}, $f_{0}(x_{1},\dots,x_{n})
 =(\overline{x_{1}},\dots,\overline{x_{n}})$. The latter is
@@ -460,9 +462,9 @@ we obtain a global recurrent neural network that behaves as follows
 \item When the network is activated at the $t^{th}$ iteration, the
   state of the system $x^t \in \mathds{B}^n$ received from the output
   layer and the initial term of the sequence $(S^t)^{t \in \Nats}$
-  ($S^0 \in \llbracket 1;n\rrbracket$) are used to compute the new
-  output vector. This new vector, which represents the new state of
-  the dynamical system, satisfies:
+  (\textit{i.e.}, $S^0 \in \llbracket 1;n\rrbracket$) are used to
+  compute the new output vector. This new vector, which represents
+  the new state of the dynamical system, satisfies:
 \begin{equation}
 x^{t+1}=F_{f_0}(S^0, x^t) \in \mathds{B}^n \enspace .
 \end{equation}
@@ -477,20 +479,17 @@ we obtain a global recurrent neural network that behaves as follows
 The behavior of the neural network is such that when the initial
 state is $x^0~\in~\mathds{B}^n$ and a sequence $(S^t)^{t \in \Nats}$ is
-given as outside input,
-\JFC{en dire davantage sur l'outside world} %% TO BE UPDATED
-then the sequence of successive published
+given as outside input, then the sequence of successive published
 output vectors $\left(x^t\right)^{t \in \mathds{N}^{\ast}}$ is exactly
 the one produced by the chaotic iterations formally described in
-Eq.~(\ref{eq:Gf}). It means that mathematically if we use similar
+Eq.~(\ref{eq:Gf}). This means that, mathematically, when given the same
 input vectors, they both generate the same successive outputs
 $\left(x^t\right)^{t \in \mathds{N}^{\ast}}$, and therefore that they
 are equivalent reformulations of the iterations of $G_{f_0}$ in
 $\mathcal{X}$. Finally, since the proposed neural network is built to
-model the behavior of $G_{f_0}$, whose iterations are
- chaotic according to
-Devaney's definition of chaos, we can conclude that the network is
-also chaotic in this sense.
+model the behavior of $G_{f_0}$, whose iterations are chaotic
+according to Devaney's definition of chaos, we can conclude that
+the network is also chaotic in this sense.
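For small $n$, this characterization can be checked mechanically. The sketch below (an illustration for this revision, not code from the repository) builds the iteration graph $\Gamma(f)$, with an edge from $x$ to the configuration obtained by replacing component $i$ of $x$ with $f_i(x)$, and tests strong connectivity with two reachability passes:

    # Illustrative check of the characterization: if Gamma(f) is strongly
    # connected, the asynchronous iterations of f are chaotic (Devaney).
    # Configurations are tuples in {0,1}^n.
    from itertools import product

    def successors(x, f):
        """Edges of Gamma(f): replace one component i of x by f_i(x)."""
        fx = f(x)
        return {x[:i] + (fx[i],) + x[i + 1:] for i in range(len(x))}

    def reachable(adj, root):
        seen, stack = {root}, [root]
        while stack:
            for y in adj[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    def gamma_strongly_connected(f, n):
        nodes = list(product((0, 1), repeat=n))
        succ = {x: successors(x, f) for x in nodes}
        pred = {x: set() for x in nodes}
        for x, ys in succ.items():
            for y in ys:
                pred[y].add(x)
        root = nodes[0]   # strongly connected <=> root reaches all nodes
        return (len(reachable(succ, root)) == len(nodes)   # and all nodes
                and len(reachable(pred, root)) == len(nodes))  # reach it

    f0 = lambda x: tuple(1 - b for b in x)   # vectorial negation
    print(gamma_strongly_connected(f0, 3))   # True

For the vectorial negation $f_0$, $\Gamma(f_0)$ is the $n$-dimensional hypercube, which is strongly connected; this is consistent with the chaoticity of $G_{f_0}$ exploited by the construction above.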
 The previous construction scheme is not restricted to function $f_0$.
 It can be extended to any function $f$ such that $G_f$ is a chaotic
@@ -527,7 +526,6 @@ to an input one.
   compute the new output one $\left(x^{t+1}_1,\dots,x^{t+1}_n\right)$.
   Meanwhile, the remaining input receives a new integer value $S^t \in
   \llbracket1;n\rrbracket$, which is provided by the outside world.
-\JFC{en dire davantage sur l'outside world}
 \end{itemize}
 
 The topological behavior of these particular neural networks can be
@@ -558,7 +556,7 @@ condition $\left(S,(x_1^0,\dots, x_n^0)\right) \in
 \llbracket 1;n \rrbracket^{\mathds{N}} \times \mathds{B}^n$.
 Theoretically speaking, such iterations of $F_f$ are thus a formal
 model of this kind of recurrent neural network. In the rest of this
-paper, we will call such multilayer perceptrons CI-MLP($f$), which
+paper, we will call such multilayer perceptrons ``CI-MLP($f$)'', which
 stands for ``Chaotic Iterations based MultiLayer Perceptron''.
 
 Checking if CI-MLP($f$) behaves chaotically according to Devaney's
@@ -640,7 +638,7 @@ of the output space can be discarded when studying CI-MLPs: this
 space is intrinsically complicated and it cannot be decomposed or
 simplified.
 
-Furthermore, those recurrent neural networks exhibit the instability
+Furthermore, these recurrent neural networks exhibit the instability
 property:
 \begin{definition}
 A dynamical system $\left( \mathcal{X}, f\right)$ is {\bf unstable}
@@ -701,7 +699,7 @@ Figure~\ref{Fig:scheme} is a summary of addressed neural networks and
 chaos problems. In Section~\ref{S2} we have explained how to
 construct a truly chaotic neural network, $A$ for instance.
 Section~\ref{S3} has shown how to check whether a given MLP
-$A$ or $C$ is chaotic or not in the sens of Devaney, and how to study
+$A$ or $C$ is chaotic or not in the sense of Devaney, and how to study
 its topological behavior. The last thing to investigate, when
 comparing neural networks and Devaney's chaos, is to determine whether
 an artificial neural network $C$ is able to learn or predict some
@@ -720,7 +718,7 @@ extraction takes place once at destination. The reverse ({\it i.e.},
 automatically detecting the presence of hidden messages inside media)
 is called steganalysis. Among the deployed strategies inside
 detectors, there are support vector
-machines~\cite{Qiao:2009:SM:1704555.1704664}, neural
+machines~\cite{Qiao:2009:SM:1704555.1704664}, neural
 networks~\cite{10.1109/ICME.2003.1221665,10.1109/CIMSiM.2010.36}, and
 Markov chains~\cite{Sullivan06steganalysisfor}. Most of these
 detectors give quite good results and are rather competitive when
@@ -827,7 +825,7 @@ $3$~terms we obtain 2304~input-output pairs.
 \label{section:experiments}
 
 To study if chaotic iterations can be predicted, we choose to train
-the multiLayer perceptron. As stated before, this kind of network is
+the multilayer perceptron. As stated before, this kind of network is
 in particular well-known for its universal approximation property
 \cite{Cybenko89,DBLP:journals/nn/HornikSW89}. Furthermore, MLPs have
 already been considered for chaotic time series prediction. For
@@ -861,10 +859,10 @@ trainings of two data sets, one of them describing chaotic iterations,
 are compared.
 
 Thereafter we give, for the different learning setups and data sets,
-the mean prediction success rate obtained for each output. A such rate
-represent the percentage of input-output pairs belonging to the test
+the mean prediction success rate obtained for each output. Such a rate
+represents the percentage of input-output pairs belonging to the test
 subset for which the corresponding output value was correctly
-predicted. These values are computed considering 10~trainings with
+predicted. These values are computed considering 10~trainings with
 random subset construction and random initialization of the weights
 and biases.
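This evaluation protocol (the mean per-output success rate over 10 trainings, each with fresh random subsets and initializations) can be sketched as follows. The paper's own training tool is not named in this diff, so scikit-learn's MLPClassifier stands in, and the 25% test fraction matches the test subset size mentioned later (577 of the 2304 samples); both are assumptions of the sketch.

    # Sketch of the evaluation protocol: mean per-output success rate over
    # 10 trainings. scikit-learn is a stand-in; the paper's tool is unnamed.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    def mean_success_rates(X, Y, hidden=10, epochs=500, runs=10):
        """X: inputs, Y: binary outputs of shape (samples, outputs)."""
        rates = []
        for seed in range(runs):         # new subsets and weights each run
            Xtr, Xte, Ytr, Yte = train_test_split(
                X, Y, test_size=0.25, random_state=seed)
            net = MLPClassifier(hidden_layer_sizes=(hidden,),
                                max_iter=epochs, random_state=seed)
            net.fit(Xtr, Ytr)
            rates.append((net.predict(Xte) == Yte).mean(axis=0))
        return np.mean(rates, axis=0)    # one success rate per output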
 Firstly, neural networks having 10 and 25~hidden neurons are trained,
 with a maximum number of epochs that takes its value in
@@ -880,7 +878,7 @@ hidden layer up to 40~neurons and we consider larger number of epochs.
 \centering {\small
 \begin{tabular}{|c|c||c|c|c|}
 \hline
-\multicolumn{5}{|c|}{Networks topology: 6~inputs, 5~outputs and one hidden layer} \\
+\multicolumn{5}{|c|}{Network topology: 6~inputs, 5~outputs, and one hidden layer} \\
 \hline
 \hline
 \multicolumn{2}{|c||}{Hidden neurons} & \multicolumn{3}{c|}{10 neurons} \\
@@ -932,32 +930,39 @@ is observed (from 36.10\% for 10~neurons and 125~epochs to 70.97\%
 for 25~neurons and 500~epochs). We also notice that the learning of
 outputs~(2) and~(3) is more difficult. Conversely, for the
 non-chaotic case the simplest training setup is enough to predict
-configurations. For all those feedforward network topologies and all
+configurations. For all these feedforward network topologies and all
 outputs the obtained results for the non-chaotic case outperform the
 chaotic ones. Finally, the rates for the strategies show that the
-different networks are unable to learn them.
-
-\newpage
+different feedforward networks are unable to learn them.
 
 For the second coding scheme (\textit{i.e.}, with Gray Codes)
 Table~\ref{tab2} shows that any network learns about five times more
-non-chaotic configurations than chaotic ones. As in the previous
-scheme, the strategies cannot be predicted.
+non-chaotic configurations than chaotic ones. As in the previous
+scheme, the strategies cannot be predicted.
+Figures~\ref{Fig:chaotic_predictions} and
+\ref{Fig:non-chaotic_predictions} present the predictions given by two
+feedforward multilayer perceptrons that were respectively trained to
+learn chaotic and non-chaotic data, using the second coding scheme.
+Each figure shows for each sample of the test subset (577~samples,
+representing 25\% of the 2304~samples) the configuration that should
+have been predicted and the one given by the multilayer perceptron. It
+can be seen that for the chaotic data the predictions are far away
+from the expected configurations. Obviously, the better predictions
+for the non-chaotic data reflect their regularity.
 
 Let us now compare the two coding schemes. Firstly, the second scheme
 disturbs the learning process: in this scheme the configuration is
 always expressed as a single natural number, whereas in the
-first one the number of inputs follows the increase of the boolean
+first one the number of inputs follows the increase of the Boolean
 vectors coding configurations. In this latter case, the coding gives
 finer information on configuration evolution.
-\JFC{Je n'ai pas compris le paragraphe precedent. Devrait être repris}
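To make the two coding schemes concrete, here is a small sketch (an illustration for this revision; the exact encoding used in the experiments is not shown in this diff). The first scheme uses one input per component of the configuration, while the second collapses the configuration into a single natural number, assumed here to be the standard binary-reflected Gray code of its integer value.

    # Sketch of the two coding schemes for a configuration x in B^n.
    def scheme1(x):
        """First scheme: one Boolean input per component."""
        return tuple(int(b) for b in x)

    def scheme2(x):
        """Second scheme: a single natural number (binary-reflected Gray
        code of the configuration's integer value; an assumption here)."""
        v = int("".join(str(int(b)) for b in x), 2)
        return v ^ (v >> 1)

    print(scheme1([1, 0, 1]), scheme2([1, 0, 1]))   # (1, 0, 1) 7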
 \begin{table}[b]
 \caption{Prediction success rates for configurations expressed with Gray code}
 \label{tab2}
 \centering
 \begin{tabular}{|c|c||c|c|c|}
 \hline
-\multicolumn{5}{|c|}{Networks topology: 3~inputs, 2~outputs and one hidden layer} \\
+\multicolumn{5}{|c|}{Network topology: 3~inputs, 2~outputs, and one hidden layer} \\
 \hline
 \hline
 & Hidden neurons & \multicolumn{3}{c|}{10 neurons} \\
@@ -984,12 +989,26 @@ finer information on configuration evolution.
 \end{tabular}
 \end{table}
 
+\begin{figure}
+  \centering
+  \includegraphics[scale=0.5]{chaotic_trace2}
+  \caption{Second coding scheme -- Predictions obtained for a chaotic test subset.}
+  \label{Fig:chaotic_predictions}
+\end{figure}
+
+\begin{figure}
+  \centering
+  \includegraphics[scale=0.5]{non-chaotic_trace2}
+  \caption{Second coding scheme -- Predictions obtained for a non-chaotic test subset.}
+  \label{Fig:non-chaotic_predictions}
+\end{figure}
+
 Unfortunately, in practical applications the number of components is
 usually unknown. Hence, the first coding scheme cannot be used
 systematically. Therefore, we provide a refinement of the second
 scheme: each output is learned by a different ANN. Table~\ref{tab3}
 presents the results for this approach. In any case, whatever the
-considered feedforward network topologies, the maximum epoch number
+considered feedforward network topologies, the maximum epoch number,
 and the kind of iterations, the configuration success rate is
 slightly improved. Moreover, the strategy prediction rates reach
 almost 12\%, whereas in Table~\ref{tab2} they never exceed 1.5\%. In spite of
@@ -1002,7 +1021,7 @@ appear to be an open issue.
 \centering
 \begin{tabular}{|c||c|c|c|}
 \hline
-\multicolumn{4}{|c|}{Networks topology: 3~inputs, 1~output and one hidden layer} \\
+\multicolumn{4}{|c|}{Network topology: 3~inputs, 1~output, and one hidden layer} \\
 \hline
 \hline
 Epochs & 125 & 250 & 500 \\
@@ -1064,21 +1083,17 @@ Chaotic/non chaotic & \multicolumn{3}{c|}{Output = Strategy} \\
 \end{tabular}
 \end{table}
 
-%
-% TO BE COMPLETED
-%
-
 \section{Conclusion}
 
 In this paper, we have established an equivalence between chaotic
 iterations, according to Devaney's definition of chaos, and a
-class of multilayer perceptron neural networks. Firstly, we have
+class of multilayer perceptron neural networks. Firstly, we have
 described how to build a neural network that can be trained to learn a
-given chaotic map function. Then, we found a condition that allow to
-check whether the iterations induced by a function are chaotic or not,
-and thus if a chaotic map is obtained. Thanks to this condition our
-approach is not limited to a particular function. In the dual case, we
-show that checking if a neural network is chaotic consists in
+given chaotic map function. Secondly, we found a condition that allows
+us to check whether the iterations induced by a function are chaotic or
+not, and thus if a chaotic map is obtained. Thanks to this condition,
+our approach is not limited to a particular function. In the dual
+case, we have shown that checking if a neural network is chaotic consists in
 verifying a property on an associated graph, called the graph of
 iterations. These results are valid for recurrent neural networks
 with a particular architecture. However, we believe that a similar
@@ -1092,10 +1107,7 @@ implemented in a new steganographic method \cite{guyeux10ter}. As
 steganographic detectors embed tools like neural networks to
 distinguish between original and stego contents, our studies tend to
 prove that such detectors might be unable to cope with chaos-based
-information hiding schemes. Furthermore, iterations such that not all
-of the components are updated at each step are very common in
-biological and physics mechanisms. Therefore, one can reasonably
-wonder whether neural networks should be applied in these contexts.
+information hiding schemes.
 
 In future work we intend to enlarge the comparison between the
 learning of truly chaotic and non-chaotic behaviors. Other
@@ -1104,13 +1116,10 @@
 be investigated too, to discover which tools are the most relevant
 when facing a truly chaotic phenomenon. A comparison between the
 learning success rate and the prediction quality will be carried out.
 Concrete consequences in biology, physics, and computer science
 security fields
-will be stated. Lastly, thresholds separating systems depending on
-the ability to learn their dynamics will be established.
+will then be stated.
 
 % \appendix{}
-
-
 % \begin{definition} \label{def2}
 % A continuous function $f$ on a topological space $(\mathcal{X},\tau)$
 % is defined to be {\emph{topologically transitive}} if for any pair of
 % $f^k(U) \cap V \neq \emptyset$.
 % \end{definition}
-
 \bibliography{chaos-paper}% Produces the bibliography via BibTeX.
 
 \end{document}