\begin{document}
\title[Neural Networks and Chaos]{Neural Networks and Chaos:
Construction, Evaluation of Chaotic Networks, \\
and Prediction of Chaos with Multilayer Feedforward Networks
}
chaos is introduced. Starting with a relationship between discrete
iterations and Devaney's chaos, we first show how to build a
recurrent neural network that is equivalent to a chaotic map and,
second, a way to check whether an already available network is
chaotic or not. We also study different topological properties of
these truly chaotic neural networks. Finally, we show that the
learning, with neural networks having a classical feedforward
\label{S1}
Several research works have proposed or used chaotic neural networks
in recent years. The complex dynamics of such networks lead to
various potential application areas: associative
memories~\cite{Crook2007267} and digital security tools like hash
functions~\cite{Xiao10}, digital
their universal approximator capacity
\cite{Cybenko89,DBLP:journals/nn/HornikSW89}. Thus, such
networks can be trained to model a physical phenomenon known to be
chaotic such as Chua's circuit \cite{dalkiran10}. Sometimes a neural
network, which is built by combining transfer functions and initial
conditions that are both chaotic, is itself claimed to be chaotic
\cite{springerlink:10.1007/s00521-010-0432-2}.
precisely, in this paper, which is an extension of a previous work
\cite{bgs11:ip}, we establish the equivalence between chaotic
iterations and a class of globally recurrent MLP. The second
contribution is a study of the converse problem: we investigate the
ability of classical multilayer perceptrons to learn a particular
family of discrete chaotic dynamical systems. This family is defined
by a Boolean vector, an update function, and a sequence defining the
component to update at each iteration. It has been previously
established that such dynamical systems are chaotically iterated (as
defined by Devaney) when the chosen function has a strongly
neighborhood of its future evolution eventually overlap with any other
given region. This property implies that a dynamical system cannot be
broken into simpler subsystems. Intuitively, its complexity does not
allow any simplification.
However, chaos requires some regularity to ``counteract'' the effects
of transitivity: two points close to each other can behave in
completely different manners, the first one visiting the whole space
whereas the second one has a regular orbit. In Devaney's formulation,
a dense set of periodic points is the element of regularity that a
chaotic dynamical system has to exhibit.
%\begin{definition} \label{def3}
We recall that a point $x$ is a {\bf periodic point} for $f$ of
period~$n \in \mathds{N}^{\ast}$ if $f^{n}(x)=x$.
%\end{definition}
Then, the map
$n > 0$ such that $d\left(f^{n}(x), f^{n}(y)\right) >\delta $. The
value $\delta$ is called the {\bf constant of sensitivity} of $f$.
Finally, the dynamical system that iterates $f$ is {\bf chaotic
according to Devaney} on $(\mathcal{X},\tau)$ if $f$ is regular,
topologically transitive, and has sensitive dependence to its initial
conditions. In what follows, iterations are said to be chaotic
(according to Devaney) when the corresponding dynamical system is
chaotic according to Devaney's definition.
%Let us notice that for a metric space the last condition follows from
%the two first ones~\cite{Banks92}.
In the sequel, the {\it strategy} $S=(S^{t})^{t \in \Nats}$ is the
sequence defining which component to update at time $t$ and $S^{t}$
denotes its $t$-th term. This iteration scheme, which modifies only one
element at each iteration, is usually referred to as {\it asynchronous
iterations}. More precisely, we have for any $i$, $1\le i \le n$,
\begin{equation}
\left\{ \begin{array}{l}
x^{0} \in \Bool^n \\
x^{t+1}_i = \left\{
\begin{array}{l}
      f_i(x^t) \textrm{ if $S^t = i$}\enspace , \\
      x_i^t \textrm{ otherwise}\enspace .
\end{array}
\right.
\end{array} \right.
\right.
\label{eq:Gf}
\end{equation}
where $\sigma$ is the so-called shift function that removes the first
term of the strategy ({\it i.e.},~$S^0$). This definition makes it
possible to link
asynchronous iterations with classical iterations of a dynamical
system. Note that it can be extended by considering subsets for $S^t$.
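As an illustration, the asynchronous iteration scheme above can be simulated in a few lines of Python. This is only a sketch: the function name `async_iterate` and the chosen strategy are illustration choices, and components are indexed from 0 instead of 1.

```python
def async_iterate(f, x0, strategy):
    """Asynchronous iterations: at step t only component S^t of the
    current state is updated through f; every other component is kept."""
    x = list(x0)
    trajectory = [tuple(x)]
    for s in strategy:           # s plays the role of S^t
        x[s] = f(x)[s]           # update the single component number s
        trajectory.append(tuple(x))
    return trajectory

# Vectorial negation f_0: every component is negated.
f0 = lambda x: [1 - xi for xi in x]

# n = 3, initial state (0, 0, 0), strategy (0, 2, 0) (0-based indices).
traj = async_iterate(f0, (0, 0, 0), [0, 2, 0])
```

Each step negates exactly one component of the state, the one designated by the current term of the strategy.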
To study topological properties of these iterations, we are thus led
to introduce a {\bf distance} $d$ between two points $(S,x)$ and
$(\check{S},\check{x})$ in $\mathcal{X} = \llbracket1;n\rrbracket^\Nats
\times \Bool^{n}$. Let $\Delta(x,y) = 0$ if $x=y$, and $\Delta(x,y) = 1$
otherwise, be a distance on $\mathds{B}$. The distance $d$ is defined by
\begin{equation}
d((S,x);(\check{S},\check{x}))=d_{e}(x,\check{x})+d_{s}(S,\check{S})
\enspace ,
difference between the cells; on the other hand, its fractional part
measures the difference between the strategies.
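As a sketch, the distance $d$ can be computed as follows, assuming the usual choices for chaotic iterations: the Hamming distance $d_e(x,\check{x})=\sum_j \Delta(x_j,\check{x}_j)$ for the integral part and $d_s(S,\check{S})=\frac{9}{n}\sum_t |S^t-\check{S}^t|/10^{t+1}$ for the fractional part (these exact formulas are an assumption here, and the infinite series is truncated for computability).

```python
def distance(point_a, point_b, n, terms=20):
    """d((S,x); (S',x')) = d_e(x,x') + d_s(S,S'): the integral part
    counts the differing cells (Hamming distance) while the fractional
    part d_s, always smaller than 1, compares the strategies term by
    term with geometrically decreasing weights."""
    (S, x), (T, y) = point_a, point_b
    d_e = sum(xi != yi for xi, yi in zip(x, y))
    d_s = (9 / n) * sum(abs(S[t] - T[t]) / 10 ** (t + 1)
                        for t in range(terms))
    return d_e + d_s
```

With the $\frac{9}{n}$ normalization, $d_s$ stays strictly below 1, so the integral part of $d$ indeed counts differing cells only.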
The relation between $\Gamma(f)$ and $G_f$ is obvious: there exists a
path from $x$ to $x'$ in $\Gamma(f)$ if and only if there exists a
strategy $s$ such that iterations of $G_f$ from the initial point
$(s,x)$ reach the configuration $x'$. Using this link,
Guyeux~\cite{GuyeuxThese10} has proven that,
\begin{theorem}%[Characterization of $\mathcal{C}$]
\label{Th:Caracterisation des IC chaotiques}
are chaotic according to Devaney.
Let us then define two functions $f_0$ and $f_1$ both in
$\Bool^n\to\Bool^n$ that are used all along this paper. The former is
the vectorial negation, \textit{i.e.}, $f_{0}(x_{1},\dots,x_{n})
=(\overline{x_{1}},\dots,\overline{x_{n}})$. The latter is
\item When the network is activated at the $t^{th}$ iteration, the
state of the system $x^t \in \mathds{B}^n$ received from the output
layer and the initial term of the sequence $(S^t)^{t \in \Nats}$
  (\textit{i.e.}, $S^0 \in \llbracket 1;n\rrbracket$) are used to compute the new
output vector. This new vector, which represents the new state of
the dynamical system, satisfies:
\begin{equation}
The behavior of the neural network is such that when the initial state
is $x^0~\in~\mathds{B}^n$ and a sequence $(S^t)^{t \in \Nats}$ is
given as outside input,
then the sequence of successive published
output vectors $\left(x^t\right)^{t \in \mathds{N}^{\ast}}$ is exactly
the one produced by the chaotic iterations formally described in
are equivalent reformulations of the iterations of $G_{f_0}$ in
$\mathcal{X}$. Finally, since the proposed neural network is built to
model the behavior of $G_{f_0}$, whose iterations are
  chaotic according to
Devaney's definition of chaos, we can conclude that the network is
also chaotic in this sense.
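To make this equivalence concrete, here is a minimal NumPy sketch (not the authors' exact architecture) of a one-hidden-layer network with hand-chosen weights whose activation reproduces one step of $G_{f_0}$: output $i$ equals $x_i \oplus [i = S^t]$, computed through OR and AND hidden units. The Heaviside-like threshold activation is an illustration choice.

```python
import numpy as np

step = lambda v: (v > 0.5).astype(int)   # Heaviside-like threshold activation

def mlp_step(x, s, n):
    """One activation of a hand-crafted one-hidden-layer perceptron
    realizing a step of G_{f_0}: output i equals XOR(x_i, [i == s]),
    so only component S^t = s of the state is negated."""
    e = np.eye(n, dtype=int)[s]   # one-hot encoding of the input S^t
    h_or = step(x + e)            # hidden OR unit per component
    h_and = step(x + e - 1)       # hidden AND unit per component
    return h_or - h_and           # output layer: XOR = OR - AND on {0,1}

# Feeding outputs back as inputs reproduces the chaotic iterations.
x = np.array([0, 1, 0])
for s in [2, 0]:                  # stream of inputs S^t from the outside world
    x = mlp_step(x, s, 3)
```

Iterating this network on a stream of inputs $S^t$ therefore produces exactly the trajectory of the chaotic iterations of $f_0$.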
compute the new output vector $\left(x^{t+1}_1,\dots,x^{t+1}_n\right)$.
Meanwhile, the remaining input receives a new integer value $S^t \in
\llbracket1;n\rrbracket$, which is provided by the outside world.
\end{itemize}
The topological behavior of these particular neural networks can be
\rrbracket^{\mathds{N}} \times \mathds{B}^n$.
Theoretically speaking, such iterations of $F_f$ are thus a formal
model of this kind of recurrent neural network. In the rest of this
paper, we will call such multilayer perceptrons ``CI-MLP($f$)'', which
stands for ``Chaotic Iterations based MultiLayer Perceptron''.
Checking if CI-MLP($f$) behaves chaotically according to Devaney's
is intrinsically complicated and it cannot be decomposed or
simplified.
Furthermore, these recurrent neural networks exhibit the instability
property:
\begin{definition}
A dynamical system $\left( \mathcal{X}, f\right)$ is {\bf unstable}
are compared.
Thereafter we give, for the different learning setups and data sets,
the mean prediction success rate obtained for each output. Such a rate
represents the percentage of input-output pairs belonging to the test
subset for which the corresponding output value was correctly
predicted. These values are computed over 10~trainings, each with
random subset construction and random weights and biases initialization.
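The averaging procedure can be sketched as follows; `train_and_test` is a hypothetical stand-in for one training-plus-evaluation run returning the number of correctly predicted pairs and the test-set size.

```python
import random

def mean_success_rate(train_and_test, runs=10):
    """Mean prediction success rate (in percent) over several
    independent trainings, each with a fresh random seed for subset
    construction and weight/bias initialization."""
    rates = []
    for seed in range(runs):
        random.seed(seed)                  # fresh randomness for this run
        correct, total = train_and_test()  # one training + evaluation
        rates.append(100.0 * correct / total)
    return sum(rates) / len(rates)
```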
\centering {\small
\begin{tabular}{|c|c||c|c|c|}
\hline
\multicolumn{5}{|c|}{Network topology: 6~inputs, 5~outputs, and one hidden layer} \\
\hline
\hline
\multicolumn{2}{|c||}{Hidden neurons} & \multicolumn{3}{c|}{10 neurons} \\
25~neurons and 500~epochs). We also notice that the learning of
outputs~(2) and~(3) is more difficult. Conversely, for the
non-chaotic case the simplest training setup is enough to predict
configurations. For all these feedforward network topologies and all
outputs the obtained results for the non-chaotic case outperform the
chaotic ones. Finally, the rates for the strategies show that the
different networks are unable to learn them.
Let us now compare the two coding schemes. Firstly, the second scheme
disturbs the learning process. In fact, in this scheme the
configuration is always expressed as a natural number, whereas in the
first one the number of inputs follows the increase of the Boolean
vectors coding configurations. In this latter case, the coding gives
finer information on configuration evolution.
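The two coding schemes can be contrasted with a short sketch; the big-endian binary value for the second scheme and the standard reflected Gray code are assumptions, as the exact encodings are not fully specified in this excerpt.

```python
def boolean_inputs(x):
    """First scheme: one network input per component of the configuration."""
    return list(x)

def integer_input(x):
    """Second scheme: the whole configuration as a single natural number
    (big-endian binary value, assumed here)."""
    return sum(bit << i for i, bit in enumerate(reversed(x)))

def gray(k):
    """Reflected Gray code of a natural number: consecutive values
    differ in exactly one bit of their codes."""
    return k ^ (k >> 1)

# Configuration (1, 0, 1): three Boolean inputs in the first scheme,
# a single integer input in the second, optionally Gray-encoded.
```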
\begin{table}[b]
\caption{Prediction success rates for configurations expressed with Gray code}
\label{tab2}
\centering
\begin{tabular}{|c|c||c|c|c|}
\hline
\multicolumn{5}{|c|}{Network topology: 3~inputs, 2~outputs, and one hidden layer} \\
\hline
\hline
& Hidden neurons & \multicolumn{3}{c|}{10 neurons} \\
systematically. Therefore, we provide a refinement of the second
scheme: each output is learned by a different ANN. Table~\ref{tab3}
presents the results for this approach. In any case, whatever the
considered feedforward network topologies, the maximum epoch number,
and the kind of iterations, the configuration success rate is slightly
improved. Moreover, the strategy prediction rates reach almost
12\%, whereas in Table~\ref{tab2} they never exceed 1.5\%. Despite
\centering
\begin{tabular}{|c||c|c|c|}
\hline
\multicolumn{4}{|c|}{Network topology: 3~inputs, 1~output, and one hidden layer} \\
\hline
\hline
Epochs & 125 & 250 & 500 \\
when facing a truly chaotic phenomenon. A comparison between learning
success rates and prediction quality will be carried out. Concrete
consequences in biology, physics, and computer science security fields
will then be stated.
% \appendix{}