To investigate whether chaotic iterations can be predicted, we choose to train
multilayer perceptrons. As stated before, this kind of network is
particularly well known for its universal approximation property
\cite{Cybenko89,DBLP:journals/nn/HornikSW89}. Furthermore, MLPs have
already been considered for chaotic time series prediction. For
example, the authors of~\cite{dalkiran10} have shown that a
feedforward MLP with two hidden layers, trained with Bayesian
regularization back-propagation, can successfully learn the dynamics of
Chua's circuit.

In these experiments we consider MLPs having one hidden layer of
sigmoidal neurons and output neurons with a linear activation
function. They are trained with the limited-memory
Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) quasi-Newton algorithm combined
with the Wolfe line search. The training process is performed until
a maximum number of epochs is reached. To prevent overfitting and to
estimate the generalization performance, we use holdout validation:
the data set is randomly split into learning, validation, and test
subsets representing 65\%, 10\%, and 25\% of the whole data set,
respectively.
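As an illustration, the sketch below shows how such a training setup could be
reproduced; it assumes Python with scikit-learn (its \texttt{MLPRegressor}
with a logistic hidden layer and the L-BFGS solver) and uses random
placeholder data instead of the actual iteration pairs described below.

\begin{verbatim}
# Minimal sketch of the training setup (assumption: scikit-learn is used;
# the data below are random placeholders, not the real iteration pairs).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2304, 6)).astype(float)  # placeholder inputs
Y = rng.integers(0, 2, size=(2304, 5)).astype(float)  # placeholder outputs

# Holdout validation: 65% learning, 10% validation, 25% test.
X_learn, X_rest, Y_learn, Y_rest = train_test_split(X, Y, test_size=0.35)
X_val, X_test, Y_val, Y_test = train_test_split(X_rest, Y_rest,
                                                test_size=25 / 35)

mlp = MLPRegressor(hidden_layer_sizes=(10,),  # one hidden sigmoidal layer
                   activation='logistic',
                   solver='lbfgs',            # quasi-Newton training
                   max_iter=500)              # maximum number of epochs
mlp.fit(X_learn, Y_learn)
print("validation R^2:", mlp.score(X_val, Y_val))
print("test R^2:", mlp.score(X_test, Y_test))
\end{verbatim}

Note that scikit-learn's L-BFGS solver relies on a Wolfe-type line search
internally, which is close to, though not necessarily identical with, the
setup used in our experiments.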
Several neural networks are trained for both coding schemes of the
iterations. In both cases the iterations have the following layout:
configurations of four components and strategies with at most three
terms. Thus, for the first coding scheme a data set pair is composed
of 6~inputs and 5~outputs, while for the second one it is composed of
3~inputs and 2~outputs. As noticed at the end of the previous section,
this leads to data sets of 2304~pairs. The networks
differ in the size of the hidden layer and in the maximum number of
training epochs. Recall that, to evaluate the ability of neural
networks to predict a chaotic behavior, we compare for each coding
scheme the trainings obtained on two data sets, one of which describes
chaotic iterations.

Below we give, for the different learning setups and data sets,
the mean prediction success rate obtained for each output. Such a rate
is the percentage of input-output pairs of the test
subset for which the corresponding output value was correctly
predicted. These values are averaged over 10~trainings, each with a
random construction of the subsets and a random initialization of the
weights and biases.
Firstly, neural networks having 10 and 25~hidden neurons are trained,
with a maximum number of epochs taken in
$\{125,250,500\}$ (see Tables~\ref{tab1} and \ref{tab2}). Secondly,
we refine the second coding scheme by splitting the output vector so
that each output is learned by a specific neural network
(Table~\ref{tab3}). In this last case, we increase the size of the
hidden layer up to 40~neurons and consider larger numbers of epochs.
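To make the evaluation protocol concrete, the following sketch (with the same
assumptions and placeholder data as above) computes, for each output, the
percentage of test pairs whose value is correctly predicted, averaged over
10~trainings with re-randomized subsets and initializations; rounding the
real-valued network outputs to the nearest integer before the comparison is
an assumption made for this illustration.

\begin{verbatim}
# Hedged sketch of the evaluation: for each (hidden size, epoch limit) setup,
# run 10 trainings with fresh random splits and initializations, then report
# the mean per-output success rate on the test subset (placeholder data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(2304, 6)).astype(float)
Y = rng.integers(0, 2, size=(2304, 5)).astype(float)

for hidden in (10, 25):
    for max_epochs in (125, 250, 500):
        rates = []
        for seed in range(10):                 # 10 independent trainings
            X_learn, X_rest, Y_learn, Y_rest = train_test_split(
                X, Y, test_size=0.35, random_state=seed)
            _, X_test, _, Y_test = train_test_split(
                X_rest, Y_rest, test_size=25 / 35, random_state=seed)
            mlp = MLPRegressor(hidden_layer_sizes=(hidden,),
                               activation='logistic', solver='lbfgs',
                               max_iter=max_epochs, random_state=seed)
            mlp.fit(X_learn, Y_learn)
            pred = np.rint(mlp.predict(X_test))
            rates.append((pred == Y_test).mean(axis=0))  # rate per output
        print(hidden, max_epochs, np.mean(rates, axis=0))
\end{verbatim}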