Deep Learning of Chaos Classification

We train an artificial neural network which distinguishes chaotic and regular dynamics of the two-dimensional Chirikov standard map. We use finite length trajectories and compare the performance with traditional numerical methods which need to evaluate the Lyapunov exponent. The neural network has superior performance for short periods with length down to 10 Lyapunov times, on which the traditional Lyapunov exponent computation is far from converging. We show the robustness of the neural network to varying control parameters; in particular, we train with one set of control parameters and successfully test in a complementary set. Furthermore, we use the neural network to successfully classify the dynamics of discrete maps in different dimensions, e.g. the one-dimensional logistic map and a three-dimensional discrete version of the Lorenz system. Our results demonstrate that a convolutional neural network can be used as an excellent chaos indicator.


I. INTRODUCTION
Chaotic dynamics exists in many natural systems, such as heartbeat irregularities, weather and climate 1,2. Such dynamics can be studied through the analysis of suitable mathematical models which generate nonlinear dynamics and deterministic chaos. Chaotic and regular dynamics can co-exist in the phase space of low-dimensional systems 3. To distinguish chaotic from regular dynamics, the tangent dynamics is used to compute Lyapunov exponents λ. In practice one integrates the tangent dynamics along a given trajectory and averages a finite time Lyapunov exponent λ(t). The averaging time T needed to reliably tell regular (λ = 0) and chaotic (λ > 0) trajectories apart is usually orders of magnitude larger than the Lyapunov time T_λ ≡ 1/λ.
Here, we introduce a machine learning approach that alleviates the problems of calculating Lyapunov exponents and can be used as a new chaos indicator. Machine learning has shown tremendous performance, e.g. in pattern recognition 4,5. Machine learning approaches have proven useful to solve partial differential equations and to identify hidden physics models from experimental data 6-8. Machine learning was used recently to predict future chaotic dynamics from time series data without knowledge of the generating equations 9,10. In this paper, we introduce a machine learning method that uses short time series data to tell chaos and regularity apart. We train a neural network using chaotic and regular trajectories from the Chirikov standard map. Our method has a success rate of 98% using trajectories of length 10 T_λ, while conventional methods need up to 10^4 T_λ to reach the same accuracy. The main reason for the small but finite failure rate of our machine learning method is sticky orbits. These orbits are chaotic, yet can mimic regular ones for long times due to trapping in fractal boundary phase space regions separating chaotic and regular dynamics. Our method is also surprisingly successful when trained with standard map data but tested on maps with different dimensions, such as the logistic map (d = 1) and the Lorenz system (d = 3).

II. THE CHIRIKOV STANDARD MAP
The Chirikov standard map is an area-preserving map in dimension d = 2 11, also known as the kicked rotor 3:

p_{n+1} = p_n + (K/2π) sin(2π x_n) (mod 1),  x_{n+1} = x_n + p_{n+1} (mod 1).    (1)

The kick strength K controls the degree of nonintegrability and chaos appearing in the dynamics generated by the map. Consider the case K = 0: Eq. (1) reduces to p_{n+1} = p_n (mod 1) and x_{n+1} = x_n + p_{n+1} (mod 1), which is integrable, and every orbit resides on an invariant torus. The orbit can exhibit periodic or quasi-periodic behavior depending on the initial conditions (p_0, x_0). For small values of K, e.g. K = 0.5 (Fig. 1(a)), most of these orbits persist, with tiny regions of chaotic dynamics appearing which are not visible on the presented plotting scales. At K = K_c ≈ 0.97 the last invariant KAM tori are destroyed and a simply connected chaotic sea is formed which allows for unbounded momentum diffusion. For larger values of K the chaotic fraction grows, confining regular dynamics to regular islands embedded in a chaotic sea (Fig. 1). Further increase of K leads to a flooding of the regular islands by the chaotic sea.
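As an illustration, here is a minimal Python sketch of iterating Eq. (1); the function name and the sample initial condition are our own choices:

```python
import numpy as np

def standard_map(p0, x0, K, N):
    """Iterate the Chirikov standard map, Eq. (1), for N steps
    in the unit-torus convention used in the text."""
    p = np.empty(N + 1)
    x = np.empty(N + 1)
    p[0], x[0] = p0, x0
    for n in range(N):
        p[n + 1] = (p[n] + K / (2 * np.pi) * np.sin(2 * np.pi * x[n])) % 1.0
        x[n + 1] = (x[n] + p[n + 1]) % 1.0
    return p, x

# Example: a short trajectory in the mostly regular regime of Fig. 1(a)
p, x = standard_map(p0=0.3, x0=0.2, K=0.5, N=20)
```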

III. LYAPUNOV EXPONENTS AND PREDICTIONS
The Lyapunov exponent (LE) characterizes the exponential rate of separation of a trajectory {p_n, x_n} and its infinitesimal perturbation {δ_n, ζ_n}:

p_{n+1} + δ_{n+1} = p_n + δ_n + (K/2π) sin(2π(x_n + ζ_n)) (mod 1),  x_{n+1} + ζ_{n+1} = x_n + ζ_n + p_{n+1} + δ_{n+1} (mod 1).    (2)

Linearizing (2) in the perturbation yields the tangent dynamics generated by the variational equations

δ_{n+1} = δ_n + K cos(2π x_n) ζ_n,  ζ_{n+1} = ζ_n + δ_{n+1}.    (3)

For computational purposes δ and ζ can be rescaled after any time step without loss of generality, while keeping track of the rescaling factors. The LE λ for each trajectory is obtained from the time dependence of λ_N:

λ_N = (1/N) Σ_{n=1}^{N} ln(|w_n| / |w_{n−1}|),  λ = lim_{N→∞} λ_N,    (4)

where w_n = (δ_n, ζ_n). The Lyapunov time is then defined as T_λ ≡ 1/λ. For the main chaotic sea it is a function of the control parameter K. A suitable fitting function yields λ ≈ ln(0.7 + 0.42 K) 12.
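A minimal sketch of the λ_N computation via Eqs. (1), (3) and (4), rescaling the tangent vector at every step and accumulating the logarithms of the rescaling factors; the function name is ours:

```python
import numpy as np

def finite_time_le(p0, x0, K, N):
    """Finite-time Lyapunov exponent lambda_N from the tangent dynamics."""
    p, x = p0, x0
    delta, zeta = 1.0, 0.0            # any nonzero initial tangent vector
    log_sum = 0.0
    for _ in range(N):
        p = (p + K / (2 * np.pi) * np.sin(2 * np.pi * x)) % 1.0   # Eq. (1)
        delta = delta + K * np.cos(2 * np.pi * x) * zeta          # Eq. (3), uses old x
        x = (x + p) % 1.0
        zeta = zeta + delta
        norm = np.hypot(delta, zeta)
        log_sum += np.log(norm)       # keep the rescaling factor, Eq. (4)
        delta, zeta = delta / norm, zeta / norm
    return log_sum / N                # lambda_N -> lambda for N -> infinity
```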
For a regular trajectory λ_N ∼ 1/N and λ = 0, at variance with a chaotic trajectory for which λ_N saturates at λ at a time N ≈ T_λ. Technically this saturation, and the value of λ, can be safely confirmed and read off only on time scales N ≈ 10^2-10^3 T_λ, so that at shorter times λ_N is not a reliable distinguisher of the two types of trajectories, see Fig. 2. To quantify our statements, we run the standard map at K = 2.5 (Fig. 1(d)). We use a grid of 51 × 51 points which partitions the phase space {p, x} into a square lattice. We use the corresponding 2601 initial conditions and generate trajectories. Each trajectory returns a function λ_N. We plot the resulting histograms for N = 20 and N = 3 × 10^5 in Fig. 3(a) and (b) respectively. For N → ∞ the histogram should show two bars only: one at λ_N = 0 (all regular trajectories) and one at λ_N = λ (all chaotic trajectories). For finite N the distributions smoothen. Note that even negative values of λ_N are generated due to fluctuations and finite averaging times. To tell chaotic and regular dynamics apart, we use the following protocol. We identify the two largest peaks in each histogram, and set the threshold dividing dynamics into regular and chaotic at the deepest minimum between them (in case of a degeneracy, the one with the smallest value of λ_N). The location of the threshold is shown for N = 20 and N = 3 × 10^5 in Fig. 3(a) and (b) respectively. We then assign a chaotic respectively regular label to each trajectory. This label can fluctuate as a function of time for any given trajectory. We use the division at the largest simulation time N = 3 × 10^5 as a reference ('true') label for all trajectories. The success rate in predicting the correct regular (P_R) or chaotic (P_C) label is defined as the ratio of correctly predicted labels within each subgroup of identical true labels. Likewise the success rate of predicting any label correctly is denoted by P_tot. The results are plotted versus time N in Fig. 3(c). While regular labels are predicted with high accuracy, chaotic ones reach 98% only at N ≈ 10^3 T_λ. The low success rate P_C therefore also lowers the total success rate P_tot.
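A simplified sketch of this thresholding protocol, treating the two highest histogram bins as the two largest peaks:

```python
import numpy as np

def chaos_threshold(lam_N, bins=100):
    """Threshold separating regular from chaotic lambda_N values:
    the deepest minimum between the two largest histogram peaks,
    resolving degeneracies towards the smaller lambda_N."""
    hist, edges = np.histogram(lam_N, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    lo, hi = sorted(np.argsort(hist)[-2:])        # two highest bins
    i_min = lo + int(np.argmin(hist[lo:hi + 1]))  # argmin takes the first minimum
    return centers[i_min]

# A trajectory is labeled chaotic if its lambda_N exceeds the threshold.
```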

IV. NEURAL NETWORKS AND PREDICTIONS
The input data of an artificial neural network consisting only of fully connected layers are limited to a one-dimensional (array) form 13. Fully connected layers connect all the inputs from one layer to every activation unit of the next layer. The standard map generates sequences embedded in two dimensions. In order to learn data embedded in two or more dimensions, the data must be flattened, and spatial information can get lost. A convolutional neural network (CNN) is known to learn while maintaining the spatial information of images 14. A CNN is usually configured with convolution and pooling layers. The former convolve the input data with filters to produce output feature maps. An additional activation function makes the network non-linear. At the end of the convolution layers a pooling layer is added which performs value extraction in a given pooling region. Through multiple convolution and pooling layers, the network can improve its prediction features. Finally, a fully connected layer generates the classified output data. For binary classification, the last layer consists of one node. Its output value is either zero or one. We refer the reader to Appendix A for further technical details of the CNN we use.
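To make this description concrete, here is a minimal PyTorch sketch of a CNN of this type; the number of layers, filters and kernel sizes are hypothetical placeholders and not the actual configuration of Appendix A:

```python
import torch.nn as nn

class ChaosCNN(nn.Module):
    """Binary chaos/regular classifier for trajectories of length n_steps,
    fed as two input channels (p_n and x_n). n_steps is assumed to be
    divisible by 4 in this sketch."""
    def __init__(self, n_steps=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=3, padding=1),  # convolution layer
            nn.ReLU(),                                   # non-linear activation
            nn.MaxPool1d(2),                             # pooling layer
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (n_steps // 4), 1),  # fully connected layer, one node
            nn.Sigmoid(),                       # near 0: regular, near 1: chaotic
        )

    def forward(self, traj):        # traj shape: (batch, 2, n_steps)
        return self.classifier(self.features(traj))
```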

A. The standard map
The input of the neural network is a time series (p_n, x_n) from Eq. (1). The trajectory (p_n, x_n) shows regular or chaotic behavior depending on the initial values (p_0, x_0). Each trajectory is assigned a class label based on the Lyapunov exponent method of Sec. III: class R corresponds to non-chaotic trajectories, while class C corresponds to chaotic trajectories. We remind the reader that the phase space is discretized into 51 × 51 = 2601 grid points.
The training and testing are quantified with a set of parameters: i) K_min and K_max denote the range of training values of K on an equidistant grid with M_K values; ii) M_tr is the number of training trajectories per K value; iii) N_K is the training trajectory length; iv) M_tt is the number of test trajectories per K value.
To quantify the CNN performance, we assign a discrete label, C respectively R, to each of the initial phase space points, based on the Lyapunov exponent method with trajectory length N = 3 × 10^5. This way we separate all phase space points into two sets, C and R, containing A_C and A_R points respectively. We then run the CNN prediction on trajectories of length N = 20 which start from each of the gridded phase space points. We compute the accuracy quantifying probabilities

P_C = B_C / A_C,  P_R = B_R / A_R,    (5)

where B_C and B_R are the numbers of trajectories predicted by the CNN to be chaotic respectively regular within each of the true sets A_C and A_R. Thus strictly B_C ≤ A_C and B_R ≤ A_R. Fig. 3(d) compares the CNN performance to the standard Lyapunov-based one. Accuracies of 98% and more are reached by the CNN for trajectory lengths N_K ≥ 30. Similar accuracies need trajectory lengths N ≈ 10^4 and more when using standard Lyapunov testing. Fig. 4 shows the CNN performance with N_K = 10 in the phase space of the standard map. We observe that most of the failures correspond to chaotic trajectories starting in the fractal border region close to regular islands. These trajectories can be trapped for long times in the border region, with trapping time distributions exhibiting power law tails 15. To quantify the performance of the CNN further, we vary N_K from 1 to 20 (Table I). The network is trained with chaotic and regular trajectories for K_min = 1.0, K_max = 2.0, M_K = 11, and 1 ≤ N_K ≤ 20, and the network performance is evaluated for 3 ≤ K ≤ 3.5 and M_K = 6.
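A short sketch of Eq. (5), assuming boolean label arrays where True marks class C:

```python
import numpy as np

def accuracies(true_chaotic, pred_chaotic):
    """P_C = B_C / A_C and P_R = B_R / A_R from Eq. (5)."""
    t = np.asarray(true_chaotic, dtype=bool)   # reference labels (Lyapunov, N = 3e5)
    p = np.asarray(pred_chaotic, dtype=bool)   # CNN predictions (N = 20 input)
    A_C, A_R = t.sum(), (~t).sum()
    B_C = (t & p).sum()                        # correctly predicted chaotic
    B_R = (~t & ~p).sum()                      # correctly predicted regular
    return B_C / A_C, B_R / A_R
```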

B. Training with the standard map, testing the logistic map
We proceed with testing how the CNN trained with standard map data performs in predicting chaos for other maps. We choose the logistic map as a simple one-dimensional chaotic test bed. The logistic map is written as x_{n+1} = r x_n (1 − x_n). The parameter r controls the crossover from regular to chaotic dynamics, which happens at r_c ≈ 3.56995. We use two training methods. The first one trains the network only with the p_n data sequence from the standard map in Eq. (1). We coin that trained network 1D. The second one is the original CNN discussed above, coined here 2D. As shown in Fig. 6, the network mainly generates errors at the boundary of the chaotic region, similar to the standard map case. For 2.5 ≤ r ≤ 4.0 the accuracy is 84% for the 2D network and 90% for the 1D network.
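A minimal sketch of generating logistic-map test inputs for the networks; the initial value and r shown are arbitrary examples:

```python
import numpy as np

def logistic_trajectory(x0, r, N):
    """Iterate x_{n+1} = r * x_n * (1 - x_n) for N steps."""
    x = np.empty(N + 1)
    x[0] = x0
    for n in range(N):
        x[n + 1] = r * x[n] * (1 - x[n])
    return x

# Single-channel input for the '1D' network (trained on p_n only);
# r = 3.8 lies in the chaotic regime beyond r_c.
series = logistic_trajectory(x0=0.3, r=3.8, N=20)
```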

C. Training with the standard map, testing the Lorenz system
Next we test the Lorenz system, iterated as a three-dimensional map, with a CNN trained on the two-dimensional standard map. The Lorenz system is given by the following map equations:

X_{n+1} = X_n + Δn σ (Y_n − X_n),  Y_{n+1} = Y_n + Δn (X_n (ρ − Z_n) − Y_n),  Z_{n+1} = Z_n + Δn (X_n Y_n − β Z_n).    (6)

The parameters are σ = 10, β = 8/3, and Δn = 0.001. The chaos parameter 0 ≤ ρ ≤ 39.8 was varied in steps of 0.2. Because the network is trained with 2D data (standard map), the prediction is performed by selecting only two dimensions of the 3D Lorenz system ((X_n, Y_n), (X_n, Z_n), (Y_n, Z_n)). As Fig. 7(a) shows, using trajectories obtained from Eq. (6) directly as a network input classifies most of them as chaotic. We think this happens because the trajectory data of the standard map used for training are bounded between 0 and 1, while the trajectories from the Lorenz system are not. Input values that exceed these boundaries cause nodes in the network to be active regardless of the input characteristics. Therefore we normalize the input data from the Lorenz system. This leads to a drastic increase of accuracy, as shown in Fig. 7(b). We also tested the outcome when selecting only one dimension of the Lorenz system for the input vector. We find a strong reduction of the accuracy. We therefore conclude that training and testing data yield the best performance when for both the minimum of the two dimensions (training map, testing map) is chosen.
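A sketch of Eq. (6) plus the normalization step; we assume a forward Euler discretization (consistent with the quoted step Δn = 0.001) and a simple min-max rescaling to [0, 1], since the exact normalization is not spelled out here:

```python
import numpy as np

def lorenz_map(u0, rho, N, sigma=10.0, beta=8.0 / 3.0, dn=0.001):
    """Iterate the discrete Lorenz system, Eq. (6), for N steps."""
    u = np.empty((N + 1, 3))
    u[0] = u0
    for n in range(N):
        X, Y, Z = u[n]
        u[n + 1] = (X + dn * sigma * (Y - X),
                    Y + dn * (X * (rho - Z) - Y),
                    Z + dn * (X * Y - beta * Z))
    return u

def normalize(s):
    """Min-max rescaling to [0, 1], the range of the training data."""
    return (s - s.min()) / (s.max() - s.min())

u = lorenz_map(u0=(1.0, 1.0, 1.0), rho=28.0, N=20)
xy_input = np.stack([normalize(u[:, 0]), normalize(u[:, 1])])  # (X, Y) channels
```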

V. CONCLUSION
We trained convolutional neural networks with time series data from the two-dimensional standard map. As a result, the network can classify unknown short trajectory sequences as chaotic or regular with high accuracy. To reach accuracies of up to 98% we need trajectory segments of length less than 5-10 Lyapunov times. Similar accuracies need 100-1000 times longer segments when using traditional classifiers based on measuring Lyapunov exponents. The main cause of errors is the fractal phase space structure at the boundaries between chaotic and regular dynamics. Trajectories launched in these regions are sticky and can mimic regular ones for long times, only to escape at even larger times into the chaotic sea. We also used a network trained with two-dimensional standard map data to classify chaotic and regular dynamics in one- and three-dimensional maps. Surprisingly high accuracy is reached when the training data are projected into one dimension for predictions on the one-dimensional logistic map, and when to-be-predicted data from the three-dimensional Lorenz system are projected onto two dimensions. We conclude that accuracy is optimized when the minimum of the two dimensions (training map, testing map) is chosen for both training and testing.

FIG. 2. λ_N versus N for a chaotic (triangles) respectively regular (squares) trajectory with K = 1.0. The dashed horizontal line indicates the value of λ for the chaotic trajectory, and the dashed vertical one the corresponding value of T_λ.

FIG. 3. Performance comparison of a Lyapunov exponent based method and a deep learning method to distinguish chaotic and regular trajectories for K = 2.5 and λ ≈ 0.56. (a) Histogram of λ_{N=20}. The dashed vertical line indicates the location of the threshold (see text for details). (b) Same as (a) but for N = 3 × 10^5. (c) The success rates P_R, P_C and P_tot as a function of N for the Lyapunov exponent based method (see text for details). (d) Same as in (c) but for the deep learning based method. The network was trained for K = 2.5 and 2081 trajectories. The remaining 520 trajectories are used for testing. N in (d) represents the trajectory length used for network training and testing. K_min = K_max = 2.5, M_tr = 2081, M_tt = 520, N_K ≡ N.

FIG. 4. Chaos classification in the standard map. The Lyapunov exponent classification with trajectory length N = 3 × 10^5 is used as a reference classifier for K = 1 (a) and K = 2 (b). The CNN test results are shown for K = 1 (c) and K = 2 (d). Open circles: regular; gray circles: chaotic. Black circles show the error locations of the CNN prediction. The CNN parameters are K_min = 1.0, K_max = 2.0, M_K = 11, M_tr = 2081, M_tt = 520, N_K = 10.

FIG. 6. The result of predictions for the logistic map with a network trained on the standard map. The blue and red dots are the cases where the network correctly predicts chaotic and regular trajectories respectively. The black dots show where the prediction fails. The network is trained with K_min = 1.0, K_max = 2.0, M_K = 11, M_tr = 2081, M_tt = 520, and N_K = 20. (a) Test results for the 2D training (see text for details). (b) Test results for the 1D training (see text for details).

FIG. 7. The result of predictions for the Lorenz system with a network trained on the standard map. The XY, XZ, YZ bars represent the two dimensions of the Lorenz system used as input to the network trained with (p, x) data from the standard map. The training conditions are N_K = 20, K_min = 1.0, K_max = 2.0, and M_K = 11. The X, Y, Z bars represent the single dimensions of the Lorenz system used as input to the network trained with p data only from the standard map. (a) Accuracy without normalizing the trajectories of the Lorenz system. (b) Accuracy when normalizing the trajectories of the Lorenz system.