Recurrence Rate spectrograms for the classification of nonlinear and noisy signals

Time series analysis of real-world measurements is fundamental in natural sciences and engineering, and machine learning has been recently of great assistance especially for classification of signals and their understanding. Yet, the underlying system’s nonlinear response behaviour is often neglected. Recurrence Plot (RP) based Fourier-spectra constructed through τ-Recurrence Rate (RR τ ) have shown the potential to reveal nonlinear traits otherwise hidden from conventional data processing. We report a so far disregarded eligibility for signal classification of nonlinear time series by training RESnet-50 on spectrogram images, which allows recurrence-spectra to outcompete conventional Fourier analysis. To exemplify its functioning, we employ a simple nonlinear physical flow of a continuous stirred tank reactor, able to exhibit exothermic, first order, irreversible, cubic autocatalytic chemical reactions, and a plethora of fast-slow dynamics. For dynamics with noise being ten times stronger than the signal, the classification accuracy was up to ≈ 75% compared to ≈ 17% for the periodogram. We show that an increase in entropy only detected by the RR τ allows differentiation. This shows that RP power spectra, combined with off-the-shelf machine learning techniques, have the potential to significantly improve the detection of nonlinear and noise contaminated signals.


Introduction
Dynamical systems are often mathematically described by nonlinear differential equations, which are difficult to solve using analytical calculus, and commonly require simplification and approximation.Real-world problems related to engineering [1], biology [2], chemistry [3], physics [4], medicine [5] (neuroscience) or economy [6], regularly deal with nonlinear observations to which the underlying formulae are unknown, revealing the significance of nonlinear time series analysis [7,8].Often important signal figures are neglected or are assumed as noise, especially when only analyzing the Fourier transform {i} spectrum.In recent years, Machine Learning (ML) based techniques using Convolutional Neural Networks (CNNs) {ii} and their application to classifying acoustic or vibration data using periodogram techniques have found growing popularity [9][10][11].
Recurrent, dynamic behaviour of an observable measured as time series of arbitrary dimension can be represented as matrix of recurrences.This is visualized using a two dimensional Recurrence Plot (RP) and quantified using Recurrence Quantification Analysis (RQA) {iii} , not limiting the data to properties such as stationarity, linearity, or determinism [8,[12][13][14].Embedding univariate time series may reveal higher order and often also higher-dimensional dynamics, both of which are often neglected in conventional signal processing but can be incorporated in a RP-based approach.Thus, linking RQA to linear data processing may contribute to bridging the gap in the analysis of nonlinear deterministic systems.The Fourier spectrum of a time series can be estimated from the RQA measure τ-Recurrence Rate (RR τ ) and is able to identify higher periodic orbits, which a classical spectrum estimation fails to achieve [15].This technique has already been successfully applied to reveal unstable and stable periodic orbits of various continuous and discrete theoretical dynamical systems [16,17]; to characterize hyperspectral geological data [18], proteinsolvent interaction energy [19], and a phasor measurement unit in electrical power systems [20].
Despite potentially higher information content on recurrences present3 in a RP-based spectrum, physical interpretation is difficult owing to the abstract meaning of power per frequency.Attempts have been made to overcome this issue [17].However, the application of recurrence-based power spectra in combination with ML has never been attempted.
The current work aims to bridge this gap by employing recurrence-based power spectra to highly nonlinear, and strongly noise contaminated signals, to calculate spectrograms and further use this information for the classification of dynamics via CNN.The new approach is compared to the application of conventional periodograms using different dynamics and noise levels.

Power spectral density estimation
Consider the discrete time series x ä R N×m with N state vectors  x i of dimension m = 1.By application of Takens' embedding theorem {iv} , the observable is expanded to a higher dimension through shifted copies of x, recovering more of the dynamics [8,13].Embedding dimension m and time-delay τ m are estimated using both, the global False-Nearest Neighbors {vi} [21,22] and Average Mutual Information {v} [13,23] algorithms.The embedded time series x emb has state vectors Let the distance matrix D ä R N×N be derived from x emb with arbitrary dimension m, holding pairwise distance information of all data points, i.e.
The distance matrix is symmetric by definition [13], since D i,j = D j,i .The recurrence matrix R is an abstraction of D by introducing the Heaviside function Θ(x) {vii} [24] and a thresholding criterion RPs represent a binary visualization of R, where R i,j = 1 is coloured black, and white otherwise.For εneighbourhoods of constant size, which are used throughout this work, RPs are mirrored along the line of identity (i = j) with R i,i = 1. 4or a univariate observable [22], the discrete Fourier transform {ix} [25] of x results in a discretized frequency spectrum with i indexing over all elements of x, and k denoting the kth frequency bin, corresponding to f k = kf n /N, where f n = F s /2 is the Nyquist frequency {x} [26].
Due to the finite length of x, its actual Power Spectral Density (PSD) {xi} can only be estimated (superscript ˆ) . The Wiener-Khinchin theorem {xii} directly relates a signal's true (continuous) PSD S(Ω) and its auto-correlation function r xx (κ) {xiii} as Fourier transform pair 15,27], where κ is the continuous time shift between the original and a copy of the signal.For a stochastic processe.g.discrete, finite signal x corrupted by Gaussian White Noise (GWN) {xiv}r xx (κ) at discrete time shifts τ can be estimated as x denoting the complex conjugate of  x [28] and the scaling factor α controlling the expected value t {ˆ[ ]} E r xx .For α = 1/(N − |τ|) the estimation is unbiased, i.e. the expected and true values are equal . Digital signals are time-discrete.Therefore, to apply the Wiener-Khinchin theorem defined for continuous time, periodogram and auto-correlation function must be related such that the following condition holds . 5  Extracted from the RP are a range of RQA measures.The Recurrence Rate (RR) represents the amount of recurrence points in a RP relative to its size, i.e.
, depending on ε [13,18], with instances of RR often expressed as percentages.Closely related, RR τ is the probability of recurrence after time delay τ which is the ratio of RR on a diagonal at distance τ parallel to the line of identity [13] and the expected value RR τ = E{R i,i+τ }.Analogously to the auto-correlation function this estimate is unbiased, since may be used as PSD estimates in nonlinear dynamical systems [15][16][17][18].
Due to the scaling factor, unbiased auto-correlation functions are indefinite and have statistical and numerical disadvantages for practical applications in power spectral density estimations.They may result in negative values of power, contradicting physical interpretation and prohibiting a representation in logarithmic scale [26,[28][29][30].This was observed by us also for unbiased RR τ and the usage of a biased estimator (superscript * ) is proposed in this paper for PSD estimation. 6

Spectrogram
A sliding window of size N w is moved across the time series x with overlap of η.An individual PSD is estimated for each window placement and thus the spectrogram [31] is constructed as a two-dimensional time-frequency representation of x, with the first dimension denoting the window number and the second dimension the frequency bins f k .
To display the spectrogram as a digital image, the power value of each frequency bin is window-wise mapped to an integer value in the range of [0, 1, 2,...,255], later used to represent the spectrogram as an 8-bit grey-scale {xv} image (cf.figure 2).This process is called contrast stretching [32].Before stretching, power is converted to dB-scale {xvi} to increase the dynamic range.

The dynamical system: continuous stirred tank reactor
As benchmark system, a representative for chemical reactions of specimens inside an open, iso-thermal, continuous stirred tank reactor [33], found in industrial applications, well-mixed, and without diffusion, is used.The system shows exothermic, first-order, irreversible, cubic autocatalytic reactions, occurring in parallel, and exhibiting rich dynamic behaviour, following the period-doubling route to chaos [7].Due to the higher complexity of two reactants involved, mixed mode oscillations (fast-slow dynamics) are present [34][35][36][37].
The system is explicitly described by three differential equations 5 As a property of the discrete Fourier transform [26], the equality in equation ( 6) is attained from the auto-correlation function using the cyclic continuation of the underlying signal x.However, the potential numerical error is assumed to be negligible. 6The positive definiteness of t  * RR has not been proven and where negative values of power (generally close to zero) were encountered, their absolute values were used.

= --
the first two representing the dimensionless concentrations Y and Z of reactants A and B (and their conversions into their products).The third equation represents the reactor temperature T. The time series of the latter are used in this analysis, in reference to the accessibility of measurements in real autocatalytic reactors [33,38].For arbitrary values of β 2 and Da 1 , with α = 250, β 1 = 0.04, γ 1 = γ 2 = 25 and Da 2 = 2.75, the differential equations provide one unique solution to the system's steady-state [39].A number of parameter configurations resulting in different limit cycles on the attractor or chaotic behaviour are considered in table 1. Period doubling bifurcations occur between A → B → C → D. In figure 1, phase portraits of all configurations with their embedding are given and the thresholds ε satisfying RR = 30% in the respective RPs are provided, which are applied throughout here.This was used to maximize the information content, contrary to application domains where a lower RR of e.g.1% is recommended [40].However, a state in phase space may be characterized by the trapping time and with an appropriately high threshold, this is encrypted in the RP as vertical structures [13].
One dimensionless time unit of the differential equations is hereafter assumed to be equal to 1 s.Time series of dimensionless T oscillations were computed using the function ode45 of Matlab ® (ver.R2023a) at a sampling frequency of F s = 1 kHz and initial state equal to zero.The sampling time was 40 s and only the steady-state response was extracted by discarding the first 20 s of the time series.F s was chosen with respect to the slow dynamics of the system and validated by comparison to the time series sampled at F s,high = 100 kHz.Periods were aligned with regards to different convergence times to the steady-state solution.The The Root Mean Square Error (RMSE) {xvii} of temperature was less than 0.65% relative to the temperature range for all periodic configurations.
To simulate ambient conditions, computer generated signals are contaminated by additive WGN = + x x n.Signal to Noise Ratio (SNR) is defined as the ratio of Root Mean Square (RMS) {xviii} of signal and noise, respectively

RESnet-50 as machine learning architecture
Within the domain of ML, CNNs have emerged as the most effective image recognition algorithms [41].They are successfully used in signal detection when trained with spectrogram images [42,43].In this work, the pretrained network RESnet-50 [44] is incorporated in a transfer learning approach using the Deep Network Designer for Matlab.Overall accuracy of a network is evaluated as the percentage of correctly classified validation images, which are not part of the training set.The input image size for the CNN is 224 × 224 pixels.The network is altered, such that the last fully connected layer and classification layer have an output size of six, i.e. one probability for each signal class.The learning rate of the modified layers was increased tenfold to mitigate the disadvantage compared to pre-trained layers more quickly within the training process.As a solver stochastic gradient descent with momentum of 0.9 is used.During training, iterative optimization of classifiers is achieved through minimization of the cross-entropy loss function {xix} [45] åå Table 1.Parameter configurations to simulate the dynamical system inside a nonlinear chemical reactor [33].A-D are limit cycles on the attractor; E and F are chaotic steady-state solutions.n sub is the number of subharmonics in one period and T avg the average subharmonic duration, averaged over 24 local maxima of T.   was used, namely x-translation up to ±24 pixels.Other common augmentations (e.g.stretching, rotation) alter the information content of a spectrogram and were therefore not considered.

Power spectral entropy
Shannon entropy [46] measures the randomness of a probability distribution.In information theory, a discrete sequence of quantized values comprises a signal, which carries a certain amount of information measured by its entropy [47].The entropy of a system's observation indicates how well it was captured and characterized [8].By interpretation of a signal's PSD representation as a probability distribution of power per frequency, the power spectral entropy [48,49] is obtained as where ŝ denotes the PSD estimation normalized to total power.The logarithmic base of equation ( 14) can be changed to b = N by which gives a normalized value in the range [0, 1].

Spectrogram classification using CNN
The t  * RR -spectrum is beneficial for capturing higher-dimensional temporal dynamics of nonlinear time series.This shall be exploited by visualization of signal evolution as a spectrogram [31], then used to classify the observable.Training data is generated from steady-state time series x of system configurations A-F.Segments of t = 16.36 s are extracted, with a random shift between 0 to 1s, imitating a random time of measurement, as would be encountered in real-world data acquisitions.
The signals are corrupted by additive WGN n to test the robustness of PSD estimation in challenging SNR scenarios.Signals A-F do not have equal power, hence the power of n is fixed relative to power of signal A equaling SNR = { ∞ , 1, 0.5, 0.33, 0.2, 0.1}.This is applied to all signals rather than choosing power relative to each signal, to prevent the CNN from learning the different power in noise as a classifier.This is also in agreement with the assumption that the ambient noise is constant and independent of the signal.
Noisy time series x are normalized to standard distribution σ = ± 1.For recurrence-based signal processing with t  * RR , the parameters given in figure 1 are used to delay-embed the respective signals and construct their RPs.Additionally, the analysis is repeated using fixed embedding parameters m = 3 and τ m = 9 for all configurations, denoted as t  * RR ,fe , with unchanged thresholds.For window size N w = 692 and overlap η = 622 ≈ 0.9 × N w , spectrogram images are computed for each of the three techniques (periodogram, individual and fixed recurrence spectrum).The window was chosen to have the minimal size covering at least one period of steady-state signals A-D and the overlap facilitating 224 window shifts.The highest measurable frequency bin is F s /2 = 500 hz, however, throughout all considered signals, more than 90% (>99% for periodogram) of total power is distributed within frequencies f = (0, 120] hz, which is hence used to capture only the most important dynamics.For every SNR scenario and signal configuration, 1000 random realizations of WGN were generated.Each of them is added to the respective signal and spectrogram images are computed for all techniques as previously described.Thus, for every SNR scenario, a set of 6000 images per processing method is obtained.Of each data set, 30% of images were separated for validation, leaving 4200 training images.Example spectrograms of the system in configuration C, without additive noise, are given in figure 2 and the final CNN accuracy is given in figure 3(a).
For signals without noise, 100% accuracy is achieved by all methods.Classification accuracy for the periodogram is directly related to SNR, as it decreases monotonically with increasing noise level.Up to SNR 0.5, the accuracy is above 90% but then drops significantly, until eventually reaching an accuracy of 16.89% for SNR = 0.1, which is nearly identical to classification by chance (100%/6 = 16.67%).In contrast, the performance when recurrence-based training data is used is more consistent.Although the accuracy at SNR 0.5 for t  * RR is not able to match that of the periodogram, the validation accuracy is only slowly decreasing and remains above 75% throughout all considered noise levels.For a fixed embedding t  * RR ,fe , the accuracy at SNR = 1 is significantly improved compared to t  * RR and matches that of the periodogram.However, for SNR 0.5 the accuracy is consistently ≈5 to 10% lower than for t  * RR , suggesting that different embedding is beneficial but not solely responsible for classification.From a dynamical point of view, dynamic regimes A-D are periodic and thus all of them do not generate any new information.Therefore an evenly distributed validation accuracy could be expected.Table 2 gives the Kullback-Leibler divergence {xx} [50,51] for all combinations of configurations.The Kullback-Leibler is used here as a measure of how well one configuration is separable from the rest.A low divergence therefore is an indicator for a high probability of confusion during CNN classification.For each configuration, the average divergence among the set of signals is provided.In each case, the average divergence is strongly influenced by a relatively high divergence to the chaotic regime F. Therefore, to accentuate the low divergences in between limit cycle solutions, the median is expected to depict the probability of confusion better.
Across all 15 CNN training processes on noisy time series, B performs worst six times, C five times (once shared last with B), D four times and E once.This relates statistically with the configurations having the lowest median Kullback-Leibler divergence towards all other regimes, although the distribution is distorted, e.g.configuration B performs overall the worst but has higher median divergence than C and D. The reason for this distortion may lie in the relatively low number of only 15 training sets.Also, the estimation of the confusion probability is not complete and takes into account only the global median divergence rather than a local confusion probability to the least separable regimes.
In addition, the training behaviour of CNNs must also be considered, which aims at the minimization of a loss function.If no global solution, i.e. successful classification of all variables, is found, convergence to a local minimum will occur, favouring the validation accuracy of one or a few classes over other classes [52].Which class this bias will fall on may differ with each training iteration, as it also depends on the exact images being considered at certain times during training while shuffled randomly.This is a widely studied problem within the ML research community, yet finding a solution here is beyond the scope of this study and requires optimising the training behaviour to become more evenly distributed.
For SNR = 0.1 and periodogram training, it appears that the CNN has converged to a solution-favouring configuration D, which was therefore accurately classified in 38.67% of cases, whereas the other signals where classified correctly only 20% of times, or less.For t  * RR it stands out, that configurations A, D and F were correctly identified in more than 85% of cases.Signal A has a unique embedding delay compared to the other signals, which could be a reason for the accurate discrimination.However, for configurations D and F, the delay parameters are shared respectively by E, as well as B and C.This again confirms that the embedding alone is not responsible for successful classification.
Furthermore, although configurations D and E have not only common embeddings but also similar thresholds, with deviation of less than 3%, the chaotic behaviour of signal E was successfully discriminated even for high noise.For periodic signals B and C with neither unique embedding, nor unique threshold, the classification accuracy is significantly lower.In the classification, B and C were always correctly separated from the other signals but not from each other, i.e. false classification of B exclusively predicted a signal of class C and vice versa.The same is true for signal pairs D-E and A-F.This also occurred in the fixed embedding analysis, indicating that the classification is grouped into three separate problems, according to the similarity of threshold values.The overall validation accuracy of above 50% proves that within these groups successful discrimination is however possible.

Complexity analysis
Evaluating the complexity of a classification problem is useful in predicting the performance of a ML model [53].Similarly, it can contribute to choice and validation of data pre-processing.Three categories are identified as

Information content
PSDs are calculated for signal segments with a size of N w = 8T avg F s at random position.The factor of eight ensures that at least one period of all limit cycle solutions is included, while the windows across all signals have similar number of samples.The power spectral entropy, as a measure of information content, for the bandwidth f = (0, 120] Hz is shown in figure 4. Recurrence-based spectra consistently generate significantly more information about the underling system than the periodogram throughout all configurations.

Consistency of PSD estimation
One period of a linear, periodic, stationary signal includes all information about the continuation of the signal.Hence, the same spectral density can be measured in a signal segment of length equal to the period, independent of time [25], which is not necessarily true for nonlinear signals.To test the robustness of PSD estimation against a delay in the time of measurement, signals A-F are segmented again by the rectangular window of size N w , which is shifted in time incrementally for a total of N w steps.For each realization the individual PSD is estimated using both, periodogram and t  * RR .As in the spectrogram, power is then converted to dB for better dynamic range and re-scaled to [0, 100], allowing comparison between periodogram and recurrence-based spectra, which have different total power.For a statistical analysis of the homogeneity of PSD estimation, the average spectral density across all realizations is considered as control.RMSE between each realization and the control acts as an indicator of similarity [55].The standard distribution of RMSE for all signals is presented in table 3

(a).
There is no clear indication that one technique is advantageous in this respect as each of them shows lower average RMSE in three of the signals.The high variation of ± 3.08 for periodograms of signal F stands out being twice as high as for the others and could be related to the chaotic nature of the signal.Due to the complex periodic behaviour and qualitative differences between the signals, differences in RMSE between signals would be expected and exist for both techniques.Table 3(b) gives the results, when the experiment is repeated for noisy signals with SNR = 0.5.The distribution of RMSE in the periodogram is higher and now consistent at ≈10 ± 1.5 across all configurations, likely due to the noise substantially contributing to the PSD.In contrast, for t  * RR the average RMSE increased moderately compared to the signals without noise and now with the exception of A is always ≈12 to 30% lower than for the periodogram.Further, distinctions between signals are persistent and the variance in RMSE is high, indicating that the impact of noise varies with signal evolution and therefore features may not suppressed equally strong.Overall, this indicates a better resilience against noise, which is in good agreement with the improved classification accuracy of noisy signals, cf.[16,18].

Uniqueness analysis
For classification, the representation of information has to be unique.Thus, Spearman's rank correlation r rs (x, y) [55] acts as a metric for the similarity of two signals' PSD.It returns a value in the range [−1, 1], where r rs = 1 means identity, r rs = − 1 indicates that x and y are opposites, and r rs = 0 denotes maximum disparity. 7Figure 5 shows the correlation of control spectra (see above) between configurations A-F as a percentage.
Figure 5(a) displays the correlation of PSDs computed using the periodogram.The lowest correlation of r rs = 76.24%occurs between A and D. The spectra of pairs B-C; C-D; D-E; and E-F correlate significantly, each with 96.51%; 99.15%; 93.43%; and 98.98%, which translates to a low discriminative power between them.  Equivalently, the mutual information [56] could be used and could be more suitable in case the r rs does not work.RR spectra shows much lower overall correlation between configurations.Signal A has a correlation of r rs 72.31% with all other signals and as low as r rs = 58.62% for the pair A-F.This is in good agreement with the accurate classification of this signal by the CNN.Further, the correlation of all neighbouring pairs is below 90%, which is a significant improvement, compared to the periodogram, and shows that the t  * RR spectrum is more sensitive to weak characteristics of the signals.

Discussion
Classification of noisy, nonlinear data is a recurring and practically relevant, widespread issue which we address here using nonlinear time series analysis, especially RQA.A recurrence-based power spectrum, even though rarely studied, has been useful in a range of applications [15][16][17][18].We therefore extended the PSD estimate to form the recurrence-based spectrogram using recurrence-based power spectra.We systematically estimated the PSD of periodogram and t  * RR using the time series of a model of a continuous stirred tank reactor [33].The signal's complexity as well as its signal to noise level have been altered by considering higher order dynamics and adding noise to the signal.Different dynamic regimes following the period-doubling route to chaos, ranging from periodic to aperiodic/chaotic motion were classified using the deep CNN RESnet-50.For increasingly noisy conditions down to a SNR = 0.1, we showed that for the t  * RR method, both with signal dependent as well as fixed embedding, compared to the classical periodogram method a higher classification accuracy can be achieved.
The main improvement in validation accuracy for noisy time series achieved through the introduction and use of recurrence-based spectrograms is considered to result partly from the thresholding practice in recurrence plots, which offers a natural resilience against noise [13], and partly from the incorporation of higher order dynamics through attractor reconstruction, which adds more information to the PSD and therefore to the training images for the CNN to learn and classify.
By employing the power spectral entropy, we demonstrate that the recurrence-based power spectra contain more information; they are more homogeneous when measured at different times of the noisy signal, and they show stronger uniqueness, when compared to similar signals of the same attractor.Homogeneity is important because it increases ML trainability and efficiency and allows task specific network designs [57,58].Uniqueness is significant because it leads to higher discriminative power of features and fewer class overlaps reduce false predictions [53].The information content in recurrence plots, underlying the RR τ is dependent on a-priori information about the system and its embedding.The embedding delay τ m has shown to have a strong influence on the information of the spectrogram and thus the classification accuracy.Determining optimal embedding in noisy signals is however a complex process and more research needs to be conducted to better understand optimality in light of different scenarios.The efficacy of equal embedding on all signals was demonstrated here though, again achieving excellent results, with a reduction in overall classification accuracy of only ≈10%.The threshold ε applied to RPs for the creation of recurrence-based spectrograms additionally supports the classification but is difficult to obtain from the noisy signal.While research on the effects of noise on the RP is ongoing [59], it was here shown, that a CNN trained on signals with optimal thresholding is capable of distinguishing between noisy signals processed with near identical threshold values, without any de-noising attempts on the RP.
While validation of the proposed technique on real-life data featuring various types of noise is yet outstanding and may further reveal the method's high potential in the future, the here considered scenario suggests already good applicability to numerous problems.Having extended the classification capability to very low SNR makes it suitable to the detection of weak signals.It is therefore expected to be well suited for e.g.classification of acoustic emissions [60] or vibratory insect signals [61], often being nonlinear due to the wave carrier material.
Oberst et al [18] filtered multiplicative noise with the GHKSS method [8] and then applied the recurrencebased power spectrum.However, it is yet unclear how phase noise and multi-operational coupling models of noise can be studied with the recurrence-based spectrograms.Also, further improvements may be achieved in low SNR scenarios combining feature extraction with using wavelet decomposition, either on the signal itself [62], or two-dimensional on the RP [63] before processing.
The best improvements over conventional signal processing were achieved when the optimal embedding and thresholds were known.Hence, application is highly recommended to problems where the observable of interest may be measured under laboratory conditions and later also classified during exposition to noise in its designated working environment.One area of such could be health monitoring of machine parts.Signature signals of e.g.faulty roll bearings [10] or friction parts [64] may be characterized in a test setup and then detected during active machine operation.Another applicable research field is acoustic source localization in anisotropic materials, which often faces the challenge of time difference of arrival measurements at nearby sensors.More reliable classification and therefore timing of the nonlinear vibration response signals could lead to better approximation of the excitation's origin.
Using recurrence-based spectrograms in combination with more elaborate embedding algorithms and thresholding criteria could also lead to fully automatic classification capabilities without a-priori knowledge of the system for unsupervised ML in those applications.
(vi) Global false-nearest neighbours algorithm: Using an already determined time delay τ m , a time series is embedded.The algorithm tests if neighbours of the time series remain neighbours after embedding.If not, the embedding dimension m is increased to minimize the false neighbours [21,23].
(vii) Heaviside function: The function, defined as maps negative inputs to zero and all other inputs to unity [24].
(viii) Thresholding criterion: The thresholding criterion is responsible for the appearance of a recurrence plot.It defines for each element of the distance matrix a threshold value, marking the state as recurrent or not [13].
While many criteria exist, in this work the threshold is fixed and thus equal for every element of the matrix.
(ix) Discrete fourier transform: For digital signals, the discrete Fourier transform is defined and can be efficiently computed using the fast Fourier transform algorithm [26,67].
(x) Nyquist frequency: The Nyquist frequency is half of the sampling frequency and denotes the highest frequency that is observable in a discrete time series.If higher frequencies are present in the original signal [26].
(xi) Power spectral density: The power of each frequency component present in a signal is given as a function called the power spectral density.The function is independent of the frequencies phase, which makes it well suited for detecting certain signatures within the signal contents [28].
(xii) Wiener-khinchin theorem: The Wiener-Khinchin theorem states, that for stationary, statistical processes, the power spectral density may equivalently be derived from either the signals frequency representation or the Fourier transform of its auto-correlation function [15,28].
(xiii) Auto-correlation function: The auto-correlation function may be regarded as the linear correlation measure of a signal with a shifted version of itself, i.e. how well they align.This is especially useful for the analysis of periodicities in a signal [28].
(xiv) Gaussian white noise: The noise signal has a Gaussian distribution in the time domain around its mean value of zero, while featuring a uniform power spectral density across its bandwidth [68].
(xv) 8-bit grey-scale: In digital imaging, when the grey-scale colour of a pixel is given by a number in {0,...,2 8 − 1}, it may be represented by one Byte of storage.Each number has assigned a shade in between zero being white and 2 8 − 1 = 255 displayed as black.
(xvi) Decibel scale: To emphasize differences and relations between frequencies, their power is often expressed logarithmically as decibel (dB) [28], with the conversion as follows (xviii) Root mean square: For a set of numbers, the root mean square is defined as the square root of the arithmetic mean of the square of all values in the set.
(xix) Loss function: During the training of a neural network, the weights of all functional layers that the input is propagates through are iteratively adjusted.The loss function measures the performance of these weights and is zero only when the network's classification is optimal.Minimization of the loss function drives the training process [45].
(xx) Kullback-leibler divergence: Also known as relative entropy, it measures the difference of two probability distributions, i.e. the information of discrimination [50,51].A divergence of zero means equality of the distributions.

ORCID iDs
Thore Hertrampf https:/ /orcid.org/0009-0003-1327-4027Sebastian Oberst https:/ /orcid.org/0000-0002-1388-2749 (training images), K is the number of classes and y lk the probability output from the network, that element l belongs to class k.The binary value t lk is one, if element l is of class k and zero otherwise.Other training settings of the network are (i) a fixed learning rate of 0.001; (ii) the mini batch size of 12; (iii) the training duration of 10 epochs; (iv) a validation frequency of 30; (v) shuffling of training data after every epoch; (vi) all other settings according to the default.To reduce over-fitting issues, basic image augmentation

Figure 1 .
Figure 1.Left: phase space trajectories of nonlinear attractor configurations in table 1. Right: delay embedded phase space from the single variable T (N = 3000), with each dimension normalized to standard distribution σ = ± 1. Embedding parameters m and τ m , as well as the neighbourhood ε to satisfy RR = 30% in the RP are given for each configuration.

Figure 2 .
Figure 2. Spectrograms for system configuration C with n = 0; N short = 692; η = 622.Power in dB is scaled to 8-bit values and mapped to grey-scale.

Figure 3 .
Figure 3. CNN validation accuracy with 700 training and 300 validation spectrogram images per system configuration, based on the periodogram, individual and fixed embedding t  * RR and t  * RR ,fe recurrence spectra, respectively.

Figure 3 (
Figure 3(b)  gives the classification accuracy per signal for SNR = 0.1.From a dynamical point of view, dynamic regimes A-D are periodic and thus all of them do not generate any new information.Therefore an evenly distributed validation accuracy could be expected.Table2gives the Kullback-Leibler divergence {xx}[50,51] for all combinations of configurations.The Kullback-Leibler is used here as a measure of how well one configuration is separable from the rest.A low divergence therefore is an indicator for a high probability of confusion during CNN classification.For each configuration, the average divergence among the set of signals is provided.In each case, the average divergence is strongly influenced by a relatively high divergence to the chaotic regime F. Therefore, to accentuate the low divergences in between limit cycle solutions, the median is expected to depict the probability of confusion better.Across all 15 CNN training processes on noisy time series, B performs worst six times, C five times (once shared last with B), D four times and E once.This relates statistically with the configurations having the lowest median Kullback-Leibler divergence towards all other regimes, although the distribution is distorted, e.g.configuration B performs overall the worst but has higher median divergence than C and D. The reason for this distortion may lie in the relatively low number of only 15 training sets.Also, the estimation of the confusion probability is not complete and takes into account only the global median divergence rather than a local confusion probability to the least separable regimes.In addition, the training behaviour of CNNs must also be considered, which aims at the minimization of a loss function.If no global solution, i.e. successful classification of all variables, is found, convergence to a local minimum will occur, favouring the validation accuracy of one or a few classes over other classes[52].Which class this bias will fall on may differ with each training iteration, as it also depends on the exact images being considered at certain times during training while shuffled randomly.This is a widely studied problem within the ML research community, yet finding a solution here is beyond the scope of this study and requires optimising the training behaviour to become more evenly distributed.For SNR = 0.1 and periodogram training, it appears that the CNN has converged to a solution-favouring configuration D, which was therefore accurately classified in 38.67% of cases, whereas the other signals where classified correctly only 20% of times, or less.For

Figure 4 .
Figure 4. Normalized power spectral entropy of PSD estimation with bandwidth f = (0, 120] Hz for each configuration of the system, calculated using periodogram and biased t  * RR .

Figure 5 .
Figure 5. Similarity of PSD estimation for all pairs of system configurations as Spearman's rank correlation r rs .

10 (
xvii) Root mean square error: The error or deviation between two sets of numbers is measured as the square root of the arithmetic mean of the squared difference between the sets.
RR ,fe , respectively, to utilize all available 224 rows and columns of pixels in the image.
res by zero-padding [25] of r xx , t  * RR and t  *

Table 2 .
[53,54]k-Leibler divergence between pairs of reactor temperature time series of dynamic regimes given in table 1.For each column the average (Avg) and median (Med) divergence is given.This is used here as an estimation for each configuration, implying how well the time series is separable globally (within the set of configurations).theunderlyingproblem, namely existence of informative class features, i.e. maximization of information as input to the classification problem; homogeneity of topology and the distribution of manifolds of the input; and class linearity, i.e. the grade of separability of inputs belonging to different classes[53,54].In the following, complexity measures are used to validate the CNN classification results.

Table 3 .
Distribution of RMSE between PSD estimation and mean PSD, as the window of length N w = 8T avg F s of the respective configuration is incrementally shifted N w times.Comparison of signals with and without noise.