Spectral density classification for environment spectroscopy

Spectral densities encode the relevant information characterising the system–environment interaction in an open quantum system problem. Such information is key to determining the system's dynamics. In this work, we leverage the potential of machine learning techniques to reconstruct the features of the environment. Specifically, we show that the time evolution of a system observable can be used by an artificial neural network to infer the main features of the spectral density. In particular, for relevant examples of spin-boson models, we can classify with high accuracy the Ohmicity parameter of the environment as either Ohmic, sub-Ohmic or super-Ohmic, thereby distinguishing between different forms of dissipation.


I. INTRODUCTION
Recent progress in the field of quantum technologies has advanced our capabilities to control quantum systems and exploit their non-classical properties. Yet, this task presents significant challenges. Quantum systems are inherently open, as they inevitably interact with their surrounding environment [1,2]. They are thus susceptible to gains and losses, as well as to the genuinely quantum phenomenon of decoherence [3][4][5], which disrupts the phase coherence of superposition states and poses a major obstacle to preserving quantum states [6,7]. If we are to devise effective strategies for mitigating adverse environmental influences on a system, it is crucial to have a comprehensive understanding of the effects at play in open quantum dynamics. This, in turn, requires a full characterisation of the mechanism governing the system-environment interaction.
To address this challenge, here we tackle the problem of characterising environmental effects on an open quantum system by harnessing recent advances in the field of Machine Learning (ML). The latter has opened up new data-driven approaches, which have proven effective in various applications in the field of quantum technologies [8]. Among those, some are very close in spirit to this work. ML-based methodologies have been applied to quantum tomography [9][10][11], quantum channel discrimination [12], simulation of open quantum systems [13][14][15][16], as well as quantum control [17][18][19]. Refs. [20,21] reported the deployment of deep-learning methods to the inference of photon correlation functions and phonon blockade effects based on homodyne-detection schemes.
We focus on the typical open quantum system scenario, where we are able to effectively describe and control the reduced system, as opposed to the infinitely many uncontrollable environmental degrees of freedom which are responsible for dissipation and decoherence. In this setting, we characterise the interaction of a given system with an external environment in terms of the Spectral Density (SD), which, by encoding full information about the system-environment coupling, allows us to determine the two-time correlation function of the environment. Having full knowledge of this quantity allows us to predict the temporal behaviour of an open quantum system without a full microscopic description of the environment.
The SD for a given system-environment interaction, however, is rarely directly available and is challenging to calculate from first principles. The form of a SD is at best phenomenologically inferred from empirical data gathered through experimental observations, and at worst guessed using ad hoc assumptions, which might result in significant discrepancies between the predicted and actual dynamics of the system [22]. In this work, we consider the case of a quantum system interacting with a bosonic thermal bath. Depending on the nature of the system-bath interaction, the system dynamics can manifest as either pure dephasing or amplitude damping [23][24][25]. The particular choice of the SD in this setting is responsible for possible memory effects. On one hand, we can encounter a scenario where information flows monotonically from the system to the bath, i.e. the usual scenario characterising quantum Markovian processes [26,27]. On the other hand, some functional forms of the SD are suitable to model a physical situation in which the system, dynamically interacting with its environment, can partially retrieve information that was previously lost: such processes are dubbed non-Markovian [28,29].
Prior works have studied the use of ML for noise characterisation in open quantum systems. Various methods have been explored, such as studying the noise in qubit systems using two-pulse echo decay curves [30], and random pulse sequences applied to the qubit [31]. Additionally, other studies have focused on constructing the power spectral density for ensembles of carbon impurities around a nitrogen-vacancy centre in diamond [32], and on inferring the environment affecting superconducting qubits [33].
In this work, we show that an artificial Neural Network (NN) can be used to classify the SD characterising the dynamics of a system, based on its features. Previous research has examined the classification of aspects of noise in open quantum systems. For instance, in Ref. [34] ML techniques were used to discern between Markovian and non-Markovian noise. More pertinent to the matter at hand, aspects of the problem of distinguishing between Ohmic, sub-Ohmic and super-Ohmic SDs have already been studied: in Ref. [35], a scenario where a probe qubit is used to access a second, inaccessible one is proposed to infer the Ohmicity class by using NNs and leveraging the special features of quantum synchronisation. In Ref. [36], a different use of NNs was put forward, as tomographic data at just two instants of time were used rather than a time-series approach. In contrast, this work takes a simpler approach by utilising the time evolution of a system observable for classification, without the need for a probe system or tomographically complete information. We focus on the case of a general Spin-Boson (SB) model to show that, even when the environment cannot be exactly traced out to infer the reduced dynamics of a system, a NN can classify the SD with high accuracy. Furthermore, we discuss the limitations imposed by the fluctuation of the parameters in the SD, the number of sampled points in the time signals, and measurement sampling noise. Our study emphasises the potential of ML techniques to characterise environments with arbitrary SDs.
The remainder of this paper is structured as follows: in Section II we provide an introduction to the general setting under consideration, as well as the ML approach utilised. Specifically, we examine an arbitrary system that is interacting with a bosonic environment, and we give some background on the ML model used, namely, NNs. Next, in Section III we detail the physical models that are considered. We investigate two SB models: in the first case, we are able to exactly derive the pure dephasing dynamics starting from the full system-bath unitary evolution; in the second case, we work in the weak-coupling limit to approximately derive the reduced dynamics of the system. In both cases, the dynamics can feature non-Markovian effects, depending on the SD we select. In Section IV we discuss the architecture of the NNs, along with a detailed discussion of the results of training and testing for each model. Finally, we give our concluding remarks and discuss our future outlook in Section V.

II. GENERAL SETTING AND METHODS
Let us consider the general setting of an arbitrary system interacting with an environment comprised of infinitely many bosonic modes, as shown in Figure 1. This scenario reproduces the ubiquitous Caldeira-Leggett model, which describes the motion of a quantum particle undergoing Brownian motion [37,38]. The full (time-independent) Hamiltonian reads

Ĥ = ĤS + ĤB + ĤI, (1)

where ĤS and ĤB are the Hamiltonian operators of the system and the environment, respectively. The system-environment interaction term ĤI is expressed in the form

ĤI = X̂ ⊗ B̂, (2)

where X̂ is a generic system operator, while B̂ is an operator of the bath. We take the latter as

B̂ = ∑_k g_k ( b̂†_k + b̂_k ), (3)

FIG. 1. Sketch of a generic open quantum system S interacting with a bosonic environment composed of infinitely many harmonic oscillators labelled by the integer n. Each oscillator has frequency ω_n and is coupled to the system at a rate g_n.
where the coefficient g_k accounts for the interaction strength with the k-th mode of frequency ω_k, while b̂†_k and b̂_k are the creation and annihilation operators associated with it. The coupling coefficients enter the formal definition of the SD, i.e. J(ω) = ∑_k |g_k|² δ(ω − ω_k), the latter encoding all the information about the system-environment interaction. Since we are interested in the typical irreversible open-system scenario, we will assume that the distribution of modes forms a continuum, so that the system dynamics does not display recurrences [1,39,40]. In this limit, the SD appears in the expression for the correlation function of a bosonic bath, defined as α_β(t) ≡ ⟨B̂(t)B̂(0)⟩_B, where B̂(t) is the bath operator in the interaction picture with respect to the free Hamiltonian Ĥ0 = ĤS + ĤB. In Appendix A, we show that if the environment is in a thermal Gibbs state, the correlation function can also be expressed as

α_β(t) = ν(t) − i µ(t), (4)

where

ν(t) = ∫_0^∞ dω J(ω) coth(βω/2) cos(ωt),   µ(t) = ∫_0^∞ dω J(ω) sin(ωt), (5)

with β = 1/T. Note that hereafter we will work in units such that ħ = 1 and k_B = 1. The two functions ν(t) and µ(t) are also referred to as the noise and dissipation kernels, respectively: the latter is independent of the temperature of the environment. The effective dynamics of the system, governed by a master equation, crucially depends on the correlation function α_β(t), which represents the fingerprint of the environment. The function α_β(t) is ultimately determined by the shape of the SD, which essentially contains all of the information about the environment needed to solve the dynamics of the system and, thus, obtain the time evolution of any of its observables.
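As an illustration, once a SD is fixed, the noise and dissipation kernels can be evaluated numerically. The following is a minimal sketch, assuming the standard forms ν(t) = ∫ dω J(ω) coth(βω/2) cos(ωt) and µ(t) = ∫ dω J(ω) sin(ωt), an Ohmic SD with exponential cut-off, and purely illustrative parameter values:

```python
import numpy as np

def kernels(t, s=1.0, eta=0.25, omega_c=0.5, beta=10.0, w_max=50.0, n=20000):
    """Noise kernel nu(t) and dissipation kernel mu(t) by simple quadrature.

    The frequency integral is truncated at w_max, where the exponential
    cut-off has rendered the integrand negligible.
    """
    w = np.linspace(1e-6, w_max, n)  # avoid the w = 0 endpoint
    dw = w[1] - w[0]
    J = eta * w**s * omega_c**(1 - s) * np.exp(-w / omega_c)
    nu = np.sum(J * (1.0 / np.tanh(beta * w / 2)) * np.cos(w * t)) * dw
    mu = np.sum(J * np.sin(w * t)) * dw
    return nu, mu
```

Note that µ(t) carries no dependence on β, in line with the dissipation kernel being temperature-independent.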
The expectation value of a generic system observable at time t is indeed given by

⟨Ô(t)⟩ = tr[ Ô e^{−iĤt} ρ̂⁰_SB e^{iĤt} ], (6)

where Ĥ is the system-environment Hamiltonian of Equation (1), while the global initial state is factorised as ρ̂⁰_SB = ρ̂0 ⊗ ρ̂B, with ρ̂0 and ρ̂B being the initial system and environmental states, respectively. We assume the environment to be a large bosonic thermal reservoir, i.e. ρ̂B = e^{−βĤB}/Z_B, where Z_B ≡ tr_B(e^{−βĤB}) is the reservoir partition function. Under these hypotheses, it can be shown that the only environmental quantity entering the expression of ⟨Ô(t)⟩ is the SD J(ω).
Here we focus on special classes of SDs, which can be expressed as [38,41]

J(ω) = η ω^s ω_c^{1−s} f(ω, ω_c), (7)

where s > 0 is known as the Ohmicity parameter, and η > 0 is the coupling strength between the system and the environment. The constant ω_c is the cut-off frequency, while f(ω, ω_c) is the cut-off function, which ensures that J(ω) → 0 in the limit of large frequencies, i.e. ω → ∞. In what follows we consider the exponential cut-off, namely f(ω, ω_c) = e^{−ω/ω_c}. Depending on the value of s, we model different system-environment couplings, corresponding to various physical scenarios [38,41,42]. SDs with s = 1 (i.e. linear in the frequency ω) are called Ohmic, while those for which s > 1 (s < 1) are known as super-Ohmic (sub-Ohmic).
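This family of SDs with exponential cut-off is straightforward to evaluate; below is a minimal sketch (the parameter values and the frequency grid are purely illustrative):

```python
import numpy as np

def spectral_density(omega, s, eta=0.25, omega_c=0.5):
    """Ohmic-family SD: J(w) = eta * w^s * w_c^(1-s) * exp(-w / w_c)."""
    return eta * omega**s * omega_c**(1 - s) * np.exp(-omega / omega_c)

omega = np.linspace(0, 5, 500)
J_sub = spectral_density(omega, s=0.5)   # sub-Ohmic  (s < 1)
J_ohm = spectral_density(omega, s=1.0)   # Ohmic      (s = 1)
J_sup = spectral_density(omega, s=3.0)   # super-Ohmic (s > 1)
```

For the Ohmic case the exponential cut-off places the maximum of J(ω) at ω = ω_c, which is a quick sanity check on the implementation.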
In this work, we will use the tools provided by ML to classify the SD characterising the system-environment interaction. Specifically, we use an artificial NN comprising many artificial neurons (essentially computational units) arranged in a series of layers, as in Fig. 2 [8,43]. Given a set of inputs {x_i}, each neuron computes the weighted sum

z = ∑_i w_i x_i + b, (8)

with weights w_i and a bias term b. A non-linear activation function f is then applied to the result z, yielding the output of the neuron y = f(z). The activation function used in this work is the standard sigmoid function, i.e. f(z) = 1/(1 + e^{−z}). The aforementioned weights and biases are free parameters to be optimised. In addition, the outputs from each layer are input to the next layer. In this way, the input data propagate through the network, so that outputs from later layers become increasingly complex functions of the data. The first layer receives the input data and passes it to the subsequent layer, without performing any computation, while the final layer computes the final output of the network. Accordingly, we refer to these layers as the input layer and the output layer, respectively. The layers between the input and output layers are known as hidden layers.

FIG. 2. Schematic of the setup: given the time evolution of an observable, denoted as ⟨Ôi(t)⟩, we compute the corresponding Fourier coefficients {X_k}. Then, with the aim of determining which class of spectral density is most compatible with the observed dynamics, we input the real and imaginary parts of the coefficients to a Neural Network. The outputs of the three artificial neurons in the output layer are the probabilities that the input belongs to each of the classes.

Note that we opt for the
aforementioned architecture due to its success in accomplishing the intended objective, without necessitating the use of a more complex architecture, such as a recurrent neural network [43].
For the purpose of classifying the SD using ML techniques, let us suppose we have the time evolution of a family of system observables ⟨Ôj(t)⟩ (for a set of indices j) as input. These time signals can be gathered as outcomes of an experiment carried out in a laboratory or, as in our case, they can be generated by solving the system dynamics (either exactly or approximately).
As each signal is a time series, we Fourier-decompose the signal. To this end, we compute the Fourier coefficients as

X^j_k = ∑_{n=0}^{N−1} ⟨Ôj(t_n)⟩ e^{−2πikn/N}, (9)

where N is the total number of time steps and ⟨Ôj(t_n)⟩ denotes the n-th sampled point. We can reconstruct the original signal by inverting Equation (9), where X^j_k ∈ ℂ, the sum runs over all the sampled points in the time series, and k ∈ [0, N − 1]. We split each coefficient X^j_k into its real and imaginary parts and train the network using the Fourier coefficients rather than the time series ⟨Ôj(t)⟩ directly. Using the resulting dataset, we address the ternary classification problem of distinguishing between three different families (i.e. classes) of SDs according to their value of the Ohmicity parameter [cf. Equation (7)]. In our case, the output layer of the NN has three artificial neurons which compute weighted sums z_j and apply the softmax activation function [44], defined as

softmax(z_j) = e^{z_j} / ∑_{l=1}^{N_c} e^{z_l}, (10)

where N_c is the number of classes (in our case N_c = 3).
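In practice, this feature-extraction step can be carried out with a standard FFT routine; below is a minimal sketch in which the sampled time series is mapped to the 2N real inputs of the network (the toy signal is purely illustrative):

```python
import numpy as np

def fourier_features(signal):
    """Map a sampled time series to 2N real features: Re and Im of the DFT."""
    X = np.fft.fft(signal)                    # N complex Fourier coefficients X_k
    return np.concatenate([X.real, X.imag])   # 2N real inputs for the network

t = np.linspace(0, 10, 400)
traj = np.exp(-0.3 * t) * np.cos(2 * t)       # toy damped-oscillation signal
features = fourier_features(traj)             # length-800 feature vector
```

The inverse FFT recovers the original signal exactly, so no information is lost in the change of representation.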
It follows that the outputs of the network are the predicted probabilities that the input belongs to a particular class. As is common for classification problems, we use the categorical cross-entropy as a loss function. Given a dataset containing N_t trajectories, let y_ij represent the true probability that the i-th trajectory belongs to the j-th class and let ŷ_ij denote the predicted probability of the same. Then the categorical cross-entropy is defined as [45]

L = −(1/N_t) ∑_{i=1}^{N_t} ∑_{j=1}^{N_c} y_ij log ŷ_ij. (11)

The task of training the network reduces to an optimisation problem where the aim is to find the set of parameters that minimises the loss function. A schematic view of the setup is shown in Figure 2.
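The output activation and loss function described above can be sketched as follows (the one-hot labels and logits are toy values):

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -(1/N_t) sum_i sum_j y_ij * log(yhat_ij)."""
    return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))

y_true = np.array([[1, 0, 0], [0, 0, 1]])            # one-hot class labels
logits = np.array([[2.0, 0.1, -1.0], [0.0, 0.0, 3.0]])
loss = categorical_cross_entropy(y_true, softmax(logits))
```

By construction, each row of the softmax output sums to one, so the three output neurons can be read directly as class probabilities.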

III. GENERATION OF THE DATASET: SPIN-BOSON MODELS
Given the general framework outlined in Section II, we now identify the systems to scrutinise. We focus on the dynamics of a Spin-Boson (SB) model consisting of a two-level system interacting with a bosonic bath. Therefore, in Equation (1), we choose

ĤS = (ω_0/2) σ̂z,   ĤB = ∑_k ω_k b̂†_k b̂_k, (12)

with σ̂z being the z Pauli operator. The choice of the system-environment coupling Hamiltonian ĤI leads to different physical scenarios, in general requiring different techniques to solve the dynamics. In Section III A, we introduce an exactly solvable SB model, where the full system-environment unitary dynamics can be accessed, and the system dynamics is obtained by tracing out the environmental degrees of freedom.
In Section III B we then choose a different form of coupling, which requires further approximations to effectively trace out the environment.
In both cases, the reduced dynamics of the system is governed by a master equation of the form

dρ̂(t)/dt = L_t ρ̂(t), (13)

where L_t is the Liouvillian (super)operator accounting for both the unitary and non-unitary dynamics, and ρ̂ is the reduced density operator. Given the initial state of the system ρ̂(0) = ρ̂0, Equation (13) can be formally solved, yielding ρ̂(t) = e^{L_t t} ρ̂0 at any time t. It is thus immediate to obtain the expectation value of a generic observable Ô, i.e. ⟨Ô(t)⟩ ≡ tr_S[Ô ρ̂(t)]. Since we are considering a SB model, a natural choice of observables is given by the Pauli operators, i.e. (Ô1, Ô2, Ô3) = (σ̂x, σ̂y, σ̂z), or a combination thereof.

A. Pure Dephasing
Let us consider the case in which X̂ = σ̂z in Equation (2). Owing to this choice, the interaction Hamiltonian commutes with the system Hamiltonian, and the populations of the reduced density matrix are left invariant by the dynamics. In this case, we can access the full unitary evolution and exactly trace out the environmental degrees of freedom, thus yielding an analytical solution for the reduced dynamics [1,46,47]. In Appendix B, we explicitly solve the dynamics under the standard assumption of an initially uncorrelated system-environment state, where we assume the environment to be in a thermal Gibbs state. Working in the interaction picture, the evolved reduced density matrix at time t can be written in the σ̂z basis {|0⟩, |1⟩} as

ρ̂(t) = [[ ρ00(0), ρ01(0) e^{−Γ(t)} ], [ ρ10(0) e^{−Γ(t)}, ρ11(0) ]], (14)

with the decoherence function

Γ(t) = 4 ∫_0^∞ dω J(ω) coth(βω/2) [1 − cos(ωt)]/ω². (15)

From Equation (14) we can easily deduce that the interaction with the environment induces pure dephasing in the σ̂z basis, with no dissipation (as deduced by comparing Equation (15) with Equation (5)). Moreover, it is worth emphasising that there might be choices of the SD leading to intervals of time where Γ(t) decreases. In such intervals, the system re-coheres as a result of (non-Markovian) memory effects of the dynamics [48].
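As an illustration, the decoherence function can be evaluated numerically for a given SD. The sketch below assumes the zero-temperature limit (coth → 1), the exponential-cutoff SD family, and a prefactor of 4 whose value is convention-dependent; the parameter values are illustrative:

```python
import numpy as np

def decoherence_gamma(t, s=1.0, eta=0.25, omega_c=0.5, w_max=50.0, n=20000):
    """Zero-temperature decoherence function by quadrature:
    Gamma(t) = 4 * int dw J(w) * (1 - cos(w t)) / w^2   (prefactor convention-dependent).
    """
    w = np.linspace(1e-6, w_max, n)  # avoid the w = 0 endpoint
    dw = w[1] - w[0]
    J = eta * w**s * omega_c**(1 - s) * np.exp(-w / omega_c)
    return 4 * np.sum(J * (1 - np.cos(w * t)) / w**2) * dw

t = np.linspace(0, 10, 400)
# Coherence envelope of the off-diagonal elements, exp(-Gamma(t))
coherence = np.exp(-np.array([decoherence_gamma(ti) for ti in t]))
```

For an Ohmic SD at zero temperature, Γ(t) grows monotonically, so no recoherence occurs; sub-Ohmic or structured SDs can instead produce intervals where Γ(t) decreases.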

B. Amplitude Damping
Alternatively, we can turn to a set-up beyond pure dephasing, simply by choosing X̂ = −σ̂x/2 in the interaction Hamiltonian of Equation (2). Unlike the case discussed in Section III A, the Hamiltonian does not exhibit any explicit symmetry; therefore, we are not able to provide an exact solution for the dynamics. We can nevertheless effectively solve the dynamics, provided that we rely on further assumptions.
Starting from an initially uncorrelated state, we can derive a master equation in the weak-coupling regime, where we are still able to obtain non-Markovian effects. As outlined in Appendix C, we can derive a second-order approximated master equation that is local in time [1,49,50] and can be written in terms of dynamical equations for the components of the Bloch vector ⟨σ⃗(t)⟩ = (⟨σ̂x(t)⟩, ⟨σ̂y(t)⟩, ⟨σ̂z(t)⟩)^T, with ⟨σ̂i⟩ = tr_S(σ̂i ρ̂). These equations can be cast in the form

d⟨σ⃗(t)⟩/dt = A(t) ⟨σ⃗(t)⟩ + b⃗(t), (16)

with b⃗(t) = (0, 0, b_z(t))^T and b_z(t) = ∫_0^t ds µ(s) sin(ω_0 s). We have also introduced the matrix A(t), whose time-dependent entries are determined by the noise and dissipation kernels ν(t) and µ(t) defined in Equation (5).

IV. ANALYSIS AND RESULTS
In this Section, we present the results of our numerical experiments. We consider a two-level system, whose open dynamics depends on the choice of the coupling between the system and the bosonic environment, as discussed in Section III. For a given initial state, we generate a set of curves reproducing the time evolution of the expectation value of a system observable, i.e. ⟨Ô(t)⟩. Each signal is sampled at N = 400 successive and equally spaced points over a certain time interval [t_min, t_max], to ensure a sufficient resolution of the dynamics. As discussed in Section II, instead of directly using the time series, we input the 2N real and imaginary parts of the Fourier coefficients X_k. For this reason, we build the input layer with 2N input neurons. The NN for each model consists of the input layer followed by 2 hidden layers, where the first hidden layer comprises 250 neurons and the second comprises 80 neurons. The output layer, instead, is made of 3 neurons, matching the number of classes (Ohmic, sub-Ohmic, super-Ohmic). The choice of network architecture was iteratively refined, adding layers and neurons until the network achieved a high accuracy without overfitting. The code employed for data generation, the datasets, and the code utilised for subsequent analysis are available in the following GitHub repository [51].
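A forward pass through the architecture just described (2N = 800 inputs, hidden layers of 250 and 80 sigmoid neurons, a 3-neuron softmax output) can be sketched in plain NumPy; the random weights below are purely illustrative, standing in for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Layer sizes: 800 Fourier features -> 250 -> 80 -> 3 classes
sizes = [800, 250, 80, 3]
params = [(rng.normal(0, 0.05, (m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x):
    """Forward pass: sigmoid hidden layers, softmax output layer."""
    for W, b in params[:-1]:
        x = sigmoid(x @ W + b)
    W, b = params[-1]
    return softmax(x @ W + b)

probs = forward(rng.normal(size=(5, 800)))  # class probabilities for 5 inputs
```

In practice the weights and biases would be optimised with Adam on the categorical cross-entropy, as described in the text; the sketch only fixes the shapes and activations.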
In order to evaluate the performance of the NN, we use the classification accuracy, which is defined as the percentage of trajectories that are classified correctly. We generate a training dataset containing N_Train trajectories, which is used to train the model, a validation dataset containing N_Valid trajectories, which is used to assess the performance during training, and a test dataset containing N_Test trajectories, which is used to assess the final accuracy of the network. We optimise the NNs using whole-batch gradient descent and the Adam optimiser with a learning rate of 1 × 10^{−4}.

A. Pure Dephasing
We consider the evolution of the pure dephasing model introduced in Section III A. We solve the system dynamics choosing the initial state ρ̂0 = |+⟩⟨+|, with |+⟩ = (|0⟩ + |1⟩)/√2, while, without loss of generality, we keep the thermal bath at zero temperature, i.e. β → ∞. With this choice, we obtain the expectation value ⟨σ̂x(t)⟩ within the time interval t ∈ [0, 10]. It is worth noting that alternative choices for the initial state and the observable can be made; however, it should be recognised that, within the context of the pure dephasing model, the time evolution of ⟨σ̂z(t)⟩ will always be trivial. Moreover, should the initial state possess vanishing coherences, the time evolution of the density matrix, and by extension of any observable, will be trivial as well. As input to the NN we use the real and imaginary components of the Fourier coefficients obtained using Equation (9). We generate a training, validation, and test set of size N_Train/2 = N_Valid = N_Test = 2400. The Ohmic, sub-Ohmic and super-Ohmic classes contain equal numbers of trajectories in all datasets.
We assess the performance of the NN in two scenarios: the first where ω_c and η are fixed, and the second where they vary. In the first scenario, we consider the case where η = 0.25 and ω_c = 0.5, while the only parameter that varies is s. At first, we want to test how the model performs when the classes are easy to differentiate. To that end, we consider trajectories with s ∈ (0, 0.5] if the SD is sub-Ohmic and s ∈ [1.5, 4] if it is super-Ohmic. If the SD is Ohmic, then s = 1. In Figure 3a, a subset of trajectories from the resulting training set is plotted, where the green curves correspond to sub-Ohmic dissipation, while the yellow and blue curves are trajectories characterised by Ohmic and super-Ohmic dissipation, respectively. Given the substantial separation in the permissible values of s across the different classes, we expect the performance of the NN to be high. In Figure 3a, it is evident that the classes are easily distinguishable due to the distinct characteristics exhibited by each of them. Specifically, the super-Ohmic curves exhibit the steepest initial descent. In addition, while the sub-Ohmic and Ohmic curves show a comparable initial rate of descent, their oscillatory patterns differ. Oscillations are exhibited by all three classes, but the amplitude of oscillation for the sub-Ohmic curves appears to reduce more rapidly than that of the Ohmic or super-Ohmic curves as time grows. Confirming our expectation, the accuracy of the network evaluated on both the training and the test set reaches 100% after ≈ 80 training iterations.
We then make the task somewhat more difficult for the network by allowing s ∈ (0, 1) for the sub-Ohmic dissipation and s ∈ (1, 4] for the super-Ohmic dissipation. We anticipate that the task will be more difficult in this scenario due to the reduced separation in the allowed values of s across the classes. This is reflected in the resulting training trajectories, a subset of which is plotted in Figure 3b, where we observe that the super-Ohmic curves maintain a more pronounced initial descent relative to the Ohmic and sub-Ohmic curves. However, there are instances where the oscillation amplitudes between the classes are similar. The final training accuracy of the network in this case reaches 99.31% after around 5000 training iterations, while the final test accuracy reaches 99.50%.
Next, to challenge the NN further, we consider the second scenario, where we let η and ω_c vary: the idea is to assess the performance as we increase the upper bounds of the intervals from which they are sampled. We let s ∈ (0, 1) for the sub-Ohmic spectral densities and s ∈ (1, 4] for the super-Ohmic spectral densities. Initially, we set both η and ω_c equal to 0.25; then we let them vary over the interval [0.25, 0.45]. We increase the upper bound in increments of 0.2 until the interval becomes [0.25, 2.05]. Figure 3c shows some example trajectories from the training set for η = ω_c = 0.25, while Figure 3d shows some for η, ω_c ∈ [0.25, 2.05]. From Figure 3c, we can observe that the scenario closely resembles that depicted in Figure 3b. In particular, the initial decay rates of the super-Ohmic curves are larger than those corresponding to the Ohmic or sub-Ohmic curves. However, the oscillation amplitudes across the three classes are comparable in some cases. In Figure 3d, it is evident that there is considerable overlap both in the initial decay rates and in the amplitudes of oscillation between the three classes, indicating that the differences between the behaviours of the classes are less pronounced and that the classification task will be significantly more difficult. The classification results after 2 × 10^4 training iterations are shown in Figure 4, where the blue curve is the accuracy evaluated on the training set and the green curve is the accuracy evaluated on the test set. As expected, we can see that the accuracy decreases as we consider larger intervals for η and ω_c. This is indeed the case, as taking larger intervals essentially increases the amount of noise in the dataset. It is worth noting that the accuracy may improve with larger datasets or more training iterations.

Measurement Sampling Noise
The accurate measurement and classification of experimental expectation values are inherently impacted by various sources of noise. One of the predominant ones is sampling noise, which arises due to the finite number of measurement samples that one can realistically acquire experimentally. In this Section, we analyse the impact that sampling noise has on the NN, with the aim of providing deeper insight into the performance of the model under realistic conditions. In our approach, we artificially introduce noise into the trajectories by adding a random value to each time point. Such a value is drawn from a normal distribution with zero mean and a given standard deviation σ.
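This noise-injection procedure can be sketched as follows (the clean trajectory is a toy signal, and the seed is fixed only for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(42)

def add_sampling_noise(trajectory, sigma):
    """Emulate finite-shot measurement noise by adding N(0, sigma^2) to each time point."""
    return trajectory + rng.normal(0.0, sigma, size=trajectory.shape)

t = np.linspace(0, 10, 400)
clean = np.exp(-0.3 * t) * np.cos(2 * t)   # toy <sigma_x(t)>-like trajectory
noisy = add_sampling_noise(clean, sigma=0.1)
```

The standard deviation of the added residuals matches the chosen σ, which is the single knob varied in the robustness study below.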
We assess how the performance of the NN varies with σ in the same two scenarios as before. Firstly, we hold both η = 0.25 and ω_c = 0.5 constant. For the dataset with a clear separation in the allowed values of s among classes [s ∈ (0, 0.5] for a sub-Ohmic SD; s ∈ [1.5, 4] for a super-Ohmic one], the results after 10^3 training iterations are shown in Figure 5a. The training accuracy remains consistently close to 100% for all of the considered σ values. This suggests that the NN can learn from the training data well, regardless of the magnitude of the noise that is introduced. However, the test accuracy decreases from 99.58% for σ = 0.1 to 61.33% for σ = 1, thus indicating that the capacity of the model to generalise to unseen data diminishes as the noise intensity increases.
For the dataset with η = 0.25, ω_c = 0.5 and s ∈ (0, 1) for a sub-Ohmic SD [s ∈ (1, 4] for a super-Ohmic one], the results after 10^3 training iterations are shown in Figure 5b. Mirroring the trends observed for the preceding dataset, the training accuracy remains notably high and close to 100% over the range of σ examined. On the other hand, the test accuracy starts at 96.17% for σ = 0.01 and decreases to 84.21% for σ = 0.19. Therefore, relative to the previously examined case, the NN's performance with this dataset exhibits a heightened susceptibility to noise.
We now redirect our attention to the case where η and ω_c vary. To begin with, we analyse the dataset corresponding to the shortest interval length in Figure 4 [where s ∈ (0, 1) if the SD is sub-Ohmic and s ∈ (1, 4] if it is super-Ohmic], with η = ω_c = 0.25. The results of this analysis after 10^4 training iterations, shown in Figure 5c, closely resemble those in Figure 5b, albeit with a noticeable decrease in performance.
The training accuracy remains at 100%, while the test accuracy starts at 95.08% for σ = 0.01 and decreases to 83.63% for σ = 0.1. Lastly, we turn our attention to the dataset corresponding to the longest interval length in Figure 4. The results after 2 × 10^4 training iterations are shown in Figure 5d. While the conditions for s are consistent with those defined for the shortest interval length, η and ω_c vary over the interval [0.25, 2.05]. The training accuracy for this dataset starts at 94.33% for σ = 0.001 and exhibits minor fluctuations across the considered σ range, but remains quite close to 100%. Meanwhile, the test accuracy begins at 86.67% for σ = 0.001 and drops to 64.63% at σ = 0.01. This dataset exhibits the greatest sensitivity to noise, leading to the lowest performance metrics. Moreover, the results emphasise the fact that, despite the NN's ability to learn from training data, increasing noise levels hamper its generalisation to previously unseen data.

B. Amplitude Damping
We shall now analyse the amplitude damping model detailed in Section III B. We choose the initial state ρ̂0 = |+⟩⟨+|, the bare frequency of the oscillator ω_0 = 1, while we keep the environmental inverse temperature at β = 0.1. We subsequently solve for the dynamics of the system and determine the expectation value ⟨σ̂x(t)⟩ within the time interval t ∈ [0, 10]. As for the previous model, we use the real and imaginary components of the Fourier coefficients obtained through Equation (9) as input to the NN. We let η ∈ (0, 0.2], with s ∈ (1, 4] [s ∈ (0, 1)] if the SD is super-Ohmic [sub-Ohmic], and s = 1 if the SD is Ohmic. We generate a training, validation, and test set such that N_Train = 1500 and N_Valid = N_Test = 300. In all datasets, the Ohmic, sub-Ohmic and super-Ohmic classes have an equal number of trajectories. Figure 6 shows some of the curves from the resulting training set where, as before, the green curves represent sub-Ohmic dissipation while the yellow and blue curves correspond to trajectories characterised by Ohmic and super-Ohmic dissipation, respectively. The final training accuracy of the network in this case reaches 97.93% after 10^4 training iterations, while the test accuracy is significantly lower, reaching 93.00%.
Firstly, we would like to assess the number of time points required to attain a high level of accuracy. It should be noted that it is generally advisable to avoid having highly correlated features in a dataset, since their linear dependence implies that the value of one can be derived from that of the other [44]. Hence, mutually correlated features convey redundant information to the model, since each feature provides little or no additional information beyond what the other features already capture. Including all of the features will not improve the ability of the model to discriminate, but will increase the complexity of the algorithm, thus increasing the computational cost.
To this end, we introduce the Pearson correlation coefficient, which is a statistical measure of the linear correlation between two variables [52,53]. It ranges from a value of −1, indicating perfect anti-correlation, to 1, when the variables are perfectly correlated. A value of 0 indicates that there is no linear relationship between the two variables. Let ⟨σ̂x⟩ⁱ_n denote the n-th time point of the i-th trajectory in a given dataset.
Then the Pearson correlation coefficient between the n-th and m-th time-points, denoted C_nm, is given by

C_{nm} = \frac{\sum_{i=1}^{N} \Delta\langle\hat{\sigma}_x\rangle^i_n \,\Delta\langle\hat{\sigma}_x\rangle^i_m}{\sqrt{\sum_{i=1}^{N} \big(\Delta\langle\hat{\sigma}_x\rangle^i_n\big)^2}\,\sqrt{\sum_{i=1}^{N} \big(\Delta\langle\hat{\sigma}_x\rangle^i_m\big)^2}},  (20)

where Δ⟨σx⟩^i_j ≡ ⟨σx⟩^i_j − ⟨σx⟩_j, with ⟨σx⟩_j the average value of the j-th time-point over the dataset, and N the total number of trajectories in the dataset. We calculate the Pearson correlation coefficient between each pair of time-points in our training set and collect them in a correlation matrix, C, whose entries quantify the correlation between time-points. The resulting correlation heatmap, a graphical representation of the correlation matrix, is shown in Figure 7. From the heatmap, it can be observed that there is a high degree of correlation between adjacent and near-adjacent time-points.
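The correlation matrix C can be computed directly from the definition above. The following sketch is ours (the paper does not give code); it vectorises the sums over trajectories, and we check it against NumPy's built-in `corrcoef` on synthetic data.

```python
import numpy as np

def correlation_matrix(trajectories):
    """Pearson correlation C_nm between the n-th and m-th time-points,
    computed over a dataset of trajectories
    (rows = trajectories, columns = time-points)."""
    X = np.asarray(trajectories, dtype=float)
    D = X - X.mean(axis=0)          # deviations Delta<sigma_x>^i_j
    num = D.T @ D                   # sum_i Delta_n Delta_m
    norm = np.sqrt(np.diag(num))    # per-time-point standard deviations
    return num / np.outer(norm, norm)

# Synthetic dataset: 300 trajectories, 50 time-points each
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
C = correlation_matrix(X)
```

Plotting `C` as a heatmap (e.g. with `matplotlib.pyplot.imshow`) reproduces the kind of visualisation shown in Figure 7.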
To address this issue, a common approach is to perform feature selection, identifying a subset of features that are the most informative and non-redundant. Retaining only one of two correlated features may expedite the learning process without compromising the accuracy of the model. While we ideally want to avoid correlation between the features in a dataset, it is preferable to retain features which are correlated with the dependent variable [54]. Correlations make it possible to use the value of one variable to predict the value of another, meaning that features which are correlated with the output are predictive of the output.
Note that the Pearson correlation coefficient is only suitable for measuring the correlation between two continuous variables. As, in our case, the dependent variable consists of discrete labels, we instead gauge the degree of correlation between a feature and the dependent variable by examining whether the variance of the feature can be explained by the dependent variable. To do this, we group the values of the feature into classes based on the discrete labels, compute the variance of each class, and calculate the difference between the mean of the resulting class variances and the overall variance of the feature. If the mean of the class variances is significantly lower than the overall variance, this suggests that the feature and the dependent variable are correlated.
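The variance-based criterion just described admits a compact implementation. This is our own sketch of the procedure in the text; the function name and the toy data are illustrative assumptions.

```python
import numpy as np

def class_variance_gap(feature, labels):
    """Difference between the overall variance of a feature and the
    mean of its per-class variances. A large positive gap suggests
    the feature is correlated with the (discrete) dependent variable."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    class_vars = [feature[labels == c].var() for c in np.unique(labels)]
    return feature.var() - np.mean(class_vars)

# Toy check with three classes (e.g. sub-Ohmic / Ohmic / super-Ohmic):
labels = np.repeat(np.arange(3), 100)
informative = np.repeat([0.0, 5.0, 10.0], 100)   # mean shifts with the class
uninformative = np.tile([0.0, 1.0], 150)          # same spread in every class
```

For the informative feature the class variances vanish while the overall variance is large, giving a sizeable gap; for the uninformative one the gap is zero, matching the criterion in the text.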
A possible strategy for performing feature selection and identifying the most salient features for learning is thus to sort the entries of the correlation matrix into descending order. Then, starting from the highest correlation, one can remove, of the two contributing features, the one that exhibits the lowest correlation with the dependent variable. Using this strategy, we obtain a ranking of the features based on their importance, and thus the order in which to remove features if we are to maintain a high classification accuracy.

In this scenario, given that the retained time-points are in general not uniformly spaced, the Fourier coefficients must be computed using the non-uniform discrete Fourier transform [55]

c_k = \sum_{n=0}^{N-1} \langle\hat{\sigma}_x(t_n)\rangle\, e^{-2\pi i k p_n},

where the p_n are the non-uniform time-points, suitably scaled to fall between 0 and 1, and ⟨σx(t_n)⟩ denotes the n-th sampled point in a given trajectory. As for the discrete Fourier transform, k is the frequency, an integer between 0 and N − 1. Note that if p_n = n/N, this expression reduces to the discrete Fourier transform of Equation (9).

We compare the results obtained using the proposed feature selection algorithm with those obtained by selecting time-points uniformly, i.e. choosing time-points that are evenly spaced throughout the datasets; for example, the first in every 5 points or the first in every 100. Figure 8 shows the test accuracy against the number of selected time-points for the two selection methods. The blue curve shows the results obtained using uniform sampling, while the green curve shows the results obtained using the proposed feature selection algorithm. Firstly, the plot shows that the test accuracy remains consistently high until the number of time-points is reduced to approximately 20. Beyond this point, a sharp decline in accuracy is observed, as shown in the inset of Figure 8.
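The non-uniform discrete Fourier transform above can be evaluated directly from its definition. This naive O(N²) sketch is ours (for large N one would use a dedicated NUFFT library); it also checks the stated limit p_n = n/N, where the transform reduces to the ordinary DFT.

```python
import numpy as np

def nudft(values, p):
    """Non-uniform discrete Fourier transform:
    c_k = sum_n values[n] * exp(-2*pi*i*k*p_n), with p_n in [0, 1)
    and k = 0, ..., N-1, as in the main text."""
    values = np.asarray(values, dtype=float)
    N = len(values)
    k = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k, p)) @ values

# Uniform limit: with p_n = n/N this must coincide with the DFT.
x = np.arange(8.0)
p_uniform = np.arange(8) / 8.0
c_uniform = nudft(x, p_uniform)
```

For the non-uniformly retained time-points produced by the feature selection step, one would simply pass the rescaled selected times as `p`.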
Analysis of the plot indicates that the performance of the two time-point selection methods is comparable across different ranges of selected time-points. Specifically, when we take between ≈ 250 and ≈ 400 time-points, there is little difference between the accuracy obtained using uniform sampling and that obtained with the feature selection algorithm. However, in the range of approximately 40 to 250 time-points, the feature selection algorithm gives slightly better results than uniform sampling. Lastly, taking fewer than ≈ 40 points, the test accuracy fluctuates, but we can conclude that the performance of both methods is similar.
For the sake of completeness, we also explored various other methods for feature selection. For instance, after grouping each feature according to the discrete labels, we used a one-way analysis of variance (ANOVA) to determine whether there were statistically significant differences between the three groups [56]. We also considered the ratio, rather than the difference, of the mean of the group variances to the overall variance. Lastly, we attempted to assess the importance of each feature using principal component analysis: we examined the degree to which each feature contributed to the principal components, as a large contribution suggests that a feature is important in explaining the overall variability of the data [57]. We observed that none of these methods outperformed the correlation-based feature selection algorithm employed in Figure 8.
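The one-way ANOVA alternative mentioned above reduces to computing an F-statistic comparing between-group and within-group variability. The following self-contained implementation is our sketch (equivalent functionality is provided by `scipy.stats.f_oneway`); the toy groups are illustrative.

```python
import numpy as np

def anova_f(groups):
    """One-way ANOVA F-statistic for a list of 1-D sample groups:
    ratio of between-group to within-group mean squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    N = sum(len(g) for g in groups)            # total number of samples
    k = len(groups)                            # number of groups
    grand = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (N - k))
```

A large F indicates that the group means differ far more than expected from the within-group spread, i.e. that the feature separates the three Ohmicity classes.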

Measurement Sampling Noise
In this section, similarly to the analysis conducted for the pure dephasing model, we assess how the NN performs under realistic conditions. Due to the presence of noise sources such as sampling noise, experimentally obtained expectation values are seldom completely accurate, as only a finite number of measurement samples can be collected experimentally. We therefore investigate how the performance of the NN is affected by such limitations. To this end, we simulate the impact of sampling noise by incorporating artificial noise into the trajectories: we take the entire trajectory of 400 time-points, add to each point a random value drawn from a normal distribution with zero mean and standard deviation σ, and then assess the performance of the NN as σ increases.
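The noise-injection step can be sketched as follows. This is our illustration of the procedure in the text, not the authors' code; the function name and array shapes are assumptions.

```python
import numpy as np

def add_sampling_noise(trajectories, sigma, seed=0):
    """Add zero-mean Gaussian noise of standard deviation sigma to every
    time-point of every trajectory, mimicking finite-shot sampling noise."""
    rng = np.random.default_rng(seed)
    X = np.asarray(trajectories, dtype=float)
    return X + rng.normal(0.0, sigma, size=X.shape)

# Example: a batch of 100 trajectories with 400 time-points each
clean = np.zeros((100, 400))
noisy = add_sampling_noise(clean, sigma=0.01)
```

Sweeping `sigma` over the values quoted below (0.001 to 0.01) and re-evaluating the trained NN on the perturbed test set yields the robustness curves of Figure 9.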
The results of our analysis, after 10^4 training iterations, are shown in Figure 9, where the blue dashed line represents the accuracy evaluated on the training set, and the green solid line corresponds to the accuracy evaluated on the test set. We observe that the training accuracy remains consistently high, at around 100%. In contrast, the test accuracy starts at 90.95% for σ = 0.001 and declines to 69.05% as σ increases to 0.01. These observations are consistent with those obtained for the pure dephasing model: the model effectively learns from the training data and maintains a high training accuracy regardless of the noise level, but its capacity to generalise to new, unseen data deteriorates.

V. CONCLUSIONS
We have shown that, in a standard open system scenario, a NN can perform SD-classification with high accuracy. First, we have considered an exactly solvable, pure-dephasing model, and assessed the performance of the NN as a classifier, highlighting the limiting role played by the fluctuations of the SD parameters. We have then considered a SB model that, under a number of reasonable approximations, results in a master equation accounting for energy losses and decoherence. We observed that, despite the approximations being invoked, the NN can perform the SD-classification task with high accuracy. Furthermore, we thoroughly discussed the interplay between high accuracy in the classification task and the number of sampled points for the system observable. Lastly, we investigated how the NN's performance for both models withstands the challenge of measurement sampling noise, thus providing insights into its robustness under realistic conditions.
The methodology introduced in this paper, as well as the case studies analysed therein, highlights the capability of ML techniques to characterise environments with arbitrary SDs, thus embodying a reliable tool for environment characterisation and the provision of useful information for control and process diagnosis. This paves the way to, and leaves great hopes for, the full characterisation of an unknown SD through, for instance, regression of the parameters rather than classification. We also stress that the method put forward here does not rely critically on how the information on the dynamics is specifically acquired. In this sense, we expect the method to remain effective even for classes of SDs leading to long-lived correlations that would hinder the direct derivation of master equations in Lindblad-like form. In such cases, one should rely on more sophisticated simulation techniques, such as the Hierarchical Equations Of Motion (HEOM) [58,59], Time-Evolving Matrix Product Operators (TEMPO) [60], or the Time-Evolving Density with Orthogonal Polynomials Algorithm (TEDOPA) [61][62][63], just to name a few. The combination of one of these methods with ML will help achieve the successful characterisation and control of the environment affecting a given open system.

which holds since the second-order commutators vanish. The unitary evolution operator becomes, where we have noticed that the commutator in the first exponent is just a complex number, so we may omit the time-ordering operator. Recombining the exponentials of the operators we find, with ρ10(t) = ρ*01(t). Resorting to the identity ⟨e^Â⟩ = e^{⟨Â²⟩/2}, where the operator Â is a linear combination of creation and annihilation operators [66], we find that, finally, substituting the expressions for α_k and the mean occupation number of the k-th mode of the environment, N_k, we obtain

⟨e^{2Â(t)}⟩ = e^{−Γ(t)},  (B13)

where we have assumed that the bath modes form a continuum. The function Γ(t) is the decoherence function given in Equation (15) of the main text.
Appendix C: SB model (amplitude damping)

Here, we derive the equations governing the dynamics of the system described in Section III B. The second-order generator of the TCL master equation leads to the following set of equations for the reduced density matrix in the interaction picture, where the time-dependent coefficients are defined in the main text [cf. Equations (18) and (19)]. This set of coupled differential equations can be recast in the matrix form of Equations (16) and (17).

FIG. 4. Pure dephasing model: the classification accuracy against the length of the interval from which η and ω c are sampled.

FIG. 5. Pure dephasing model: the training and test accuracy of the NN in relation to the standard deviation σ of the artificial noise. Panels (a) and (b) show results for datasets with η = 0.25 and ω_c = 0.5. In panel (a) we have s ∈ (0, 0.5] for sub-Ohmic SDs and s ∈ [1.5, 4] for super-Ohmic SDs. The datasets in panels (b), (c) and (d) are characterised by s ∈ (0, 1) for sub-Ohmic SDs and s ∈ (1, 4] for super-Ohmic SDs. In panel (c) the parameters are set as η = ω_c = 0.25, whereas in panel (d) we have η, ω_c ∈ [0.25, 2.05]. The blue dashed lines represent the training accuracy, while the green solid lines show the test accuracy.

FIG. 7. Amplitude damping model: the correlation heatmap for the entries C_ij of the correlation matrix C, where C_ij is the Pearson correlation coefficient between the i-th and j-th time-points in the training set [cf. Equation (20)].

FIG. 8. Amplitude damping model: the test accuracy of the NN against the number of time-points when the time-points are selected uniformly or using the proposed feature selection method described in the main text.
FIG. 9. Amplitude damping model: the training and test accuracy of the NN against the standard deviation, σ, of the artificial noise.
\hat{U}(t) = \exp\!\Big[-\tfrac{1}{2}\int_0^t dt_2 \int_0^{t_2} dt_1\, [\hat{H}_I(t_2), \hat{H}_I(t_1)]\Big]\, e^{-i\int_0^t \hat{H}_I(\tau)\, d\tau},  (B6)

where, as a consequence of Equation (B1), the first exponent only applies a global phase to the qubit. As a result, the dynamics of the system is solely governed by the operator

e^{-i\int_0^t \hat{H}_I(\tau)\, d\tau} = e^{\hat{\sigma}_z \otimes \sum_k (\alpha_k(t)\, \hat{b}_k^{\dagger} - \alpha_k^*(t)\, \hat{b}_k)} \equiv e^{\hat{\sigma}_z \otimes \hat{A}(t)},  (B7)

with α_k(t) = g_k(1 − e^{iω_k t})/ω_k. It is convenient to rewrite this operator in the form

e^{\hat{\sigma}_z \otimes \hat{A}(t)} = \mathbb{I} \otimes \cosh\hat{A}(t) + \hat{\sigma}_z \otimes \sinh\hat{A}(t).

The matrix elements of the reduced density matrix are determined by explicitly tracing out the environmental degrees of freedom, i.e.