Explainable Representation Learning of Small Quantum States

Unsupervised machine learning models build an internal representation of their training data without the need for explicit human guidance or feature engineering. This learned representation provides insights into which features of the data are relevant for the task at hand. In the context of quantum physics, training models to describe quantum states without human intervention offers a promising approach to gaining insight into how machines represent complex quantum states. The ability to interpret the learned representation may offer a new perspective on non-trivial features of quantum systems and their efficient representation. We train a generative model on two-qubit density matrices generated by a parameterized quantum circuit. In a series of computational experiments, we investigate the learned representation of the model and its internal understanding of the data. We observe that the model learns an interpretable representation which relates the quantum states to their underlying entanglement characteristics. In particular, our results demonstrate that the latent representation of the model is directly correlated with the entanglement measure concurrence. The insights from this study represent a proof of concept towards interpretable machine learning of quantum states. Our approach offers insight into how machines learn to represent small-scale quantum systems autonomously.


I. INTRODUCTION
Over the past decades, (un)supervised representation learning has revolutionized machine learning research [1]. While manual feature engineering with specific domain expertise used to be required [2], powerful deep neural networks have proven to be successful in automatically extracting useful representations of data. This advent has led to better performance on a wide range of tasks, such as language modeling and computer vision [3][4][5]. In recent years, the application of representation learning has found its way into the physical sciences. It has been applied to studying phases of matter [6][7][8], detection of outliers in particle collision experiments [9,10], learning spectral functions [11], and compression of quantum states [12]. The last category, in particular, raises the interesting question of which properties of quantum systems are deemed important to capture by the machine learning model when compressing them. By examining and interpreting salient features of the learned representation built without human intervention, we can uncover the model's internal understanding of a quantum physical system. Learning representations that are meaningful [13] and explainable [14] is an important prerequisite for building artificial intelligence systems for physics research, and ultimately for systems that can facilitate new scientific discoveries [13,15].
In this work, we focus on studying two-qubit quantum circuits in the presence of information scrambling and depolarization, and investigate whether a generative model [16] is able to learn representations highlighting entanglement features. We apply local information scrambling to the states to inhibit the model's ability to exploit local features for the purpose of identifying the generative parameter, while simultaneously preserving the non-local entanglement properties. We thereby follow the recent development of training generative models to discover interpretable physical representations [11,17-21].

* E-mail: f.frohnert@liacs.leidenuniv.nl
We encode the full density matrices generated by two-qubit circuits using a variational autoencoder (VAE), which has been established as a suitable model for learning meaningful internal representations [22]. This is schematically depicted in Fig. 2. A VAE performs dimensionality reduction [23], compressing an input into a smaller dimension called the latent space, and then attempts to reconstruct the input from that latent representation. Originally proposed as a generative model for image data, this architecture has proven to be capable of extracting ground-truth generative factors (underlying features of the data which capture a distinct attribute or characteristic) from highly complex feature spaces and representing them in a human-interpretable manner [24,25]. In particular, the so-called β-VAE introduces a regularization hyperparameter which encourages independent latent variables, leading to more interpretable representations [25]. Thus, we conduct a hyperparameter search on β, and our results reveal that the smallest latent representation the model can learn is interpretable and captures entanglement properties. Specifically, our investigation shows that the latent space encodes a quantity which effectively follows known entanglement measures such as concurrence and negativity, which are identical for the two-qubit systems we focus on. Moreover, we show that the model generalizes to any other two-qubit state, as well as to two-qubit subsets of three-qubit states.
The remainder of this paper is structured as follows. In section II, we present a description of the two-qubit system under consideration, including information about the corresponding data sets that were generated. Additionally, we give a brief introduction to variational autoencoders. In section III, we present the results of experiments on density matrices with and without information scrambling. We furthermore test the ability of the model to generalize to different quantum states. We provide a thorough analysis of the learned representations and explore their relationship to the underlying properties of quantum states. Finally, in section IV, we summarize the results and provide an overview of future work that can be undertaken to extend and improve upon the results presented in this paper.

A. Data
We study quantum states generated by the two-qubit parameterized quantum circuit in Fig. 1a [26]. This circuit consists of a Hadamard gate and a Controlled-R_Y(α) rotation with input angle α, which produces the density matrix ρ(α) (see Appendix B 1 for the full description). The random unitaries U_A and U_B will be discussed shortly. For such a two-qubit system, the amount of entanglement can be quantified through the concurrence [27]:

C[ρ] = max(0, λ_1 − λ_2 − λ_3 − λ_4).   (1)

In this, λ_i are the eigenvalues (in descending order) of the Hermitian matrix

R = √( √ρ ρ̃ √ρ ),  with  ρ̃ = (σ_y ⊗ σ_y) ρ* (σ_y ⊗ σ_y).   (2)

At α = 0 the state is fully separable (and hence C[ρ(0)] = 0), while for any non-zero α the state is entangled and has non-zero concurrence. This is visualized in Fig. 9 in the appendices. The motivation for choosing to study the states ρ(α) is that a single parameter α uniquely determines the entanglement properties of each state, and drawing α ∈ [0, π] explores the entire range of entanglement measure values. This simple structure-property relation makes it easy to interpret learned representations.
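As an illustrative sketch (not the authors' code), the state family and the Wootters concurrence above can be evaluated with numpy. The explicit state vector below is our reading of the circuit in Fig. 1a (Hadamard on the control, Controlled-R_Y(α) on the target; U_A and U_B omitted), and for this reading C[ρ(α)] = |sin(α/2)|:

```python
import numpy as np

def circuit_state(alpha):
    """State vector of our reading of the circuit in Fig. 1a:
    H|0> ⊗ |0> = (|00> + |10>)/sqrt(2), then CRY(alpha) rotates the
    target when the control is |1> (the scrambling unitaries are omitted)."""
    return np.array([1.0, 0.0, np.cos(alpha / 2), np.sin(alpha / 2)]) / np.sqrt(2)

def concurrence(rho):
    """Wootters concurrence C = max(0, l1 - l2 - l3 - l4), where l_i are
    the square roots of the eigenvalues of rho @ rho_tilde, sorted in
    descending order (equivalent to the eigenvalues of R in Eq. 2)."""
    sy = np.array([[0, -1j], [1j, 0]])
    yy = np.kron(sy, sy).real                     # sigma_y ⊗ sigma_y is real
    rho_tilde = yy @ rho.conj() @ yy
    eigs = np.linalg.eigvals(rho @ rho_tilde).real
    lams = np.sort(np.sqrt(np.abs(eigs)))[::-1]   # abs guards tiny negatives
    return max(0.0, lams[0] - lams[1] - lams[2] - lams[3])

psi = circuit_state(np.pi)
rho = np.outer(psi, psi.conj())
c = concurrence(rho)   # alpha = pi gives the maximally entangled state, C = 1
```

The same `concurrence` routine works for mixed states, which is used again for the depolarized states in section III F.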

B. Variational Autoencoders
Variational autoencoders aim to find an effective compressed representation of data by learning the identity map through an informational bottleneck [22]. As visualized in Fig. 2, VAEs accomplish this task by using an encoder and a decoder network. The encoder q_ϕ(z|x) is a neural network with weights and biases ϕ that maps high-dimensional data to the so-called latent space:

q_ϕ(z|x) = N(z; μ_ϕ(x), σ_ϕ²(x)).   (3)

From a given data point x it generates a normal distribution N over the possible values of the latent variable z ∼ q_ϕ(z|x), from which x could have been generated. In this, z = [z_0, . . . , z_N] is a point in an N-dimensional latent space, where N is chosen manually beforehand.
Though with an arbitrarily complex encoder a dataset can in principle be encoded in just one latent variable, in practice a well-trained latent representation captures the ground-truth generative factors of the input data [28].
In our case, the encoder is a fully connected feedforward neural network consisting of multiple hidden layers with nonlinear activation functions. The mean μ_ϕ and variance σ_ϕ² are the learned parameters defining the distribution in Eq. 3. For the visualization of the learned latent variables throughout the remainder of this manuscript, we will exclude their variance and instead concentrate solely on the mean values of their latent distributions, denoted as z = μ.
Similarly, the decoder p θ (x|z) is a neural network with weights and biases θ that attempts to reconstruct the input x from given latent variables z and follows the reversed structure of the encoder as shown in Fig. 2.
During training, the parameters ϕ and θ are tuned with the goal of minimizing the following loss function:

L = L_R + β L_KL.   (5)

This loss function is composed of two terms: a reconstruction loss L_R and a regularization loss L_KL. The reconstruction loss measures the difference between the original input and the output of the decoder. In our case, the metric for this difference is the element-wise mean squared error of the input density matrices. This choice of metric influences the results, because with this metric the off-diagonal elements of the density matrix have a larger relative contribution. The regularization loss, on the other hand, is given by the Kullback-Leibler divergence between the latent representation and a standard normal distribution. This encourages the latent representation to be smooth and continuous, and moreover aims at having latent variables represent independent generative factors [22,25]. For a single data point this loss can be expressed as

L_KL = −(1/2) Σ_i (1 + log σ_i² − μ_i² − σ_i²),   (6)

where i runs over the N latent variables. The hyperparameter β in Eq. 5 controls the impact of regularization on the overall optimization objective, regulating the trade-off between the effective encoding capacity of the latent space and the statistical independence of individual latent variables in the learned representation [25].
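The two-term objective can be sketched numerically. This is a minimal illustration of the standard β-VAE loss (element-wise MSE plus the closed-form Gaussian KL term), not the authors' training code:

```python
import numpy as np

def beta_vae_loss(x, x_rec, mu, log_var, beta):
    """Sketch of L = L_R + beta * L_KL: L_R is the element-wise mean squared
    error of the (flattened) density-matrix entries; L_KL is the closed-form
    KL divergence between N(mu, sigma^2) and the standard normal prior."""
    l_rec = np.mean((x - x_rec) ** 2)
    # Per latent variable i: -0.5 * (1 + log sigma_i^2 - mu_i^2 - sigma_i^2)
    l_kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
    return l_rec + beta * l_kl

# A latent code that matches the prior exactly (mu = 0, sigma = 1) contributes
# zero regularization loss, matching the interpretation used for Fig. 5a.
loss = beta_vae_loss(np.ones(32), np.ones(32), np.zeros(1), np.zeros(1), beta=0.75)
```

A perfect reconstruction with a prior-matching latent code therefore gives a total loss of zero, which is the baseline against which "active" latent variables are identified in section III C.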

III. RESULTS AND DISCUSSION
In the following, we perform a series of experiments to evaluate VAE models with varying training data, latent dimensions, and β regularization strengths. More details about training and model implementation can be found in section A. We obtain a number of results from the experiments, which we discuss in the following. This section initially focuses on encoding pure state density matrices without regularization, demonstrating the successful extraction of their generative parameter α using the VAE. Next, an information scrambling technique is introduced to prevent the direct extraction of α, and the optimization of the regularization parameter is shown to produce an interpretable representation closely following concurrence. Finally, the section explores the generalization abilities of the VAE by investigating its performance on mixed states and three-qubit W states.

A. Encoding Quantum States ρ(α)
In this investigation, we study how a VAE learns to encode pure state density matrices ρ(α) and refer to this specific model as the ρ-VAE. Though the data has one generative factor, we wish to explicitly confirm that one latent variable indeed suffices for reconstruction. To empirically confirm this, we train VAEs with different latent space dimensions (N = 1, . . . , 8) on quantum states ρ(α), with α ∈ [0, π] in 10³ steps, and record the final loss L. For each N, we run 9 experiments and average the results. Throughout the training process, we maintain a regularization strength of β = 0. The inset in Fig. 3 shows these results, plotting the reconstruction quality of the trained model at different latent space sizes. We find that indeed a one-dimensional (scalar) latent space is sufficient for compressing quantum states ρ(α), since increasing the number of latent variables does not lead to a significant decrease of the final loss.
The next step of the analysis is to examine and interpret the learned representation of the one-dimensional model to uncover what property of quantum states it extracts to structure its latent space. For this, we use the trained ρ-VAE (N = 1) to encode a test set of quantum states at different α (10 samples at each of 21 unique angles) and record the resulting predicted latent variables z.
Fig. 3 shows the correlation between the mean of the predicted latent variable values (blue) and the angle α of the corresponding input quantum states ρ(α). We find that the model assigns latent variable values that scale mostly linearly with the angle α, as demonstrated by the linear regression with a coefficient of determination r² > 0.99 [29]. In other words, the VAE extracts a latent parameter that is linearly correlated with the generative factor α. We note that there is no incentive for the VAE to extract the actual value of α, as long as the latent representation can uniquely reconstruct inputs.
Finally, by investigating the structure of density matrices in Eq. B1, we can also interpret why the model has learned to use this specific mapping from quantum state to latent representation: each angle α ∈ [0, π] generates a density matrix with a unique structure, which means that extracting the generative angle α is a sufficient mapping of the sample to a single latent variable that allows for reconstruction. As a final detail, we note that the predicted latent variable values exhibit a standard deviation near zero, with the error bars consistently falling within the markers. This observation shows the robustness of the model's predictions, indicating that an identical representation is consistently obtained across multiple experiments.

B. Encoding Quantum States ρs(α)
In the next step, we introduce an information scrambling procedure to prevent the VAE from learning a direct map to the generative factor α, and that additionally fully removes the ability to extract local features from quantum states. In this experiment the density matrices are scrambled utilizing random local unitaries:

ρ_s(α) = (U_A ⊗ U_B) ρ(α) (U_A ⊗ U_B)†,   (7)

where the single-qubit unitaries U_A and U_B are distributed according to the Haar measure [30]. The procedure to generate these unitaries is detailed in Appendix B 2. By applying the unitary transformation in Eq. 7 to the density matrices, local information becomes inaccessible while non-local information remains invariant [31].
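A minimal numpy sketch (not the authors' implementation) of this scrambling step illustrates the key property: local unitaries obscure local structure but leave the concurrence invariant. The QR-based Haar sampling is a standard construction and an assumption on our part about how Appendix B 2 generates the unitaries:

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_unitary(n):
    """Haar-random n x n unitary via QR decomposition of a complex Ginibre
    matrix, with the phases of R's diagonal absorbed to fix the measure."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, r = np.linalg.qr(g)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def concurrence_pure(psi):
    """For a pure two-qubit state, C = |<psi|sigma_y ⊗ sigma_y|psi*>|."""
    sy = np.array([[0, -1j], [1j, 0]])
    yy = np.kron(sy, sy)          # real-valued matrix
    return abs(psi.conj() @ yy @ psi.conj())

psi = np.array([1, 0, 0, 1]) / np.sqrt(2)   # maximally entangled state, C = 1
UA, UB = haar_unitary(2), haar_unitary(2)
psi_s = np.kron(UA, UB) @ psi               # locally scrambled state
# The entries of psi_s look nothing like psi, yet the concurrence is unchanged.
```

This invariance is exactly what forces the ρ_s-VAE away from local features and towards non-local (entanglement) information.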
We now study how a VAE, which we label the ρs-VAE, learns to encode the scrambled density matrices ρs(α), keeping β = 0. The inset in Fig. 4 again illustrates the change in reconstruction quality of the trained model at different latent space sizes. A perfect reconstruction would require extracting 7 generative factors: the angle α and 3 angles each for the random unitaries. And though the lowest loss values are indeed at N ≤ 7, we observe a clear kink at a three-dimensional latent space, after which the loss flattens out. We are not after a perfect reconstruction but rather focus on interpretable latent spaces, and hence a smaller latent space is preferred over exact reconstruction.
To examine and attempt to interpret the learned representation, we encode a test set of ρs(α) quantum states using the trained N = 3 model and record the predicted latent variable values z.
Fig. 4 visualizes the latent encoding of quantum states, where each point is color-coded by the concurrence value C[ρs(α)]. We note that this representation is structured by regions of high entanglement (yellow), minimal entanglement (purple), and mixed regions. This observation suggests that the model constructs its latent space according to some underlying properties of the quantum states. However, one caveat of this representation is that the extracted information is shared between the three latent dimensions, as all of them appear to capture certain aspects of non-local properties. This makes it impossible to readily interpret the latent variables and to derive a general statement about the learned map from sample to latent representation, which is a well-known problem of VAEs with non-optimized regularization strength [25].

C. Tuning Regularization Strength β
Hence, to optimize for interpretability, we tune the regularization strength β of the ρs-VAE. The goal is to find a representation with factorized (disentangled) latent variables, meaning that each latent dimension represents a unique independent feature of the encoded data. This is beneficial, as a representation in which the latent variables encode different independent generative factors of variation in the data is better aligned with human intuition than that of the previous standard VAE approach [28].
By adjusting the value of β, we can control how much the latent variables resemble a normal distribution throughout the optimization process. This naturally incorporates the properties of the normal prior into the learned representation, such as its factorized nature. Specifically, the diagonal covariance matrix of the latent variables is advantageous for the goal of finding interpretable representations, as it creates a disentangled latent space in which each dimension is independent and uncorrelated with the others. Importantly, the tuning process leads to a trade-off between the reconstruction quality of the encoded input and the degree of disentanglement of the learned latent space, where a higher value of β generally leads to more disentangled latent variables but lower reconstruction quality [25].
We train the ρs-VAE on quantum states ρs(α) with β ranging from 0.01 to 1.2, with a large latent space of dimension N = 8 to give the latent bottleneck sufficient capacity. For each value of β, we train a model and record the regularization loss value L_KL^(i) (see Eq. 6) of each latent variable averaged across the data set.
Fig. 5a visualizes the contribution of each latent variable to the regularization loss at different β values. In this figure, each row is sorted and normalized. To interpret this visualization, we note that a regularization loss of 0 corresponds to a latent variable z_i that predicts the normal prior N(0, 1) regardless of the input. This is equivalent to not encoding information from the data. Conversely, any deviation from 0 regularization loss corresponds to a latent variable which encodes information. We observe that at low regularization strengths β ∈ [0.01, 0.4], multiple latent variables contribute to the regularization loss. In detail, Fig. 5b illustrates the two-dimensional latent space (z_0, z_1) spanned by the two latent variables with the largest regularization losses L_KL^(i) at β = 0.01. Consistently with Fig. 4b, both representations exhibit some observable structure according to the entanglement properties, but the information between the two axes is mixed. As β increases, the encoded information is increasingly concentrated in fewer latent variables. This is because of the increased pressure on the latent variables to encode statistically independent features [28]. In Fig. 5c, for example, the two-dimensional latent space (z_0, z_1) is shown at β = 0.4, and we observe a clearer relationship between encoding and entanglement properties. In the critical region of β ∈ [0.5, 0.9], the number of active latent variables is equal to the number of ground-truth generative factors in the data set, namely one. This means that the majority of extracted information is represented in a single latent variable. Fig. 5d shows the two-dimensional latent space (z_0, z_1) at β = 0.75, where there is a direct relationship between encoding and entanglement properties. Increasing β above 0.9 reduces the capacity of the latent variables to a point where the reconstruction quality becomes too poor to encode meaningful information. This leads to the latent variables becoming more similar to the prior again, as they encode a decreasing amount of information about the quantum states. This is visualized in Fig. 5e, where the two-dimensional latent space (z_0, z_1) at β = 1 exhibits less observable structure again.

D. Encoding Quantum States ρs(α) with Tuned Regularization Strength β
Based on the insight gained in the previous experiment, we proceed to analyze the ρs-VAE trained at β = 0.75, where only a single latent variable z_i is active. Throughout the training process, we anneal the regularization strength from β = 0 to β = 0.75 to alleviate the problem of KL vanishing [5]. The inset in Fig. 6 shows that the one-dimensional latent space is sufficient for this fixed β, which is what we expect from Fig. 5a.
The next step of the analysis is to examine and interpret the learned representation of the N = 1 model to uncover what properties of quantum states are extracted to build the latent representation. For this, we encode a test set of quantum states ρs(α) (10 samples at each of 21 unique angles) using the trained N = 1 model and record the predicted latent variable values z.
After encoding ρs(α) with the N = 1 model, we plot the resulting latent variables against the concurrence C[ρs(α)] of the corresponding input in Fig. 6. As before, the resulting correlation is very close to linear, and we conclude that the learned mapping from input to latent representation is based on the extraction of entanglement information. To understand why the model has learned this specific mapping from quantum state to latent representation, we start with a comparison to the result in section III A. There, the ρ-VAE with β = 0 and no information scrambling learned to base its latent representation on the extraction of the angle α, as this variable determines the underlying structure and enables the model to distinguish between the states. By scrambling the density matrices, local properties such as the angle α become obscured. As a result, the ρs-VAE must extract a different quantity that contains equivalent information about the ground-truth generative factor to still be able to distinguish between quantum states. Learning a function of α that remains invariant under the information scrambling transformation accomplishes this task, and the extraction of the concurrence C[ρs(α)] = C[ρ(α)] does so. A given angle α generates a unique concurrence C[ρs(α)] and thus provides a direct relation to the ground-truth generative factor.
Finally, we remark that we report the absolute values |z| rather than z, as the model has learned a representation with symmetry around z = 0, which is a direct result of the regularization of the latent variables. The unchanged latent space z is presented in Appendix B 2. Since we are only interested in the relative distances of the quantum states in the encoding, this step does not remove the ability to interpret the latent representation.

E. Testing the Ability of the ρs-VAE to Generalize to Random Two-Qubit States
We proceed to explore the robustness and generalization capability of the representation learned by the ρs-VAE. Our objective is to determine whether the ρs-VAE can effectively extract entanglement information from any given pure (real) two-qubit state.
The information scrambling procedure in Eq. 7 results in pure states that cover any real two-qubit state (see Appendix B 3), and hence the quantum states used for training and testing (ρs(α) for α ∈ [0, π] and ρu) belong to the same family of states. We therefore expect the model to work well on this task. For this, we test the model trained on ρs(α) with N = 1 and β = 0.75 on fully random two-qubit quantum states ρu and record the predicted latent variable values z. The set of density matrices ρu comprises randomly generated two-qubit density matrices

ρu = U_AB |00⟩⟨00| U_AB†,   (8)

where U_AB represents the real components of randomly sampled 4 × 4 unitary operators, which are uniformly distributed according to the Haar measure. Fig. 7a illustrates the resulting correlation between mean predicted latent variable values (blue) and concurrence C[ρu] of the corresponding input quantum states, showing that also for ρu the encoding is linearly related to the concurrence. In other words, the trained ρs-VAE is able to extract entanglement features from any pure (real) quantum state.
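A short sketch of such an ensemble can be written in numpy. Reading "real components of Haar-random unitaries" as Haar-random real orthogonal matrices, and assuming the circuit's |00⟩ initial state, the resulting states sweep the full concurrence range:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_real_state():
    """|psi_u> = O |00>, with O the sign-fixed Q factor of a real Ginibre
    matrix, i.e. a Haar-random 4 x 4 orthogonal matrix (our reading of
    'real components of Haar-random unitaries'; |00> is assumed)."""
    g = rng.standard_normal((4, 4))
    q, r = np.linalg.qr(g)
    return (q * np.sign(np.diag(r)))[:, 0]   # first column = O |00>

def concurrence_real(psi):
    """C = 2|ad - bc| for a real pure state a|00> + b|01> + c|10> + d|11>."""
    a, b, c, d = psi
    return 2 * abs(a * d - b * c)

samples = [concurrence_real(random_real_state()) for _ in range(200)]
# The sampled concurrence values lie in [0, 1], spanning separable to
# (nearly) maximally entangled states.
```

The closed form 2|ad − bc| is the pure-state specialization of the Wootters concurrence and keeps the sketch dependency-free.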

F. Testing the Ability of the ρs-VAE to Generalize to Depolarized Two-Qubit States
We now proceed to study mixed states ρd(γ) obtained through a depolarization channel [27] starting from the maximally entangled state ρ(π):

ρd(γ) = (1 − γ) ρ(π) + γ I/4.   (9)

In this transformation, ρ(π) is mapped to a linear combination of the maximally mixed state and itself, and the degree of depolarization is set by γ. A depolarization parameter of γ = 0 produces a pure state and γ = 1 produces the maximally mixed state. We now encode quantum states ρd(γ) for γ ∈ [0, 1] using the trained ρs-VAE. Fig. 7b illustrates the correlation between (transformed) mean predicted latent variable values (blue) and concurrence C[ρd(γ)] of the corresponding input quantum states at varying depolarization parameters γ. In this, the transformation T(z) re-scales the latent variable values, which is motivated in Appendix B 4. We find that the model assigns latent variable values T(z) that scale linearly with the concurrence, as demonstrated by the linear regression with r² > 0.99. This result is significant in that encoding the linear transformation of the maximally entangled state (Eq. 9) using the (highly) nonlinear ρs-VAE network leads to a latent representation that clearly reflects the linear transformation of the input in a readily interpretable manner. This observation that the ρs-VAE extracts a quantity that scales linearly with the depolarization process, in conjunction with the results of previous experiments, is compelling evidence that the ρs-VAE constructs its internal representation by extracting a quantity that is closely related to concurrence.
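Assuming the standard depolarizing-channel form above, a short numpy sketch verifies that the concurrence of ρd(γ) decays linearly and vanishes exactly at the PPT separability threshold γ = 2/3 mentioned in Appendix B 4 (for this Werner-like family, C = max(0, 1 − 3γ/2) follows from the Wootters formula):

```python
import numpy as np

def concurrence(rho):
    """Wootters concurrence of a (possibly mixed) two-qubit state."""
    sy = np.array([[0, -1j], [1j, 0]])
    yy = np.kron(sy, sy).real
    rho_tilde = yy @ rho.conj() @ yy
    eigs = np.linalg.eigvals(rho @ rho_tilde).real
    lams = np.sort(np.sqrt(np.abs(eigs)))[::-1]
    return max(0.0, lams[0] - lams[1] - lams[2] - lams[3])

bell = np.outer([1, 0, 0, 1], [1, 0, 0, 1]) / 2        # rho(pi)
rho_d = lambda g: (1 - g) * bell + g * np.eye(4) / 4   # depolarization channel
# Concurrence falls linearly from 1 at gamma = 0 and hits 0 at gamma = 2/3,
# remaining 0 for all more strongly depolarized (separable) states.
```

This analytic behavior is what makes the linear scaling of T(z) in Fig. 7b a meaningful consistency check.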
G. Testing the Ability of the ρs-VAE to Generalize to Subsets of Three-Qubit States

In the final step, we explore the capability of the trained ρs-VAE (N = 1 and β = 0.75) to investigate larger quantum systems. To achieve this, we examine quantum states ρw(α) generated by the parameterized three-qubit quantum circuit shown in Fig. 12. These states span a range from α = 0 (representing a separable state) to α = 2 arccos(1/√3) (representing the W-state). We sample these states and record the corresponding two-qubit subpartitions ρw^AB, ρw^AC, and ρw^BC for subsequent encoding using the ρs-VAE.
Figure 8 displays the correlation between the predicted latent variable values and the concurrence C[ρw] for the three subpartitions. It is observed that the model assigns latent variable values that exhibit a linear scaling relationship (r² > 0.99) with the concurrence. This indicates that the model successfully generalizes to this system as well.

IV. CONCLUSION
In this study, we investigate the use of the β-VAE framework for representation learning of small quantum systems. We focus on two-qubit density matrices generated by a parameterized quantum circuit, where the entanglement properties are determined by a single angle. By incorporating an information scrambling technique and optimizing the regularization strength, we observe that the VAE captures a quantity closely related to concurrence to structure its latent representation. Additionally, we demonstrate the generalization capability of the optimized model to other two- and three-qubit systems. In conclusion, our findings establish the concept of employing machine learning techniques to derive interpretable representations for small quantum systems. These results serve as a solid foundation for future research, wherein the utilized methodology can be extended to investigate larger quantum systems.

During training, the regularization strength β is annealed from 0 to the final value [5,34]. In Fig. 6, we present an analysis of the predicted latent variables of the ρs(α) data set, focusing on their absolute values |z|. This presentation is necessary due to the symmetry around z = 0 in the learned representation. For the sake of completeness, we include Fig. 10, which illustrates the unchanged latent space z as a function of α.

Random Unitary Quantum States ρu
In section III E, we argue that the ρs-VAE is able to effectively generalize to the ρu data set, as its training data explores the whole pure (real) two-qubit state space. To gain intuition for this statement, we first represent the density matrix ρs(α) in its state vector representation and apply the Schmidt decomposition:

|ψs(α)⟩ = Σ_i √λ_i(α) (U_A |ψ_i^1⟩) ⊗ (U_B |ψ_i^2⟩).

The idea now is that with the rotation U_A |ψ_i^1⟩ we can reach, by definition, any single-qubit state |ϕ_i^1⟩. Combining this with the ability to generate any entanglement value λ_i(α) with α ∈ [0, π] lets us explore the complete pure state space and express any two-qubit state. Hence, the underlying structure of quantum states used to train and test the ρs-VAE is identical, which makes the generalization possible.

In section III F, we introduce the transformation T(z) for the predicted latent variables to ensure the correct scaling with concurrence. For completeness, we include Fig. 11a, which depicts the correlation between the unchanged latent variables and the depolarization parameter γ. We observe that at γ = 0, we encode the maximally entangled state ρ(π) and obtain the same value for z as shown in Fig. 6. As γ increases linearly, there is a corresponding linear decrease in the predicted latent variables. The point γ = 2/3 marks the transition from entangled to separable quantum states, determined by the positive partial transpose (PPT) criterion [35]. As discussed in section III D, the unchanged latent space accurately encodes the relative distances between encoded points in relation to the encoded density matrices, but the scaling is affected by the regularization of the latent encoding. To address this, we employ a linear transformation L(z), which modifies the slope by a factor very close to 2 and introduces an offset, ensuring that the maximally entangled state is encoded as L(z) = 1 and the transition from entangled to separable occurs at L(z) = 0. This is visualized in Fig. 11b. Drawing inspiration from the definition of concurrence in Eq. 1, we introduce the function T(z) = max(L(z), 0) to achieve the desired performance. The impact of excluding the max operation is illustrated in Fig. 11c, which exhibits identical results to Fig. 11b.

Three-Qubit Quantum States ρw
The ρw(α) data set utilized in Fig. 8 consists of three-qubit states generated by the parameterized quantum circuit in Fig. 12. This circuit is parameterized by a single parameter α which determines its entanglement properties: for α = 0 the output state is separable and for α = 2 arccos(1/√3) the output is the W-state. To be able to use the model trained on two-qubit states, we subpartition the three-qubit states by performing a partial trace, e.g.

ρw^AB = Tr_C[ρw(α)],

and analogously for ρw^AC and ρw^BC.
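The subpartition step can be sketched as follows. This is an illustrative numpy implementation, not the authors' code: it builds the W state directly (rather than via the circuit in Fig. 12), traces out the third qubit, and checks that each pair carries the known pairwise concurrence 2/3 of the three-qubit W state:

```python
import numpy as np

def partial_trace_last(rho):
    """Trace out the last qubit of a three-qubit density matrix:
    rho_AB = Tr_C[rho_ABC]."""
    r = rho.reshape(2, 2, 2, 2, 2, 2)       # axes (a, b, c, a', b', c')
    return np.einsum('abcdec->abde', r).reshape(4, 4)

def concurrence(rho):
    """Wootters concurrence of a two-qubit density matrix."""
    sy = np.array([[0, -1j], [1j, 0]])
    yy = np.kron(sy, sy).real
    eigs = np.linalg.eigvals(rho @ (yy @ rho.conj() @ yy)).real
    lams = np.sort(np.sqrt(np.abs(eigs)))[::-1]
    return max(0.0, lams[0] - lams[1] - lams[2] - lams[3])

# |W> = (|001> + |010> + |100>)/sqrt(3); basis index = 4a + 2b + c
w = np.zeros(8)
w[[1, 2, 4]] = 1 / np.sqrt(3)
rho_ab = partial_trace_last(np.outer(w, w))
# By the symmetry of |W>, rho_AC and rho_BC give the same concurrence.
```

The traced-out two-qubit states are mixed, which is why the full Wootters formula (rather than the pure-state shortcut) is needed here.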

Figure 1 .
Figure 1. Conceptual overview. a) Quantum states ρ(α) are generated by a two-qubit quantum circuit consisting of a Hadamard gate, a Controlled-R_Y gate parameterized by the angle α, and two single-qubit rotations. b) Data are encoded from a density matrix into a stochastic latent representation z using the trained encoder network. c) Latent variables z = (z_0, z_1) are visualized to analyze the relation between the structure of the learned representation and the encoded properties. In this figure, the two-dimensional latent space is color-coded by an entanglement measure of the underlying states (the concurrence). Here, the low entanglement region is colored purple and the high entanglement region is colored yellow.

Figure 2 .
Figure 2. Schematic overview of the VAE architecture. The input x is compressed by the neural network-based encoder into the latent space, represented as z, serving as an information bottleneck. The decoder network then uses the information from the latent space to reconstruct x*.

Figure 3 .
Figure 3. The ρ-VAE learns to extract the parameter α from quantum states to structure its latent space. The correlation between the one-dimensional latent space z of the ρ-VAE and the parameter α of encoded density matrices (blue, mean and standard deviation of 10 samples). The error bars are contained within the markers. The regression of encoded quantum states (black) shows that the correlation has a small sinusoidal feature but is sufficiently characterized by a linear function with r² > 0.99. Inset: The final loss of the ρ-VAE trained on quantum states ρ(α) at β = 0 with latent space dimensions N ∈ [1, 8] (mean and standard deviation of 9 experiments) indicates that a one-dimensional latent space has sufficient information capacity.

Figure 4. The ρs-VAE learns an efficient but uninterpretable representation of quantum states with information scrambling ρs(α). Shown is the three-dimensional latent space z = (z0, z1, z2) of the ρs-VAE trained with β = 0. Each encoded density matrix is color-coded by its corresponding concurrence value. Inset: The final loss of the ρs-VAE trained on quantum states ρs(α) at β = 0 with latent space dimensions N ∈ [1, 8] (mean and standard deviation of 9 experiments) indicates that a three-dimensional latent space has sufficient information capacity.

Figure 5. Tuning the β parameter of the ρs-VAE leads to a compressed representation of quantum states. a) Regularization loss L_KL^(i) contributed by each latent variable zi of the ρs-VAE at different β values. The N = 8 latent variables are normalized and presented in descending order of loss values. b-e) Two-dimensional latent space (z0, z1) of the two largest L_KL^(i) at β ∈ {0.01, 0.4, 0.75, 1.0}. The color-coding is identical to Fig. 4 and indicates the concurrence value of the encoded quantum states.
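The per-latent regularization loss L_KL^(i) shown in Fig. 5a is the KL divergence of each latent's posterior N(μi, σi²) from the standard normal prior, weighted by β in the overall objective. A minimal sketch (assuming a Gaussian posterior with diagonal covariance and a squared-error reconstruction term):

```python
import numpy as np

def kl_per_latent(mu, log_var):
    """KL divergence of N(mu, sigma^2) from N(0, 1), per latent variable."""
    return 0.5 * (mu**2 + np.exp(log_var) - 1.0 - log_var)

def beta_vae_loss(x, x_star, mu, log_var, beta):
    """beta-VAE objective: reconstruction + beta * total KL regularization."""
    recon = np.sum((x - x_star) ** 2)
    kl = np.sum(kl_per_latent(mu, log_var))
    return recon + beta * kl
```

Increasing β penalizes information flow through each latent variable, which is what prunes the representation down to the few latents visible in Fig. 5a.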

Figure 6. The ρs-VAE learns to extract concurrence from quantum states to structure its latent space. Shown is the correlation between the one-dimensional latent space |z| of the ρs-VAE and the concurrence C[ρs(α)] of the encoded density matrices (blue; mean and standard deviation of 10 samples). The error bars are contained within the markers. The regression of the encoded quantum states (black) shows a linear correlation with r² > 0.99. Inset: The final loss of the ρs-VAE trained on quantum states ρs(α) at β = 0.75 with latent space dimensions N ∈ [1, 8] (mean and standard deviation of 9 experiments) indicates that a one-dimensional latent space has sufficient information capacity.

Figure 7. The latent representation of the ρs-VAE is able to generalize to other two-qubit systems. Shown is the correlation between the one-dimensional latent space z of the ρs-VAE and the concurrence C[ρ] of the encoded density matrices (blue; mean and standard deviation of 10 samples). The error bars are contained within the markers. Here, the ρs-VAE is trained on ρs(α) and tested on a) states generated by random 4 × 4 unitaries ρu, and b) depolarized quantum states ρd(γ). Both regressions of the encoded quantum states (black) show that the correlation is linear with r² > 0.99.

Figure 8. The latent representation of the ρs-VAE generalizes to subpartitions of three-qubit states. Shown is the manually offset correlation between the one-dimensional latent space |z| of the ρs-VAE and the concurrence C[ρw(α)] of the encoded density matrices. Here, the ρs-VAE is trained on ρs(α) and tested on subpartitions of the three-qubit density matrices ρw(α). All regressions of the encoded quantum states (black) show that the correlation is linear with r² > 0.99.

Figure 10. The ρs-VAE learns to extract concurrence from quantum states to structure its latent space. Shown is the correlation between the one-dimensional latent space z of the ρs-VAE and the generative parameter α of the encoded density matrices. Each point is color-coded by its concurrence value.

Figure 11. The latent space of the ρs-VAE generalizes to mixed states using the transformation L(z). Shown is the correlation between the generative parameter γ of the encoded density matrices ρd(γ) and a) the one-dimensional latent space z of the ρs-VAE, b) the linear transformation L(z) of the latent space of the ρs-VAE, and c) the sum of the eigenvalues λi of the Hermitian matrix R in Eq. 2. All regressions of the encoded quantum states (black) demonstrate a strong linear correlation with r² > 0.99.
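The concurrence used throughout these figures is Wootters' measure; a standard computation (written here in its equivalent eigenvalue form, λi being the decreasingly ordered square roots of the eigenvalues of ρρ̃, rather than via the matrix R of Eq. 2) is:

```python
import numpy as np

sy = np.array([[0, -1j], [1j, 0]])
YY = np.kron(sy, sy)   # spin-flip operator sigma_y (x) sigma_y

def concurrence(rho):
    """Wootters concurrence of a two-qubit density matrix."""
    rho_tilde = YY @ rho.conj() @ YY
    # Eigenvalues of rho * rho_tilde are real and non-negative.
    evals = np.linalg.eigvals(rho @ rho_tilde)
    lam = np.sort(np.sqrt(np.abs(evals.real)))[::-1]
    return max(0.0, lam[0] - lam[1] - lam[2] - lam[3])
```

For a maximally entangled Bell state this returns 1, and for any product state it returns 0, matching the color scale used in Figs. 1 and 4-6.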
The training of all models was conducted on a CPU node of the Xmaris cluster, with each training session completed in less than two hours. The encoder and decoder architectures each consist of a fully connected network with (16, 8, 4, 2) hidden units in the respective layers and tanh activation functions. The encoder (decoder) network receives (produces) input (output) vectors consisting of 16 entries, which represent a given density matrix. Finally, the models are trained on data sets comprising 101 × 10³ quantum states. For the generation of these training sets, we select 101 angles within the range α ∈ [0, π] and draw 10³ samples at each angle.
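The layout of such a training set can be sketched as follows; the assumption that the 10³ samples per angle differ through randomized single-qubit rotation angles is ours, and only 10 samples per angle are drawn here for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def sample_state(alpha):
    """One flattened two-qubit density matrix for the angle alpha."""
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    cry = np.eye(4)
    cry[2:, 2:] = ry(alpha)
    # Randomized final rotations distinguish samples at the same alpha.
    u = (np.kron(ry(rng.uniform(0, 2 * np.pi)),
                 ry(rng.uniform(0, 2 * np.pi)))
         @ cry @ np.kron(H, np.eye(2)))
    psi = u @ np.array([1.0, 0.0, 0.0, 0.0])
    return np.outer(psi, psi).reshape(-1)        # 16-entry input vector

alphas = np.linspace(0.0, np.pi, 101)            # 101 angles in [0, pi]
data = np.stack([sample_state(a) for a in alphas for _ in range(10)])
```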