Observing Schrödinger’s cat with artificial intelligence: emergent classicality from information bottleneck

We train a generative language model on the randomized local measurement data collected from Schrödinger’s cat quantum state. We demonstrate that the classical reality emerges in the language model due to the information bottleneck: although our training data contains the full quantum information about Schrödinger’s cat, a weak language model can only learn to capture the classical reality of the cat from the data. We identify the quantum–classical boundary in terms of both the size of the quantum system and the information processing power of the classical intelligent agent, which indicates that a stronger agent can realize more quantum nature in the environmental noise surrounding the quantum system. Our approach opens up a new avenue for using the big data generated on noisy intermediate-scale quantum devices to train generative models for representation learning of quantum operators, which might be a step toward our ultimate goal of creating an artificial intelligence quantum physicist.


Introduction
Quantum mechanics offers a remarkably precise depiction of nature at its most fundamental level, particularly in the world of microscopic particles where phenomena like quantum uncertainty, coherence, and entanglement prevail. Yet, our everyday experiences are firmly anchored in the classical world, where macroscopic objects follow well-defined trajectories in a deterministic manner, and the peculiarities of quantum behavior seem imperceptible. This discrepancy between the quantum and classical realms presents a profound enigma in theoretical physics: the quantum-to-classical transition [1,2], or how and why the classical world emerges from the underlying quantum reality.
Historically, this enigma was epitomized by the paradox of Schrödinger's cat [3]: a thought experiment in which a hypothetical cat can be prepared in a quantum superposition state of both alive and dead, although we have never witnessed such a superposition cat in our daily life. According to the Copenhagen interpretation, the act of observing the cat triggers a collapse of its superposition state into one of the two classical realities: either the cat is alive or it is dead. However, this explanation raises further questions about the role of the observer and the nature of quantum state collapse. Over the years, many theories have been proposed to better understand the emergence of classicality in quantum many-body systems, including decoherence theory [4-6], quantum Darwinism [7-11], the many-worlds interpretation [12-14], spontaneous localization [15-17], quantum Bayesianism [18-25], and information-based interpretations [26-29]. A consistent modern understanding is gradually crystallizing from these diverse perspectives.
Decoherence provides a key mechanism bridging the quantum and classical worlds. It arises from the inevitable interaction of a quantum system with its environment, causing the "leaking" of quantum information into the surroundings and the subsequent loss of quantum coherence. Spontaneous localization suggests that the effects of decoherence can be modeled as spontaneous random local measurements of the quantum system by the environment. These measurements extract classical information about the quantum system and spread it in the environment. Quantum Darwinism further explains the quantum state collapse as a result of the natural selection of a classical reality that is consistent with the classical information proliferated in the environment. This perspective aligns with quantum Bayesianism, which interprets quantum states as descriptions of beliefs and expectations regarding potential future experimental outcomes. The classical reality, in this view, emerges as an intelligent agent updates its belief based on the observed randomized measurement outcomes in the environment.
It is conceivable that an agent's ability to process classical information could influence its interpretation of reality. The task of reconstructing quantum states from classical information is referred to as quantum state tomography [30-39] in quantum information science. If the environment can provide classical descriptions of sufficiently many copies of identical quantum states in different measurement bases, an agent with powerful enough classical information processing abilities could, in theory, reconstruct the full quantum reality from the classical data with considerable accuracy. This principle has been demonstrated in research on quantum state tomography, especially in recent advances of classical shadow tomography [40-42]. We hypothesize that the difficulty we often experience in comprehending the full quantum reality, as compared to the classical reality, might be linked to our limited ability to process classical information.

To test this hypothesis, we propose training a generative language model [43] on random local measurement outcomes gathered from Schrödinger's cat quantum state. The trained model can then be prompted with new experiment proposals to explore its understanding of the reality of Schrödinger's cat, thereby investigating the emergent classicality from the perspective of artificial intelligence. Fig. 1 provides a cartoon illustration of our setup. In this research, we do not intend to address how the quantum state collapses under the randomized measurements from the environment. Instead, we will adhere to the standard quantum mechanical approach to simulate the randomized measurement outcomes that the environment could collect. Our primary question is to what extent a classical intelligent agent (or a classical algorithm) can process this classical information to form an understanding of reality. More importantly, we seek to study how this emergent reality is influenced by the size of the quantum system and the information bottleneck [44,45] of the classical agent. Through this research, we hope to quantitatively identify the boundary between the quantum and classical worlds [46], should one exist.

Randomized Measurement Scheme
We begin with an N-qubit Greenberger-Horne-Zeilinger (GHZ) state [47] as a model for the quantum state of Schrödinger's cat,

|cat⟩ = (|0⟩^⊗N + |1⟩^⊗N)/√2.    (1)

This state can be repeatedly prepared [48] by the quantum circuit depicted in Fig. 2(a), which comprises a Hadamard gate followed by a series of controlled-NOT gates [49]. This circuit mimics the unitary quantum dynamics that generates Schrödinger's cat state by entangling the qubits together. The decoherence of Schrödinger's cat in the environment can be simulated by a series of random local measurements, which represent the environment's random interactions with the cat, akin to events such as air molecules bouncing off the body of the cat. While we could assume these measurements to be weak and continuous for a more accurate reflection of reality, this assumption is not essential for our discussion. For simplicity, we assume that the environment randomly selects one of the three Pauli observables {X, Y, Z} for each qubit and performs a projective measurement of the chosen observable. As a result, the cat state collapses into a certain post-measurement state. We will not delve into how this process occurs, as it is not the focus of our study. We merely follow the principles of quantum mechanics to simulate the measurement process and collect the binary measurement outcomes {±1}. We regard these outcomes as the classical information dispersed in the environment after the decoherence of the cat. Our goal is to analyze how much we can tell about the original quantum state from such classical information.
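As a concrete illustration, this randomized measurement process can be simulated classically. Below is a minimal NumPy sketch (our own illustration, not the authors' code); `ghz_state` and `sample_shadow` are hypothetical helper names, and the eigenbasis conventions are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Columns are the +1 and -1 eigenvectors of each Pauli observable (assumed conventions).
BASES = {
    "X": np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2),
    "Y": np.array([[1, 1], [1j, -1j]], dtype=complex) / np.sqrt(2),
    "Z": np.eye(2, dtype=complex),
}

def ghz_state(n):
    """|cat> = (|0...0> + |1...1>)/sqrt(2) as a rank-n tensor of amplitudes."""
    psi = np.zeros((2,) * n, dtype=complex)
    psi[(0,) * n] = psi[(1,) * n] = 1 / np.sqrt(2)
    return psi

def sample_shadow(n, rng):
    """Draw one classical shadow (x, y): random Pauli bases and their outcomes."""
    x = rng.choice(list("XYZ"), size=n)
    psi = ghz_state(n)
    # Rotate each qubit into its measurement basis: amplitudes become <basis|psi>.
    for i, b in enumerate(x):
        psi = np.moveaxis(
            np.tensordot(BASES[b].conj().T, psi, axes=([1], [i])), 0, i)
    probs = np.abs(psi.reshape(-1)) ** 2
    idx = rng.choice(2 ** n, p=probs / probs.sum())
    bits = np.unravel_index(idx, (2,) * n)
    y = np.array([+1 if int(bit) == 0 else -1 for bit in bits])
    return x, y
```

For the GHZ state this reproduces the expected correlations: all-Z shots yield identical outcomes, and all-X shots yield outcomes whose product is +1 (since X⊗...⊗X stabilizes the cat state).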

Classical Shadow Data Structure
Specifically, the data from each random measurement can be represented as a pair of sequences (x, y), where x ∈ {X, Y, Z}^N is the observable sequence and y ∈ {±1}^N is the measurement outcome sequence, as exemplified in Fig. 2. Both are sequences of N tokens. Their joint probability distribution, p_dat(x, y) = p(y|x) p(x), defines the data distribution, where p(x) = 3^{-N} is the probability of randomly choosing an observable sequence x (assumed to be uniform), and

p(y|x) = ⟨cat| ⊗_{i=1}^{N} |y_i^{x_i}⟩⟨y_i^{x_i}| |cat⟩    (2)

is the probability for the measurement outcomes y to occur, calculated according to Born's rule in quantum mechanics, where |y_i^{x_i}⟩ denotes the eigenstate of the Pauli observable x_i with eigenvalue y_i. This distribution encodes non-trivial information about the original quantum state |cat⟩.
We build a classical simulator to sample the sequence pair (x, y) from the distribution p_dat(x, y) upon request. This essentially simulates the repeated process of creating the Schrödinger's cat state, allowing it to decohere, and collecting the classical information it leaves behind in the environment. Example samples of (x, y) sequence pairs can be found in the Supplementary Information. These (x, y) sequence pairs, also referred to as classical shadows of the original quantum state, describe random projections of the quantum state in a random measurement basis, akin to a high-dimensional object casting a shadow in a low-dimensional subspace. Classical shadow tomography offers a systematic classical post-processing technique for quantum state reconstruction from its classical shadows [40-42]. Given the randomized Pauli measurement scheme mentioned above, the reconstruction formula is

ρ_cat = E_{(x,y)∼p_dat} ⊗_{i=1}^{N} (3 |y_i^{x_i}⟩⟨y_i^{x_i}| − 𝟙).    (3)

This demonstrates that, given a sufficient amount of classical data about repeated copies of a quantum state, it is in principle possible to accurately reconstruct the full quantum reality.
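The reconstruction formula can be checked numerically. The sketch below (our own illustration under assumed Pauli eigenvector conventions, not the authors' implementation) averages the single-shot snapshot over the exact data distribution p(x)p(y|x) of the N = 2 cat (Bell) state and recovers ρ_cat exactly, confirming that the estimator is unbiased:

```python
import numpy as np
from itertools import product
from functools import reduce

# +1/-1 eigenvectors |y^x> of each Pauli observable (assumed conventions).
EIGVECS = {
    ("X", +1): np.array([1, 1]) / np.sqrt(2),
    ("X", -1): np.array([1, -1]) / np.sqrt(2),
    ("Y", +1): np.array([1, 1j]) / np.sqrt(2),
    ("Y", -1): np.array([1, -1j]) / np.sqrt(2),
    ("Z", +1): np.array([1, 0]),
    ("Z", -1): np.array([0, 1]),
}

def snapshot(x, y):
    """Single-shot estimator  tensor_i ( 3 |y_i^{x_i}><y_i^{x_i}| - 1 )."""
    factors = []
    for b, s in zip(x, y):
        v = EIGVECS[(b, s)].astype(complex)
        factors.append(3 * np.outer(v, v.conj()) - np.eye(2))
    return reduce(np.kron, factors)

def reconstruct(shadows):
    """rho estimate: average snapshot over a list of (x, y) classical shadows."""
    return sum(snapshot(x, y) for x, y in shadows) / len(shadows)

# Unbiasedness check for the N = 2 cat (Bell) state: averaging the snapshot
# over the exact data distribution p(x) p(y|x) recovers rho_cat exactly.
cat = np.zeros(4, dtype=complex)
cat[0] = cat[3] = 1 / np.sqrt(2)
rho_cat = np.outer(cat, cat.conj())
rho_avg = np.zeros((4, 4), dtype=complex)
for x in product("XYZ", repeat=2):
    for y in product([+1, -1], repeat=2):
        proj = reduce(np.kron, [np.outer(EIGVECS[(b, s)], EIGVECS[(b, s)].conj())
                                for b, s in zip(x, y)])
        p = np.real(cat.conj() @ proj @ cat) / 9.0   # p(x) p(y|x), with p(x) = 1/9
        rho_avg += p * snapshot(x, y)
```

With finitely many sampled shadows, `reconstruct` returns an unbiased but noisy estimate of the same state.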

Generative Modeling of Classical Shadows
If we are short of memory resources to store the entire dataset of classical shadows, a potential workaround is to train a generative model "on the fly" as we collect the classical shadow data. Once trained, the generative model can approximate the data distribution p_dat(x, y) with a model distribution p_mdl(x, y) and provide us with an endless supply of samples. This approach enables a more efficient compression and utilization of the classical shadow data, gaining an edge in addressing quantum problems. Many recent studies [50-57] have demonstrated the theoretical and practical advantages of combining machine learning with classical shadows.
In constructing the probability model p_mdl(x, y) = p_θ(y|x) p(x), our focus lies in modeling the conditional distribution p(y|x) with parameters θ, because p(x) = 3^{-N} is a trivial uniform distribution that does not need modeling. If we perceive the observable sequence x as a question, and the measurement outcome sequence y as an answer to that question by the quantum experiment, then the modeling of p(y|x) can be formulated as a chat completion task in natural language processing, which suggests the generative language model as a natural solution. Once trained, the language model can take over the role of the quantum experiment to answer inquiries about the underlying quantum state |cat⟩. In other words, the model can "speak" the quantum language. The learning process imitates the way an intelligent agent accumulates knowledge about the world by observing the environment.
The transformer [58] architecture stands out as a natural choice for modeling p(y|x). As illustrated in Fig. 2(b), we have made a slight modification in its latent space by imposing a variational information bottleneck [44,45] borrowed from the β-variational auto-encoder (β-VAE) [59] architecture. This structure allows us to adjust the model's information processing power, which will be crucial for our subsequent study. The transformer-based β-VAE comprises two probability models: an encoder p_θ(z|x) that infers latent variables z from the input sequence x, and a decoder p_θ(y|z) that generates the output sequence y based on z, such that

p_θ(y|x) = E_{z∼p_θ(z|x)} p_θ(y|z).    (4)

A more detailed description of the architecture can be found in the Supplementary Information. The goal is to approximate p(y|x) in Eq. (2) with p_θ(y|x) in Eq. (4) by optimizing the model parameters θ. The model can be trained by minimizing the β-VAE loss L = E_{(x,y)∼p_dat} L(x, y) on the training data of classical shadows collected from the cat state, where the loss function for each classical shadow (x, y) reads

L(x, y) = −E_{z∼p_θ(z|x)} log p_θ(y|z) + β D_KL(p_θ(z|x) ‖ p_N(z)).    (5)

The first term is the negative log likelihood loss and the second term is a Kullback-Leibler (KL) divergence regularization; p_N(z) denotes the normal distribution of zero mean and unit variance. The hyperparameter β permits us to adjust the variational information bottleneck of the transformer. A large β enforces p_θ(z|x) to approach p_N(z) regardless of x, which limits the model's ability to encode information about x in the latent variables z. Therefore, increasing β imposes a stronger information bottleneck, thereby diminishing the model's information processing capacity.
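The per-sample β-VAE objective can be written down concretely. Below is a minimal NumPy sketch (our own illustration, not the paper's code), assuming a diagonal-Gaussian encoder with mean `mu` and log-variance `logvar`, for which the KL term against the standard normal prior has a closed form:

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, 1) ), summed over latent dims:
    0.5 * sum( exp(logvar) + mu^2 - 1 - logvar )."""
    return 0.5 * float(np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar))

def beta_vae_loss(neg_log_likelihood, mu, logvar, beta):
    """L(x, y) = -log p_theta(y|z) + beta * KL(p_theta(z|x) || p_N(z)),
    with the expectation over z approximated by a single sample."""
    return neg_log_likelihood + beta * kl_to_standard_normal(mu, logvar)
```

At mu = 0, logvar = 0 the encoder matches the prior and the KL penalty vanishes; increasing `beta` weights the penalty more heavily, tightening the bottleneck.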

Model Evaluation
We take an N-qubit cat state, collect its classical shadows, and train a generative language model concurrently. For each level of the information bottleneck strength β and each distinct qubit number N, we train a separate model. Upon convergence of the training, we evaluate the performance of each model as follows. First, we sample from the model distribution p_mdl(x, y) by prompting the model with a random observable sequence x and collecting the model-generated measurement outcome sequence y. Then, we use the classical shadow tomography approach to reconstruct a quantum state ρ_mdl based on the model-generated classical shadows,

ρ_mdl = E_{(x,y)∼p_mdl} ⊗_{i=1}^{N} (3 |y_i^{x_i}⟩⟨y_i^{x_i}| − 𝟙).    (6)

Finally, we evaluate the model-constructed quantum state ρ_mdl by two metrics:

• Quantum fidelity: F(ρ_cat, ρ_mdl) = ⟨cat|ρ_mdl|cat⟩, given that ρ_cat = |cat⟩⟨cat| is a pure state. The fidelity measures how closely the state ρ_mdl approximates the original cat state.
• Von Neumann entropy: S(ρ_mdl) = −Tr ρ_mdl log ρ_mdl. The entropy quantifies the disorder or uncertainty of a quantum state. A zero entropy indicates that ρ_mdl is pure.
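Both metrics are straightforward to compute from a density matrix. A minimal NumPy sketch (our own illustration; the natural-logarithm convention for the entropy is an assumption):

```python
import numpy as np

def cat_fidelity(rho, n):
    """F(rho_cat, rho) = <cat| rho |cat> for the n-qubit GHZ cat state."""
    cat = np.zeros(2 ** n, dtype=complex)
    cat[0] = cat[-1] = 1 / np.sqrt(2)
    return float(np.real(cat.conj() @ rho @ cat))

def von_neumann_entropy(rho):
    """S(rho) = -Tr rho log rho, computed from the eigenvalues of rho."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]            # treat 0 log 0 as 0
    return float(-np.sum(lam * np.log(lam)))

n = 3
cat = np.zeros(2 ** n, dtype=complex)
cat[0] = cat[-1] = 1 / np.sqrt(2)
rho_quantum = np.outer(cat, cat.conj())          # pure cat state: F = 1, S = 0
rho_classical = np.zeros((2 ** n, 2 ** n), dtype=complex)
rho_classical[0, 0] = rho_classical[-1, -1] = 0.5  # decohered cat: F = 1/2, S = log 2
rho_thermal = np.eye(2 ** n) / 2 ** n            # maximally mixed: S = n log 2
```

These three reference states correspond to the "quantum", "classical", and "thermal" regimes discussed below.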
If the model is strong enough to reconstruct the full quantum reality, i.e., ρ_mdl = ρ_cat, we should expect the fidelity F(ρ_cat, ρ_mdl) = 1 and the entropy S(ρ_mdl) = 0. Fig. 3 presents fidelity and entropy evaluations for various models. When β is small, the model-reconstructed state ρ_mdl approximates the cat state ρ_cat, as indicated by F(ρ_cat, ρ_mdl) ≈ 1 and S(ρ_mdl) ≈ 0. This suggests that the model has learnt the complete quantum reality from the classical shadows. We label this parameter region the "quantum" regime. Away from this regime, the quality of ρ_mdl deteriorates as β increases, because the model's ability to capture the statistical features of the classical shadows declines under a more restrictive information bottleneck. Eventually, for large β, the model generates (x, y) almost uniformly, corresponding roughly to the maximally mixed state ρ_mdl ≈ 𝟙/2^N. We mark this limit as "thermal". Interestingly, as the qubit number N increases, an intermediate "classical" regime emerges. In this regime, the reconstructed state ρ_mdl ≈ ½ (|0⟩⟨0|^⊗N + |1⟩⟨1|^⊗N) is approximately the decohered density matrix, signifying that the model has learnt the distinct classical realities of Schrödinger's cat but is unable to discern the quantum coherence.
To justify the above interpretations, we selected three representative models from these three regimes, named Atlas, Boreas, and Cygnus (standing for A, B, C, respectively). They are trained on the N = 5 classical shadow data with hyperparameters β = 2^{-5}, 2^{-1}, 2^{6}, respectively, as marked in Fig. 3. To understand the differences between Atlas, Boreas, and Cygnus, let us chat with them!

One-Shot Classification Tasks
We can guide the language models to perform different classification tasks by prompt engineering. The first problem we are interested in is: given a one-shot observation of Schrödinger's cat, determine whether it is alive or dead. Here is how we might prompt the model:

x: ZZZZZ
y: ++++?    (7)
Here "?" stands for a blank token for the language model to complete. This is akin to asking, "If most of the cat's cells are alive, is the cat alive or dead?" If the model has learnt the perfect correlation among the Z measurement outcomes on the cat state, it will choose to fill in the blank with a "+". Similarly, for the prompt:

x: ZZZZZ
y: ----?    (8)
We would expect the model to complete the sequence with a "-". Answering these questions essentially classifies the observed cat into alive and dead categories. Fig. 4 shows the performance of Atlas, Boreas, and Cygnus in this test. We observe that Atlas and Boreas perform flawlessly on the task, while Cygnus is essentially guessing. Then what about a prompt with mixed Z outcomes, for example:

x: ZZZZZ
y: ++-+?    (9)

This is an out-of-distribution prompt, since it will never appear as a classical shadow of the cat state due to the mismatched Z-basis measurement outcomes. We test the representative models with all combinations of the Z_i (for i = 1, 2, 3, 4) measurement outcomes on the first four qubits, and collect the models' responses for the Z_5 measurement outcome. The predicted Z-polarization Z_5^predict is plotted against the observed average Z-polarization Z̄_{1:4}^observe := (1/4) Σ_{i=1}^{4} Z_i in Fig. 5. The tests at the two limits Z̄_{1:4}^observe = ±1 are in-distribution, while the remaining tests are out-of-distribution. From our results, it appears that both Atlas and Boreas are capable of generating reasonable interpolations between the two in-distribution limits. However, Boreas seems to form a binary understanding of the cat's state, either alive or dead, while Atlas exhibits a more non-binary understanding, viewing the transition from alive to dead as a continuous spectrum.
We are also interested in whether these models can decode the quantum coherence encoded in the classical shadow data. In the previous examples, local Z-measurements destroy the quantum coherence of the cat state, preventing us from testing coherence on the last qubit. To preserve the quantum coherence, we turn to local X-measurements. Suppose the first four measurement outcomes are X_i = +1 (for i = 1, 2, 3, 4). This prepares the last qubit in the superposition state (|0⟩ + |1⟩)/√2. We can examine the models' understanding of this state using the following prompts:

x: XXXXZ
y: ++++?    (10)

x: XXXXX
y: ++++?    (11)
The Z-test in Eq. (10) is like asking "Q: Is Schrödinger's cat alive or dead? (+) Alive. (-) Dead.", while the X-test in Eq. (11) corresponds to probing "Q: What is the sign of the quantum coherence between alive and dead? (+) Positive. (-) Negative." Performances of Atlas, Boreas, and Cygnus are shown in Fig. 6. While both Atlas and Boreas can realize the superposed classical realities, only Atlas can correctly predict the quantum coherence between them. Tab. 1 summarizes the performances of the representative models on the cat classification task Eq. (7) and the coherence prediction task Eq. (11), together with their fidelity and entropy evaluations. These results indicate that Atlas nicely captures the quantum nature of Schrödinger's cat, Boreas exhibits a strong understanding of the classical reality, while Cygnus lacks a clear grasp of reality. They represent models in the quantum, classical, and thermal regimes of Fig. 3, respectively. Their reconstructed density matrices ρ_mdl can be found in the Supplementary Information.
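The ideal answers to the Z-test and X-test follow directly from Born's rule: conditioning the cat state on X_i = +1 for the first four qubits leaves the last qubit in (|0⟩ + |1⟩)/√2, so the Z outcome is a fair coin while the X outcome is deterministically "+". A small NumPy check (our own illustration, not the authors' code):

```python
import numpy as np

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)   # +1 eigenstate of X

# N = 5 cat state as a rank-5 amplitude tensor.
psi = np.zeros((2,) * 5, dtype=complex)
psi[(0,) * 5] = psi[(1,) * 5] = 1 / np.sqrt(2)

# Condition on X_i = +1 for the first four qubits: contract each with <+|.
cond = psi
for _ in range(4):
    cond = np.tensordot(plus.conj(), cond, axes=([0], [0]))
cond = cond / np.linalg.norm(cond)        # residual state of the last qubit

p_z_plus = float(np.abs(cond[0]) ** 2)               # Z-test: P(y_5 = +)
p_x_plus = float(np.abs(plus.conj() @ cond) ** 2)    # X-test: P(y_5 = +)
```

The result is p_z_plus = 1/2 and p_x_plus = 1: a model that has only learnt the classical realities can still match the Z-test statistics, but only a model that retains the coherence can answer the X-test correctly.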

Latent Representations of Observable Sequences
To better understand how the information bottleneck constrains the model's ability to generate the outcome sequence y based on the observable sequence x, we examine how Atlas, Boreas, and Cygnus utilize the latent space to encode x. Fig. 7 presents the t-SNE visualizations of the latent representations μ_z(x) for all x ∈ {X, Y, Z}^N as inferred by the different models, where μ_z stands for the mean of the latent variables z as computed by the transformer encoder (see Fig. 2(b)).
We find that the observable sequence embeddings are clustered in the latent space, and Atlas provides the most finely divided clustering. For predicting measurement outcomes, there are two important aspects of the observables that the encoder should convey:

1. The locations of Z observables. Since the measurement outcomes of Z observables are all identical, the decoder needs to know where all Z observables are in order to correctly correlate the measurement outcomes on these qubits.
2. The number of Y observables in pure-X/Y sequences. The quantum coherence of the cat state is reflected in the high-order (N-qubit) correlations among X and Y observables. Consider a string operator S = ∏_{i=1}^{N} S_i with S_i ∈ {X, Y}, and let n_Y be the number of Y operators in S; the cat state has the feature

⟨cat| S |cat⟩ = cos(π n_Y / 2).    (12)

Therefore, the decoder needs to know n_Y in order to correctly determine the high-order correlation among the outcomes.
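Eq. (12) is easy to verify by brute force. The following NumPy sketch (our own check, not the authors' code) enumerates all pure-X/Y strings for N = 4 and compares ⟨cat|S|cat⟩ with cos(π n_Y / 2):

```python
import numpy as np
from functools import reduce
from itertools import product

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

n = 4
cat = np.zeros(2 ** n, dtype=complex)
cat[0] = cat[-1] = 1 / np.sqrt(2)

# <cat| S |cat> for every pure-X/Y string S, versus cos(pi * n_Y / 2).
results = []
for names in product("XY", repeat=n):
    n_y = names.count("Y")
    S = reduce(np.kron, [X if c == "X" else Y for c in names])
    val = float(np.real(cat.conj() @ S @ cat))
    results.append((val, np.cos(np.pi * n_y / 2)))
```

Strings with an odd number of Y operators have vanishing expectation, while even-n_Y strings alternate between +1 and -1; these sign patterns carry the coherence information.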
We can see that Atlas correctly groups the observable sequences according to both aspects, providing all the necessary information for the decoder. Boreas takes aspect 1 into account, but groups all pure-X/Y sequences within the same large cluster without clear distinction, so it cannot convey information about aspect 2 to the decoder. This prevents Boreas from recognizing the quantum coherence of the cat state. Cygnus does not get either aspect right. Instead, it loosely groups all observable sequences based on what the first and last observables are, a classification that seems to have little practical significance for informing the measurement outcomes.
As the information bottleneck strengthens, different clusters are forced to merge. In comparison to Atlas, Boreas chooses to merge all pure-X/Y sequences into a single cluster. The motivation to differentiate these sequences originally stems from the high-order correlations present in the classical shadow data, as described by Eq. (12). However, because these high-order correlations are high-variance statistical features, they are the first to be discarded under the pressure of the information bottleneck. This leads to the emergence of classicality.

Implication of Results
In this research, we investigate the potential of generative language models for modeling classical shadows collected from randomized Pauli measurements on quantum many-body states. We specifically focus on the GHZ state, an idealized representation of Schrödinger's cat. Our findings indicate that as the size of the quantum system increases, the language model rapidly loses its grasp of quantum coherence. This is because quantum coherence, encoded as high-order correlations in the data, has a variance that escalates exponentially with the system size.
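One way to see the exponential cost: under randomized single-qubit Pauli measurements, a shot informs the N-qubit coherence observable X ⊗ ... ⊗ X only when every qubit happens to be measured in the X basis, which occurs with probability 3^{-N}, so the single-shot variance of the shadow estimator for this observable scales as 3^N. A small numerical illustration (our own sketch, not the authors' code):

```python
import numpy as np

def coherence_shot_fraction(n, shots, rng):
    """Fraction of randomized-Pauli shots whose basis is all-X, i.e. the only
    shots that inform the n-qubit coherence observable X (x) ... (x) X."""
    draws = rng.integers(0, 3, size=(shots, n))   # 0 = X, 1 = Y, 2 = Z
    return float(np.mean(np.all(draws == 0, axis=1)))

rng = np.random.default_rng(1)
frac = coherence_shot_fraction(3, 200_000, rng)   # expect about 3**-3 = 1/27
```

The number of shots needed for an O(1)-error estimate of this coherence feature therefore grows as 3^N, which is the statistical pressure that pushes a weak learner toward the decohered, classical description.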
This phenomenon ushers in a boundary between quantum and classical realities, which we quantitatively delineate in Fig. 3. Interestingly, we discover that this boundary is not absolute, but rather influenced by the model's inherent capacity to process classical information. A more potent model can push the quantum-classical boundary towards larger system sizes. In fact, if we conduct classical shadow tomography directly on the data, we can precisely reconstruct the full quantum state for any system size, even though the data and computational resources required for this operation also grow exponentially with system size.
Our findings suggest that our ability to process classical information may restrict our perception of the quantum essence of the universe. Despite the quantum nature of the universe, our daily experiences are predominantly classical, a perception that might stem from our limitations as classical intelligent agents.
As a result, for larger quantum systems, generative models might struggle to fully reconstruct quantum coherence and entanglement [75,76], which makes it difficult to avoid a certain degree of decoherence in the reconstruction results.

Related Works
Our research aligns and intersects with existing work in the following domains:

• Emergent Classicality: Some studies [26-29] have analyzed emergent classicality from the perspective of partial observation. When a portion of a quantum system (the quantum Markov blanket) is excluded from observation during the data acquisition phase, any locally accessible information about the remaining observable subsystem will appear classical. In contrast, our work illustrates the emergence of classicality from an information bottleneck in the classical post-processing phase. This emergence occurs even when every qubit of the quantum system is observed, suggesting that a lossy compression encoding of the observable sequence can also lead to the emergence of classicality.
• Machine-Learning Quantum State Tomography (MLQST): The objective of MLQST is to employ machine learning models to facilitate an efficient representation of quantum states. The combination of the generative language model with classical shadow tomography in this work can be viewed as a strategy for MLQST. Our approach does not directly use a neural network to model the quantum state itself; instead, we employ a generative model to learn the probability distribution of the measurement outcomes under random measurements of the quantum state. This approach diverges from many neural-network-based MLQST methods [62-68] that rely on direct modeling of the quantum state. Additionally, in terms of the technical approach to quantum state reconstruction from randomized measurements, we follow the classical shadow reconstruction rather than positive operator-valued measure (POVM) inversion [71-73]. This choice grants us more flexibility in the selection of the measurement basis.
• Classical Shadows and Machine Learning: Classical shadow tomography provides an effective interface for the mutual conversion between quantum states and classical data. Consequently, it is perceived as a crucial integration point between quantum information and machine learning. Numerous studies [50-57] have showcased the superiority of machine learning algorithms in classifying or interpolating quantum states based on classical shadow data, with the majority of these studies concentrating on supervised learning. Our research delves into the realm of unsupervised generative modeling of classical shadows, demonstrating the feasibility of driving representation learning of quantum observables through classical shadow data.

Future Directions
Many advances in deep learning are based on representation learning, which transforms complex data like images and language into a more manageable latent space. Extending this idea to quantum information, we aim to let artificial intelligence comprehend the "language" of quantum states and quantum operators through representation learning [77-79]. However, this process requires a vast amount of data.
Our research showcases the representation learning of quantum observables, as illustrated in Fig. 7. We demonstrate that randomized measurement serves as a potent data source, capable of providing a large amount of unlabeled data for generative models. Such data can now be conveniently acquired on Noisy Intermediate-Scale Quantum (NISQ) [80] devices. Utilizing these data to train dedicated language models could provide foundational models for quantum many-body physics. The learned latent representations can also support numerous downstream applications, contributing to our ultimate goal of building AI quantum physicists.
There are a few future directions to explore. First, unconstrained generative modeling of classical shadows may produce non-physical states (indefinite density matrices). The question is, how can we restrict the probability space to the physical subspace? One possible solution could be adversarial learning, which introduces a discriminator to keep the generator from breaking the positivity bound of the reconstructed state. Another pressing issue is to go beyond single-qubit Pauli measurements to gain advantages from quantum entanglement. Recent advancements in shallow-circuit classical shadow tomography [81-85] have made promising strides, allowing the extension of random measurements to commuting multi-qubit observables and thereby improving measurement efficiency. However, translating these classical shadow data into a format suitable for language models, and integrating them with generative models, remains a future research direction.

Reconstructed Density Matrices
We can sample classical shadows from the trained generative language models and reconstruct the density matrix ρ_mdl using the reconstruction formula in Eq. (6). Fig. 9 presents visualizations of these density matrices in the computational basis (the Z-basis), as reconstructed by Atlas, Boreas, and Cygnus, respectively. Darker pixels represent larger matrix elements, and the color encodes the complex phase (+1: red, +i: yellow, −1: green, −i: blue).
Atlas correctly reconstructs the full quantum density matrix of the GHZ state (|0⟩^⊗N + |1⟩^⊗N)/√2. Boreas fails to capture the off-diagonal matrix elements that represent quantum coherence; as a result, its density matrix is decohered. Nevertheless, Boreas correctly captures the two classical states (|0⟩^⊗N and |1⟩^⊗N) represented by the

Figure 1. Illustration of the general idea. Quantum evolution prepares an entangled Schrödinger's cat state in the quantum world. Decoherence occurs as random local measurements by the environment, which serves as the quantum-classical interface. The randomized measurement outcomes train an intelligent agent in the classical world, such that the agent can realize and identify the emergent classical reality.

Figure 2. The model setup. (a) The quantum circuit prepares the cat state. The random local measurements collapse the cat state and generate the classical shadow (x, y). (b) The classical shadow data are used to train a generative language model for p(y|x), built with a transformer-based β-VAE architecture.

Figure 3. (a) Quantum fidelity and (b) von Neumann entropy of the model-reconstructed state ρ_mdl for models trained at different β (in logarithmic scale) and N. Dashed curves are suggestive cross-over boundaries. The entropy evaluation for N = 6 is not available, as we are not aware of an efficient approach to estimate entropy other than full state tomography (which becomes computationally infeasible for N = 6). The three representative models are named Atlas, Boreas, and Cygnus.

Figure 4. Performances of the three representative models on the one-shot cat classification task.

Figure 5. Behaviors of the three representative models under out-of-distribution prompts for the one-shot cat classification task. Error bars indicate the mean deviations.

Figure 6. Performances of the three representative models on the one-shot cat classification and coherence prediction tasks, when the previous measurements have not collapsed the superposition.

Figure 7. Visualizations of the latent encodings of all 3^5 = 243 distinct observable sequences for the three representative models. Each dot represents an observable sequence, and is colored according to the proportion of X (cyan), Y (magenta), and Z (yellow) in the sequence. Different clusters are encircled for ease of view.

for i = 1, 2, ..., N. In this way, the probability p_θ(y|z) is modeled autoregressively as p_θ(y|z) = ∏_{i=1}^{N} p_θ(y_i | y_{<i}, z).
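This autoregressive factorization can be sampled sequentially. A minimal sketch (our own illustration; `p_next` is a hypothetical stand-in for the transformer decoder's conditional distribution):

```python
import numpy as np

def sample_autoregressive(p_next, n, rng):
    """Draw y = (y_1, ..., y_n) from prod_i p(y_i | y_<i), given a callable
    p_next(prefix) -> P(y_i = +1 | y_<i). Illustrative sketch only."""
    y = []
    for _ in range(n):
        p_plus = p_next(tuple(y))
        y.append(+1 if rng.random() < p_plus else -1)
    return y

# Toy conditional for the cat state under an all-Z measurement: the first
# outcome is a fair coin, and every later outcome copies the first.
def ghz_z_conditional(prefix):
    return 0.5 if not prefix else (1.0 if prefix[0] == +1 else 0.0)

rng = np.random.default_rng(0)
samples = [sample_autoregressive(ghz_z_conditional, 5, rng) for _ in range(50)]
```

Each sampled sequence is constant (all + or all -), reproducing the perfect Z correlations of the cat state; in the actual model, `p_next` would additionally be conditioned on the latent variables z.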

Figure 9. The reconstructed quantum states (32 × 32 density matrices) ρ_mdl based on the classical shadows generated by the three representative models, respectively. Each pixel represents a matrix element.

Table 1. Quantitative comparison of the three representative models.