Attention-based Quantum Tomography

With rapid progress across platforms for quantum systems, the problem of many-body quantum state reconstruction for noisy quantum states has become an important challenge. There has been a growing interest in approaching the problem of quantum state reconstruction using generative neural network models. Here we propose "Attention-based Quantum Tomography" (AQT), a quantum state reconstruction method using an attention-mechanism-based generative network that learns the mixed-state density matrix of a noisy quantum state. AQT is based on the model proposed in "Attention is all you need" by Vaswani et al. (2017), which is designed to learn long-range correlations in natural language sentences and thereby outperforms previous natural language processing models. We demonstrate not only that AQT outperforms earlier neural-network-based quantum state reconstruction on identical tasks, but also that AQT can accurately reconstruct the density matrix associated with a noisy quantum state experimentally realized on an IBMQ quantum computer. We speculate that the success of AQT stems from its ability to model quantum entanglement across the entire quantum system, much as the attention model for natural language processing captures the correlations among words in a sentence.
With rapid progress in modern quantum devices [1], the characterization and validation of large quantum systems has become an important challenge. Quantum state tomography offers a comprehensive characterization of quantum systems [2]. However, the exponential-in-$N_q$ dimension of the Hilbert space of $N_q$-qubit many-body states implies that exact tomography techniques, such as Gaussian maximum likelihood estimation (MLE) [3], require an exponential-in-$N_q$ amount of data as well as exponential-in-$N_q$ processing time. Such prohibitive costs limit exact density matrix reconstruction to small system sizes, $N_q \lesssim 10$. In fact, the tomographic measurement method integrated into IBM's Qiskit library is limited to $N_q = 3$. Hence, many experiments rely on indirect methods of error determination, for example variants of randomized benchmarking [4]. There are also efforts to directly estimate properties of quantum states from measurements, with promising scaling [5]. Nevertheless, the scalability of this approach depends crucially on the availability of global entangling gates acting on all qubits simultaneously, which are outside the reach of current experimental systems. Thus, new strategies for the characterization of noisy, entangled many-body quantum states using experimentally realistic measurements are much needed.
Recently, there has been rapidly growing interest in using machine learning tools, such as deep neural networks, for quantum state reconstruction through generative modeling [6-8]. The foundation for this approach was laid in Ref. [9], which trained a restricted Boltzmann machine to represent complex quantum many-body states without requiring exponentially many parameters or exponential memory. However, the expressibility of restricted Boltzmann machines and the scalability of their training are typically restricted to pure, positive quantum states [6, 9-12], single quantum oscillators [13], and small mixed states [14], which limits their applicability at the scale of modern noisy quantum computers. In contrast, Ref. [7] demonstrated, using a recurrent neural network (RNN), that generative neural network models trained on informationally complete positive operator-valued measurements (IC-POVM) may be capable of providing a classical description of a noisy quantum many-body state. (See Supplement "Informationally complete positive operator-valued measurements" for a brief introduction to the POVM formalism and a discussion of the POVM employed in this work.) However, RNN-based tomography has so far only been demonstrated on classically simulated data, and despite promising indications, its ability to reconstruct a full density matrix has not been demonstrated even in simulation.
The Attention-based Quantum Tomography (AQT) adapts the Transformer architecture, a generative neural network model recently developed for natural language processing tasks [15], to the task of quantum state tomography. We begin by giving an intuition behind the Transformer and the rationale for its suitability for the tomography task. We then demonstrate that AQT outperforms a previous RNN-based approach by significantly reducing the sample complexity of the reconstruction procedure. We also simulate a simple faulty-qubit model and demonstrate the promise of AQT in the task of mixed-state reconstruction. Next, we deploy AQT on experimental data from IBMQ's quantum computer, showing strong qualitative agreement with MLE. Finally, we demonstrate reconstruction of a density matrix with a system size that exceeds the reach of the tomographic tools offered publicly by IBMQ [16].
arXiv:2006.12469v3 [quant-ph] 3 Nov 2021
The rationale behind AQT is our observation of a promising parallel between natural language processing (NLP) and quantum state tomography (see Fig. 1a). Sentences in natural language are highly structured, with long-range relationships among their constituent words. Learning a language with an NLP model is the task of learning such structures and relationships by training on a set of sample sentences that constitute an extraordinarily tiny fraction of all possible word combinations in the language. In more technical terms, this means training an autoregressive model to encode the conditional probabilities that govern which words may appear at each position in a sentence, given the words that precede it. In quantum state tomography, the key insight is that the density matrix representation of a quantum state is equivalent to the probability distribution of IC-POVM outcomes. Like sentences in natural language, entangled quantum states feature long-range correlations among their constituent qubits.
The task of tomography is to learn this quantum state from a number of projective measurement outcomes that is as small as possible compared with the total number of possible measurement outcomes.
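The autoregressive factorization $p(\vec a) = \prod_i p(a_i \mid a_{<i})$ is what allows both exact sampling and exact likelihood evaluation with one conditional per qubit. The sketch below is not the AQT model: `ghz_z_conditional` is a hypothetical, hand-written conditional that reproduces the computational-basis statistics of a GHZ state, in which every later bit is perfectly correlated with the first one regardless of their separation.

```python
import numpy as np

def ghz_z_conditional(prefix):
    """Hypothetical conditional p(a_i | a_<i) over outcomes {0, 1}.

    Hand-written stand-in for a trained model: it reproduces the
    computational-basis statistics of a GHZ state (first bit uniform,
    every later bit equal to the first one).
    """
    if len(prefix) == 0:
        return np.array([0.5, 0.5])
    probs = np.zeros(2)
    probs[prefix[0]] = 1.0
    return probs

def sample(conditional, n_qubits, rng):
    """Ancestral sampling: one conditional evaluation per qubit, O(N_q) calls."""
    outcome = []
    for _ in range(n_qubits):
        probs = conditional(outcome)
        outcome.append(int(rng.choice(len(probs), p=probs)))
    return tuple(outcome)

def probability(conditional, outcome):
    """Exact p(a) = prod_i p(a_i | a_<i), again with O(N_q) calls."""
    p = 1.0
    for i, a_i in enumerate(outcome):
        p *= conditional(list(outcome[:i]))[a_i]
    return float(p)

rng = np.random.default_rng(0)
samples = [sample(ghz_z_conditional, 5, rng) for _ in range(1000)]
print(set(samples) <= {(0,) * 5, (1,) * 5})             # True
print(probability(ghz_z_conditional, (1, 1, 1, 1, 1)))  # 0.5
print(probability(ghz_z_conditional, (1, 0, 1, 1, 1)))  # 0.0
```

A trained network would replace `ghz_z_conditional` with a model evaluated on the prefix; the loop structure, and hence the linear-in-$N_q$ number of model calls, stays the same.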
The Transformer [15], which is the neural network architecture used in AQT, is an autoregressive model that employs the "attention" mechanism [17-19]. It has been shown to be a dramatic step forward in efficiency and accuracy compared to previous state-of-the-art NLP models such as the RNN [20,21]. Before the Transformer, NLP tasks primarily relied on the RNN architecture [22,23], which incorporates correlations between words by passing an encoded "memory" of the words going back to the beginning of the sentence as each new word is read in sequentially. However, the correlations captured in this approach are inherently short-ranged, as the encoded memory in a sequential model such as the RNN suffers from exponential suppression in correlation [24]. The challenge of long-range correlations in semantic modeling was addressed by the Transformer architecture, which uses self-attention to study correlations between all words in a sentence simultaneously. As qubits in a many-body entangled quantum state can have arbitrarily long-range entanglement, we anticipate that the ability of the Transformer's self-attention mechanism to capture long-range correlations among different positions of the data will be well adapted to tomography.
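A minimal NumPy sketch of single-head scaled dot-product self-attention with a causal mask, following the standard formulation of Ref. [15]. The embedding size, random weights, and single head are illustrative assumptions; a full Transformer adds multiple heads, positional encodings, feed-forward layers, and residual connections. The point is that the attention weights connect every qubit position to every earlier position in one step, rather than relaying information through a sequential memory.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a causal mask.

    x: (n_positions, d_model) embeddings, one row per qubit position.
    Position i attends to every position j <= i directly.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    n = x.shape[0]
    future = np.triu(np.ones((n, n), dtype=bool), k=1)    # entries with j > i
    scores = np.where(future, -np.inf, scores)            # block attention to later positions
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(1)
n_qubits, d_model = 6, 8
x = rng.normal(size=(n_qubits, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
print(out.shape)                                 # (6, 8)
print(np.allclose(weights.sum(axis=1), 1.0))     # True: each row is a distribution
print(np.allclose(np.triu(weights, k=1), 0.0))   # True: no attention to later qubits
```

The causal mask is what keeps the model autoregressive: the representation at position $i$ depends only on positions up to $i$, so it can be used to predict the next outcome without looking ahead.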
As we schematically depict in Fig. 1b, AQT takes as input a set of positive operator-valued measurement (POVM) outcomes, whether from a simulation or a real quantum device, and returns the reconstructed density matrix as output. The Transformer in AQT trains on a data set of $N_s$ one-shot local POVM outcomes $\vec a = (a_1, \ldots, a_{N_q})$ drawn from the distribution
$$p_\rho(\vec a) = \mathrm{Tr}\left[\rho\, M_{\vec a}\right], \qquad (1)$$
where $M_{\vec a} = M_{a_1} \otimes M_{a_2} \otimes \cdots \otimes M_{a_{N_q}}$ are the operators defining the POVM. From this data, the Transformer learns a distribution $p_T(\vec a)$, which is ideally close to $p_\rho$ and serves as a generative model that can sample from $p_T$ in linear-in-$N_q$ time. Once this training procedure is complete, Eq. (1) can be inverted for an appropriately chosen POVM. The POVM $T$-matrix is defined as $T_{\vec a, \vec a'} = \mathrm{Tr}\left[M_{\vec a} M_{\vec a'}\right]$.
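To make Eq. (1) and the $T$-matrix concrete, the sketch below assumes a six-outcome Pauli POVM made of one-third-weighted projectors onto the eigenstates of Z, X, and Y; the function names and the outcome ordering are our own, hypothetical choices. It builds the product operators $M_{\vec a}$ with Kronecker products and evaluates $p_\rho(\vec a)$ and $T$ for a 2-qubit GHZ (Bell) state by brute force, which is only feasible for very small $N_q$.

```python
import itertools
import numpy as np

def pauli6_povm():
    """Assumed Pauli POVM: M_a = |phi_a><phi_a| / 3 for the Z, X, Y eigenstates."""
    s = 1 / np.sqrt(2)
    states = np.array(
        [[1, 0], [0, 1],               # Z basis: |0>, |1>
         [s, s], [s, -s],              # X basis: |+>, |->
         [s, 1j * s], [s, -1j * s]],   # Y basis: |+i>, |-i>
        dtype=complex,
    )
    return [np.outer(v, v.conj()) / 3 for v in states]

def product_povm(single, n_qubits):
    """M_a = M_{a_1} (x) ... (x) M_{a_Nq}, with outcomes a in lexicographic order."""
    ops = []
    for labels in itertools.product(range(len(single)), repeat=n_qubits):
        op = np.eye(1, dtype=complex)
        for a in labels:
            op = np.kron(op, single[a])
        ops.append(op)
    return ops

def povm_probabilities(rho, ops):
    """Eq. (1): p_rho(a) = Tr[rho M_a]."""
    return np.array([np.trace(rho @ m).real for m in ops])

def t_matrix(ops):
    """T_{a,a'} = Tr[M_a M_a'] (real, since each M_a is Hermitian)."""
    return np.array([[np.trace(m1 @ m2).real for m2 in ops] for m1 in ops])

n_qubits = 2
ghz = np.zeros(2 ** n_qubits, dtype=complex)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)
rho = np.outer(ghz, ghz.conj())
ops = product_povm(pauli6_povm(), n_qubits)
p = povm_probabilities(rho, ops)
t = t_matrix(ops)
print(len(p), np.isclose(p.sum(), 1.0))  # 36 True
print(t.shape)                           # (36, 36)
```

The number of outcomes is $6^{N_q}$, which is why AQT samples from and evaluates $p_T$ autoregressively instead of tabulating it.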
If $T$ is invertible, the reconstructed density matrix $\rho_T$ can be computed from the learned POVM distribution $p_T(\vec a)$ in a post-processing step:
$$\rho_T = \sum_{\vec a, \vec a'} p_T(\vec a)\, T^{-1}_{\vec a, \vec a'}\, M_{\vec a'}. \qquad (2)$$
In this work, we use the Pauli POVM, which is invertible and easily accessible on the IBMQ quantum computers [7]. Our target state, both in experiment and in classical simulations, is the $N_q$-qubit Greenberger-Horne-Zeilinger (GHZ) state, with system sizes ranging from $N_q = 3$ to 90 qubits. We choose the GHZ state because it is a pure state of interest for quantum communication protocols and because we can benchmark our results against others in the literature, including those that do not reconstruct the full density matrix [5,7]. A comprehensive measure of reconstruction is the quantum fidelity
$$F_Q(\rho_0, \rho_1) = \mathrm{Tr}\sqrt{\sqrt{\rho_0}\,\rho_1\sqrt{\rho_0}}, \qquad (3)$$
where $\rho_0$ is the target density matrix against which we compare the reconstructed density matrix $\rho_1$. Quantum fidelity in general requires full density matrix reconstruction [25]. We carry out full density matrix reconstruction for small system sizes ($N_q \le 6$), for which we can evaluate the exact quantum fidelity. However, in order to benchmark our results against earlier works using neural networks, we first investigate the classical fidelity, which can be used when the state reconstruction only yields measurement probabilities:
$$F_C(p_0, p_1) = \sum_{\vec a} \sqrt{p_0(\vec a)\, p_1(\vec a)}. \qquad (4)$$
Here the sum is over all IC-POVM outcomes $\vec a = (a_1, a_2, \ldots, a_{N_q})$, $a_i \in \{1, 2, \ldots, N_a\}$, and $p_0$ and $p_1$ represent the measurement statistics of an IC-POVM over states $\rho_0$ and $\rho_1$, respectively. Even though the classical fidelity contains a number of terms exponential in $N_q$, it is possible to estimate $F_C(p_0, p_1)$ efficiently by sampling from the generative model representing $p_1$, i.e.,
$$F_C(p_0, p_1) = \sum_{\vec a} p_1(\vec a)\sqrt{\frac{p_0(\vec a)}{p_1(\vec a)}} \approx \frac{1}{N_s}\sum_{\vec a \sim p_1}\sqrt{\frac{p_0(\vec a)}{p_1(\vec a)}}, \qquad (5)$$
where the final sum is an average over $\vec a$ sampled from the distribution $p_1(\vec a)$. This choice is enabled by the Transformer architecture, which allows both exact sampling from $p_1$ and exact calculation of $p_1(\vec a)$ for any choice of $\vec a$ in time linear in $N_q$.
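The post-processing step that inverts Eq. (1), together with the quantum fidelity comparison, can be checked by brute force on a small system. This sketch again assumes the six-outcome Pauli POVM with Z, X, Y projectors weighted by 1/3. For that particular POVM the $6^{N_q} \times 6^{N_q}$ $T$-matrix is singular (six single-qubit operators span only the four-dimensional space of 2×2 Hermitian matrices), so the code uses a Moore-Penrose pseudo-inverse, which coincides with $T^{-1}$ whenever $T$ is invertible and still recovers $\rho$ exactly from exact probabilities. It also takes $F_Q(\rho_0,\rho_1) = \mathrm{Tr}\sqrt{\sqrt{\rho_0}\,\rho_1\sqrt{\rho_0}}$ as an assumed convention (some works square it).

```python
import itertools
import numpy as np

def pauli6_povm():
    """Assumed Pauli POVM: M_a = |phi_a><phi_a| / 3 for the Z, X, Y eigenstates."""
    s = 1 / np.sqrt(2)
    states = np.array([[1, 0], [0, 1], [s, s], [s, -s], [s, 1j * s], [s, -1j * s]],
                      dtype=complex)
    return [np.outer(v, v.conj()) / 3 for v in states]

def product_povm(single, n_qubits):
    """Tensor-product POVM elements, outcomes in lexicographic order."""
    ops = []
    for labels in itertools.product(range(len(single)), repeat=n_qubits):
        op = np.eye(1, dtype=complex)
        for a in labels:
            op = np.kron(op, single[a])
        ops.append(op)
    return ops

def sym_pinv(t, tol=1e-10):
    """Pseudo-inverse of a real symmetric matrix (equals T^-1 if T is invertible)."""
    w, v = np.linalg.eigh(t)
    keep = w > tol * w.max()
    return (v[:, keep] / w[keep]) @ v[:, keep].T

def reconstruct(p, ops):
    """rho = sum_{a,a'} p(a) T^-1_{a,a'} M_{a'}; T is symmetric, so coefficients are T^-1 p."""
    t = np.array([[np.trace(m1 @ m2).real for m2 in ops] for m1 in ops])
    coeffs = sym_pinv(t) @ p
    return sum(c * m for c, m in zip(coeffs, ops))

def psd_sqrt(rho):
    """Square root of a Hermitian matrix, clipping tiny negative eigenvalues to zero."""
    w, v = np.linalg.eigh(rho)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T

def quantum_fidelity(rho0, rho1):
    """Tr sqrt(sqrt(rho0) rho1 sqrt(rho0)); negative eigenvalues (possible if rho1
    is not positive) are clipped to zero, which is an assumption of this sketch."""
    s = psd_sqrt(rho0)
    return float(np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(s @ rho1 @ s), 0, None))))

n_qubits = 2
ghz = np.zeros(2 ** n_qubits, dtype=complex)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)
rho = np.outer(ghz, ghz.conj())
ops = product_povm(pauli6_povm(), n_qubits)
p = np.array([np.trace(rho @ m).real for m in ops])  # exact Born probabilities, Eq. (1)
rho_rec = reconstruct(p, ops)
print(np.allclose(rho_rec, rho))                 # True
print(round(quantum_fidelity(rho, rho_rec), 6))  # 1.0
```

With a learned $p_T$ in place of the exact `p`, the same inversion generally yields a Hermitian, unit-trace matrix that need not be positive, which is why the fidelity helper clips negative eigenvalues.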
However, it should be noted that the classical fidelity only provides an upper bound on the quantum fidelity [7], and the discrepancy can be substantial [5].
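The sampling estimator of the classical fidelity described above rests on the identity $\sum_{\vec a}\sqrt{p_0 p_1} = \mathbb{E}_{\vec a\sim p_1}\big[\sqrt{p_0(\vec a)/p_1(\vec a)}\big]$. The sketch below checks it numerically on small, randomly generated distributions; `p0` and `p1` are hypothetical stand-ins for the target statistics and a trained model, stored as explicit arrays only so that the exact sum can be compared against the Monte Carlo estimate.

```python
import numpy as np

def classical_fidelity(p0, p1):
    """Exact F_C = sum_a sqrt(p0(a) p1(a)); needs every outcome, exponential in N_q."""
    return float(np.sum(np.sqrt(p0 * p1)))

def classical_fidelity_mc(p0, p1, n_samples, rng):
    """Monte Carlo estimate: mean of sqrt(p0(a) / p1(a)) over a ~ p1.

    In AQT, p1(a) would come from the generative model and p0(a) from the
    target state, so only the sampled outcomes ever need to be evaluated.
    """
    idx = rng.choice(len(p1), size=n_samples, p=p1)
    return float(np.mean(np.sqrt(p0[idx] / p1[idx])))

rng = np.random.default_rng(0)
p0 = rng.dirichlet(np.ones(64))                    # hypothetical target statistics
p1 = 0.9 * p0 + 0.1 * rng.dirichlet(np.ones(64))   # hypothetical learned model
exact = classical_fidelity(p0, p1)
estimate = classical_fidelity_mc(p0, p1, 100_000, rng)
print(abs(exact - estimate) < 0.01)  # True
```

The estimator's variance depends on how much $p_0/p_1$ fluctuates over typical samples; here $p_1 \ge 0.9\,p_0$ keeps the ratio bounded, so $10^5$ samples suffice.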
We first benchmark AQT against the previous state-of-the-art neural quantum state tomography using an RNN. Ref. [7] studied an $N_q$-qubit GHZ state, with $N_q = 10$ to 90, using classically sampled measurements, and demonstrated that $N_s^*(N_q)$, the minimum size of training data for which the RNN can achieve a classical fidelity of 0.99, increases linearly with $N_q$. In Fig. 2a, we demonstrate a similarly linear dependence of $N_s^*$ on $N_q$ using AQT, indicating that these natural language processing models can indeed learn non-trivial information about a quantum state from an amount of data that grows sub-exponentially with the system size. However, AQT exhibits an order-of-magnitude improvement in the sample complexity of learning the GHZ state compared to RNN tomography, with a comparable slope $dN_s^*/dN_q$. As we will see shortly, this improvement in learning ability from RNN to AQT is even more dramatic in the task of density matrix reconstruction and quantum fidelity estimation, which is significantly more challenging than achieving a good classical fidelity and was thus out of reach for RNN tomography [26].
We now investigate AQT's performance on a mixed state with a built-in simulated error. We consider a 3-qubit GHZ system and assume there is one faulty qubit, which we pick to be qubit-0. We assume that the faulty qubit flips ($0 \leftrightarrow 1$) with probability $p$. More precisely, this represents the mixed state
$$\rho_{\mathrm{err}}(p) = (1-p)\,|\mathrm{GHZ}_3\rangle\langle\mathrm{GHZ}_3| + p\,|\psi_3\rangle\langle\psi_3|, \qquad (6)$$
where $|\psi_3\rangle = \frac{1}{\sqrt{2}}\left(|100\rangle + |011\rangle\right)$. For the small number of qubits that we study, we are able to compute the exact quantum fidelity. First, we consider the fidelity between the reconstructed state and the noisy state in Eq. (6), for which we find $F_Q(\rho_{\mathrm{model}}, \rho_{\mathrm{err}}) = 1$ within statistical error. This demonstrates that AQT is sufficiently expressive to support a successful training procedure.
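For a pure target, the quantum fidelity reduces to an overlap, which makes the built-in error easy to check by hand. Assuming the convention $F_Q(\rho_0,\rho_1) = \mathrm{Tr}\sqrt{\sqrt{\rho_0}\,\rho_1\sqrt{\rho_0}}$ and taking qubit-0 to be the leftmost bit in $|100\rangle$, the sketch below builds the mixed state of Eq. 6 and evaluates its fidelity to the ideal 3-qubit GHZ state. Because $|\psi_3\rangle$ is orthogonal to $|\mathrm{GHZ}_3\rangle$, the result is $\sqrt{1-p}$ under this convention, or $1-p$ if the fidelity is defined with a square.

```python
import numpy as np

def basis_state(bits):
    """Computational-basis vector, with the leftmost bit taken as qubit-0."""
    v = np.zeros(2 ** len(bits))
    v[int(bits, 2)] = 1.0
    return v

ghz3 = (basis_state("000") + basis_state("111")) / np.sqrt(2)
psi3 = (basis_state("100") + basis_state("011")) / np.sqrt(2)  # GHZ with qubit-0 flipped

def rho_err(p):
    """(1 - p)|GHZ_3><GHZ_3| + p|psi_3><psi_3|."""
    return (1 - p) * np.outer(ghz3, ghz3) + p * np.outer(psi3, psi3)

def fidelity_to_pure(rho, psi):
    """Tr sqrt(sqrt(rho0) rho sqrt(rho0)) with rho0 = |psi><psi| equals sqrt(<psi|rho|psi>)."""
    return float(np.sqrt(psi @ rho @ psi))

for p in [0.0, 0.1, 0.2, 0.3]:
    print(p, round(fidelity_to_pure(rho_err(p), ghz3), 4))  # sqrt(1 - p)
```
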
To facilitate comparison to an experimental setting where $p$ is a priori unknown, we compute the fidelity of the "realized" density matrix $\rho_{\mathrm{model}}$ to the "target" density matrix $\rho_{\mathrm{GHZ}}$, which is the error-free pure GHZ state. The numerical results for $p = 0.0$ to $0.3$ displayed in Fig. 2b are consistent with the expectations from the built-in error. (See Supplement "Error model # 2" for another model of error for the 3-qubit GHZ state.) Note that, as the density matrix reconstructed in AQT is not guaranteed to be positive, $F_Q \le 1$ is also not guaranteed. However, we find that the negative eigenvalues become less significant as the quality of reconstruction increases. (See Supplement "Positive density matrix reconstruction" for a discussion of the scaling of negative eigenvalues with sample size, as well as one possible approach to positive density matrix reconstruction.) Next, we benchmark AQT against the MLE algorithm built into the IBM Qiskit library by performing tomography with both approaches on measurements taken on IBMQ OURENSE for a 3-qubit system (Fig. 3). For the reconstruction we took 100 measurements in each of the $3^3 = 27$ possible measurement configurations, for a total of 2,700 measurements. Fig. 3 shows the two reconstructed density matrices in the usual graphical representation. Here, each bar represents a matrix element, in general complex, with the bar height set by its absolute value. The tall bar to the left is the density matrix element $|000\rangle\langle 000|$, the bar to the rear is $|000\rangle\langle 111|$, and so on. Note that the matrix elements represented by the bars in the rear and front are related by complex conjugation. The AQT-reconstructed density matrix is in strong qualitative agreement with the MLE reconstruction, capturing the error in realizing the GHZ state on the quantum computer. From the Transformer reconstruction, we find an exact quantum fidelity to the target pure GHZ state of $F_Q = 0.917$, while the MLE reconstruction has fidelity 0.897 [27].
These results give mutually consistent estimates of the reliability of the IBMQ OURENSE quantum computer. The advantage of AQT compared to exact tomographic methods such as MLE is that AQT can be scaled to larger systems.
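The 27 measurement configurations for the IBMQ OURENSE reconstruction correspond to choosing X, Y, or Z independently on each of the 3 qubits. Assuming a six-outcome Pauli POVM with elements $\tfrac13|\phi\rangle\langle\phi|$, pooling an equal number of shots from every configuration yields samples of the POVM outcomes directly: each configuration occurs with probability $1/3^{N_q}$, matching the weight 1/3 per qubit. The sketch below shows this bookkeeping; the dictionary layout, the per-qubit outcome label `2 * basis + bit`, and the counts are hypothetical placeholders, not IBMQ data. A real pipeline must also make the qubit order consistent with the hardware's bit strings (Qiskit count keys list qubit 0 as the rightmost bit).

```python
import itertools
from collections import Counter

BASIS_INDEX = {"Z": 0, "X": 1, "Y": 2}  # assumed outcome labelling

def pauli_povm_frequencies(counts_per_setting):
    """Pool per-setting counts into empirical Pauli-POVM outcome frequencies.

    counts_per_setting maps a basis string such as "XZY" to a dict of
    bit-string counts such as {"010": 12, ...}; character i of both strings
    refers to the same qubit. The outcome index per qubit is 2 * basis + bit.
    """
    shots = {sum(c.values()) for c in counts_per_setting.values()}
    if len(shots) != 1:
        raise ValueError("pooling requires the same number of shots per setting")
    pooled = Counter()
    for bases, counts in counts_per_setting.items():
        for bits, n in counts.items():
            outcome = tuple(2 * BASIS_INDEX[b] + int(s) for b, s in zip(bases, bits))
            pooled[outcome] += n
    total = sum(pooled.values())
    return {a: n / total for a, n in pooled.items()}, total

settings = ["".join(b) for b in itertools.product("ZXY", repeat=3)]
# Placeholder counts (not hardware data): 100 shots in each of the 27 settings.
fake_counts = {s: {"000": 55, "111": 45} for s in settings}
freqs, total = pauli_povm_frequencies(fake_counts)
print(len(settings), total)  # 27 2700
```
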
To further demonstrate the characterization ability of AQT, we reconstruct the density matrix for a 6-qubit GHZ state, which is already beyond the tomography functionality offered in Qiskit (see Fig. 4). We use classically generated data sampled from the noise-free GHZ state rather than data from IBMQ. (This is due in part to the limited number of POVM measurements publicly accessible with IBMQ.) The reconstructed density matrix in Fig. 4 uses a total of 200,000 measurements and has quantum fidelity 0.977. The reconstructed IC-POVM probability distribution $p_1$ (see Eq. (5)) is in excellent agreement with the GHZ state, as expected from Fig. 2a. Namely, achieving a classical fidelity of $F_C = 0.99$, which directly measures the accuracy of the reconstruction of $p_1$, requires even fewer measurements for a 6-qubit state than the 3,000 required for the 10-qubit state. On the other hand, the reconstruction of the full density matrix $\rho_1$ shows noise even with 200,000 measurements, though it is still in reasonable agreement with the GHZ state. (See Supplement "Reconstruction of Dicke states", which demonstrates reconstruction of simulated 3- and 6-qubit Dicke states with 2,700 and 72,900 measurements, for comparison.) In general, an accurate reconstruction of $\rho_1$ requires much more data and computing time than an accurate reconstruction of $p_1$, since even small errors in $p_1$ are amplified into large errors in $\rho_1$. This is closely related to the well-known fact that the classical fidelity is an upper bound on the quantum (exact) fidelity. Exact error-scaling analysis in the number of samples is in general NP-hard [28] and remains an open question for AQT. The scaling of the POVM probability mean-squared error (MSE) with sample size $N_s$ for the 6-qubit GHZ state suggests that the AQT error scaling is comparable to statistical scaling, $\mathrm{MSE} \sim N_s^{-0.5}$, but with a significant reduction in overall magnitude compared to directly using the data.
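One way to see why small errors in $p_1$ become large errors in $\rho_1$ is to look at the linear map from probabilities to the density matrix. Writing $\rho = \sum_{\vec a'} c_{\vec a'} M_{\vec a'}$ with $c = T^{-1}p$, an error $\delta p$ produces a Hilbert-Schmidt error $\|\delta\rho\|_{HS}^2 = \delta p^{\mathsf T}\, T^{-1}\, \delta p$, so the worst-case amplification factor is the square root of the largest eigenvalue of $T^{-1}$. For the six-outcome Pauli POVM assumed in the sketch below (with a pseudo-inverse in place of the inverse), that eigenvalue is 9 per qubit, so the factor grows as $3^{N_q}$. This is an illustrative worst-case bound, not the error analysis of the Supplement.

```python
import numpy as np

def pauli6_t_matrix_1q():
    """Single-qubit T_{a,a'} = Tr[M_a M_a'] for the assumed six-outcome Pauli POVM."""
    s = 1 / np.sqrt(2)
    states = np.array([[1, 0], [0, 1], [s, s], [s, -s], [s, 1j * s], [s, -1j * s]],
                      dtype=complex)
    ops = [np.outer(v, v.conj()) / 3 for v in states]
    return np.array([[np.trace(a @ b).real for b in ops] for a in ops])

def sym_pinv(t, tol=1e-10):
    """Pseudo-inverse of a real symmetric matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(t)
    keep = w > tol * w.max()
    return (v[:, keep] / w[keep]) @ v[:, keep].T

t1_pinv = sym_pinv(pauli6_t_matrix_1q())
factors = []
for n_qubits in range(1, 4):
    # For a product POVM, T = T_1 (x) ... (x) T_1 and the pseudo-inverse factorizes likewise.
    t_pinv = np.eye(1)
    for _ in range(n_qubits):
        t_pinv = np.kron(t_pinv, t1_pinv)
    worst_case = float(np.sqrt(np.linalg.eigvalsh(t_pinv).max()))
    factors.append(worst_case)
    print(n_qubits, int(round(worst_case)))  # 1 3, then 2 9, then 3 27
```

The same exponential growth does not appear in the classical fidelity, which is computed from $p_1$ directly without applying $T^{-1}$.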
(See Supplement "Transformer advantage" for a detailed discussion of error scaling with sample size in AQT.)
In summary, we proposed AQT, which adapts elements of the Transformer, a generative deep neural network for NLP, to the task of quantum state tomography. AQT outperformed earlier neural quantum tomography based on the RNN architecture on an identical task, demonstrating a significant reduction in the sample complexity of the reconstruction. This suggests that AQT provides a nontrivial inductive bias suitable for the reconstruction of entangled states such as the ones considered in our experiments. We constructed a qubit-error model and showed that AQT provides a reliable estimate of a priori unknown quantum mixed states and error rates in our specific setting. We then demonstrated for the first time that a machine-learning-based tomographic technique can reliably reconstruct the noisy density matrix of an entangled state, by reconstructing the 3-qubit GHZ state realized on a quantum computer provided publicly by IBMQ. Furthermore, using AQT we have reconstructed a 6-qubit GHZ state, a tomography task beyond the reach of the tomography functionality in IBM Qiskit's software.
To the best of our knowledge, AQT represents the first machine-learning-based approach to successfully reconstruct density matrices describing the states produced in an experimentally realized quantum computer. AQT offers a substantial improvement in sample complexity over the next-leading neural-network-based tomographic methods, and is demonstrably capable of reconstructing noisy, full-rank states from experimental measurements. AQT is competitive with MLE in arbitrary density matrix reconstruction at small system sizes, and is also capable of POVM state characterization at large system sizes. Furthermore, AQT is designed to work with experimentally realizable local POVM measurements, without requiring globally acting gates, which are inaccessible in current experimental systems. Finally, AQT requires no assumptions about the entanglement structure or purity of the state being reconstructed and is expressive enough to characterize arbitrary states. With these features, AQT represents not only a leap forward in the cooperation of machine learning and near-term quantum computing, but also a powerful tomographic method uniquely suited to bridging the gap between simulation and experiment in the emerging era of noisy, intermediate-scale quantum computing.
AQT holds much promise for future progress. Because AQT learns a POVM representation of the quantum state that can be used to compute operator expectation values, a future investigation of AQT as a platform for shadow tomography could be fruitful. This work has been largely based on the GHZ state, facilitating a comparison with previous works that do not perform full density matrix reconstruction. Nevertheless, AQT is not inherently limited to a special pure state, and an examination of how $N_s^*$ scales with $N_q$ in states with more complex entanglement will provide much insight into machine-learning-based tomography. Tests on bigger experimental systems and other architectures will help us determine the full scalability of AQT. Furthermore, it would be interesting to explore whether the AQT approach can build on the initial insight from our elementary error model towards more sophisticated error modeling and assessment that complements gate-set tomography [29,30].
Author Contributions: EK, PC, PG planned the AQT. FW and PC implemented the AQT. EK, JC, PC, and PG were involved in benchmarking AQT against RNN and other architectures. EK, PC, PG, and PM were involved in experimenting on IBMQ. All authors contributed equally to writing the manuscript.
Competing Interests: The authors declare no competing interests.