Comparison of confidence regions for quantum state tomography

The quantum state associated to an unknown experimental preparation procedure can be determined by performing quantum state tomography. If the statistical uncertainty in the data dominates over other experimental errors, then a tomographic reconstruction procedure must express this uncertainty. A rigorous way to accomplish this is via statistical confidence regions (CRs) in state space. Naturally, the size of this region decreases when increasing the number of samples, but it also depends critically on the construction method of the region. We compare recent methods for constructing CRs as well as a reference method based on a Gaussian approximation. For the comparison, we propose an operational measure with the finding, that there is a significant difference between methods, but which method is preferable can depend on the details of the state preparation scenario.


Introduction
Quantum state tomography is a complete and agnostic procedure to characterize an unknown quantum state from experimental data [1].As such, it plays a central role as a quality control subroutine in quantum experiments or quantum technological applications, allowing us to assess the outcome of a state-preparation procedure and verify that a targeted quantum state is indeed being produced.
The outcome of a tomographic procedure is usually a point estimate ρ of the true density matrix ρ, along with some sort of error bar.A common approach is to use maximum likelihood estimation [2,3] to construct ρ, and report the variance of the point estimate obtained by bootstrapping [4,5] or the width of the likelihood function [6].However, this approach accounts only for the statistical fluctuations of the maximum likelihood point estimate, which can turn out to be smaller than its actual bias [7,8].While such error bars do not measure how far off is ρ from the true state ρ, they can have use-cases and justification in scenarios where other information on the tomographic experiment is intended.To analyze tomographic data rigorously in statistical terms, confidence regions (CRs) are constructed: these are regions in the space of Hermitian operators that intersect the state space and contain the true state with high probability.Generally speaking, we can regard a confidence region C as a function from the experimental data to the space of Hermitian operators fulfilling for every state ρ, where 1 − δ is the confidence level.
There exist diverse methods to construct CRs from tomographic data in the literature [9,10,11,12,13,14,15,16,17,18].Methods may differ on the theoretical grounds they are built on, in turn leading to differences in their tightness, speed of convergence, and computational cost.However, these features are not always conveyed in a way that allows for a direct comparison between different methods, particularly in the practically relevant regime of finite data.As a result, the natural question of which CR should one choose to report the results of a tomographic experiment has, to this day, no straightforward answer.The main objective of this work is to close this gap by developing an operational benchmark under which methods to build CRs can be fairly contrasted.
Generically, for the same dataset and confidence level, the smaller a CR is, the better.To illustrate why this simple criterion is problematic to evaluate in practice, let us have a first look at two recent results in Refs.[15,16], which we will further analyze in this work.A usual way of expressing Eq. (1) (see, for example, Refs.[13,16]) is where d is the dimension of the quantum system, N is the number of tomographic experiments, and the form of f varies with the norm ⋆ (usual examples are trace, Hilbert-Schmidt, or operator norm) and with the estimator used for ρ.In terms of speed of convergence with the number of samples taken, Guţă et al. [16] report an almost optimal scaling of f with respect to N , d, but fixed ϵ (showing an improvement over Ref. [13] for the same norms).In contrast, Wang et al. [15] provide an explicit construction of the function C as a polytope, and the convergence of the region with N is implicit, preventing a direct comparison to Eq. ( 2) based only on the region size.For finite N , neither of these works consider the tightness of the CRs quantitatively.Then the question arises, how can we judge whether one of these CRs is smaller than the other in a given experiment?First, the regions have incomparable shapes (a sphere in a norm and a polytope), hence a parametric comparison is not possible.One could look at volumes to disregard shape, but the relevant volume to consider would be that of the intersection of the CRs with the state space, hence we would require numerical estimates (for example, by Monte Carlo sampling) that are not scalable beyond relatively small systems and will be in any case dependent on the chosen integration measure.Finally, we would encounter the same problems when comparing CRs of the form in Eq. ( 2) if these are expressed in different norms.In this case, we could use inequalities between norms to transform one into another, but we would always be underestimating or overestimating one of the regions.
In this work we propose to assess the power of a CR by how well can it distinguish a pair of states.More precisely, we simulate a tomography experiment on two given target states, and find, for a given confidence level, the minimum number of repetitions necessary for their corresponding CRs (built with the same method) to stop intersecting, which ensures that the two states can be distinguished with the chosen confidence.Intuitively, smaller, more powerful CRs should generally lead to fewer repetitions required for distinguishing two given states, thus allowing us to correlate the statistical power of a CR to a single number.While, clearly, this is not the most efficient test to discriminate the two states, as a test it has several desirable features: It has an operational meaning, it is agnostic to the specifics of a CR, and it is computationally efficient.The number output by our distinguishability test will certainly depend not only in the confidence level but also on the pair of states chosen; this can be considered to be a feature, since one may tailor the test to assess the power of CRs along a particular direction in state space.We use this test to carry out a comparative analysis of recent CRs in paradigmatic scenarios involving pairs of states of up to 4 qubits, and we focus on performing local Pauli-basis measurements as our tomographically complete set of measurements for its practical relevance and widespread use; we also analyse measurements of local symmetric informationally complete positive operatorvalued measures (local SIC-POVMs), see Appendix E. Our results show that there is no single CR that outperforms the others in all cases.The performance of the CRs varies not only with the dimension of the system, but also with the pair of states selected for the distinguishability test.An example of this is the finding that while a CR may allow to distinguish a pure state from the completely mixed state with the fewest number of samples, it underperforms when distinguishing two pure states that are very close to each other.This notwithstanding, our analysis allows us to make some general recommendations.
We apply our comparative analysis to the recent results in Ref. [16] and Ref. [15].In addition, we provide a method to build CRs based on a Gaussian approximation, which will serve us as a reference.While there are other works on CRs in the literature, the rationale behind focusing on these ones is the following.First, we concentrate on unconditional tomography, leaving aside Bayesian methods that require assuming a prior distribution over the possible states (and result in credibility regions instead of CRs) [9,12,18], methods that assume rank-deficient states like compressed sensing [19], permutationally invariant states [20], or matrix product states [21], and heuristic methods like self-guided tomography [22] or CRs based on likelihood ratios [10].Second, we select those works which have not yet been clearly superseded (for example, Ref. [16] improves on Ref. [13], and Ref. [15] on Ref. [11]).Finally, it is worth mentioning with Ref. [23] another recent comparative study on quantum state tomography, which evaluates a different aspect in tomography schemes, namely the typical error (distance between the estimate and the true state) of various point estimators.
The paper is structured as follows.In Section 2 we summarize the CRs considered in our study.Section 3 details the methodology used for comparing between CRs in a series of simulated tomography experiments, and includes the results of our testing.We end the paper with a short discussion in Section 4. In the Appendix, we include details on the derivation of the Gaussian CRs as well as technical considerations about the simulations.

Methods for building confidence regions
Here we summarize the results that we use for our comparative analysis.A common feature of all selected methods is that the CRs are built for the free least-squares estimator for ρ, see Appendix B.2. Importantly, this estimator is unbiased, although due to finite statistics in experiments it generally produces a non-physical density matrix.This is not problematic, because the associated CR will almost certainly have some overlap with the state space.In fact, if the confidence region does not have an overlap with the state space, this is a clear sign of a misaligned experiment, see Ref. [20] for an analysis of this point.Hence, even if the point estimate is unphysical, the intersection of the CR with the state space is the physical confidence region, see also Ref. [24].CRs for physical (hence biased) estimators exist, for example, for projected least-squares [16] or other variants [23], but they are derived from a CR for the free least squares estimator and thus do not improve over them.Hence, to treat all methods on equal footing, in this paper we restrict ourselves to free least squares.For easy referencing, we label the CRs used in our analysis as A, B, C 1 , and C 2 .
The first result we review is a non-asymptotic concentration bound for the operatornorm error of the free least-squares estimator.In Ref. [16], Guţǎ et al. show that the free least-squares estimator obeys where ∥X∥ ∞ is the operator norm of X, N is the total number of state preparations, and d is the dimension of the underlying Hilbert space.Hence, the CR is of the form in Eq. ( 1).One has where the factor g(d) takes different forms depending on the chosen measurement scheme.For multiqubit states and local Pauli-basis measurements, one has g(d) = 3 − log 2 d , for SIC-POVM measurements, g(d) = 2d, but no expression for g(d) has been provided for the case of local SIC-POVM measurements of two or more qubits.
Another method by Wang et al. [15] constructs a polytope-shaped region Γ( ⃗ f ) as a function of the empirical frequencies ⃗ f = (f 1 , . . ., f M ) of a single positive operator-valued measure (POVM), that is, positive semidefinite operators {E i } M i=1 obeying i E i = 1, applied on the state ρ, where f i = n i /N , n i is the number of occurrences of outcome i, and N the total number of state preparations.The region Each facet of the polytope is determined by the half-space where is the relative binary entropy.Then, the CR is given by the intersection of all half-spaces, that is, Γ( ⃗ f ) = i Γ(f i ).Because each half-space corresponds to an independent confidence interval for the frequency f i , we are free to choose each individual confidence level by fixing δ i as long as they add up to the total confidence parameter δ.This method thus allows us to shrink the CR along a preferred measurement direction at the expense of enlarging it in other ones, which adds versatility.For our comparative analysis, we make the uniform choice δ i = δ/M .We also provide bounds based on a Gaussian approximation.Here we make the approximation that the empirical frequencies ⃗ f of the measurement outcomes are Gaussian distributed with mean ⃗ p(ρ) and a nonsingular covariance matrix Σ(ρ).Then we obtain the Gaussian CR defined by where P k (x) is the survival function of the central χ 2 -distribution with k = rank(Σ) degrees of freedom, see Appendix A for details.Note that Σ(ρ) here scales with the number of samples.This CR is exact, but comes with the disadvantage that the region depends nonlinearly on ρ through Σ.We can improve on this at the cost of getting a more relaxed bound, see Appendix A, with σ 2 max the largest eigenvalue of Σ(ρ) across all states.Note that the resulting CR forms an ellipsoid in the state space.For multinomial distributions, we have σ 2 max = 1/(2n), where n is the number of samples per measurement setting.For example, for local Pauli-basis measurements on a q-qubit state, we have 3 q distinct measurement settings, N = 3 q n (see Appendix A).

Methodology for comparing CRs and results
We propose two universal tests to benchmark CRs irrespective of their specifications.The first test consists in computing the (1 − δ)-quantile of the distribution of the ratio over several repetitions of the same tomography experiment with confidence level (1−δ).
The terms in r depend on the chosen CR, as they appear in Pr[ ∥. ..∥ ⋆ > ϵ(δ) ] ≤ δ and note that the norm can be defined either in probability space or in the state space.This test quantifies the tightness of a CR in a scale-independent way.On the one hand, a value 100% for the (1 − δ)-quantile of r indicates that the region is exact, inasmuch as the probability of obtaining ρ at a distance no more than ϵ from ρ coincides with the confidence level, that is, the inequality in Eq. ( 2) is saturated.On the other hand, a small value of r tells us that the CR is unnecessarily large, since in (1 − δ) × 100% of the experiments ρ is actually much closer to ρ than the size of the region ϵ.
We run simulations of tomography experiments on Haar-random pure states of up to 4 qubits.We simulate using N = 60 000 samples and local Pauli-basis measurements, see Appendix B, and evaluate r for a confidence level of 1 − δ = 99%.For Confidence Region A, Confidence Region C 1 and Confidence Region C 2 , and for each state considered, we determine the empirical (1 − δ)-quantile from 10 000 samples of the ratio r; the results are summarized in Table 1.Note that, since Confidence Region B is not norm-based, we do not run this test on it; we show here the results for Paulibasis measurements, in Appendix E we provide the quantiles for local SIC-POVMs, and Confidence Regions C 2 and A (the latter only for 1-qubit cases).We observe that Confidence Region C 1 is an exact region, which supports the validity of the Gaussian (1 − δ)-quantile 1 qubit 2 qubits 3 qubits 4 qubits CR A 38% ± 2% 41% ± 3% 36% ± 1% 32% ± 1% CR C 1 100% ± 2% 100% ± 1% 100% ± 3% 100% ± 2% CR C 2 61% ± 3% 58% ± 6% 45% ± 3% 33% ± 1% Table 1.Empirical (1 − δ)-quantile of the distribution of the ratio r defined in Eq. ( 9) for different CRs and pure Haar-random states for tomography based on Pauli-basis measurements.We used N = 60 000 samples on each tomography simulation, and evaluated the (1 − δ)-quantile from 10 000 samples of r.A perfect region reaches 100% while a lower value shows that the CR is too large.The provided errors show the range of the values attained for 50 different random states.sampling assumption in our numerical experiments.For Confidence Region A and C 2 we see that the latter is tighter for the 1-qubit, 2-qubit and 3-qubit cases and then becomes looser for states of 4 qubits, whereas the tightness of Confidence Region A is more stable with an increasing number of qubits.As we see next, these results correlate well with those of our second test, even though they quantify different features of the CRs.
The second test we devise is aimed at quantifying the distinguishability power of CRs.Choosing two distinct target states over which we perform tomography, we ask the question of how many samples N are required so that their respective CRs do not contain a common state.At this point we can distinguish the two states with confidence 1 − δ based on the CRs, see Figure 1.In this way we can attach a number to a CR and a given pair of states which is indirectly related to the size of the region.This test thus allows us to rank CRs according to an operationally well defined task and independently of their specifications.
We run this test with Confidence Region A, B, and C 2 for two types of pairs of states of up to 4 qubits: pairs comprised by a pure state and the completely mixed state, and pairs of two pure states relatively close to each other, with fidelity |⟨ϕ 1 |ϕ 2 ⟩| 2 = 0.63.The specific states are detailed in Appendix D, and we fix the confidence level to 1−δ = 90%.Importantly, the value of N at which the CRs stop intersecting is a stochastic variable, since it directly depends on the estimate ρ where the regions are centered and thus on the data obtained in each tomography experiment.For increasing values of N , we look at the frequency of finding an intersection over many repetitions (3000 for q = 1, 2; 1000 for q = 3 and 500 for q = 4).We also note that testing whether two regions share a common state can be computed efficiently via a convex optimization program, see Appendix C.This is the case with the exception of Confidence Region C 1 since this region depends nonlinearly on the true state through Σ(ρ) − 1 2 , and for this reason we do not include it in this test.
In Figure 2 we show the frequency that the CRs of the two target states contain a common state, for each of the three CRs considered and for tomography using Paulibasis measurements.Each plot corresponds to a different pair of target states of one and two qubits.For the case of one-qubit states, Confidence Region C 2 performs best in this test, closely followed by Confidence Region B, regardless of whether we try to distinguish a pure state versus the completely mixed state or two pure states close to each other, whereas Confidence Region A requires roughly a factor of two more state preparations.For two-qubit states we observe an interesting phenomenon: Confidence Region A distinguishes a Bell state from the identity with the least number of state preparations and Confidence Region C 2 has almost the same performance, but it becomes notably more inefficient when distinguishing between a Bell state and the same state with a slight local rotation applied (with fidelity 0.63), for which case Confidence Region C 2 is again the best one.An important remark is that the states selected for the test are not aligned with any measurement direction, in order to represent a generic case of state tomography.Particularly for one-qubit states, if the state would point in the x, y or z direction, we find that Confidence Region B slightly outperforms Confidence Region C 2 .This is due to the facets of the polytope Γ( ⃗ f ) being perpendicular to the measurement directions by design.
We show the results of the intersection test for pairs of 3-qubit and 4-qubit states in Figure 3, again for Pauli-basis measurements.Here we observe a clear advantage of Confidence Region A in all cases.This is consistent with the results of Ref. [16].Indeed, the derivation of Confidence Region A is based on the observation that ρ can be seen as a sum of N random matrices and the subsequent application of a matrix concentration inequality, which becomes sharper as N increases (which is necessary for a larger number of qubits).Also, the ranking between the analyzed CRs is maintained throughout the studied 3-qubit and 4-qubit cases, with Confidence Region C 2 in a second position and Confidence Region B requiring the largest number of state preparations.
We also apply the same analysis as in Figure 2 and Figure 3 for measurements of local SIC-POVMs, see Appendix E. The results for one qubit are essentially comparable to Pauli basis measurements, although Confidence Region B performs slightly better.When considering two or more qubits, we find that Confidence Region C 2 becomes slightly worse while Confidence Region B becomes worse in distinguishing pure states from the completely mixed state, but gets significantly better in distinguishing two pure states.Generally, Confidence Region B outperforms C 2 on the usage of SIC-POVMs for 3 or more qubit systems.Note that for two and more qubits we cannot apply the analysis to Confidence Region A, since the factor g(d) is not known for local SIC-POVMs.

Discussion
We have proposed and tested two methods to benchmark confidence regions for quantum state tomography.One of them, based on the distinguishability power of a given confidence region construction, is operationally defined and applicable to any such construction regardless of its shape, specifications, and whether the region is defined in probability space or in state space.In addition, we proposed a second method which determines how tight a given region is.These new metrics allow us to compare existing (and future) methods to build confidence regions that were until now not directly comparable.Thus, we provide a robust tool to help in deciding which confidence region to report in any state tomography experiment.
Our numerical experiments show that this choice should depend on the dimension of the system at hand and the goal for which tomography is used.We see this variability as a reflection of the diverse techniques used to derive confidence regions.The method by Guţǎ et al. [16] appears to perform particularly well for large systems, while in special cases involving 1-qubit and 2-qubit states, the method of Wang et al. [15] or a Gaussian approximation as in Confidence Region C 2 can be better suited.In addition, Confidence Region C 2 has also been found to be useful in deriving optimal tests for detecting properties of the measured state directly from the tomographic data, without going through the intermediate step of computing a point estimate of the density matrix.An example of this is a proposal for the experimental detection of bound entanglement [25].We argued that the three confidence regions A [16], B [15], and C 2 are superior to previous regions and should be taken in consideration when evaluating tomographic data.Among those regions, there are differences that can make one preferable over the other.From our experience (with simulated data), one can generally recommend to consider Confidence Region A first, if the measurement scheme in the experiment belongs to the cases for which the factor g(d) is known.As a fallback, one can consider Confidence Region C 2 ; improvements on this bound may be subject to further research.The Confidence Region B yields a mixed experience, but can be very good, in particular for measurements of local SIC POVMs.This region can be more difficult to handle and to grasp intuitively, because it yields a polytope with many facets in a high-dimensional space.
We analyzed recent unconditional confidence regions for full quantum state tomography and we found that the considered regions perform already reasonably well in the light of our quantile tests and are considerably easy to apply.A point estimate of the unknown state obtained by linear inversion equipped with a confidence region is computationally inexpensive and should suffice for any purpose where tomography is needed.As for which confidence region is best to use, as we have seen, it depends.However, there may still be room for improvement, since, for example, for the method of Guţǎ et al. we observe the (1 − δ)-quantile is only around 40%.This means there could be an even tighter construction, although this seemingly suboptimal slackness does not have a great effect on our intersection tests.fellowships ID100010434: CF/BQ/PR23/11980043; Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, Project Numbers 447948357 and 440958198); Sino-German Center for Research Promotion (Project M-0294); German Ministry of Education and Research (Project QuKuK, BMBF Grant No. 16KIS1618K).Views and opinions expressed in this work are, however, those of the author(s) only and do not necessarily reflect those of the European Union, European Climate, Infrastructure and Environment Executive Agency (CINEA), nor any other granting authority.Neither the European Union nor any granting authority can be held responsible for them.and we have f i → p i in the limit n → ∞.

Appendix B.1. Local Pauli-basis measurements
We now particularize this generic POVM for state tomography with local Pauli-basis measurements.For one qubit, measuring in the three Pauli bases can be viewed as a single six-outcome POVM, whose elements correspond to the eigenstates of each Pauli matrix σ j , j = x, y, z.Each POVM element has two labels, the direction j and the two possible outcomes along that direction, +1 −1.Therefore, we write E j,± = (1 ± σ j )/2 = |w j,± ⟩⟨w j,± |.When each measurement σ j is performed n times, the frequency vector corresponding to the six-outcome POVM has components f j,± = n j,± /(3n).
Generalizing the tomography scheme for q qubits, the dimension of the system is d = 2 q , and there are 3 q Pauli measurement settings ⃗ m = (m 1 , . . ., m q ) ∈ { x, y, z } q , each with d possible outcomes ⃗ o = (o 1 , . . ., o q ), where o j ∈ { ±1 }.This tomography scheme can also be written as a single POVM, whose elements are now specified by the measurement setting ⃗ m and the outcome ⃗ o, as If each measurement setting ⃗ m is performed n times, we obtain for the single POVM the frequencies f ⃗ m,⃗ o = n ⃗ m,⃗ o /(3 q n) where the total number of samples is N = 3 q n.At this point we note that, formally, there is no difference between measuring each combination of Pauli matrices ⃗ m separately as 2 q -outcome projective measurement and considering the single POVM E ⃗ m,⃗ o with 6 q outcomes.However, in the finite data regime there is a potential difference when it comes to the covariance matrix of the data.Indeed, when considering a single POVM there is only one frequency that is dependent of the others, by the normalization ⃗ m,⃗ o f ⃗ m,⃗ o = 1.If, instead, we consider applying the Pauli measurements specified by each ⃗ m independently and aggregating the data at a later stage, we have now 3 q dependent frequencies, since now ⃗ o f ⃗ m,⃗ o = 1.This is relevant when computing the covariance matrix Σ(ρ) that appears in Confidence Region C 1 : if we do not eliminate frequencies corresponding to the trivial outcomes from ⃗ f , Σ(ρ) will not have a well defined inverse.Moreover, as we saw in Appendix Appendix A, discarding these outcomes results in a tighter CR.When detailing how to compute Confidence Region A and Confidence Region B for local Pauli-basis measurements, we use the single-POVM formulation to follow the original references [16] and [15] closer.We switch to the individual measurement settings ⃗ m when discussing our bound Confidence Region C 2 (8).

Appendix B.2. Adaptations for the Confidence Regions
Generalized Bloch representation.It is convenient for our computations to write the density matrix in the generalized Bloch representation.For the tuple of 4 q matrices ⃗ µ = 2 − q 2 (1, σ x , σ y , σ z ) ⊗q , any Hermitian matrix can be written as X = ⃗ x • ⃗ µ with ⃗ x ∈ R 4 q .Due to the choice of the normalization, we have tr Free least-squares estimator.The free estimator in state tomography is the solution to the least-squares problem that seeks to minimize the distance between the frequencies and the probabilities, see also Eq. (A.4), Note that the optimization is performed over all Hermitian matrices with unit trace and hence the resulting estimate ρ, in general, will fail to be positive semidefinite.For local Pauli-basis measurements, where ⃗ f has 6 q entries, this problem has an analytical solution [16].In our definitions, the frequencies have the form f ⃗ m,⃗ o = n ⃗ m,⃗ o /(3 q n), and the free least-squares estimate is given by We use this explicit solution when evaluating Confidence Region A.
Therefore, there is one statistically dependent frequency per measurement setting ⃗ m, which we can compute from the remaining frequencies of the same setting.We thus omit the 3 q dependent frequencies to avoid having a singular covariance matrix, which reduces the dimension of the data ⃗ f to 3 q (2 q − 1) components.

Appendix C. Intersection of confidence regions via convex optimization
Given two regions of the types A, B, or C 2 , we can determine whether they have a nonempty intersection via convex optimization, see, for example, Ref. [26].In our numerical experiments we use the Splitting Conic Solver SCS [27] for convex-cone optimization with the Python programming language.

Figure 1 .
Figure 1.Intersecting CRs of different shapes corresponding to tomography experiments performed for two one-qubit states.CRs of type B are polytopes (left); CRs of type A are spheres, although they need not be for higher-dimensional systems (center); and CRs of type C 2 are generally ellipsoids (right).

Figure 2 .
Figure 2. Relative frequency of finding a state in the intersection between the CRs of a) (1 qubit) a pure state and the completely mixed state, b) (1 qubit) two pure states with fidelity 0.63, c) (2 qubits) a rotated Bell state and the completely mixed state, and d) (2 qubits) two rotated Bell states with fidelity 0.63.See also Appendix D for details.The tomography is here performed with Pauli-basis measurements.For the corresponding analysis using local SIC-POVM measurements, see Appendix E.

Figure 3 .
Figure 3. Relative frequency of finding a state in the intersection between the CRs of a) (3 qubits) a rotated Greenberger-Horne-Zeilinger (GHZ) state and the completely mixed state, b) (3 qubits) two rotated GHZ states with fidelity 0.63, c) (4 qubits) a rotated entangled state and the completely mixed state, and d) (4 qubits) two rotated entangled states with fidelity 0.63.See also Appendix D for details.