Fast state tomography with optimal error bounds

Projected least squares (PLS) is an intuitive and numerically cheap technique for quantum state tomography. The method first computes the least-squares estimator (or a linear inversion estimator) and then projects the initial estimate onto the space of states. The main result of this paper equips this point estimator with a rigorous, non-asymptotic confidence region expressed in terms of the trace distance. The analysis holds for a variety of measurements, including 2-designs and Pauli measurements. The sample complexity of the estimator is comparable to the strongest convergence guarantees available in the literature and -- in the case of measuring the uniform POVM -- saturates fundamental lower bounds. The results are derived by reinterpreting the least-squares estimator as a sum of random matrices and applying a matrix-valued concentration inequality. The theory is supported by numerical simulations for mutually unbiased bases, Pauli observables, and Pauli basis measurements.


I. INTRODUCTION
Quantum state tomography is the task of reconstructing a quantum state from experimental data. Many methods have been proposed for this problem. Maximum-likelihood estimation [1,2] is a popular universal approach that produces point estimators, but error bars are only available in asymptotic scenarios involving fully mixed states [55]. This shortcoming spurred the development of alternatives, such as Bayesian [3][4][5] and region estimators [6][7][8]. These methods have other drawbacks, such as comparatively high computational cost and weak (or implicit) convergence guarantees. In parallel, researchers proposed compressed sensing techniques [9][10][11][12] to estimate (approximately) low-rank states from fewer samples.
In this work, we revisit linear inversion, one of the oldest and simplest methods for tomography [13]. We prove that a variant, projected least squares (PLS), supports easy-to-interpret, non-asymptotic error guarantees. Our main result demonstrates that roughly r^2 d ε^{-2} log(d) independent samples suffice to reconstruct any rank-r state ρ of level d up to accuracy ε in trace distance. This sampling rate is competitive with the most powerful techniques in the literature [11,14], and it (almost) saturates fundamental lower bounds on the minimal number of independent samples [56] required for tomography [11,15]. As an added benefit, the PLS method is numerically "cheap" in the sense that its computational cost is dominated by forming the least-squares estimator. Numerical simulations indicate that PLS is much faster and performs well in comparison to prominent alternatives, including maximum-likelihood estimation [16] and compressed sensing [9].

* corresponding author: rkueng@caltech.edu

A. Background and estimator
For d-level systems, a tomographically complete measurement [57] is described by d × d hermitian matrices M_1, ..., M_m ∈ H_d that are positive semidefinite and obey ∑_{i=1}^m M_i = I. Measuring a quantum state ρ results in one of m outcomes, indexed by i ∈ [m]. The probability of observing outcome i depends on ρ and is described by Born's rule:

p_i = tr(M_i ρ),  i ∈ [m].   (1)

These probabilities can be estimated by frequencies: prepare n copies of the state, measure each of them separately, and set

f̂_i = n_i / n,   (2)

where n_i is the number of times outcome i was observed. These frequencies converge to the true probabilities as n → ∞. The least-squares estimator is the solution to the least-squares problem that results from replacing the true probabilities in Born's rule by the frequencies (2):

L̂n = argmin_{X ∈ H_d} ∑_{i=1}^m ( f̂_i − tr(M_i X) )².   (3)

This optimization inverts the m linear equations specified by Born's rule; see Eq. (7) below (linear inversion).
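As a concrete illustration of Eqs. (1)-(3), here is a minimal numerical sketch in Python (assuming NumPy; the qubit POVM and test state below are illustrative choices, not taken from the paper):

```python
import numpy as np

# Illustrative qubit example (d = 2): the six MUB projectors, rescaled by
# 1/3 so that they sum to the identity, form a tomographically complete POVM.
kets = [np.array(v, dtype=complex) for v in [
    [1, 0], [0, 1],                                                # sigma_z basis
    [1/np.sqrt(2), 1/np.sqrt(2)], [1/np.sqrt(2), -1/np.sqrt(2)],   # sigma_x basis
    [1/np.sqrt(2), 1j/np.sqrt(2)], [1/np.sqrt(2), -1j/np.sqrt(2)], # sigma_y basis
]]
povm = [np.outer(k, k.conj()) / 3 for k in kets]
assert np.allclose(sum(povm), np.eye(2))

rho = np.array([[0.8, 0.3], [0.3, 0.2]], dtype=complex)  # example state

# Born's rule (1): p_i = tr(M_i rho)
p = np.array([np.trace(M @ rho).real for M in povm])

# Least squares (3): vectorize the map X -> (tr(M_i X))_i and solve.
# For hermitian M, tr(M X) = <vec(conj(M)), vec(X)>.
A = np.array([M.conj().flatten() for M in povm])
L_vec, *_ = np.linalg.lstsq(A, p, rcond=None)
L = L_vec.reshape(2, 2)
```

With exact probabilities in place of frequencies, the least-squares solution recovers ρ exactly; in an experiment, `p` would be replaced by the empirical frequencies n_i/n from Eq. (2).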
In general, L̂n can have negative eigenvalues, so it may fail to be a quantum state. Several ways to overcome this drawback have been proposed; see e.g. [17][18][19][20][21].

Projected least squares (PLS) estimator ρ̂n:
• Estimate probabilities by frequencies (2),
• Compute the least squares estimator (3),
• Project it onto the set of quantum states (4).

To the best of our knowledge, PLS is new. Indeed, existing techniques are (apparently) more sophisticated. Nevertheless, the simplicity of our approach is a key advantage, both for the analysis and for the actual computation. The least-squares estimator (3) and the projection onto the set of quantum states (11) admit closed-form expressions. The total cost of the PLS estimator is dominated by forming the least-squares estimate L̂n. PLS requires considerably less storage and arithmetic than existing techniques that are based on more complicated optimization problems.

A. Error bounds and confidence regions for ρn
We analyze several important and practically relevant measurement systems: structured POVMs (e.g. SIC-POVMs, MUBs and stabilizer states), (global) Pauli observables, Pauli basis measurements, and the uniform/covariant POVM. For each of these settings, the PLS estimator ρ̂n provably converges to the true state ρ in trace distance ∥·∥₁.
Theorem 1 (Error bound for ρ̂n). Let ρ ∈ H_d be a state and fix a number of samples n ∈ N. Then, for each of the aforementioned measurements, the PLS estimator (Tab. I) obeys

Pr[ ∥ρ̂n − ρ∥₁ ≥ ε ] ≤ d e^{−c n ε² / (g(d) r²)}  for all ε ∈ [0, 1],

where c > 0 is an absolute constant, r = min{rank(ρ), rank(ρ̂n)}, and g(d) specifies the dependence on the ambient dimension:
g(d) = 2d for structured POVMs; see Eq. (8) for L̂n,
g(d) = d² for Pauli observables; see Eq. (9) for L̂n,
g(d) ≤ d^{1.6} for Pauli basis measurements; see Eq. (10) for L̂n.
The following immediate Corollary endows ρn with rigorous error-bars in trace distance.
We emphasize the following aspects of this result:
(i) (Almost) optimal sampling rate: Theorem 1 highlights that

n = O( g(d) r² log(d/δ) / ε² )

samples suffice to ensure ∥ρ̂n − ρ∥₁ ≤ ε with probability at least 1 − δ. For structured POVMs and Pauli observables, this sampling rate is comparable to the best theoretical bounds for alternative tomography algorithms [9,22]. Moreover, fundamental lower bounds in [15] and [11] indicate that this scaling is optimal up to a single log(d) factor, so it cannot be improved substantially.
(ii) Implicit exploitation of (approximate) low rank: the number of samples required to achieve a good estimator scales quadratically in the rank, rather than in the ambient dimension d. This behavior extends to the case where ρ, or ρ̂n, is well approximated by a rank-r matrix; see Theorem 4 in the appendix. These results are comparable with guarantees for compressed sensing methods [11] that are specifically designed to exploit low rank.

Proof sketch for Theorem 1. The least-squares estimator L̂n can be viewed as a sum of n independent random matrices. To illustrate this, consider a single structured POVM measurement. Then L̂1 defined in (8) is an instance of the random matrix

X = (d+1)(m/d) M_i − I   with probability p_i = tr(M_i ρ).

This generalizes to L̂n = (1/n) ∑_{i=1}^n X_i, where the matrices X_i are statistically independent. Such sums of random matrices concentrate sharply around their expectation value E L̂n = ρ, and matrix concentration inequalities [23] quantify this convergence. The operator-norm bound induces a Frobenius-norm bound, and the projection onto quantum states contracts the Frobenius norm. The claim then follows from relating the (projected) Frobenius-norm distance to the trace distance. None of these comparisons depend on the ambient dimension, but only on the (approximate) rank. We refer to the appendix for details.

B. Optimal performance guarantee for the uniform POVM
Theorem 1 involves a factor of the dimension d that may be extraneous. In turn, this factor introduces an additional log(d)-gap between Eq. (5) and existing lower bounds [15]. The dimensional factors emerge because we employ matrix-valued concentration inequalities in the proof. Our second main result shows that we can remove the dimensional factor for the uniform POVM, which encompasses all rank-one projectors:

Theorem 2 (Convergence of ρ̂n for the uniform POVM). For uniform POVM measurements, the PLS estimator obeys

Pr[ ∥ρ̂n − ρ∥₁ ≥ ε ] ≤ 2 e^{c₁ d − c₂ n ε² / r²}  for all ε ∈ [0, 1],

where c₁, c₂ > 0 are absolute constants and r = min{rank(ρ), rank(ρ̂n)}.
This result exactly reproduces the best existing performance guarantees for tomography from independent measurements [14].The bound follows from standard techniques from high-dimensional probability theory.
Proof sketch of Theorem 2. The operator norm has a variational formulation: ∥L̂n − ρ∥∞ = max_{z∈S_d} |⟨z|(L̂n − ρ)|z⟩|. The optimization over the unit sphere may be replaced by a maximization over a finite point set, called a covering net, whose cardinality scales exponentially in d. For any z ∈ S_d, ⟨z|(L̂n − ρ)|z⟩ is a sum of n i.i.d. random variables that exhibit subexponential tail decay. (Measuring the uniform POVM allows us to draw this conclusion.) Standard concentration inequalities yield a tail bound that decays exponentially in the number n of samples. Applying a union bound over all points z_i in the net then ensures

Pr[ ∥L̂n − ρ∥∞ ≥ τ ] ≤ 2 e^{c₁ d − c₂ n τ²}.

Subsequently, closeness in operator norm for L̂n may be converted into closeness in trace norm for ρ̂n at the cost of an additional (effective) rank factor.
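The net argument above can be summarized in two displayed steps; the constants below are standard illustrative choices, not the ones from the original proof:

```latex
% A 1/4-net N of the unit sphere (real dimension 2d) has |N| <= 9^{2d},
% and for hermitian A the quadratic form on the net controls the norm:
\|A\|_\infty \;\le\; 2 \max_{z \in \mathcal{N}} \bigl|\langle z|A|z\rangle\bigr| .
% A subexponential tail bound for each fixed z, combined with a union
% bound over the net, then gives
\Pr\Bigl[\|\hat L_n - \rho\|_\infty \ge \tau\Bigr]
  \;\le\; |\mathcal{N}| \cdot 2\, e^{-c\, n \tau^2}
  \;\le\; 2\, e^{c_1 d - c_2 n \tau^2}.
```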

III. ALGORITHMIC CONSIDERATIONS
A. Explicit solutions for the least squares estimator (3)

Tomographically complete measurements can be viewed as injective linear maps M : H_d → R^m with components [M(X)]_i = tr(M_i X) specified by Born's rule (1). It is well known that the least-squares problem (3) admits the closed-form solution

L̂n = (M* ∘ M)^{−1} ∘ M*( f̂n ),   (7)

where f̂n ∈ R^m is the vector of frequencies (2). We evaluate this formula for different measurements and content ourselves with sketching key steps and results (see appendix for details).

a. Structured POVMs and the uniform POVM: Also known as 2-designs, these systems include highly structured, rank-one POVMs, such as symmetric informationally complete POVMs [24], maximal sets of mutually unbiased bases [25], the set of all stabilizer states [26,27], as well as the uniform POVM. By definition, with M_i = (d/m)|v_i⟩⟨v_i|, all of the above systems obey

(M* ∘ M)(X) = (d/(m(d+1))) ( X + tr(X) I )   for all X ∈ H_d.

These equations can readily be inverted, and Eq. (7) simplifies to

L̂n = ∑_{i=1}^m f̂_i ( (d+1)(m/d) M_i − I ).   (8)

b. Pauli observables: Fix d = 2^k (k qubits), and let W_1, ..., W_{d²} ∈ H_d be the set of Pauli observables, comprising all possible k-fold tensor products of the elementary 2 × 2 Pauli matrices. We can approximate the expectation value tr(W_i ρ) of each Pauli observable by an empirical average μ̂_i. The Pauli matrices form a unitary operator basis, and the evaluation of Eq. (7) is simple:

L̂n = (1/d) ( I + ∑_{i=2}^{d²} μ̂_i W_i ).   (9)

c. Pauli basis measurements: Here, each of the k qubits is measured in one of the three single-qubit Pauli bases. The inversion involves D_{1/3}, a single-qubit depolarizing channel with loss parameter p = 1/3. Evaluating Eq. (7) yields the closed-form expression (10); see the appendix.

Finally, we point out that all of these explicit solutions are guaranteed to have unit trace: tr L̂n = 1. They are generally not positive semidefinite, however.

B. Explicit solutions for the projection step (4)

The PLS estimator is defined to be the state closest in Frobenius norm to the least-squares estimator L̂n. The search (4) admits a simple, analytic solution [18]. Let U diag(λ) U† be an eigenvalue decomposition of L̂n, define the all-ones vector 1 ∈ R^d, and set

ρ̂n = U diag( [λ − x₀ 1]₊ ) U†,   (11)

where [y]₊ denotes entrywise thresholding at zero and x₀ ∈ R is chosen so that tr(ρ̂n) = 1. The fact that L̂n itself has unit trace ensures that this solution to Eq. (4) is unique. The number x₀ may be determined by applying a root-finding algorithm to the non-increasing function

Q(x) = ∑_{i=1}^d [λ_i − x]₊ − 1.
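The projection step can be sketched in a few lines of Python (a minimal implementation of the thresholding rule above; instead of iterative root-finding it scans the breakpoints of the piecewise-linear function Q, which is equivalent for this problem):

```python
import numpy as np

def project_to_states(L):
    """Project a unit-trace hermitian matrix onto the set of quantum
    states (closest in Frobenius norm), via eigenvalue thresholding."""
    lam, U = np.linalg.eigh(L)
    # Find x0 with sum_i max(lam_i - x0, 0) = 1.  Q(x) is piecewise
    # linear and non-increasing, so it suffices to scan its breakpoints.
    lam_sorted = np.sort(lam)[::-1]          # non-increasing order
    csum = np.cumsum(lam_sorted)
    for k in range(len(lam), 0, -1):
        x0 = (csum[k - 1] - 1) / k
        if lam_sorted[k - 1] - x0 > 0:       # all k kept eigenvalues positive
            break
    new_lam = np.maximum(lam - x0, 0)
    return (U * new_lam) @ U.conj().T

# Example: an indefinite, unit-trace "linear inversion" output.
L = np.diag([0.9, 0.4, -0.3])
rho = project_to_states(L)
```

The dominant cost is the eigenvalue decomposition, in line with the runtime discussion below.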

C. Runtime analysis
The two steps discussed here are inherently scalable: just count frequencies to determine the LI estimators (8, 9, 10) at a total cost of (at most) min{m, n} matrix additions. The subsequent projection onto state space is a particular type of soft thresholding. The associated computational cost is dominated by the eigenvalue decomposition and has runtime (at most) O(d³).
In summary, forming Ln is the dominant cost of a naïve implementation.However, the high degree of structure may allow us to employ techniques from randomized linear algebra [28] to reduce the cost.

IV. NUMERICAL EXPERIMENTS
We numerically compare the performance of PLS to maximum likelihood (ML) and compressed sensing (CS), respectively. Additional numerical studies for structured POVMs can be found in the appendix.
Fig. 1 compares ML and PLS for Pauli basis measurements in dimension d = 2^4. The trace-norm error incurred by PLS is within a factor of two of ML for low-rank states. Additional simulations (not included here) indicate that this gap closes for full-rank states.
CS is a natural benchmark for low-rank tomography. The papers [9,10] apply to Pauli observables, and they show that a random choice of m ≥ Crd log^6(d) Pauli observables is sufficient to reconstruct any rank-r state. The actual reconstruction is performed by solving a convex optimization problem, namely the least-squares fit over the set of quantum states [29,30]. Numerical studies from [11] suggest that m = 256 is appropriate for d = 2^5 and r = 1. Fig. 2 shows that PLS consistently outperforms the CS estimator in this regime. Importantly, PLS was also much faster to evaluate than both ML and CS.

V. CONCLUSION AND OUTLOOK
Linear inversion is one of the oldest and simplest approaches to solve the practically important task of quantum state tomography. In this work, we introduced a variant called projected least squares (PLS) that projects the least-squares estimator onto the set of all quantum states. Not only is this estimator numerically cheap, but it comes with strong, non-asymptotic convergence guarantees. These results are derived using concentration inequalities for sums of random matrices, and they exploit the randomness inherent in quantum experiments.
We show that PLS is competitive, both in theory and in practice. For a variety of measurements, the results match the best existing theoretical results for the sampling rate of other tomography methods. In particular, for the uniform POVM, an order of r² d ε^{-2} samples suffices to reconstruct any rank-r state up to accuracy ε in trace distance. This result also saturates existing lower bounds [15] on the minimal sampling rate required for any tomographic procedure with independent measurements. Numerical studies underline these competitive features.
Outlook: Corollary 1 is not (yet) optimal. Bootstrapping could be used to obtain tighter confidence regions, and the low computational cost of PLS may speed up this process considerably. It also seems fruitful to combine the ideas presented here with recent insights from [31]. Finally, the proof of Theorem 1 indicates that PLS is stable with respect to time-dependent state generation (drift). We intend to address these points in future work.
Acknowledgements: The authors thank Philippe Faist, Anirudh Acharya and Theodore Kypraios for fruitful discussions and valuable feedback. RK and JT are supported by ONR Award No. N00014-17-12146. RK also acknowledges funding provided by the Institute of Quantum Information and Matter, an NSF Physics Frontiers Center (NSF Grant PHY-1733907).

Appendix
At the heart of this work is projected least squares (PLS) - a simple point estimator for quantum state tomography from tomographically complete measurements {M_1, ..., M_m} ⊂ H_d. PLS is a three-step procedure, see also Tab. I:

1. Estimate outcome probabilities by frequencies.
2. Construct the least squares (linear inversion) estimator:

L̂n = argmin_{X ∈ H_d} ∑_{i=1}^m ( f̂_i − tr(M_i X) )².   (12)

3. Project onto the set of all quantum states:

ρ̂n = argmin_{σ ≥ 0, tr(σ) = 1} ∥σ − L̂n∥₂.   (13)

We analyze the performance of PLS for a variety of concrete measurement scenarios: structured POVMs, Pauli observables, Pauli basis measurements and the uniform POVM. For each of them, ρ̂n may be equipped with rigorous non-asymptotic confidence regions in trace distance. In this appendix, we complement the rather succinct presentation in the main text with additional explanations, motivations and more detailed arguments.
Outline: In Section VI we provide explicit least squares solutions (12) for the different measurements.We also review essential features and properties of the individual scenarios to provide context.
Section VII contains the main conceptual insight of this work: least squares estimators may be interpreted as sums of independent random matrices - the randomness is due to the fundamental laws of quantum mechanics (Born's rule). This allows us to apply strong matrix-valued concentration inequalities to show that, with high probability, L̂n is close to the true target state in operator norm.
Section VIII is devoted to showing that closeness of Ln in operator norm implies closeness of ρn in trace norm.
We combine these two insights in Section IX to arrive at the main result of this work: convergence guarantees for the PLS estimator in trace norm. The result derived there is a strict generalization of Theorem 1 quoted in the main text. It extends to the notion of effective rank, which may be beneficial in concrete applications. We illustrate this potential benefit with a caricature of a faulty state preparation apparatus.
In Section X, yet stronger convergence guarantees for the uniform POVM are derived. The proof technique is completely different, and we believe that it may be of independent interest to the community.
Finally, we present additional numerical experiments in Section XI.

VI. CLOSED-FORM EXPRESSIONS FOR LEAST SQUARES ESTIMATORS
As outlined in the main text, any POVM measurement can be viewed as a linear map M : H_d → R^m. This map is injective if and only if the measurement is tomographically complete. Provided that this is the case, the least squares estimator (12) admits a unique solution:

L̂n = (M* ∘ M)^{−1} ∘ M*( f̂n ),

where f̂n ∈ R^m subsumes the individual frequency estimates. In this section, we evaluate this formula explicitly for different types of prominent measurements.

A. The uniform POVM and 2-designs
The uniform/covariant POVM in d dimensions corresponds to the union of all (properly re-normalized) rank-one projectors: {d|v⟩⟨v| dv}_{v∈S_d}. Here, dv denotes the unique, unitarily invariant measure on the complex unit sphere induced by the Haar measure (over the unitary group U(d)). Its high degree of symmetry allows for analyzing this POVM by means of powerful tools from representation theory. This is widely known, see e.g. [33,34], but we include a short presentation here to be self-contained. Define the frame operator of order k:

F^{(k)} = ∫_{S_d} (|v⟩⟨v|)^{⊗k} dv.

Unitary invariance of dv implies that this frame operator commutes with every k-fold tensor product of a unitary matrix U ∈ U(d):

U^{⊗k} F^{(k)} (U^{⊗k})† = F^{(k)}.

Here, we have used a change of variables (ṽ = Uv) together with the fact that dv is unitarily invariant (dṽ = dv). Schur's Lemma - one of the most fundamental tools in representation theory - states that any matrix that commutes with every element of a given group representation must be proportional to a sum of the projectors onto the associated irreducible representations (irreps).
For the task at hand, the representation of interest is the diagonal representation of the unitary group: U → U^{⊗k} for all U ∈ U(d). This representation affords, in general, many irreps that may be characterized using Schur-Weyl duality. The symmetric subspace Sym(k) ⊂ (C^d)^{⊗k} is one of them and corresponds to the subspace of all vectors that are invariant under permuting tensor factors. Crucially, F^{(k)} is an average over rank-one projectors onto vectors |v⟩^{⊗k} ∈ Sym(k) and, therefore, its range must be contained entirely within Sym(k). Combining this with the assertion of Schur's lemma then yields

F^{(k)} = ( (d+k−1 choose k) )^{−1} P_{Sym(k)}.   (14)

The pre-factor ((d+k−1 choose k))^{−1} = dim(Sym(k))^{−1} follows from the fact that F^{(k)} has unit trace. This closed-form expression is very useful. In particular, it implies that the uniform POVM {d|v⟩⟨v|}_{v∈S_d} is almost an isometry. Fix X ∈ H_d and compute

∫ d⟨v|X|v⟩ d|v⟩⟨v| dv = d² tr_1( F^{(2)} (X ⊗ I) ),

where tr_1(A ⊗ B) = tr(A) B denotes the partial trace over the first tensor factor. The projector onto the totally symmetric subspace of two parties has an explicit representation: P_{Sym(2)} = (1/2)(I + F), where F denotes the flip operator, i.e. F|x⟩ ⊗ |y⟩ = |y⟩ ⊗ |x⟩ for all |x⟩, |y⟩ ∈ C^d, extended linearly to the entire tensor product. Inserting this explicit characterization into Eq. (33) yields

(d/(d+1)) ( X + tr(X) I ).

We emphasize that the full symmetry of the uniform POVM is not required to derive this formula: Eq. (14) for k = 2 is sufficient. This motivates the following definition:

Definition 1 (2-design). A (finite) set of m rank-one projectors {|v_i⟩⟨v_i|}_{i=1}^m ⊂ H_d is a 2-design if (1/m) ∑_{i=1}^m (|v_i⟩⟨v_i|)^{⊗2} = F^{(2)}.

Taking the partial trace of this expression yields the same near-isometry property as for the uniform POVM.
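The flip-operator representation of P_Sym(2) and the resulting partial-trace identity can be checked numerically; the dimension and the test matrix below are arbitrary choices:

```python
import numpy as np

d = 3
# Flip (swap) operator F on C^d x C^d: F (x ⊗ y) = y ⊗ x.
F = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        F[i * d + j, j * d + i] = 1
P_sym = (np.eye(d * d) + F) / 2          # projector onto Sym(2)

rng = np.random.default_rng(0)
A = rng.standard_normal((d, d))
X = (A + A.T) / 2                         # hermitian test matrix (real for simplicity)

def ptrace_first(M, d):
    """Partial trace over the first tensor factor: tr_1(A ⊗ B) = tr(A) B."""
    return M.reshape(d, d, d, d).trace(axis1=0, axis2=2)

# Identity used in the text: tr_1(P_sym (X ⊗ I)) = (tr(X) I + X) / 2.
lhs = ptrace_first(P_sym @ np.kron(X, np.eye(d)), d)
rhs = (np.trace(X) * np.eye(d) + X) / 2
```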
Moreover, viewed as a map M : H_d → R^m, any 2-design obeys

(M* ∘ M)(X) = (d/(m(d+1))) ( X + tr(X) I )   for any X ∈ H_d,

which can be readily inverted. Inserting this formula into the closed-form expression of the linear-inversion estimator yields Eq. (19) for any frequency vector.

(i) Symmetric informationally complete POVMs (SICs): A set of unit vectors v_1, ..., v_m ∈ S_d is equiangular if |⟨v_i, v_j⟩|² is constant for all i ≠ j. The maximal cardinality of such a set is m = d², in which case the angle must be fixed: |⟨v_i, v_j⟩|² = 1/(d+1). Such maximal sets of equiangular lines are known to form 2-designs [24] and have been termed symmetric, informationally complete (SIC) POVMs. This nomenclature underlines the importance of Eq. (16) for the original quantum motivation of the study of equiangular lines. While several explicit constructions of SIC POVMs exist, the general question of their existence remains an intriguing open problem.
(ii) Mutually unbiased bases (MUBs): The study of mutually unbiased bases has a rich history in quantum mechanics that dates back to Schwinger [35]. It is known that at most (d + 1) pairwise mutually unbiased bases can exist in dimension d, and explicit algebraic constructions are known for prime power dimensions (d = p^k). Klappenecker and Roetteler [25] showed that maximal sets of MUBs are guaranteed to form 2-designs.
(iii) Stabilizer states (STABs): the stabilizer formalism is one of the cornerstones of quantum computation, fault tolerance and error correction, see e.g. [36]. Let P_k be the Pauli group on k qubits (d = 2^k), i.e. the group generated by k-fold tensor products of the elementary Pauli matrices. It is then possible to find maximal abelian subgroups S ⊂ P_k of size d = 2^k. Since all matrices W ∈ S commute, they can be simultaneously diagonalized and determine a single unit vector which is the joint eigenvector with eigenvalue +1 of all the matrices in S (provided that −I ∉ S). Such vectors are called stabilizer states (STAB), and the group S ⊂ P_k is the associated stabilizer group.
Many different stabilizer states can be generated this way. The union of all of them is actually known to form a 3-design [37][38][39] and, therefore, also a 2-design. The latter is also a consequence of earlier results [26,27].

B. Pauli observables
For d = 2^k, the Pauli matrices W_1, ..., W_{d²} ∈ H_d arise from all possible k-fold tensor products of the elementary Pauli matrices {I, σ_x, σ_y, σ_z} ⊂ H_2. They are well known to form a unitary operator basis:

X = (1/d) ∑_{i=1}^{d²} tr(W_i X) W_i   for all X ∈ H_d.   (20)

While they do constitute observables, Pauli matrices by themselves are not POVMs. However, every observable W_i may be associated with a two-outcome POVM {M_i^+, M_i^−} with M_i^± = (I ± W_i)/2. The union ⋃_{i=1}^m of all these 2-outcome POVMs constitutes a linear map M : H → R^{2m} that obeys

(M* ∘ M)(X) = (1/2) ( d² tr(X) I + d X ),

where the last line is due to Eq. (20). Once more, this expression can be readily inverted. Before we continue, we note that one Pauli matrix is equal to the identity, say W_1 = I, and the associated POVM is trivial. Hence, we suppose that n copies of ρ are distributed equally among all d² − 1 non-trivial 2-outcome POVMs M_i. We denote the resulting frequencies by [f̂]_i^± and suppress the dependence on the number of samples. Then, the explicit solution to the least squares problem becomes

L̂n = (1/d) ( I + ∑_{i=2}^{d²} ( [f̂]_i^+ − [f̂]_i^− ) W_i ).   (21)

We can simplify this expression further by noticing that each 2-outcome POVM is dichotomic: either + or − is observed for every run. This implies [f̂]_i^+ + [f̂]_i^− = 1, and [f̂]_1^+ = 1 because W_1 = I. This is the formula from the main text and has a compelling interpretation: the difference μ̂_i = [f̂]_i^+ − [f̂]_i^− is an empirical estimate for the expectation value μ_i = tr(W_i ρ) of the i-th Pauli observable. Finally, note that this estimator is again unbiased with respect to random fluctuations in the sample statistics: E L̂n = ρ.   (22)
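The unitary-operator-basis property (20) is easy to verify numerically; the following sketch builds all k-fold Pauli tensor products for k = 2 (the dimension and the random test matrix are illustrative choices):

```python
import numpy as np
from itertools import product

I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis_1q = [I2, sx, sy, sz]

k = 2
d = 2 ** k
# All k-fold tensor products: d^2 = 16 Pauli observables for k = 2.
W = []
for combo in product(paulis_1q, repeat=k):
    P = combo[0]
    for Q in combo[1:]:
        P = np.kron(P, Q)
    W.append(P)

rng = np.random.default_rng(1)
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
X = (A + A.conj().T) / 2                  # hermitian test matrix

# Unitary operator basis property (20): X = (1/d) sum_i tr(W_i X) W_i
recon = sum(np.trace(Wi @ X) * Wi for Wi in W) / d
```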

C. Pauli basis measurements
Before considering the general case, we find it instructive to consider the single-qubit case in more detail. For now, fix d = 2 and note that there are three non-trivial Pauli matrices σ_s with s ∈ {x, y, z}. We may associate each σ_s with a 2-outcome POVM that is also a basis measurement: {|b_+^s⟩⟨b_+^s|, |b_−^s⟩⟨b_−^s|} with |b_±^s⟩⟨b_±^s| = (I ± σ_s)/2. Overlaps between different bases obey |⟨b_o^s|b_{o'}^{s'}⟩|² = 1/2 for s ≠ s', because σ_s, σ_{s'} and σ_s σ_{s'} have vanishing trace. This implies that the six vectors |b_o^s⟩ form a maximal set of mutually unbiased bases. Evaluating the associated measurement map produces a single-qubit depolarizing channel: here, D_{1/3} : H_2 → H_2 denotes the channel with loss parameter p = 1/3,

D_{1/3}(X) = (1/3) X + (2/3) tr(X) I/2.

This behavior extends to multi-qubit systems, i.e. d = 2^k. Suppose that we perform k local (single-qubit) Pauli measurements on a k-qubit state ρ ∈ H_d. Then, there are a total of 3^k potential combinations that we label by a string s = (s_1, ..., s_k) ∈ {x, y, z}^k. Each of them corresponds to a POVM M^{(s)} with 2^k = d outcomes that we label by o = (o_1, ..., o_k) ∈ {±1}^k. The POVM element associated with index s and outcome o has an appealing tensor-product structure:

|b_o^{(s)}⟩⟨b_o^{(s)}| = ⊗_{j=1}^k (I + o_j σ_{s_j})/2.

Let M denote the union of all such basis measurements. Then, for tensor-product matrices X = ⊗_{i=1}^k X_i with X_i ∈ H_2, the measurement map factorizes into k copies of the single-qubit map, where we have used Eq. (23). Linear extension ensures that this formula remains valid for arbitrary matrices X ∈ H_d. Since the single-qubit depolarizing channel is invertible, the same is true for its k-fold tensor product, and we conclude the closed-form expression (25) advertised in the main text. Here, [f̂]_o^{(s)} is a frequency approximation to the outcome probability p_o^{(s)} = tr( |b_o^{(s)}⟩⟨b_o^{(s)}| ρ ). We conclude this section with a single-qubit observation that allows for characterizing this expression in a more explicit fashion. Note that one may rewrite D_{1/3}(X) as tr(X) I/2 + (1/6) ∑_s tr(σ_s X) σ_s. This facilitates the computation of the single-qubit inverse:

D_{1/3}^{−1}(X) = 3X − tr(X) I,

and, in particular, D_{1/3}^{−1}(|b⟩⟨b|) = 3|b⟩⟨b| − I. This in turn implies an explicit form of the least squares estimator which, again, is unbiased with respect to random fluctuations in the sample statistics. Finally, we also point out another
consequence that will be important later on, namely Eq. (26), where D_{3/5} is another single-qubit depolarizing channel.
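The single-qubit depolarizing channel and its inverse can be written out directly; this is a small sketch following the convention D_p(X) = pX + (1 − p) tr(X) I/2 used in the computation above:

```python
import numpy as np

def depolarize(X, p):
    """Single-qubit depolarizing channel with loss parameter p:
    D_p(X) = p*X + (1-p)*tr(X)*I/2."""
    return p * X + (1 - p) * np.trace(X) * np.eye(2) / 2

def depolarize_inv(X, p):
    """Inverse of D_p (well defined for p != 0)."""
    return (X - (1 - p) * np.trace(X) * np.eye(2) / 2) / p

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
X = (A + A.conj().T) / 2                  # hermitian test matrix

Y = depolarize(X, 1 / 3)
X_back = depolarize_inv(Y, 1 / 3)         # recovers X

# Composing depolarizing channels multiplies the loss parameters:
Z1 = depolarize(depolarize(X, 0.5), 0.4)
Z2 = depolarize(X, 0.2)
```

For p = 1/3 the inverse reduces to the formula from the text, D_{1/3}^{-1}(Y) = 3Y − tr(Y) I.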

VII. THE MATRIX BERNSTEIN INEQUALITY AND CONCENTRATION IN OPERATOR NORM
Scalar concentration inequalities provide sharp bounds on the probability that a sum of independent random variables deviates from its mean. Classical examples include Hoeffding's, Chernoff's and Bernstein's inequalities - all of which have found widespread use in a variety of scientific disciplines. The main results of this work are based on matrix generalizations of these classical statements - in particular, the matrix Bernstein inequality developed by one of the authors; see [23, Theorem 1.4].
Theorem 3 (Matrix Bernstein inequality). Consider a sequence of n independent, hermitian random matrices A_1, ..., A_n ∈ H_d. Assume that each A_i satisfies E A_i = 0 and ∥A_i∥∞ ≤ R. Then, for any t > 0,

Pr[ ∥∑_{i=1}^n A_i∥∞ ≥ t ] ≤ d e^{ −(t²/2) / (σ² + Rt/3) },   where σ² = ∥∑_{i=1}^n E A_i²∥∞.

First results of this kind originate in Banach space theory [40][41][42][43] and were later independently developed in quantum information theory [44,45]. Further advances by Oliveira [46] and one of the authors [23] led to the result that we employ here. We refer to the monograph [23] for a detailed exposition of related work and history.
Similar to the scalar Bernstein inequality, the tail behavior in Theorem 3 consists of two regimes. Small deviations are suppressed in a subgaussian fashion, while larger deviations follow a subexponential decay. The ratio σ²/R marks the transition from one regime into the other. We also note in passing that this result recovers the scalar Bernstein inequality for d = 1 (H_1 ≅ R).
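The two regimes can be explored by evaluating the bound directly; the parameter values below are arbitrary illustrative choices:

```python
import math

def matrix_bernstein_bound(t, sigma2, R, d):
    """Tail bound d * exp(-(t^2/2) / (sigma2 + R*t/3)) from Theorem 3."""
    return d * math.exp(-(t ** 2 / 2) / (sigma2 + R * t / 3))

# The transition occurs around t = sigma2 / R: below it the bound is
# essentially subgaussian, above it subexponential.
sigma2, R, d = 2.0, 1.0, 16
b_small = matrix_bernstein_bound(0.5, sigma2, R, d)   # subgaussian regime
b_large = matrix_bernstein_bound(10.0, sigma2, R, d)  # subexponential regime
```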

A. Concentration for structured POVM measurements
For structured measurements (2-designs) we may rewrite the (plain) least squares estimator (19) as

L̂n = (1/n) ∑_{i=1}^n X_i,

where each X_i is an i.i.d. copy of the random matrix

X = (d+1)(m/d) M_k − I   with probability p_k = tr(M_k ρ),

where k ∈ [m] is arbitrary. Next, note that the random matrix X obeys E X = ρ and also ∥X∥∞ ≤ d, according to Eq. (16). This allows us to bound the variance parameter:

∥E X²∥∞ = ∥(d−1)(ρ + I) + I∥∞ ≤ 2d.

The ratio σ²/R ≈ 2 indicates that any choice of τ ∈ [0, 2] will fall into the subgaussian regime of the matrix Bernstein inequality, and Theorem 3 yields

Pr[ ∥L̂n − ρ∥∞ ≥ τ ] ≤ d e^{−c n τ²/d}   for τ ∈ [0, 2],

where c > 0 is an absolute constant.
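Unbiasedness, E X = ρ, can be verified numerically for the smallest structured POVM example, a maximal set of qubit MUBs; the test state below is an arbitrary choice:

```python
import numpy as np

# Qubit MUB POVM: six projectors scaled by d/m = 2/6 = 1/3 (a 2-design).
kets = [np.array(v, dtype=complex) for v in [
    [1, 0], [0, 1],
    [1 / np.sqrt(2), 1 / np.sqrt(2)], [1 / np.sqrt(2), -1 / np.sqrt(2)],
    [1 / np.sqrt(2), 1j / np.sqrt(2)], [1 / np.sqrt(2), -1j / np.sqrt(2)],
]]
d = 2
proj = [np.outer(k, k.conj()) for k in kets]
povm = [P / 3 for P in proj]

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
p = np.array([np.trace(M @ rho).real for M in povm])

# X takes the value (d+1)|v_i><v_i| - I with probability p_i; its mean is rho.
EX = sum(p_i * ((d + 1) * P - np.eye(d)) for p_i, P in zip(p, proj))
```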

B. Concentration for (global) Pauli observables
We assume that the total number of samples n is distributed equally among the d² different Pauli measurements. Similar to before, unbiasedness (22) and the explicit characterization of the LI estimator (21) allow us to write L̂n − ρ as a sum of independent, centered random matrices. Here, each X_i^{(k)} is an independent instance of the random matrix X^{(k)} = ±d W_k with probability (1 ± tr(W_k ρ))/2 each. These random matrices are independent, but in general not identically distributed. However, independence alone suffices for applying Theorem 3. We note in passing that this would not be the case for earlier (weaker) versions of the matrix Bernstein inequality. Bound ∥X^{(k)}∥∞ ≤ d and use the fact that (X^{(k)})² = d² W_k² = d² I (in the positive semidefinite order) to considerably simplify the variance computation.

C. Concentration for Pauli-basis measurements
Once more, we assume that the total budget of n samples is distributed equally among all 3^k Pauli basis choices. Unbiasedness of the LI estimator together with the explicit description (25) allows us to once more interpret L̂n − ρ as a sum of independent, centered random matrices. For each s ∈ {x, y, z}^k, X_i^{(s)} is an independent copy of the random matrix

X^{(s)} = ⊗_{j=1}^k ( 3 |b_{o_j}^{s_j}⟩⟨b_{o_j}^{s_j}| − I )   with probability p_o^{(s)}.

For the variance, we once more use (28) and compute E (X^{(s)})², where we have used Eq. (26) and the fact that the combination of two depolarizing channels is again a depolarizing channel. This expression can be evaluated explicitly. For α ⊂ [k], let tr_α(ρ) denote the partial trace over all indices contained in α; for tensor-product matrices X = ⊗_{j=1}^k X_j the resulting expansion consists of terms proportional to tr_α(X) ⊗ I^{⊗α}, and it extends linearly to all of H_d. This estimate is actually tight for pure product states of the form ρ = (|ψ⟩⟨ψ|)^{⊗k}. We may now apply Theorem 3 to conclude the advertised tail bound with g(d) ≤ d^{1.6}.

VIII. CONVERSION OF CONFIDENCE REGIONS FROM OPERATOR NORM TO TRACE NORM
The final ingredient for the framework presented in this manuscript is a reliable way to transform operator-norm closeness of the (plain) least squares estimator L̂n into a statement about closeness of the PLS estimator ρ̂n in trace distance. Recall that the optimization problem (13) admits an analytic solution [18]. Let U diag(λ) U† be an eigenvalue decomposition of L̂n. Then,

ρ̂n = U diag( [λ − x₀ 1]₊ ) U†,   (29)

where 1 ∈ R^d is the all-ones vector, x₀ is chosen such that tr(ρ̂n) = 1, and [y]₊,i = max{[y]_i, 0} denotes entrywise thresholding at zero. This solution is unique, provided that tr L̂n = 1, which is the case for all the least squares estimators we consider.
The conversion from closeness in operator norm to closeness in trace norm will introduce a factor that is proportional to the effective rank of the density matrix ρ, rather than a full dimensional factor. For r ∈ N, we define the residual of the best rank-r approximation of a quantum state ρ ∈ H_d as

σ_r(ρ) = min { ∥ρ − Z∥₁ : Z ∈ H_d, rank(Z) ≤ r }.   (30)

This problem can be solved analytically. Let ρ = ∑_{i=1}^d λ_i |x_i⟩⟨x_i| be an eigenvalue decomposition with eigenvalues arranged in non-increasing order. Then,

σ_r(ρ) = ∑_{i=r+1}^d λ_i,

highlighting that the best rank-r approximation is simply a truncation onto the r largest contributions in the eigenvalue decomposition. This truncated description is accurate if the residual error σ_r(ρ) is small. If this is the case, it is reasonable to say that ρ is well approximated by a rank-r matrix and has effective rank r.

Proposition 1. Suppose that L̂n ∈ H_d obeys tr L̂n = 1 and ∥L̂n − ρ∥∞ ≤ τ for some quantum state ρ ∈ H_d and τ ≥ 0. Then, for any r ∈ N, the PLS estimator ρ̂n obeys

∥ρ̂n − ρ∥₁ ≤ 4rτ + 2 min{σ_r(ρ), σ_r(ρ̂n)},

where σ_r(ρ) is defined in Eq. (30).
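The residual σ_r is a simple eigenvalue computation; the sketch below evaluates it for the faulty-preparation caricature discussed in Section IX (the dimension and failure probability are arbitrary choices):

```python
import numpy as np

def rank_r_residual(rho, r):
    """sigma_r(rho): trace-norm error of the best rank-r approximation,
    i.e. the sum of all but the r largest eigenvalues."""
    lam = np.sort(np.linalg.eigvalsh(rho))[::-1]   # non-increasing order
    return lam[r:].sum()

# Caricature state: perfect pure state with probability 1 - p,
# maximally mixed state with probability p.
d, p = 8, 0.1
psi = np.zeros(d); psi[0] = 1.0
rho = (1 - p) * np.outer(psi, psi) + p * np.eye(d) / d

sigma1 = rank_r_residual(rho, 1)   # equals p * (d - 1) / d for this state
```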
The statement readily follows from combining two auxiliary results. The first one states that the threshold value x₀ in the analytic solution (29) must be small if L̂n is operator-norm close to a quantum state.

Proof. By assumption, L̂n has unit trace. If it is in addition positive semidefinite, then ρ̂n = L̂n, because L̂n is already a quantum state and the projection is trivial (x₀ = 0). Otherwise, L̂n is indefinite, and unit trace ensures that the positive part dominates. Hence, x₀ must be strictly positive to enforce tr(ρ̂n) = 1.
The second technical lemma generalizes a result that is somewhat folklore in quantum information theory: the "effective rank" of a difference of two quantum states is proportional to the minimal rank of the two density operators involved.

Lemma 2. Fix r ∈ N and let ρ, σ ∈ H_d be quantum states. Then,

∥ρ − σ∥₁ ≤ 2r ∥ρ − σ∥∞ + 2 min{σ_r(ρ), σ_r(σ)},

where the residual error σ_r(·) was defined in Eq. (30).
The main result of this section is a rather straightforward combination of these two technical statements.
Next, note that according to (29), ρ̂n may be viewed as the positive part of the matrix L̂n − x₀ I. Such a restriction to the positive part can never increase the operator-norm distance to another positive semidefinite matrix. Hence,

∥ρ̂n − ρ∥∞ ≤ ∥(L̂n − x₀ I) − ρ∥∞ ≤ ∥L̂n − ρ∥∞ + x₀ ≤ 2τ,

where the last inequality follows from Lemma 1.

IX. PROOF OF THE MAIN RESULT
By now we have everything in place to provide a complete proof of the main result of this work.

Theorem 4. Let ρ ∈ H_d be a state. Suppose that we either perform n structured POVM measurements (set g(d) = 2d), n Pauli observable measurements (set g(d) = d^2), or n Pauli basis measurements (set g(d) = d^{1.6}). Then, for any r ∈ N and ε ∈ [0, 1], a total of

n ≥ 43 g(d) (r^2/ε^2) log(d/δ)

measurements ensures that the PLS estimator ρ̂_n (13) obeys ‖ρ̂_n − ρ‖_1 ≤ ε + 2 min {σ_r(ρ), σ_r(ρ̂_n)} with probability at least 1 − δ, where σ_r(ρ), σ_r(ρ̂_n) denote the residual errors of approximating ρ and ρ̂_n by a rank-r matrix (30).
However, unlike this specification, Theorem 4 features an additional degree of freedom. The parameter r ∈ N allows for interpolating between small values (small sampling rate, but a potentially large reconstruction error) and large values (high sampling rate, but low reconstruction error). This tradeoff is particularly benign for quantum states that are approximately low-rank. Due to experimental imperfections, such states arise naturally in many experiments that aim at generating a pure quantum state. We illustrate this by means of the following caricature of a faulty state preparation protocol. Suppose that an apparatus either produces a target state |ψ⟩⟨ψ| perfectly, or fails completely, in the sense that it outputs a maximally mixed state. Then, the resulting state is ρ = (1 − p)|ψ⟩⟨ψ| + (p/d) I, where p ∈ [0, 1] denotes the probability of failure. Since σ_1(ρ) = p(1 − 1/d) ≤ p, choosing r = 1 in Theorem 4 shows that n ≥ 43 g(d) ε^{-2} log(d/δ) samples suffice to ensure that, with high probability, the PLS estimator obeys ‖ρ̂_n − ρ‖_1 ≤ ε + 2p. For sufficiently high success probabilities and moderate accuracies (ε ≥ 2p/d), this clearly outperforms the original statement.
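The effective-rank claim in this caricature can be checked directly; a small numpy sketch, with illustrative values of our own choosing:

```python
import numpy as np

# Caricature of faulty state preparation: with probability 1-p the apparatus
# produces the target |psi><psi|, otherwise the maximally mixed state I/d.
d, p = 16, 0.05
psi = np.zeros(d)
psi[0] = 1.0                               # any pure target state works
rho = (1 - p) * np.outer(psi, psi) + p * np.eye(d) / d

# Residual of the best rank-1 approximation: all but the largest eigenvalue.
vals = np.sort(np.linalg.eigvalsh(rho))[::-1]
sigma_1 = vals[1:].sum()                   # equals p * (1 - 1/d) <= p

assert np.isclose(sigma_1, p * (1 - 1 / d))
```

Despite ρ having full rank, its rank-1 residual is at most p, so the r = 1 instance of Theorem 4 only pays the additive 2p penalty.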
Proof of Theorem 4. We illustrate the proof for structured POVMs; the other settings are completely analogous. Fix ε ∈ [0, 1], r ∈ N, and set τ = ε/(4r). Then, the main result of Sec. VII A, Equation (27), ensures that the least-squares estimator L̂_n obeys ‖L̂_n − ρ‖_∞ ≤ τ with probability of failure bounded by d e^{−3nτ^2/(8g(d))}, which is at most δ for the advertised number of samples. Conditioned on this event, Proposition 1 readily yields ‖ρ̂_n − ρ‖_1 ≤ 4rτ + 2 min {σ_r(ρ), σ_r(ρ̂_n)} = ε + 2 min {σ_r(ρ), σ_r(ρ̂_n)}.

X. IMPROVED CONVERGENCE GUARANTEES FOR THE UNIFORM POVM
All the convergence results derived so far feature an additional log(d) factor. This is a consequence of the matrix Bernstein inequality that proved instrumental in deriving them. One can show that such an additional factor necessarily features in any matrix concentration inequality that is based exclusively on the first and second moments of the random matrices in question [47].
However, the following question remains: is this log(d)-factor in Theorem 1 an artifact of the proof technique, or is it an intrinsic feature of tomography via projected least squares?
In this section we rule out the second possibility: a different proof technique allows us to avoid this log(d) factor, provided that the POVM is sufficiently symmetric and well behaved. More precisely, we revisit the uniform POVM {d|v⟩⟨v| dv}_{v∈S_d} and exploit the fact that Eq. (14) completely characterizes all moments of the resulting outcome distribution. This opens the door to very strong proof techniques from high-dimensional probability theory that have found widespread application in a variety of subjects, including compressed sensing [48] and, more recently, quantum information theory [49]. We believe that this technique may be of independent interest and therefore find it worthwhile to present it in a self-contained fashion. Roughly speaking, it is based on the following steps:

(o) Reformulation: the operator norm of a Hermitian matrix A ∈ H_d admits a variational definition, ‖A‖_∞ = max_{y∈S_d} |⟨y|A|y⟩|.

(i) Discretization: replace the maximization over the entire complex unit sphere S_d by a maximization over a finite point set N that covers S_d to sufficiently high accuracy (a covering net).
(ii) Concentration: fix y ∈ N and show that the scalar random variable s y = y|A|y concentrates sharply around its expectation value.
(iii) Union bound: apply a union bound over all |N | random variables s y to obtain an upper bound on the operator norm A ∞ .
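The steps above can be illustrated numerically. The sketch below replaces the covering net by a crude random sample of unit vectors, so it only certifies the lower-bound direction of the variational formula in step (o); all names and parameters are our own:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
# Random Hermitian matrix, a stand-in for the deviation L_n - rho
G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
A = (G + G.conj().T) / 2

# Step (o): ||A||_inf = max over unit vectors y of |<y|A|y>|.
# Step (i): a random sample of unit vectors stands in for a covering net here.
Y = rng.standard_normal((5000, d)) + 1j * rng.standard_normal((5000, d))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)
# Steps (ii)-(iii): evaluate the scalar variables s_y = <y|A|y> and take a max.
net_max = np.abs(np.einsum('nd,de,ne->n', Y.conj(), A, Y)).max()

exact = np.abs(np.linalg.eigvalsh(A)).max()
# Every unit vector lower-bounds the operator norm; a genuine covering net
# with fineness theta would also give an upper bound up to a constant factor.
assert net_max <= exact + 1e-9
```

In the actual proof, the random sample is replaced by a deterministic covering net, which is what makes the maximum over the net an upper bound on ‖A‖_∞ up to a constant depending on the fineness θ.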
Ideally, the tail bound from step (ii) is sharp enough to counterbalance the |N| prefactor that results from the union bound in step (iii). Should this not be the case, more sophisticated methods, like generic chaining [50], may still allow for drawing non-trivial conclusions. Fortunately, for the task at hand this turns out not to be necessary, and the rather naive strategy sketched above suffices to achieve a result that is (provably) optimal up to constant factors:

Theorem 5. Suppose that we perform n independent uniform POVM measurements on a quantum state ρ ∈ H_d. Then, for any τ ∈ [0, 1], the associated least-squares estimator L̂_n obeys

Pr[‖L̂_n − ρ‖_∞ ≥ τ] ≤ e^{c_1 d − c_2 n τ^2}.

In particular, n ≥ C d τ^{-2} log(1/δ) suffices to ensure ‖L̂_n − ρ‖_∞ ≤ τ with probability at least 1 − δ. Here c_1, c_2, C > 0 denote constants of sufficient size.
No effort has been made to optimize the constants. The proof presented here yields c_1 = 2 log(3) and c_2 = 1/480, which could be further improved by a more careful analysis. Importantly, the second part of this statement can be combined with Proposition 1 to readily deduce the last technical result of the main text:

Corollary 2 (Re-statement of Theorem 5). For any rank-r state ρ, a number of n ≥ C r^2 d ε^{-2} log(1/δ) uniform POVM measurements suffices to ensure ‖ρ̂_n − ρ‖_1 ≤ ε with probability at least 1 − δ.
Not only does this statement reproduce the best known sampling rates for tomography with independent measurements [14], it also exactly matches lower bounds on the minimal sample complexity associated with any tomographic procedure that may apply in this setting [15,Table I].
The remainder of this section is dedicated to proving Theorem 5. For the sake of accessibility, we divide this proof into three subsections that contain the steps summarized above. Here, |N| denotes the cardinality of a covering net for the complex unit sphere S_d with fineness θ = 1/4. The complex unit sphere admits an isometric embedding into the real-valued unit sphere in 2d dimensions: map the real and imaginary parts of each complex vector component onto two distinct real parameters. This map preserves Euclidean lengths and, by extension, also the geometry of the unit sphere. Volumetric upper bounds on the cardinality of covering nets for the 2d-dimensional real-valued unit sphere are widely known. This concludes the proof of Theorem 5.
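The isometric embedding used in the net-counting argument is elementary to verify; a short sketch:

```python
import numpy as np

def embed(z):
    """Isometric embedding C^d -> R^(2d): stack real and imaginary parts."""
    return np.concatenate([z.real, z.imag])

rng = np.random.default_rng(1)
z = rng.standard_normal(6) + 1j * rng.standard_normal(6)
w = rng.standard_normal(6) + 1j * rng.standard_normal(6)

# Euclidean lengths and distances are preserved exactly, so covering nets
# for the real 2d-sphere translate into covering nets for S_d.
assert np.isclose(np.linalg.norm(embed(z)), np.linalg.norm(z))
assert np.isclose(np.linalg.norm(embed(z) - embed(w)), np.linalg.norm(z - w))
```

Note that only lengths (and hence distances) are preserved; complex inner products map to their real parts, which is all the covering argument needs.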

D. Proof of Lemma 4
Recall that, by assumption, the random matrix X assumes the value X = (d + 1)|v⟩⟨v| − I, where the last equation is due to Eq. (14). Next, we note that Hölder's inequality implies tr P_Sym

XI. ADDITIONAL NUMERICAL EXPERIMENTS
Maximal sets of mutually unbiased bases (MUBs) form a structured POVM (2-design) that lends itself to numerical investigation. Efficient algebraic constructions of MUBs exist in prime-power dimensions d = p^k [52][53][54]. To further underline the implicit advantage of low rank, we fix a prime dimension d and choose a pure state uniformly from the Haar measure on the complex unit sphere in d dimensions. We compute the outcome probabilities for each of the d + 1 different MUB measurements. We then sample outcomes from each distribution a total of n/(d + 1) times and compute the estimator ρ̂_n associated with the total frequency statistics.
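For prime dimensions, the standard Wootters-Fields construction yields the d + 1 MUBs explicitly. The following sketch builds them and verifies the defining property that any two vectors from different bases overlap with |⟨u, v⟩|^2 = 1/d:

```python
import numpy as np

p = 7                                   # an odd prime dimension
omega = np.exp(2j * np.pi / p)
j = np.arange(p)

# Wootters-Fields construction: the standard basis plus p "quadratic" bases
bases = [np.eye(p)]
for a in range(p):
    B = np.array([omega ** (a * j * j + b * j) for b in range(p)]).T / np.sqrt(p)
    bases.append(B)                     # columns are the basis vectors

# Mutual unbiasedness: vectors from different bases overlap with |<u,v>|^2 = 1/p
for k in range(len(bases)):
    for l in range(k + 1, len(bases)):
        ov = np.abs(bases[k].conj().T @ bases[l]) ** 2
        assert np.allclose(ov, 1 / p)
```

The overlap condition is exactly what makes the union of these d + 1 bases a projective 2-design, i.e. a structured POVM in the sense of the main text.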

Pauli basis measurements. Rather than approximating (global) expectation values, it is possible to perform different combinations of local Pauli measurements. For d = 2^k, there are 3^k potential combinations in total. Each setting s ∈ {x, y, z}^k corresponds to a basis measurement {|b_o^(s)⟩⟨b_o^(s)|}, where o ∈ {±1}^k labels the 2^k potential outcomes. The union M of all 3^k bases obeys

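A minimal sketch of computing the outcome probabilities ⟨b_o^(s)|ρ|b_o^(s)⟩ for each of the 3^k local Pauli settings (helper names are ours):

```python
import numpy as np
from itertools import product

# Single-qubit eigenbases of the Pauli X, Y, Z observables (columns = |b_+>, |b_->)
bases_1q = {
    'x': np.array([[1, 1], [1, -1]]) / np.sqrt(2),
    'y': np.array([[1, 1], [1j, -1j]]) / np.sqrt(2),
    'z': np.eye(2),
}

def pauli_basis_probs(rho, setting):
    """Outcome probabilities <b|rho|b> for one local setting, e.g. ('x','z')."""
    B = np.array([[1.0]])
    for s in setting:
        B = np.kron(B, bases_1q[s])          # tensor-product basis, columns = |b_o>
    return np.real(np.einsum('io,ij,jo->o', B.conj(), rho, B))

k = 2
rho = np.eye(2 ** k) / 2 ** k                # maximally mixed test state
settings = list(product('xyz', repeat=k))    # all 3^k measurement settings
assert len(settings) == 3 ** k
for s in settings:
    probs = pauli_basis_probs(rho, s)
    assert np.isclose(probs.sum(), 1.0) and (probs >= -1e-12).all()
```

Each setting yields a valid probability distribution over the 2^k outcomes; sampling from these distributions reproduces the measurement data used in the simulations below.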
FIG. 1: PLS (blue) vs. ML (red) for 4-qubit Pauli basis measurements: boxplots of the trace-distance error for ML vs. PLS over 100 datasets generated with random states of rank 1, 5, 10, and 16, and 200 repetitions per setting. Inset: trace-distance error as a function of sample size for a pure target state.

FIG. 2: PLS (blue) vs. CS (red) for m 5-qubit Pauli observables and a pure target state: trace-distance error for CS (m = 256) and PLS (m = 1024) as a function of (total) sample size.

FIG. 3: Log trace-distance error vs. log sample size for different prime dimensions d. Inset: ordinary plot of the same data.
Figure 3 shows the relation between the reconstruction error (in trace distance) and the number of samples per basis on a log-log scale for different prime dimensions between d = 100 and d = 200. The figure suggests that the rate of convergence depends only linearly on the ambient dimension; the additional log(d) factor in the main result for structured POVMs is barely visible.

TABLE I: Summary of the estimation technique.
Mathematically, this is a consequence of the fact that 2-design POVMs "almost" form a tight frame on H_d. The close connection to well-behaved, tomographically complete, rank-one POVMs has spurred considerable interest in the identification of 2-designs. Over the past decades, the following concrete examples have been identified: (i) equiangular lines (SIC POVMs): a family of m unit vectors |v_1⟩, . . ., |v_m⟩ ∈ S_d is equiangular if |⟨v_i, v_j⟩|^2 is constant for all i ≠ j.

Here, p ∈ [0, 1] denotes the probability of failure. This state clearly has full rank, and Corollary 1 requires at least n ≥ 43 g(d) d^2 ε^{-2} log(d/δ) samples to estimate it up to trace-norm accuracy ε with high probability. In contrast, Theorem 4 ensures that already n ≥ 43 g(d) ε^{-2} log(d/δ) samples suffice to achieve accuracy ε + 2p.