Near-optimal quantum tomography: estimators and bounds

We give bounds on the average fidelity achievable by any quantum state estimator, which is arguably the most prominently used figure of merit in quantum state tomography. Moreover, these bounds can be computed online---that is, while the experiment is running. We show numerically that these bounds are quite tight for relevant distributions of density matrices. We also show that the Bayesian mean estimator is ideal in the sense of performing close to the bound without requiring optimization. Our results hold for all finite dimensional quantum systems.

Inferring a quantum mechanical description of a physical system is equivalent to assigning it a quantum state, a process referred to as tomography. Tomography is now a routine task for designing, testing and tuning qubits in the quest to build quantum information processing devices [1]. In determining how well one performs this task, a figure of merit must be reported. By far the most commonly used figure of merit for quantum states is fidelity [2,3]. Nowadays, fidelity is used to compare quantum states and processes in a wide variety of tasks, from quantum chaos to quantum control to the continuous monitoring of quantum systems [4][5][6][7][8][9][10]. It is surprising, then, that the technique which optimizes performance with respect to fidelity is not known.
In Theorem 3, we provide an absolute benchmark for the average fidelity performance of any tomographic estimation strategy. This is important because, in the field of quantum tomography, a common theme is to compare estimators. To date, many options are available: linear inversion [1], maximum likelihood [11], Bayesian mean [12], hedged maximum likelihood [13], and compressed sensing [14], to name a few. Often estimators are compared by simulating measurements on ensembles of states drawn according to some measure and averaging the fidelity. This can only provide conclusions about the relative performance of estimators. Our bounds, by contrast, can be used to benchmark the fidelity performance of any candidate estimator in absolute terms.
We complement our theoretical findings with numerical experiments. These demonstrate the relative tightness of our bounds and, in particular, reveal that the Bayesian mean estimator is an excellent choice, owing to its near-optimal performance and ease of implementation. Importantly, both the mean of the distribution and our bounds can be computed online, that is, the estimator and its performance can be computed while data is being taken. In the context of Bayesian quantum information theory [12], our findings lend credence to the standard approach of using the mean of the posterior distribution as an estimator, showing that it is near-optimal.
We note that this problem has been solved for the case of a single qubit (d = 2). Bagan et al. [6] have given the optimal estimator (and measurement!) for any isotropic prior measure. Unfortunately, by making heavy use of the Bloch representation of a qubit, their methods do not generalize. In contrast, our bound holds for all distributions of states in any dimension and coincides with the results of [6] for the case of a single qubit.
The fidelity between two states ρ, σ ∈ S is defined to be [2,3]

F(ρ, σ) = [Tr √(√ρ σ √ρ)]².    (2)

Define the average fidelity with respect to some measure dρ as E_ρ[F(ρ, σ)] [31]. We want this average to be as large as possible. Thus, the problem can be succinctly stated as follows:

maximize E_ρ[F(ρ, σ)] over σ ≥ 0, subject to Tr(σ) = 1.    (3)

In the context of tomography, we think of ρ as the "true state" and σ as the estimated state. An estimator is a function from the space of data to quantum states, σ : data → σ(data) ∈ S, where data are the results of a sequence of quantum measurements. Since both the true state and the data are unknown, we take the expected value with respect to the joint distribution of (ρ, data) to obtain the average fidelity

E_{(ρ, data)}[F(ρ, σ(data))].    (4)

We want this to be as large as possible. The estimator which maximizes this quantity is equivalent to the estimator maximizing the following posterior average fidelity for every data set:

E_{ρ|data}[F(ρ, σ)].    (5)

An estimator which maximizes this is called a Bayes estimator [32]. Bayes estimators are useful both to understand Bayesian optimality and to provide upper bounds for the worst case performance. Now here is the subtle and important point. The measurements performed, the data themselves and the distribution from which they were generated are not important once the posterior distribution has been calculated. If we know the solution for every measure dρ, then we know the solution for the posterior measure dρ|data. For brevity, then, we will drop this conditional information from now on and the problem reduces again to (3).
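The fidelity (2) is straightforward to evaluate numerically via an eigendecomposition. The sketch below is illustrative only: NumPy is assumed, and the helper names `psd_sqrt` and `fidelity` are our own, not from the text.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive-semidefinite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)          # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def fidelity(rho, sigma):
    """Uhlmann fidelity F(rho, sigma) = (Tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = psd_sqrt(rho)
    return np.real(np.trace(psd_sqrt(s @ sigma @ s))) ** 2

# sanity checks on a qubit: F equals 1 for identical states,
# and 1/2 between a pure state and the maximally mixed state
pure = np.array([[1, 0], [0, 0]], dtype=complex)
mixed = np.eye(2, dtype=complex) / 2
```

Working in the eigenbasis avoids the numerical issues that general matrix square roots can have on rank-deficient states.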
Ensembles of pure states.-We first present the analytically soluble case of measures supported only on pure states. Such a case is common in theoretical studies, which average the performance of their protocols over the popular choice of the unique Haar-invariant measure on pure states. The solution is organized into the following theorem:

Theorem 1. Let the dimension d be arbitrary and assume that the integration measure dρ is supported only on pure states. Then the state which solves the optimization problem (3) is the projector onto an eigenvector of ρ̄ = E_ρ[ρ] with maximal eigenvalue, and the optimal average fidelity is that eigenvalue.

The proof is a simple exercise in linear programming. When ρ is a pure state, the fidelity simplifies to F(ρ, σ) = Tr(ρσ). Linearity allows us to bring the expectation inside the trace so that the problem becomes

maximize Tr(E_ρ[ρ] σ) subject to σ ≥ 0, Tr(σ) = 1.

The solution can be found in many textbooks covering linear programming, e.g. [17]. It also coincides with the one noted for a distribution supported on two states in [18].
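Theorem 1 can be checked directly on a finite ensemble standing in for the measure: the optimizer is the eigenprojector of the ensemble mean belonging to its largest eigenvalue, and the achieved average fidelity is that eigenvalue. A sketch, with arbitrary choices of dimension, sample size and seed:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 500

# finite ensemble of Haar-random pure states standing in for the measure
states = []
for _ in range(n):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    v /= np.linalg.norm(v)
    states.append(np.outer(v, v.conj()))
rho_bar = np.mean(states, axis=0)            # ensemble mean E[rho]

# optimal estimator: eigenprojector for the largest eigenvalue of rho_bar
vals, vecs = np.linalg.eigh(rho_bar)         # eigenvalues in ascending order
top = vecs[:, -1]
sigma_opt = np.outer(top, top.conj())

# for pure rho, F(rho, sigma) = Tr(rho sigma), so the average fidelity is
avg_fid = np.real(np.trace(rho_bar @ sigma_opt))
```

Since the largest eigenvalue of a trace-one positive matrix is at least 1/d, the optimal estimator always beats the maximally mixed guess.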
General measures on mixed states.-For measures with support on mixed states, the situation is markedly different. Our main technical contribution is a set of new upper bounds for this case. We obtain them by optimizing not the fidelity, but the so-called super-fidelity, which provides the following upper bound on the fidelity [19]:

F(ρ, σ) ≤ Tr(ρσ) + √(1 − Tr(ρ²)) √(1 − Tr(σ²)).    (7)

For brevity, we define ρ̄ := E_ρ[ρ], p_ρ := E_ρ[√(1 − Tr(ρ²))] and the function

g(σ) := Tr(ρ̄σ) + p_ρ √(1 − Tr(σ²)),    (8)

such that max_{σ∈S} E_ρ[F(ρ, σ)] ≤ max_{σ∈S} g(σ) holds. The following lemma allows us to further simplify the maximization of g(σ).

Lemma 1. Fix a density operator ρ with eigenvalue decomposition ρ = ∑_{i=1}^d r_i |b_i⟩⟨b_i|, where the eigenvalues are arranged in non-increasing order (i.e. r_i ≥ r_{i+1}), and let σ ∈ S be arbitrary. Then there exists a corresponding density operator σ̂ with equal purity, i.e. Tr(σ̂²) = Tr(σ²), that commutes with ρ and obeys

Tr(ρσ̂) ≥ Tr(ρσ).    (9)

This density operator is explicitly given by

σ̂ = ∑_{i=1}^d s_i |b_i⟩⟨b_i|,    (10)

where s_1, …, s_d denote the eigenvalues of σ arranged in non-increasing order.
Proof. The statement follows from a corollary of the Birkhoff-von Neumann theorem for doubly stochastic matrices, see e.g. [16, Theorem 8.7.6]. For arbitrary hermitian d × d matrices this corollary assures

Tr(ρσ) ≤ ∑_{i=1}^d r_i s_i,    (11)

where r_i and s_i denote the eigenvalues of ρ and σ, respectively, arranged in non-increasing order. The expression on the right hand side corresponds to Tr(ρσ̂) with σ̂ defined in (10). Clearly, this σ̂ is also a quantum state obeying Tr(σ̂²) = Tr(σ²), because the spectra of σ̂ and σ coincide.
Lemma 1 allows us to restrict the maximization of g(σ) to density operators σ̂ which commute with ρ̄. Such a refinement is possible because the lemma assures that for any state σ there is a corresponding σ̂ which commutes with ρ̄, has equal purity and attains a larger value of Tr(ρ̄ ·). Inserting the explicit form (10) of σ̂ and optimizing directly over its eigenvalues establishes our second main result:

Theorem 2. For any finite dimension d and any distribution dρ over states, let r̄_1, …, r̄_d denote the eigenvalues of the mean ρ̄ = E_ρ[ρ]. The average fidelity achieved by any estimator σ ∈ S is upper-bounded by the solution of

maximize ∑_{i=1}^d r̄_i s_i + p_ρ √(1 − ∑_{i=1}^d s_i²)
subject to ∑_{i=1}^d s_i = 1, s_i ≥ 0,    (12)

which is a commutative convex optimization problem.
Note that if the measure dρ is supported exclusively on pure states, p_ρ vanishes, (12) reduces to Theorem 1, and the bound becomes tight.
To further understand the nature of such a bound for mixed states and to obtain analytical results, we further relax (12) by replacing the non-negativity constraints (s_i ≥ 0) by the weaker demand that the optimization vector (s_1, …, s_d) has bounded Euclidean norm, i.e. ∑_{i=1}^d s_i² ≤ 1. A straightforward application of the method of Lagrange multipliers yields an optimal vector of eigenvalues which corresponds to the matrix

σ* = (1/d) 1 + (1 + dλ₊)⁻¹ (ρ̄ − (1/d) 1),  with  1 + dλ₊ = √((d(Tr(ρ̄²) + p_ρ²) − 1)/(d − 1)).    (13)

Here, 1 denotes the identity matrix. Evaluating the corresponding optimal value yields the following analytical upper bound:

Theorem 3. For any finite dimension d and any distribution dρ over states, the average fidelity achieved by any estimator σ ∈ S is bounded from above by

E_ρ[F(ρ, σ)] ≤ 1/d + (1/d) √((d − 1)(d(Tr(ρ̄²) + p_ρ²) − 1)).    (14)

We present a more detailed proof of this statement in the Appendix. Note that since we relaxed the maximization constraints, σ* in general fails to be positive-semidefinite and is thus not a valid density operator, though we do not use it as such. In particular, the bound is not tight when dρ is supported only on pure states, as might be evident from the possibility of non-positive matrices arising from the (ρ̄ − (1/d)1) term in (13). On the other hand, the distribution is known, and thus in the case of a distribution supported only on pure states one should consult the exact solution in Theorem 1.
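The bound (14) needs only two scalar moments of the distribution, Tr(ρ̄²) and p_ρ, so it can be evaluated from an ensemble of (e.g. posterior) samples as they arrive. A minimal sketch; the function name is ours:

```python
import numpy as np

def theorem3_bound(samples):
    """Analytical upper bound (14) on the average fidelity, evaluated on a
    finite ensemble of density matrices standing in for the measure d(rho)."""
    d = samples[0].shape[0]
    rho_bar = np.mean(samples, axis=0)
    purity_bar = np.real(np.trace(rho_bar @ rho_bar))      # Tr(rho_bar^2)
    p_rho = np.mean([np.sqrt(max(0.0, 1.0 - np.real(np.trace(r @ r))))
                     for r in samples])                    # E[sqrt(1 - Tr rho^2)]
    return 1/d + np.sqrt((d - 1) * (d * (purity_bar + p_rho**2) - 1)) / d

# point mass on a pure state: p_rho = 0 and the bound is trivially 1
val = theorem3_bound([np.diag([1.0, 0.0, 0.0])])   # equals 1 up to rounding
```

Because only running averages of Tr(ρ²) and ρ itself enter, the bound can be updated online alongside the Bayesian mean.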
Conversely, if σ* happens to be a state, it also solves the optimization (12) and the analytical bound (14) exactly reproduces the a priori tighter one presented in Theorem 2. In all of our numerical experiments, some of which are presented below, this was indeed the case.
It is also worthwhile to point out that super-fidelity, the bound in (7), and the actual fidelity coincide for one qubit, i.e. for d = 2 [19]. Moreover, replacing positive semidefiniteness by bounded purity yields the same feasible set in that particular case. Consequently, the bound (14) reproduces one of the main results in [6]:

Corollary 1. In the single-qubit case (i.e. d = 2) the bound (14) exactly reproduces the maximum average fidelity in [6, Equation (2.9)] and σ* is the optimal estimator.
A proof of this equivalence is provided in the Appendix. There we also further discuss the geometric nature of the relaxation we employed to arrive at Theorem 3.
The best is the enemy of the good.-Note that the fidelity achieved by any estimator is a lower bound on that achieved by the optimal estimator. A particularly convenient and generally well-motivated [18] estimator is the mean of the distribution, ρ̄ = E_ρ[ρ]. We want to show that for distributions of states relevant to tomography, the mean is very nearly optimal. In the context of tomography the mean is furthermore arguably the most convenient estimator, since every other quantity of interest requires its calculation anyway.
Finding an analytical expression for the posterior distribution is a very challenging problem, let alone performing the multidimensional integrals required for the calculation of the expectations above. Thus, we turn to numerics. In particular, we use the Sequential Monte Carlo algorithm, which has been successfully applied to quantum statistical problems in the context of dynamical parameter estimation [20][21][22] and quantum state estimation [23][24][25].
Recall the sharp distinction between measures supported on pure states and those with full support. We use the fact that we know the exact optimal estimator in the former case to lend support to the claim that the mean estimator is a good candidate for a computationally simple, yet still near-optimal, alternative to solving the optimization problem in general. In Figure 1, we present the results of numerical simulations on two qubits. Plotted is the average fidelity achieved by the optimal estimator (see Theorem 1) and the mean estimator E ρ [ρ]. The average is taken with respect to a distribution that begins as the Haar invariant measure on pure states and is updated through simulated measurement data, where the measurement is the "uniform POVM" consisting of all pure states, distributed uniformly according to the Haar measure. For independent measurements-i.e. local, non-adaptive ones-this measurement is optimal [26, Theorem 3.1]. We see that the mean estimator's fidelity tracks the optimal fidelity quite well.
In Figure 2, we plot the average fidelity of the mean estimator against our bound (14) for measures supported also on mixed quantum states. Again, we simulate measurement data to get an accurate sense of how well the average fidelity of the mean estimator performs with respect to our bound for distributions relevant to tomography. In this case, the prior distribution is the Hilbert-Schmidt measure [33] and many other natural distributions appear as we update our prior through Bayes' rule. We see again that the mean estimator is a "good" estimator in that it comes close to the bound on the optimal fidelity and is the easiest non-trivial average quantity to evaluate. We have presented some further numerical evidence which favors the Bayesian mean estimator for other mixed state measures in the Appendix.
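The comparison behind Figure 2 can be reproduced in miniature: draw a Hilbert-Schmidt ensemble, use its mean as the estimator, and check the achieved average fidelity against the bound (14). This is only a sketch of that comparison (dimension, sample size and seed are arbitrary; helper names are ours), not the paper's full simulated-measurement pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 400

def hs_random_state(d, rng):
    """Hilbert-Schmidt random density matrix via a Ginibre matrix G G^dag."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    w = g @ g.conj().T
    return w / np.real(np.trace(w))

def psd_sqrt(a):
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def fidelity(rho, sigma):
    s = psd_sqrt(rho)
    return np.real(np.trace(psd_sqrt(s @ sigma @ s))) ** 2

samples = [hs_random_state(d, rng) for _ in range(n)]
rho_bar = np.mean(samples, axis=0)                     # Bayesian mean estimator

avg_fid_mean = np.mean([fidelity(r, rho_bar) for r in samples])

# Theorem 3 bound from the same ensemble
purity_bar = np.real(np.trace(rho_bar @ rho_bar))
p_rho = np.mean([np.sqrt(max(0.0, 1 - np.real(np.trace(r @ r)))) for r in samples])
bound = 1/d + np.sqrt((d - 1) * (d * (purity_bar + p_rho**2) - 1)) / d
```

By construction the mean estimator's average fidelity must sit below the bound; how closely it tracks it is the empirical content of Figure 2.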
Conclusion.-In this work we have derived upper bounds on the average fidelity of any estimator with no restrictions on the dimension, or the distribution being averaged over. Furthermore, we have shown a sharp distinction in the optimization problems of maximizing average fidelity between measures supported only on pure states and those with full support. In the former case, we have provided the exact optimal estimator, while in both cases we argued based on numerical evidence that the mean estimator is a good proxy for the optimal solution.
For the curious, we note that the analytical bound (14) (which is based on super-fidelity [19]) is strictly tighter than a corresponding one obtained using the well known, and often used, Fuchs-van de Graaf inequalities [28]. We show this in the Appendix.
These results have obvious applications to practical Bayesian quantum tomography [12], since the bound can be computed online, that is, it is only a property of the current distribution under consideration. But we also expect our bound to be of interest in other theoretical work on tomography, where a benchmark is needed to make statements about the absolute average performance of some candidate protocol.

Appendix

Here, we provide detailed proofs of the theoretical results presented in the main text and some further insights into the mathematical structure of the tasks at hand. It is organized as follows: In Section A we provide a more detailed derivation of Theorem 3 by means of Lagrange multipliers. For the sake of being self-contained and clarifying notational differences, we show in Section B how to deduce [6, Equation (2.9)] from the upper bound (14), i.e. Corollary 1. In Section C, we employ the Fuchs-van de Graaf inequalities to obtain an alternative upper bound on the total average fidelity. We subsequently show that such a bound is either trivial, i.e. it amounts to one, or strictly majorizes the bound presented in Theorem 3. Section D further discusses the geometric properties of the convex relaxation we have employed to arrive at Theorem 3, namely replacing positive-semidefiniteness (s_i ≥ 0) by a bounded Euclidean norm (∑_{i=1}^d s_i² ≤ 1). Finally, in Section E, we outline the methodology of the numerical experiments and provide some supplementary results for other distributions.

A. Detailed proof of Theorem 3
Let r̄_1, …, r̄_d denote the eigenvalues of ρ̄ = E_ρ[ρ]. From now on we shall represent them as a vector r̄ = (r̄_1, …, r̄_d)^T ∈ R^d. Likewise we shall collect the scalar optimization variables s_i in the vector s ∈ R^d. Furthermore, let 0 = (0, …, 0)^T and 1 = (1, …, 1)^T denote the "all-zeros" and "all-ones" vectors in R^d, respectively. For x, y ∈ R^d, we will also make use of the standard inner product ⟨x, y⟩ = ∑_{i=1}^d x_i y_i, and the vectorial inequality x ≥ y shall indicate component-wise inequality, i.e. x_i ≥ y_i for all 1 ≤ i ≤ d.
To arrive at Theorem 3, we need to analytically solve the optimization problem

maximize f(s) = ⟨r̄, s⟩ + p_ρ √(1 − ⟨s, s⟩)
subject to ⟨1, s⟩ = 1,    (15)

together with the inequality constraint

⟨s, s⟩ ≤ 1.    (16)

We can furthermore assume that p_ρ > 0 holds, since otherwise the problem at hand reduces to Theorem 1.
Note that (15) is a convex optimization problem, as it requires maximizing a concave function over a convex set. As such, it has a unique maximum. One way of finding this maximum is to apply standard techniques such as the Karush-Kuhn-Tucker (KKT) multiplier method [17] which are designed to take into account the inequality constraint (16).
However, here we opt for a less direct, but considerably more convenient and less cumbersome approach: We ignore the inequality constraint (16) for now and employ the standard technique of Lagrange multipliers (for equality constraints) in order to find the unique critical point s* of the optimization. In a second step, we verify that this vector strictly obeys the additional inequality constraint we have ignored so far, i.e. ⟨s*, s*⟩ < 1. This in turn implies that said inequality constraint is not active at the critical point, which in retrospect confirms that we were right to ignore it in the first place. Finally, the fact that we face a convex optimization problem assures that this unique critical point indeed yields the sought-for global maximum of (15).
In order to find the critical point s* in question, we define the Lagrangian function

L(s, λ) = ⟨r̄, s⟩ + p_ρ √(1 − ⟨s, s⟩) + λ(⟨1, s⟩ − 1),

where we have, as already announced, ignored the inequality constraint (16). As a consequence, λ ∈ R denotes the single Lagrange multiplier associated with the remaining normalization constraint. The necessary condition for an optimal solution of (15) then reads

∇_s L = r̄ − p_ρ s / √(1 − ⟨s, s⟩) + λ 1 = 0.    (17)

Taking the inner product of this vector identity with the "all-ones" vector 1 results in

1 − p_ρ / √(1 − ⟨s, s⟩) + dλ = 0,    (18)

where we have used ⟨1, r̄⟩ = ∑_{i=1}^d r̄_i = Tr(ρ̄) = 1 and the normalization constraint, which likewise assures ⟨1, s⟩ = 1. This equation allows us to replace √(1 − ⟨s, s⟩) by

√(1 − ⟨s, s⟩) = p_ρ / (1 + dλ),    (19)

and reinserting this into (17) results in the equivalent vector equation

r̄ − (1 + dλ) s + λ 1 = 0.    (20)

This can be readily inverted to yield

s = (r̄ + λ 1) / (1 + dλ).    (21)

In order to determine the value of λ, we revisit (19), which demands

⟨s, s⟩ = 1 − p_ρ²/(1 + dλ)²,  i.e.  Tr(ρ̄²) + 2λ + dλ² = (1 + dλ)² − p_ρ²,

where we have once more used ⟨1, r̄⟩ = 1 as well as ⟨r̄, r̄⟩ = ∑_{i=1}^d r̄_i² = Tr(ρ̄²). This results in the quadratic equation

d(d − 1)λ² + 2(d − 1)λ + (1 − p_ρ² − Tr(ρ̄²)) = 0    (22)

for λ, whose two possible solutions correspond to

λ± = (1/d) (−1 ± √((d(Tr(ρ̄²) + p_ρ²) − 1)/(d − 1))).    (23)

Note that the argument of the square root is non-negative, because the purity Tr(ρ̄²) of any quantum state is lower-bounded by 1/d. Also, the second solution λ− is vacuous, since it leads to an immediate contradiction. Indeed, it follows by inspection that λ− ≤ −1/d holds. Together with (19) this implies the contradictory relation

0 < p_ρ / √(1 − ⟨s, s⟩) = 1 + dλ− ≤ 0,

because p_ρ is positive by assumption.
Consequently, we are left with one meaningful value λ₊ for the Lagrange multiplier, and inserting it into (21) yields the unique critical solution

s* = (r̄ + λ₊ 1)/(1 + dλ₊) = (1/d) 1 + (1 + dλ₊)⁻¹ (r̄ − (1/d) 1).

Recall that throughout this proof we are exploiting a one-to-one correspondence between vectors s = (s_1, …, s_d)^T ∈ R^d and hermitian d × d matrices σ = ∑_{i=1}^d s_i |b_i⟩⟨b_i| that commute with ρ̄. Consequently, the critical vector s* corresponds to the critical matrix presented in (13).
Plugging the critical point s* into the objective function f(s) furthermore yields the corresponding critical function value

f(s*) = ⟨r̄, s*⟩ + p_ρ √(1 − ⟨s*, s*⟩) = (Tr(ρ̄²) + p_ρ² + λ₊)/(1 + dλ₊) = 1/d + (1/d) √((d − 1)(d(Tr(ρ̄²) + p_ρ²) − 1)),

where we have once more replaced √(1 − ⟨s*, s*⟩) by p_ρ/(1 + dλ₊) and combined that with the explicit value of 1 + dλ₊.
With such a unique critical point s* at hand, we are now ready to show that it strictly obeys the inequality constraint ⟨s*, s*⟩ ≤ 1 we have ignored so far. By employing the same equalities we have used in the previous paragraph, we can readily establish such a claim:

⟨s*, s*⟩ = 1 − p_ρ²/(1 + dλ₊)² < 1.

The strict inequality on the right follows from the fact that p_ρ > 0 holds by assumption. This indeed establishes that s* is also a critical point of the optimization problem (15) with constraint (16). Since this optimization corresponds to maximizing a concave function over a convex set, the unique critical point s* must correspond to the unique maximum of (15). The corresponding objective function value was calculated above and establishes Theorem 3.
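The closed-form critical point can be cross-checked numerically: build s* from a spectrum and a positive p_ρ, then confirm normalization, strict feasibility and agreement with the Theorem 3 value. The spectrum and the value of p_ρ below are arbitrary test inputs, not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5

# arbitrary test spectrum of rho_bar (a probability vector) and p_rho > 0
r = np.sort(rng.random(d))[::-1]
r /= r.sum()
p = 0.4

T = r @ r + p**2                                   # Tr(rho_bar^2) + p_rho^2
lam = (np.sqrt((d * T - 1) / (d - 1)) - 1) / d     # multiplier lambda_+, (23)
s = (r + lam) / (1 + d * lam)                      # critical point, (21)

f_crit = r @ s + p * np.sqrt(1 - s @ s)            # objective value at s*
bound = 1/d + np.sqrt((d - 1) * (d * T - 1)) / d   # Theorem 3 value
```

The critical point should also beat any feasible competitor, e.g. the uniform vector s = (1/d)1, whose objective value is 1/d + p√(1 − 1/d).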

B. Proof of Corollary 1
We start this section by pointing out that in the particular case of dimension d = 2, the two relaxations we have employed in the previous section are no relaxations at all. Indeed, for dimension two, fidelity and super-fidelity coincide, as does the standard simplex ∆¹ with its outer approximation E_{∆¹}. Consequently, in this particular low-dimensional case, we solve the actual problem of interest.
For deducing the claimed statement from this fact, we consider Equation (2.9) in [6]. There, χ simply denotes the data generated via the measurement, and the vector V_χ is defined in terms of r, which is related to the usual Bloch vector r = (x, y, z). We point out that this F is not the posterior average fidelity we have been considering, but the total average fidelity (which corresponds to (4)). Note, however, that by Bayes' rule the total average fidelity is the average of the posterior average fidelity over the data, and thus maximizing the posterior average fidelity for every data set is equivalent to maximizing the total average fidelity. Our bound applies directly to the former but trivially extends to the latter. Thus, to prove Corollary 1, we need to extract the posterior average fidelity from the expressions above. Using Bayes' rule together with the fact that ‖r‖₂² = 2 Tr(ρ²) − 1, one finds that the maximum posterior average fidelity implied by the results of [6] (dropping the χ for parallelism) is

1/2 + (1/2) √(2(Tr(ρ̄²) + p_ρ²) − 1).

This, however, is just our result (14) for dimension d = 2.

C. An alternative bound obtained via the Fuchs-van de Graaf inequalities
As already mentioned in the main text's conclusion, a more straightforward upper bound on the maximal average fidelity can be obtained by invoking the Fuchs-van de Graaf inequalities [28]. These relate the fidelity (2) of two arbitrary quantum states ρ, σ ∈ S to their total variational distance:

1 − √F(ρ, σ) ≤ (1/2) ‖ρ − σ‖₁ ≤ √(1 − F(ρ, σ)).    (46)

Combining the right hand side of this inequality chain with the standard norm equivalence ‖ρ − σ‖₂ ≤ ‖ρ − σ‖₁ yields the bound

F(ρ, σ) ≤ 1 − (1/4) ‖ρ − σ‖₂²,    (47)

which holds for any two quantum states ρ, σ ∈ S. Moreover, it allows for establishing the following alternative bound relatively straightforwardly:

Theorem 4. For any finite dimension d and any distribution dρ, the average fidelity achieved by any estimator σ ∈ S obeys

E_ρ[F(ρ, σ)] ≤ 1 − (1/4) (E_ρ[Tr(ρ²)] − Tr(ρ̄²)).    (48)

Such a bound is similar to the one presented in Theorem 3, but much easier to establish. However, as the next statement shows, it is strictly weaker than the analytical bound presented in the main text.

Corollary 2.
Let the dimension d and the distribution dρ over states be arbitrary. Then, the bound (48) obtained via the Fuchs-van de Graaf inequality is either trivial, i.e. equal to one, or it strictly majorizes the one presented in Theorem 3.
Proof of Theorem 4. Since the relation (47) is valid for any two states ρ, σ, it remains true if we take the expectation over ρ on both sides:

E_ρ[F(ρ, σ)] ≤ 1 − (1/4) E_ρ[‖ρ − σ‖₂²].

Also, we can optimize over σ on both sides to obtain

max_{σ∈S} E_ρ[F(ρ, σ)] ≤ 1 − (1/4) min_{σ∈S} E_ρ[‖ρ − σ‖₂²].    (50)

The minimum on the right-hand side can in fact be calculated analytically. To this end we define the function

h(σ) := E_ρ[‖ρ − σ‖₂²].

Note that h(σ) is convex, because it corresponds to a weighted average of convex norm functions ‖σ − ρ‖₂², and its matrix-valued derivative corresponds to

∇h(σ) = 2(σ − ρ̄).

Setting this derivative to zero reveals the global minimum σ = ρ̄, for which h(ρ̄) = E_ρ[Tr(ρ²)] − Tr(ρ̄²), and reinserting this global minimum into (50) yields the desired statement.
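Corollary 2 can be illustrated numerically: on a Hilbert-Schmidt ensemble, the Fuchs-van de Graaf bound (48) is non-trivial but visibly looser than (14). A sketch with arbitrary ensemble parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 4, 300

def hs_random_state(d, rng):
    """Hilbert-Schmidt random density matrix via a Ginibre matrix."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    w = g @ g.conj().T
    return w / np.real(np.trace(w))

samples = [hs_random_state(d, rng) for _ in range(n)]
rho_bar = np.mean(samples, axis=0)

purity_bar = np.real(np.trace(rho_bar @ rho_bar))             # Tr(rho_bar^2)
mean_purity = np.mean([np.real(np.trace(r @ r)) for r in samples])
p_rho = np.mean([np.sqrt(1 - np.real(np.trace(r @ r))) for r in samples])

thm3 = 1/d + np.sqrt((d - 1) * (d * (purity_bar + p_rho**2) - 1)) / d
fvdg = 1 - (mean_purity - purity_bar) / 4                     # Theorem 4

# on any ensemble containing mixed states, fvdg exceeds thm3 (Corollary 2)
```

The gap between the two quantities is exactly what Corollary 2 asserts whenever the ensemble is non-degenerate.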
D. Geometric properties of the convex relaxation

One way to establish that E_{∆^{d−1}} is furthermore itself a Euclidean ball is to use "generalized cylindrical coordinates" for the Euclidean unit ball B_d(1, 0). Such coordinates use the fact that B_d(1, 0) is equivalent to the union of a family of (d − 1)-dimensional balls. More concretely: let z ∈ R^d be an arbitrary unit vector and let ζ ∈ R denote a parameter. For each value of this parameter, we define the hyperplane H_{z,ζ} = {s ∈ R^d : ⟨z, s⟩ = ζ}, which in particular contains the vector ζz by construction. Furthermore, let B̄_{d−1}(z, ζ) ⊂ H_{z,ζ} be the (d − 1)-dimensional Euclidean ball with radius √(1 − ζ²) and center ζz that is contained in the hyperplane H_{z,ζ}. Clearly, each element in such a union of sets is contained in the d-ball, and letting ζ range from −1 to 1 covers the entire d-ball. In order to see this, decompose any s ∈ B_d(1, 0) as s = ⟨s, z⟩ z + z⊥ such that ⟨z⊥, z⟩ = 0, and set ζ = ⟨s, z⟩. Pythagoras' theorem then assures ‖z⊥‖₂² ≤ 1 − ζ², and consequently s ∈ B̄_{d−1}(z, ζ).
The structure of the particular problem at hand suggests fixing z = 1/‖1‖₂ = (1/√d) 1 and center (1/d)1, which (like the standard simplex) is contained in the hyperplane H_{1,1}. A quick calculation reveals that all vertices of the standard simplex ∆^{d−1}, which are just the standard basis vectors e_1, …, e_d, have Euclidean distance √((d−1)/d) to the ball's center. Consequently, they are contained in the boundary of the ball E_{∆^{d−1}} and we have found sufficiently many contact points for applying Theorem 5. Since volume is translationally invariant, we can furthermore shift the coordinate origin into the point (1/d)1 (which is the center of the ball E_{∆^{d−1}}). This has the advantage that the affine space H_{1,1} containing both ∆^{d−1} and E_{∆^{d−1}} turns into H_{1,0}, which is a linear subspace. Note that with respect to the (translated) standard basis, the orthogonal projection onto this subspace is given by

P = 1 − (1/d) 1 1^T.

With respect to this new coordinate system, the d contact points (vertices of the simplex) amount to ẽ_i = e_i − (1/d)1. Choosing unit weights λ_i = 1 for all m = d contact points u_i = ẽ_i and calculating

∑_{i=1}^d λ_i ẽ_i = ∑_{i=1}^d e_i − 1 = 0

reveals that the first condition for Theorem 5 is fulfilled. A similar calculation reveals

∑_{i=1}^d λ_i ẽ_i ẽ_i^T = 1 − (1/d) 1 1^T.

This, however, equals just the projector P onto the subspace H_{1,0}, which contains the entire (d − 1)-dimensional problem of interest. Restricted to its range, a projector corresponds to the identity, which establishes the second condition for Theorem 5. Since this statement is invariant under re-scaling, we can also apply it here, where the radius of the (d − 1)-dimensional surrounding Euclidean ball is not one but √((d − 1)/d).
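The two algebraic conditions used above (the contact points summing to zero, and their outer products summing to the projector onto H_{1,0}) can be checked directly in coordinates; the dimension below is an arbitrary test choice:

```python
import numpy as np

d = 6
ones = np.ones(d)

# projector onto the subspace H_{1,0} = {s : <1, s> = 0}
P = np.eye(d) - np.outer(ones, ones) / d

# translated simplex vertices: the contact points u_i = e_i - ones/d
U = np.eye(d) - np.outer(ones, ones) / d     # row i equals u_i

cond1 = U.sum(axis=0)                        # sum_i u_i, should vanish
cond2 = U.T @ U                              # sum_i u_i u_i^T, should equal P

radius = np.linalg.norm(U[0])                # distance of a vertex to the center
```

The vertex distance also confirms the stated radius √((d − 1)/d) of the enclosing ball.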

E. Numerical recipes
As noted in the main text, we use the Sequential Monte Carlo (SMC) algorithm to perform the Bayesian updating and averaging. A complete and detailed discussion of the algorithm appears in Ref. [21] and thus