Towards optimal experimental tests on the reality of the quantum state

The Barrett-Cavalcanti-Lal-Maroney (BCLM) argument stands as the most effective means of demonstrating the reality of the quantum state. Its advantages include being derived from very few assumptions, and a robustness to experimental error. Finding the best way to implement the argument experimentally is an open problem, however, and involves cleverly choosing sets of states and measurements. I show that techniques from convex optimisation theory can be leveraged to numerically search for these sets, which then form a recipe for experiments that allow for the strongest statements about the ontology of the wavefunction to be made. The optimisation approach presented is versatile, efficient and can take account of the finite errors present in any real experiment. I find significantly improved low-cardinality sets which are guaranteed partially optimal for a BCLM test in low Hilbert space dimension. I further show that mixed states can be more optimal than pure states.


Introduction
The ontological status of the quantum state has long been a central question in foundational physics. If it is ontic, and therefore a true part of physical reality, then the many counterintuitive features of quantum theory (QT) remain opaque. The idea that it is epistemic (that is, reducible in essence to a state of knowledge) is an attractive proposition that promises to dissolve some of these issues [1]. For instance, the instantaneous and discontinuous 'collapse' of the wavefunction is arguably much more naturally thought of as a Bayesian update of knowledge than as a process governed by dynamical physical laws. There is a plethora of other canonically 'quantum' phenomena that have an appealing explanation when adopting an epistemic interpretation; see for example Ref. [2]. In recent years, there has been a flurry of progress towards understanding the feasibility of the so-called 'ψ-epistemic' programme [3], culminating in experimental tests [4]. Here I improve on the design of such experiments, so that tighter restrictions on possible ψ-epistemic theories may be determined in the laboratory.
In the ontological-models framework, where these notions are made precise, the preparation of a quantum state |φ⟩ is associated with the (generally random) selection of an ontic state λ ∈ Λ, with the appropriately normalised probability distribution over the ontic states written µ_φ(λ). Λ is the space of ontic states, and I denote by Λ_φ ⊆ Λ the support of µ_φ, members of which are said to be 'compatible' with the preparation. A projective measurement operator |ψ⟩⟨ψ|, on the other hand, is associated with a conditional probability ξ(ψ|λ) known as a response function. In the full framework there are also stochastic maps on Λ which represent, e.g., unitary transformations, but it will not be necessary to consider these here.
In order to constitute a model of quantum theory, the following condition must be met:
$$\int_\Lambda \xi(\psi|\lambda)\,\mu_\phi(\lambda)\,\mathrm{d}\lambda = |\langle\psi|\phi\rangle|^2. \qquad (1)$$
The ontic state λ, therefore, stands for a variable sufficient to screen off the preparation µ from the measurement ξ. As long as (1) is satisfied, tracing over the ontic states leaves the 'correct' conditional probabilities à la the Born rule. Many such ontological models are possible, and they may be classified according to certain properties of the µ and ξ. Although couched in the older terminology of 'hidden variables', one of the most important such classifications is Bell's definition of locality [5], from which he derived an experimentally testable inequality able to separate certain predictions of quantum theory from all possible local ontological models [6]. Bell was therefore perhaps the first to show that a no-go theorem can provide positive insight into what an ontological model can be. Importantly, Bell's inequality was made robust to experimental error [7], and has now been subject to very strict tests [8,9,10]. Since QT could well be false, theoretical proofs which rely on quantum predictions do not establish (for example) the non-locality of nature itself. Experimental tests are therefore highly important, since they can establish such features in a theory-independent manner.
More recent work divides the ontological models into those where the wavefunction is ontic, Λ_ψ ∩ Λ_φ = ∅ ∀ ψ ≠ φ, and those which are not ontic, or epistemic, ∃ ψ ≠ φ : Λ_ψ ∩ Λ_φ ≠ ∅. Pusey, Barrett and Rudolph (PBR) proved a theorem that theoretically ruled out the subset of epistemic models where the preparation of two independent systems can be assumed to be represented by the product µ_1 × µ_2 [11]. There are two reasons why the epistemic view cannot be immediately dispensed with, however. First, even in the idealized case, one may always instead drop the assumption of preparation independence (PI).
Indeed, constructive examples of ψ-epistemic theories exist [12,13,14] which must therefore dodge PBR's no-go theorem in this way. Second, the finite precision of any real experiment, including a recent one performed in an ion trap [15], means that only a strict subclass of epistemic models subscribing to PI is ever experimentally falsified: the 'wriggle room' offered by laboratory imperfections makes it possible to retain both PI and an epistemic view of nature. Subsequent theoretical studies have argued for the reality of the quantum state by imposing further assumptions on the set of ontological models [16,17,18]; for a comprehensive review, see Ref. [3].
Maroney introduced a classification of ontological models which generalises the ontic/epistemic dichotomy [19], so that one can begin to constrain the extent to which a model may be epistemic. Crucially, Maroney's theorem does not rely on PI, and thus its implications cannot be escaped by discarding it. Maroney's idea, later made noise tolerant by Barrett et al. (BCLM) [20], springs from a very particular motivation for ψ-epistemicism: namely, the impossibility of discriminating non-orthogonal quantum states. In quantum theory, this feature depends on the quantity
$$\omega_Q(|\psi\rangle,|\phi\rangle) := 1 - \sqrt{1 - |\langle\psi|\phi\rangle|^2}, \qquad (2)$$
a measure of the overlap of the quantum states in Hilbert space. It is related to the maximum probability with which the two states can be distinguished in a single-shot experiment [21]. An analogous quantity applies to the underlying ontological model:
$$\omega_C(\mu_\psi,\mu_\phi) := \int_\Lambda \min\{\mu_\psi(\lambda),\mu_\phi(\lambda)\}\,\mathrm{d}\lambda, \qquad (3)$$
which measures the extent to which two preparations 'overlap' in the ontic state space. A partially-ψ-epistemic model is defined by the relation
$$\omega_C(\mu_\psi,\mu_\phi) = k(\psi,\phi)\,\omega_Q(|\psi\rangle,|\phi\rangle). \qquad (4)$$
The proportionality constant k(ψ, φ) will be the main subject of our study. If k(ψ, φ) = 0, there is no overlap of distributions in the ontic state space, irrespective of the 'closeness' of the quantum states they represent: the model is ψ-ontic. On the other hand, since it is required that ω_C(µ_ψ, µ_φ) ≤ ω_Q(|ψ⟩, |φ⟩) [15], a model with k(ψ, φ) = 1 is said to be 'maximally ψ-epistemic'.
The remainder of this paper is structured as follows: in Section 2, I provide an operational interpretation for intermediate values of k(ψ, φ); in Sections 3 and 4 I recapitulate the BCLM argument and describe the search for optimal experimental implementations as a nonlinear optimization problem. I present an algorithm for this purpose in Section 4.1, along with several numerical results in Section 5. There I show how the algorithm can tailor the design of an experiment by accepting the typical error rate as input (in Section 5.1).
Conclusions are drawn at the end of the paper, and the appendices contain details of a) an algorithmic subroutine, b) a mixed-state result that performs more optimally than any known set of pure states of the same size, and c) algorithm runtime statistics.

Degrees of epistemicness
The task of discriminating |φ⟩ from |ψ⟩ in a single shot has a minimum probability of making an error (i.e. guessing the wrong state) given by P^Q_error = ω_Q/2, which relates to the case where one makes use of the best measurement available under the constraints set by quantum theory [22] ‡. A hypothetical omniscient measurement is described by a continuous set of Dirac delta functions ξ(λ′|λ) = δ(λ′ − λ): it strongly violates the aforementioned constraints and reveals complete and perfect information about λ. Such a measurement, which is the 'best' that is logically possible, has an error probability
$$P^{C}_{\mathrm{error}} = \omega_C/2 = k(\psi,\phi)\,P^{Q}_{\mathrm{error}}. \qquad (5)$$
k(ψ, φ) therefore represents the improved state-discrimination error probability of an omniscient observer, compared with an observer who is constrained by quantum mechanics. In situations where the former would be superior, this necessitates a certain property of the response functions in the ontological model termed 'deficiency' by Harrigan and Rudolph [23]. Deficiency means a measurement of |ψ⟩⟨ψ| will respond to ontic states λ which are not compatible with µ_ψ. If k(ψ, φ) = 0, an ontological model of quantum theory must be maximally deficient: that is to say, the inequality will be saturated. A fuller discussion of deficiency is given in [24].
‡ When referring to minimum error distinction probability, we assume throughout that there is no prior information available to the state discriminator.
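The two overlaps discussed above can be made concrete with a small numerical sketch. It assumes the pure-state form ω_Q = 1 − sqrt(1 − |⟨ψ|φ⟩|²) and a discrete ontic space, so that the classical overlap becomes a sum of pointwise minima. The two distributions `mu_psi` and `mu_phi` are hypothetical numbers chosen for illustration only; they are not taken from any model that reproduces quantum theory.

```python
import math

def inner(u, v):
    """Hermitian inner product <u|v> of two vectors given as lists."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def quantum_overlap(psi, phi):
    """omega_Q = 1 - sqrt(1 - |<psi|phi>|^2) for normalised pure states."""
    fidelity = abs(inner(psi, phi)) ** 2
    return 1.0 - math.sqrt(max(0.0, 1.0 - fidelity))

def classical_overlap(mu_psi, mu_phi):
    """omega_C = sum over ontic states of min(mu_psi, mu_phi) (discrete space)."""
    return sum(min(p, q) for p, q in zip(mu_psi, mu_phi))

# Two non-orthogonal qubit states: |0> and |+>.
psi = [1.0, 0.0]
phi = [1 / math.sqrt(2), 1 / math.sqrt(2)]
w_q = quantum_overlap(psi, phi)
p_error_quantum = w_q / 2  # minimum error probability with equal priors

# Hypothetical distributions over four ontic states (illustration only).
mu_psi = [0.5, 0.5, 0.0, 0.0]
mu_phi = [0.0, 0.1, 0.45, 0.45]
w_c = classical_overlap(mu_psi, mu_phi)
p_error_omniscient = w_c / 2

# Ratio of classical to quantum overlap for this pair.
k = w_c / w_q
print(w_q, w_c, k)
```

A value of `k` between 0 and 1 corresponds to the intermediate, partially epistemic regime: the omniscient observer errs less often than the quantum-limited one, but not never.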

Bounds on k(ψ, φ)
BCLM showed how k(ψ, φ) can be bounded without relying on PI [20]. In Hilbert space dimension d, consider a reference state |c⟩ along with a set of n states S = {|ψ_i⟩}_{i=1}^{n}. Consider further a set of projective operators M = {|ijk⟩⟨ijk| ; i = 1 . . . n, j = 1 . . . i − 1, k = 1, 2, 3}, such that associated to each pair of states {|ψ_i⟩, |ψ_j⟩} in S, there is a triple of orthogonal projectors {|ij1⟩⟨ij1|, |ij2⟩⟨ij2|, |ij3⟩⟨ij3|} in M that define a three-outcome measurement. We will find it useful to define
$$A_{ij} := |\langle ij1|c\rangle|^2 + |\langle ij2|\psi_i\rangle|^2 + |\langle ij3|\psi_j\rangle|^2, \qquad (6)$$
which is a measure of the antidistinguishability of the triple of states |ψ_i⟩, |ψ_j⟩, |c⟩. This concept is at the heart of many ψ-ontology theorems: when A_ij = 0 the triple is said to be perfectly antidistinguishable, and then the measurement {|ijk⟩⟨ijk|}_{k=1,2,3} will conclusively exclude (in a single shot) the possibility that one of the triple of states was prepared [25]. Caves, Fuchs and Schack have derived the necessary and sufficient conditions for perfect antidistinguishability (which they call PP-incompatibility) of a triple of states, which depend only on the three inner products between pairs in the triple [26].
Next, define k_0 := min_j k(c, ψ_j). Then, if QT predictions are correct, the relation
$$k_0 \le \frac{1 + \sum_{i>j} A_{ij}}{\sum_{i} \omega_Q(|c\rangle,|\psi_i\rangle)} \qquad (7)$$
follows from (4) and from the Bonferroni inequality, as BCLM show [20]. The remainder of this paper is concerned with finding small S and M such that if QT is approximately correct, they would lead to the lowest upper bound on k_0. An upper bound k_0 ≤ x implies that there exists at least one state in S which has a degree of epistemicness with respect to c no greater than x. One may optionally make additional assumptions to extrapolate this to a stronger claim: for example, Lipschitz continuity [3] would require that k(ψ, φ) ≤ x ∀ψ, φ. In the course of our search, we will limit ourselves to finite values for d, the Hilbert space dimension, and n, the number of states, considered here as experimental resources to be spent frugally.
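As a rough illustration of how a candidate set of states and measurements translates into a bound, the sketch below evaluates the ratio (1 + Σ_{i>j} A_ij) / Σ_i ω_Q(c, ψ_i) for pure states. Two things are assumptions made here: the bound is taken to have exactly this form, and the outcome labelling is fixed so that outcome 1 is meant to exclude c, outcome 2 to exclude ψ_i and outcome 3 to exclude ψ_j. The example states and basis are arbitrary and hand-picked, not an optimised solution.

```python
import math

def inner(u, v):
    """Hermitian inner product <u|v>."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def born(m, psi):
    """Born probability |<m|psi>|^2 of the projector onto m."""
    return abs(inner(m, psi)) ** 2

def quantum_overlap(psi, phi):
    """omega_Q = 1 - sqrt(1 - |<psi|phi>|^2)."""
    return 1.0 - math.sqrt(max(0.0, 1.0 - born(psi, phi)))

def antidistinguishability(basis, c, psi_i, psi_j):
    """A_ij under the assumed labelling: outcome k should exclude the k-th state."""
    m1, m2, m3 = basis
    return born(m1, c) + born(m2, psi_i) + born(m3, psi_j)

def bclm_bound(c, states, bases):
    """Ratio (1 + sum_{i>j} A_ij) / sum_i omega_Q(c, psi_i).

    bases[(i, j)] with i > j holds the three orthonormal vectors
    measured on the pair (states[i], states[j]).
    """
    n = len(states)
    numerator = 1.0 + sum(
        antidistinguishability(bases[(i, j)], c, states[i], states[j])
        for i in range(n) for j in range(i)
    )
    denominator = sum(quantum_overlap(c, s) for s in states)
    return numerator / denominator

# Hand-picked qutrit example (d = 3, n = 2); not an optimised choice.
s = 1 / math.sqrt(3)
r = 1 / math.sqrt(2)
c = [s, s, s]
states = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
bases = {(1, 0): ([r, -r, 0.0], [0.0, 0.0, 1.0], [r, r, 0.0])}
bound = bclm_bound(c, states, bases)
print(bound)
```

With this naive choice the ratio exceeds 1, so the resulting bound is trivial. That is the typical outcome for unoptimised sets and is what motivates a systematic search.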

Existing families of states
Before presenting my results below, it is prudent to survey currently available solutions to the problem, which will serve as benchmarks for our algorithmic approach. BCLM supplied, for every d ≥ 4 that is also a prime power, a set of n = d² states satisfying A_ij = 0 ∀i, j, which leads to the bound k_0 < 2/d [20]. When d ≥ 4 is not a prime power, their results lead to the bound k_0 < 4/(d − 1). Relaxing the need for exact antidistinguishability enabled them to find k_0 ≤ 0.95 for d = 3, n = 9. Next, Leifer showed an exponential decay in d (for d ≥ 3) by using a set of n = 2^{d−1} Hadamard states [27]. A further result [28] provides a bound which displays a decay in n for any d ≥ 4, as well as a number of constructive solutions. Where these previous results provide bounds on k_0 for low n and d, the approach I present below was able to match or better the bound. The best bound achieved experimentally at the time of writing is k_0 ≤ 69% [4]. Achieving k_0 ≤ 50% would seem like the next major milestone, where the classical overlap is doing less than half of the necessary work in explaining the indistinguishability of non-orthogonal quantum states.

Optimization
We wish to solve
$$\min_{\{|\psi_i\rangle\},\{|ijk\rangle\}} \frac{1+\sum_{i>j} A_{ij}}{\sum_i \omega_Q(|c\rangle,|\psi_i\rangle)} \quad \text{subject to}\quad \langle\psi_i|\psi_i\rangle = 1,\quad \langle ijk|ijk'\rangle = \delta_{kk'}. \qquad (8)$$
Here all optimisation variables are considered as unnormalised vectors in C^d, with the necessary normalisation and orthogonality conditions shown explicitly above as constraints. An analytic, global solution to this problem seems intractable: but how difficult is the numerical optimization problem at hand? A naive answer is found by counting the number of real parameters necessary to describe a solution to the problem. One can do better than simply taking the real and imaginary components of each vector, since some of the constraints imply these parameters are not independent. For example, the hyper-spherical parameterisation of pure states requires only d − 1 polar angles and d − 1 phase angles [29] (making 2d − 2 in all), with normalisation constraints automatically satisfied. A projective measurement is defined by a d × d unitary matrix: in close analogy to the argument above concerning states, a unitary matrix may be parameterised by d² − 1 angles [29], and describes an orthonormal measurement basis by construction. Thus, even with a smart parameterisation one must solve a (2d − 2)n + (d² − 1)(n² − n)/2 dimensional optimization problem. Various heuristics can inform our search. For example, one may try to make each |ψ_i⟩ as close as possible to |c⟩, whilst at the same time making each state far from all of the others so that they can be (approximately) antidistinguished. The objective function is nonlinear and nonconvex, and is pocked with local optima: gradient descent will therefore most likely get 'stuck' in a feasible region with sub-par performance. Brute force methods would be intractable: note that, on a coarse grid dividing each angle into only g discrete values, one must evaluate the objective function g^36 times in the problem instance of d = n = 3. Global approaches include simulated annealing, particle swarm and associated techniques. But a more powerful approach can be implemented by capitalising on a specific structure of the problem.
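The parameter count in the preceding paragraph is easy to tabulate. The following sketch counts 2d − 2 angles per state and d² − 1 angles per measurement basis, with one basis per pair of states, and reports the cost of a brute-force grid search. The function names are hypothetical and introduced only for illustration.

```python
def num_parameters(d, n):
    """Real parameters for n pure states and one measurement basis per pair."""
    state_params = (2 * d - 2) * n                       # polar + phase angles per state
    measurement_params = (d * d - 1) * (n * n - n) // 2  # one unitary per pair of states
    return state_params + measurement_params

def grid_evaluations(d, n, g):
    """Objective evaluations on a grid with g values per angle."""
    return g ** num_parameters(d, n)

for d, n in [(3, 3), (3, 9), (4, 16)]:
    print(d, n, num_parameters(d, n))
```

For d = n = 3 this gives 36 parameters, which reproduces the g^36 grid size quoted above. The quadratic growth in n comes entirely from the measurements.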

Exploiting convexity
The following generalisations of the quantum overlap and the Born rule,
$$\omega_Q(\rho,\sigma) := 1 - \tfrac{1}{2}\|\rho - \sigma\|_*, \qquad (9)$$
$$P(M|\rho) := \mathrm{trace}(M\rho), \qquad (10)$$
share an important property: they conserve their meanings as twice the minimum error discrimination probability [21], and prepare-measure probability (respectively). ||•||_* is the nuclear norm (or sum of singular values). Derivative notions such as antidistinguishability A_ij therefore also inherit their meaning correspondingly. With the generalisations in place, we can therefore write the optimization problem as
$$\min_{D,P}\ \frac{1+\sum_{i>j}\left[\mathrm{trace}(M_{ij1}\rho_c)+\mathrm{trace}(M_{ij2}\rho_i)+\mathrm{trace}(M_{ij3}\rho_j)+\epsilon_{ij1}+\epsilon_{ij2}+\epsilon_{ij3}\right]}{\sum_i \left(1-\tfrac{1}{2}\|\rho_c-\rho_i\|_*\right)} \quad \text{subject to}\quad \rho_i \succeq 0,\ \mathrm{trace}(\rho_i)=1,\ M_{ijk}\succeq 0,\ \sum_k M_{ijk} = \mathbb{1}, \qquad (11)$$
where D := {ρ_i}_i is the set of states and P := {M_ijk} is the set of measurement operators.
Figure 1. In the ontological models framework, preparations may be thought of as 'black boxes' that simply produce a λ according to the distribution µ_ψ(λ). (a) A mixed state ρ = Σ_i p_i |ψ_i⟩⟨ψ_i| can be thought of as the net preparation when several pure preparations are wired together with a probabilistic switch. The resulting probability density over ontic states is the convex combination of the component distributions: µ_ρ(λ) = Σ_i p_i µ_{ψ_i}(λ). (b) In much the same way, each component preparation labelled by a pure state ψ_i can be thought of itself as a compound preparation, where a probabilistic switch selects from a number of deterministic preparations of λ, so that the net preparation is described by µ_ψ(λ). For the purposes of illustration, here Λ is represented as a discrete space, although it is often (and in eq. (1)) thought of as a continuous space.
Here M ⪰ 0 denotes that M is a positive semidefinite matrix. The ε_ijk are small error parameters to be discussed shortly; the reader may temporarily assume these to be zero. Recall that a function f(x) is convex iff f(λx_1 + (1 − λ)x_2) ≤ λf(x_1) + (1 − λ)f(x_2), and that a set B is convex iff x_1, x_2 ∈ B → λx_1 + (1 − λ)x_2 ∈ B, in both cases for all λ ∈ [0, 1]. Call a problem convex if it involves minimising a convex function over a convex set. Such problems exhibit many convenient features: if a local optimum exists, it is also a global optimum, and efficient algorithms exist to solve them. Although (11) is not a convex problem, notice that the objective function is linear (hence convex) in the set of measurements when the states are fixed. Furthermore, when the measurements are fixed, the objective is convex-concave fractional in the states: then the global optimum can be found by solving a series of parametric (convex) subproblems [30]. Note further that D, P are convex sets. Thus, we proceed in a manner inspired by 'biconvex' problems [31] (which have a very similar structure). The approach put forward here is to begin with a feasible point in D, and then proceed to alternately optimise over P and then D again, and so on (keeping the other set fixed) until the optimal values of the two subproblems converge. This is known as alternate convex search (ACS) [31], and whilst it does not guarantee global optimality, it does tend to provide good results which are guaranteed partially optimal: that is to say, no change in D or in P alone could provide a better solution. In order to solve the fractional subproblem of the form min a/b (when fixing the measurements and searching for states), one may use Dinkelbach's technique [32], which involves solving a series of parameterised problems min a − θ_k b: see Appendix A.
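The alternating structure can be seen on a toy problem. The sketch below applies alternate convex search to a hypothetical two-variable biconvex function, chosen only because each one-dimensional subproblem has a closed-form minimiser. It is not the states-and-measurements problem (11), where each subproblem would instead be handed to a convex solver.

```python
def objective(x, y):
    """Toy biconvex function: convex in x for fixed y, and in y for fixed x."""
    return (x * y - 1.0) ** 2 + 0.1 * (x - 2.0) ** 2 + 0.1 * y ** 2

def best_x(y):
    """Exact minimiser over x: solves 2y(xy - 1) + 0.2(x - 2) = 0."""
    return (y + 0.2) / (y * y + 0.1)

def best_y(x):
    """Exact minimiser over y: solves 2x(xy - 1) + 0.2y = 0."""
    return x / (x * x + 0.1)

def alternate_convex_search(x, y, tol=1e-14, max_iter=1000):
    """Alternate exact minimisation over x and y until the objective stalls."""
    history = [objective(x, y)]
    for _ in range(max_iter):
        x = best_x(y)  # optimise the first block with the second fixed
        y = best_y(x)  # optimise the second block with the first fixed
        history.append(objective(x, y))
        if history[-2] - history[-1] < tol:
            break
    return x, y, history

x_opt, y_opt, history = alternate_convex_search(1.0, 1.0)
print(x_opt, y_opt, history[-1], len(history) - 1)
```

The objective never increases from one sweep to the next, and at termination neither variable can be improved on its own. This is the sense of partial optimality used in the text; a different starting point may end somewhere else.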

Numerical results
There exist many software packages for solving the convex problems that arise during my algorithm. Because the algorithm will not always settle on the same solution if seeded with different starting points, I ran the algorithm several times, using the cvx package [33] for each subproblem: the best numerical results for ε_ijk = 0, d = 3, . . . , 8 and n = 3, . . . , 20 are shown in Figure 2a. The algorithm was seeded with random pure states. Corresponding Matlab files with the states and measurements are available in the supplementary material, as is the code needed to find partially optimal solutions for any n, d. The results show that an experiment showing k_0 ≤ 50% is possible with current technology such as linear optics [4] or ion traps [15]. The performance of the algorithm is discussed in Appendix C.
Some comments about the choice of reference state are in order. Because our objective function is invariant under a simultaneous unitary transformation of all states and measurements, the reference state may be chosen arbitrarily up to the choice of eigenvalues, which determine (for example) the purity trace(ρ_c²). Because of the linearity of the subsearch over P, the algorithm will return projective (i.e. extremal) POVMs: our generalisation to positive matrices can thus always be thought of as a purely mathematical trick, since the measurements will always correspond to a projective measurement (and it is very important that the measurement has only 3 outcomes [34]). Importantly, no such property guarantees that the search over D will return pure states. The results so far correspond to a pure reference state, which caused the solution states ρ_i to also all be pure. However, one can in fact include the reference state into the search space D → {ρ_c, {ρ_i}_i}, thereby leaving the purity as a free parameter to be optimised over. Whilst in principle this could only lead to better results, in practice it can mean the algorithm is more likely to get stuck in local optima. However, in the appendix I present mixed states and measurements which lead to an n = d = 3 bound of k_0 ≤ 1.2018/1.5003 ≈ 0.8011, a marked improvement over known bounds with only pure states.

Noise
Inspecting Figure 2a, it seems that increasing n is an easy way to improve an experiment, but as long as the errors ε_ijk ≠ 0 (which captures the realistic situation where the QT predictions are not precisely reflected in the experiment) they will accumulate and spoil the trend. Let ε̄ = 2 Σ_{i>j} [ε_ij1 + ε_ij2 + ε_ij3]/(3n² − 3n) be the average error. Note that the objective function in (11) is monotonically increasing in ε̄. Note also that my algorithm can adjust the tradeoff between the numerator and denominator to achieve states which are in general less optimal for the noise-free case, but more robust to error than the noise-free optima. For numerical results relating to ε̄ = 0.15%, see Figure 2b.
Let a bound on k_0 be given by the ratio A/B, where A is the (noiseless) numerator of (11), and B the denominator. Then, the maximum tolerable error is ε̄_MAX = 2(B − A)/(3n² − 3n). Surprisingly, the mixed state bound I found by including the reference state in the search is extremely robust to noise, tolerating ε̄_MAX ≈ 0.033, whilst the previous best n = d = 3 (pure state) solution [28] could tolerate ε̄_MAX = 0.0006. This constitutes a roughly 50-fold improvement in robustness for the same experimental resources, contradicting the widely held feeling that extremely high precision experiments are necessary to show the reality of the quantum state.
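The noise tolerance quoted above can be checked with a few lines of arithmetic. The sketch assumes, as the definition of the average error suggests, that noise adds (3n² − 3n)/2 times the average error to the noiseless numerator A while leaving the denominator B unchanged. The function names are hypothetical.

```python
def noisy_bound(A, B, n, eps_bar):
    """Bound on k_0 when the average error per outcome is eps_bar."""
    return (A + eps_bar * (3 * n * n - 3 * n) / 2) / B

def max_tolerable_error(A, B, n):
    """Largest average error for which the bound stays at or below 1."""
    return 2 * (B - A) / (3 * n * n - 3 * n)

# Mixed-state solution quoted in the text: A / B = 1.2018 / 1.5003 with n = 3.
A, B, n = 1.2018, 1.5003, 3
eps_max = max_tolerable_error(A, B, n)
print(eps_max, noisy_bound(A, B, n, 0.0), noisy_bound(A, B, n, eps_max))
```

For these values the tolerable average error comes out close to 0.033, in agreement with the figure quoted above. At that error level the bound reaches 1 and the experiment ceases to constrain k_0.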

Lower bounds
A loose lower bound on the global optimum of (11) (i.e. the best possible BCLM experiment) can be obtained through convex relaxation. Replace the bilinear terms in the numerator with zero: then the problem is convex and actually solvable analytically by putting ρ_i = ρ_c ∀i. The BCLM argument therefore cannot hope to find a bound on k any lower than (1 + (3/2) n(n − 1) ε̄)/n. Tighter convex relaxations would be very useful if found.
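The steps behind this expression can be spelled out as follows. This is a sketch of the reasoning, assuming that the numerator of (11) is 1 plus the sum over pairs of three Born-rule terms and three error terms, and that each quantum overlap is at most 1.

```latex
\begin{align*}
\text{numerator}
  &\ge 1 + \sum_{i>j}\left(\epsilon_{ij1}+\epsilon_{ij2}+\epsilon_{ij3}\right)
  && \text{(bilinear terms } \mathrm{trace}(M\rho) \ge 0 \text{ set to zero)} \\
  &= 1 + \frac{3n^2-3n}{2}\,\bar{\epsilon}
   = 1 + \tfrac{3}{2}\,n(n-1)\,\bar{\epsilon}
  && \text{(definition of } \bar{\epsilon}\text{)} \\[4pt]
\text{denominator}
  &= \sum_{i=1}^{n} \omega_Q(\rho_c,\rho_i) \le n
  && \text{(equality when } \rho_i = \rho_c \ \forall i\text{)} \\[4pt]
\Rightarrow\quad
\frac{\text{numerator}}{\text{denominator}}
  &\ge \frac{1 + \tfrac{3}{2}\,n(n-1)\,\bar{\epsilon}}{n}.
\end{align*}
```

In the noiseless case this reduces to 1/n, so the relaxation says only that n states cannot push the bound below 1/n.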

Conclusions and directions
The technique for finding BCLM experiments presented here is very flexible. Besides the results showcased above (the improved zero-error bounds and improved finite-error bounds for low n and d), there are additional applications of the method. An experimentalist, armed with an estimate of the typical precision available in her laboratory setup, can use my algorithm with this quantity as input, and thereby extract the optimum n and appropriate sets D, P.
It is at first quite surprising that non-extremal states can be a better choice in arguing for the reality of the quantum state. It is counterintuitive because i) for many quantum information processing applications, mixed states will perform strictly worse than pure states, and ii) since mixing states together introduces 'artificial' ignorance into the problem, one would then expect the states to become more epistemic. But since we are interested in the ratio of quantum to classical overlaps, we are not interested in the absolute measure of epistemicness, but rather the upper limit on how close the epistemicness of the ontological model can come to explaining the overlap at the quantum, or operational, level. Intuitively, increasing the mixedness of states can make them less distinguishable, increasing their overlap ω_Q and therefore decreasing our objective function. Fundamentally, a mixed state ρ in quantum mechanics will still have to correspond to some distribution µ_ρ in an ontological model (see Figure 1). Such preparations also exhibit the property that (except in special cases) one cannot distinguish them with a single-shot measurement. The epistemic interpretation is therefore just as compelling for mixed states as for pure states. One difference is that even within quantum mechanics there is an 'ignorance' interpretation of a mixed state as a (non-unique) convex combination of pure states. But since our goal is to show that there is an ever-widening explanatory gap that the epistemic interpretation fails to breach, this is by no means a get-out clause to the argument. Needless to say, the often-cited belief that 'there are no pure states in the laboratory' should cause ρ-epistemicism to be taken just as seriously, nay, more seriously than ψ-epistemicism.
Each of the mathematical elements of an ontological model (i.e. the preparations, transformations and measurements) should in principle carry labels that allow for contextuality [35,36]. This is because multiple, physically distinct preparations (transformations, or measurements) may be identified in QT, but one is generally not warranted in identifying their representation in the ontological model. Doing so amounts to making an additional (and spurious [35,36]) assumption of non-contextuality. It is therefore paramount in any experimental test relating to ontological models (such as those proposed by PBR and BCLM) that the very same physical procedure be used at any point where a definite preparation or measurement is repeatedly called for. The temptation to use distinct procedures which would be equivalent in QT is perhaps greater for objects with multiple convex decompositions, such as mixed states: but it should be resisted all the same, because it would introduce a contextuality loophole.
Further, it is simple to apply additional constraints which will not spoil the properties of (11). For example, the expression ||ρ_c − ρ_j||_* can be upper bounded, or trace(ρ_c ρ_j) can be bounded from above (and/or below if choosing ρ_c to be fixed, and not part of D). These constraints might help provide further theoretical insight into the nature of ontological models for QT. Similar constraints might help experimentalists search for those states and measurements that are easy to prepare with high fidelity. It is also possible to seed the algorithm with a more structured and informed initial feasible point from which to search, rather than the random feasible points I chose. My algorithm will return a solution at least as good as its input, and can therefore be used to 'polish' any solution found by other means.
It is possible that in future deterministic global optimization techniques [37] can be applied to this problem, and provide a certificate of global optimality (rather than just partial optimality) for D and P. Such a certificate would be akin to a "Tsirelson's bound" [38], and would provide some much-desired certainty in the search for optimal BCLM experiments.
(iii) If a(x_{k+1}) − θ_k b(x_{k+1}) = 0, stop and x_k is optimal. Otherwise k = k + 1 and go to previous step.
The subproblem of my alternating search approach that involves fixing the measurements and searching for states is of the form (A.1): it may therefore be solved with Dinkelbach's method.
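A minimal sketch of a Dinkelbach-type iteration on a toy fractional problem is given below. The numerator a(x) = x² + 1 and denominator b(x) = x + 1 on the interval [0, 2] are hypothetical choices, picked so that each parametric subproblem min a(x) − θ b(x) has a closed-form solution. The initialisation and update rules follow the textbook form of the method, which is assumed here. In the actual algorithm each subproblem would be a convex programme over the states.

```python
def a(x):
    """Convex numerator of the toy fractional objective."""
    return x * x + 1.0

def b(x):
    """Linear (hence concave) denominator, positive on [0, 2]."""
    return x + 1.0

def solve_subproblem(theta):
    """argmin over [0, 2] of a(x) - theta * b(x): stationary point theta / 2, clipped."""
    return min(2.0, max(0.0, theta / 2.0))

def dinkelbach(x0, tol=1e-12, max_iter=100):
    """Minimise a(x) / b(x) by solving a sequence of parametric subproblems."""
    x = x0
    theta = a(x) / b(x)              # current value of the ratio
    for _ in range(max_iter):
        x_next = solve_subproblem(theta)
        gap = a(x_next) - theta * b(x_next)
        if abs(gap) < tol:           # stopping rule: parametric optimum is zero
            return x_next, theta
        x = x_next
        theta = a(x) / b(x)          # update the ratio and repeat
    return x, theta

x_star, theta_star = dinkelbach(2.0)
print(x_star, theta_star)
```

On this example the iteration settles at x = √2 − 1 with ratio 2√2 − 2, which is the minimiser found by setting the derivative of a/b to zero. The ratio decreases at every step.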

Appendix B. Mixed states solution
With pure states in the n = d = 3 case, my algorithm never returned a bound better than k_0 ≤ 0.9964 . . . after many trials. One therefore has the suspicion that this is in fact the global optimum of the problem when the reference state is set to a pure state. Inserting the reference state into the search space removes unnecessary constraints, for example that it must have a certain purity. The algorithm then found a set of mixed states and measurements achieving the bound k_0 ≤ 1.2018/1.5003 ≈ 0.8011.
Figure C1. When the algorithm is seeded randomly, it takes a variable amount of time to finish, and settles on distinct (generally non-globally optimal) solutions. The inset of (b) shows the same data with the black line tracing the mean value, which is linear for d ≥ 4.

Appendix C. Performance of the algorithm
To gain an idea of the runtime of the algorithm, and the topology of the optimisation landscape, observe Figure C1. For the purposes of making this figure, the algorithm was run in d = 3 a total of 1000 times for each value of n = 3, 4, 5, 6, 7 on a 3.6 GHz Intel Core i7-4790 running openSUSE Linux, Matlab R2016a and cvx. The results suggest that the mean runtime scales linearly with n, although generally the algorithm will find one of many distinct local optima depending on the initial random seed.