
Monte Carlo sampling from the quantum state space. II


Published 13 April 2015 • © 2015 IOP Publishing Ltd and Deutsche Physikalische Gesellschaft

Citation: Yi-Lin Seah et al 2015 New J. Phys. 17 043018, DOI 10.1088/1367-2630/17/4/043018

Abstract

High-quality random samples of quantum states are needed for a variety of tasks in quantum information and quantum computation. Searching the high-dimensional quantum state space for a global maximum of an objective function with many local maxima or evaluating an integral over a region in the quantum state space are but two exemplary applications of many. These tasks can only be performed reliably and efficiently with Monte Carlo methods, which involve good samplings of the parameter space in accordance with the relevant target distribution. We show how the Markov-chain Monte Carlo method known as Hamiltonian Monte Carlo, or hybrid Monte Carlo, can be adapted to this context. It is applicable when an efficient parameterization of the state space is available. The resulting random walk is entirely inside the physical parameter space, and the Hamiltonian dynamics enable us to take big steps, thereby avoiding strong correlations between successive sample points while enjoying a high acceptance rate. We use examples of single and double qubit measurements for illustration.


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Our companion paper [1] states the motivation for this work and introduces the terminology and notational conventions we are using; when referring to an equation or figure in that paper, the respective number is preceded by 'I-'. While the sampling methods presented there—rejection sampling, importance sampling, and Markov-chain sampling—are easy to implement, they require a costly (in CPU time) check whether the candidate probabilities obey all constraints imposed by the positivity of the statistical operator. By contrast, in the Hamiltonian Monte Carlo (HMC) method [2, 3] that we discuss here, the constraints are always obeyed by construction.

Like the Markov-chain Monte Carlo (MCMC) method discussed in I, HMC involves a 'walk' around the parameter space. While one often needs a walk comprising many small steps for MCMC to attain a good sampling yield, HMC can be done with large steps. As a consequence, the problem of a slow exploration of the probability space, in conjunction with strong correlations between successive sample points, which is a typical feature of other random-walk strategies, is not present in HMC.

Further, HMC enables us to sample in accordance with any (reasonable) prior density and any posterior density. Some other sampling methods are very efficient for particular priors (see, e.g., [4, 5]), but lack this much-needed flexibility. The downside of HMC, however, is the requirement for a suitable parameterization for its implementation.

2. Hamiltonian Monte Carlo

2.1. The idea

The HMC method makes use of pseudo-Hamiltonian dynamics in a mock phase space and can be applied to most problems with continuous state spaces by introducing fictitious momentum variables. One way of conveying the basic idea of HMC is the following.

We begin with a trial density $f(\theta )$, supplement the position variables $\theta =({{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{S}})$ with momentum variables $\vartheta =({{\vartheta }_{1}},{{\vartheta }_{2}},...,{{\vartheta }_{S}})$, and dress up the position density $f(\theta )$ with a Gaussian momentum density to compose the initial phase-space density

$F_0(\theta,\vartheta) = f(\theta)\,(2\pi)^{-S/2}\,{\rm e}^{-\frac{1}{2}\sum_s \vartheta_s^2}$ (1)

at time t = 0. We obtain the final phase-space density at time t = T by propagating F0 to FT with the aid of Liouville's equation

$\frac{\partial}{\partial t}F_t(\theta,\vartheta) = -\mathcal{L}\,F_t(\theta,\vartheta)\,,$ (2)

where the differential operator

$\mathcal{L} = \sum_s\left(\vartheta_s\,\frac{\partial}{\partial\theta_s} + u_s(\theta)\,\frac{\partial}{\partial\vartheta_s}\right)$ (3)

involves the velocity components $\frac{\partial }{\partial {{\vartheta }_{s}}}H={{\vartheta }_{s}}$ and the force components ${{u}_{s}}(\theta )=-\frac{\partial }{\partial {{\theta }_{s}}}H=w{{(\theta )}^{-1}}\frac{\partial }{\partial {{\theta }_{s}}}w(\theta )$ associated with the Hamiltonian

$H(\theta,\vartheta) = \frac{1}{2}\sum_s \vartheta_s^2 - {\rm log}\,w(\theta)\,.$ (4)

At the final time T, we thus have

$F_T(\theta,\vartheta) = {\rm e}^{-T\mathcal{L}}\,F_0(\theta,\vartheta)\,.$ (5)

The updated position density $\tilde{f}(\theta )$ now results from integrating over the momenta, so that

$f(\theta)\;\to\;\tilde{f}(\theta) = \int({\rm d}\vartheta)\,F_T(\theta,\vartheta)$ (6)

is the net map for $f(\theta )$. Exceptional situations aside (usually they arise from an unfortunate choice for the duration T), $f(\theta )$ is a fixed point of this map only if ${{F}_{T}}={{F}_{0}}$ is a function of the Hamiltonian, which is the case if

$f(\theta)\;\dot{=}\;w(\theta)\,,$ (7)

where the dotted equal sign stands for 'equal up to a constant numerical factor'. Taking for granted without proof that the map (6) is a contraction, repeated applications of the map will thus turn $f(\theta )$ into $w(\theta )$, the desired prior or posterior density.

2.2. The implementation

While this conveys the idea of HMC, its actual implementation is not in terms of a trial sample that is iteratively updated by the mapping (6), but in terms of a Markovian random walk. At each step of the walk, the state follows a trajectory $(\theta (t),\vartheta (t))$ from the current position $\theta =\theta (t=0)$ to the proposal ${{\theta }^{*}}=\theta (t=T)$, where the components of the initial momentum $\vartheta (t=0)$ are chosen at random from the Gaussian distribution $\propto {{{\rm e}}^{-\frac{1}{2}{{\sum }_{s}}\vartheta _{s}^{2}}}$, as required by the initial phase-space density in (1). As long as the map (6) is ergodic, the time-averaged distribution of θ converges towards the stationary distribution with density $w(\theta )$, as demonstrated in section 2.1.

Here, then, is the HMC algorithm [3]:

HMC1 Start at $j=1$ with an arbitrary initial point ${{\theta }^{(1)}}$.

HMC2 Generate ${{\vartheta }^{(j)}}$ from a multivariate normal distribution with unit variance.

HMC3 Solve the Hamilton equations of motion

$\frac{{\rm d}}{{\rm d}t}\theta_s = \frac{\partial H}{\partial\vartheta_s} = \vartheta_s\,,\qquad \frac{{\rm d}}{{\rm d}t}\vartheta_s = -\frac{\partial H}{\partial\theta_s} = u_s(\theta)\,,\qquad s = 1,2,\ldots,S\,,$ (8)

for the initial conditions $(\theta ,\vartheta ){{|}_{t=0}}=\left( {{\theta }^{(j)}},{{\vartheta }^{(j)}} \right)$ and obtain $({{\theta }^{*}},{{\vartheta }^{*}})=(\theta ,-\vartheta ){{|}_{t=T}}$.

HMC4 Calculate the acceptance ratio

$a = {\rm min}\left\{1,\;{\rm e}^{H(\theta^{(j)},\,\vartheta^{(j)}) - H(\theta^*,\,\vartheta^*)}\right\}\,.$ (9)

HMC5 Draw a random number b uniformly from the range $0\lt b\lt 1$. Set ${{\theta }^{(j+1)}}={{\theta }^{*}}$ if $a\gt b$; otherwise, set ${{\theta }^{(j+1)}}={{\theta }^{(j)}}$.

HMC6 Update $j\to j+1$. Escape the loop when $j=M$, the target number of sample points; otherwise, return to HMC2.
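For orientation, here is a minimal sketch of HMC1–HMC6 in Python with numpy; the routine names are ours, and the compact leapfrog integrator is a bare-bones stand-in for the procedure detailed in the appendix, not code from this work.

import numpy as np

def leapfrog(u, theta, mom, tau, L):
    # Compact form of the leapfrog jumps of HMC3 (see the appendix):
    # half drift, L - 1 pairs of kick and drift, final kick, half drift.
    theta = theta + 0.5 * tau * mom
    for _ in range(L - 1):
        mom = mom + tau * u(theta)
        theta = theta + tau * mom
    mom = mom + tau * u(theta)
    theta = theta + 0.5 * tau * mom
    return theta, -mom  # momentum flip as in HMC3

def hmc_sample(log_w, u, theta0, M, T=1.0, L=20, seed=None):
    # log_w: log of the target density w(theta), up to a constant
    # u:     force components u_s(theta) = d/dtheta_s log w(theta)
    rng = np.random.default_rng(seed)
    H = lambda th, mo: 0.5 * np.dot(mo, mo) - log_w(th)       # equation (4)
    theta = np.array(theta0, dtype=float)                     # HMC1
    sample = [theta.copy()]
    while len(sample) < M:
        mom = rng.standard_normal(theta.size)                 # HMC2
        th_s, mom_s = leapfrog(u, theta, mom, T / L, L)       # HMC3
        a = min(1.0, np.exp(H(theta, mom) - H(th_s, mom_s)))  # HMC4, equation (9)
        if a > rng.uniform():                                 # HMC5
            theta = th_s
        sample.append(theta.copy())                           # HMC6
    return np.array(sample)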

A few remarks are in order. First, the value of the Hamiltonian is constant along the exact trajectory $(\theta (t),\vartheta (t))$ of HMC3, which would give $a=1$ in HMC4. In practice, however, we rely on an approximate trajectory; accepting Neal's advice [3], we calculate it with the leapfrog method described in the appendix, and the difference between the initial and final values of $H(\theta ,\vartheta )$ in (9) is then nonzero.

Second, HMC is an implementation of the Metropolis–Hastings Monte Carlo (MHMC) algorithm [6] that, regardless of the step size used, achieves a high acceptance rate with much weaker correlations between successive points. In MHMC, the proposal ${{\theta }^{*}}$ is accepted with probability (see (I-D.1)),

$a = {\rm min}\left\{1,\;\frac{w(\theta^*)\,J(\theta\,|\,\theta^*)}{w(\theta)\,J(\theta^*\,|\,\theta)}\right\}\,,$ (10)

where $w(\theta )$ is the target probability density, and $J({{\theta }^{*}}|\theta )$ is the probability of proposing point ${{\theta }^{*}}$ given the current point θ. The comparison with (9) establishes that

$\frac{J(\theta\,|\,\theta^*)}{J(\theta^*\,|\,\theta)} = {\rm e}^{-\frac{1}{2}\sum_s\left[(\vartheta_s^*)^2 - \vartheta_s^2\right]}$ (11)

in HMC where ϑ is the randomly chosen initial momentum and ${{\vartheta }^{*}}$ is the negative of the resulting final momentum; the otherwise irrelevant minus sign in ${{\vartheta }^{*}}=-\vartheta (t=T)$ in HMC3 ensures the symmetry of the process, thereby exploiting the time-reversal invariance of the equations of motion (8). In effect, then, HMC achieves an acceptance rate close to 1 by a proposal scheme where $\frac{J(\theta |{{\theta }^{*}})}{J({{\theta }^{*}}|\theta )}$ is close to $\frac{w(\theta )}{w({{\theta }^{*}})}$.

Third, rather than the 'kinetic energy' $\frac{1}{2}{{\sum }_{s}}\vartheta _{s}^{2}$ of (4), we could use a general quadratic form $\frac{1}{2}{{\sum }_{s,s^{\prime} }}{{\vartheta }_{s}}{{\mu }_{s,s^{\prime} }}{{\vartheta }_{{{s}^{{\prime} }}}}$ where the coefficients ${{\mu }_{s,s^{\prime} }}$ are the entries of a symmetric positive-definite S × S matrix; HMC2 and the velocities in HMC3 are then changed accordingly. Whether or not this freedom in choosing the matrix μ offers an advantage depends on the structure of the 'potential energy' $-{\rm log} w(\theta )$. One would usually carry over any symmetries in the potential energy to the kinetic energy; for instance, if $w(\theta )$ is invariant under the interchange ${{\theta }_{1}}\leftrightarrow {{\theta }_{2}}$, the kinetic energy should treat ${{\theta }_{1}}$ and ${{\theta }_{2}}$ on equal footing. The kinetic energy of (4) is used for all the examples in section 4.

3. State parameterization

In [1], all sampling is done in the probability space. Unless the circumstances are so simple that we can state explicitly all constraints obeyed by the probabilities and do not need to execute a physicality check of the kind discussed in appendix A of [1], it is not feasible to implement the HMC random walk for variables θ that are (a non-redundant subset of the) probabilities. Rather, we parameterize the statistical operator ρ, and the θ dependence of the probabilities then follows from the Born rule.

For a d-level quantum system, the statistical operator ρ is represented by a hermitian unit-trace $d\times d$ matrix that has $S={{d}^{2}}-1$ real parameters. The matrix of the arbitrary operator A in $\rho ={{A}^{\dagger }}A/{\rm tr}\{{{A}^{\dagger }}A\}$, however, has 2d2 real parameters, of which ${{d}^{2}}+1$ are superfluous. We get rid of them by restricting the A matrix to upper-triangular (or lower-triangular, as in [7]) form with real diagonal elements; this reduces the count of real parameters to d2. One more parameter is removed by enforcing that

${\rm tr}\{A^{\dagger}A\} = 1$ (12)

holds, which requires that the moduli of the elements of A are points on a $\left( \frac{1}{2}d(d+1)-1 \right)$-sphere. We parameterize this sphere with spherical coordinates—angle parameters ${{\theta }_{1}},{{\theta }_{2}},\ldots ,{{\theta }_{\frac{1}{2}(d+2)(d-1)}}$—with the Cartesian coordinates ${{C}_{1}},{{C}_{2}},\ldots ,{{C}_{\frac{1}{2}(d+2)(d-1)}},{{S}_{\frac{1}{2}(d+2)(d-1)}}$ recursively defined by

$C_k = S_{k-1}\,{\rm cos}\,\theta_k\,,\qquad S_k = S_{k-1}\,{\rm sin}\,\theta_k\,,\qquad S_0 = 1\,,\qquad k = 1,2,\ldots,\tfrac{1}{2}(d+2)(d-1)\,.$ (13)

We fill up the upper-triangle of matrix A with the Cartesian coordinates, which are $\frac{1}{2}(d+2)(d-1)+1=\frac{1}{2}d(d+1)$ in number, and supplement the off-diagonal entries with phase factors

$E_k = {\rm e}^{{\rm i}\theta_k}\qquad {\rm for}\ k = \tfrac{1}{2}(d+2)(d-1)+1,\ldots,d^2-1\,.$ (14)

The cases $d=2$, $d=3$, and $d=4$ illustrate the matter:

$d=2:\ A = \begin{pmatrix} C_1 & C_2E_3 \\ 0 & S_2 \end{pmatrix}\,,\qquad d=3:\ A = \begin{pmatrix} C_1 & C_2E_6 & C_3E_7 \\ 0 & C_4 & C_5E_8 \\ 0 & 0 & S_5 \end{pmatrix}\,,\qquad d=4:\ A = \begin{pmatrix} C_1 & C_2E_{10} & C_3E_{11} & C_4E_{12} \\ 0 & C_5 & C_6E_{13} & C_7E_{14} \\ 0 & 0 & C_8 & C_9E_{15} \\ 0 & 0 & 0 & S_9 \end{pmatrix}\,.$ (15)

Other ways of assigning the Cartesian coordinates to the upper-triangle entries and the phase factors to the off-diagonal entries are, of course, possible and give equally valid parameterizations. The assignment chosen is

Equation (16)

with

Equation (17)
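To make the recursion (13) and the phase factors (14) concrete, here is a small sketch for the d = 2 case; the row-wise filling of the upper triangle matches the d = 2 matrix of (15) as displayed above, and the helper names are ours.

import numpy as np

def sphere_coords(angles):
    # Recursion of equation (13): C_k = S_{k-1} cos(theta_k),
    # S_k = S_{k-1} sin(theta_k), starting from S_0 = 1.
    C, S = [], 1.0
    for th in angles:
        C.append(S * np.cos(th))
        S *= np.sin(th)
    return C, S

def rho_qubit(theta1, theta2, theta3):
    # d = 2: two angle parameters for the moduli, one phase E_3 = exp(i theta_3);
    # the row-wise assignment below is our illustrative choice.
    (C1, C2), S2 = sphere_coords([theta1, theta2])
    A = np.array([[C1, C2 * np.exp(1j * theta3)],
                  [0.0, S2]])
    return A.conj().T @ A  # rho = A^dagger A, unit trace by equation (12)

A quick check: np.trace(rho_qubit(0.3, 0.8, 1.2)) returns 1 up to rounding, as (12) demands.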

Upon expressing the probabilities $p=({{p}_{1}},...,{{p}_{K}})$ in terms of the parameters $\theta =({{\theta }_{1}},...,{{\theta }_{S}})$ with $S={{d}^{2}}-1$, the step-function constraints in ${{w}_{{\rm cstr}}}(p)$ of (I-6) are taken care of. Then, integrating out the delta-function constraints gets rid of redundant probability variables. Finally, we need the Jacobian between the remaining pks and the ${{\theta }_{s}}$s to convert the prior or posterior density in p into the corresponding expression in θ

$w(\theta) = w(p)\,\left|\frac{\partial(p_{k_1},\ldots,p_{k_S})}{\partial(\theta_1,\ldots,\theta_S)}\right|\,,$ (18)

where $p_{k_1},\ldots,p_{k_S}$ are the remaining independent probabilities.

The HMC algorithm can now be executed for ${{u}_{s}}(\theta )=\frac{\partial }{\partial {{\theta }_{s}}}{\rm log} w(\theta )$.

As stated, the parameterization of (12) and (16) with (13), (14), and (17) is applicable to POMs that are informationally complete, for which the whole state space is the reconstruction space. If the POM is not informationally complete, it may still be possible to use A of (16) with fixed values for some of the ${{\theta }_{s}}$s. An example is the trine measurement for a qubit, for which the $2\times 2$ matrix in (15) with ${{\theta }_{1}}=\frac{\pi }{4}$ does the job; see section 4.2. If no such restricted version of (16) serves the purpose, it may be possible to introduce additional parameters during the HMC sampling, thus produce a sample in a space of higher dimension, then marginalize the auxiliary parameters, and so arrive at a proper sample; see section 4.3 for an example.

4. Examples

The parameterization of first the statistical operator ρ and then the probabilities p in terms of the angles θ, while systematic, tends to be involved and does not lend itself to simplification when explicit expressions are needed. Therefore, we shall only present detailed examples for the $d=2$ case of a qubit (sections 4.1 and 4.2), while providing an example with incomplete tomographic data for the $d=4$ case of a qubit pair (section 4.3).

4.1. Qubit: informationally complete POMs

For the $d=2$ case of a qubit

$\rho = A^{\dagger}A = \frac{1}{2}\left(1 + x\sigma_x + y\sigma_y + z\sigma_z\right)$ (19)

with the $2\times 2$ matrix of (15) referring to the basis in which the Pauli operators ${{\sigma }_{x}}$, ${{\sigma }_{y}}$, and ${{\sigma }_{z}}$ have their standard form, we have

$x = \langle\sigma_x\rangle = {\rm sin}(2\theta_1)\,{\rm cos}\,\theta_2\,{\rm cos}\,\theta_3\,,\qquad y = \langle\sigma_y\rangle = {\rm sin}(2\theta_1)\,{\rm cos}\,\theta_2\,{\rm sin}\,\theta_3\,,\qquad z = \langle\sigma_z\rangle = {\rm cos}(2\theta_1)$ (20)

for the expectation values of the Pauli operators. As it should, the sum of their squares

$x^2 + y^2 + z^2 = 1 - {\rm sin}^2(2\theta_1)\,{\rm sin}^2\theta_2$ (21)

takes on all values between 0 and 1. For later use, we note that

$({\rm d}x\,{\rm d}y\,{\rm d}z) = {\rm d}\theta_1\,{\rm d}\theta_2\,{\rm d}\theta_3\;{\rm sin}^3(2\theta_1)\,{\rm sin}(2\theta_2)\,,$ (22)

which exhibits the Jacobian factor that relates the $x,y,z$ parameterization of ρ in (19) to the θ parameterization of (12) with (16).
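Reconstructed relations like (20)–(22) are easy to verify numerically; the following sketch compares a finite-difference Jacobian determinant with the factor of (22) at a random interior point.

import numpy as np

def bloch(theta):
    # x, y, z of equation (20)
    t1, t2, t3 = theta
    return np.array([np.sin(2*t1) * np.cos(t2) * np.cos(t3),
                     np.sin(2*t1) * np.cos(t2) * np.sin(t3),
                     np.cos(2*t1)])

rng = np.random.default_rng(7)
theta = rng.uniform(0.1, np.pi/2 - 0.1, size=3)
eps = 1e-6
J = np.column_stack([(bloch(theta + eps*e) - bloch(theta - eps*e)) / (2*eps)
                     for e in np.eye(3)])
# the two numbers agree up to the discretization error of the finite differences
print(abs(np.linalg.det(J)), np.sin(2*theta[0])**3 * np.sin(2*theta[1]))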

We consider two POMs, the four-outcome tetrahedron POM of minimal qubit tomography [8] and the six-outcome Pauli POM that measures in three mutually unbiased bases. For the tetrahedron POM, we have the probabilities

$p_k = \frac{1}{4}\left(1 + \vec{a}_k\cdot\vec{r}\,\right)\,,\qquad k = 1,2,3,4\,,$ (23)

where $\vec{r} = (x,y,z)$ and the $\vec{a}_k$ are the unit vectors of the tetrahedron, $\vec{a}_j\cdot\vec{a}_k = \frac{1}{3}(4\delta_{jk}-1)$.

The corresponding constraint factor ${{w}_{{\rm cstr}}}(p)$ of (I-6) contains ${{w}_{{\rm basic}}}(p)$ of (I-5) for $K=4$ and

$\eta\!\left(\frac{1}{3} - \sum_{k=1}^{4} p_k^2\right)\,.$ (24)

By integrating out p4 with the help of the delta function in ${{w}_{{\rm basic}}}(p)$, we reduce the volume element in the probability space of (I-7) to

${\rm d}p_1\,{\rm d}p_2\,{\rm d}p_3\;\eta\!\left(\frac{1}{3} - \sum_{k=1}^{4} p_k^2\right)$ (25)

with ${{p}_{4}}=1-{{p}_{1}}-{{p}_{2}}-{{p}_{3}}$; the pk values selected by the first step function are non-negative, so that we can omit the $\eta ({{p}_{k}})$ factors.

Since the Jacobian between ${{p}_{1}},{{p}_{2}},{{p}_{3}}$ and $x,y,z$ does not depend on the coordinates, we have

${\rm d}p_1\,{\rm d}p_2\,{\rm d}p_3\;\dot{=}\;{\rm d}x\,{\rm d}y\,{\rm d}z\,,$ (26)

and

$({\rm d}p)\,w_{\rm cstr}(p)\;\ddot{=}\;{\rm d}x\,{\rm d}y\,{\rm d}z\;\eta(1 - x^2 - y^2 - z^2)$ (27)

follows, where the doubly dotted equal sign says 'essentially equal in the sense of ignoring constant factors and reducing the number of variables such that the remaining ones are independent'. The final step uses (21) and (22) to arrive at

$({\rm d}p)\,w_{\rm cstr}(p)\;\ddot{=}\;{\rm d}\theta_1\,{\rm d}\theta_2\,{\rm d}\theta_3\;{\rm sin}^3(2\theta_1)\,{\rm sin}(2\theta_2)$ (28)

for the tetrahedron POM.

For the Pauli POM, we have the six probabilities

$p_{1,2} = \frac{1}{6}(1 \pm x)\,,\qquad p_{3,4} = \frac{1}{6}(1 \pm y)\,,\qquad p_{5,6} = \frac{1}{6}(1 \pm z)$ (29)

for which

$\delta\!\left(p_1 + p_2 - \frac{1}{3}\right)\delta\!\left(p_3 + p_4 - \frac{1}{3}\right)$ (30)

and

$\eta\!\left(1 - 9\left[(p_1 - p_2)^2 + (p_3 - p_4)^2 + (p_5 - p_6)^2\right]\right)$ (31)

are the factors in ${{w}_{{\rm cstr}}}(p)$. The analog of (25) is

${\rm d}p_1\,{\rm d}p_3\,{\rm d}p_5\;\eta\!\left(1 - 9\left[(2p_1 - \tfrac{1}{3})^2 + (2p_3 - \tfrac{1}{3})^2 + (2p_5 - \tfrac{1}{3})^2\right]\right)$ (32)

with $p_2 = \frac{1}{3} - p_1$, $p_4 = \frac{1}{3} - p_3$, $p_6 = \frac{1}{3} - p_5$,

and (26)–(28) hold here as well. That is, whether we use the tetrahedron POM or the Pauli POM, the volume element for physical probabilities is that of (28).

Therefore, if we are sampling in accordance with the primitive prior ${{w}_{0}}(p)=1$, there is no difference between these two POMs. We just have

$w(\theta) = {\rm sin}^3(2\theta_1)\,{\rm sin}(2\theta_2)$ (33)

in (4), (8), and (A.2) of the HMC algorithm. The sample of 50 000 points thus produced is reported in figure 1. This sample is a collection of $(x,y,z)$ values inside the Bloch sphere; it is converted into a p sample by (23) or (29), respectively.
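For a concrete run, one can reuse hmc_sample from the sketch in section 2.2 and bloch from the check after (22); the force components below follow from (33) by elementary differentiation. The values of T and L are illustrative and would be tuned as described in the appendix.

import numpy as np

# log w(theta) = 3 log sin(2 theta_1) + log sin(2 theta_2), from (33) up to a constant
log_w = lambda th: 3*np.log(np.abs(np.sin(2*th[0]))) + np.log(np.abs(np.sin(2*th[1])))

# u_s(theta) = d/dtheta_s log w(theta); theta_3 does not appear in w(theta),
# and the force diverges at the range borders, where the density (33) vanishes
u = lambda th: np.array([6.0/np.tan(2*th[0]), 2.0/np.tan(2*th[1]), 0.0])

sample = hmc_sample(log_w, u, theta0=[0.7, 0.8, 0.9], M=50_000, T=1.0, L=20)
xyz = np.apply_along_axis(bloch, 1, sample)   # (x, y, z) points inside the Bloch ball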

Figure 1.

Figure 1. Sample of 50 000 points for the primitive prior of the single-qubit tetrahedron or Pauli POM. Plot (a) shows the distribution generated by the HMC algorithm, while plot (b) is the theoretical distribution. The histograms display the counts of sample points projected onto the unit circle in the xy-plane, represented by the green circle. The projected prior element is proportional to ${\rm d}x\;{\rm d}y\;\sqrt{1-{{x}^{2}}-{{y}^{2}}}$ inside the unit circle and vanishes outside. Statistical noise aside, the generated sample is consistent with the theoretical one.


Figure 2 shows trajectories of 50 sample points, with consecutive points connected by blue lines, generated using the HMC algorithm as well as the xMHMC algorithm from I. For the HMC sample, we see that even this short trajectory samples the unit circle rather efficiently, with the points far apart from their predecessors and successors. Such a trajectory is rather different from that of the xMHMC algorithm, where consecutive points are often close together. Indeed, HMC overcomes the problem of the strong autocorrelation that figure I-4 shows. The high acceptance rate of about 95% in the run that produced figures 1 and 2(a) is another advantage of HMC over xMHMC, which had an acceptance rate of about 60% in the run that produced figure 2(b). In higher-dimensional cases, the optimal acceptance rates should be about 65% and 25% for the HMC and xMHMC algorithms, respectively. And, to repeat, since all generated points are physical by construction, there is no need for the CPU-time-consuming check of physicality that is necessary in the xMHMC algorithm.

Figure 2.

Figure 2. Sample trajectories for the HMC (a) and xMHMC (b) algorithms. From the run that produced the sample for the histogram in figure 1, we show in plot (a) 50 consecutive sample points connected by straight lines to guide the eye. For comparison, plot (b) depicts another 50 consecutive sample points obtained using the xMHMC algorithm. In both plots, the black circles indicate the boundary of the physical region.


4.2. Qubit: informationally incomplete POMs

We consider two POMs that are not informationally complete, as they provide no information about z: the three-outcome trine measurement with the probabilities of (I-4) and the constraint factors

${{w}_{{\rm basic}}}(p)$ of (I-5) for $K=3$ and $\eta\!\left(\frac{1}{2} - \sum_{k=1}^{3} p_k^2\right)\,,$ (34)

and the four-outcome crosshair POM with the probabilities

$p_{1,2} = \frac{1}{4}(1 \pm x)\,,\qquad p_{3,4} = \frac{1}{4}(1 \pm y)\,,$ (35)

and the constraint factors

${{w}_{{\rm basic}}}(p)$ of (I-5) for $K=4$, $\delta\!\left(p_1 + p_2 - \frac{1}{2}\right)$, and $\eta\!\left(1 - 4\left[(p_1 - p_2)^2 + (p_3 - p_4)^2\right]\right)\,.$ (36)

There are many different reconstruction spaces that we can use; two choices that suggest themselves are the equatorial plane of the Bloch ball

$z = 0:\qquad \rho = \frac{1}{2}\left(1 + x\sigma_x + y\sigma_y\right)\,,$ (37)

and the upper half of the Bloch sphere

$z = \sqrt{1 - x^2 - y^2}:\qquad \rho = \frac{1}{2}\left(1 + x\sigma_x + y\sigma_y + \sqrt{1 - x^2 - y^2}\,\sigma_z\right)\,,$ (38)

with ${{x}^{2}}+{{y}^{2}}\leqslant 1$ for both. The ${{\theta }_{1}}=\frac{\pi }{4}$ version of (20) realizes (37)

$x = {\rm cos}\,\theta_2\,{\rm cos}\,\theta_3\,,\qquad y = {\rm cos}\,\theta_2\,{\rm sin}\,\theta_3\,,\qquad z = 0\,,$ (39)

and ${{\theta }_{2}}=0$ together with $-\frac{1}{4}\pi \leqslant {{\theta }_{1}}\leqslant \frac{1}{4}\pi $ fits to (38)

$x = {\rm sin}(2\theta_1)\,{\rm cos}\,\theta_3\,,\qquad y = {\rm sin}(2\theta_1)\,{\rm sin}\,\theta_3\,,\qquad z = {\rm cos}(2\theta_1)\,.$ (40)

For the primitive prior density, these give

$w(\theta)\;\dot{=}\;{\rm sin}(2\theta_2)$ for (39), $\qquad w(\theta)\;\dot{=}\;\left|{\rm sin}(4\theta_1)\right|$ for (40), (41)

for both the trine POM and the crosshair POM.

Figure 3 shows the histogram of a sample of 50 000 points in accordance with a posterior density. This example is for the trine POM with the primitive prior and the parameterization of (39), and the data are $\{{{n}_{1}},{{n}_{2}},{{n}_{3}}\}=\{8,5,11\}$. Specifically, we have

Equation (42)

with

Equation (43)

and

Equation (44)

in (4), (8), and (A.2) of the HMC algorithm. This sample consists of (x, y) pairs in the unit circle and conversion into a sample of ps is done with (I-4).

Figure 3.

Figure 3. Random sample of 50 000 points for a posterior distribution generated by the HMC algorithm. The data are $\{{{n}_{1}},{{n}_{2}},{{n}_{3}}\}=\{8,5,11\}$ for the trine POM with the primitive prior.

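The run behind figure 3 needs the posterior density and its gradient. A sketch follows, in which we assume the conventional trine probabilities $p_k = \frac{1}{3}\left[1 + x\,{\rm cos}(2\pi k/3) + y\,{\rm sin}(2\pi k/3)\right]$; this is our reading of (I-4), not a quotation of it, and the finite-difference force is a stand-in for the analytic derivatives.

import numpy as np

alpha = 2*np.pi*np.arange(1, 4)/3   # assumed trine directions, our reading of (I-4)

def trine_p(x, y):
    # hypothetical trine probabilities p_k = [1 + x cos(alpha_k) + y sin(alpha_k)]/3
    return (1 + x*np.cos(alpha) + y*np.sin(alpha)) / 3

def log_posterior(theta, n):
    # parameterization (39) with its prior factor from (41), times the likelihood
    t2, t3 = theta
    x, y = np.cos(t2)*np.cos(t3), np.cos(t2)*np.sin(t3)
    return np.log(np.abs(np.sin(2*t2))) + np.sum(n*np.log(trine_p(x, y)))

def force(theta, n, eps=1e-6):
    # finite-difference gradient as a quick stand-in for the analytic u_s(theta)
    g = np.zeros(2)
    for s in range(2):
        e = np.zeros(2); e[s] = eps
        g[s] = (log_posterior(theta + e, n) - log_posterior(theta - e, n)) / (2*eps)
    return g

With n = np.array([8, 5, 11]), a call like hmc_sample(lambda th: log_posterior(th, n), lambda th: force(th, n), theta0=[0.8, 0.8], M=50_000) then mimics the run behind figure 3.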

We note that the algorithm that samples in accordance with a posterior specified by the primitive prior and data $D=\{{{n}_{1}},{{n}_{2}},{{n}_{3}}\}$ can also sample according to the primitive prior itself, by simply putting $D=\{0,0,0\}$. Further, for $D=\{-\frac{1}{2},-\frac{1}{2},-\frac{1}{2}\}$ it samples in accordance with the Jeffreys prior, and running the algorithm for $D=\{{{n}_{1}}-\frac{1}{2},{{n}_{2}}-\frac{1}{2},{{n}_{3}}-\frac{1}{2}\}$ gives a posterior sample for the Jeffreys prior. Likewise, conjugate priors specified by mock data $\{{{\nu }_{1}},{{\nu }_{2}},{{\nu }_{3}}\}$ can be handled by the same algorithm. Analogous remarks apply to POMs with more outcomes.
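In code, this remark amounts to nothing more than changing the count vector handed to log_posterior of the preceding sketch:

import numpy as np

n = np.array([8, 5, 11])
primitive = lambda th: log_posterior(th, 0*n)        # D = {0, 0, 0}
jeffreys  = lambda th: log_posterior(th, 0*n - 0.5)  # D = {-1/2, -1/2, -1/2}
jeffpost  = lambda th: log_posterior(th, n - 0.5)    # posterior for the Jeffreys prior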

4.3. Qubit pair: double-crosshair POM of BB84

In the BB84 scenario of quantum key distribution, the two parties use two four-outcome crosshair measurements and so have 16 outcomes for the composed POM. In a self-explaining notation that relies on two copies of (35), we denote the joint probabilities by pjk with $j,k=1,2,3,4$, eight of which are independent. One reconstruction space can be specified by 4 × 4 matrices of the form

Equation (45)

with ${{p}_{{\rm even}}}={{p}_{22}}-{{p}_{24}}-{{p}_{42}}+{{p}_{44}}$. This unit-trace real symmetric matrix has nine real parameters, eight determined by the probabilities of the double-crosshair POM. While they do not fix the value of the ninth parameter $q={\rm tr}\left\{ \rho \sigma _{z}^{(1)}\otimes \sigma _{z}^{(2)} \right\}$, they determine the range of permissible q values

$q_{\rm min}(p)\;\leqslant\; q\;\leqslant\; q_{\rm max}(p)\,.$ (46)

The p-dependent bounds ${{q}_{{\rm min} /{\rm max} }}(p)$ can be found by requiring $\rho \geqslant 0$, as will be discussed shortly. By choosing a specific q value—perhaps the largest permissible value ${{q}_{{\rm max} }}(p)$, or the value closest to 0—we get a definite reconstruction space. This is, however, awkward if we want to sample by the HMC algorithm.

In order to find ${{q}_{{\rm min} /{\rm max} }}(p)$, we demand that all the eigenvalues of ρ be non-negative. The determinant of ρ is a fourth-order polynomial in q

Equation (47)

Since the determinant is the product of the eigenvalues, its roots are the q values for which one of the eigenvalues of $\rho (q)$ equals zero. Hence, the range of physical q is bounded by roots of (47). The positive coefficient of the $q^4$ term implies that the determinant is positive for large $|q|$. Given that the roots are the q values where the determinant changes sign, the only other region where the determinant can be positive is between the second and third roots. From (45), it is clear that for large $|q|$, ρ has two positive and two negative eigenvalues, so the large-$|q|$ regions are unphysical. Hence, the only permissible region is the one bounded by the second and third roots of (47), and finding these roots gives us ${{q}_{{\rm min} }}$ and ${{q}_{{\rm max} }}$.
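With a polynomial solver, this root-finding takes only a few lines; the name q_bounds is ours, and the coefficient vector c of the quartic (47) would be assembled from the given probabilities.

import numpy as np

def q_bounds(c):
    # c = (c4, c3, c2, c1, c0): coefficients of the quartic det rho(q) of (47);
    # the argument above presumes four real roots, with det rho(q) >= 0
    # precisely between the second and third of them
    roots = np.sort(np.roots(c).real)
    return roots[1], roots[2]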

Since we get the matrix ρ in (45) from the 4 × 4 matrix in (15) by setting ${{E}_{10}}=\cdots ={{E}_{15}}=1$, which leaves us with the nine angle parameters ${{\theta }_{1}},{{\theta }_{2}},...,{{\theta }_{9}}$, we supplement the eight-dimensional probability element $({\rm d}p)w(p)$ with a factor for q

$({\rm d}p)\,w(p) = ({\rm d}p)\int_{q_{\rm min}(p)}^{q_{\rm max}(p)}{\rm d}q\;\frac{w(p)}{q_{\rm max}(p) - q_{\rm min}(p)} \equiv ({\rm d}p)\,{\rm d}q\;w(p,q)$ (48)

and sample from the nine-dimensional target density $w(p,q)$. Upon ignoring the q values of the sample points $\left( {{p}^{(j)}},{{q}^{(j)}} \right)$, we get the wanted sample of probabilities ${{p}^{(j)}}$. Ideally, we would like to sample from $w(p,q)$ by means of HMC sampling. However, because ${{q}_{{\rm min} /{\rm max} }}(p)$ take very complicated forms, it would be very inconvenient to compute the derivatives of the potential that HMC requires. What can be done more simply is to perform HMC to sample from the distribution ${{w}^{*}}(p,q)=w(p)$, followed by importance sampling (as described in [1]) that assigns each point the weight ${{\left( {{q}_{{\rm max} }}(p)-{{q}_{{\rm min} }}(p) \right)}^{-1}}$, effectively leaving us with the desired distribution $w(p,q)$ of (48).
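A sketch of the two-stage procedure just described; q_bounds is reused from above, and det_coeffs(p), which would return the coefficients of (47) for given probabilities, is an assumed helper.

import numpy as np

def is_weights(p_sample):
    # importance weights (q_max - q_min)^(-1) for an HMC sample from w*(p, q) = w(p)
    w = []
    for p in p_sample:
        q_lo, q_hi = q_bounds(det_coeffs(p))
        w.append(1.0/(q_hi - q_lo))
    return np.array(w)/np.sum(w)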

We proceed to analyze the distribution of the Clauser–Horne–Shimony–Holt (CHSH) quantity S obtained using our sampling method. Here, S is defined as [9, 10]

$S = E(A_1,B_1) + E(A_1,B_2) + E(A_2,B_1) - E(A_2,B_2)\,,$ (49)

where A1 and A2 are observables on the first qubit, while B1 and B2 are observables on the second qubit, each having eigenvalues of ±1. For a state ρ, $E(A,B)$ is given by

$E(A,B) = {\rm tr}\{\rho\,(A\otimes B)\}\,.$ (50)

The CHSH quantity ranges from $-2\sqrt{2}$ to $2\sqrt{2}$, with values $|S|\gt 2$ being evidence of quantum entanglement between the qubits. For this example, we restrict measurements Ai and Bj to the form

Equation (51)

In general, S depends on ${{\phi }_{i}}$ and ${{\psi }_{j}}$. For any state ρ, S can be maximized by optimizing ${{\phi }_{i}}$ and ${{\psi }_{j}}$, which gives

Equation (52)

We begin by fixing ${{\phi }_{1}}=0$, ${{\phi }_{2}}=\pi /2$, ${{\psi }_{1}}=5\pi /4$, and ${{\psi }_{2}}=3\pi /4$. With this setting, S can be expressed in terms of our detector probabilities

Equation (53)

In figure 4(a), the resulting distribution of the CHSH quantity is shown for three different samples. The first, in blue, is drawn from the primitive prior. The next, in green, is drawn from the posterior obtained from data that corresponds to the triplet state $\frac{1}{\sqrt{2}}(|10\rangle +|01\rangle )$ with noise. Lastly, the red distribution is drawn from the posterior density obtained from data corresponding to the singlet state $\frac{1}{\sqrt{2}}(|10\rangle -|01\rangle )$. Each sample contains 50 000 sample points, and the data sets used for the posterior densities are made up of 64 measured qubit pairs each. It can be seen that although the triplet state has as much entanglement as the singlet state, the CHSH quantities for the triplet sample are much closer to 0 for the unoptimized measurements. In figure 4(b), we compare the quantity ${{S}^{2}}/4$ obtained by using the fixed measurement (in green) against that obtained from the maximized CHSH quantity (in magenta) for the sample drawn from the posterior density of the triplet state with noise. This quantity ${{S}^{2}}/4$ takes values between 0 and 2, with values larger than 1 being evidence for quantum entanglement. For the fixed measurement of (53), the values that exceed ${{S}^{2}}/4=1$ are extremely rare. By contrast, there is a large fraction of values with ${{S}^{2}}/4\gt 1$ for the optimized S value of (52).

Figure 4.

Figure 4. Distribution of the CHSH quantity S for various samples. In (a), the distributions of S for samples from three different distributions are shown for comparison. In (b), we plot ${{S}^{2}}/4$ for the triplet posterior sample, using the fixed measurements (green) and the optimized ones (magenta).


In closing, we note another use of (45). It can provide a physicality check for candidate probabilities pjk that is quite different from, and much simpler than, the algorithm discussed in appendix A in [1]. Given pjks that obey all the basic constraints, they are physical if there is a q value for which $\rho \geqslant 0$; otherwise, they are not physical.

5. Conclusion

We have adapted the HMC sampling algorithm to the problem of sampling quantum state spaces, where the constraints that result from the positivity of the statistical operator must be obeyed throughout. To ensure this, we use a systematic parameterization of the statistical operator without any superfluous parameters. This can always be done for informationally complete measurements, and is also possible when the measurement is not informationally complete as long as a suitable parameterization of the relevant subspace of the state space is at hand. In all other cases, one must resort to other sampling algorithms, such as the ones we discuss in [1].

Acknowledgments

We thank Michael Evans and Yong Siah Teo for stimulating discussions. This work is funded by the Singapore Ministry of Education (partly through the Academic Research Fund Tier 3 MOE2012-T3-1-009) and the National Research Foundation of Singapore.

Appendix. The leapfrog method

The leapfrog method is a split-operator method much like those based on the Trotter–Suzuki formula (see, e.g., [11]). For a small time increment τ, we advance $(\theta (t),\vartheta (t))$ to $(\theta (t+\tau ),\vartheta (t+\tau ))$ in three steps: by letting only the kinetic energy $\frac{1}{2}{{\sum }_{s}}\vartheta _{s}^{2}$ govern the evolution for duration $\frac{1}{2}\tau $ in the first and third steps, whereas only the potential energy −${\rm log} w(\theta )$ is relevant in the intermediate second step of duration τ. In total, this amounts to

$\theta(t+\tau) = \theta(t) + \tau\,\vartheta(t) + \frac{1}{2}\tau^2\,u\big(\theta(t) + \tfrac{1}{2}\tau\,\vartheta(t)\big)\,,\qquad \vartheta(t+\tau) = \vartheta(t) + \tau\,u\big(\theta(t) + \tfrac{1}{2}\tau\,\vartheta(t)\big)\,,$ (A.1)

where the right-hand sides are correct to order ${{\tau }^{2}}$ and have an error $\propto {{\tau }^{3}}$.

We break up the total duration T into L time intervals $\tau =T/L$ so that L leapfrog jumps accomplish HMC3 with a discretization error $\propto L{{\tau }^{3}}={{T}^{3}}/{{L}^{2}}$. The two adjacent kinetic-energy $\frac{1}{2}\tau $-periods of subsequent jumps are conveniently combined into a single period of a full τ. Accordingly, we find the final $({{\theta }^{*}},{{\vartheta }^{*}})$ pair from the initial $(\theta ,\vartheta )$ pair by this procedure:

LF1 Set $({{\theta }_{1}},{{\vartheta }_{1}})=(\theta +\frac{1}{2}\tau \vartheta ,\vartheta )$.

LF2 For $j=1,2,3,...,2L-1$, set

$(\theta_{j+1},\vartheta_{j+1}) = \left\{\begin{array}{ll} (\theta_j,\;\vartheta_j + \tau\,u(\theta_j)) & {\rm for\ odd}\ j\,, \\ (\theta_j + \tau\,\vartheta_j,\;\vartheta_j) & {\rm for\ even}\ j\,. \end{array}\right.$ (A.2)

LF3 Set $({{\theta }^{*}},{{\vartheta }^{*}})=({{\theta }_{2L}}+\frac{1}{2}\tau {{\vartheta }_{2L}},-{{\vartheta }_{2L}})$.
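In code, LF1–LF3 read as follows, with (A.2) taken as the alternation of kicks and drifts reconstructed above; the routine matches the compact leapfrog used in the sketch of section 2.2.

def leapfrog_lf(u, theta, mom, tau, L):
    theta = theta + 0.5*tau*mom                  # LF1: initial half drift
    for j in range(1, 2*L):                      # LF2: 2L - 1 alternating updates
        if j % 2:                                # odd j: kick by the force u
            mom = mom + tau*u(theta)
        else:                                    # even j: drift with constant momentum
            theta = theta + tau*mom
    return theta + 0.5*tau*mom, -mom             # LF3: final half drift, momentum flip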

We note an important property of the leapfrog method: the individual steps in LF1–3 are shearing transformations that preserve phase-space volumes, so that the approximate solution of (8) is volume-preserving just like the exact one (Liouville's theorem).

According to Neal [3], there is a trade-off between accuracy in the propagation and CPU time consumed. A good choice of L is such that the acceptance rate (10) is about 65% for high-dimensional problems. Further, if one observes slow convergence, the likely cause is nonergodicity, which can be cured by randomly choosing τ and L from fairly small intervals [12].
