Abstract
High-quality random samples of quantum states are needed for a variety of tasks in quantum information and quantum computation. Searching the high-dimensional quantum state space for a global maximum of an objective function with many local maxima or evaluating an integral over a region in the quantum state space are but two exemplary applications of many. These tasks can only be performed reliably and efficiently with Monte Carlo methods, which involve good samplings of the parameter space in accordance with the relevant target distribution. We show how the Markov-chain Monte Carlo method known as Hamiltonian Monte Carlo, or hybrid Monte Carlo, can be adapted to this context. It is applicable when an efficient parameterization of the state space is available. The resulting random walk is entirely inside the physical parameter space, and the Hamiltonian dynamics enable us to take big steps, thereby avoiding strong correlations between successive sample points while enjoying a high acceptance rate. We use examples of single- and double-qubit measurements for illustration.
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
Our companion paper [1] states the motivation for this work and introduces the terminology and notational conventions we are using; when referring to an equation or figure in that paper, the respective number is preceded by 'I-'. While the sampling methods presented there—rejection sampling, importance sampling, and Markov-chain sampling—are easy to implement, they require a costly (in CPU time) check whether the candidate probabilities obey all constraints imposed by the positivity of the statistical operator. By contrast, in the Hamiltonian Monte Carlo (HMC) method [2, 3] that we discuss here, the constraints are always obeyed by construction.
Like the Markov-chain Monte Carlo (MCMC) method discussed in I, HMC involves a 'walk' around the parameter space. While one often needs a walk comprising many small steps for MCMC to attain a good sampling yield, HMC can be done with large steps. As a consequence, the problem of a slow exploration of the probability space, in conjunction with strong correlations between successive sample points, which is a typical feature of other random-walk strategies, is not present in HMC.
Further, HMC enables us to sample in accordance with any (reasonable) prior density and any posterior density. Some other sampling methods are very efficient for particular priors (see, e.g., [4, 5]), but lack this much-needed flexibility. The downside of HMC, however, is the requirement for a suitable parameterization for its implementation.
2. Hamiltonian Monte Carlo
2.1. The idea
The HMC method makes use of pseudo-Hamiltonian dynamics in a mock phase space and can be applied to most problems with continuous state spaces by introducing fictitious momentum variables. One way of conveying the basic idea of HMC is the following.
We begin with a trial density , supplement the position variables with momentum variables , and dress up the position density with a Gaussian momentum density to compose the initial phase-space density
at time t = 0. We obtain the final phase-space density at time t = T by propagating F0 to FT with the aid of Liouville's equation
where the differential operator
involves the velocity components and the force components associated with the Hamiltonian
At the final time T, we thus have
The updated position density now results from integrating over the momenta, so that
is the net map for . Exceptional situations aside (usually they arise from an unfortunate choice for the duration T), is a fixed point of this map only if is a function of the Hamiltonian, which is the case if
where the dotted equal sign stands for 'equal up to a constant numerical factor'. If we take it for granted, without proof, that the map (6) is a contraction, then repeated applications of the map will turn into , the desired prior or posterior density.
2.2. The implementation
While this conveys the idea of HMC, its actual implementation is not in terms of a trial sample that is iteratively updated by the mapping (6), but in terms of a Markovian random walk. At each step of the walk, the state follows a trajectory from the current position to the proposal , where the components of the initial momentum are chosen at random from the Gaussian distribution , as required by the initial phase-space density in (1). As long as the map (6) is ergodic, the time-averaged distribution of θ will converge towards the stationary distribution, with density as demonstrated in section 2.1.
Here, then, is the HMC algorithm [3]:
HMC1 Start at with an arbitrary initial point .
HMC2 Generate from a multivariate normal distribution with unit variance.
HMC3 Solve the Hamilton equations of motion
for the initial conditions and obtain .
HMC4 Calculate the acceptance ratio
HMC5 Draw a random number b uniformly from the range . Set if ; otherwise, set .
HMC6 Update . Escape the loop when , the target number of sample points; otherwise, return to HMC2.
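For illustration, the loop HMC1–HMC6 might be sketched in Python as follows. This is a generic one-chain sketch, not the authors' implementation: the potential energy U(θ) = −ln f(θ) and its gradient are assumed to be supplied by the user, and HMC3 is carried out with a leapfrog integrator of the kind described in the appendix.

```python
import numpy as np

def hmc_sample(U, grad_U, theta0, n_samples, tau=0.1, L=20, rng=None):
    """One-chain HMC sampler (sketch).  U(theta) = -log target density,
    grad_U its gradient; tau and L are the leapfrog step size and the
    number of jumps per trajectory."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    samples = []
    for _ in range(n_samples):
        # HMC2: draw momenta from a unit-variance Gaussian
        p = rng.standard_normal(theta.shape)
        H0 = U(theta) + 0.5 * p @ p
        # HMC3: leapfrog trajectory (drift-kick-drift variant)
        th, mom = theta + 0.5 * tau * p, p
        for j in range(L):
            mom = mom - tau * grad_U(th)
            if j < L - 1:
                th = th + tau * mom
        th = th + 0.5 * tau * mom
        # HMC4 and HMC5: Metropolis accept/reject with ratio exp(H0 - H1)
        H1 = U(th) + 0.5 * mom @ mom
        if rng.random() < np.exp(H0 - H1):
            theta = th
        samples.append(theta)
    return np.array(samples)
```

For instance, with U(θ) = θ²/2 (a standard normal target) and its gradient θ, the sampler reproduces zero mean and unit variance.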
A few remarks are in order. First, the value of the Hamiltonian is constant along the exact trajectory of HMC3, which would give in HMC4. In practice, however, we rely on an approximate trajectory; accepting Neal's advice [3], we calculate it with the leapfrog method described in the appendix, and the difference between the initial and final values of in (9) is nonzero.
Second, HMC is an implementation of the Metropolis–Hastings Monte Carlo (MHMC) algorithm [6] that, regardless of the step size used, achieves a high acceptance rate with much weaker correlations between successive points. In MHMC, the proposal is accepted with probability (see (I-D.1)),
where is the target probability density, and is the probability of proposing point given the current point θ. The comparison with (9) establishes that
in HMC where ϑ is the randomly chosen initial momentum and is the negative of the resulting final momentum; the otherwise irrelevant minus sign in in HMC3 ensures the symmetry of the process, thereby exploiting the time-reversal invariance of the equations of motion (8). In effect, then, HMC achieves an acceptance rate close to 1 by a proposal scheme where is close to .
Third, rather than the 'kinetic energy' of (4), we could use a general quadratic form where the coefficients are the entries of a symmetric positive-definite S × S matrix; HMC2 and the velocities in HMC3 are then changed accordingly. Whether or not this freedom in choosing the matrix μ offers an advantage depends on the structure of the 'potential energy' . One would usually carry over any symmetries in the potential energy to the kinetic energy; for instance, if is invariant under the interchange , the kinetic energy should treat and on an equal footing. The kinetic energy of (4) is used for all the examples in section 4.
3. State parameterization
In [1], all sampling is done in the probability space. Unless the circumstances are so simple that we can state explicitly all constraints obeyed by the probabilities and do not need to execute a physicality check of the kind discussed in appendix A of [1], it is not feasible to implement the HMC random walk for variables θ that are (a non-redundant subset of the) probabilities. Rather, we parameterize the statistical operator ρ, and the θ dependence of the probabilities then follows from the Born rule.
For a d-level quantum system, the statistical operator ρ is represented by a hermitian unit-trace matrix that has d² − 1 real parameters. The matrix of an arbitrary operator A, however, has 2d² real parameters, of which d² are superfluous. We get rid of them by restricting the A matrix to upper-triangular (or lower-triangular, as in [7]) form with real diagonal elements; this reduces the count of real parameters to d². One more parameter is removed by enforcing that
holds, which requires that the moduli of the elements of A are points on a (d(d + 1)/2 − 1)-sphere. We parameterize this sphere with spherical coordinates—angle parameters , , ..., —with the Cartesian coordinates recursively defined by
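This recursion might be coded as below. We assume the standard hyperspherical convention, in which each Cartesian coordinate is the product of the sines of all preceding angles times the cosine of the current angle, with the last coordinate a pure product of sines; the paper's exact ordering convention may differ.

```python
import numpy as np

def cartesian_from_angles(angles):
    """Map n angle parameters to n + 1 Cartesian coordinates on the
    unit sphere, using the standard hyperspherical recursion
    (an assumed convention):
      x_1 = cos(t_1),
      x_k = sin(t_1) ... sin(t_{k-1}) cos(t_k),
      x_{n+1} = sin(t_1) ... sin(t_n)."""
    x = []
    sin_prod = 1.0
    for th in angles:
        x.append(sin_prod * np.cos(th))
        sin_prod *= np.sin(th)
    x.append(sin_prod)  # last coordinate: product of all sines
    return np.array(x)
```

By construction, the squares of the coordinates telescope to 1 for any choice of angles, so the unit-trace constraint is obeyed automatically.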
We fill up the upper triangle of matrix A with the Cartesian coordinates, which are d(d + 1)/2 in number, and supplement the off-diagonal entries with phase factors
The cases , , and illustrate the matter:
Other ways of assigning the Cartesian coordinates to the upper-triangle entries and the phase factors to the off-diagonal entries are, of course, possible and give equally valid parameterizations. The assignment chosen is
with
Upon expressing the probabilities in terms of the parameters with , the step-function constraints in of (I-6) are taken care of. Then, integrating out the delta-function constraints gets rid of redundant probability variables. Finally, we need the Jacobian between the remaining pks and the θs to convert the prior or posterior density in p into the corresponding expression in θ.
The HMC algorithm can now be executed for .
As stated, the parameterization of (12) and (16) with (13), (14), and (17) is applicable to POMs that are informationally complete, for which the whole state space is the reconstruction space. If the POM is not informationally complete, it may still be possible to use A of (16) with fixed values for some of the θs. An example is the trine measurement for a qubit, for which the matrix in (15) with does the job; see section 4.2. If no such restricted version of (16) serves the purpose, it may be possible to introduce additional parameters during the HMC sampling, thus producing a sample in a space of higher dimension, then marginalizing the auxiliary parameters, and so arriving at a proper sample; see section 4.3 for an example.
4. Examples
The parameterization of first the statistical operator ρ and then the probabilities p in terms of the angles θ, while systematic, tends to be involved and does not lend itself to simplification when explicit expressions are needed. Therefore, we shall only present detailed examples for the case of a qubit (sections 4.1 and 4.2), while providing an example with incomplete tomographic data for the case of a qubit pair (section 4.3).
4.1. Qubit: informationally complete POMs
For the case of a qubit
with the matrix of (15) referring to the basis in which the Pauli operators , , and have their standard form, we have
for the expectation values of the Pauli operators. As it should, the sum of their squares
takes on all values between 0 and 1. For later use, we note that
which exhibits the Jacobian factor that relates the parameterization of ρ in (19) to the θ parameterization of (12) with (16).
We consider two POMs, the four-outcome tetrahedron POM of minimal qubit tomography [8] and the six-outcome Pauli POM that measures in three mutually unbiased bases. For the tetrahedron POM, we have the probabilities
The corresponding constraint factor of (I-6) contains of (I-5) for and
By integrating out p4 with the help of the delta function in , we reduce the volume element in the probability space of (I-7) to
with ; the pk values selected by the first step function are non-negative, so that we can omit the factors.
Since the Jacobian between and does not depend on the coordinates, we have
and
follows, where the doubly dotted equal sign says 'essentially equal in the sense of ignoring constant factors and reducing the number of variables such that the remaining ones are independent'. The final step uses (21) and (22) to arrive at
for the tetrahedron POM.
For the Pauli POM, we have the six probabilities
for which
and
are the factors in . The analog of (25) is
and (26)–(28) hold here as well. That is, whether we use the tetrahedron POM or the Pauli POM, the volume element for physical probabilities is that of (28).
Therefore, if we are sampling in accordance with the primitive prior , there is no difference between these two POMs. We just have
in (4), (8), and (A.2) of the HMC algorithm. The sample of 50 000 points thus produced is reported in figure 1. This sample is a collection of values inside the Bloch sphere; it is converted into a p sample by (23) or (29), respectively.
Figure 2 shows trajectories of 50 sample points, with consecutive points connected by blue lines, generated using the HMC algorithm as well as the xMHMC algorithm from I. For the HMC sample, we see that even this short trajectory samples the unit circle rather efficiently, with the points far apart from their predecessors and successors. Such a trajectory is rather different from that of the xMHMC algorithm, where consecutive points are often close together. Indeed, HMC overcomes the problem of the strong autocorrelation that figure I-4 shows. The high acceptance rate of about 95% in the run that produced figures 1 and 2(a) is another advantage of HMC over xMHMC, which had an acceptance rate of about 60% in the run that produced figure 2(b). In higher-dimensional cases, the optimal acceptance rates are about 65% and 25% for the HMC and xMHMC algorithms, respectively. And, to repeat, since all generated points are physical by construction, there is no need for the CPU-time-consuming check of physicality that is necessary in the xMHMC algorithm.
4.2. Qubit: informationally incomplete POMs
We consider two POMs that are not informationally complete, as they provide no information about z: the three-outcome trine measurement with the probabilities of (I-4) and the constraint factors
and the four-outcome crosshair POM with the probabilities
and the constraint factors
There are many different reconstruction spaces that we can use; two choices that suggest themselves are the equatorial plane of the Bloch ball
and the upper half of the Bloch sphere
with for both. The version of (20) realizes (37)
and together with fits to (38)
For the primitive prior density, these give
for both the trine POM and the crosshair POM.
Figure 3 shows the histogram of a sample of 50 000 points in accordance with a posterior density. This example is for the trine POM with the primitive prior and the parameterization of (39), and the data are . Specifically, we have
with
and
in (4), (8), and (A.2) of the HMC algorithm. This sample consists of (x, y) pairs in the unit circle and conversion into a sample of ps is done with (I-4).
We note that the algorithm that samples in accordance with a posterior specified by the primitive prior and data can also sample according to the primitive prior itself, by simply putting . Further, for it samples in accordance with the Jeffreys prior, and running the algorithm for gives a posterior sample for the Jeffreys prior. Likewise, conjugate priors specified by mock data can be handled by the same algorithm. Analogous remarks apply to POMs with more outcomes.
4.3. Qubit pair: double-crosshair POM of BB84
In the BB84 scenario of quantum key distribution, the two parties use two four-outcome crosshair measurements and so have 16 outcomes for the composed POM. In a self-explaining notation that relies on two copies of (35), we denote the joint probabilities by pjk with , eight of which are independent. One reconstruction space can be specified by 4 × 4 matrices of the form
with . This unit-trace real symmetric matrix has nine real parameters, eight determined by the probabilities of the double-crosshair POM. While they do not fix the value of the ninth parameter , they determine the range of permissible q values
The p dependent bounds can be found by requiring , as will be discussed shortly. By choosing a specific q value—perhaps the largest permissible value , or the value closest to 0—we get a definite reconstruction space. This is, however, awkward if we want to sample by the HMC algorithm.
In order to find , we demand that all the eigenvalues of ρ be non-negative. The determinant of ρ is a fourth-order polynomial in q
The determinant being the product of the eigenvalues, its roots are the q values for which one of the eigenvalues of ρ equals zero. Hence, the range of physical q is bounded by roots of (47). The positive coefficient of the q⁴ term implies that the determinant is positive for large |q|. Given that the roots are the q values where the determinant changes sign, the only other region where the determinant can be positive is between the second and third roots. From (45), it is clear that for large |q|, ρ has two positive and two negative eigenvalues. Hence, the only permissible region is the one bounded by the second and third roots of (47). Finding these roots therefore gives us and .
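This root-finding might be sketched as follows. The callable det_rho is a hypothetical stand-in for the quartic determinant of (47) as a function of q; its coefficients are recovered here by an exact polynomial fit through five sample points, after which the second and third roots bound the physical range.

```python
import numpy as np

def q_range(det_rho):
    """Physical range (q_min, q_max) from the quartic det rho(q).
    det_rho is a callable returning det(rho) for a given q; fitting a
    degree-4 polynomial through five points recovers its coefficients
    exactly (a sketch, not the authors' code)."""
    qs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    coeffs = np.polyfit(qs, [det_rho(q) for q in qs], 4)
    roots = np.sort(np.roots(coeffs).real)  # four real roots expected
    return roots[1], roots[2]               # second and third roots
```

As a check, for a determinant with roots at ±1 and ±2 the permissible region is −1 ≤ q ≤ 1, the interval between the two inner roots.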
Since we get the matrix ρ in (45) from the 4 × 4 matrix in (15) by setting , which leaves us with the nine angle parameters , we supplement the eight-dimensional probability element with a factor for q
and sample from the nine-dimensional target density . Upon ignoring the q values of the sample points , we get the wanted sample of probabilities . Ideally, we would like to sample from by means of HMC sampling. However, due to taking very complicated forms, it would be very inconvenient to compute the derivatives of the potential, which HMC requires. What can be done more simply is to perform HMC sampling from the distribution , followed by importance sampling (as described in [1]) that assigns each point the weight , effectively leaving us with the desired distribution .
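The reweighting step might be sketched as below; log_f and log_g are hypothetical callables for the logarithms of the desired and the actually sampled densities, and the correction is applied here by resampling with normalized weights w = f/g.

```python
import numpy as np

def reweight(points, log_f, log_g, rng=None):
    """Importance-sampling correction: turn a sample drawn from g into
    an (approximately) f-distributed sample by resampling with weights
    w = f/g.  log_f and log_g return log densities (a sketch; the
    names are illustrative, not the authors' notation)."""
    rng = np.random.default_rng() if rng is None else rng
    logw = np.array([log_f(p) - log_g(p) for p in points])
    w = np.exp(logw - logw.max())   # subtract max for numerical stability
    w /= w.sum()
    idx = rng.choice(len(points), size=len(points), p=w)
    return [points[i] for i in idx]
```

Alternatively, the weights can be kept and used directly in weighted averages, which avoids the duplication of points that resampling introduces.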
We proceed to analyze the distribution of the Clauser–Horne–Shimony–Holt (CHSH) quantity S obtained using our sampling method. Here, S is defined as [9, 10]
where A1 and A2 are observables on the first qubit, while B1 and B2 are observables on the second qubit, each having eigenvalues of ±1. For a state ρ, is given by
The CHSH quantity ranges from to , with values being evidence of quantum entanglement between the qubits. For this example, we restrict measurements Ai and Bj to the form
In general, S depends on and . For any state ρ, S can be maximized by optimizing and , which gives
We begin by fixing , , , and . With this setting, S can be expressed in terms of our detector probabilities
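A numerical evaluation of S for a given two-qubit state might be sketched as follows. We assume, for illustration only, measurements of the form A(α) = cos α σz + sin α σx (a stand-in for the elided form (53)) and the common sign convention S = ⟨A1B1⟩ + ⟨A1B2⟩ + ⟨A2B1⟩ − ⟨A2B2⟩; the paper's actual conventions may differ.

```python
import numpy as np

sz = np.diag([1.0, -1.0])                    # Pauli z
sx = np.array([[0.0, 1.0], [1.0, 0.0]])      # Pauli x

def obs(alpha):
    """Qubit observable in the z-x plane, eigenvalues +/-1
    (an assumed form for the measurements)."""
    return np.cos(alpha) * sz + np.sin(alpha) * sx

def chsh(rho, a1, a2, b1, b2):
    """CHSH quantity S = <A1 B1> + <A1 B2> + <A2 B1> - <A2 B2>
    for a two-qubit state rho (one common sign convention)."""
    E = lambda a, b: np.trace(rho @ np.kron(obs(a), obs(b))).real
    return E(a1, b1) + E(a1, b2) + E(a2, b1) - E(a2, b2)
```

For the singlet state and the angles a1 = 0, a2 = π/2, b1 = π/4, b2 = −π/4, this yields |S| = 2√2, the Tsirelson bound.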
In figure 4(a), the resulting distribution of the CHSH quantity is shown for three different samples. The first, in blue, is drawn from the primitive prior. The next, in green, is drawn from the posterior obtained from data that correspond to the triplet state with noise. Lastly, the red distribution is drawn from the posterior density obtained from data corresponding to the singlet state . Each sample contains 50 000 sample points, and the data sets used for the posterior densities are made up of 64 measured qubit pairs each. It can be seen that, although the triplet state has as much entanglement as the singlet state, the CHSH quantities for the triplet sample are much closer to 0 for the unoptimized measurements. In figure 4(b), we compare the quantity obtained by using the fixed measurement (in green) against that obtained from the maximized CHSH quantity (in magenta) for the sample drawn from the posterior density of the triplet state with noise. This quantity takes values between 0 and 2, with values larger than 1 being evidence for quantum entanglement. For the fixed measurement of (53), values that exceed 1 are extremely rare. By contrast, there is a large fraction of values with for the optimized S value of (52).
In closing, we note another use of (45). It can provide a physicality check for candidate probabilities pjk that is quite different from, and much simpler than, the algorithm discussed in appendix A of [1]. Given pjks that obey all the basic constraints, they are physical if there is a q value for which ; otherwise, they are not physical.
5. Conclusion
We have adapted the HMC sampling algorithm to the problem of sampling quantum state spaces, where the constraints that result from the positivity of the statistical operator must be obeyed throughout. To ensure this, we use a systematic parameterization of the statistical operator without any superfluous parameters. This can always be done for informationally complete measurements, and is also possible when the measurement is not informationally complete as long as a suitable parameterization of the relevant subspace of the state space is at hand. In all other cases, one must resort to other sampling algorithms, such as the ones we discuss in [1].
Acknowledgments
We thank Michael Evans and Yong Siah Teo for stimulating discussions. This work is funded by the Singapore Ministry of Education (partly through the Academic Research Fund Tier 3 MOE2012-T3-1-009) and the National Research Foundation of Singapore.
Appendix. The leapfrog method
The leapfrog method is a split-operator method much like those based on the Trotter–Suzuki formula (see, e.g., [11]). For a small time increment τ, we advance to in three steps: only the kinetic energy governs the evolution for a duration of τ/2 in the first and third steps, whereas only the potential energy is relevant in the intermediate second step of duration τ. In total, this amounts to
where the right-hand sides are correct to order and have an error .
We break up the total duration T into L time intervals of duration τ = T/L, so that L leapfrog jumps accomplish HMC3 with a discretization error . The two adjacent kinetic-energy half-periods of subsequent jumps are conveniently combined into a single period of a full τ. Accordingly, we find the final pair from the initial pair by this procedure:
LF1 Set .
LF2 For , set
LF3 Set .
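In code, the procedure LF1–LF3 might read as follows; grad_U is the gradient of the potential energy, and the drift-kick-drift ordering matches the splitting described above (a sketch, not the authors' implementation).

```python
import numpy as np

def leapfrog(theta, vartheta, grad_U, tau, L):
    """LF1-LF3: advance the phase-space point (theta, vartheta) by L
    leapfrog jumps of step size tau.  grad_U is the gradient of the
    potential energy U(theta)."""
    th = np.array(theta, dtype=float)
    p = np.array(vartheta, dtype=float)
    th = th + 0.5 * tau * p              # LF1: initial kinetic half-period
    for j in range(L):
        p = p - tau * grad_U(th)         # full potential-energy step
        if j < L - 1:
            th = th + tau * p            # two adjacent half-periods combined
    th = th + 0.5 * tau * p              # LF3: final kinetic half-period
    return th, p
```

Each line is a shearing transformation of phase space, so the composite map is volume-preserving, in keeping with the remark on Liouville's theorem below; for a harmonic potential, the value of the Hamiltonian along the approximate trajectory stays within O(τ²) of its initial value.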
We note an important property of the leapfrog method: the individual steps in LF1–3 are shearing transformations that preserve phase-space volumes, so that the approximate solution of (8) is volume-preserving just like the exact one (Liouville's theorem).
According to Neal [3], there is a trade-off between accuracy in the propagation and CPU time consumed. A good choice of L is such that the acceptance rate (10) is about 65% for high-dimensional problems. Further, if one observes slow convergence, the likely cause is nonergodicity, which can be cured by randomly choosing τ and L from fairly small intervals [12].