Heisenberg-style bounds for arbitrary estimates of shift parameters including prior information

A rigorous lower bound is obtained for the average resolution of any estimate of a shift parameter, such as an optical phase shift or a spatial translation. The bound has the asymptotic form k_I/<2|G|>, where G is the generator of the shift (with an arbitrary discrete or continuous spectrum), and hence establishes a universally applicable bound of the same form as the usual Heisenberg limit. The scaling constant k_I depends on prior information about the shift parameter. For example, in phase sensing regimes, where the phase shift is confined to some small interval of length L, the relative resolution \delta\hat{\Phi}/L has the strict lower bound (2\pi e^3)^{-1/2}/<2m|G_1|+1>, where m is the number of probes, each with generator G_1, and entangling joint measurements are permitted. Generalisations using other resource measures and including noise are briefly discussed. The results rely on the derivation of general entropic uncertainty relations for continuous observables, which are of interest in their own right.


Introduction
In many measurement scenarios, an environmental variable acts to translate or shift a property such as the optical phase or position of a probe state. Accurate estimation of the shift parameter allows a correspondingly accurate measurement of the environmental variable. For example, interferometric measurements of quantities such as temperature, strain and gravitational wave amplitudes rely on estimation of an optical phase shift. An important aim of quantum metrology is to determine the fundamental bounds on the resolution of such estimates [1, 2].
One useful tool in this respect is the quantum Cramer-Rao inequality [3,4], which can be used to obtain bounds on the resolution of a shift parameter in terms of the variance of the operator that generates the shift [2,3,4,5,6]. For example, the root mean square error of any unbiased estimate $\hat{X}$ of a shift parameter $X$, for any fixed value $X = x_0$, satisfies [5,6]
$$\delta\hat{X}_{x_0} \ge \frac{1}{2\sqrt{m}\,\Delta G}, \tag{1}$$
where $G$ is the shift generator, $\Delta G$ is the uncertainty of $G$ for the probe state, and $m$ is the number of (independently measured) copies of the probe state. One can also use the Cramer-Rao inequality to derive resolution bounds in terms of other quantities, such as the maximum and minimum eigenvalues of $G$ when they exist [7,8], or the mixedness of the probe state [9]. While such bounds are universally valid, one often wishes to obtain bounds in terms of other resources, such as the average energy or mean photon number of the probe state. But, for example, if $G = N$ is the photon number of a single mode field, then $\Delta N$ can be arbitrarily large or small relative to the average photon number $\langle N\rangle$. A simple example which proves this point is a probe state with number distribution $p_0 = 1 - w$, $p_n = w$ for some $n \neq 0$, and vanishing otherwise. Then $\langle N\rangle = nw$ and $\Delta N = n[w(1-w)]^{1/2}$, so that $\langle N\rangle/\Delta N = [w/(1-w)]^{1/2}$ ranges over $(0, \infty)$ as $w$ ranges over $(0, 1)$. Thus, the lower bound (1) does not limit phase resolution in terms of average photon number. This illustrates that the quantum Cramer-Rao inequality cannot always be used to derive bounds in terms of the resources of interest required to achieve a given resolution, in particular when the eigenvalue range is unbounded.
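The two-point number distribution used in this argument is easy to check numerically. The following is a minimal sketch (the function name `two_point_moments` is hypothetical, introduced only for illustration) that computes the mean and standard deviation of N directly from the distribution, so that the behaviour of their ratio as w varies can be inspected.

```python
import math

def two_point_moments(n: int, w: float):
    """Mean and standard deviation of N for p_0 = 1 - w, p_n = w (0 < w < 1)."""
    mean = n * w                 # <N>
    second = n ** 2 * w          # <N^2>
    std = math.sqrt(second - mean ** 2)   # Delta N = n * sqrt(w (1 - w))
    return mean, std

# The ratio <N>/Delta N equals sqrt(w / (1 - w)); it is independent of n and
# sweeps through (0, infinity) as w goes from 0 to 1.
for w in (0.001, 0.5, 0.999):
    mean, std = two_point_moments(n=10, w=w)
    print(f"w = {w}: <N> = {mean:.3f}, dN = {std:.3f}, <N>/dN = {mean / std:.3f}")
```

Because the ratio is unconstrained, a bound expressed through the uncertainty of N alone says nothing about the mean photon number required.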
An alternative tool for bounding resolution is the Heisenberg limit [10]. While less developed than the quantum Cramer-Rao inequality, it may be heuristically characterised as an asymptotic lower bound on measurement resolution that scales inversely with the number of resources available and that is achievable, up to a constant factor [11,12,13,14,15,16,17]. These references use various ways to quantify measurement resolution and number of resources, but at this point it suffices to quote the earliest result (obtained numerically) [11]: the root mean square error of a canonical phase estimate on a single mode field, for any applied phase shift $\phi_0$, can asymptotically scale no better than $\delta\hat{\Phi}_{\phi_0} \sim 1.38/\langle N\rangle$. This scaling with the average photon number $\langle N\rangle$, rather than with $\Delta N$ as in (1), has the advantage of providing a necessary condition on the energy resources required for a given phase resolution.
Recently, progress has been made in generalising the Heisenberg limit to obtain non-asymptotic resolution bounds for arbitrary estimates of shift parameters. These may be called Heisenberg-style bounds. First, for a shift generator $G$ with a discrete spectrum and finite lowest eigenvalue $g_{\min}$, Giovannetti et al. have bounded the root mean square error of any estimate, when averaged over two fixed values of the shift parameter, by $0.076/\langle G - g_{\min}\rangle$, provided that the fixed values and errors satisfy a particular constraint [18]. Second, when $G$ is further restricted to integer eigenvalues, Hall et al. have obtained the constraint-free bound of $0.559/\langle G - g_{\min} + 1\rangle$ for the root mean square deviation of any estimate, when uniformly averaged over all values of the shift parameter [19].
Despite this progress, none of the above results characterise the overall performance of a given estimate in the important case that prior information is available about the value of the shift parameter. For example, in a phase sensing regime [20], such as gravitational wave detection, the value of an applied phase shift is a priori known to lie within some small interval about zero. Hence, only the performance of the estimate over the interval is of interest; it is irrelevant how well or how badly the estimate may perform outside this interval. Indeed, in a stimulating paper, Rivas and Luis have recently proposed a phase estimation scheme that improves on the scalings of the above Heisenberg bounds for small phase shifts [21].
It is therefore of great interest to determine fundamental bounds when prior information is available to be exploited. For example, the quantum van Trees inequality generalises the quantum Cramer-Rao inequality, to bound the mean square error averaged over the prior probability density $q(x)$ of the shift parameter [6,22], leading to $\Delta G$ in (1) being correspondingly replaced by $[\mathrm{Var}\,G + F_q]^{1/2}$, where $F_q = \int dx\, q(x)\,(d\ln q(x)/dx)^2$ is the Fisher information of $q(x)$. Note however that, for estimates confined to some bounded interval with $q(x)$ uniform over this interval, $F_q$ vanishes and no improvement is obtained over the lower bound in (1).
Tsang has very recently obtained the first results exploiting prior information in the context of Heisenberg-style bounds [23]. For example, he has shown that, for discrete shift generators and a prior density $q(x)$ uniform over any interval of length $L$, the root mean square error of any estimate is bounded by a function which asymptotically approaches $0.154/\langle G - g_{\min}\rangle$, under a constraint that $L$ is sufficiently large (see also Ref. [24]).
The central result of the present paper is a general Heisenberg-style lower bound that takes arbitrary prior information into account. It is constraint-free, and requires the generator neither to be discrete nor to have a finite lowest eigenvalue. In particular, we show that the average mean square deviation, $(\delta\hat{X})^2 = \langle(\hat{X} - X)^2\rangle$, of any estimate $\hat{X}$ of some shift parameter $X$, over any prior distribution of $X$, satisfies
$$\delta\hat{X} \ge \frac{k_I}{\langle 2|G - g|/\Delta + c\rangle}. \tag{2}$$
Here $k_I$ depends on the prior information available about the shift parameter, $G$ is the shift generator, and $g$ is an arbitrary eigenvalue of $G$. For continuous spectra, $c = 0$ and $\Delta = 1$, while for discrete spectra $c = 1$ and $\Delta$ is the minimal spectral gap of $G$. Further, if the spectrum of $G$ has a smallest value, $g_{\min}$, then the factor of 2 in the denominator can be removed for the choice $g = g_{\min}$. The scaling constant $k_I$ in (2) depends on the prior distribution of the shift parameter, and can in principle be arbitrarily small for a sufficiently narrow prior distribution. In particular, for any generator with a discrete spectrum, if the shift parameter $X$ is known a priori to be randomly distributed over an interval of length $L$, it will be shown that $k_I \ge L(2\pi e^3)^{-1/2}$. Thus there is a fundamental 'relative resolution' bound for how well the shift can be resolved, relative to the size of the interval to which it is confined. Moreover, with the additional assumption that the state is a number $m$ of identical copies of some probe state, each with discrete generator $G$, it will be shown that inequality (2) implies that
$$\frac{\delta\hat{X}}{L} \ge \frac{(2\pi e^3)^{-1/2}}{\langle 2m|G - g|/\Delta + 1\rangle}, \tag{3}$$
with a similar bound conjectured when $G$ has a continuous spectrum. Note that this inequality allows for arbitrary measurements over the $m$ probes, even entangling joint measurements, unlike bounds obtained from the Cramer-Rao inequality, which assumes a fixed measurement. The scaling of (3) with $m^{-1}$ contrasts with the $m^{-1/2}$ scaling in (1).
While neither bound is necessarily achievable for a given probe state and a given $m$, the scaling as $m^{-1/2}$ in (1) is as expected from elementary statistics. Hence we do not expect that our bounds would be tight for a fixed probe state and large $m$.
Similarly to (2), if the spectrum of $G$ has a smallest value $g_{\min}$ then the factor of 2 in the denominator of (3) can be removed for $g = g_{\min}$. For example, for phase shifts generated by the photon number operator $N$ of a single mode field, the relative resolution of any estimate over an interval of length $L$ is bounded below by $(2\pi e^3)^{-1/2}/\langle 1 + mN\rangle$, generalising the result of Hall et al. [19], which was limited to the case of a completely random phase shift, with $L = 2\pi$.
The results of the paper are obtained via the derivation of suitable entropic uncertainty relations for the shift generator $G$ and the error $\hat{X} - X$ (section 2 and appendices). Examples are given for estimates of optical phase, time, and spatial displacements, including generalisations to alternative resource measures such as the support of the energy distribution of the probe state (sections 3 and 4). A further generalisation of inequalities (2) and (3) is given which quantifies the effects of noise (section 5), followed by a brief discussion (section 6).

General estimation schemes
Consider a general shift parameter estimation scheme, in which a probe state $\rho_0$ undergoes a shift generated by some operator $G$, to the state $\rho_x = e^{-iGx}\rho_0 e^{iGx}$. A measurement of observable $M$ on the probe then outputs some estimated value $\hat{x}$ of the actual shift value $x$, and is described by some positive-operator-valued measure (POVM) $M = \{M_{\hat{x}}\}$ [2,4]. A standard notation will be used in which random variables and operators appear in upper case, and (eigen)values of these quantities appear in lower case. Thus, the shift parameter will be denoted by $X$, and its estimate by $\hat{X}$.
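If the prior probability density of the shift parameter is denoted by $q(x)$, then the probability density of the error or deviation, $Y = \hat{X} - X$, of the estimated value from the true value, is given by
$$p_Y(y) = \int dx\, q(x)\, \mathrm{tr}[\rho_x M_{x+y}] = \mathrm{tr}[\rho_0 \mathcal{M}_y], \tag{4}$$
where the POVM $\mathcal{M} \equiv \{\mathcal{M}_y\}$ is defined by
$$\mathcal{M}_y := \int dx\, q(x)\, e^{iGx} M_{x+y}\, e^{-iGx}. \tag{5}$$
A 'good' estimate of the shift parameter will be one for which the error is small on average, i.e., for which $p_Y(y)$ is highly peaked about $y = 0$. This may be quantified by the mean square deviation
$$(\delta\hat{X})^2 := \langle(\hat{X} - X)^2\rangle = \langle Y^2\rangle = \int dy\, y^2\, p_Y(y). \tag{6}$$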
Note that for periodic shift parameters, such as phase, the integration may be taken over an interval centred on $y = 0$ [19]. The quantity $\delta\hat{X}_{x_0}$ in (1) corresponds to the case $q(x) = \delta(x - x_0)$. Equations (4)-(6) generalise the case of phase estimates considered in Hall et al. [19], where the prior density was restricted to be uniform, i.e., $q(x) = 1/2\pi$. Under this restriction the POVM $\mathcal{M}$ is covariant, with $\mathcal{M}_{x+y} = e^{-iGy}\mathcal{M}_x e^{iGy}$, allowing a connection to be made between $\delta\hat{X}$ and $G$ via an entropic uncertainty relation for canonically conjugate number and phase operators [19]. However, that method fails whenever $q(x)$ is non-uniform, as $\mathcal{M}$ is no longer covariant. Note also that $q(x)$ is necessarily non-uniform for non-periodic shift parameters.
It turns out that the key to generalising the approach of Hall et al. is the extension of existing entropic uncertainty relations for arbitrary discrete observables [25,26,27] to the case of continuous POVMs, such as $\mathcal{M}$ in equation (5) above. The necessary extensions are derived in appendix A. It will now be shown how these lead to Heisenberg-style lower bounds for $\delta\hat{X}$, as per inequalities (2) and (3). Examples and generalisations are given in sections 3-5.

Exploiting entropic uncertainty relations
Suppose that one has an entropic uncertainty relation for the observables $G$ and $\mathcal{M}$ (the observable corresponding to the POVM $\{\mathcal{M}_y\}$) of the form $H(G) + H(\mathcal{M}) \ge \ln K_I$ for some constant $K_I$. Here $H(A)$ denotes the Shannon entropy of the measurement distribution of observable $A$, for a probe in state $\rho_0$. Several such uncertainty relations are given in appendix A. In general, $K_I$ depends on the prior information encoded in the prior density $q(x)$, and its form is discussed in section 2.3 below.
From equation (4) the statistics of $\mathcal{M}$ and $Y$ are identical, and thus the above uncertainty relation can be rewritten as
$$H(Y) \ge \ln K_I - H(G). \tag{7}$$
Furthermore, consider the variational quantity $J = H(Y) + \alpha\langle 1\rangle + \beta\langle Y^2\rangle$, where $\alpha$ and $\beta$ are Lagrange multipliers fixing the normalisation of $p_Y(y)$ and the value of $\langle Y^2\rangle$ respectively. The variational equation $\delta J/\delta p_Y = 0$ leads directly (Chapter 12 of [28]) to the upper bound $H(Y) \le \tfrac{1}{2}\ln(2\pi e\langle Y^2\rangle)$, saturated by the Gaussian distribution $p_Y(y) = (2\pi\langle Y^2\rangle)^{-1/2}\exp[-y^2/2\langle Y^2\rangle]$. Combining this bound with equations (6) and (7) yields the lower bound
$$\delta\hat{X} \ge (2\pi e)^{-1/2}\, K_I\, e^{-H(G)} \tag{8}$$
for the root mean square deviation of the estimate. Inequalities (7) and (8) provide information-theoretic bounds on the performance of the estimate, in terms of the entropy of the shift generator for the probe state. This is already useful in contexts where entropy itself can be considered as a resource. Inequality (8) is also useful for determining alternative bounds on resolution, under various constraints on $G$, as will be discussed in sections 3 and 4.
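The maximum-entropy step can be illustrated with a few distributions whose differential entropy and variance are known in closed form. The sketch below uses only the standard result that the entropy of an error variable Y obeys H(Y) ≤ ½ ln(2πe⟨Y²⟩) for a zero-mean error, which rearranges to an rms error of at least (2πe)^{-1/2} exp[H(Y)]. The function names and the chosen test distributions are illustrative assumptions, not part of the paper.

```python
import math

def gaussian_entropy_bound(variance: float) -> float:
    """Largest differential entropy (in nats) compatible with a given <Y^2>."""
    return 0.5 * math.log(2 * math.pi * math.e * variance)

def rms_lower_bound(entropy: float) -> float:
    """Smallest rms error compatible with a given error entropy H(Y)."""
    return math.exp(entropy) / math.sqrt(2 * math.pi * math.e)

# (entropy in nats, variance) for zero-mean distributions, in closed form.
cases = {
    "uniform, width 2": (math.log(2.0), 2.0 ** 2 / 12.0),
    "Laplace, scale 1": (1.0 + math.log(2.0), 2.0),
    "Gaussian, variance 1": (gaussian_entropy_bound(1.0), 1.0),
}

for name, (entropy, variance) in cases.items():
    print(f"{name}: H = {entropy:.4f} <= {gaussian_entropy_bound(variance):.4f}, "
          f"rms = {math.sqrt(variance):.4f} >= {rms_lower_bound(entropy):.4f}")
```

Only the Gaussian case saturates the inequality, which is why resolution bounds derived through this step are generally not tight for non-Gaussian error distributions.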
Finally, as shown in appendix B, if $g$ is an arbitrary eigenvalue of $G$, then the entropy of $G$ is bounded above by
$$H(G) \le 1 + \ln\langle 2|G - g|/\Delta + c\rangle, \tag{9}$$
where the factor of 2 can be dropped if $G$ has a minimum eigenvalue $g_{\min}$ and $g = g_{\min}$. For discrete generators, $\Delta = \min_{g' \neq g''}|g' - g''|$ is the minimum spectral gap of $G$, and $c = 1$. For continuous generators, $\Delta = 1$ and $c = 0$. Substitution of (9) into (8) immediately yields inequality (2), with
$$k_I := K_I/(2\pi e^3)^{1/2}. \tag{10}$$
Before discussing specific examples and generalisations of the generic Heisenberg bound (2), the dependence of the constant $K_I$ on the prior information encoded in $q(x)$ will be examined, yielding a derivation of the relative resolution bound (3) for discrete generators, and an analogous conjectured bound for continuous generators.
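As a numerical illustration of the entropy bound of appendix B, the sketch below assumes it takes the form H(G) ≤ 1 + ln⟨2|G − g|/Δ + c⟩ (with the factor of 2 dropped when g is the minimum eigenvalue), which is a reading of inequality (9) rather than a quotation of it. It is checked against geometric distributions on the integers, for which Δ = 1 and c = 1. All function names are hypothetical, and the distributions are truncated at a large cutoff for convenience.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_bound(values, probs, g, gap=1.0, c=1, one_sided=False):
    """Assumed form of (9): 1 + ln< f |G - g| / gap + c >, f = 1 or 2."""
    factor = 1.0 if one_sided else 2.0
    mean_abs = sum(p * abs(v - g) for v, p in zip(values, probs))
    return 1.0 + math.log(factor * mean_abs / gap + c)

def geometric(r, n_max=2000):
    """Thermal-like distribution p_n proportional to r**n on n = 0..n_max."""
    values = list(range(n_max + 1))
    weights = [r ** n for n in values]
    total = sum(weights)
    return values, [w / total for w in weights]

def two_sided_geometric(r, k_max=2000):
    """Distribution p_k proportional to r**|k| on k = -k_max..k_max."""
    values = list(range(-k_max, k_max + 1))
    weights = [r ** abs(k) for k in values]
    total = sum(weights)
    return values, [w / total for w in weights]

for r in (0.5, 0.9, 0.99):
    v1, p1 = geometric(r)
    v2, p2 = two_sided_geometric(r)
    print(f"r = {r}: one-sided H = {shannon_entropy(p1):.4f} <= "
          f"{entropy_bound(v1, p1, g=0, one_sided=True):.4f}; two-sided H = "
          f"{shannon_entropy(p2):.4f} <= {entropy_bound(v2, p2, g=0):.4f}")
```

For both families the gap between the entropy and the assumed bound shrinks as the distribution broadens, consistent with the bound being asymptotically tight for large mean deviation.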

Dependence on prior information
2.3.1. Discrete generators: For a shift generator $G$ with a discrete spectrum, a suitable scaling constant $K_I$ follows from inequality (A.2) of appendix A as
$$K_I = \Big[\sup_{y,g,\psi}\langle\psi|\Gamma_g \mathcal{M}_y \Gamma_g|\psi\rangle\Big]^{-1} = \Big[\sup_{y,g,\psi}\langle\psi|\Gamma_g|\psi\rangle\,\langle\psi_g|\mathcal{M}_y|\psi_g\rangle\Big]^{-1} = \Big[\sup_{y,|g\rangle}\langle g|\mathcal{M}_y|g\rangle\Big]^{-1},$$
where $\Gamma_g$ denotes the projector onto the eigenspace of eigenvalue $g$, $|\psi_g\rangle := \Gamma_g|\psi\rangle/\langle\psi|\Gamma_g|\psi\rangle^{1/2}$, and $|g\rangle$ denotes any normalised eigenstate of $G$. The last equality holds since (i) $|\psi_g\rangle$ is always equal to some $|g\rangle$ by construction, and (ii) the factor $\langle\psi|\Gamma_g|\psi\rangle$ is maximised when $|\psi\rangle = |\psi_g\rangle$. Hence, using equation (5) for $\mathcal{M}_y$, one has
$$K_I = \Big[\sup_{y,|g\rangle}\int dx\, q(x)\, p_g(x+y)\Big]^{-1}, \tag{11}$$
where $p_g(x) := \langle g|M_x|g\rangle$ denotes the measurement distribution when the probe state is replaced by eigenstate $|g\rangle$ of $G$. Thus, the value of $K_I$ is determined by the maximum possible value of the convolution of the prior probability density $q(x)$ with the measurement distributions $\{p_g\}$. Moreover, noting that
$$\int dx\, q(x)\, p_g(x+y) \le q_{\max}\int dx\, p_g(x+y) = q_{\max},$$
where $q_{\max}$ denotes the maximum value of the prior probability density $q(x)$, one has
$$K_I \ge 1/q_{\max}. \tag{12}$$
This constraint on $K_I$ leads to the universal relative resolution bound (3) for discrete generators. In particular, an estimate based on (a possibly entangling joint measurement on) $m$ copies of a probe state corresponds to replacing $\rho_0$ by $\varrho_0^{\otimes m}$ and $G$ by $G_T := G_1 + \dots + G_m$, where $G_j$ acts on the $j$th copy, for which $\langle|G_T - mg|\rangle \le m\langle|G - g|\rangle$ and $\Delta_T = \Delta$. Hence, choosing $q(x)$ to be uniform over an interval of length $L$ and vanishing elsewhere, $q_{\max} = 1/L$ and inequality (3) follows from relations (2), (10) and (12). The lower bound in (12) can be approached, in principle, if the measurement distribution $p_g(x)$ is sufficiently peaked around some value $x_g$, for some eigenstate $|g\rangle$ of $G$. In particular, this allows $y$ to be chosen in (11) such that $p_g(x + y)$ is peaked around the maximum value of $q(x)$. Thus, the more the estimate is concentrated around some value, given a system prepared in some eigenstate $|g\rangle$ of $G$, the closer the constant $K_I$ will be to $1/q_{\max}$, allowing the possibility of approaching the lower bound in equation (3). This possibility is further discussed in section 3.1.
Note finally that the bound in (12) is only useful when the prior probability density $q(x)$ is not infinitely peaked. Although $q_{\max}$ will be finite for any physical prior distribution, it is of interest to find stronger bounds not subject to this limitation. For example, note that any probe state corresponding to an eigenstate $|g\rangle$ of $G$ is invariant under shifts generated by $G$, implying no corresponding estimate can improve on prior knowledge about the shift parameter. It is therefore natural to define an estimate to be 'ignorance respecting' if the measurement distribution for any eigenstate of $G$, $p_g(x)$, is not any better concentrated than the prior probability density $q(x)$, in the standard sense that $p_g$ is majorised by $q$ [29]. This implies in particular that $\int dx\, F(p_g(x)) \le \int dx\, F(q(x))$ for any continuous convex function $F$ [29]. Choosing $F(z) = z^2$, and writing $C_r := \int dx\, r(x)^2$ for probability density $r$, it follows via equation (11) and the Schwarz inequality that $\int dx\, q(x)\, p_g(x+y) \le (C_q C_{p_g})^{1/2} \le C_q$ for ignorance-respecting estimates, i.e.,
$$K_I \ge \frac{1}{C_q} = \Big[\int dx\, q(x)^2\Big]^{-1}. \tag{13}$$
This is stronger than the lower bound (12), since $C_q \le q_{\max}$, and can be nontrivial even when $q_{\max} = \infty$.

Continuous generators:
Similar results hold for a shift generator $G$ with a continuous spectrum. In particular, from equation (A.6) of appendix A, identifying $G$ with $Y$, a suitable scaling constant $K_I$ follows as
$$K_I = \Big[\sup_{y,|g\rangle}\int dx\, q(x)\,\langle g|M_{x+y}|g\rangle\Big]^{-1}. \tag{14}$$
Here $|g\rangle$ ranges over all (typically degenerate) unit eigenkets appearing in any spectral decomposition of $\Gamma_g$ (i.e., $|g\rangle \equiv |g, d\rangle$ for some $d$ in an orthogonal expansion, where $d$ is an arbitrary degeneracy index). It has not been possible at this time to prove a general relative resolution bound for continuous generators, analogous to (3). In particular, for continuous generators the ket $|g\rangle$ is not normalisable, so that $\langle g|M_{x+y}|g\rangle$ in (14) does not correspond to some measurement probability density $p_g(x)$. However, it is conjectured that for such generators, if the prior distribution is uniform over an interval of length $L$.
Here, as always, the factor of 2 in the denominator can be dropped for a bounded spectrum for the choice $g = g_{\min}$. The support for this conjecture arises from a correspondence between the cases of no prior information and covariant estimates, as will now be detailed. First, if the POVM $\{M_{\hat{x}}\}$ is covariant, then from equation (5) it follows that $\mathcal{M}_y = \int dx\, q(x)\, M_y = M_y$, for any prior distribution $q(x)$. Hence, covariant estimates cannot make use of any prior information. Conversely, any estimate that does make use of prior information must be noncovariant.
Second, for continuous generators, $H(G) + H(M) \ge \ln \pi e$ for any covariant POVM $M$ from equation (A.11), where the bound is saturated when $M$ is canonically conjugate to $G$. Hence one may take $K_I = \pi e$ for covariant estimates. Moreover, for $m$ probe states, any covariant estimate of $X$ satisfies the same entropic uncertainty relation with $G$ replaced by $G_T = G_1 + \dots + G_m$. Hence, using (2), (10), and $|G_1 + \dots + G_m - mg| \le |G_1 - g| + \dots + |G_m - g|$, one has the rigorous bound
$$\delta\hat{X}_{\rm cov} \ge \frac{\sqrt{\pi/(2e)}}{2m\langle|G - g|\rangle} \tag{16}$$
for covariant estimates, where the expectation value is with respect to $\varrho_0$. As usual, the denominator can be replaced by $m\langle G - g_{\min}\rangle$ if $G$ is bounded below. This bound includes covariant estimates based on entangling joint measurements; a class of covariant estimates based on independent measurements is also briefly discussed in appendix A. Finally, since inequality (16) corresponds to no use of prior information, it can be interpreted in a limiting sense as a resolution bound relative to a prior distribution $q(x)$ which is uniform over the whole real line. This corresponds to $L \to \infty$ in (15), where more generally the conjecture claims the same relative bound holds for prior distributions uniform over any finite interval of length $L$.

Phase shift estimation
For the case where the spectrum of the generator $G$ is a subset of the integers one has $e^{iG\phi} = e^{iG(\phi+2\pi)}$, and hence the corresponding shift parameter $\Phi$ may be treated as a phase parameter, taking values on the unit circle. It follows from equations (2), (10) and (12) that
$$\delta\hat{\Phi} \ge \frac{(2\pi e^3)^{-1/2}\, K_I}{\langle 2|G - g| + 1\rangle} \ge \frac{(2\pi e^3)^{-1/2}}{q_{\max}\langle 2|G - g| + 1\rangle}, \tag{17}$$
where $q_{\max} \ge 1/2\pi$ is the maximum value of the prior probability density $q(\phi)$ for $\Phi$. Moreover, if it is known a priori that the phase shift is confined to an interval of length $L$, then $q(\phi) = L^{-1}$ over the interval and vanishes elsewhere, and hence from equation (3) for $m$ identical systems the relative resolution is bounded by
$$\frac{\delta\hat{\Phi}}{L} \ge \frac{(2\pi e^3)^{-1/2}}{\langle 2m|G - g| + 1\rangle}. \tag{18}$$
As previously, the term $2|G - g|$ in the denominators of (17) and (18) may be replaced by $G - g_{\min}$ when the spectrum is bounded below. Hence for a single mode field, with number operator $N \ge 0$, one has $\delta\hat{\Phi} \ge (2\pi/e^3)^{1/2}/\langle N + 1\rangle$ in the case of no prior information, with $q(\phi) = 1/2\pi = q_{\max}$. However, strong numerical evidence has been given that the numerator $(2\pi/e^3)^{1/2} \approx 0.559$ for this case can be replaced by a best possible value of $\approx 1.376$, which is asymptotically achievable for large $\langle N\rangle$ on a suitable probe state via the canonical phase estimate [19,30]. Hence it is conjectured that the bounds (17) and (18) are not tight, and that a similar replacement can be made in the case of arbitrary prior information.
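The numerical constants quoted for phase estimation can be checked directly. The sketch below evaluates the no-prior-information coefficient (2π/e³)^{1/2} and the relative resolution bound in the form given in the abstract, (2πe³)^{-1/2}/⟨2m|G| + 1⟩, with the factor of 2 optionally dropped for a spectrum bounded below. The function names and parameters are illustrative assumptions.

```python
import math

def phase_bound_no_prior(mean_n: float) -> float:
    """Lower bound (2 pi / e^3)^(1/2) / <N + 1> for a completely random phase."""
    return math.sqrt(2 * math.pi / math.e ** 3) / (mean_n + 1.0)

def relative_phase_bound(mean_abs_g: float, m: int = 1, bounded_below: bool = False) -> float:
    """Lower bound on (rms error)/L for a phase uniform over an interval of length L.

    mean_abs_g is <|G - g|> for a single probe, or <G - g_min> when
    bounded_below is True, in which case the factor of 2 is dropped."""
    factor = 1.0 if bounded_below else 2.0
    return 1.0 / (math.sqrt(2 * math.pi * math.e ** 3) * (factor * m * mean_abs_g + 1.0))

print(f"(2 pi / e^3)^(1/2) = {math.sqrt(2 * math.pi / math.e ** 3):.4f}")
print(f"(2 pi e^3)^(-1/2)  = {1 / math.sqrt(2 * math.pi * math.e ** 3):.4f}")
for m in (1, 10, 100):
    print(f"m = {m}: relative bound for <N> = 5 is "
          f"{relative_phase_bound(5.0, m=m, bounded_below=True):.5f}")
```

Setting L = 2π and m = 1 in the relative bound recovers the no-prior-information bound, and the 1/m decay for large m is visible in the printed values.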
To exploit any prior information, corresponding to a scaling constant $K_I < 2\pi$ in (17), is nontrivial. Note that a covariant estimate is not suitable, as for any such estimate one has $\langle g|M_\phi|g\rangle = (2\pi)^{-1}$ for all eigenstates $|g\rangle$ of $G$ [4,12], yielding $K_I = 2\pi$ from equation (11). Indeed, equation (11) implies that a necessary condition for exploiting prior information is that the estimate must return a nonuniform distribution over $[0, 2\pi]$ when some number eigenstate is input as a probe state. This is counterintuitive, since such eigenstates are invariant under phase shifts and hence cannot generate any useful phase information. However, it must be kept in mind that it is the actual probe state, $\rho_0$, rather than a notional probe state, that is relevant for actually estimating the phase.
For example, in the recently proposed phase estimation scheme of Rivas and Luis, applicable to small phase shifts of a single mode field generated by the photon number operator $N$, the estimate is proportional to the result of a homodyne measurement on the probe state [21]. Hence, if a number eigenstate $|n\rangle$ were to be input as a probe state, such a homodyne measurement would generate the (clearly nonuniform) statistics proportional to the quadrature distribution $|\langle x|n\rangle|^2$. Thus, the Rivas and Luis scheme satisfies the above necessary condition for exploiting prior information. However, as noted by Rivas and Luis following their equation (29), while their estimate has an arbitrarily low root mean square error $\delta\hat{\Phi}_0$ for a fixed phase shift value of zero, it can only further achieve an error $\delta\hat{\Phi}_\phi \approx \delta\hat{\Phi}_0 \ll L$, for each $\phi$ in an interval of length $L$ about $\phi = 0$ (corresponding to $\Delta\phi \ll \delta\phi$ in the notation of [21]), if $\delta\hat{\Phi}_0 \gg 1/N_T$, where $N_T \equiv m\,\mathrm{tr}[N\varrho_0]$. Thus, unfortunately, noting that the averaged root mean square error $\delta\hat{\Phi}$ over the interval is $\approx \delta\hat{\Phi}_0$ in this case, this scheme does not approach the lower bound in equation (18) above, nor even the numerically optimal bound $\delta\hat{\Phi} \ge 1.376/\langle N + 1\rangle$ when prior information is not exploited [19].
The above derivation establishes ultimate bounds on phase resolution. In particular, for any phase estimation scheme it is impossible to have a scaling better than $1/q_{\max}$, or an asymptotic scaling better than $1/\langle m|G|\rangle$. However, it remains a challenge for further work to determine how closely the above phase estimation bounds can be approached (up to some numerical factor), via a suitable measurement and probe state.
Finally, it is worth noting that while $\langle G\rangle$ is a transparent measure of resources when $G$ is a photon number operator or similar, more generally the quantity $\langle|G - g|\rangle$ may not be. However, it is straightforward to generalise the method used in section 2 to obtain bounds for resolution in terms of other quantities. For example, in the context of phase measurements, consider the shift generator $G = \tfrac{1}{2}(N_A - N_B)$, where $N_A$ and $N_B$ are the number operators for respective single-mode fields input to a Mach-Zehnder interferometer [31]. If the total number of input photons is bounded by some fixed maximum value, i.e., $N_A + N_B \le N_{\max}$, then $G$ can only take $2N_{\max} + 1$ distinct values: $0, \pm 1/2, \pm 1, \dots, \pm N_{\max}/2$. Hence $H(G) \le \ln(2N_{\max} + 1)$, and equations (8) and (12) yield the bounds
$$\delta\hat{\Theta} \ge \frac{(2\pi e)^{-1/2}}{q_{\max}(2N_{\max} + 1)}, \qquad \frac{\delta\hat{\Theta}}{L} \ge \frac{(2\pi e)^{-1/2}}{2mN_{\max} + 1} \tag{19}$$
for the resolution of the corresponding shift parameter $\Theta \in [0, 4\pi]$, where $m$ and $L$ have the meanings introduced earlier.
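The counting argument for the Mach-Zehnder generator can be verified by brute-force enumeration. The sketch below lists the distinct eigenvalues of G = (N_A − N_B)/2 subject to N_A + N_B ≤ N_max, and then evaluates a relative resolution bound under two labelled assumptions: that inequality (8) reads δΘ ≥ (2πe)^{-1/2} K_I exp[−H(G)], and that K_I ≥ L for a prior uniform over an interval of length L. The function names are hypothetical.

```python
import math
from fractions import Fraction

def generator_values(n_max: int):
    """Distinct eigenvalues of G = (N_A - N_B)/2 with N_A + N_B <= n_max."""
    return sorted({Fraction(n_a - n_b, 2)
                   for n_a in range(n_max + 1)
                   for n_b in range(n_max + 1 - n_a)})

def relative_bound_from_support(num_values: int) -> float:
    """Assumed bound: (rms error)/L >= (2 pi e)^(-1/2) / (number of values),
    using H(G) <= ln(number of values) and K_I >= L."""
    return 1.0 / (math.sqrt(2 * math.pi * math.e) * num_values)

for n_max in (1, 2, 10):
    values = generator_values(n_max)
    print(f"N_max = {n_max}: {len(values)} values from {values[0]} to {values[-1]}, "
          f"relative bound >= {relative_bound_from_support(len(values)):.5f}")
```

The enumeration confirms the count 2N_max + 1, with half-integer spacing between −N_max/2 and N_max/2.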

Time estimation for discrete Hamiltonians
Time estimates correspond to the case $G = E/\hbar$, where $E$ is a Hamiltonian operator with lowest eigenvalue $\epsilon_0$. Thus $G$ generates the time shift operator $e^{-iEt/\hbar}$. For the case of a discrete spectrum, the Heisenberg bound (2) (recalling the factor of 2 can be removed for $g = \epsilon_0/\hbar$), in combination with relations (10) and (12), yields
$$\delta\hat{T} \ge \frac{(2\pi e^3)^{-1/2}}{q_{\max}\langle (E - \epsilon_0)/\Delta_E + 1\rangle}. \tag{20}$$
Here, $\Delta_E$ denotes the smallest energy gap between distinct eigenvalues of $E$ (thus, the bound is only useful for $\Delta_E > 0$), and $q_{\max}$ is the largest value of the prior density $q(t)$. Further, if the time shift is a priori uniformly distributed over an interval of length $\tau$ (which can be arbitrarily large if the system is not periodic), equation (3) yields the scaling bound
$$\frac{\delta\hat{T}}{\tau} \ge \frac{(2\pi e^3)^{-1/2}}{\langle m(E - \epsilon_0)/\Delta_E + 1\rangle}, \tag{21}$$
where $m$ is the number of identically prepared copies. When the energy differences $\epsilon_j - \epsilon_k$ are incommensurate, the system will be almost periodic and $q(t)$ must be defined over the whole real line. However, despite the nonexistence of a uniform prior distribution in this case, one can still define covariant time estimates and show, for example, that one cannot typically extract more than 1 bit of information from such an estimate [32]. It is therefore expected that a noncovariant estimate is required to exploit any prior information.
An alternative resource of interest for bounding time resolution, particularly if the spectral structure is complex, is the number of energy eigenstates accessible to the probe state. For example, if the probe state is a $D$-level system, then the entropy of its energy distribution must satisfy $H(E) \le \ln D$, implying via the entropic bound (8) that $\delta\hat{T} \ge (2\pi e)^{-1/2} K_I/D$. Hence, via (12), one has the relative resolution bound
$$\frac{\delta\hat{T}}{\tau} \ge \frac{(2\pi e)^{-1/2}}{D} \tag{22}$$
for a time shift uniformly distributed over $[0, \tau]$. Note, however, that it is reasonable to expect that the actual time resolution of a discrete system will have a strong dependence on the detailed structure of the energy spectrum. Hence, the above bounds may be well below what is actually achievable.
Finally, note that for a prior distribution uniform over an interval of length $\tau$, Tsang has very recently given a lower bound (23) on the relative resolution $\delta\hat{T}/\tau$ [23], valid under the constraint $\tau\langle E - \epsilon_0\rangle \ge 0.690\,\hbar$ and asymptotically approaching $0.154\,\hbar/(\tau\langle E - \epsilon_0\rangle)$, as noted in the Introduction. For $\tau\Delta_E \le 0.154(2\pi e^3)^{1/2}\hbar \approx 1.73\,\hbar$ this is asymptotically stronger, as a function of $\langle E - \epsilon_0\rangle^{-1}$, than the relative resolution bound (21) (with $m = 1$), and is weaker otherwise. Hence it provides an improved relative resolution bound for the case of a sufficiently small energy gap, or a sufficiently small interval $\tau$ still satisfying the constraint.
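The crossover value quoted above follows from comparing two asymptotic coefficients of the inverse mean energy. The sketch below assumes the asymptotic forms 0.154 ħ/(τ⟨E − ε₀⟩) for the constrained bound (the coefficient quoted in the Introduction) and (2πe³)^{-1/2} Δ_E/⟨E − ε₀⟩ for the present relative bound with m = 1. Both are readings of the text rather than quoted equations, and the function names are hypothetical.

```python
import math

TSANG_COEFFICIENT = 0.154   # asymptotic coefficient quoted in the Introduction

def crossover_tau_gap() -> float:
    """Value of tau * Delta_E / hbar at which the two asymptotic bounds coincide."""
    return TSANG_COEFFICIENT * math.sqrt(2 * math.pi * math.e ** 3)

def constrained_bound_is_stronger(tau_gap_over_hbar: float) -> bool:
    """True when 0.154 / (tau Delta_E / hbar) >= (2 pi e^3)^(-1/2), i.e. when
    the constrained bound has the larger asymptotic coefficient."""
    return TSANG_COEFFICIENT / tau_gap_over_hbar >= 1.0 / math.sqrt(2 * math.pi * math.e ** 3)

print(f"crossover at tau * Delta_E / hbar = {crossover_tau_gap():.3f}")
for x in (0.5, 1.73, 5.0):
    print(f"tau * Delta_E / hbar = {x}: constrained bound stronger? "
          f"{constrained_bound_is_stronger(x)}")
```

Small energy gaps or short prior intervals favour the constrained bound, while large values of τΔ_E favour the constraint-free bound.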

Time estimation for continuous Hamiltonians
For Hamiltonians having a continuous spectrum, equation (2) (recalling the factor of 2 can be dropped for $g = \epsilon_0/\hbar$), together with equation (10), yields
$$\delta\hat{T} \ge \frac{(2\pi e^3)^{-1/2}\, K_I\, \hbar}{\langle E - \epsilon_0\rangle}, \tag{24}$$
where $K_I$ is given in (14). Further, for any covariant time estimate one has
$$\delta\hat{T}_{\rm cov} \ge \sqrt{\frac{\pi}{2e}}\, \frac{\hbar}{m\langle E - \epsilon_0\rangle} \tag{25}$$
from (16). Finally, if conjecture (15) is correct, then (again dropping the factor of 2) the relative resolution bound holds for a prior probability density $q(t)$ uniform over an interval of length $\tau$. It would be of great interest to determine how closely the bound (25) can be approached, via a canonical time measurement on a suitable probe state. Note that the variance of the canonical time distribution does not exist if the energy distribution of the probe state has a nonzero groundstate component [4]. Hence, such probe states would require a different measure of time resolution, e.g., the ensemble length of the error distribution, $\exp[H(\hat{T} - T)]$, which may be bounded from below via inequalities (7) and (9).

Spatial displacement estimation
As a final example, consider the case of estimation of the displacement of a quantum system in some direction, corresponding to the generator $G = P/\hbar$ for the momentum in that direction. The general resolution bound
$$\delta\hat{X} \ge \frac{(2\pi e^3)^{-1/2}\, K_I\, \hbar}{\langle 2|P - p|\rangle} \tag{26}$$
follows from (2) and (10), while for any covariant estimate one has
$$\delta\hat{X}_{\rm cov} \ge \frac{\hbar}{2}\, \frac{\sqrt{\pi/(2e)}}{\langle|P - p|\rangle} \tag{27}$$
from (16), for the case of a single copy, $m = 1$. The conjecture (15) implies a similar bound for the case of a prior distribution uniform over any finite interval, suggesting that measurement of the position observable $Q$ conjugate to $P$, with POVM elements $M_q = |q\rangle\langle q|$, is always optimal in this case. The above bounds are valid for all values of the reference momentum $p$. However, a variational calculation shows they are strongest when $p$ is chosen to be the median value $p_0$ of the momentum distribution $p(k)$, i.e., when $\int_{-\infty}^{p_0} dk\, p(k) = 1/2$. Note that the mean and median values are identical for the case of symmetric distributions.
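A Gaussian wave packet gives a simple consistency check of the covariant displacement bound. The sketch below assumes the bound has the form (ħ/2)(π/2e)^{1/2}/⟨|P − p|⟩, a reading of inequality (27), and compares it with the position spread ħ/(2σ_p) of a minimum-uncertainty Gaussian state centred at the origin, measured with a canonical position measurement. The closed form used for ⟨|P − p|⟩ is the standard folded-normal mean. Function names and the choice of natural units are illustrative assumptions.

```python
import math

HBAR = 1.0  # natural units, assumed for illustration

def mean_abs_deviation_gaussian(sigma_p: float, offset: float = 0.0) -> float:
    """<|P - p|> for a Gaussian momentum distribution of standard deviation
    sigma_p, with reference momentum p displaced by `offset` from the mean."""
    d = abs(offset)
    return (sigma_p * math.sqrt(2.0 / math.pi) * math.exp(-d * d / (2.0 * sigma_p ** 2))
            + d * math.erf(d / (sigma_p * math.sqrt(2.0))))

def covariant_displacement_bound(sigma_p: float, offset: float = 0.0) -> float:
    """Assumed form of (27): (hbar / 2) sqrt(pi / (2 e)) / <|P - p|>."""
    return 0.5 * HBAR * math.sqrt(math.pi / (2.0 * math.e)) / mean_abs_deviation_gaussian(sigma_p, offset)

def position_rms_minimum_uncertainty(sigma_p: float) -> float:
    """Position spread of a minimum-uncertainty Gaussian state."""
    return HBAR / (2.0 * sigma_p)

sigma_p = 1.0
for offset in (0.0, 0.5, 1.0):
    bound = covariant_displacement_bound(sigma_p, offset)
    print(f"offset = {offset}: bound = {bound:.4f}, "
          f"actual rms = {position_rms_minimum_uncertainty(sigma_p):.4f}")
```

With the reference momentum at the median the bound reaches π/(2√e) ≈ 0.953 of the actual spread, and it weakens as the reference momentum moves away from the median.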

Including the effects of noise
The presence of noise is expected to decrease the accuracy of any estimate, and hence to increase the lower bounds in the previous sections. Consider, for example, the very simple case in which independent noise is added to the measurement outcome $\hat{X}$. Denoting the noise variable by $Z$, the entropy power inequality [28] and equation (7) imply
$$e^{2H(\hat{X} + Z - X)} \ge e^{2H(\hat{X} - X)} + e^{2H(Z)} \ge K_I^2\, e^{-2H(G)} + e^{2H(Z)}.$$
Hence, as per the derivation of equation (8), it follows that
$$\delta\hat{X}_{\rm noisy} \ge \left[(\delta\hat{X}_{\rm bound})^2 + (2\pi e)^{-1} e^{2H(Z)}\right]^{1/2}, \tag{28}$$
where $\hat{X}_{\rm noisy} := \hat{X} + Z$ and $\delta\hat{X}_{\rm bound}$ denotes any of the lower bounds of the previous sections. Noise thus raises the lower bound on the achievable resolution. A more physical approach is to consider processes that add noise directly to the probe state, and to use stronger entropic uncertainty relations which depend on the probe state. For example, for a rank-1 discrete generator $G$, inequality (A.8) of Appendix A effectively replaces $K_I$ by $K_I e^{S[\rho_0]}$, where $S[\rho_0]$ is the von Neumann entropy of the probe state. Hence, replacing $\rho_0$ by the noisy probe state $\Pi(\rho_0)$ for some completely positive map $\Pi$, the generic lower bound (2) generalises to
$$\delta\hat{X} \ge \frac{k_I\, e^{S[\Pi(\rho_0)]}}{\langle 2|G - g|/\Delta + 1\rangle_\Pi} \tag{29}$$
for such generators, where $\langle\cdot\rangle_\Pi$ denotes an average with respect to $\Pi(\rho_0)$. Similarly, the relative resolution bound (3) generalises to
$$\frac{\delta\hat{X}}{L} \ge \frac{(2\pi e^3)^{-1/2}\, e^{S[\Pi(\rho_0)]}}{\langle 2|G - g|/\Delta + 1\rangle_\Pi} \tag{30}$$
(one is limited to the case $m = 1$, since $G_T = G_1 + \dots + G_m$ is not rank-1 for $m > 1$). For a rank-1 continuous generator $G$, the covariant bound (16) generalises to
$$\delta\hat{X}_{\rm cov} \ge \sqrt{\frac{\pi}{2e}}\, \frac{e^{S[\Pi(\rho_0)]}}{\langle 2|G - g|\rangle_\Pi} \tag{31}$$
via uncertainty relation (A.10) of Appendix A. As always, the factor of 2 in the above denominators may be removed for the choice $g = g_{\min}$. For example, let $N$ be the photon number operator of a single mode field subject to Gaussian noise, where the noise is described by the completely positive map [33] $\Lambda(\rho) = \int d^2\alpha\, (\pi n_\lambda)^{-1} e^{-|\alpha|^2/n_\lambda} D(\alpha)\rho D(\alpha)^\dagger$, and $D(\alpha) = \exp(\alpha a^\dagger - \alpha^* a)$ denotes the Glauber displacement operator.
The parameter $n_\lambda$ characterises the average number of photons added to the field, i.e., $\langle N\rangle_\Lambda = \langle N\rangle + n_\lambda$, while the entropy of the field is bounded, both for pure and mixed states, by [34] $S[\Lambda(\rho)] \ge \ln(1 + n_\lambda) + n_\lambda \ln(1 + 1/n_\lambda) \ge \ln(1 + n_\lambda)$.
Combining this with the relative resolution bound (30) then yields a 'noisy' bound
$$\frac{\delta\hat{\Phi}}{L} \ge \frac{(2\pi e^3)^{-1/2}}{1 + \langle N\rangle/(n_\lambda + 1)} \tag{32}$$
for the relative resolution of any phase estimate, for a prior distribution uniform over some interval of width $L$. It is seen that the resolution becomes poor for sufficiently large noise.
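The behaviour of the noisy bound (32) as the noise grows can be tabulated directly. The sketch below evaluates (2πe³)^{-1/2}/[1 + ⟨N⟩/(n_λ + 1)] together with the entropy lower bound ln(1 + n_λ) + n_λ ln(1 + 1/n_λ) quoted from [34]. The function names and the sample parameter values are assumptions made for illustration.

```python
import math

BASE = 1.0 / math.sqrt(2 * math.pi * math.e ** 3)   # (2 pi e^3)^(-1/2)

def entropy_lower_bound(n_noise: float) -> float:
    """ln(1 + n) + n ln(1 + 1/n): the quoted lower bound on S[Lambda(rho)]."""
    if n_noise == 0:
        return 0.0
    return math.log(1.0 + n_noise) + n_noise * math.log(1.0 + 1.0 / n_noise)

def noisy_relative_bound(mean_n: float, n_noise: float) -> float:
    """Lower bound (32) on (rms phase error)/L, where mean_n is the mean photon
    number of the noiseless probe and n_noise the mean number of added photons."""
    return BASE / (1.0 + mean_n / (n_noise + 1.0))

for n_noise in (0.0, 1.0, 10.0, 1000.0):
    print(f"n_noise = {n_noise}: entropy bound = {entropy_lower_bound(n_noise):.3f}, "
          f"relative bound for <N> = 100 is {noisy_relative_bound(100.0, n_noise):.5f}")
```

As the added noise increases, the bound rises monotonically from its noiseless value towards the probe-independent ceiling (2πe³)^{-1/2} ≈ 0.089, so additional probe photons buy progressively less resolution.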

Discussion
The results of the paper establish a rigorous, nonasymptotic and constraint-free lower bound for parameter estimation which is in the form of the Heisenberg limit and which takes prior information into account. The fundamental bound (2) implies that asymptotic scaling better than $1/\langle|G|\rangle$ is impossible, while bound (3) for discrete generators further demonstrates that, for shifts randomly distributed over some interval, asymptotic scaling better than $1/m$ is impossible for the relative resolution, where $m$ is the number of probe states (and entangling joint measurements are permitted). Bound (16) for continuous generators implies that a similar $1/m$ limit is unavoidable for the case of covariant estimates. Section 5 shows how the effects of noise may be quantified, including resolution bound (32) for phase estimates on a single mode field subjected to Gaussian noise. Examples have been given for estimates of phase shifts, time shifts and spatial displacement in sections 3 and 4. These sections also give examples of how the basic method of section 2 may be applied to obtain resolution bounds in terms of alternative resources, such as the total available photon number in equation (19) and the energy support of the probe state in equation (22). For the case of discrete generators with a finite minimum eigenvalue, the corresponding relative resolution bound (21) may be stronger or weaker than the recent constrained bound (23) due to Tsang (section 3.2).
The fundamental tool used to obtain the above resolution bounds is equation (7) for the entropy of the error in the estimate, $H(\hat{X} - X)$. As noted briefly in section 2.2, the exponential of this entropy may be useful as an alternative measure of resolution [35]. Further, this measure has tighter corresponding bounds, as it avoids the use of the relation between $\delta\hat{X}$ and $H(\hat{X} - X)$ (which is only saturated for Gaussian distributions), required for obtaining equation (8).
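The gap between an entropy-based resolution measure and the root mean square error can be made concrete with standard closed-form entropies. The sketch below assumes the normalisation $(2\pi e)^{-1/2} e^{H}$ for the entropy-based measure, chosen here so that it coincides with the standard deviation for a Gaussian error distribution; the normalisation, the function names and the three example distributions are illustrative assumptions rather than choices taken from the text.

```python
import math


def entropy_resolution(diff_entropy):
    """Entropy-based width (2*pi*e)^(-1/2) * exp(H); normalisation is an assumption."""
    return math.exp(diff_entropy) / math.sqrt(2.0 * math.pi * math.e)


def gaussian(sigma):
    """(standard deviation, differential entropy) of a Gaussian with std sigma."""
    return sigma, 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def uniform(width):
    """(standard deviation, differential entropy) of a uniform density of given width."""
    return width / math.sqrt(12.0), math.log(width)


def laplace(scale):
    """(standard deviation, differential entropy) of a Laplace density with scale b."""
    return scale * math.sqrt(2.0), 1.0 + math.log(2.0 * scale)


def ratio(std_and_entropy):
    """Entropy-based width divided by the standard deviation."""
    std, diff_entropy = std_and_entropy
    return entropy_resolution(diff_entropy) / std


for name, case in (("gaussian", gaussian(1.0)),
                   ("uniform", uniform(1.0)),
                   ("laplace", laplace(1.0))):
    print(f"{name:8s} entropy width / rms error = {ratio(case):.4f}")
```

Only the Gaussian case gives a ratio of one; for the other two the entropy-based width is strictly smaller than the rms error, which is why bounds stated directly for the entropy are tighter.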
Another interesting measure of resolution to consider is the mutual information between the shift and its estimate, $H(\hat{X} : X)$ [28]. While mutual information is not dealt with directly in this paper, the relative resolution bounds (3), (18), (19), (21), (22), (30) and (32) do allow an approximate upper bound to be derived for $H(\hat{X} : X)$, for discrete generators, whenever the prior distribution is uniform over some sufficiently large interval. In particular, the number of distributions of width $\delta\hat{X}$ that can be distinguished without error, over an interval of width $L$, is approximately $L/\delta\hat{X}$. The corresponding mutual information, i.e., the corresponding number of bits that can be encoded by the shift parameter $X$ and distinguished by the estimate $\hat{X}$ [28], is therefore $H(\hat{X} : X) \approx \log_2 [L/\delta\hat{X}]$, i.e., the logarithm of the reciprocal of the relative resolution. Thus the above mentioned relative resolution bounds place an approximate upper bound on the mutual information. For example, for estimates of phase shifts uniform over an interval of length $L$, generated by the photon number of a single mode field subjected to Gaussian noise, one has the approximate upper bound $H(\hat{\Phi} : \Phi) \lesssim \log_2 [(2\pi e^3)^{1/2} (1 + \langle N\rangle/(n_\lambda + 1))]$ for mutual information from equation (32). If the conjectured bound (15) is correct, one may similarly obtain estimates of mutual information for continuous generators.

As noted above, the resolution can scale no better than inversely with the number of probe states, $m$, even when entangling joint measurements are permitted. As noted in the introduction, this contrasts with the $m^{-1/2}$ scaling of the Cramer-Rao related bound (1). Note that for $m = 1$, the bounds of this paper can be numerically weaker or stronger than (1) (section 4.2).
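The approximate mutual information bound obtained from equation (32) is easy to tabulate. The following sketch simply evaluates $\log_2$ of the reciprocal of the right-hand side of (32); the function name and the sample photon numbers are hypothetical, and the result inherits the approximate character of the counting argument above.

```python
import math


def mutual_information_bound_bits(mean_photons, n_noise):
    """Approximate upper bound, in bits, implied by bound (32):
    log2[(2*pi*e^3)^(1/2) * (1 + <N> / (n_noise + 1))]."""
    inverse_resolution = math.sqrt(2.0 * math.pi * math.e ** 3) * (
        1.0 + mean_photons / (n_noise + 1.0)
    )
    return math.log2(inverse_resolution)


# Illustrative values only.
for mean_photons in (1.0, 10.0, 100.0, 1000.0):
    clean = mutual_information_bound_bits(mean_photons, 0.0)
    noisy = mutual_information_bound_bits(mean_photons, 10.0)
    print(f"<N> = {mean_photons:7.1f}  noiseless: {clean:6.2f} bits  n_noise = 10: {noisy:6.2f} bits")
```

For large photon number the bound grows only logarithmically, by about one bit per doubling of the mean photon number, and added noise shifts it downwards.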
It has been seen in sections 2-4 that covariant estimates do not exploit any prior information that may be available. Hence it is only possible to approach the generic resolution bounds (2) and (3) via noncovariant estimates. Further, as noted in section 3.1, a necessary condition for making use of prior information is that the measurement scheme must return a nonuniform distribution when some eigenstate of the generator is input as a probe state. While the recently proposed scheme of Luis and Rivas meets this condition, it does not approach the corresponding bound (18) (section 3.1). It therefore remains an important challenge for future work to determine how closely the various lower bounds of this paper can be approached.
It is also hoped that future work will settle the conjectures made regarding the relative resolution bound (15) for continuous generators in section 2.3, the improvement in scaling factors for phase estimates in section 3.1, and the strong entropic uncertainty bound (A.9) in Appendix A.
Finally, it is noted that the extensions of various entropic uncertainty relations to continuous observables, obtained in Appendix A, will find application beyond the realm of quantum metrology.
Acknowledgment: This work was supported by the ARC Centre of Excellence CE110001027.

Appendix A. Entropic uncertainty relations involving continuous POVMs
Appendix A.1. One continuous observable
On a finite-dimensional Hilbert space, the entropies of two observables $A$ and $B$, corresponding to finitely-valued POVMs $\{A_j\}$ and $\{B_k\}$, satisfy the entropic uncertainty relation [25,36] where $\|X\|_\infty$ denotes the largest singular value of $X$, i.e., the square root of the largest eigenvalue of $X^\dagger X$.
To extend this relation to the case where one of the observables is continuously valued, first consider some observable $C$ taking continuous values in some compact set, with corresponding POVM $\{C_\theta\}$, and partition the range of $\theta$ into a finite number of nonoverlapping bins $\{P_k\}$ of equal size $\epsilon$. Define the discrete observable $C_P$ associated with the partition via the POVM $\{C_{P_k}\}$ with $C_{P_k} := \int_{P_k} d\theta\, C_\theta$. Then, for any probability density $p(\theta)$ of $C$, there is a corresponding well-defined discrete probability distribution $p_k := \int_{P_k} d\theta\, p(\theta)$ (equal to the probability of $\theta \in P_k$), and an associated piecewise-continuous probability density $\bar{p}(\theta)$ given by replacing $p(\theta)$ by its average value over the bin $P_k$ for $\theta \in P_k$, i.e., $\bar{p}(\theta) := p_k/\epsilon$ for $\theta \in P_k$. Note that the probability of $\theta \in P_k$ is identical for both $p(\theta)$ and $\bar{p}(\theta)$, implying the latter converges in distribution to the former in the limit $\epsilon \to 0$. Note also that the entropy of $C_P$, $H(C_P) = -\sum_k p_k \ln p_k$, can be rewritten using $\sum_k p_k = 1$ as $H(C_P) = -\int d\theta\, \bar{p}(\theta) \ln \bar{p}(\theta) - \ln \epsilon$. The entropic uncertainty relation (A.1) for observables $A$ and $C_P$ gives where $|\psi_j\rangle := A_j^{1/2}|\psi\rangle / \langle\psi|A_j|\psi\rangle^{1/2}$, and $\bar{p}_{\psi_j}(\theta)$ is defined analogously to $\bar{p}(\theta)$ above, with respect to the probability density $p_{\psi_j}(\theta) = \langle\psi_j|C_\theta|\psi_j\rangle$ (and thus converges in distribution to $p_{\psi_j}(\theta)$ in the limit $\epsilon \to 0$). Using the above expression for $H(C_P)$ then yields, taking the limit $\epsilon \to 0$, whenever $H(C)$ exists, thus generalising (A.1).
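The binning step can be checked numerically for a simple example. The sketch below uses the hypothetical density $p(\theta) = (1 + \cos\theta)/2\pi$ on $[0, 2\pi)$, which is an illustrative choice and does not appear in the text, and compares $H(C_P) + \ln\epsilon$ with the differential entropy of $p$, which a direct integration gives as $\ln 4\pi - 1$.

```python
import math


def cdf(theta):
    """Antiderivative of the example density p(theta) = (1 + cos(theta)) / (2*pi)."""
    return (theta + math.sin(theta)) / (2.0 * math.pi)


def binned_entropies(num_bins):
    """Return (H(C_P), H(C_P) + ln(eps)) for num_bins equal bins on [0, 2*pi)."""
    eps = 2.0 * math.pi / num_bins
    discrete_entropy = 0.0
    for k in range(num_bins):
        p_k = cdf((k + 1) * eps) - cdf(k * eps)  # probability of bin P_k
        if p_k > 0.0:
            discrete_entropy -= p_k * math.log(p_k)
    # H(C_P) + ln(eps) equals the entropy of the piecewise-constant density p_k / eps.
    return discrete_entropy, discrete_entropy + math.log(eps)


# Differential entropy of the example density, from direct integration.
EXACT_ENTROPY = math.log(4.0 * math.pi) - 1.0

for num_bins in (4, 16, 64, 1024):
    discrete, shifted = binned_entropies(num_bins)
    print(f"bins = {num_bins:5d}  H(C_P) = {discrete:.5f}  H(C_P) + ln(eps) = {shifted:.5f}")
print(f"differential entropy of p: {EXACT_ENTROPY:.5f}")
```

The discrete entropy $H(C_P)$ itself diverges as the bins shrink, while the shifted quantity converges to the continuous entropy from above, since averaging a density over a bin cannot decrease its entropy.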
Uncertainty relations (A.2) and (A.3) may be further extended to the case of infinite-dimensional Hilbert spaces, whenever the left hand side exists, by considering the limit of a sequence of projections of the observables onto finite Hilbert spaces. They similarly extend to the case of a countably infinite POVM $\{A_j\}$, whenever $H(A)$ exists, by considering the limit of the sequence of finite POVMs $\{A_1, A_2, \ldots, A_d, 1 - \sum_{j=1}^{d} A_j\}$ as $d \to \infty$. Finally, they also extend to the case of a non-compact range of $C$, whenever the left hand side exists, by representing the range as the limit of a sequence of compact sets $S$, and replacing $\{C_\theta\}$ by a corresponding sequence of POVMs $\{C_\theta, \theta \in S\} \cup \{1 - C_S\}$, where $C_S := \int_S d\theta\, C_\theta$. Thus, (A.2) is valid for any discrete-valued observable $A$ and continuously-valued observable $C$, whenever $H(A) + H(C)$ exists, and similarly for (A.3) if $C_\theta^{1/2}$ is well defined. For example, for an angular momentum component $J_z$ and its corresponding conjugate angle $\Phi$, the inequality in (A.2) reduces to the known result first given by Bialynicki-Birula and Mycielski [37], which is saturated by eigenstates of $J_z$.

Appendix A.2. Two continuous observables
Uncertainty relation (A.3) provides the basis for extending to the case of two continuous-valued observables $X$ and $Y$ corresponding to POVMs $\{X_x\}$ and $\{Y_y\}$ respectively. The procedure is similar to the foregoing. In particular, partitioning the range of $x$ into bins $P_j$ of equal size $\epsilon$, one has the corresponding discrete POVM $A \equiv \{A_j\}$ with $A_j := \int_{P_j} dx\, X_x$. The entropy of the corresponding observable $A$, given a continuous probability density $p(x)$ of $X$, is then that of the discrete distribution $p_j := \int_{P_j} dx\, p(x) = \epsilon\, \bar{p}(x)$, where the second expression defines the piecewise continuous probability density $\bar{p}(x)$. Substitution into (A.3), with $C \equiv Y$ and assuming $Y_y^{1/2}$ is well-defined, gives where $|\psi_y\rangle := Y_y^{1/2}|\psi\rangle / \langle\psi|Y_y|\psi\rangle^{1/2}$ and $\bar{p}_{\psi_y}(x)$ is defined analogously to $\bar{p}(x)$ above, with respect to the probability density $p_{\psi_y}(x) = \langle\psi_y|X_x|\psi_y\rangle$. Taking the limit $\epsilon \to 0$ then gives whenever the entropies and the relevant square roots are well defined.

Indeed, this uncertainty relation can also be applied in some instances when the square roots are not well defined, via taking appropriate limits. For example, for conjugate position and momentum observables $Q$ and $P$, with eigenkets $|q\rangle$ and $|p\rangle$ respectively, and any $\epsilon > 0$, define the 'averaged' momentum observable $\bar{P}$ with POVM $\{\bar{P}_p\}$ via $\bar{P}_p := (2\epsilon)^{-1} \int_{-\epsilon}^{\epsilon} dk\, |p + k\rangle\langle p + k|$. Then, $\bar{P}_p^{1/2} = (2\epsilon)^{1/2} \bar{P}_p$ from the spectral theorem, yielding where the last line follows from the Schwarz inequality. The first integral evaluates to $2\epsilon/(2\pi\hbar)$, while the second is never greater than unity for any normalised state $\psi$. Hence, substituting $X = Q$ and $Y = \bar{P}$ into the first inequality of uncertainty relation (A.4), and taking the limit $\epsilon \to 0$, yields Note that the lower bound is not optimal, although it is close to the optimal bound $\ln \pi e\hbar$, saturated by Gaussian pure states [36,37]. However, the same lower bound is optimal for the related tight uncertainty relation (A.10) below.
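The statement that Gaussian pure states saturate the optimal bound $\ln \pi e\hbar$ can be verified numerically. The sketch below works in units with $\hbar = 1$, assumes a minimum-uncertainty Gaussian with $\sigma_q \sigma_p = \hbar/2$, and integrates the two differential entropies with a simple midpoint rule; the function names, the grid and the integration range are illustrative choices.

```python
import math

HBAR = 1.0  # units with hbar = 1, an assumption of this sketch


def gaussian_density(x, sigma):
    """Zero-mean Gaussian probability density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def differential_entropy(sigma, num_points=20001, half_range=10.0):
    """Midpoint-rule estimate of -int p ln p over [-half_range*sigma, half_range*sigma]."""
    step = 2.0 * half_range * sigma / num_points
    total = 0.0
    for i in range(num_points):
        x = -half_range * sigma + (i + 0.5) * step
        p = gaussian_density(x, sigma)
        if p > 0.0:
            total -= p * math.log(p) * step
    return total


def entropy_sum_for_gaussian_pure_state(sigma_q):
    """H(Q) + H(P) for a minimum-uncertainty Gaussian with position spread sigma_q."""
    sigma_p = HBAR / (2.0 * sigma_q)
    return differential_entropy(sigma_q) + differential_entropy(sigma_p)


OPTIMAL_BOUND = math.log(math.pi * math.e * HBAR)

for sigma_q in (0.25, 1.0, 4.0):
    total = entropy_sum_for_gaussian_pure_state(sigma_q)
    print(f"sigma_q = {sigma_q:4.2f}  H(Q) + H(P) = {total:.6f}  ln(pi e hbar) = {OPTIMAL_BOUND:.6f}")
```

The entropy sum is independent of the squeezing parameter in this sketch, as expected, since only the product of the two spreads enters.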
More generally, if the POVM $\{Y_y\}$ is a continuous projection-valued measure corresponding to some Hermitian operator, then although $Y_y^{1/2}$ is not well defined, an entropic uncertainty relation may be obtained via a similar limiting approach. In particular, in such a case $Y_y Y_{y'} = \delta(y - y') Y_y$, implying that $\tilde{Y}_y := (2\epsilon)^{-1} \int_{-\epsilon}^{\epsilon} dz\, Y_{y+z}$ satisfies $\tilde{Y}_y^{1/2} = (2\epsilon)^{1/2} \tilde{Y}_y$. Applying the first inequality in (A.4) to $X$ and $\tilde{Y}$ yields x,y,ψ ln ψ y |X x |ψ y ψ|Ỹ y |ψ , with $|\psi_y\rangle := \tilde{Y}_y^{1/2}|\psi\rangle / \langle\psi|\tilde{Y}_y|\psi\rangle^{1/2} = (2\epsilon)^{1/2} \tilde{Y}_y|\psi\rangle / \langle\psi|\tilde{Y}_y|\psi\rangle^{1/2}$. Noting immediately follows, where $|y\rangle$ ranges over all unit eigenkets of $Y_y$. This generalisation of (A.5), holding for any projection-valued measure $\{Y_y\}$, is of particular relevance to generators with continuous spectra (section 2.3.2).

Appendix A.3. One rank-1 observable
When the observable $A$ in relation (A.1) is rank 1, i.e., when $A_j = |a_j\rangle\langle a_j|$ for some set of (not necessarily normalised) kets $\{|a_j\rangle\}$, then one has the stronger uncertainty relation [26] H(A) + H(B) ≥ − ln max whenever the left hand side exists, for any discrete-valued rank-1 observable $A$ and any continuously-valued observable $C$. Unfortunately, one cannot analogously generalise (A.4) via the methods of appendix A.2, as these methods rely on use of an observable $A$ which is not rank 1. However, it is conjectured here that such a generalisation exists, with for any two continuously valued observables $X$ and $Y$, providing $X$ is rank 1, both entropies exist, and $Y_y^{1/2}$ is well defined. The above conjecture can be proved for the special case of conjugate position and momentum observables, using an approach of Pegg et al. in which $Q$ and $P$ are represented by approximating them as discrete rank-1 observables on a $D$-dimensional Hilbert space and taking the limit $D \to \infty$ [38]. In particular, substituting the discrete observables into (A.7) and taking this limit yields whenever the left hand side exists. The same method generalises to the case of conjugate $n$-vectors $\mathbf{Q}$ and $\mathbf{P}$, with the right hand side of the above relation being multiplied by $n$. This result proves the conjecture in equation (47) of [35], which was made on the basis of a semiclassical argument. Note that, in contrast to inequality (A.5), the bound in inequality (A.10) is tight, being saturated by equilibrium states in the high temperature limit [35].
conjugate to $Q$ in units such that $\hbar = 1$. From the known entropic uncertainty relation for $Q$ and $P$ [36,37], it immediately follows that $H(F) + H(G) = H(Q) + H(P) \geq \ln \pi e$.
Further, the probability distribution of any covariant observable $M$ for some state $\rho$ is equal to the probability distribution of the conjugate observable $F$ for some corresponding state $\rho'$, where $\rho$ and $\rho'$ have the same probability distribution for the observable $G$ [32]. Hence the above bound also holds with $F$ replaced by $M$, yielding (A.11) as desired.

It is of interest to consider a class of covariant estimates m based on independent measurements of the covariant observable $M$, made on $m$ respective copies of the probe state. Let $\hat{x} = f(\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_m)$ denote the corresponding estimate of the shift parameter, where $\hat{x}_j$ denotes the individual estimate given by the $j$-th measurement. It will be assumed that the estimate is shift-invariant, i.e., that the function $f$ satisfies the identity $f(x_1 + y, x_2 + y, \ldots, x_m + y) = f(x_1, x_2, \ldots, x_m) + y$. (A.12)
This is satisfied, for example, by any weighted mean $f = w_1 x_1 + \ldots + w_m x_m$ with $w_1 + \ldots + w_m = 1$. Differentiating with respect to $y$ and taking $y = 0$ gives the equivalent condition $\sum_j (\partial f/\partial x_j) = 1$.
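The shift-invariance identity (A.12) and its differential form can be tested directly for candidate estimators. The sketch below is illustrative only: the helper names, the sample data and the finite-difference step are hypothetical, and the median is included as a second example of a shift-invariant function that is not a weighted mean.

```python
import statistics


def weighted_mean(weights):
    """Return the estimator f(x) = sum_j w_j x_j for the given weights."""
    def estimate(samples):
        return sum(w * x for w, x in zip(weights, samples))
    return estimate


def is_shift_invariant(estimate, samples, shift, tol=1e-9):
    """Check f(x_1 + y, ..., x_m + y) = f(x_1, ..., x_m) + y for one shift y."""
    shifted = [x + shift for x in samples]
    return abs(estimate(shifted) - (estimate(samples) + shift)) < tol


def sum_of_partial_derivatives(estimate, samples, step=1e-6):
    """Central-difference estimate of sum_j df/dx_j at the given point."""
    total = 0.0
    for j in range(len(samples)):
        up = list(samples)
        down = list(samples)
        up[j] += step
        down[j] -= step
        total += (estimate(up) - estimate(down)) / (2.0 * step)
    return total


samples = [0.1, -0.4, 2.0]               # hypothetical individual estimates
good = weighted_mean([0.2, 0.3, 0.5])    # weights sum to 1
bad = weighted_mean([0.2, 0.3, 0.4])     # weights sum to 0.9

print("weights sum to 1   :", is_shift_invariant(good, samples, 0.7))
print("weights sum to 0.9 :", is_shift_invariant(bad, samples, 0.7))
print("median             :", is_shift_invariant(statistics.median, samples, 0.7))
print("sum of partials    :", sum_of_partial_derivatives(good, samples))
```

A weighted mean whose weights do not sum to one fails the check, in line with the normalisation condition stated above.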
To show that such an estimate is covariant with respect to the corresponding shift generator $G_T = G_1 + \ldots + G_m$, let $\mathbf{x}$ denote the vector $(x_1, \ldots, x_m)$, and define

where $\alpha$ and $\beta$ are Lagrange multipliers. It is convenient to work with the displaced distribution $q_n := p_{n+n_0}$, for which the variational equation $\delta J/\delta q_n = 0$ has the solution $q_n = A v^{|n|}$, for suitable positive constants $A$ and $v$ determined by the constraints. One easily finds that $A = (1 - v)/(1 + v)$ and $\bar{n} := \langle |n - n_0| \rangle = \sum_n |n| q_n = 2v/(1 - v^2)$. Further, $H_{\max}(G) = -\sum_n q_n [\ln A + |n| \ln v] = -\ln A - \bar{n} \ln v$. Inverting