Average number is an insufficient metric for interferometry

We argue that analysing schemes for metrology solely in terms of the average particle number can obscure the number of particles effectively used in informative events. For a number of states we demonstrate that, in both frequentist and Bayesian frameworks, the average number of a state can essentially be decoupled from the aspects of the total number distribution associated with any metrological advantage.

Mach-Zehnder interferometry scheme with two one-mode states incident directly on the local phases. The initial beam splitter is sometimes omitted, as in Figs. 2 and 3, particularly for two-mode input states. Separable inputs are normally considered when the initial beam splitter is present.

A. Precision in quantum mechanics
For a given probe state ρ and measurement Π, the precision attainable by any unbiased estimator of a parameter is bounded by the Cramér-Rao bound (CRB). Measurement-independent lower bounds on the CRB then exist as quantum Cramér-Rao bounds (QCRBs), which depend only on the quantum state. For single-parameter estimation the symmetric logarithmic derivative (SLD) QCRB is the relevant QCRB, as there exists a measurement which saturates it. This forms a hierarchy of inequalities bounding the variance of the estimator [34,35],
$$ (\Delta\phi)^2 \geq \frac{1}{\nu\, I(\rho(\phi);\Pi)} \geq \frac{1}{\nu\, J(\rho(\phi))}, \tag{1} $$
where ν is the number of repetitions, I(ρ(ϕ); Π) the (state- and measurement-dependent) classical Fisher information (CFI), and J(ρ(ϕ)) the (state- but not measurement-dependent) SLD quantum Fisher information (henceforth QFI).
For mixed states the required expressions can be far more involved [4,35,37]; the mixed-state QFIs necessary for this work are derived in App. A1. Among fixed-number states the maximum QFI comes from the N00N state [1,2], a balanced superposition of all photons in one mode or the other, directly incident on the phase shifts as in Fig. 2. Using Eq. (2), $\hat n_1|\psi_{\rm N00N}\rangle = N|N0\rangle/\sqrt{2}$ and $\hat n_2|\psi_{\rm N00N}\rangle = N|0N\rangle/\sqrt{2}$ give $J(|\psi_{\rm N00N}\rangle) = N^2$, and hence a Heisenberg-scaling precision $(\Delta\phi)^2 \geq 1/(\nu N^2)$.
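As a quick numerical check of $J(|\psi_{\rm N00N}\rangle) = N^2$, the sketch below evaluates the pure-state QFI as the variance of $\hat n_1 - \hat n_2$. It assumes that Eq. (2) is the standard pure-state expression $J = 4\,{\rm Var}(K)$ with generator $K = (\hat n_1 - \hat n_2)/2$ (the generator used in Sec. V); the helper names `pure_qfi` and `n00n` are our own.

```python
import numpy as np


def pure_qfi(state):
    """Pure-state QFI for phi = phi1 - phi2.

    `state` maps Fock labels (n1, n2) to amplitudes. With the (assumed)
    generator K = (n1 - n2)/2, J = 4 Var(K) = Var(n1 - n2).
    """
    amps = np.array(list(state.values()), dtype=complex)
    probs = np.abs(amps) ** 2
    probs = probs / probs.sum()  # guard against rounding in the input amplitudes
    diff = np.array([n1 - n2 for (n1, n2) in state], dtype=float)
    mean = probs @ diff
    return float(probs @ diff**2 - mean**2)


def n00n(N):
    """(|N,0> + |0,N>)/sqrt(2) as a dictionary of Fock amplitudes."""
    return {(N, 0): 1 / np.sqrt(2), (0, N): 1 / np.sqrt(2)}


for N in (1, 2, 5, 10):
    J = pure_qfi(n00n(N))
    # Single-shot (nu = 1) QCRB: (Delta phi)^2 >= 1/J = 1/N^2
    print(f"N={N:2d}  J={J:6.1f}  QCRB={1 / J:.4f}")
```

A Fock state $|n_1, n_2\rangle$ gives zero QFI under the same helper, as it should for a state with definite $\hat n_1 - \hat n_2$.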

C. Phase references
The conventional measurement in optical interferometry is photon counting. This is a phase-insensitive technique: although it can be used with linear optics to resolve relative phases, it is insensitive to a local phase. Where a state of indefinite photon number is used with phase-insensitive measurements, the pure-state (single-parameter) QCRB is not necessarily sufficient to model the problem properly [4,38].
A single-mode state of indefinite number can (and, if pure, does) possess coherence between components of different total number, which accumulate a local phase. This phase is not directly accessible through phase-insensitive measurements, since the probabilities $\{|\langle n|\psi\rangle|^2\}$ depend only on the magnitudes of the amplitudes in the Fock basis, without introducing (an) additional state(s) with a well-defined relative phase. This is done either explicitly, as in the two-mode Mach-Zehnder, or implicitly in the measurement, e.g. via the local oscillator in homodyne and heterodyne detection [39].
While these phase-sensitive measurements are reasonable to consider in general, their availability fundamentally changes the formulation of the problem: it becomes possible to resolve a local phase shift with single-mode probe states [38,40]. As such, a Mach-Zehnder scheme with phase-sensitive measurements may contain unnecessary overhead and perform worse than a phase-sensitive measurement on a single-mode probe state for an equivalent resource.
When we wish to explicitly consider phase-insensitive measurements (such as the photon counting in Fig. 1), the input probe state should be modified to erase coherences between the degenerate eigenspaces of the total number operator $\hat n_1 + \hat n_2$ [38], i.e. $\rho_\psi = \sum_m \Pi_m |\psi\rangle\langle\psi| \Pi_m$ with $\Pi_m$ the projector onto total number m, whence it follows that $e^{i\zeta(\hat n_1+\hat n_2)} \rho_\psi\, e^{-i\zeta(\hat n_1+\hat n_2)} = \rho_\psi$. Unless |ψ⟩ is a state of fixed total number, $\rho_\psi$ is necessarily a mixed state and $J(\rho_\psi)$ cannot be evaluated by Eq. (2); however, this does not preclude $J(\rho_\psi) = J(|\psi\rangle)$, which is in fact true for all the states we consider in the following section.
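The claim $J(\rho_\psi) = J(|\psi\rangle)$ can be checked numerically for the two-copy vacuum-Fock state of the next section. The sketch below takes the dephasing to be the block-diagonal projection onto total-number subspaces (our reading of the prescription of [38]) and evaluates the SLD QFI of a unitarily encoded state with the standard eigen-decomposition formula; the state parameters and helper names are illustrative choices, not taken from the text.

```python
import numpy as np


def qfi_unitary(rho, K, tol=1e-12):
    """SLD QFI of rho(phi) = exp(-i phi K) rho exp(i phi K).

    J = 2 * sum_{i,j} (l_i - l_j)^2 / (l_i + l_j) * |<i|K|j>|^2,
    summed over eigenpairs of rho with l_i + l_j > tol.
    """
    lam, vecs = np.linalg.eigh(rho)
    Kt = vecs.conj().T @ K @ vecs  # generator in the eigenbasis of rho
    J = 0.0
    for i in range(len(lam)):
        for j in range(len(lam)):
            s = lam[i] + lam[j]
            if s > tol:
                J += 2 * (lam[i] - lam[j]) ** 2 / s * abs(Kt[i, j]) ** 2
    return J


def two_copy_vacuum_fock(N, eta):
    """|psi_0N>^{(x)2} with |psi_0N> = (|0> + eta|N>)/sqrt(1 + eta^2)."""
    labels = [(0, 0), (N, 0), (0, N), (N, N)]
    amps = np.array([1.0, eta, eta, eta**2]) / (1 + eta**2)
    return labels, amps


N, eta = 6, 0.4
labels, amps = two_copy_vacuum_fock(N, eta)
K = np.diag([(n1 - n2) / 2 for n1, n2 in labels]).astype(complex)
totals = np.array([n1 + n2 for n1, n2 in labels])

rho_pure = np.outer(amps, amps.conj()).astype(complex)
# Erase coherences between different total photon numbers: sum_m P_m rho P_m
same_total = totals[:, None] == totals[None, :]
rho_deph = rho_pure * same_total

# The dephased state is invariant under exp(i zeta (n1 + n2))
zeta = 0.73
U = np.diag(np.exp(1j * zeta * totals))
assert np.allclose(U @ rho_deph @ U.conj().T, rho_deph)

J_pure = qfi_unitary(rho_pure, K)
J_deph = qfi_unitary(rho_deph, K)
J_expected = 2 * eta**2 * N**2 / (1 + eta**2) ** 2
print(J_pure, J_deph, J_expected)
```

Both QFIs coincide even though the dephased state is mixed: the discarded coherences connect components that the generator K does not distinguish.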

III. OBFUSCATION THROUGH AVERAGE NUMBER
We first consider vacuum-Fock superposition states, incident directly on a relative phase as in Fig. 3. These can be considered a Fock-state analogue of the Rivas-Luis state [18], to which we would expect the main ideas of this section to generalise. These |ψ0N⟩ states are a simple example to illustrate why we should be concerned about the average number representing the entire description, "danger", and power of a variable-number state. We do not propose them as viable probe states: the generation of such a state |ψ0N⟩ can be expected to be demanding and difficult to scale [41], even before worrying about the relative magnitudes of the amplitudes or the simultaneous generation of two copies with a fixed and known relative phase. Nor do we propose them as worthwhile probe states: as well as numerous information-theoretic concerns raised previously [20,22], we will demonstrate that these in fact offer no advantage over an equivalent N00N state.
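A. Arbitrary precision from fixed average number states

We take one copy of $|\psi_{0N}\rangle = (|0\rangle + \eta|N\rangle)/\sqrt{1+\eta^2}$ in each arm, so that
$$ |\psi_{0N}\rangle^{\otimes 2} = \frac{|00\rangle + \eta\,(|N0\rangle + |0N\rangle) + \eta^2|NN\rangle}{1+\eta^2}, \tag{7} $$
which contains on average
$$ \bar N = \frac{2\eta^2 N}{1+\eta^2} \tag{8} $$
photons. The QFI for ϕ = φ1 − φ2 can be evaluated from Eq. (2) to be
$$ J(|\psi_{0N}\rangle^{\otimes 2}) = \frac{2\eta^2 N^2}{(1+\eta^2)^2}. \tag{9} $$
Eq. (9) is quadratic in N while the average photon number is linear in N, and so we attain Heisenberg scaling for a fixed η. However, rewriting Eq. (9) in terms of N and $\bar N$, by solving Eq. (8) for η, gives
$$ J(|\psi_{0N}\rangle^{\otimes 2}) = \frac{\bar N N}{1+\eta^2} = \frac{\bar N\,(2N - \bar N)}{2}, \tag{10} $$
which can be made arbitrarily large for any fixed $\bar N$ by making N arbitrarily large and adjusting η accordingly.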
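The elimination of η behind Eq. (10) can be spelled out as follows, taking Eqs. (8) and (9) to be $\bar N = 2\eta^2 N/(1+\eta^2)$ and $J = 2\eta^2N^2/(1+\eta^2)^2$, the forms consistent with the decomposition used throughout Sec. III. This is a worked derivation only, valid for $0 < \bar N < 2N$.

```latex
\begin{aligned}
\bar N = \frac{2\eta^2 N}{1+\eta^2}
  &\;\Longrightarrow\;
  \eta^2 = \frac{\bar N}{2N-\bar N},
  \qquad
  1+\eta^2 = \frac{2N}{2N-\bar N},\\[4pt]
J = \frac{2\eta^2 N^2}{(1+\eta^2)^2}
  = \frac{\bar N N}{1+\eta^2}
  &= \frac{\bar N\,(2N-\bar N)}{2}
  = \bar N N - \frac{\bar N^{2}}{2}.
\end{aligned}
```

At fixed $\bar N$ the result grows linearly and without bound in N. Eliminating N instead, via $N = \bar N(1+\eta^2)/(2\eta^2)$, gives $J = \bar N^2/(2\eta^2)$, which exceeds $\bar N^2$ whenever $\eta < 1/\sqrt{2}$.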
Observations of such large QFIs have been made before [18,19]. One problem that should be anticipated from similar states arises in finite trials, due to the necessary number of repetitions [20,21,42,43] or the necessary prior knowledge [44][45][46]. While these limitations amply motivate alternative probe states, here we will show that the apparent surpassing of Heisenberg scaling in $\bar N$ can also be explained within local estimation itself.
With a N00N state the same precision can be reached with $\sqrt{2}\,\eta N/(1+\eta^2)$ photons (or at least approximated with the nearest integer). For $\eta \geq 1/\sqrt{2}$ this requires no more photons than $\bar N$; however, when η ≪ 1 (which is the case when Eq. (10) is made to substantially exceed Heisenberg scaling) the N00N state requires far more photons than the average number of the vacuum-Fock superpositions for the same precision. This suggests that the vacuum-Fock superposition (with η ≪ 1) should be preferred over a N00N state with equivalent QFI due to its lower average number. We will argue that, even if both states can be easily generated, such a predilection is ill-founded and may be deeply misguided.
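The photon-number comparison above is simple arithmetic; the sketch below tabulates $\bar N$ against the photon number M of the N00N state with equal QFI ($M^2 = J$), assuming the forms of Eqs. (8) and (9) given above. The function names and the value N = 100 are illustrative.

```python
import math


def nbar_vacuum_fock(N, eta):
    """Average photon number of two copies of (|0> + eta|N>)/sqrt(1 + eta^2)."""
    return 2 * eta**2 * N / (1 + eta**2)


def qfi_vacuum_fock(N, eta):
    """QFI of the same two-copy state: 2 eta^2 N^2 / (1 + eta^2)^2."""
    return 2 * eta**2 * N**2 / (1 + eta**2) ** 2


def n00n_equivalent(N, eta):
    """Photon number M of a N00N state whose QFI M^2 matches the above."""
    return math.sqrt(qfi_vacuum_fock(N, eta))  # = sqrt(2) * eta * N / (1 + eta^2)


N = 100
for eta in (0.05, 0.3, 1 / math.sqrt(2), 1.0):
    print(f"eta={eta:.3f}  Nbar={nbar_vacuum_fock(N, eta):8.3f}  "
          f"N00N photons={n00n_equivalent(N, eta):8.3f}")
```

The two columns cross at $\eta = 1/\sqrt{2}$; below it the N00N state needs more photons than $\bar N$, which is exactly the regime in which the average number flatters the vacuum-Fock state.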

B. Excessive damage through detection events
We can divide the potential measurement outcomes into three groups: a. neither detector clicks; b. the detectors click a total of N times; c. the detectors click a total of 2N times. The latter two events subdivide into a combinatoric wealth of different distributions between the two detectors. Already we can see that, when we register photons as having passed through the sample, their number can be entirely detached from $\bar N$, given in Eq. (8), which vanishes in the η ≪ 1 limit even though N remains comparatively large. In each case we can identify distinct amounts of information which will be returned: a. no photons passed through the relative phase, no clicks, no information; b. N photons passed through the relative phase shift in superposition, and their resulting distribution between the two detectors is parameter-dependent; c. N photons passed through each arm (2N in total), and so no relative phase was accumulated.
In the first case one may be disappointed as we learn precisely nothing, though any sample is unscathed. In the second, N photons pass through the sample and their counts display the parameter dependence responsible for the QFI. In the last, though, 2N photons pass through the sample, yet the counting statistics are entirely uninformative.
Where photon numbers in a single shot are of utmost concern, the existence of such an event could destroy the sample or necessitate running the experiment with lower average numbers at the expense of precision. Yet this is not apparent if we rely on the average number as a sufficient approximation for the number of photons in a single shot. Nor were we expecting to gain information only when at least N (rather than $\bar N$) photons pass through the sample.
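The three outcome groups can be made concrete in a few lines, assuming the expansion of Eq. (7) as written above. The sketch returns, for each group, its probability and the number of photons that passed through the sample; the function name and the values N = 50, η = 0.1 are illustrative.

```python
def outcome_groups(N, eta):
    """Probabilities and photon numbers of the three outcome groups of
    |psi_0N>^{(x)2} = (|00> + eta(|N0> + |0N>) + eta^2 |NN>) / (1 + eta^2).

    a: |00>  -> 0 photons, no information
    b: N00N  -> N photons, phase-dependent counts
    c: |NN>  -> 2N photons, no relative phase accumulated
    """
    norm = (1 + eta**2) ** 2
    return {
        "a": (1 / norm, 0),
        "b": (2 * eta**2 / norm, N),
        "c": (eta**4 / norm, 2 * N),
    }


N, eta = 50, 0.1
groups = outcome_groups(N, eta)
nbar = sum(p * n for p, n in groups.values())
worst = max(n for p, n in groups.values() if p > 0)
p_info = groups["b"][0]
print(f"Nbar = {nbar:.3f}; informative events carry {N} photons (prob {p_info:.4f}); "
      f"worst case {worst} photons (prob {groups['c'][0]:.2e})")
```

The average stays below one photon here, while every informative shot carries N photons and the rare uninformative shot carries 2N.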

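For phase-insensitive measurements, we replace $|\psi_{0N}\rangle^{\otimes 2}$ by the state with its coherences between different total numbers erased, $\rho_{\psi_{0N}^{\otimes 2}}$. This gives us the state
$$ \rho_{\psi_{0N}^{\otimes 2}} = \frac{|00\rangle\langle 00| + 2\eta^2\,|\psi_{\rm N00N}\rangle\langle\psi_{\rm N00N}| + \eta^4\,|NN\rangle\langle NN|}{(1+\eta^2)^2}. \tag{11} $$
The QFI of $\rho_{\psi_{0N}^{\otimes 2}}$, evaluated in App. A1a using the SLD, is
$$ J(\rho_{\psi_{0N}^{\otimes 2}}) = \frac{2\eta^2 N^2}{(1+\eta^2)^2}, $$
matching Eq. (9), whence we see that the optimal measurement for the probe state $|\psi_{0N}\rangle^{\otimes 2}$ is phase-insensitive. Moreover, as only the $|\psi_{\rm N00N}\rangle$ component in both $|\psi_{0N}\rangle^{\otimes 2}$ and $\rho_{\psi_{0N}^{\otimes 2}}$ acquires a ϕ dependence ($e^{i\phi(\hat n_1-\hat n_2)/2}|MM\rangle = |MM\rangle$), we can identify that a sufficient measurement to attain the QCRB is one which both attains the QCRB for $|\psi_{\rm N00N}\rangle$ and distinguishes between components of different total photon number; the photon counting illustrated in Fig. 3 is one such measurement.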
This probabilistic mixture of the definite total-number components of $|\psi_{0N}\rangle^{\otimes 2}$ is no longer mode-separable. Now the only coherence is found in the (mode-entangled) N00N-state term, which was implicitly present in the expansion of the tensor product in Eq. (7). The appearance of this N00N-state component is not coincidental: it is responsible, alongside its vanishing weight for η ≪ 1, for the arbitrarily large QFI of the $|\psi_{0N}\rangle^{\otimes 2}$ state.

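Although we introduced $\rho_{\psi_{0N}^{\otimes 2}}$ in the previous section as the effective state seen by phase-insensitive measurements, it is itself a valid probe state, one which is just as capable as $|\psi_{0N}\rangle^{\otimes 2}$. Indeed, whether created independently or by a suitable collapse of $|\psi_{0N}\rangle^{\otimes 2}$, it is informative for understanding the underpinnings of the better-than-Heisenberg behaviour displayed by both $|\psi_{0N}\rangle^{\otimes 2}$ and $\rho_{\psi_{0N}^{\otimes 2}}$. Convexity of the QFI upper-bounds the QFI of any mixed state as [47][48][49]
$$ J\Big(\sum_k p_k\, \psi_k\Big) \leq \sum_k \frac{(\partial_\phi p_k)^2}{p_k} + \sum_k p_k\, J(\psi_k), $$
where the two terms are the CFI of the probabilities attached to the pure-state decomposition and the average of the QFIs of the associated pure states. For $\rho_{\psi_{0N}^{\otimes 2}}$, using the decomposition given in Eq. (11), the probabilities are ϕ-independent, so the first term vanishes, and the convexity inequality is saturated with $J(\psi_{\rm N00N}) = N^2$ and $J(|00\rangle) = J(|NN\rangle) = 0$. Thus $\rho_{\psi_{0N}^{\otimes 2}}$ (or $|\psi_{0N}\rangle^{\otimes 2}$) as a probe state is metrologically equivalent to probabilistically using the N00N state with the same N, with probability $2\eta^2/(1+\eta^2)^2$.

This is not to say that $\rho_{\psi_{0N}^{\otimes 2}}$ is the best such strategy: if we create $\rho_{\psi_{0N}^{\otimes 2}}$ by probabilistically generating |00⟩, $|\psi_{\rm N00N}\rangle$, or |NN⟩, or by measuring the total number of $|\psi_{0N}\rangle^{\otimes 2}$, we can technically improve on the strategy by not preparing, or by discarding, the |NN⟩ component. Without material change to the interferometry scheme, using instead
$$ \rho_{\psi_{0NN}} = \Big[1 - \frac{2\eta^2}{(1+\eta^2)^2}\Big]|00\rangle\langle 00| + \frac{2\eta^2}{(1+\eta^2)^2}\,|\psi_{\rm N00N}\rangle\langle\psi_{\rm N00N}| \tag{14} $$
has a negligible effect on $\bar N$, with $2\eta^2N/(1+\eta^2)^2$ dominating $2\eta^4N/(1+\eta^2)^2$ when η ≪ 1, but still has the same QFI $2\eta^2N^2/(1+\eta^2)^2$. More importantly, though, it disposes of the potentially damaging but metrologically useless 2N events, making a significant difference to the worst case in spite of a vanishing change in $\bar N$.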

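Just as $|\psi_{0N}\rangle^{\otimes 2}$ is a coherent counterpart of $\rho_{\psi_{0N}^{\otimes 2}}$, a coherent form of $\rho_{\psi_{0NN}}$ can also be defined as
$$ |\psi_{0NN}\rangle = \sqrt{1-p}\,|00\rangle + \sqrt{p}\,|\psi_{\rm N00N}\rangle, \qquad p = \frac{2\eta^2}{(1+\eta^2)^2}, $$
which shares its average number and QFI with $\rho_{\psi_{0NN}}$. However, as with $|\psi_{0N}\rangle^{\otimes 2}$, this can be thought of as nothing more than a coherent method to probabilistically apply a N00N state.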
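The trade-off between the two mixtures can be tabulated directly, assuming both are mixtures of |00⟩ and $|\psi_{\rm N00N}\rangle$ (weight $2\eta^2/(1+\eta^2)^2$) and, for the first, |NN⟩ (weight $\eta^4/(1+\eta^2)^2$), with the QFI given by the saturated convexity bound, i.e. $N^2$ times the N00N weight. The function name `resources` and the chosen values are illustrative.

```python
def resources(N, eta):
    """Average number, QFI and worst-case photon number of the mixture with
    |NN> (two-copy state after dephasing) and of the mixture without it."""
    p = 2 * eta**2 / (1 + eta**2) ** 2      # weight of the N00N component
    p_NN = eta**4 / (1 + eta**2) ** 2       # weight of |NN> (first mixture only)
    with_NN = {"Nbar": p * N + p_NN * 2 * N, "QFI": p * N**2, "max_n": 2 * N}
    without_NN = {"Nbar": p * N, "QFI": p * N**2, "max_n": N}
    return with_NN, without_NN


N, eta = 50, 0.1
with_NN, without_NN = resources(N, eta)
print("with |NN>:   ", with_NN)
print("without |NN>:", without_NN)
```

Dropping |NN⟩ lowers $\bar N$ only by the factor $1/(1+\eta^2)$ while halving the worst-case photon number, with the QFI unchanged.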

E. Implications of satisfying the convexity bound
At this point we can be convinced that the metrological value of $\rho_{\psi_{0N}^{\otimes 2}}$ or $\rho_{\psi_{0NN}}$ is the (effective) probabilistic application of a N00N state. Considering either the expansion of the tensor product in Eq. (7) or the (equal-QFI) effective state for phase-insensitive measurements in Eq. (11), the same appears to be true of $|\psi_{0N}\rangle^{\otimes 2}$. Here again we find ourselves dissatisfied with average number: we have associated the precision of our probe state directly with a single component of fixed number N, which is detached from $\bar N$ (for arbitrary η).
This is not to say that the probabilistic application of the N00N state is a valuable or even viable strategy compared to the deterministic application. When vacuum is applied nothing is done and the measurement results are of no value; non-zero contributions to both average number and QFI come solely from the N00N-state component. It is simply the interplay between the QFI scaling quadratically in N while the average number is only linear in N which, as noted previously for coherent [18] and incoherent [4,26] mixtures, gives rise to the arbitrarily large QFI discussed in Sec. III A.
Indeed, if probabilistically preparing vacuum were a mechanism to genuinely increase precision while reducing sample exposure, we could note that we can apply the vacuum as probe state and discard the (deterministic) measurement results without reducing the quality of the estimate. However, if we are not using the measurement results and consider the vacuum state to have no effect on the sample [50], no practical distinction can be drawn between performing and simulating these vacuum applications. This would open up the farcical situation that any Heisenberg-scaling scheme can be "improved" with respect to the average number by simulating a classical mixture of the probe state and a vacuum probe. Such an improvement is patently false, as the same useful applications of a Heisenberg-scaling probe state have simply been diluted with additional zero-cost, zero-information probe states. Instead, to avoid this loophole of guaranteed yet free arbitrary precision from any better-than-shot-noise probe(s), we should conclude that the cost of the classical mixture cannot be well represented by the average number $\bar N$.
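The "farcical" improvement can be seen with plain arithmetic. The sketch dilutes a probe with simulated vacuum shots, assuming the probe is orthogonal to the vacuum so that the mixture has QFI $pJ$ and average number $p\bar N$ (the equality case stated in Sec. IV A); the function name and the N00N example with M = 10 are illustrative.

```python
def dilute_with_vacuum(J_probe, nbar_probe, p):
    """Use a probe with probability p and (simulated) vacuum otherwise.

    Assumes the probe is orthogonal to the vacuum, so that the mixture has
    QFI p * J_probe and average number p * nbar_probe.
    """
    return p * J_probe, p * nbar_probe


M = 10  # N00N photon number: J = M^2, Nbar = M
for p in (1.0, 0.1, 0.01, 0.001):
    J, nbar = dilute_with_vacuum(M**2, M, p)
    # J / Nbar^2 = 1/p: an apparent beyond-Heisenberg gain bought by doing nothing
    print(f"p={p:<6} Nbar={nbar:<7.3f} J={J:<8.3f} J/Nbar^2={J / nbar**2:.1f}")
```

The ratio $J/\bar N^2 = 1/p$ grows without bound purely because fewer shots are actually used, which is the loophole the paragraph above rules out.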

F. A more appropriate resource
We have considered a number of different states, $|\psi_{0N}\rangle^{\otimes 2}$, $\rho_{\psi_{0N}^{\otimes 2}}$, $\rho_{\psi_{0NN}}$, and $|\psi_{0NN}\rangle$, possessing various total-number distributions, average numbers, and QFIs. Yet in each instance the QFI is that of the N-particle N00N state scaled by the probability of collapsing each state into that N-particle N00N state. The states also share an identical SLD operator, as App. A1a shows.
It is our view that these, and the associated mixed states generated by projecting in total number, all derive their metrological power solely from their possession of a N00N-state component. This makes the other components (largely vacuum where η ≪ 1) relevant only to the average number for a given amplitude of the N00N state. If one were to condition on events where a photon is detected (a necessary but not sufficient condition for such an event to be informative as to the phase shift) then, as Tab. I shows, the states with a vacuum component appear no better, if not worse. Rather than η being a parameter which can be tuned to increase the QFI, it should be considered a parameter which reduces $\bar N$ by obfuscating the N00N component solely responsible for the metrological power of all these states. As such, N, not $\bar N$, is a more representative resource for these states in terms of both the photons passing through the sample and their metrological performance.

IV. PROBABILISTIC SENSING
The states of the previous example are particularly convenient to analyse. However, the fundamental conceit of doing nothing, by coherent or incoherent means, to artificially deflate the average number can be applied more generally. In both cases the average number of the resultant state becomes detached from the original state responsible for all phase sensitivity, giving a false impression of improved sensitivity per photon.

A. Probabilistically sensing
A two-mode state $|\phi_M\rangle$ with average number M which is either used (with probability p) or replaced with vacuum (with probability 1 − p), without adapting the interferometer or measurement, is described as input to the interferometer by the state
$$ \rho = p\,|\phi_M\rangle\langle\phi_M| + (1-p)\,|00\rangle\langle 00|, \tag{16} $$
which, by definition of $|\phi_M\rangle\langle\phi_M|$, contains on average pM photons. The QFI, Eq. (17), derived in App. A1b, is upper-bounded as $J(\rho) \leq pJ(|\phi_M\rangle)$, with equality holding for any $|\phi_M\rangle$ either orthogonal to the vacuum state ($\langle 0|\phi_M\rangle = 0$) or symmetric in the two modes ($\langle\hat n_1 - \hat n_2\rangle = 0$). Adapting the probability with which $|\phi_M\rangle$ is used to fix $p = \bar N/M$, for any $\bar N \leq M$, means the QFI is at most $(\bar N/M)\,J(|\phi_M\rangle)$; for a Heisenberg-scaling $|\phi_M\rangle$ with $J(|\phi_M\rangle) = M^2$, denoting the resulting state $\rho^{H}_M$, this gives $J(\rho^{H}_M) \leq \bar N M$. Here it may also make sense to consider M as the more appropriate resource constraint for the states defined in Eq. (16). With respect to M, $J(\rho^{H}_M)$ does not display any better-than-Heisenberg scaling, and p acts again only to reduce the average number without positively contributing to the metrological performance. However, this relies on $|\phi_M\rangle$ itself being well represented by its average number; this fails (as it should) if $|\phi_M\rangle = |\psi_{0N}\rangle^{\otimes 2}$.
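Both the equality and the strict-inequality cases of $J(\rho) \leq pJ(|\phi_M\rangle)$ can be checked numerically with the eigen-decomposition formula for the SLD QFI. The sketch assumes the generator $K = (\hat n_1 - \hat n_2)/2$; the two example states (a N00N state with M = 4, orthogonal to vacuum, and $(|00\rangle + |10\rangle)/\sqrt{2}$, which overlaps the vacuum and is asymmetric) and all helper names are our own choices.

```python
import numpy as np


def qfi_unitary(rho, K, tol=1e-12):
    """SLD QFI of exp(-i phi K) rho exp(i phi K) via the eigen-decomposition of rho."""
    lam, vecs = np.linalg.eigh(rho)
    Kt = vecs.conj().T @ K @ vecs
    J = 0.0
    for i in range(len(lam)):
        for j in range(len(lam)):
            s = lam[i] + lam[j]
            if s > tol:
                J += 2 * (lam[i] - lam[j]) ** 2 / s * abs(Kt[i, j]) ** 2
    return J


def vacuum_mixture(phi_amps, labels, p):
    """rho = p |phi><phi| + (1 - p) |00><00| in the basis `labels` (must contain (0, 0))."""
    phi = np.asarray(phi_amps, dtype=complex)
    vac = np.array([1.0 if lab == (0, 0) else 0.0 for lab in labels], dtype=complex)
    return p * np.outer(phi, phi.conj()) + (1 - p) * np.outer(vac, vac.conj())


def generator(labels):
    """K = (n1 - n2)/2, diagonal in the Fock basis."""
    return np.diag([(n1 - n2) / 2 for n1, n2 in labels]).astype(complex)


p = 0.3
# Case 1: N00N with M = 4, orthogonal to vacuum -> J(rho) = p J(phi)
labels = [(0, 0), (4, 0), (0, 4)]
n00n = [0, 1 / np.sqrt(2), 1 / np.sqrt(2)]
K = generator(labels)
J_phi = qfi_unitary(vacuum_mixture(n00n, labels, 1.0), K)
J_rho = qfi_unitary(vacuum_mixture(n00n, labels, p), K)
print("N00N:", J_phi, J_rho, p * J_phi)

# Case 2: (|00> + |10>)/sqrt(2) overlaps the vacuum and is asymmetric -> strict inequality
labels2 = [(0, 0), (1, 0)]
phi2 = [1 / np.sqrt(2), 1 / np.sqrt(2)]
K2 = generator(labels2)
J_phi2 = qfi_unitary(vacuum_mixture(phi2, labels2, 1.0), K2)
J_rho2 = qfi_unitary(vacuum_mixture(phi2, labels2, p), K2)
print("overlapping:", J_phi2, J_rho2, p * J_phi2)
```

In the second case the mixture loses more than a factor p, since the vacuum weight now also washes out the coherence between |00⟩ and |10⟩.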

B. Coherent probabilistic applications
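Now we turn to a coherent form of the probabilistic scheme, where
$$ |\psi\rangle = \sqrt{1-p}\,|00\rangle + \sqrt{p}\,|\phi_M\rangle. $$
Limiting to $\langle 0|\phi_M\rangle = 0$ such that |ψ⟩ is normalised, we have $\hat n_j|\psi\rangle = \sqrt{p}\,\hat n_j|\phi_M\rangle$, and so the QFI is
$$ J(|\psi\rangle) = p\,\langle(\hat n_1 - \hat n_2)^2\rangle_{\phi_M} - p^2\,\langle\hat n_1 - \hat n_2\rangle^2_{\phi_M}. \tag{20} $$
This coherent probabilistic scheme can technically improve on the probabilistic scheme outlined in Sec. IV A, by the relatively simple comparison of Eqs. (17) and (20): Eq. (20) is no smaller than $pJ(|\phi_M\rangle)$, and strictly greater whenever $\langle\hat n_1 - \hat n_2\rangle_{\phi_M} \neq 0$ and p < 1, because the additional factor p reduces the $\langle\hat n_1 - \hat n_2\rangle^2_{\phi_M}$ term. Again, with a Heisenberg-scaling $|\phi_M\rangle$, |ψ⟩ can achieve a better-than-Heisenberg scaling with respect to $\bar N$, but not M.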
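A small example makes the comparison of Eqs. (17) and (20) concrete. The sketch uses the hypothetical asymmetric state $|\phi_M\rangle = (|2,0\rangle + |1,1\rangle)/\sqrt{2}$ (orthogonal to vacuum, M = 2, $J(|\phi_M\rangle) = 1$) and evaluates the pure-state QFI as ${\rm Var}(\hat n_1 - \hat n_2)$, assuming $K = (\hat n_1 - \hat n_2)/2$. The incoherent value is taken from the equality $J(\rho) = pJ(|\phi_M\rangle)$ stated in Sec. IV A rather than computed.

```python
import numpy as np


def moments(state):
    """<n1 - n2> and <(n1 - n2)^2> for a pure state given as {(n1, n2): amplitude}."""
    probs = {k: abs(a) ** 2 for k, a in state.items()}
    total = sum(probs.values())
    m1 = sum(pr * (n1 - n2) for (n1, n2), pr in probs.items()) / total
    m2 = sum(pr * (n1 - n2) ** 2 for (n1, n2), pr in probs.items()) / total
    return m1, m2


def pure_qfi(state):
    """Var(n1 - n2) = 4 Var(K) with K = (n1 - n2)/2."""
    m1, m2 = moments(state)
    return m2 - m1**2


# Hypothetical |phi_M>: orthogonal to vacuum and asymmetric, with M = 2
phi = {(2, 0): 1 / np.sqrt(2), (1, 1): 1 / np.sqrt(2)}
J_phi = pure_qfi(phi)

for p in (0.1, 0.5, 0.9):
    psi = {(0, 0): np.sqrt(1 - p), **{k: np.sqrt(p) * a for k, a in phi.items()}}
    J_coherent = pure_qfi(psi)   # Eq. (20): p<(n1-n2)^2> - p^2 <n1-n2>^2
    J_incoherent = p * J_phi     # equality case of Sec. IV A
    print(f"p={p}: coherent {J_coherent:.4f} >= incoherent {J_incoherent:.4f}")
```

For this state the gap is $p(1-p)\langle\hat n_1 - \hat n_2\rangle^2_{\phi_M} = p(1-p)$, vanishing only at p = 0 or p = 1.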

C. Average number
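In both the incoherent and coherent settings discussed, we can perform the same conditioning on events where a non-zero number of photons is detected (which is necessary but not sufficient for the event to be informative). Considering $\langle 0|\phi_M\rangle = 0$, photons are detected with probability p, and precision can only increase when this happens. Conditioning on such detection events, and retaining $p = \bar N/M$, we find
$$ \frac{J(\rho)}{p} = J(|\phi_M\rangle) $$
for ρ, and
$$ \frac{J(|\psi\rangle)}{p} = J(|\phi_M\rangle) + (1-p)\,\langle\hat n_1 - \hat n_2\rangle^2_{\phi_M} $$
for |ψ⟩, where the latter term can be upper-bounded by $M^2$ (as $1 - p \leq 1$ and $\langle\hat n_1 - \hat n_2\rangle^2 \leq \langle\hat n_1 + \hat n_2\rangle^2 = M^2$), which means that, for a Heisenberg-scaling $|\phi_M\rangle$, neither conditioned QFI exceeds a Heisenberg scaling in M.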

V. BAYESIAN ANALYSIS
In Sec. III we focused exclusively on the CRB and QFI as a tool to argue that certain variable-number states derive their metrological power from a common source, and that this fact can be obscured when the average number is employed as the resource. We now consider a Bayesian single-shot analysis to show that these observations transcend the local nature of the CRB.

A. Metrological power in the presence of prior information
Instead of a hierarchy of inequalities (e.g., Eq. (1)) which must be saturated sequentially in order to describe the attainable precision, the Bayesian framework enables a flexible formulation of optimisation problems whose solution, if available, readily provides the best possible measurement and phase estimate [51], together with the associated uncertainty, for an appropriate criterion of performance [52].
Under the mean square error criterion [17,52], the optimal single-shot measurement is given by the eigenvectors of an operator S, the solution to the equation $S\bar\rho + \bar\rho S = 2\bar\rho'$, where $\bar\rho = \int_a^b d\phi\, z(\phi)\rho(\phi)$, $\bar\rho' = \int_a^b d\phi\, z(\phi)\rho(\phi)\phi$, and z(ϕ) is a prior probability with support bounded by a and b [43,53]. In this work, ρ(ϕ) = exp(−iϕK) ρ(0) exp(iϕK), with $K = (\hat n_1 - \hat n_2)/2$ and ρ(0) the initial state fed to the interferometer. Furthermore, the j-th eigenvalue of S gives the optimal estimate for the phase shift ϕ when the j-th projector is measured, and the optimal uncertainty is [43,53]
$$ (\Delta\phi)^2_{\rm opt} = \int_a^b d\phi\, z(\phi)\,\phi^2 - {\rm Tr}(\bar\rho S^2) \tag{23} $$
for a single shot [54], where $(\Delta\phi)^2$ denotes the prior-averaged mean square error [4,33,55]. The optimal phase estimates associated with Eq. (23) are generally biased since, in Bayesian estimation, limiting the problem to unbiased estimators is neither needed nor necessarily beneficial. This is in contrast to earlier sections, where $(\Delta\phi)^2$ refers to the variance of an unbiased estimator; unbiasedness is required there because biased estimators can display a low variance despite a high error due to a large bias. Next, we note that the term ${\rm Tr}(\bar\rho S^2)$ is suggestive of the mixed-state definition of the QFI (cf. App. A1). However, since the prior z(ϕ) enters Eq. (23) non-trivially, a more careful examination is needed if we are to find a sensible quantifier of metrological power in Bayesian interferometry.
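If the prior is sufficiently narrow, Eq. (23) may reasonably be approximated as $(\Delta\phi)^2_{\rm opt} \approx \sigma_0^2 - \sigma_0^4 J(\rho)$ [43,56,57], where J(ρ) is the QFI employed so far and $\sigma_0^2$ is the initial (prior) uncertainty. This motivates rewriting Eq. (23) in a similar form, but now without the aforementioned approximation, as
$$ (\Delta\phi)^2_{\rm opt} = \sigma_0^2 - \sigma_0^4\,\mathcal{P}(\rho; z), $$
where we have defined the quantity
$$ \mathcal{P}(\rho; z) = \frac{{\rm Tr}(\bar\rho S^2) - {\rm Tr}(\bar\rho S)^2}{\sigma_0^4} \tag{25} $$
and ${\rm Tr}(\bar\rho S) = \int_a^b d\phi\, z(\phi)\phi$, with ρ and $\bar\rho$ denoting the initial and the prior-averaged parameter-encoded states. Note that Eq. (25) recovers the QFI locally, i.e., $\mathcal{P}(\rho; z) \to J(\rho)$ in the limit of a narrow prior. We now argue that $\mathcal{P}(\rho; z)$ is a valid and more general quantifier of metrological power.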
By construction, 0 ≤ P(ρ; z) ≤ 1/σ 2 0 . If P(ρ; z) = 0, then (∆ϕ) 2 opt = σ 2 0 , and so no information is gained by the application of the scheme. On the contrary, if P(ρ; z) = 1/σ 2 0 , then (∆ϕ) 2 opt = 0, which would imply that the relative phase is perfectly resolved. Enhancing the precision is thus equivalent to reducing the prior error by making P(ρ; z) larger. Unlike the QFI, P(ρ; z) can, regardless of ρ, only be unbounded when σ 0 = 0, where even an infinite amount of information would not improve the estimate since there is no uncertainty to be reduced.
The quantifier P(ρ; z) thus possesses the properties that a good measure of metrological power should have, while generalising the QFI. Specifically, in the limit where P(ρ; z) → J (ρ), Eq. (25) leads to the same hierarchy of probes predicted by local estimation; otherwise, the Bayesian quantifier will generally reveal more accurate information about metrological enhancements.
Note that, in general, one should use a criterion of performance that respects the periodicity of phase shifts [4,58], and Eq. (23) does not. Nevertheless, if the prior is effectively concentrated within an interval of length W = b − a ≲ 2, then the square error giving rise to Eq. (23) approximates a sine-square error, which is a valid cost function for periodic parameters [4,43,58]. Admittedly, this restricts the quantifier P(ρ; z) to an intermediate regime between local estimation (W ≪ 1) and a fully global estimation framework (W ≤ 2π). Yet such a regime has been shown to be sufficiently non-local (i.e., to allow for sufficiently wide priors) to reveal non-trivial effects beyond QFI analyses in interferometry [43], and so it suffices for our purpose.
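The Bayesian quantifier can be evaluated numerically for a N00N state with a flat prior on [−W/2, W/2]. The sketch below computes $\bar\rho$ and $\bar\rho'$ by a midpoint rule, solves $S\bar\rho + \bar\rho S = 2\bar\rho'$ in the eigenbasis of $\bar\rho$ (setting S to zero on the kernel of $\bar\rho$, where it is undetermined), and compares $\mathcal P$ with the closed form $N^2\kappa(NW/2)$, $\kappa(x) = 9(x\cos x - \sin x)^2/x^6$, of App. A2. The grid size and parameter values are our own choices, and $\sigma_0^2 = W^2/12$ is the flat-prior variance.

```python
import numpy as np


def bayes_power(rho0, k_diag, W, n_grid=4001):
    """P(rho; z) = [Tr(rho_bar S^2) - Tr(rho_bar S)^2] / sigma0^4 for a flat prior
    z = 1/W on [-W/2, W/2] and rho(phi) = exp(-i phi K) rho0 exp(i phi K), where
    K is diagonal with entries k_diag. Prior averages use a midpoint rule."""
    dphi = W / n_grid
    phis = -W / 2 + dphi * (np.arange(n_grid) + 0.5)
    rho_bar = np.zeros(rho0.shape, dtype=complex)
    rho_bar_p = np.zeros(rho0.shape, dtype=complex)
    for phi in phis:
        u = np.exp(-1j * phi * k_diag)
        rho_phi = u[:, None] * rho0 * u.conj()[None, :]
        rho_bar += rho_phi * dphi / W
        rho_bar_p += phi * rho_phi * dphi / W
    # Solve S rho_bar + rho_bar S = 2 rho_bar' in the eigenbasis of rho_bar
    lam, V = np.linalg.eigh(rho_bar)
    B = V.conj().T @ rho_bar_p @ V
    denom = lam[:, None] + lam[None, :]
    safe = np.where(denom > 1e-12, denom, 1.0)
    S = V @ np.where(denom > 1e-12, 2 * B / safe, 0.0) @ V.conj().T
    sigma0_sq = W**2 / 12
    num = np.trace(rho_bar @ S @ S).real - np.trace(rho_bar @ S).real ** 2
    return num / sigma0_sq**2


def kappa(x):
    return 9 * (x * np.cos(x) - np.sin(x)) ** 2 / x**6


N, W = 3, 1.0
psi = np.array([1, 1]) / np.sqrt(2)              # basis |N,0>, |0,N>
rho0 = np.outer(psi, psi.conj()).astype(complex)
k_diag = np.array([N / 2, -N / 2])
P = bayes_power(rho0, k_diag, W)
print(P, N**2 * kappa(N * W / 2), 12 / W**2)     # numerical P, closed form, 1/sigma0^2
```

Shrinking W recovers the QFI $N^2$, while for finite W the result stays below the ceiling $1/\sigma_0^2$.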

B. Average number in Bayesian interferometry
We now demonstrate that the decoupling of average number from the origin of the metrological power is not limited to local estimation. To ensure equivalent prior knowledge in each case, we omit a direct analysis of $|\psi_{0N}\rangle^{\otimes 2}$ and $|\psi_{0NN}\rangle$, and instead use their effective phase-insensitive-measurement forms $\rho_{\psi_{0N}^{\otimes 2}}$ and $\rho_{\psi_{0NN}}$, which are also of independent interest (cf. Sec. III). App. A2 provides the details of these calculations.

For $\rho_{\psi_{0N}^{\otimes 2}}$, assuming the same flat prior, App. A2 leads to
$$ \mathcal{P}(\rho_{\psi_{0N}^{\otimes 2}}; z) = \frac{2\eta^2}{(1+\eta^2)^2}\,N^2\,\kappa(NW/2), \tag{28} $$
with $\kappa(x) = 9(x\cos x - \sin x)^2/x^6$. Recalling that, for this scheme, $\bar N = 2\eta^2N/(1+\eta^2)$, Eq. (28) can be written as
$$ \mathcal{P}(\rho_{\psi_{0N}^{\otimes 2}}; z) = \frac{\bar N N}{1+\eta^2}\,\kappa(NW/2) \approx \bar N N\,\kappa(NW/2), \tag{29} $$
where the approximation assumes that, for a fixed (and finite) N, $\bar N \ll N$. In other words, $\rho_{\psi_{0N}^{\otimes 2}}$ cannot provide a precision scaling better than $\mathcal{P}(\rho_{\psi_{0N}^{\otimes 2}}; z) \sim \bar N N$, where N, and not $\bar N$, gauges the precision. Moreover, the Heisenberg scaling is not violated with respect to N.

As with the QFI, here too we are led to conclude that N may be a more representative resource constraint. This is further reinforced by the existence of a key relationship between N and the prior width W, namely W ≲ 1/N, which is needed for $\mathcal{P}(\rho_{\psi_{0N}^{\otimes 2}}; z) \sim \bar N N$ to hold with a coefficient of order unity. Given that the average number $\bar N$ plays no role in this relationship, using $\bar N$ as a proxy without examining the role of N could lead to underestimating the prior information needed to exploit the metrological power of a state. For instance, we could be led to rely on $W \sim 1/\bar N$ which, should it be an insufficient amount of prior information, risks exposing the sample to a high-power probe state that does not extract information beyond the initial knowledge; such a zero gain in spite of a prior $W \sim 1/\bar N$ has been noted previously [46].
Still, there is no doubt that $\rho_{\psi_{0N}^{\otimes 2}}$ has some metrological value (Luis [42] also showed, in the context of Bayesian estimation, that the single-mode |ψ0N⟩⊗2 has some value); however, as in the QFI discussion, its source may be linked, tentatively, to the N00N-state component. First, note that the constraint W ≲ 1/N is a consequence of the form of the Bayesian precision for N00N states, regardless of the variable-number state to which the N00N-state component may belong.

Secondly, if we generalise the notion of resource introduced in Sec. III F from J(ρ)/p to $\mathcal{P}(\rho; z)/p$, where p is the probability of detecting photons at the output ports, we find
$$ \frac{\mathcal{P}(\rho_{\psi_{0N}^{\otimes 2}}; z)}{p} = \frac{2}{2+\eta^2}\,\mathcal{P}(\psi_{\rm N00N}; z) \leq \mathcal{P}(\psi_{\rm N00N}; z), \tag{30} $$
where we have used that, for $\rho_{\psi_{0N}^{\otimes 2}}$, $p = 1 - 1/(1+\eta^2)^2$. Hence the conditioned metrological power of $\rho_{\psi_{0N}^{\otimes 2}}$ can never surpass that of a probabilistic application of its N00N-state component. If we instead examine the mixed-N00N state $\rho_{\psi_{0NN}}$ in Eq. (14), which removes the possibility of non-informative and potentially damaging events (i.e., those associated with |NN⟩), then we have the exact identity
$$ \frac{\mathcal{P}(\rho_{\psi_{0NN}}; z)}{p} = \mathcal{P}(\psi_{\rm N00N}; z), \tag{31} $$
where now $p = 2\eta^2/(1+\eta^2)^2$. Finally, in App. A2 we show that, as with the SLD, the quantum estimator S is that of a N00N state even when $\rho_{\psi_{0N}^{\otimes 2}}$ or $\rho_{\psi_{0NN}}$ is employed. Given the fundamental role played by S in Bayesian estimation (it contains the optimal phase estimates and their measurement scheme), this is perhaps the most convincing argument.
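Eqs. (30) and (31) can be checked numerically with the same flat-prior machinery, assuming $\rho_{\psi_{0N}^{\otimes 2}}$ and $\rho_{\psi_{0NN}}$ take the mixture forms of Eqs. (11) and (14) written above, and taking p as one minus the vacuum population. The helper `bayes_power` is our own sketch of the Lyapunov solution (S set to zero on the kernel of $\bar\rho$); N, W and η are arbitrary illustrative values.

```python
import numpy as np


def bayes_power(rho0, k_diag, W, n_grid=4001):
    """Flat-prior P(rho; z) via a midpoint rule and the eigenbasis of rho_bar."""
    dphi = W / n_grid
    phis = -W / 2 + dphi * (np.arange(n_grid) + 0.5)
    rho_bar = np.zeros(rho0.shape, dtype=complex)
    rho_bar_p = np.zeros(rho0.shape, dtype=complex)
    for phi in phis:
        u = np.exp(-1j * phi * k_diag)
        rho_phi = u[:, None] * rho0 * u.conj()[None, :]
        rho_bar += rho_phi * dphi / W
        rho_bar_p += phi * rho_phi * dphi / W
    lam, V = np.linalg.eigh(rho_bar)
    B = V.conj().T @ rho_bar_p @ V
    denom = lam[:, None] + lam[None, :]
    safe = np.where(denom > 1e-12, denom, 1.0)
    S = V @ np.where(denom > 1e-12, 2 * B / safe, 0.0) @ V.conj().T
    sigma0_sq = W**2 / 12
    num = np.trace(rho_bar @ S @ S).real - np.trace(rho_bar @ S).real ** 2
    return num / sigma0_sq**2


N, W, eta = 4, 0.8, 0.3
labels = [(0, 0), (N, 0), (0, N), (N, N)]
k_diag = np.array([(n1 - n2) / 2 for n1, n2 in labels])
c2 = 1 / (1 + eta**2) ** 2
n00n = np.array([0, 1, 1, 0]) / np.sqrt(2)
proj_n00n = np.outer(n00n, n00n).astype(complex)
vac = np.diag([1, 0, 0, 0]).astype(complex)
nn = np.diag([0, 0, 0, 1]).astype(complex)

beta = 2 * eta**2 * c2                                        # N00N weight
rho_twocopy = c2 * vac + beta * proj_n00n + eta**4 * c2 * nn  # Eq. (11) form (assumed)
rho_0NN = (1 - beta) * vac + beta * proj_n00n                 # Eq. (14) form (assumed)

P_ref = bayes_power(proj_n00n, k_diag, W)                     # pure N00N state
ratio_twocopy = bayes_power(rho_twocopy, k_diag, W) / (1 - rho_twocopy[0, 0].real)
ratio_0NN = bayes_power(rho_0NN, k_diag, W) / (1 - rho_0NN[0, 0].real)
print(ratio_twocopy, 2 / (2 + eta**2) * P_ref)   # Eq. (30)
print(ratio_0NN, P_ref)                          # Eq. (31)
```

The |00⟩ and |NN⟩ components are invariant under the encoding, so they only enter through p; the conditioned power is then governed entirely by the N00N block.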
We conclude that the unsuitability of the average photon number to capture the true source of metrological advantage, as well as the real damage done to the sample, is not a byproduct of the local nature of the QFI, but a more general feature of quantum metrology. More practically, Eqs. (27), (30) and (31) extend the validity of the findings in Tab. I, based on the QFI, to a non-local regime with moderate prior knowledge (i.e., W ≲ 2).

A. Understanding Heisenberg scaling with only an average number constraint
We have seen that the close-to-vacuum states $(|0\rangle + \eta|\phi\rangle)/\sqrt{1+\eta^2}$ appear to owe their metrological advantage to being effective probabilistic applications of Heisenberg-scaling states. We would expect these observations to carry over to the likes of the Rivas-Luis states; Sec. IV B does in principle cover them, but only through a state of the form $|\phi\rangle = |0,\xi\rangle + |\xi,0\rangle + \alpha|\xi,\xi\rangle$, which we do not believe has garnered any direct analysis. This already raises major questions as to whether the apparent Heisenberg-violating scalings are either actually Heisenberg-violating, or quantum in origin.
An explicitly better-than-Heisenberg scaling requires N to be explicitly parameterised in terms of η and N , otherwise we simply have a highly favourable pre-factor to a QFI linear in N as Eq. (10), or an unfavourable prefactor to a QFI quadratic in N as Eq. (9). Moreover, if we take N as our resource for these states, the parameterised form (e.g. Eq. (10)) is bound by a Heisenberg-scaling in N -both in isolation and conditioning on photon-involving events (Tab. I).
Arbitrarily large precision gains are, in any case, explicitly forbidden when working with finite prior information (i.e., σ 0 is finite), since the Bayesian quantifier P(ρ; z) is upper-bounded by 1/σ 2 0 . Even this finite improvement can be hard to achieve with better-than-Heisenberg strategies requiring either a very large number of repetitions [20,59] or a very large amount of prior knowledge [21,46]. Moreover, there is an extensive body of evidence supporting that a Heisenberg-scaling in the total number used across all repetitions cannot be surpassed [20,22,26,60,61].
As to the quantum origin, coherence between different total-number subspaces does not contribute to the precision, and performance is no different from spending a fraction of trials using N00N states and a fraction doing nothing. One can thus argue that the Heisenberg scaling derives from the quintessentially quantum N00N state, but (in the cases discussed here) any apparent better-than-Heisenberg precision on top of this comes from the wholly classical ability to sometimes do nothing, whether exactly or, due to coherence, only effectively.

B. Noise
Optical loss is known to be detrimental to precision in interferometry [62]. Of particular relevance to this work is the inevitable reduction to shot-noise scaling under sufficiently high loss [63][64][65]. This gives rise to a bound [66], Eq. (32), on the QFI of a variable-number state ρ with average number $\bar N$, for fractional loss rate γ. In low noise, Eq. (32) has a substantial pre-factor on $\bar N$ which does not preclude Heisenberg scaling (γ/(1 − γ) can exceed $\bar N$), but for sufficiently high loss any scheme reduces to shot-noise scaling. In such a regime $\bar N$ may suffice, though photodamage may remain a concern if the loss involves absorption or does not precede the sample.
In discussing the potential photodamage we limited our model to the number of photons passing through the sample. In practice, a more intricate model explicitly factoring in some absorption (damage) dynamics may be desirable, both in quantifying a state's metrological performance and the potential damage it may cause by its application. Whether exposure, absorption, or damage is considered can give rise to different precisions [10,67].
One could consider using N 2 rather than or in addition to N as a constraint; this gives a much larger cost for |ψ ⊗2 0N that better matches the average photon number when conditioned on events where a non-zero number of photons is detected. For fixed number and coherent states N 2 = N , while N 2 ≈ N for squeezed vacuum states [2,71]. This allows the existing shot-noise and Heisenberg scalings to carry over for those states under a N 2 constraint, while marking out the vacuum-Fock or Rivas-Luis states as much more expensive than average number alone suggests.
Working to an additional constraint of N 2 may make more sense to understand the potential exposure to a sample, particularly in the Heisenberg-violating regimes we explored earlier, however it is not sufficient to alleviate worst-case concerns. Consider a superposition of primarily squeezed vacuum with a vanishing component of a much higher energy Fock state, in some sense an inverse situation to |ψ ⊗2 0N ; we would expect to find an apparently reasonable sensitivity: quadratic in N and N 2 due to the squeezed vacuum contribution dominating, but still possessing a non-zero probability of a potentially damaging event. While this situation would be better for bounding N 2 and N 2 , we can still have a rare, damaging, yet uninformative event-whether in terms of the number of photons passing through the sample or explicitly modelling absorption. Simply constraining the maximum number-essentially truncating the Fock space by total photon number-goes a long way to alleviate this, but such a truncation may be stricter than need be, forbidding for example the conventional Gaussian states.
Instead, there is still some cause to desire a more intricately weighted measure of information per cost, such as an extension of the quantities $J(|\psi\rangle)/(1 - |\langle 0|\psi\rangle|^2)$ and $\mathcal{P}(|\psi\rangle; z)/(1 - |\langle 0|\psi\rangle|^2)$ employed in Sec. III F and Sec. V B, respectively, capable of ensuring good sensing value per photon in every aspect of the probe state. More general measures of fluctuations in the total-number distribution may be sufficient for this purpose. Even simple heuristics may be of use: the total-number distributions of Gaussian or fixed-number states are singly peaked (around $\bar N$), while those of the vacuum-Fock and Rivas-Luis states are multiply peaked (around 0, x, and 2x, with x being the N of the Fock state or the average number of the squeezed vacuum state).
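The peak-counting heuristic is easy to prototype. The sketch below counts strict local maxima (with non-negligible weight) of a total-number distribution; the peak definition, the choice of two coherent states as the Gaussian example (their total number is Poisson distributed with the summed mean), and all parameter values are our own illustrative assumptions.

```python
import math


def count_peaks(dist, tol=1e-15):
    """Number of strict local maxima with non-negligible weight in a
    total-number distribution given as a list indexed by photon number."""
    peaks = 0
    for n, pn in enumerate(dist):
        left = dist[n - 1] if n > 0 else -1.0
        right = dist[n + 1] if n + 1 < len(dist) else -1.0
        if pn > tol and pn > left and pn > right:
            peaks += 1
    return peaks


def two_coherent(mean_total, n_max):
    """Total number of two coherent states: Poisson with the summed mean."""
    return [math.exp(-mean_total) * mean_total**n / math.factorial(n)
            for n in range(n_max + 1)]


def two_copy_vacuum_fock(N, eta):
    """Total-number distribution of |psi_0N>^{(x)2}: weight at 0, N and 2N."""
    norm = (1 + eta**2) ** 2
    dist = [0.0] * (2 * N + 1)
    dist[0], dist[N], dist[2 * N] = 1 / norm, 2 * eta**2 / norm, eta**4 / norm
    return dist


def fixed_number(N):
    return [0.0] * N + [1.0]


print(count_peaks(two_coherent(7.3, 40)),
      count_peaks(fixed_number(10)),
      count_peaks(two_copy_vacuum_fock(10, 0.5)))
```

A non-integer Poisson mean is used so that the coherent-state distribution has a unique mode; with an integer mean two adjacent values tie and a plateau-aware definition would be needed.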

VII. CONCLUSIONS
We have argued that average number can fail to capture properties of the full total number distribution of probe states relevant to metrological power, photodamage, and necessary prior information. While average number has been known not to properly constrain the theoretical precision of a probe state [24,26], we argue this should be attributed to an over-reliance on average number to capture a potentially complex, multi-faceted number distribution rather than the existence of certain anomalous states.
Finally, we emphasise that while the examples considered here are informative as to the role average number should or should not be allowed to play in evaluating and comparing interferometry schemes, this does not negate the practical inaccessibility of supposed better-than-Heisenberg scaling states [20-23, 33, 72], nor does it diminish the value of states and measurements which are practical to prepare [73] and noise-resilient [74].

ACKNOWLEDGMENTS

We are grateful to Francesco Albarelli for valuable discussions. We thank Erik Gauger and Alfredo Luis for comments. DB acknowledges support from UK EPSRC Grant EP/R030413/1. JR acknowledges support from UK EPSRC Grant No. EP/T002875/1.

For a mixed state ρ the SLD QFI is not given directly by Eq. (2), but can be found [34,35] from the SLD operator $L_\phi$, which is defined implicitly through a Lyapunov equation, Eq. (A2).

a. Mixed state relatives of the N00N and vacuum-Fock states

In Sec. III we have parameter-dependent mixed states of the form of Eq. (A3), in which only the N00N-state component depends on ϕ. Their derivative therefore involves only that component, and so the SLD is that of the pure N00N state; this is also the SLD operator for any pure state where $|\psi_{\rm N00N}\rangle$ is the only phase-sensitive component, as $L_\phi|\psi_{\rm N00N}(\phi)\rangle = \partial_\phi|\psi_{\rm N00N}(\phi)\rangle$. The QFI for Eq. (A3) is then $N^2$ multiplied by the weight of the N00N-state component, which generalises the specific mixed-state QFIs given in Sec. III.
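The statement that the QFI of Eq. (A3) is the N00N weight times $N^2$ can be checked with the convention-independent eigen-decomposition formula. The sketch assumes Eq. (A3) is a mixture, with ϕ-independent weights, of |00⟩, $|\psi_{\rm N00N}\rangle$ (weight β, as in App. A2) and |NN⟩, with generator $K = (\hat n_1 - \hat n_2)/2$; the weights below are arbitrary illustrative values.

```python
import numpy as np


def qfi_unitary(rho, K, tol=1e-12):
    """SLD QFI of exp(-i phi K) rho exp(i phi K) via the eigen-decomposition of rho."""
    lam, vecs = np.linalg.eigh(rho)
    Kt = vecs.conj().T @ K @ vecs
    J = 0.0
    for i in range(len(lam)):
        for j in range(len(lam)):
            s = lam[i] + lam[j]
            if s > tol:
                J += 2 * (lam[i] - lam[j]) ** 2 / s * abs(Kt[i, j]) ** 2
    return J


N = 5
labels = [(0, 0), (N, 0), (0, N), (N, N)]
K = np.diag([(n1 - n2) / 2 for n1, n2 in labels]).astype(complex)
n00n = np.array([0, 1, 1, 0]) / np.sqrt(2)
proj = {"vac": np.diag([1, 0, 0, 0]), "n00n": np.outer(n00n, n00n), "NN": np.diag([0, 0, 0, 1])}

for w_vac, beta, w_NN in ((0.5, 0.5, 0.0), (0.2, 0.3, 0.5), (0.0, 1.0, 0.0)):
    rho = (w_vac * proj["vac"] + beta * proj["n00n"] + w_NN * proj["NN"]).astype(complex)
    print(f"beta={beta}: J={qfi_unitary(rho, K):.6f}, beta*N^2={beta * N**2:.6f}")
```

The |00⟩ and |NN⟩ weights drop out entirely because K vanishes on both states.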

Bayesian quantifier
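Sec. V A introduces a measure of metrological power, $\mathcal{P}(\rho; z) = [{\rm Tr}(\bar\rho S^2) - {\rm Tr}(\bar\rho S)^2]/\sigma_0^4$, capable of accommodating a finite and moderate amount of prior information. Given the flat prior of the main text, i.e., z(ϕ) = 1/W for ϕ ∈ [−W/2, W/2], this quantifier becomes
$$ \mathcal{P}(\rho; z) = \frac{144}{W^4}\,{\rm Tr}(\bar\rho S^2), \tag{A9} $$
where we have used that $\sigma_0^2 = W^2/12$ and ${\rm Tr}(\bar\rho S) = \int d\phi\, z(\phi)\phi = 0$. The optimal quantum estimator S is given by $S\bar\rho + \bar\rho S = 2\bar\rho'$, where, in this case, $\bar\rho$ and $\bar\rho'$ are the prior averages of ρ(ϕ) and ϕρ(ϕ) over [−W/2, W/2]. As with the SLD in Eq. (A2), S is determined by a Lyapunov equation. However, the similarity ends here: unlike in the frequentist approach, in the Bayesian framework we integrate the phase-dependent state with respect to the prior, making the final solution parameter-independent, and the derivative $\partial_\phi\rho$ no longer plays a role (although it does re-emerge in the limit of a narrow prior, as one would expect from a local approach [43,56]). For the states of Eq. (A3), only the coherences of the N00N-state component (with weight β) are affected by the prior average; taking $|\psi_{\rm N00N}\rangle = (|N0\rangle + |0N\rangle)/\sqrt{2}$ and $K = (\hat n_1 - \hat n_2)/2$, these become
$$ \langle N0|\bar\rho|0N\rangle = \frac{\beta\, s_{NW}}{NW}, \qquad \langle N0|\bar\rho'|0N\rangle = -\,\frac{i\beta\,\big(s_{NW} - \tfrac{NW}{2}\,c_{NW}\big)}{N^2 W}, \tag{A12} $$
where $s_{NW} = \sin(NW/2)$ and $c_{NW} = \cos(NW/2)$, the reversed matrix elements are their complex conjugates, and all other entries of $\bar\rho'$ vanish. The solution to $S\bar\rho + \bar\rho S = 2\bar\rho'$ is then
$$ S = \frac{2\big(s_{NW} - \tfrac{NW}{2}\,c_{NW}\big)}{N^2 W}\,\big(-i\,|N0\rangle\langle 0N| + i\,|0N\rangle\langle N0|\big). \tag{A14} $$
Introducing Eqs. (A12) and (A14) in Eq. (A9), we finally arrive at
$$ \mathcal{P}(\rho; z) = \beta\, N^2\, \kappa(NW/2), \tag{A15} $$
where we have defined the function $\kappa(x) = 9(x\cos x - \sin x)^2/x^6$. The results discussed in Sec. V B can then be recovered from Eq. (A15) as follows: β = 1 for Eq. (26) (N00N state), and $\beta = 2\eta^2/(1+\eta^2)^2$ for both Eq. (28) (state $\rho_{\psi_{0N}^{\otimes 2}}$) and $\mathcal{P}(\rho_{\psi_{0NN}})$ in Eq. (31). We further note that the optimal quantum estimator S in Eq. (A14) is independent of α and β, and so identical for all the states represented by Eq. (A3), which include the N00N state. This leads to the third argument given in Sec. V B to support the idea that the metrological advantage of states such as $\rho_{\psi_{0N}^{\otimes 2}}$ and $\rho_{\psi_{0NN}}$, both included in Eq. (A3), originates in their N00N-state components.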