Quantum metrology in the presence of limited data

Quantum metrology protocols are typically designed around the assumption that we have an abundance of measurement data, but recent practical applications are increasingly driving interest in cases with very limited data. In this regime the best approach involves an interesting interplay between the amount of data and the prior information. Here we propose a new way of optimising these schemes based on the practically-motivated assumption that we have a sequence of identical and independent measurements. For a given probe state we take our measurement to be the best one for a single shot and we use this sequentially to study the performance of different practical states in a Mach-Zehnder interferometer when we have moderate prior knowledge of the underlying parameter. We find that we recover the quantum Cramér-Rao bound asymptotically, but for low data counts we find a completely different structure. Despite the fact that intra-mode correlations are known to be the key to increasing the asymptotic precision, we find evidence that these could be detrimental in the low data regime and that entanglement between the paths of the interferometer may play a more important role. Finally, we analyse how close realistic measurements can get to the bound and find that measuring quadratures can improve upon counting photons, though both strategies converge asymptotically. These results may prove to be important in the development of quantum enhanced metrology applications where practical considerations mean that we are limited to a small number of trials.


I. INTRODUCTION
Empirical data constitute our primary source of knowledge to construct theories that explain the world around us, and to develop the necessary technologies that help us to accomplish that task. For that reason, how we extract and process that data is a crucial step, and this can be formally captured in quantum systems by the formalism of quantum metrology, a set of techniques that rely on quantum mechanics to extract information about unknown physical quantities from the outcomes of experiments [1][2][3][4][5].
In practice the quality of this information is restricted by factors such as the number of probes, measurements or repetitions, or by the energy that the experimental arrangement can employ. The latter constraint is particularly relevant for cases where we are interested in studying fragile systems such as atoms, molecules, spin ensembles or biological samples [6][7][8][9][10][11][12]. On the other hand, the number of times that we can interact with the system under study by performing several measurements is always finite and potentially small. This is a possibility that could arise, for instance, in tracking scenarios where we can only have access to a few observations before the object of interest is out of reach, as might be the case for quantum radar [13][14][15] or lidar [15][16][17].
This situation can be mathematically represented as an optimisation problem where the minimisation of some measure of uncertainty or error for a certain fixed amount of resources informs us about how we should design our experiment so that its performance is optimal. If the uncertainty is based on the square error or can be safely approximated by it, then we can maximise the Fisher information and use the Cramér-Rao bound as the figure of merit [3,4]. This approach is appealing in principle for several reasons. Firstly, bounds for estimating a single parameter derived following this path can always be approached asymptotically provided that we repeat the experiment enough times and that we have certain prior knowledge about the unknown parameter [4,5,18,19,20], and this simplifies the optimisation of the error considerably. Furthermore, the Fisher information has a certain fundamental character. In particular, it can be seen as a distinguishability metric [21] that arises in the expansion of the Bures distance between two infinitesimally close states [3,18]. Moreover, its reciprocal gives us the asymptotic limit for the Bayesian mean square error as a function of the number of repetitions under some fairly general assumptions [5,22], and this is also the case for other approaches that are more conservative than the Cramér-Rao bound [23,24].
Nevertheless, the fact that this technique normally requires many repetitions to be useful is an important drawback to study realistic physical systems such as those previously mentioned. This problem has already been acknowledged in the literature (e.g., in [4,5,18,25]), and several solutions have been proposed. A conceptually simple and straightforward approach consists in using a general measure of uncertainty and estimating how many measurements are needed such that the results predicted by the asymptotic theory are valid, which can always be done numerically [5,26]. In addition, we can rely on numerical techniques such as Monte Carlo simulations [27] or machine learning [28] to perform the optimisation directly, or can simply examine the behaviour of the system when the number of resources is finite once we have established the asymptotic results [23]. This was precisely the idea behind the methodology proposed in [5], where we analysed the non-asymptotic performance of metrology protocols that had been optimised as if the asymptotic theory were valid, and we explored the structure of the non-asymptotic regime with concrete examples.
A different possibility is to derive more general lower bounds that are valid in both the asymptotic and the non-asymptotic regimes, such as [29][30][31]. Interestingly, this path provides tools that share the computational simplicity of the Cramér-Rao bound to some extent, but they also present important limitations. For example, the quantum Ziv-Zakai bound [29] can recover the asymptotic scaling, but it is not tight in general. The situation improves with the quantum Weiss-Weinstein bound [30], since it is asymptotically tight. However, it is not guaranteed that we can saturate this bound in the regime with a finite number of measurements. A similar problem arises with the quantum optimal-bias bound [31], since by construction it is lower than the Cramér-Rao bound and, as we will see, the latter is sometimes lower than the optimal error when it is applied out of its regime of applicability.
This state of affairs motivates the following question: how can we go beyond our current methods and improve our predictions for the optimal performance of our experiments when these operate in the regime of limited data? Here we propose a new method combining analytical and numerical techniques that contributes towards the solution of this problem, and we demonstrate its potential using a Mach-Zehnder interferometer that operates in the regime of limited data and moderate prior knowledge.
The key idea is to find the measurement scheme predicted by the optimal single-shot mean square error that was originally introduced in [32] and use that measurement in a sequence of repeated experiments. We will show that the bounds that arise from this technique are tight and can be approached in principle both for a single shot (by construction) and in the asymptotic regime of many measurements, since the results predicted by the Fisher information are recovered in the latter case. And while this does not guarantee that our solution will be optimal for a few observations (an adaptive scheme may be better than repeating the same measurement in that case), we will see that having an error that is a function of the number of repetitions where the first point is already tight, and that also tends towards the asymptotically optimal solution as the number of shots grows, is enough to draw conclusions to important questions such as the role of photon number correlations or the performance of experimentally feasible measurements in the regime of limited data. For instance, we have found an example where the correlations between the paths of the Mach-Zehnder interferometer appear to be particularly useful in this regime, and we have demonstrated that while measuring quadratures and counting photons after the action of a beam splitter are asymptotically equivalent in an ideal scenario, the former measurement scheme is better for a low number of repeated experiments.
It is interesting to note that a related approach was recently discussed in [33], where the authors presented a modification of the quantum Van Trees inequality and used it to construct an adaptive strategy based on an optimal parameter-independent single-shot measurement scheme. Therefore, our work and [33] are complementary, since we will mainly focus on repeated measurements to connect the optimal single-shot and asymptotic regimes and to explore the regime with a finite number of experiments. Moreover, our results can be seen as a non-trivial generalisation with respect to those that are obtained when the Fisher information is used instead.
The paper is organised as follows. Our method based on single-shot measurements is developed in section II, where we also review the Mach-Zehnder interferometer and the probes that we will use in our calculations. Section III presents and discusses the bounds that arise from the application of our methodology to optical interferometry, and section IV studies the role of intra-mode and inter-mode correlations in the regime of limited data. The effect of changing the prior information is analysed in section V, where as expected we recover the predictions of the Fisher information when the prior is very narrow, and we study how to approach our bounds using practical schemes such as photon counting, measurements of quadratures or parity measurements in section VI. Finally, a summary of our conclusions and the potential of our proposal for the field of quantum metrology is presented in section VII.

A. Methodology
Suppose we have a quantum probe with statistical properties described by the density matrix ρ_0. The probe then interacts with a second object characterised by the parameter θ, and this unknown quantity is encoded in the probe state through the unitary transformation ρ(θ) = U(θ)ρ_0U(θ)†. In order to extract the information about θ we perform a measurement described by the POVM elements {E(n)}, where the conditional probability for the outcome n is given by the Born rule p(n|θ) = Tr[E(n)ρ(θ)]. In addition, any extra knowledge that we might have about θ that is not directly related to the measurement scheme can be encoded in the prior probability p(θ).
The joint probability p(θ, n) = p(θ)p(n|θ) contains all the available information about both the experimental outcomes and the unknown parameter. However, to have a more concrete idea about what the value of θ is we can construct an estimator g(n), which produces an estimate for the parameter as a function of the experimental outcome n, and the uncertainty of this procedure can be characterised by the average of some error function ε[g(n), θ], that is,
$$\bar{\epsilon} = \int d\theta\, dn\, p(\theta, n)\, \epsilon[g(n), \theta]. \qquad (1)$$
Figure 1. Representation of the extraction of information from a quantum sensor. This process consists of three stages: preparation of the probe state ρ_0, parameter encoding U(θ) and measurement scheme E(n_i). The statistics of the outcome n_i are given by the Born rule, and the protocol is repeated µ times. Taking also into account any prior information that we may have, we can construct an estimator g(n_1, ..., n_µ) as a function of the experimental outcomes, and assess its performance using some measure of uncertainty ε̄.
As we argued in [5], equation (1) represents the uncertainty on average about the knowledge that we can acquire in principle given the experimental configuration under analysis, and as such it is the suitable quantity to find the optimal strategies for making inferences [34]. Note that, at this stage, this is a single-shot quantity. Now we focus our attention on the regime of moderate prior knowledge, that is, we are not completely ignorant about the value of the parameter but what we know is not enough to apply the local version of estimation theory. This is motivated by the fact that the prior probability may play a crucial role when the empirical data is limited and, as a consequence, it is the natural regime to study situations where the number of measurements is small. In our case we are going to consider that we know a priori that the parameter is localised somewhere within a domain of width W_0, and that this domain is centred around the value θ̄. This state of knowledge can be represented by the uniform density
$$p(\theta) = 1/W_0, \quad \theta \in [\bar{\theta} - W_0/2,\ \bar{\theta} + W_0/2], \qquad (2)$$
and p(θ) = 0 otherwise [35]. The intermediate regime has been previously explored in the context of optical interferometry [5,36,37]. In particular, the method presented in [37] solves the optimisation problem completely using the single-shot error
$$\bar{\epsilon} = \int d\theta\, dn\, p(\theta, n)\, 4\sin^2\{[g(n) - \theta]/2\}, \qquad (3)$$
which respects the periodic character of the difference of optical phase shifts that we will estimate [4,38,39], and it constitutes a particular instance of equation (1).
In principle, we could use the results of [37] and base our analysis on equation (3). However, its extension to the case where many repetitions are considered is still numerically challenging. Instead, in appendix A we argue that for W_0 ≲ 2 it is meaningful to approximate equation (3) as
$$\bar{\epsilon}_{\mathrm{mse}} = \int d\theta\, dn\, p(\theta, n)\, [g(n) - \theta]^2, \qquad (4)$$
and we also evaluate the error in the truncation of the Taylor expansion that leads to equation (4) to show that the main conclusions of this work are not affected by this approximation. In addition, in section V we will see that the local regime is not properly recovered until the prior width is W_0 = 0.1 or smaller. Hence, this allows us to exploit the simplicity of the mean square error in phase estimation safely within the regime of moderate prior information for 0.1 < W_0 < 2.
Assuming that the probe state ρ_0 and the unitary operator U(θ) are also known, the next step is to optimise the single-shot mean square error in equation (4) over all the possible measurement schemes and all the possible estimators. First we note that, according to the proof in [40], restricting the possible POVMs to the class of projective measurements does not lead to a loss of optimality in this case. Therefore, we can combine the measurement and the estimator into the observable
$$S = \int dn\, g(n)\, E(n), \qquad (5)$$
where now E(n) = |n⟩⟨n|, with ⟨n|n'⟩ = δ_{nn'}, and can rewrite equation (4) as
$$\bar{\epsilon}_{\mathrm{mse}} = \int d\theta\, p(\theta)\, \theta^2 + \mathrm{Tr}(\rho S^2 - 2\bar{\rho} S), \qquad (6)$$
with ρ = ∫dθ p(θ)ρ(θ) and ρ̄ = ∫dθ p(θ)ρ(θ)θ. By minimising equation (6) with respect to S we finally arrive at [32,40]
$$\bar{\epsilon}_{\mathrm{mse}} \geq \int d\theta\, p(\theta)\, \theta^2 - \mathrm{Tr}(\bar{\rho} S), \qquad (7)$$
where Sρ + ρS = 2ρ̄. The main advantage of this result is that the single-shot optimal strategy can be explicitly constructed from S = ∫ds s E(s) = ∫ds s |s⟩⟨s|, since this bound is saturated when the projectors {|s⟩} associated with the estimates {s} are used as the measurement scheme. In fact, the eigenvalues {s} are precisely the estimates given by the mean of the posterior distribution p(θ|s) ∝ p(θ)p(s|θ) [32], which is the classical solution for the optimal estimator [4,5,41], and for that reason we will refer to the observable S as the optimal quantum estimator. Moreover, further intuition can be gained by noticing that Tr(ρS) = ∫dθ p(θ)θ and Tr(ρ̄S) = Tr(ρS²), so that we can rewrite equation (7) as
$$\bar{\epsilon}_{\mathrm{mse}} \geq \Delta\theta_p^2 - \Delta S_\rho^2, \qquad (8)$$
where we have defined the prior uncertainty as
$$\Delta\theta_p^2 = \int d\theta\, p(\theta)\, \theta^2 - \left[\int d\theta\, p(\theta)\, \theta\right]^2 \qquad (9)$$
and
$$\Delta S_\rho^2 = \mathrm{Tr}(\rho S^2) - \mathrm{Tr}(\rho S)^2. \qquad (10)$$
In words, the uncertainty of our estimation is lower bounded by the difference between the prior variance and the variance of the optimal quantum estimator. Equation (7) was originally discovered and explored in the context of communication theory [32,39,42], and it has been recently used for frequency estimation [40]. Moreover, a formally similar result emerges in the construction of the quantum Allan variance [43].
Nevertheless, to the best of our knowledge this result has not been fully exploited to study phase estimation in the regime of limited data and an intermediate prior that we are considering here.
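As an illustration of how the optimal quantum estimator can be obtained numerically, the following minimal Python sketch solves Sρ + ρS = 2ρ̄ in the eigenbasis of ρ and evaluates the single-shot bound for a NOON state. It is not the semi-analytical scheme of appendix B. The function names, the midpoint integration grid and the encoding convention U(θ) = exp(−iJ_zθ) restricted to the subspace {|N,0⟩, |0,N⟩} are assumptions made only for this example, and ρ is assumed to be full rank.

```python
import numpy as np

def noon_prior_averages(n_photons=2, width=np.pi / 2, mean=0.0, points=20000):
    """Return rho, rho_bar and the prior second moment for a NOON probe.

    Assumes a flat prior of the given width and U = exp(-i Jz theta),
    written in the two-dimensional basis {|N,0>, |0,N>} (illustrative choice).
    """
    step = width / points
    theta = mean - width / 2 + step * (np.arange(points) + 0.5)  # midpoint grid
    off = 0.5 * np.exp(-1j * n_photons * theta)  # <N,0|rho(theta)|0,N>
    rho = np.array([[0.5, off.mean()], [np.conj(off.mean()), 0.5]])
    off_bar = (theta * off).mean()
    diag_bar = 0.5 * theta.mean()
    rho_bar = np.array([[diag_bar, off_bar], [np.conj(off_bar), diag_bar]])
    return rho, rho_bar, (theta ** 2).mean()

def optimal_quantum_estimator(rho, rho_bar):
    """Solve S rho + rho S = 2 rho_bar for S, assuming rho is full rank."""
    evals, vecs = np.linalg.eigh(rho)
    rb = vecs.conj().T @ rho_bar @ vecs
    # In the eigenbasis of rho: S_ij (lambda_i + lambda_j) = 2 rho_bar_ij
    s_eig = 2.0 * rb / (evals[:, None] + evals[None, :])
    return vecs @ s_eig @ vecs.conj().T

def single_shot_bound(rho_bar, s_operator, second_moment):
    """Prior second moment minus Tr(rho_bar S): the single-shot bound."""
    return second_moment - np.trace(rho_bar @ s_operator).real

rho, rho_bar, second_moment = noon_prior_averages()
S = optimal_quantum_estimator(rho, rho_bar)
estimates = np.linalg.eigvalsh(S)  # Bayesian estimates (eigenvalues of S)
bound = single_shot_bound(rho_bar, S, second_moment)
```

For N = 2 and W_0 = π/2 the eigenvalues of S obtained in this way can be compared with the analytical NOON estimates ±1/π quoted in section III.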
Once the single-shot strategy in equation (8) has been found (we propose a semi-analytical calculation scheme to do this in appendix B), we proceed to repeat the same optimal experiment µ times, so that the uncertainty associated with the overall experiment is now given by
$$\bar{\epsilon}_{\mathrm{mse}} = \int d\theta\, d\boldsymbol{s}\, p(\theta, \boldsymbol{s})\, [g(\boldsymbol{s}) - \theta]^2, \qquad (11)$$
where s = (s_1, ..., s_µ) is the outcome vector and
$$p(\theta, \boldsymbol{s}) = p(\theta) \prod_{i=1}^{\mu} \langle s_i | \rho(\theta) | s_i \rangle.$$
Moreover, the optimal classical estimator that takes into account the information extracted from all the repetitions is [4,5,41] g(s) = ∫dθ p(θ|s)θ, with p(θ|s) ∝ p(θ)p(s|θ). Consequently, the final error is
$$\bar{\epsilon}_{\mathrm{mse}} = \int d\boldsymbol{s}\, p(\boldsymbol{s})\, \epsilon(\boldsymbol{s}), \qquad (12)$$
where p(s) = ∫dθ p(θ)p(s|θ) and ε(s) is the variance of the posterior,
$$\epsilon(\boldsymbol{s}) = \int d\theta\, p(\theta|\boldsymbol{s})\, \theta^2 - \left[\int d\theta\, p(\theta|\boldsymbol{s})\, \theta\right]^2. \qquad (13)$$
The error ε̄_mse, which can be numerically calculated as a function of µ following the three-step scheme discussed in [5], is the quantity that we will use to study the low-µ regime. In other words, our methodology uses numerical simulations and is based on a rigorous foundation provided by an analytical and potentially reachable quantum bound.
Strategies where the same scheme is repeated several times are relevant for any experimental arrangement where we cannot or do not wish to correlate different runs. In that case, it is natural to choose the same optimal single-shot strategy for each individual trial, which also simplifies the complex numerical calculations that are needed to compute ε̄_mse as a function of µ. Admittedly, there are other interesting practical possibilities that emerge when adaptive measurements are allowed [28,33], and while they could be a better choice in some scenarios, adaptive techniques are beyond the scope of this work. Furthermore, from a theoretical perspective we could consider general collective measurements on µ copies of the same probe. This case is briefly explored for NOON states and a maximum of µ = 10 probes in section VI, although our main focus is on identical and independent measurements. A discussion about the differences between collective, adaptive and independent measurements is available in [33].
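To make the repeated-measurement error more concrete, the sketch below evaluates the Bayesian mean square error exactly for a measurement with only two outcomes, where the number k of "plus" outcomes in µ trials is a sufficient statistic. This is a simplified stand-in for the three-step scheme of [5], not a reproduction of it. The likelihood p(+|θ) = [1 + sin(2θ)]/2 is the one obtained for an N = 2 NOON state measured with projectors of the form (|2,0⟩ ± i|0,2⟩)/√2 under the assumed convention U(θ) = exp(−iJ_zθ); the function names and the grid size are hypothetical.

```python
import math
import numpy as np

def noon_prob_plus(theta, n_photons=2):
    """Assumed two-outcome likelihood p(+|theta) for the NOON example."""
    return 0.5 * (1.0 + np.sin(n_photons * theta))

def bayes_mse_two_outcome(mu, prob_plus, width=np.pi / 2, mean=0.0, points=4000):
    """Bayesian mean square error after mu identical two-outcome trials.

    Uses a flat prior of the given width, a midpoint grid in theta and the
    posterior mean as the estimator; outcomes are grouped by the count k.
    """
    step = width / points
    theta = mean - width / 2 + step * (np.arange(points) + 0.5)
    p = prob_plus(theta)
    log_p, log_q = np.log(p), np.log1p(-p)
    second_moment = np.mean(theta ** 2)
    explained = 0.0  # accumulates sum_k p(k) g(k)^2
    for k in range(mu + 1):
        log_binom = (math.lgamma(mu + 1) - math.lgamma(k + 1)
                     - math.lgamma(mu - k + 1))
        like = np.exp(log_binom + k * log_p + (mu - k) * log_q)
        evidence = like.mean()  # p(k) under the flat prior
        if evidence > 0.0:
            explained += np.mean(theta * like) ** 2 / evidence
    # mean posterior variance = prior second moment - mean squared estimate
    return second_moment - explained

# Error as a function of the number of repetitions, to be compared with 1/(4 mu)
repetitions = [1, 2, 5, 10, 50, 100]
errors = [bayes_mse_two_outcome(mu, noon_prob_plus) for mu in repetitions]
```

For µ = 1 this reduces to the single-shot result, prior variance minus the variance of the estimates, and for larger µ the product of the error with µF_q (with F_q = 4 in this example) can be monitored to see how the asymptotic regime is approached.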

B. Physical configuration
The methodology previously introduced is relevant for and can be applied to any unitary estimation problem based on a general mixed probe state where the empirical data is limited and there is a moderate amount of prior knowledge. To illustrate its behaviour, here we will focus on one particular physical configuration.
Let us consider an interferometer formed by two electromagnetic modes with the same frequency that are modelled by the creation and annihilation operators a_i† and a_i, respectively, for i = 1, 2. In addition, for simplicity we assume an ideal situation with pure states.
The benchmark to evaluate the enhancement derived from quantum resources such as entanglement or squeezing will be the coherent state
$$|\alpha/\sqrt{2},\, -i\alpha/\sqrt{2}\rangle = \exp(-i\pi J_x/2)\, D_1(\alpha)\, |0, 0\rangle,$$
with J_x = (a_1†a_2 + a_1a_2†)/2 and D_1(α) = exp(αa_1† − α*a_1), while the NOON state (|N, 0⟩ + |0, N⟩)/√2 will be taken as an example of a definite photon number state that reaches the Heisenberg limit [45] when enough prior knowledge is available [46,47]. Since many aspects of these two states have been extensively studied in previous works (e.g., in [46,47,48,49]), here we will only highlight those features related to the regime of limited data, and in general we will use them mainly as a reference.
The principal analysis will be dedicated to three experimentally feasible states whose quantum Fisher information is large with respect to the two previous benchmarks [12]: the twin squeezed vacuum state |r, r⟩ = S_1(r)S_2(r)|0, 0⟩, where S_i(r) = exp{[r*a_i² − r(a_i†)²]/2}; the squeezed entangled state N_ses(|r, 0⟩ + |0, r⟩), where N_ses = [2 + 2/cosh(|r|)]^(−1/2); and the twin squeezed cat state N_tscs[S(r)(|α⟩ + |−α⟩)]^⊗2, with N_tscs = (2 + 2exp(−2|α|²))^(−1/2) and |α⟩ = D(α)|0⟩.
Table I. Properties of the probe states considered in the main text. The state parameters have been chosen such that the mean number of photons is n̄ = 2. Furthermore, Q and J represent the amount of intra-mode and inter-mode correlations in the interferometer, respectively [44]. Finally, µ_τ(ρ) indicates the state-dependent number of repetitions that are required for the quantum Cramér-Rao bound to be a good approximation to the bounds based on the optimal single-shot strategy in figure 2, according to the methodology discussed in [5] with relative error ε_τ = 0.05, prior mean θ̄ = 0 and prior width W_0 = π/2. Note that ρ includes the information of the initial probe, the encoding of the signal and the prior knowledge.
We recall that the classical Fisher information for a single observation and a given measurement is defined as [4,50]
$$F = \int dn\, \frac{1}{p(n|\theta)} \left[\frac{\partial p(n|\theta)}{\partial \theta}\right]^2,$$
and the optimisation of this quantity over all possible POVMs implies that F ≤ F_q = Tr[ρ(θ)L(θ)²] [39], where F_q is the quantum Fisher information and the symmetric logarithmic derivative L(θ) is obtained by solving L(θ)ρ(θ) + ρ(θ)L(θ) = 2∂ρ(θ)/∂θ. Moreover, F and F_q do not depend on θ when the transformation is a unitary that takes the form U(θ) = exp(iHθ), where H is a Hermitian operator. All the protocols that we will study satisfy this condition. In order to have a fair comparison, the parameters that define the previous states have been chosen such that, on average, all the strategies utilise the same amount of resources (see the third column in table I). In particular, n̄ = ⟨ψ_0|(a_1†a_1 + a_2†a_2)|ψ_0⟩ = 2 for all |ψ_0⟩. This energy constraint fixes the parameters of all the states except those of the twin squeezed cat state; the parameters of the latter case will be chosen such that the quantum Fisher information is maximum in all the sections of this work except in section IV, where we also consider an intermediate scenario. Note that the fact that n̄ = 2 for all our protocols implies that we are working in the low photon number regime [12].
Finally, the unknown parameter θ represents the difference of phase shifts on the two modes and is encoded using the unitary transformation
$$U(\theta) = \exp(-i J_z \theta), \qquad J_z = (a_1^\dagger a_1 - a_2^\dagger a_2)/2.$$
All the schemes assume that the prior knowledge about this parameter is represented by the probability density in equation (2) with prior width W_0 = π/2 < 2 and prior mean θ̄ = 0.

III. QUANTUM BOUNDS IN THE PRESENCE OF LIMITED DATA
The application of the method described in section II to interferometric configurations leads to the results shown in figure 2.i, where the mean square error in equation (12) is plotted as a function of the number of repetitions for the optical probes previously introduced: (a) coherent state, (b) NOON state, (c) twin squeezed vacuum state, (d) squeezed entangled state and (e) twin squeezed cat state. Let us proceed to analyse the consequences of these graphs.
To start with, figure 2.i presents two different regimes. On the one hand, the performance of all the states becomes linear with the number of repetitions in the logarithmic scale when µ ≳ 10². This is precisely the behaviour that we would expect in the asymptotic regime µ ≫ 1, since in that case the mean square error can be approximated by the Cramér-Rao bound as ε̄_mse ≈ 1/(µF) [4,5,51], and as such log(ε̄_mse) ≈ −log(µ) − log(F). In this regime we can observe that the graphs of different states do not intersect each other. This property allows us to identify the twin squeezed cat state as the best asymptotic choice, followed by the squeezed entangled state, the twin squeezed vacuum state, the NOON state and, finally, the coherent state, whose performance is the worst. We notice that this is consistent with the findings in [12].
On the other hand, the graphs deviate from this logarithmic linear approximation when 1 ≤ µ ≲ 10² and, as a consequence, a non-trivial structure emerges in this part of the plot. This is the non-asymptotic regime of limited data. Since the graphs no longer follow straight lines, they intersect each other, and this implies that the ordering of the states in terms of their performance depends on the number of repetitions. For instance, the twin squeezed vacuum state produces the lowest uncertainty when 1 ≤ µ < 5, while the squeezed entangled state is the best option when 5 < µ < 40. In addition, the twin squeezed cat state is recovered as the best probe when µ > 40, although it practically has the same performance as the coherent state when µ = 1, 2, 3. Interestingly, the coherent state is also associated with the largest uncertainty for a low number of trials.
The fact that the strategy leading to the lowest uncertainty can depend on the number of repetitions in a crucial way was already demonstrated in [5]. However, the results in [5] were based on a specific measurement scheme (counting photons after the action of a 50:50 beam splitter), while now the bounds are constructed by repeating a single-shot strategy that has been optimised over all possible POVMs. Thus, the results in figure 2.i generalise those in [5] and put the state-dependent behaviour of the non-asymptotic regime on a more solid basis.
Figure 2. (i) Mean square error as a function of the number of repetitions using the optimal single-shot strategy in equation (8) for (a) the coherent state, (b) the NOON state, (c) the twin squeezed vacuum state, (d) the squeezed entangled state, and (e) the twin squeezed cat state, with mean number of photons n̄ = 2, prior mean θ̄ = 0 and prior width W_0 = π/2, while (f) represents the variance of the prior probability; (ii) mean square error based on the optimal single-shot strategy (solid line) and quantum Cramér-Rao bound (dashed line) for the same coherent state, (iii) NOON state, (iv) twin squeezed vacuum state, (v) squeezed entangled state and (vi) twin squeezed cat state considered in (i). These graphs constitute the main results of section III, and their consequences are analysed in the main text.
For these results to be useful, we need to understand the optimality and saturability of the bounds. The uncertainty for µ = 1 is already optimal by construction and can always be reached in principle for any given state using the single-shot POVM in equation (8). This means that other tools such as the quantum Ziv-Zakai bound [29] and the quantum Weiss-Weinstein bound [30] will necessarily produce less tight single-shot results whenever their value is different from the solution found here.
Furthermore, figures 2.ii-2.vi show how our results for each state approach the quantum Cramér-Rao bound asymptotically, that is, ε̄_mse ≈ 1/(µF_q) when µ ≫ 1. Taking into account that the bounds for a large number of trials that can be constructed using the quantum Cramér-Rao bound are fundamental, we conclude that our bounds are also optimal in this limit. As a result, if we work in the regime of intermediate prior knowledge and ρ(θ) and p(θ) are given, then the scheme developed in section II is optimal both for a single shot and a large number of trials. Moreover, it is also optimal for any number of trials if we exclude the possibility of having adaptive measurements and focus on identical and independent experiments.
To quantify the number of repetitions that are needed to reach this asymptotic regime where our methods are no longer required we can follow [5], construct the relative error
$$\varepsilon_\tau = \frac{\left|\bar{\epsilon}_{\mathrm{mse}}(\mu_\tau) - 1/(\mu_\tau F_q)\right|}{\bar{\epsilon}_{\mathrm{mse}}(\mu_\tau)}$$
and select µ_τ after imposing that ε_τ ≈ 0.05 for each state. According to the results of this calculation, which are summarised in the last column of table I, the uncertainty for the twin squeezed vacuum state agrees with the prediction of the quantum Cramér-Rao bound when the number of trials is as low as µ_τ = 5. Therefore, in this case the asymptotic theory mostly gives the right answer. However, the squeezed entangled state and the twin squeezed cat state require µ_τ = 45 and µ_τ = 66, respectively, and the quantum Cramér-Rao bound overestimates the performance of these probes in the regime of limited data because the graphs of our bounds are higher (figures 2.v and 2.vi). We note that it is in scenarios of this type where we could not extract useful information from the quantum optimal-bias bound derived in [31], since for a flat prior this quantity is always lower than the quantum Cramér-Rao bound by construction. Finally, the NOON state needs µ_τ = 116 and the coherent state requires µ_τ = 282, but the Cramér-Rao bound prediction underestimates the precision of these protocols when µ is low. It is interesting to observe that the chosen probes exemplify the three basic behaviours that we could expect to find in the non-asymptotic regime, that is, that the Cramér-Rao bound is lower, higher or approximately equal to the Bayesian mean square error.
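The selection of µ_τ can be sketched in a few lines of Python. The helper below assumes that the relative error is the difference between the Bayesian mean square error and the Cramér-Rao prediction 1/(µF_q), normalised by the former, and it returns the smallest µ after which this quantity stays below the threshold within the supplied data. The function name is hypothetical, and the input curve is a synthetic Gaussian toy model with posterior variance 1/(1/V_0 + µF), chosen only to exercise the code; it is not one of the curves in figure 2.

```python
import numpy as np

def repetitions_for_asymptotics(mse, fisher, eps=0.05):
    """Smallest mu from which |mse - 1/(mu F)| / mse <= eps holds onwards.

    `mse[k]` is the error after k + 1 repetitions. Returns None when the
    threshold is not met by the end of the supplied data.
    """
    mse = np.asarray(mse, dtype=float)
    mu = np.arange(1, mse.size + 1)
    cramer_rao = 1.0 / (mu * fisher)
    relative_error = np.abs(mse - cramer_rao) / mse
    above = np.nonzero(relative_error > eps)[0]
    if above.size == 0:
        return 1  # the asymptotic prediction is already good at mu = 1
    if above[-1] == mse.size - 1:
        return None  # still above the threshold at the last data point
    return int(mu[above[-1] + 1])

# Synthetic toy curve (assumption): Gaussian prior of variance V0 updated by
# mu Gaussian observations with Fisher information F per observation.
prior_variance = np.pi ** 2 / 48
fisher_information = 4.0
mu_values = np.arange(1, 201)
toy_mse = 1.0 / (1.0 / prior_variance + mu_values * fisher_information)
mu_tau = repetitions_for_asymptotics(toy_mse, fisher_information)
```

Requiring the threshold to hold for all later values of µ, rather than at the first crossing, is a design choice of this sketch: it avoids reporting a spurious µ_τ when a curve crosses the Cramér-Rao line before settling onto it.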
Although the numerical character of the previous results does not reveal the structure of the optimal measurement scheme associated with each state, it is possible to gain some intuition by studying the optimal quantum estimator S in equation (8). For the NOON state we have calculated this operator analytically in appendix C, finding that its projectors are simply
$$|s_{1,2}\rangle = (|2, 0\rangle \mp i\,|0, 2\rangle)/\sqrt{2}$$
and that its eigenvalues are s_1 = −1/π and s_2 = 1/π, which represent the Bayesian estimates for θ. In section VI we will construct physical measurements that realise these projectors exactly. In addition, it is important to note that, while this spectrum of estimates is discrete and the difference of phase shifts θ is a continuous variable, in [52] it was shown that this behaviour is not contradictory due to the existence of an ultimate quantum limit to the uncertainty in phase estimation. On the other hand, a fully analytical calculation of S for the indefinite photon number states is more challenging, and the numerical projectors that arise from the calculation scheme proposed in appendix B, which have been used to find the results in figure 2, are difficult to visualise. Nevertheless, we can still provide a partial characterisation of the single-shot strategies through the spectrum of the optimal quantum estimator for different states. A numerical approximation of these spectra has been represented in figure 3 for the coherent state, the twin squeezed vacuum state, the squeezed entangled state and the twin squeezed cat state, which shows their Bayesian estimates distributed within the parameter domain.
We finish this analysis by noting that both the projectors {|s⟩} and the estimates {s} depend on the specific shape of the prior probability p(θ). Interestingly, in our case we have verified numerically that while the results change with W_0, they do not depend on θ̄. Nonetheless, in section VI we will see that this is no longer true for measurement schemes different from the optimal single-shot strategy.

IV. THE ROLE OF INTRA-MODE AND INTER-MODE CORRELATIONS FOR A LOW NUMBER OF REPETITIONS
In the context of optical interferometry there are two types of correlations that are relevant for quantum metrology: the intra-mode correlations quantified by the Mandel Q-parameter, which for the path-symmetric states that we are considering here can be written as [44]
$$Q = \frac{\langle n_i^2 \rangle - \langle n_i \rangle^2}{\langle n_i \rangle} - 1, \qquad n_i = a_i^\dagger a_i,$$
where we are using the notation ⟨·⟩ = ⟨ψ_0|·|ψ_0⟩, and the inter-mode correlations quantified by [44]
$$J = \frac{\langle n_1 n_2 \rangle - \langle n_i \rangle^2}{\langle n_i^2 \rangle - \langle n_i \rangle^2},$$
where we have also incorporated the fact that the states are path-symmetric. These quantities play a crucial role in the regime where ε̄_mse ≈ 1/(µF_q) because the quantum Fisher information for path-symmetric pure states can be rewritten as [44,54]
$$F_q = \bar{n}\,(1 + Q)(1 - J).$$
Therefore, we can control the asymptotic performance by changing Q and J. Recalling that −1 ≤ Q < ∞ and −1 ≤ J ≤ 1, optimising the performance amounts to increasing the intra-mode correlations as much as possible, since path entanglement can only improve the precision by a factor of 2 at most. To verify that the asymptotic part of figure 2.i is consistent with this way of proceeding we have calculated the amount of intra-mode and inter-mode correlations and the quantum Fisher information for each state [55], and the results can be found in the fourth, fifth and sixth columns of table I, respectively. As expected, the twin squeezed cat state, which was found to be the asymptotically optimal choice, has the largest values for F_q and Q among the states that we are studying.
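The relation between Q, J and the quantum Fisher information can be checked numerically with a short Python sketch. It assumes that the probe is a pure state given by its two-mode Fock amplitudes, that the generator is J_z = (a_1†a_1 − a_2†a_2)/2 so that F_q = 4Var(J_z), and that the Fock space is truncated at a finite cutoff; the function names and the cutoff value are choices made only for this example.

```python
import math
import numpy as np

def correlations(psi):
    """Return (n_bar, Q, J, F_q) from two-mode Fock amplitudes psi[n1, n2].

    F_q is computed as 4 Var(Jz) with Jz = (n1 - n2)/2 (assumed generator).
    Q is evaluated for mode 1, which suffices for path-symmetric states.
    """
    prob = np.abs(psi) ** 2
    prob = prob / prob.sum()
    n1 = np.arange(psi.shape[0])[:, None]
    n2 = np.arange(psi.shape[1])[None, :]
    mean1, mean2 = (prob * n1).sum(), (prob * n2).sum()
    var1 = (prob * n1 ** 2).sum() - mean1 ** 2
    var2 = (prob * n2 ** 2).sum() - mean2 ** 2
    cov = (prob * n1 * n2).sum() - mean1 * mean2
    mandel_q = var1 / mean1 - 1.0
    inter_j = cov / math.sqrt(var1 * var2)
    fisher_q = var1 + var2 - 2.0 * cov  # equals 4 Var(Jz)
    return mean1 + mean2, mandel_q, inter_j, fisher_q

def squeezed_vacuum_amplitudes(r, cutoff):
    """Fock amplitudes of S(r)|0> for real r > 0, truncated at `cutoff`."""
    amps = np.zeros(cutoff + 1)
    for m in range(cutoff // 2 + 1):
        log_mag = (0.5 * math.lgamma(2 * m + 1) - math.lgamma(m + 1)
                   - m * math.log(2.0) + m * math.log(math.tanh(r))
                   - 0.5 * math.log(math.cosh(r)))
        amps[2 * m] = (-1) ** m * math.exp(log_mag)
    return amps

# NOON state with N = 2
noon = np.zeros((3, 3))
noon[2, 0] = noon[0, 2] = 1.0 / math.sqrt(2.0)

# Twin squeezed vacuum with sinh(r)^2 = 1, i.e. one photon per mode on average
single_mode = squeezed_vacuum_amplitudes(math.asinh(1.0), cutoff=120)
twin_squeezed_vacuum = np.outer(single_mode, single_mode)

for state in (noon, twin_squeezed_vacuum):
    n_bar, q, j, f_q = correlations(state)
    # For path-symmetric pure states the two expressions should agree
    assert abs(f_q - n_bar * (1.0 + q) * (1.0 - j)) < 1e-8
```

Because the photon number operators are diagonal in the Fock basis, only the photon number probabilities enter, which keeps the sketch short; states defined through non-trivial superpositions would need their amplitudes computed first.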
On the other hand, we have also demonstrated that this state is not better than a coherent state when µ ∼ 1, in spite of the fact that for the coherent state we have Q = 0 and J = 0, and that the other three probes perform better in the low trial number regime. This already supports the idea that the clear role that photon number correlations play asymptotically is not preserved when µ is low, something that was suggested in [5] using a specific POVM. While it is not currently possible to find a rigorous relationship between uncertainty and correlations that is also valid in the regime of limited data because an analytical expression for ε̄_mse(µ) is not available, we can still exploit the methodology introduced in section II to further explore this idea.
First we note that the twin squeezed cat state can be seen as a family of states defined in terms of the parameters $r$ and $\alpha$. Since this state is separable with respect to the arms of the interferometer, $J = 0$, and as such we are free to choose different combinations of $r$ and $\alpha$ to control the Mandel $Q$-parameter while keeping $\bar{n} = 2$ and $W_0 = \pi/2$ unchanged. The particular instance of the twin squeezed cat family with $Q = 11.75$ and $F_q = 25.49$ considered until now is the optimal choice after maximising $F_q$ numerically. A second example with $Q = 10.00$ and $F_q = 22.00$ has been included in table I to represent the intermediate case. In addition, the twin squeezed vacuum state is recovered within the twin squeezed cat family when we choose $\alpha = 0$ [12], and for this state we have that $Q = 3$ and $F_q = 8$.
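As a quick consistency check on the numbers quoted so far, the short Python sketch below evaluates the quantum Fisher information from the correlation parameters. It assumes the relation $F_q = \bar{n}(1 + Q)(1 - J)$ for path-symmetric pure states with $\bar{n} = 2$; the function and dictionary names are illustrative and are not part of the original analysis, while the $(Q, J)$ pairs are those stated in this section.

```python
def fisher_information(n_bar: float, q: float, j: float) -> float:
    """Assumed relation F_q = n_bar * (1 + Q) * (1 - J) for path-symmetric pure states."""
    return n_bar * (1.0 + q) * (1.0 - j)


# (Q, J) pairs quoted in this section for n_bar = 2; the labels are illustrative.
PROBES = {
    "coherent": (0.0, 0.0),
    "twin squeezed vacuum": (3.0, 0.0),
    "twin squeezed cat (optimal)": (11.75, 0.0),
    "twin squeezed cat (intermediate)": (10.0, 0.0),
    "squeezed entangled": (9.0, -0.1),
}


def fisher_table(n_bar: float = 2.0) -> dict:
    """Fisher information of every probe in PROBES for a given mean photon number."""
    return {name: fisher_information(n_bar, q, j) for name, (q, j) in PROBES.items()}


if __name__ == "__main__":
    for name, f_q in fisher_table().items():
        # 1/F_q is the asymptotic value of mu * mse for each probe.
        print(f"{name:34s} F_q = {f_q:6.2f}   1/F_q = {1.0 / f_q:.4f}")
```

With these inputs the sketch gives $F_q = 8$ for the twin squeezed vacuum state and $F_q = 22$ for both the intermediate twin squeezed cat state and the squeezed entangled state. The optimal twin squeezed cat state gives $25.5$, which agrees with the quoted $25.49$ up to the rounding of $Q$.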
Next we examine the mean square errors associated with the optimal case, the intermediate case and the twin squeezed vacuum from the previous family. Their graphs are represented in figure 4 and labelled respectively as (e), (g) and (c). If we compare the optimal and intermediate states first, we see that a larger amount of intra-mode correlations is associated with a larger number of repetitions needed to reach the asymptotic regime, since the former state requires $\mu_\tau = 66$ and the latter $\mu_\tau = 42$ (see table I). Furthermore, by comparing the form of the graphs (e) and (g) in figure 4 for these two states we can observe that the transition from the non-asymptotic regime to the asymptotic regime is associated with a larger uncertainty for the optimal twin squeezed cat state, for which $Q$ is also larger. Finally, the graph (c) shows that the twin squeezed vacuum state, which has the smallest $Q$, performs worse than the two previous cases asymptotically, while its error is the lowest when $1 \le \mu \le 10$. In other words, for this family of states there seems to be a trade-off between the performances in the asymptotic and non-asymptotic regimes that is associated with changes in $Q$, which in practice would imply that increasing the amount of intra-mode correlations blindly can lead to high-uncertainty schemes in the regime of limited data. Moreover, we note that this conclusion is consistent with the related analysis in [29] for the Rivas-Luis state [56] based on the quantum Ziv-Zakai bound, which demonstrated that if a certain parameter is modified such that the Fisher information increases arbitrarily, then the error cannot deviate substantially from the prior variance unless the number of trials is very large.
Since increasing $Q$ seems to be detrimental to the performance of our probes when the number of repetitions is low, the next natural step is to investigate whether path entanglement could be useful in this regime. Including in our analysis the squeezed entangled state with $Q = 9$ and $J = -0.1$, which is labelled as (d) in figure 4, we can see that this state converges asymptotically to the performance associated with the intermediate case of the twin squeezed cat family (g), that is, both probes have the same Fisher information. However, the graph of the squeezed entangled state presents a smaller curvature and a lower uncertainty when $\mu < 30$. The key aspect that distinguishes these two probes is that the squeezed entangled state has a lower amount of intra-mode correlations and a certain amount of beneficial path entanglement, which suggests that inter-mode correlations have helped to improve the precision in the non-asymptotic regime while keeping a large Fisher information. Hence, we conclude that path entanglement could be considerably more relevant in schemes that need to be optimised for a low number of trials than it is in the asymptotic regime.

Table II. Optimal single-shot mean square error with different prior widths for the states considered in the main text and asymptotic performance for $\mu \gg 1$. Columns: probe state; $\mu \cdot \bar{\epsilon}_{\mathrm{mse}}(\mu, W_0)$ for $\mu = 1$ with $W_0 = \pi/2$, $W_0 = \pi/3$, $W_0 = \pi/4$ and $W_0 = 0.1$, and for $\mu \gg 1$. The state parameters are those indicated in table I. We notice that the asymptotic ordering of probe states and the ordering for $W_0 = 0.1$ and a single shot are identical, which implies that the local regime is achieved for such a prior width.
Despite these surprising results, we must acknowledge that our analysis is centred on a particular set of states, and that other schemes based on different states could show different properties [57]. Therefore, the existence of a more general relationship between the number of trials and the usefulness of photon number correlations in interferometry for a given prior is an open question.

V. THE EFFECT OF THE PRIOR INFORMATION
In a wide set of inference problems that includes the metrology scenarios presented here, the importance of the prior information depends on the number of shots. In particular, we know that the prior becomes less important as we increase the number of repetitions [5,58]. This implies that, as we argued in section II, the prior probability is going to play an important role for making inferences if only a few experimental shots are possible. In that scenario it is crucial then to establish how different states of prior knowledge may affect the overall performance of a given metrology scheme.
Taking the same form of the prior probability given in equation (2), the parameters that we can alter are the prior width $W_0$ and the prior mean $\bar{\theta}$. In section III we already mentioned that the bounds constructed in figure 2.i do not depend on $\bar{\theta}$, leaving $W_0$ as the only free parameter. In principle we should consider the possibility of having both $W_0 > \pi/2$, which includes the intermediate and global regimes, and $W_0 < \pi/2$, which encompasses the intermediate and local regimes. However, for large values of $W_0$ it is not possible to approximate the periodic error function in equation (3) by the mean square error in equation (4). For that reason, we will only focus on the transition from the intermediate regime of prior knowledge to the local regime.
To do this, let us start by calculating the optimal single-shot mean square error in equation (7) for all the states with the prior widths $W_0 = \pi/2$, $\pi/3$, $\pi/4$ and $0.1$. The numerical results are shown in table II. While the best probe in the single-shot regime for $W_0 = \pi/2$ is the twin squeezed vacuum state, the squeezed entangled state becomes the preferable choice when $W_0 = \pi/3$ and $W_0 = \pi/4$, and we need to start with a prior with width $W_0 = 0.1$ in order to recover the twin squeezed cat state as the optimal state. Moreover, the ordering of probes in terms of their performance when $W_0 = 0.1$ is exactly the same as the ordering found in the asymptotic regime, which is also included in the last column of table II. Consequently, we can say that for our schemes the local regime due to a high amount of prior information is achieved when $W_0 \lesssim 0.1$.
An equivalent path to arrive at the same result relies on the approximation for the single-shot mean square error employed in [40,59]. This relation was found in [40] assuming a Gaussian prior with a narrow width but, in fact, in appendix D we show that it also holds for the flat prior introduced in equation (2) if we assume that $W_0 \ll 1$. That the Fisher information appears as the key quantity to determine which scheme has the best performance for a given prior explains why the numerical results in table II for $W_0 = 0.1$ predict the same order of probes as the approximation $1/(\mu F_q)$ in the asymptotic regime of many repetitions. In both cases, the larger $F_q$, the better the performance. It is interesting to observe the similarity between the local regime of prior information for a single shot and the local regime due to a large number of experiments. On the one hand, the best states for $W_0 = \pi/2$ and $W_0 = 0.1$ have intra-mode correlations only, while for $W_0 = \pi/3$ and $W_0 = \pi/4$ the best state presents path entanglement too. On the other hand, figure 5.i shows that for $1 \le \mu < 5$ and $\mu > 40$ there is no inter-mode entanglement in the optimal probes, but it appears in the best state for $5 < \mu < 40$. One way of understanding this similar behaviour is to observe that updating our posterior density via Bayes' theorem after each new trial reduces the uncertainty in a way that is formally similar to making the prior narrower in a sequential way. Nevertheless, both processes are conceptually different.
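The role of the Fisher information for a narrow prior can be made explicit with a short expansion. The sketch below is not taken from appendix D; it assumes that the relation referred to above is the standard second-order result for a prior centred at $\bar{\theta} = 0$, that the single-shot bound has the Personick form with $S\rho + \rho S = 2\bar{\rho}$, and it introduces the symbols $\Delta\theta_p^2$ (prior variance), $\rho_0 = \rho(\theta = 0)$ and $L$ (symmetric logarithmic derivative at $\theta = 0$) purely for illustration.

```latex
% Assumptions: prior p(\theta) centred at zero with variance \Delta\theta_p^2 \ll 1,
% and F_q = \mathrm{Tr}(\rho_0 L^2) = \mathrm{Tr}(\partial_\theta \rho_0\, L).
\begin{align}
\rho(\theta) &\approx \rho_0 + \theta\, \partial_\theta \rho_0, \\
\rho = \int d\theta\, p(\theta)\, \rho(\theta) &\approx \rho_0, \qquad
\bar{\rho} = \int d\theta\, p(\theta)\, \theta\, \rho(\theta)
  \approx \Delta\theta_p^2\, \partial_\theta \rho_0, \\
S\rho + \rho S = 2\bar{\rho} \;&\Rightarrow\; S \approx \Delta\theta_p^2\, L,
  \qquad L\rho_0 + \rho_0 L = 2\, \partial_\theta \rho_0, \\
\int d\theta\, p(\theta)\, \theta^2 - \mathrm{Tr}(\bar{\rho}\, S)
  &\approx \Delta\theta_p^2 - \Delta\theta_p^4\, \mathrm{Tr}(\partial_\theta \rho_0\, L)
  = \Delta\theta_p^2 \left(1 - \Delta\theta_p^2 F_q\right).
\end{align}
% For a flat prior of width W_0 the prior variance is \Delta\theta_p^2 = W_0^2 / 12.
```

Under these assumptions the single-shot error for a fixed narrow prior decreases monotonically with $F_q$, which is the ordering argument used in the text for $W_0 = 0.1$.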
Finally, figures 5.i-5.iv demonstrate the transition from the intermediate regime of prior knowledge and a low number of trials to a local regime with both high prior information and a large number of repetitions. This process modifies the connection between the number of repetitions and the properties of different probes considerably, as can be seen by the change in the points where the graphs for different states cross each other as the prior width is reduced. As a consequence, establishing a pattern that helps us to understand what probes we need to use for different values of $\mu$ in the regime of limited data becomes more complicated than in the two previous sections. Fortunately, this is not a problem in real experiments because we typically know what our specific prior information is and we can always proceed on a case-by-case basis, but it constitutes an important obstacle to deriving more general results.

VI. PHYSICAL MEASUREMENTS
Until now we have investigated the physical consequences of the bounds constructed following the procedure of section II. Nevertheless, in a real-world situation we also need to be able to generate concrete sequences of operations that can be implemented in the laboratory, study whether they saturate the theoretical bounds and, if they do not, determine how close to the fundamental minimum the associated uncertainty is. Since here we are using a fixed set of probe states, we need only consider sequences for implementing the measurement scheme.

A. Practical states
States that can be generated using operations such as squeezing or displacement from the vacuum are generally easier to prepare in the laboratory than the abstract (and possibly entangled) probe states that arise in theoretical optimisations [4,12,60] and, as a consequence, there is an intrinsically practical interest in exploring how close to the fundamental bounds this type of state can get. We already know that we can approach the quantum Cramér-Rao bound asymptotically for path-symmetric pure (but otherwise general) states when each individual measurement consists of counting photons after the action of a 50:50 beam splitter [5,61]. For instance, using that POVM it was shown in [5] that if $W_0 = \pi/2$ and we impose that the relative error in equation (18) is $\varepsilon_\tau = 0.05$, then this is true for the twin squeezed vacuum state for $\mu_\tau \gtrsim 874$, although surpassing the 0.05 threshold with the squeezed entangled state requires more than $\mu = 10^3$ repetitions because its convergence is slower.
By using the bounds with $W_0 = \pi/2$ and $\bar{\theta} = 0$ in section III we can now answer this question in the regime of limited data too, both for the previous states and for the coherent and the twin squeezed cat states. As a preliminary step we have reproduced these bounds as shaded areas in figures 6.i-6.iv for the coherent state, the twin squeezed vacuum state, the squeezed entangled state and the twin squeezed cat state, respectively. In addition, the dashed lines in those figures represent the mean square error associated with the measurement of the energy at each port of the interferometer (i.e., counting photons) after the action of a 50:50 beam splitter. We draw attention to the fact that we have also introduced a known phase shift in the second port of the interferometer before this beam splitter is applied, the complete sequence of operations for each state being presented in table III. The reason behind this choice is that we have found that the uncertainty of this POVM depends on $\bar{\theta}$, and the extra phase shift allows us to achieve the optimal single-shot precision when the prior is centred around $\bar{\theta} = 0$, which is our case [62]. This dependence on $\bar{\theta}$ can be seen as a Bayesian analogue of those cases where the standard error propagation formula for a given observable depends on the unknown parameter $\theta$, which is not a problem in practice provided that the experiment is arranged close to an optimal operating point [4].
To start our discussion of the low trial number regime with this POVM, we first observe that, according to figure 6.i, measuring energy with coherent states produces an uncertainty that is already very close to the associated bound for a low value of $\mu$. More concretely, the bound and the measurement error only differ in their second and third significant figures, as can be directly verified from the numerical values in table IV that we provide in appendix B for the first ten shots of every scheme based on indefinite photon number strategies. Moreover, this can be further improved if instead we undo the preparation of the probe state before counting photons, that is, by reversing the 50:50 beam splitter and the displacement from the vacuum operations that generated the coherent state in the first place. The extra known difference of phases shown in table III is also needed for the case with $\bar{\theta} = 0$ that we are considering. Nonetheless, taking into account the fact that both schemes produce an uncertainty whose first significant figure is that of the optimum (see table IV), we conclude that, for most practical purposes, they are equally useful and optimal given any number of repetitions.
The situation is very different when we consider the other three states in figures 6.ii-6.iv, where the uncertainty of the energy measurement is now notably higher than each bound in the regime of limited data, the distance between the graphs of the measurement and those of the bounds being larger for a few repetitions than for a single shot. This measurement is particularly detrimental for the strategy based on the squeezed entangled state, since its error is very close to the prior variance (horizontal line in figure 6.iii) when $\mu \sim 1$ and this indicates that almost no information is being gained there. Additionally, we can observe that the twin squeezed cat state in figure 6.iv presents a slow convergence to the asymptotic Cramér-Rao bound when we use this POVM, compared with the twin squeezed vacuum probe state in figure 6.ii or the coherent state in figure 6.i. Note that this is the same problem found in [5] for the squeezed entangled state, which is also reproduced in our calculations here.
These results show that counting photons is not the best strategy to be followed when $\mu$ is low and the probes have been prepared in states with a large Fisher information such as the ones considered here, and this motivates the search for other practical alternatives. More concretely, instead of projecting onto the energy basis, we can consider the measurement of a different physical quantity. The dash-dot lines in figures 6.ii-6.iv show the results where we have projected onto the eigenvectors of the observable $X_1 \otimes X_2$, with each $X_i$ being a quadrature rotated by $\pi/8$ for the $i$-th mode [63], after having introduced the phase shift $\exp(i \frac{\pi}{4} a_1^\dagger a_1)$ and having applied a 50:50 beam splitter (see table III) [64]. The error of this scheme also depends on $\bar{\theta}$.

Figure 6. Mean square error based on the optimal single-shot strategy (shaded area), error associated with the measurement of energy (dashed line) and prior variance (horizontal solid line) for (i) the coherent state, (ii) the twin squeezed vacuum state, (iii) the squeezed entangled state, and (iv) the twin squeezed cat state, with mean number of photons $\bar{n} = 2$, prior mean $\bar{\theta} = 0$ and prior width $W_0 = \pi/2$. Furthermore, the dash-dot graphs in (ii), (iii) and (iv) represent the uncertainty for the measurement of quadratures. The sequences of operations that implement the POVMs that produce these results can be found in table III.
By comparing the energy and quadrature measurements in figures 6.ii-6.iv we see that the graphs based on the latter POVM are substantially closer to the bounds than those for the former measurement when the experiment is operating in the regime of limited data. In other words, we have found a physical measurement that improves over the results based on measuring the energy for the practical states under consideration and a low number of trials. Interestingly, the dash-dot lines still converge to the fundamental asymptotic bound, and this implies that in the asymptotic regime both schemes are, nevertheless, equivalent in practice and optimal.
Although these results extend, generalise and clarify the findings of [5], figures 6.ii-6.iv also show that it could still be possible to find other physical schemes with a better precision when $\mu$ is low, with a faster rate of convergence to the asymptotic minimum or even saturating the bound for any $\mu$. These are some of the key questions that should be addressed for further progress in the design of experimentally feasible protocols that operate both in and out of the regime of limited data.

B. Optimality of NOON states
The fact that NOON states are conceptually simple makes them an excellent tool to understand metrology protocols, which is why we have chosen to study them separately. They emerge as the optimal probe when we maximise the Fisher information over the definite photon number states [37,49], and while they are unsuitable for a global estimation due to the multi-peak structure associated with the posterior probability functions that they generate [5,48,49], and they require that the scaling of the prior variance is already $\sim 1/n^2$ in order to achieve the same scaling that the Cramér-Rao bound predicts [46,47], the results in [5] showed that they can still be useful to a certain extent in the intermediate regime of prior knowledge and limited data when the number of photons is low and the POVM is based on measuring the energy at each port. In addition, this moderate usefulness also holds for the repetition of the single-shot optimal strategy according to our previous results in figure 2.i, since the NOON state performs better than the twin squeezed cat state for $1 \le \mu \le 10$. By studying the performance of this probe for different physical measurements with respect to the non-asymptotic bound we will see that NOON states are also optimal in another sense.

Table III. Sequences of quantum operations needed to implement the practical measurements discussed in section VI, whose uncertainty is represented in figures 6 and 7. Note that the observable column indicates the physical quantity that is being measured, and that the different combinations of phase shifts that appear in the third column have been chosen such that the schemes are optimal when the prior is centred around $\bar{\theta} = 0$ and $\bar{n} = 2$.
First we consider the two measurement schemes that we described in the previous section, that is, counting photons and measuring rotated quadratures after the introduction of some phase shifts that are indicated in table III, and after the action of a 50:50 beam splitter. The mean square errors generated by them for the NOON state, which are represented in figures 7.i and 7.ii, respectively, display a perfect agreement with the bounds for any number of repetitions. This can be further verified by observing that the uncertainties for the first ten shots provided in table V of appendix B are virtually identical.
Similarly, a parity measurement based on the projectors of the parity observable also agrees with the bound, as shown in figure 7.iii. This is consistent with the fact that the information about the phase is actually contained in the parity of the number of photons [48,65,66]. Interestingly, we have verified that counting photons and checking the parity at each port produces the same non-asymptotic results for the indefinite photon number states too. That different physical schemes are able to saturate the same quantum bound can be explained by recalling that the optimal quantum estimator $S$ is only defined on the support of $\rho$ (see [53] and appendix B). In particular, for NOON states $\rho$ can be represented by a non-singular $(2 \times 2)$ matrix in the number basis (see appendix C), which is only a part of the full Hilbert space including all the sectors with any number of photons. As a consequence, any measurement scheme that coincides with the projectors onto $|s_1\rangle$ and $|s_2\rangle$ given in equation (19) in the part of the Hilbert space that corresponds to the support of $\rho$ is going to be optimal, independently of the particular form of the POVM elements. Furthermore, the same intuition can be used to understand why it is more difficult to saturate the bounds for indefinite photon number states in the non-asymptotic regime. For these states there is a non-zero probability of detecting any number of photons at each port of the interferometer, which implies that the optimal quantum estimator $S$ can be constrained in all the sectors of the Hilbert space, and these constraints need to be fully satisfied to saturate the single-shot bound. However, as we accumulate more data we start to approach the quantum Cramér-Rao bound, which is based on the equation $L(\theta)\rho(\theta) + \rho(\theta)L(\theta) = 2\,\partial\rho(\theta)/\partial\theta$, and this equation only has a unique solution on the support of $\rho(\theta)$ [67], which in our case is simply a pure state. That is, finding physical measurements that saturate the asymptotic bounds is generally less demanding and, in fact, the errors of the physical measurements in figures 6.i-6.iv converge to the fundamental bound.
This state of affairs gives rise to an interesting situation. The Bayesian bounds in figure 2.i show that, in principle, the NOON state is not the best option among the probes that we are examining for any number of repetitions. In spite of this fact, if we compare the uncertainty associated with counting photons after undoing the preparation of a coherent state, the measurement of quadratures for the states based on the squeezing operator, and any of the physical measurements previously discussed for the NOON state, then it can be shown that, in this case, the NOON state is the best probe when $1 \le \mu \le 3$. In particular, this conclusion can be extracted by inspection from tables IV and V in appendix B. This analysis highlights the importance of studying the possibility of saturating the theoretical bounds using realistic implementations in a particularly transparent way.
On the other hand, the mathematical simplicity of NOON states allows us to go one step further and study collective measurements [49,59]. Until now this work has been based on preparing some probe, implementing its optimal strategy in equation (8) and repeating this procedure $\mu$ times. However, a more general possibility is to prepare $\mu$ identical states and perform a single measurement on all of them at once. If we upgrade the optimal single-shot bound in equation (7) to cover the collective case we find that

$$\bar{\epsilon}_{\mathrm{mse}} \geq \int d\theta\, p(\theta)\, \theta^2 - \mathrm{Tr}(\bar{\rho}_\mu S_\mu),$$

where now $S_\mu$ is given by $S_\mu \rho_\mu + \rho_\mu S_\mu = 2\bar{\rho}_\mu$ with

$$\rho_\mu = \int d\theta\, p(\theta)\, \rho(\theta)^{\otimes \mu}, \qquad \bar{\rho}_\mu = \int d\theta\, p(\theta)\, \theta\, \rho(\theta)^{\otimes \mu}.$$

A calculation scheme for equation (24) is proposed in appendix E, and its application to the NOON state for $1 \le \mu \le 10$ results in the graph of figure 7.iv, which coincides with the bound generated by repeating the optimal strategy for a single probe. Numerically, this agreement occurs at least for the first significant figure, as can be verified in table V available in appendix B. We thus conclude that collective measurements do not provide a better performance than the practical measurements previously studied when we are working in the low-$\mu$ regime, each probe is prepared in a NOON state with $\bar{n} = 2$ and the prior width is $W_0 = \pi/2$.
In summary, we have shown that there are measurement schemes that can saturate the bound for the NOON state for all $\mu$ simultaneously. Consequently, NOON states do not only have a special status in the local regime, but also in the regime of limited data and moderate prior knowledge [68]. This can be explained by noticing that the optimal projectors for a single shot in equation (19) are the same as the projectors predicted by the symmetric logarithmic derivative that defines the quantum Fisher information [48]. While this probe state is fragile and difficult to prepare in more realistic scenarios [60], these results are still interesting from a fundamental perspective, and they have helped us to understand the problems associated with saturating the bounds of more practical states that we need to overcome in the future.
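The single-shot calculation for the NOON state is small enough to be sketched numerically. The Python example below is an illustration under stated assumptions rather than the calculation scheme of appendix B: it assumes the NOON state with $\bar{n} = 2$ written as $(e^{-i\theta}|2,0\rangle + e^{i\theta}|0,2\rangle)/\sqrt{2}$ (a phase convention chosen here), the flat prior of width $W_0$, and the single-shot bound in the form $\int d\theta\, p(\theta)\theta^2 - \mathrm{Tr}(\bar{\rho} S)$ with $S\rho + \rho S = 2\bar{\rho}$. All function names are hypothetical.

```python
import cmath
import math


def integrate(f, a, b, steps=4000):
    """Midpoint rule for a (possibly complex) integrand on [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (k + 0.5) * h) for k in range(steps)) * h


def noon_density(theta):
    """rho(theta) in the basis {|2,0>, |0,2>} (assumed phase convention)."""
    off = 0.5 * cmath.exp(-2j * theta)
    return [[0.5, off], [off.conjugate(), 0.5]]


def prior_moments(w0, theta_bar=0.0):
    """rho = int p rho(theta), rho_bar = int p theta rho(theta), and int p theta^2."""
    a, b = theta_bar - w0 / 2, theta_bar + w0 / 2
    p = 1.0 / w0  # flat prior density
    rho = [[integrate(lambda t: p * noon_density(t)[i][j], a, b)
            for j in range(2)] for i in range(2)]
    rho_bar = [[integrate(lambda t: p * t * noon_density(t)[i][j], a, b)
                for j in range(2)] for i in range(2)]
    second = integrate(lambda t: p * t * t, a, b)
    return rho, rho_bar, second


def solve_lyapunov(rho, rho_bar):
    """Solve S rho + rho S = 2 rho_bar by vectorising the 2x2 unknown S."""
    n = 2
    m = n * n
    a = [[0j] * m for _ in range(m)]
    b = [2 * rho_bar[i][j] for i in range(n) for j in range(n)]
    for i in range(n):
        for j in range(n):
            r = n * i + j
            for k in range(n):
                a[r][n * i + k] += rho[k][j]  # (S rho)_ij = sum_k S_ik rho_kj
                a[r][n * k + j] += rho[i][k]  # (rho S)_ij = sum_k rho_ik S_kj
    for col in range(m):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(m):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
                b[r] -= f * b[col]
    s = [b[r] / a[r][r] for r in range(m)]
    return [[s[n * i + j] for j in range(n)] for i in range(n)]


def single_shot_bound(w0, theta_bar=0.0):
    """Return (bound, S) with bound = int p theta^2 - Tr(rho_bar S)."""
    rho, rho_bar, second = prior_moments(w0, theta_bar)
    s = solve_lyapunov(rho, rho_bar)
    gain = sum(rho_bar[i][j] * s[j][i] for i in range(2) for j in range(2)).real
    return second - gain, s


if __name__ == "__main__":
    bound, s = single_shot_bound(math.pi / 2)
    print("single-shot bound:", bound)
    print("S =", s)
```

Under these assumptions and for $W_0 = \pi/2$, $\bar{\theta} = 0$, the matrix $S$ is proportional to the Pauli matrix $\sigma_y$ in the chosen basis, so its eigenvectors are $(|2,0\rangle \pm i|0,2\rangle)/\sqrt{2}$ and do not depend on the prior width. The same routine can be called with other values of $W_0$ to explore the transition towards the local regime for this probe.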

VII. CONCLUSIONS
We have developed a method to study the performance of metrology protocols that operate in the regime of limited data and moderate prior knowledge. More concretely, we have proposed to use the strategy that is optimal after minimising the single-shot mean square error over all the possible POVMs in a sequence of $\mu$ repeated experiments. Given a state, a Hamiltonian and a prior probability, we have seen that the bounds that arise from this technique are optimal for the first shot by construction, and that they also start to converge to the quantum Cramér-Rao bound when $\mu \sim 10^2$. In addition, we have argued that they can be saturated using measurements that are equivalent to the projectors of the optimal quantum estimator $S$ for each repetition, and that this strategy is optimal for those experiments based on identical and independent trials where adaptive techniques or more general measurements are excluded.
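The repeated single-shot strategy summarised above can be illustrated with a minimal Python sketch for the simplest case. It assumes a NOON state with $\bar{n} = 2$, a flat prior of width $W_0$ centred at zero, and a two-outcome projective measurement whose likelihood is $p(\pm|\theta) = (1 \pm \sin 2\theta)/2$; this likelihood follows from projecting onto $(|2,0\rangle \pm i|0,2\rangle)/\sqrt{2}$ under a phase convention chosen here, and the function name is hypothetical. The Bayesian mean square error is obtained as the prior second moment minus the average of the squared posterior mean.

```python
import math


def binomial_mse(mu, w0=math.pi / 2, steps=2000):
    """Bayesian mean square error after mu independent repetitions.

    Assumes p(+|theta) = (1 + sin(2 theta)) / 2 and a flat prior on
    [-w0/2, w0/2]; integrals over theta use the midpoint rule.
    """
    h = w0 / steps
    thetas = [-w0 / 2 + (k + 0.5) * h for k in range(steps)]
    prior = 1.0 / w0
    second_moment = sum(prior * t * t for t in thetas) * h
    gain = 0.0
    for k in range(mu + 1):  # k = number of "+" outcomes in mu trials
        comb = math.comb(mu, k)
        evidence = 0.0  # p(k) = int p(theta) p(k|theta) dtheta
        first = 0.0     # int p(theta) p(k|theta) theta dtheta
        for t in thetas:
            p_plus = 0.5 * (1.0 + math.sin(2.0 * t))
            like = comb * p_plus ** k * (1.0 - p_plus) ** (mu - k)
            evidence += prior * like * h
            first += prior * like * t * h
        if evidence > 0.0:
            gain += first ** 2 / evidence  # p(k) * (posterior mean)^2
    return second_moment - gain


if __name__ == "__main__":
    for mu in (1, 2, 5, 10, 50, 100):
        mse = binomial_mse(mu)
        # For this likelihood the Fisher information is 4, so mu * mse
        # can be compared with the asymptotic value 1/4.
        print(f"mu = {mu:3d}   mse = {mse:.5f}   mu * mse = {mu * mse:.4f}")
```

Only the number of "+" outcomes matters because the trials are identical and independent, which is why the sum over outcome sequences reduces to a binomial sum. The product $\mu \cdot \bar{\epsilon}_{\mathrm{mse}}$ printed by the sketch can be compared with $1/F_q = 1/4$ to see how the asymptotic regime is approached for this probe.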
The usefulness of this method in the context of quantum metrology has been demonstrated through the analysis of a Mach-Zehnder interferometer, and we have focused our study on three indefinite photon number states that have been proposed in the literature due to their large Fisher information: the twin squeezed vacuum state, the squeezed entangled state and the twin squeezed cat state. We have found that the twin squeezed vacuum state is the best option when $1 \le \mu < 5$, $W_0 = \pi/2$, and for $\mu = 1$, $W_0 = \pi/3$; that the squeezed entangled state is the preferred choice if $5 < \mu < 40$, $W_0 = \pi/2$ and when $\mu = 1$, $W_0 = \pi/3$ or $W_0 = \pi/4$; and that the twin squeezed cat state recovers its status of best probe due to its largest Fisher information when $\mu > 40$, $W_0 = \pi/2$ and $\mu = 1$, $W_0 = 0.1$. To the best of our knowledge, a fully Bayesian analysis in the terms explored in this work had not been done before for these probes.
Using the twin squeezed cat state as a family of probes whose parameters can be modified for a given mean number of photons and prior width, we have provided evidence that suggests that increasing the amount of intra-mode correlations, that is, the correlations within each arm of the interferometer, could be detrimental when the number of repetitions is low, which contrasts with the fact that the same type of correlations are actually beneficial in the asymptotic regime. Moreover, we have shown that using a state with fewer intra-mode correlations and a certain amount of path entanglement such as the squeezed entangled state appears to help to enhance the precision in the non-asymptotic regime without damaging the asymptotic performance in a dramatic way. Therefore, we conjecture that there might exist a more general relationship between the number of trials and the amount of intra-mode and inter-mode correlations that could indicate how to reduce the uncertainty of the protocols in the regime of limited data.
It has been shown that, for a low number of trials, the usual strategy of counting photons after the action of a beam splitter is optimal for most practical purposes when the probe is prepared in a coherent state, although it does not saturate the non-asymptotic bounds for the other indefinite photon number states. However, we have found that in the latter case the situation can be improved if instead we measure quadratures rotated by $\pi/8$, since this scheme is closer to our bounds for low values of $\mu$. This result is particularly relevant because states prepared with operations such as squeezing or displacement from the vacuum and quadrature measurements are easier to implement in real-world situations. In addition, our calculations indicate that counting photons, measuring quadratures and implementing parity measurements are optimal strategies for any number of repetitions if the probe is in a NOON state, and that collective measurements on the first ten copies of this probe do not provide an advantage over the schemes based on identical and independent experiments.
It is important to note that in this work we have not considered what happens in the presence of noise because our aim was to identify the novel effects that emerge directly from having a low number of trials without the interference of other features, which justifies our focus on ideal schemes. However, a comprehensive study of the effect of noise when the available data is limited is also crucial to model realistic scenarios. Although we leave this analysis for future research, in appendix F we provide an initial test to demonstrate that our method can also be applied to a scheme where photon losses are present, finding that the qualitative behaviour of our results does not seem to change substantially for a reasonable amount of loss.
We believe that these results constitute an important advance towards the creation of a practical and useful methodology that will help us to design optimal metrology experiments taking the finite number of trials into account, and that they could play a crucial role in the design of realistic inference schemes once this method is combined with other features such as the presence of noise, larger numbers of photons, adaptive techniques, state engineering algorithms or multi-parameter systems.
states are limited due to the ambiguity that they introduce in the estimation and, more importantly, because of the difficulties to use them in real experiments [60], we draw attention to the fact that the regime where the prior knowledge may play a substantial role is relevant and useful whenever we need to make inferences from a practical scenario where only a low number of experiments can be performed.

The approximation $4\sin^2\{[g(\boldsymbol{n}) - \theta]/2\} \approx [g(\boldsymbol{n}) - \theta]^2$ relies on the quantity $|g(\boldsymbol{n}) - \theta|/2$ being small. Moreover, if $W_0$ is the width of the phase domain, then $|g(\boldsymbol{n}) - \theta|/2 \le W_0/2$ within one period. The minimum requirement that is natural to impose is that the variable for which the Taylor expansion is calculated (i.e., $|g(\boldsymbol{n}) - \theta|/2$) is slightly smaller than 1 at most, which is always the case if the width of our experiment satisfies $W_0 \lesssim 2$. In principle this would still be a crude approximation if we were interested in the sine function itself. However, the sine error is then integrated over all the possible values for $\theta$ and $\boldsymbol{n} = (n_1, \dots, n_\mu)$. This implies that $|g(\boldsymbol{n}) - \theta|/2 \sim 1$ when $W_0 \sim 2$ only for a few combinations of values, and the weight of those cases will decrease as the joint probability $p(\theta, \boldsymbol{n})$ accumulates more data. We conclude then that $W_0 \lesssim 2$ is a reasonable estimate for the range of validity of the mean square error in a problem with a periodic parameter. Note that this condition has the same order of magnitude as the estimate found in [72], where the authors argued that the width of their Gaussian prior had to be $\pi/2$ or less, and it is a better estimate than the one obtained in [5].
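According to the previous discussion, only the calculation of the first few shots could be potentially misleading if we use the mean square error. To show that this is not the case for the schemes analysed in the main text, let us estimate explicitly the error of the Taylor expansion. First, using Taylor's theorem we have that $\sin^2(x) = x^2 - x^4 \cos(2\varepsilon)/3$, where $\varepsilon \in [0, x]$ [73]. The first term is the approximation that we want to use, while the second term represents the error of this approximation. Using the fact that the cosine is bounded between $-1$ and $1$, the Taylor error can be estimated with

$$\Delta\bar{\epsilon} = \frac{1}{12} \int d\theta\, d\boldsymbol{n}\; p(\theta, \boldsymbol{n})\, [g(\boldsymbol{n}) - \theta]^4, \tag{A1}$$

and knowing that the optimal phase estimator is the average of the posterior probability $p(\theta|\boldsymbol{n}) \propto p(\theta)\, p(\boldsymbol{n}|\theta)$, we can rewrite equation (A1) as

$$\Delta\bar{\epsilon} = \frac{1}{12} \int d\boldsymbol{n}\; p(\boldsymbol{n})\, \Delta\epsilon(\boldsymbol{n}),$$

where

$$\Delta\epsilon(\boldsymbol{n}) = \langle \theta^4 \rangle - 4 \langle \theta^3 \rangle \langle \theta \rangle + 6 \langle \theta^2 \rangle \langle \theta \rangle^2 - 3 \langle \theta \rangle^4$$

and we have used the notation $\langle \cdot \rangle = \int d\theta\, p(\theta|\boldsymbol{n})\, \cdot$. This is precisely the three-step decomposition introduced in [5] to obtain the mean square error and, as such, we can compute $\Delta\bar{\epsilon}$ numerically in the same way.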
This calculation is shown in figure 8, where the line in the middle of each shaded area is $\bar{\epsilon}_{\mathrm{mse}}$ for $1 \leq \mu \leq 10$ and $W_0 = \pi/2$, and the boundaries are given by $\pm\Delta\bar{\epsilon}$. We can see that the Taylor error bounds for the twin squeezed vacuum state, the squeezed entangled state and the twin squeezed cat state, which constitute the basis of our main results, do not overlap for any value of $\mu$. Therefore, all the comparisons made between these probes are valid. That the twin squeezed cat state and the coherent state overlap for $\mu = 1, 2, 3$ is not surprising, since their respective mean square errors also do (see figure 2.i), and the same observation holds for the NOON state and the squeezed entangled state when $\mu = 2$. On the other hand, the shaded area of the NOON state overlaps slightly with the top shaded area of the twin squeezed vacuum state when $\mu = 1$. It is important to appreciate that the shaded areas are bounds for the Taylor error, and it is not guaranteed that the uncertainties of these two states actually coincide. However, even if they did, it would simply constitute another instance where the role of inter-mode and intra-mode correlations is altered in the regime of limited data, since a state with path entanglement that is beaten by a state with a large amount of intra-mode correlations in the asymptotic regime would reach the same uncertainty as the latter for a single shot.

Table IV. Mean square error for the indefinite photon number states using the optimal single-shot POVM and the physical measurement schemes described in the main text, with $1 \leq \mu \leq 10$, $\bar{n} = 2$, $\bar{\theta} = 0$ and $W_0 = \pi/2$ (columns: $\bar{\epsilon}_{\mathrm{mse}}(\mu = 1), \dots, \bar{\epsilon}_{\mathrm{mse}}(\mu = 10)$).
Finally, we also notice that the approximation will become even better as $W_0$ decreases, which is the case for the other prior widths that we have explored. Hence, we can conclude that the results that arise from the use of the mean square error as an approximation for the mean sine error in the regime of moderate prior knowledge are valid.
Appendix B: Calculation scheme for the optimal single-shot strategy and its bound
In this appendix we present the calculation scheme that has been used to obtain the eigenvalues and eigenvectors of the optimal quantum estimator $S$ that generate the results of the main text.
We recall that the Hermitian operator $S$ satisfies the equation $S\rho + \rho S = 2\bar{\rho}$. Hence, the first step is to find $\rho$ and $\bar{\rho}$. By expanding the transformed state $|\psi(\theta)\rangle$ in the number basis as $|\psi(\theta)\rangle = \sum_{nm} e^{-i(n-m)\theta/2} c_{nm} |nm\rangle$, where $c_{nm}$ are the components of the initial probe state, we can construct the density matrix $\rho(\theta) = \sum_{nmkl} e^{-i x_{nmkl}\theta/2}\, c_{nm} c^*_{kl}\, |nm\rangle\langle kl|$, with $c_{nm} c^*_{kl} = (\rho_0)_{nmkl}$ and $x_{nmkl} = n - m + l - k$. Then, given the flat prior in equation (2), the elements of $\rho$ and $\bar{\rho}$ in equations (B2) and (B3) involve integrals over $\theta$, which we denote by $K_{nmkl}$ and $L_{nmkl}$, respectively. These integrals can be computed directly, and the results are given in equations (B6)-(B8). Note that all the elements $K_{nmkl}$ and $L_{nmkl}$ are well defined except when $x_{nmkl}$ vanishes, in which case we have an indeterminate form. In those cases we need to take the limits $\lim_{x_{nmkl} \to 0} K_{nmkl} = 1$ and the corresponding limit for $L_{nmkl}$. Since $K_{nmkl}$, $L_{nmkl}$ and $c_{nm} c^*_{kl}$ can be seen as $(nm \times kl)$ matrices, we can rewrite equations (B2) and (B3) as $\rho = \rho_0 \circ K$ and $\bar{\rho} = \rho_0 \circ L$, respectively, where we are using the entrywise product of matrices defined as $X \circ Y = \sum_{ij} X_{ij} Y_{ij} |i\rangle\langle j|$ [74]. In other words, now we have two expressions where the integration has been performed analytically.

Table V. Mean square error for the NOON state using the optimal single-shot POVM, the physical measurement schemes described in the main text and collective measurements, with $1 \leq \mu \leq 10$, $\bar{n} = 2$, $\bar{\theta} = 0$ and $W_0 = \pi/2$. We note that the calculation for collective measurements has been performed with a different numerical algorithm.
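For illustration, the integrals can be written out under the assumption that the flat prior of equation (2) is $p(\theta) = 1/W_0$ for $\theta \in [\bar{\theta} - W_0/2, \bar{\theta} + W_0/2]$ and zero otherwise, and that $\rho = \int d\theta\, p(\theta)\rho(\theta)$ and $\bar{\rho} = \int d\theta\, p(\theta)\,\theta\,\rho(\theta)$. The closed forms below are derived here under those assumptions and need not coincide in layout with the original display equations; $x$ stands for $x_{nmkl}$.

```latex
\begin{align}
K(x) &= \frac{1}{W_0}\int_{\bar{\theta}-W_0/2}^{\bar{\theta}+W_0/2} d\theta\; e^{-ix\theta/2}
      = e^{-ix\bar{\theta}/2}\,\frac{\sin(xW_0/4)}{xW_0/4}, \\
L(x) &= \frac{1}{W_0}\int_{\bar{\theta}-W_0/2}^{\bar{\theta}+W_0/2} d\theta\; \theta\, e^{-ix\theta/2}
      = e^{-ix\bar{\theta}/2}\left\{ \bar{\theta}\,\frac{\sin(xW_0/4)}{xW_0/4}
        - \frac{2i}{W_0}\left[ \frac{4\sin(xW_0/4)}{x^2}
        - \frac{W_0\cos(xW_0/4)}{x} \right] \right\}, \\
\lim_{x\to 0} K(x) &= 1, \qquad \lim_{x\to 0} L(x) = \bar{\theta}.
\end{align}
```

A quick consistency check is that $L(x) = 2i\, dK/dx$, which follows from differentiating the integrand of $K(x)$ with respect to $x$.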
On the other hand, if we expand $\rho$ in the basis of its eigenvectors, that is, $\rho = \sum_i p_i |\phi_i\rangle\langle\phi_i|$, and we insert it into $S\rho + \rho S = 2\bar{\rho}$, we arrive at $S = 2\sum_{ij} \langle\phi_i|\bar{\rho}|\phi_j\rangle\, |\phi_i\rangle\langle\phi_j| / (p_i + p_j)$ (B10). Note that (B10) is only defined on the support of $\rho$, since the Sylvester equation $S\rho + \rho S = 2\bar{\rho}$ only has a unique solution in the subspace where the spectra of $\rho$ and $-\rho$ are disjoint [71]. Interestingly, this solution is formally analogous to the expression used to calculate the symmetric logarithmic derivative in the asymptotic theory [3,4]. Unfortunately, completing this calculation analytically for the indefinite photon number states is challenging because they belong to a Hilbert space whose dimension is infinite. However, we can take advantage of the analytical expressions $\rho = \rho_0 \circ K$, $\bar{\rho} = \rho_0 \circ L$ and those in equations (B6)-(B10) in order to simplify the numerical scheme. In particular, we have implemented the following method:
1. The components $c_{nm}$ of the initial state $|\psi_0\rangle$ are numerically approximated employing a finite Hilbert space of dimension $d_c$ per mode. For the coherent state this dimension is $d_c = 21$, and the number probability for this cut-off is $p_c \sim 10^{-19}$; for the twin squeezed vacuum state we have that $d_c = 51$ and $p_c \sim 10^{-17}$; $d_c = 61$ and $p_c \sim 10^{-5}$ for the squeezed entangled state; and $d_c = 51$ and $p_c \sim 10^{-10}$ for the twin squeezed cat state.
2. The matrices $K$ and $L$ are numerically generated using the formulas in equations (B6)-(B8). This allows us to calculate $\rho = \rho_0 \circ K$ and $\bar{\rho} = \rho_0 \circ L$ in the number basis.
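3. The basis of $\rho$ and $\bar{\rho}$ is changed as $\rho_D = V^\dagger \rho V$ and $\bar{\rho}_D = V^\dagger \bar{\rho} V$, where the columns of $V$ are given by the eigenvectors $|\phi_i\rangle$ of $\rho$, $(\rho_D)_{ij} = p_i \delta_{ij}$ and $(\bar{\rho}_D)_{ij} = \langle\phi_i|\bar{\rho}|\phi_j\rangle$. We note that only the eigenvectors $|\phi_i\rangle$ whose eigenvalues $p_i$ satisfy $p_i \gtrsim 10^{-12}$ are employed.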
4. Now we can calculate the elements $(S_D)_{ij} = \langle\phi_i|S|\phi_j\rangle = 2(\bar{\rho}_D)_{ij}/(p_i + p_j)$ directly.
5. We return to the original basis using $S = V S_D V^\dagger$.
6. Finally, we calculate the spectral decomposition of $S$ as indicated in equation (8), which gives us the estimates $\{s\}$ and the projectors $\{|s\rangle\}$.
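Steps 2 to 6 above can be sketched in a few lines of NumPy. This is a minimal illustration and not the code used for the results in the text: the function names `flat_prior_kernels` and `optimal_single_shot`, the argument `p_min` and the two-photon NOON example are all hypothetical choices, and the closed forms for $K$ and $L$ are derived here under the assumption that the prior is flat with width $W_0$ and centre $\bar{\theta}$.

```python
import numpy as np

def flat_prior_kernels(x, W0, theta_bar):
    """Return K(x) and L(x) for a flat prior of width W0 centred at theta_bar.

    K(x) = (1/W0) * integral of exp(-i x theta / 2) over the prior support,
    L(x) = (1/W0) * integral of theta * exp(-i x theta / 2) over the same range.
    These closed forms are derived for this sketch (assumed prior shape).
    """
    x = np.asarray(x, dtype=float)
    k, h = x / 2.0, W0 / 2.0
    phase = np.exp(-1j * k * theta_bar)
    sinc = np.sinc(k * h / np.pi)            # sin(kh)/(kh); equals 1 at x = 0
    k_safe = np.where(x == 0, 1.0, k)        # placeholder value to avoid 0/0
    odd = np.where(x == 0, 0.0,
                   np.sin(k_safe * h) / k_safe**2 - h * np.cos(k_safe * h) / k_safe)
    return phase * sinc, phase * (theta_bar * sinc - 2j * odd / W0)

def optimal_single_shot(c, W0, theta_bar, p_min=1e-12):
    """Build rho and rho_bar, solve S rho + rho S = 2 rho_bar on the support of rho."""
    d = c.shape[0]
    n, m = np.meshgrid(np.arange(d), np.arange(d), indexing="ij")
    diff = (n - m).ravel()                       # n - m for the flattened index (n, m)
    vec = c.ravel()
    rho0 = np.outer(vec, vec.conj())             # (rho_0)_{nm,kl} = c_nm c*_kl
    x = diff[:, None] - diff[None, :]            # x_{nmkl} = n - m + l - k
    K, L = flat_prior_kernels(x, W0, theta_bar)  # step 2
    rho, rho_bar = rho0 * K, rho0 * L            # entrywise products
    p, V = np.linalg.eigh(rho)                   # step 3: eigenbasis of rho
    keep = p > p_min                             # discard directions outside the support
    p, V = p[keep], V[:, keep]
    rho_bar_D = V.conj().T @ rho_bar @ V
    S_D = 2 * rho_bar_D / (p[:, None] + p[None, :])   # step 4
    S = V @ S_D @ V.conj().T                          # step 5
    s, vecs = np.linalg.eigh(S)                       # step 6: estimates and eigenvectors
    return S, s, vecs, rho, rho_bar

# Hypothetical example: two-photon NOON state (|20> + |02>)/sqrt(2), d = 3 per mode.
c = np.zeros((3, 3), dtype=complex)
c[2, 0] = c[0, 2] = 1 / np.sqrt(2)
S, s, vecs, rho, rho_bar = optimal_single_shot(c, W0=np.pi / 2, theta_bar=0.0)
```

For this example the two nonzero estimates are $\pm 1/\pi$, which can be confirmed by hand because $\rho$ and $\bar{\rho}$ reduce to $2 \times 2$ matrices. Outside the retained support of $\rho$ the operator $S$ acts as zero, so the remaining eigenvalues returned by the sketch vanish.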
Once the optimal single-shot POVM has been found, we can proceed with the calculation of the mean square error as a function of $\mu$ in equation (12) using the three-step method in [5]. Tables IV and V provide the numerical values of our schemes for $1 \leq \mu \leq 10$, while the complete results for $1 \leq \mu \leq 10^3$ have been presented as graphs in the main text. The numerical precision of these values can be estimated using the identity $\int d\theta\, p(\theta)\theta^2 = \int d\theta'\, p(\theta') \int d\boldsymbol{n}\, p(\boldsymbol{n}|\theta') \int d\theta\, p(\theta|\boldsymbol{n})\theta^2$ (B11), where the right hand side is calculated numerically and compared to the analytical solution for the left hand side. In particular, we have found that our results are valid up to the third significant figure.
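The kind of check described by (B11) can be illustrated with a toy model. The sketch below assumes a flat prior of width $W_0 = \pi/2$ centred at zero and a hypothetical two-outcome likelihood $p(n|\theta) = [1 \pm \sin\theta]/2$ for a single shot; neither the likelihood nor the helper `integrate` comes from the schemes in the text, and the grid size is an arbitrary choice.

```python
import numpy as np

W0 = np.pi / 2
theta = np.linspace(-W0 / 2, W0 / 2, 2001)   # grid over the prior support
dtheta = theta[1] - theta[0]

def integrate(f):
    """Trapezoidal rule on the uniform theta grid."""
    return dtheta * (f.sum() - 0.5 * (f[0] + f[-1]))

prior = np.full_like(theta, 1.0 / W0)
# Hypothetical likelihood with two outcomes; each row is p(n|theta) on the grid.
likelihood = np.stack([(1 + np.sin(theta)) / 2, (1 - np.sin(theta)) / 2])

rhs = 0.0
for p_n_given_theta in likelihood:
    joint = prior * p_n_given_theta           # p(theta, n)
    p_n = integrate(joint)                    # p(n)
    posterior = joint / p_n                   # p(theta|n)
    rhs += p_n * integrate(posterior * theta**2)

lhs = W0**2 / 12                              # analytical second moment of the flat prior
relative_gap = abs(rhs - lhs) / lhs
```

Here `relative_gap` plays the role of the numerical precision estimate: it measures how far the numerically evaluated right hand side of (B11) is from the analytical left hand side, and it shrinks as the grid is refined.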
Finally, we arrive at $\bar{\epsilon}_{\mathrm{mse}} \approx \Delta\theta_p^2 \left[ 1 - \Delta\theta_p^2 F_q(\theta) \right]$ (D5) after introducing the approximated expression for $\mathrm{Tr}(\bar{\rho} S)$ into equation (7), which is the result involved in the discussion of Section V and that was available in the literature for a Gaussian prior [40,49,59].

Figure 9. Mean square error based on the optimal single-shot strategy (solid line) and quantum Cramér-Rao bound (dashed line) for a two-photon state whose Fisher information is optimal (see [75]) that is fed to a Mach-Zehnder interferometer with photon losses in its first arm, with $\eta = 0.9$, $\phi = \pi/4$ and $W_0 = \pi/2$.
find the Bayesian bound based on repeating the optimal single-shot strategy of this state. However, a potentially better result could be found by optimising the single-shot bound instead. We leave this possibility for future work.
Finally, we calculate the mean square error in equation (12) using this optimal single-shot measurement. The result has been represented in figure 9, which also includes the quantum Cramér-Rao bound that can be obtained using the expression for the Fisher information provided in [75]. As we can see, the Bayesian error approaches the asymptotic result also in this case. While a perfect convergence cannot be observed within the number of trials that we are considering, because the mean square error crosses the bound when $\mu \approx 4 \times 10^2$, we have verified that after $\mu = 10^3$ repetitions the relative error defined in equation (18) is just $\varepsilon = 0.02$. Therefore, we can conclude that, according to our methodology, a reasonable amount of photon losses does not seem to alter substantially the behaviour that we have found in the main text using ideal schemes. Nevertheless, a deeper investigation including other sources of noise, other probe states and realistic measurements is required in order to construct a more complete picture of the effect that noise has in those systems that operate in the regime of limited data.