Beating noise with abstention in state estimation

Bernat Gendra; Elio Ronco-Bonvehi; John Calsamiglia; Ramon Muñoz-Tapia; Emilio Bagan

doi:10.1088/1367-2630/14/10/105015

1. Introduction

Knowing the state of a system is a key task in quantum information processing. An unknown quantum state can only be unveiled by means of measurements. These, however, provide only partial knowledge about the system and, furthermore, this information gain comes always at the expense of destroying the state. Only when a reasonably large number N of identically prepared copies of the system are available, is an accurate estimation of the state possible. For a given N, the aim then is to find the measurement protocol that yields the best estimate of the input state.

The standard estimation optimization problem is suited for a situation where, say, an experimentalist is confronted with an unknown state of a system of which she is asked to provide an estimate, based, of course, on the results of a measurement of her choice. A quantitative assessment of her performance is usually given by the expected value of the fidelity (or some other distinguishability measure) between the unknown input and her guess (see below). Hence, it is implicitly assumed that the experimentalist is obliged to provide such a guess regardless of the measurement outcome she obtains. In this context, many results have been obtained in a large variety of scenarios over the last few years [1–12].

Here, we will study a variation of this setting suited for a situation where the experimentalist is allowed to decide whether to provide a guess or abstain from doing so. Of course, this decision cannot be based on the actual state of the system (which is unknown by definition) but rather on the result of a measurement. This relaxation of the original setting is very useful because it enables the experimentalist to post-select her measurement outcomes in order to provide a more accurate guess. That is, the possibility of abstaining enables her to discard instances where the measurement outcome turns out to be not informative enough. We will find that abstention can provide an important advantage, especially in noisy scenarios. The problem of 'state estimation with abstention'² is especially relevant in situations where the experimentalist can afford to re-run the experiment, i.e. she can easily prepare a new instance of the problem, or where she prioritizes having high-quality estimates.

Post-selection is a widely used tool in quantum information, particularly in experimental scenarios, where one has special demands or constraints. A form of abstention has already been explored in state discrimination [13], another important quantum statistical inference primitive. Discrimination aims at identifying in which one, out of a set of known quantum states, a system has been prepared. Two fundamental approaches are usually considered: the so-called 'minimum-error', where the experimentalist always has to provide a conclusive answer, at the expense, of course, of being wrong with a certain probability [14], and 'unambiguous discrimination', where no errors are permitted, but instead an inconclusive answer (abstention) may be given with some probability [15]. By varying the allowed rate of inconclusive answers [16–19], we may go from one approach to the other [20–22]. The possibility of abstaining has been studied in [23] for phase estimation, and in [24] for direction estimation with arbitrary pure input signals. In both cases the results show a significant improvement over the standard (without abstention) approach.

In this work we consider the optimal estimation of a completely unknown pure qubit state when N copies of it are available for measurement and when a certain rate (probability and rate will be used interchangeably throughout the paper) of abstention Q is permitted. Our approach can be viewed as a probabilistic purification that succeeds with probability 1 − Q (similar probabilistic maps for broadcasting and cloning have been employed, for instance, in [25–27]). We will show that in an ideal noise-free scenario, abstention does not improve the estimation accuracy. However, it does in a realistic noisy scenario, as we claimed above. Here we will consider a simplified model where noisy measurements will be replaced by local depolarizing channels followed by ideal measurements.

The paper is organized as follows. In section 2, we consider estimation without abstention. More precisely, we obtain the protocol that gives the best estimate of the state of a qubit based on non-ideal measurements on N independent and identically prepared systems. In section 3, estimation with abstention is introduced, and the optimal protocol for a fixed value of the abstention rate Q is obtained. We study the asymptotic regime of large N and derive the corresponding maximum fidelity and probability of abstention. As an example, we also consider a scenario where abstention gives a drastic improvement. This is the case when noise increases with N in such a way that the fidelity of the estimation approaches a finite value less than 1 as N becomes large. We close the paper with some brief conclusions and present an outlook for future work.

2. No abstention

Let us consider N copies of a completely unknown pure qubit state $\left |{\vec {n}}\right \rangle$ (throughout the paper $\vec n$ will denote a unit Bloch vector) that we wish to estimate by performing a realistic, and therefore noisy, quantum measurement. We model it as an ideal measurement preceded by the single-qubit depolarizing channel acting on every copy:

$\begin{equation} \mathcal{E}(\rho)=(1-\eta) \rho +\frac{\eta}{3}(\sigma_{x}\rho\sigma_{x}+\sigma_{y}\rho\sigma_{y}+\sigma_{z}\rho\sigma_{z}), \end{equation} \tag{ 1 }$

where with probability 1 − η no error occurs, while with probability η the state is affected by either a bit-flip, a phase-flip or both. This error probability η is assumed to be known by the experimentalist; therefore, for the purpose of analyzing the effects of noise in the estimation process, we will transfer its effect to the input states and optimize the estimation protocol over ideal measurements. Hence, we will consider input states of the form

$\begin{equation} \rho(\vec{n})=r \left|{\vec{n}}\rangle\!\langle{\vec{n}}\right| +(1-r)\frac{\mathbbm{1}}{2}=\frac{\mathbbm{1}+r\, \vec{n}\cdot \vec{\sigma}}{2}, \end{equation} \tag{ 2 }$

with r = 1 − (4/3)η. In other words, we will assume that the input states either do not change with probability r or they become completely randomized with probability 1 − r = (4/3)η. The original problem is thus equivalent to the estimation of a pure state $|\vec n\rangle$ (or of a uniformly distributed Bloch vector $\vec {n}$ ) based on the outcomes of an appropriate ideal measurement on N copies of the mixed state $\rho (\vec n)$ in equation (2), i.e. on the state $\rho (\vec {n})^{\otimes N}=\tau (\vec {n})$ .
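The equivalence between equations (1) and (2) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it applies the depolarizing channel to a pure state and compares the result with the shrunk Bloch vector of purity r = 1 − (4/3)η. The helper names and the values of η and $\vec n$ are illustrative assumptions.

```python
# Pauli matrices as nested lists (pure Python, no external packages).
# All helper names below are illustrative, not taken from the paper.
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def bloch_state(n, r=1.0):
    """Equation (2): (1 + r n.sigma)/2 for a unit vector n = (nx, ny, nz)."""
    nx, ny, nz = n
    return [[(1 + r * nz) / 2, r * (nx - 1j * ny) / 2],
            [r * (nx + 1j * ny) / 2, (1 - r * nz) / 2]]


def depolarize(rho, eta):
    """Equation (1): no error with weight 1 - eta, each Pauli error with eta/3."""
    out = [[(1 - eta) * rho[i][j] for j in range(2)] for i in range(2)]
    for s in (SX, SY, SZ):
        term = matmul(matmul(s, rho), s)
        for i in range(2):
            for j in range(2):
                out[i][j] += eta / 3 * term[i][j]
    return out


eta = 0.15                      # illustrative error probability
n = (0.6, 0.0, 0.8)             # illustrative unit Bloch vector
noisy = depolarize(bloch_state(n), eta)
shrunk = bloch_state(n, r=1 - 4 * eta / 3)
max_diff = max(abs(noisy[i][j] - shrunk[i][j])
               for i in range(2) for j in range(2))
print(max_diff)  # expected to vanish up to rounding
```

The check relies only on the identity $\sum_i\sigma_i\rho\,\sigma_i=2\,\mathbbm{1}-\rho$ for a normalized qubit state, which is what turns the channel into a uniform contraction of the Bloch vector.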

For each measurement outcome χ an estimate $|\vec n_\chi \rangle$ is provided according to some guessing rule $\chi \to |\vec n_\chi \rangle$ . We choose to quantify the quality of the estimate by means of the squared overlap

$\begin{equation} f(\vec{n},\vec{n}_\chi)= |\left\langle{\vec{n}}\right| \vec{n}_\chi \rangle|^2, \end{equation} \tag{ 3 }$

also known as the fidelity. The overall quality of the estimation protocol is then given by the average fidelity

$\begin{equation} F=\sum_\chi \int {\rm d}n f(\vec{n},\vec{n}_\chi)\, p(\chi|\vec{n}), \end{equation} \tag{ 4 }$

where dn = sinθ dθ dϕ/(4π) is the uniform probability distribution on the two-sphere and $p(\chi |\vec {n})$ is the conditional probability of obtaining the outcome χ if the input state is $\tau (\vec {n})$ . This probability is given by the Born rule $p(\chi |\vec {n})=\mathrm {tr}[\Pi _\chi \tau (\vec {n})]$ , where Π_χ ⩾ 0 are the elements of a positive operator valued measure (POVM). They satisfy the completeness relation $\sum _\chi \Pi _\chi =\mathbbm {1}$ , where $\mathbbm {1}$ denotes the identity operator in the space spanned by the input states $\{\tau (\vec {n})\}$ . The index χ may be discrete, continuous or both.

A protocol (i.e. a measurement {Π_χ} and a guessing rule $\chi \to \left |{\vec {n}_\chi }\right \rangle$ ) is said to be optimal if it maximizes F. For pure states, r = 1, the maximum fidelity is well known [3]:

$\begin{equation} F=\frac{N+1}{N+2}=1-{1\over N}+ {\mathcal O}(N^{-2}). \end{equation} \tag{ 5 }$

It is also known that the (continuous) covariant POVM [3]

$\begin{equation} \Pi(\vec{s})=(2J+1) U(\vec{s})\left|{J\; J}\rangle\!\langle{J\; J}\right| U^{\dag}(\vec{s}) \end{equation} \tag{ 6 }$

(with the obvious guessing rule $\Pi (\vec s)\to |\vec s\rangle$ ) is optimal. In (6), we use the standard notation, where $\{|jm\rangle\}_{m=-j}^{j}$ is the eigenbasis of the total angular momentum operators J² and J_z. We denote by $U(\vec {s})=[u(\vec s)]^{\otimes N}$ , $u(\vec s)\in \mbox {SU(2)}$ (the unitary representation of) the rotation that maps the unit (Bloch) vector $\hat z$ into $\vec s$ (thus $u(\vec s)|{1\over 2}{1\over 2}\rangle =|\vec s\rangle$ ), and we have also introduced the definition J ≡ N/2. Note that the POVM $\{\Pi (\vec s)\}$ acts on the symmetric subspace of largest total angular momentum J, of dimension 2J + 1 = N + 1. In terms of J, (5) can also be written as

$\begin{equation} F=\frac{1}{2}\left( 1+\frac{J}{J+1}\right)\equiv \frac{1}{2}\left( 1+\Delta_J\right). \end{equation} \tag{ 7 }$

Mixed states span a much larger Hilbert space and the computation becomes more involved. Due to the permutational invariance of the input state, it greatly simplifies in the total angular momentum basis, where $\tau (\vec {n})$ is block-diagonal [9]. We have

$\begin{equation} \tau(\vec{n})=\sum_{j=j_{{\rm min}}}^{J} p_{j} \tau_{j} (\vec{n})\otimes\frac{\mathbbm{1}^{(j)}_{\rm rep}}{n_j} , \end{equation} \tag{ 8 }$

where $\tau _{j}(\vec n)$ is the normalized mixed state

$\begin{equation} \tau_{j}(\vec{n})=\frac{1}{Z_j} \sum_{m=-j}^{j} R^m\; U(\vec n) \left|{jm}\right\rangle\left\langle{jm}\right| U^\dagger(\vec n), \end{equation} \tag{ 9 }$

with the definitions

$\begin{equation} Z_j=\sum_{m=-j}^{j}R^m=\frac{R^{j+1}-R^{-j}}{R-1},\quad R=\frac{1+r}{1-r}>1. \end{equation} \tag{ 10 }$

The projector $\mathbbm {1}^{(j)}_{\rm rep}$ stands for the various occurrences of the irreducible representation of total angular momentum j. It has dimension n_j given by

$\begin{eqnarray} n_j &=& \left( \begin{array}{c} 2J \\ J-j \end{array} \right) - \left( \begin{array}{c} 2J \\ J-j-1 \end{array} \right)\nonumber \\ &=& \left( \begin{array}{c} 2J \\ J-j \end{array} \right) \frac{2j +1}{J+j+1}. \end{eqnarray} \tag{ 11 }$

In the sum (8), j runs from j_min = 0 (j_min = 1/2) for N even (odd) to the maximum total angular momentum J, in contrast to the pure state case where only the maximum value J appears. The numbers p_j > 0 are the probabilities that the state $\tau (\vec {n})$ has quantum number j, i.e. $p_{j} =\mathrm {tr} [\mathbbm {1}_{j} \tau (\vec {n})]$ , where $\mathbbm {1}_{j}=\sum _{m=-j}^j|jm\rangle \langle jm|\otimes \mathbbm {1}^{(j)}_{\rm rep}$ is the projector onto the corresponding eigenspace. The probabilities read

$\begin{eqnarray} p_j&=&\left( \frac{1-r^2}{4}\right)^{J}n_j Z_j \nonumber\\ &=& \left( \frac{1-r^2}{4}\right)^{J} \left( \begin{array}{c} 2J \\ J-j \end{array} \right) \frac{2j +1}{J+j+1} \frac{R^{j+1}-R^{-j}}{R-1}. \end{eqnarray} \tag{ 12 }$

One can easily check that $\sum _j p_j=1$ , as it should be.
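The normalization can also be verified numerically. The sketch below evaluates equations (10)–(12) directly; the function names and the choice N = 7, r = 0.6 are illustrative assumptions, and the spin is stored as the integer 2j so that half-integer values need no special treatment.

```python
from math import comb


def multiplicity(N, twoj):
    """n_j of equation (11); twoj = 2j keeps half-integer spins as integers."""
    k = (N - twoj) // 2                      # k = J - j
    # J + j + 1 equals N - k + 1, and n_j is an integer, so // is exact here.
    return comb(N, k) * (twoj + 1) // (N - k + 1)


def partition(r, twoj):
    """Z_j of equation (10) as an explicit sum over m = -j, ..., j."""
    R = (1 + r) / (1 - r)
    return sum(R ** (i - twoj / 2) for i in range(twoj + 1))


def p_j(N, r):
    """Equation (12): dictionary {2j: p_j} for N copies of purity 0 < r < 1."""
    pref = ((1 - r * r) / 4) ** (N / 2)
    return {twoj: pref * multiplicity(N, twoj) * partition(r, twoj)
            for twoj in range(N % 2, N + 1, 2)}


probs = p_j(N=7, r=0.6)       # illustrative values
total = sum(probs.values())
print(probs)
print(total)                  # should equal 1 up to rounding
```

A second consistency check available from the same functions is the dimension count $\sum_j n_j(2j+1)=2^N$.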

Because of the block diagonal form of the input states, an obvious optimal measurement consists of a direct sum of covariant POVMs,

$\begin{equation} \Pi(\vec{s})=\bigoplus_{j=j_{{\rm min}} }^{J}\Pi_{j}(\vec{s})\otimes\mathbbm{1}^{(j)}_{\rm rep}, \end{equation} \tag{ 13 }$

where each of them is a straightforward generalization of equation (6):

$\begin{equation} \Pi_{j}(\vec{s})=(2j+1)\, U(\vec{s})\left|{j\; j}\rangle\!\langle{j\; j}\right| U^{\dag}(\vec{s}). \end{equation} \tag{ 14 }$

One can easily check that the completeness condition $\int {\rm d}s \,\Pi (\vec {s})=\mathbbm {1}$ holds. The total fidelity then is

$\begin{equation} F=\frac{1}{2}\left( 1+ \sum_{j=j_{\rm min}}^J p_j \Delta_j \right) , \end{equation} \tag{ 15 }$

where [1]

$\begin{equation} \Delta_j=\frac{\langle J_z \rangle_j}{j+1}=\frac{{\rm tr} [J_z \,\tau_j(\hat{z})]}{j+1}. \end{equation} \tag{ 16 }$

A straightforward calculation gives

$\begin{equation} \langle J_z \rangle_j=\frac{1}{Z_j} \sum_{m=-j}^{j} m R^m=j-\frac{1}{R-1}+\frac{2j+1}{R^{2j+1}-1}. \end{equation} \tag{ 17 }$

Note that for pure states, one has R → ∞, and in turn 〈J_z〉_J → J, in agreement with equation (7).

As will be shown in the next section, for asymptotically large N the probability p_j peaks at a value of j ≃ rJ, which gives the dominant and subdominant contributions to the sum in (15). Up to order 1/N, and discarding exponentially vanishing contributions (e.g. $\sim R^{-rJ}$), the asymptotic fidelity turns out to be

$\begin{equation} F=1- \frac{1}{Nr}\frac{r+1}{2 r}+\cdots . \end{equation} \tag{ 18 }$

This result is interesting on its own and, to the best of our knowledge, has not been presented before. Note that for pure states (r = 1), equation (18) agrees with the asymptotic expression of the fidelity in equation (5).
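As an illustration, the fidelity (15) can be evaluated exactly for moderate N and compared with the pure-state value (5) and with the leading 1/N term of (18). This is a minimal sketch under the assumption that the eigenvalue weights of $\rho^{\otimes N}$ are used in place of $R^m$ (the two differ only by the common factor $[(1-r^2)/4]^J$), which avoids the divergence of R at r = 1; the function name and the values N = 400, r = 0.5 are illustrative.

```python
from math import comb


def fidelity(N, r):
    """Average fidelity (15) for N copies of purity r, without abstention.

    Uses the weights w_m = ((1+r)/2)^(J+m) ((1-r)/2)^(J-m), which equal
    ((1-r^2)/4)^J R^m, so that the pure-state case r = 1 is also covered.
    """
    total = 0.0
    for twoj in range(N % 2, N + 1, 2):
        k = (N - twoj) // 2                              # k = J - j
        n_j = comb(N, k) * (twoj + 1) // (N - k + 1)     # equation (11)
        j = twoj / 2
        jz = 0.0                                         # sum over m of m * w_m
        for i in range(twoj + 1):
            m = i - j
            up = k + i                                   # up = J + m
            jz += m * ((1 + r) / 2) ** up * ((1 - r) / 2) ** (N - up)
        total += n_j * jz / (j + 1)                      # equals p_j * Delta_j
    return 0.5 * (1 + total)


N, r = 400, 0.5                                          # illustrative values
exact = fidelity(N, r)
asymptotic = 1 - (r + 1) / (2 * N * r ** 2)              # equation (18)
print(exact, asymptotic)
print(fidelity(10, 1.0), 11 / 12)                        # equation (5) at N = 10
```

The two numbers in the first line are expected to agree only up to corrections of higher order in 1/N, which equation (18) does not include.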

3. Abstention

In this section we focus on estimation protocols where the experimentalist is allowed to abstain, i.e. refrain from giving an answer, if the outcome of the measurement she has carried out cannot provide a good enough estimate of the unknown state. Obviously, F cannot decrease by excluding these abstentions from the average. In noisy scenarios, such as that considered in this paper, F actually increases, as will be shown below. Our aim is to quantify this gain and find the optimal protocol. In our approach, the probability of abstention, Q, is kept fixed, rather than unrestricted, since usually in practical situations one cannot afford to discard an unlimited amount of resources/state preparations.

3.1. The general framework

To enable the possibility of abstaining, the POVM representing the measurement must include the abstention operator, which we denote by Π₀, in addition to the operators {Π_χ}, each of which is associated with a specific estimate $|\vec n_\chi \rangle$ . Thus, the completeness relation reads

$\begin{equation} \sum_\chi \Pi_\chi +\Pi_0=\mathbbm{1}. \end{equation} \tag{ 19 }$

The probability of abstention (abstention rate) and that of producing an estimate (acceptance rate) are then given, respectively, by

$\begin{equation} Q=\int {\rm d}n\,{\rm tr}\left[\Pi_0\tau(\vec n)\right]\quad \mbox{and}\quad \skew3\bar Q=1-Q, \end{equation} \tag{ 20 }$

and the mean fidelity defined in (4) now becomes

$\begin{equation} F(Q)=\frac{1}{\skew3\bar Q}\sum_\chi\int {\rm d}n\, f(\vec n,\vec n_\chi){\rm tr}\left[\Pi_\chi \tau(\vec n)\right] , \end{equation} \tag{ 21 }$

where the sum does not include the Π₀ operator and $\skew3\bar Q$ takes into account the abstentions excluded from the average.

We next note that for any unitary transformation U of the type defined after equation (6), the operators {UΠ_χU^†,UΠ₀U^†} give the same value of Q and F(Q) as the original set {Π_χ,Π₀}, provided we change the guessing rule as $\vec n_\chi \to {\mathcal R}_U\vec n_\chi$ , where ${\mathcal R}_U$ is the SO(3) rotation whose unitary representation is U. Therefore, one can easily prove that Π₀ (the set {Π_χ}) can always be chosen to be SU(2) invariant (covariant) by simply averaging over U. In other words, with no loss of generality, the POVM elements that provide a guess $|\vec s\rangle$ can be chosen as

$\begin{equation} \tilde\Pi(\vec s)=U(\vec{s})\,\Pi\, U^\dagger(\vec{s}) , \end{equation} \tag{ 22 }$

where Π ⩾ 0 is the so-called seed of the POVM (in particular, note that $\tilde \Pi (\hat z)=\Pi$ ). The abstention operator then reads

$\begin{equation} \Pi_0=\mathbbm{1}- \int {\rm d}s\, \tilde{\Pi}(\vec{s}) , \end{equation} \tag{ 23 }$

which is manifestly rotationally invariant (as claimed above). It is thus proportional to the identity on each invariant subspace

$\begin{equation} \Pi_0= \bigoplus_{j=j_{\rm min}}^J a_j \mathbbm{1}_j, \end{equation} \tag{ 24 }$

where a_j are coefficients that satisfy the condition 0 ⩽ a_j ⩽ 1 and $\mathbbm {1}_j$ is the projector onto the corresponding eigenspace j previously defined. We can also choose $\tilde {\Pi }(\vec s)$ to have the block-diagonal form of the input state $\tau (\vec n)$ , namely,

$\begin{equation} \tilde{\Pi}(\vec{s})=\bigoplus_{j=j_{\rm min}}^J \tilde{\Pi}_{j}(\vec{s})\otimes\mathbbm{1}^{(j)}_{\rm rep}. \end{equation} \tag{ 25 }$

For a given {a_j}, the optimality of $\Pi _{j}(\vec {s})$ , defined in equation (14), clearly ensures that

$\begin{equation} \tilde{\Pi}_{j}(\vec{s})=(1-a_j) \Pi_{j}(\vec{s}) \end{equation} \tag{ 26 }$

are also optimal for estimation with abstention. We have from (20) that the abstention probability is simply

$\begin{equation} Q=\sum_{j=j_{\rm min}}^J p_j a_j , \end{equation} \tag{ 27 }$

where p_j is given in equation (12). The coefficients a_j can be understood as the probabilities of abstention conditional on the input state having total angular momentum j, i.e. a_j = p(abstention|j). Similarly, for a given j, the probability of producing an estimate, or accepting, is $\bar {a}_j=1-a_j=p ~(\mbox {acceptance}| j)$ .

From equation (21) we obtain

$\begin{equation} F(Q)=\frac{1}{2} \left( 1 + \sum_{j=j_{\rm min}}^J p_j\tilde{\Delta}_j\right), \end{equation} \tag{ 28 }$

where

$\begin{equation} \tilde{\Delta}_j=\frac{1-a_j}{1-Q}\Delta_j=\frac{\bar{a}_j}{\bar{Q}}\Delta_j, \end{equation} \tag{ 29 }$

and the quantity Δ_j is given in equations (16) and (17). Thus, we are only left with the free parameters a_j, which have to be optimized in order to maximize F(Q), subject to the constraints 0 ⩽ a_j ⩽ 1 and (27). As one might expect, Δ_j can be shown to be a monotonically increasing function of j, i.e. $\Delta_{j-1}<\Delta_j$; therefore the largest contribution to the fidelity is given by Δ_J. This corresponds to $\bar a_{J}=1$ and $\bar a_j=0$ , j < J. Hence, for an unrestricted probability of abstention, the optimal protocol discards any contribution with j < J. This protocol, however, would provide an estimate with a probability that decreases exponentially with N, for r < 1, as $p_J\simeq (1/r)[(1+r)/2]^{N+1}$. Note that in a noiseless scenario, r = 1, there is only the contribution j = J, which is already the optimal one and therefore abstention is of no use in such a case.

Clearly, for finite Q there can be contributions from other total angular momentum eigenspaces (j < J) compatible with equation (27). Recalling the monotonicity of Δ_j, and by convexity, it is obvious from equations (28) and (29) that there must exist an angular momentum threshold j* such that $\bar a_j=0$ ( $\bar a_j=1$ ), if j < j* (j > j*). The value j* is determined through equation (27) to be

$\begin{equation} j^*=\max\left\{j \ \mbox{such that} \ Q-\sum \limits_{j'=j_{{\rm min}}}^{j-1} p_{j'} \geqslant 0 \right\}. \end{equation} \tag{ 30 }$

Thus, we have

$\begin{equation} a_j=\left\{\begin{array}{ll} 1, & j<j^*, \\ \displaystyle p_j^{-1} \left(Q-\sum\limits_{j'=j_{{\rm min}}}^{j^*-1}p_{j'}\right), & j=j^*, \\ 0 , & j>j^* . \end{array} \right. \end{equation} \tag{ 31 }$

In more physical terms, the optimal strategy actually consists of two successive measurements. The experimentalist first makes a weak measurement to find the total angular momentum j of the input state $\tau (\vec n)$ and decides to abstain (provide a guess) if j < j* (j > j*). If j = j*, she simply decides randomly, by tossing a Bernoulli coin with probability a_j* of coming up heads: if heads (tails) shows up, she abstains (provides a guess). In order to provide the actual guess, if she decides to do so, she makes the optimal POVM measurement $\{\Pi (\vec s)\}$ (or just $\{\Pi _{j}(\vec s)\}$ ) in equation (13) on the state $\tau _{j}(\vec n)$ that resulted from the first measurement.
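The threshold rule of equations (30) and (31) amounts to spending the abstention budget Q on the blocks of lowest j first. A minimal sketch of this rule and of the resulting fidelity (28) is given below; the function names and the values N = 6, r = 0.3 are illustrative assumptions, and the code requires 0 < r < 1 and Q < 1.

```python
from math import comb


def blocks(N, r):
    """List of (p_j, Delta_j) for ascending j, from equations (12), (16), (17)."""
    R = (1 + r) / (1 - r)
    pref = ((1 - r * r) / 4) ** (N / 2)
    out = []
    for twoj in range(N % 2, N + 1, 2):
        j = twoj / 2
        k = (N - twoj) // 2                                  # k = J - j
        n_j = comb(N, k) * (twoj + 1) // (N - k + 1)         # equation (11)
        weights = [R ** (i - j) for i in range(twoj + 1)]    # R^m, m = -j..j
        Z = sum(weights)
        jz = sum((i - j) * w for i, w in enumerate(weights)) / Z
        out.append((pref * n_j * Z, jz / (j + 1)))
    return out


def abstention_coefficients(p, Q):
    """Equation (31): set a_j = 1 from the lowest j until the budget Q is spent."""
    a, remaining = [], Q
    for pj in p:
        aj = min(1.0, max(0.0, remaining / pj))
        a.append(aj)
        remaining -= aj * pj
    return a


def fidelity_with_abstention(N, r, Q):
    """Equations (28) and (29) evaluated with the coefficients of equation (31)."""
    pairs = blocks(N, r)
    a = abstention_coefficients([pj for pj, _ in pairs], Q)
    accepted = sum(pj * (1 - aj) * dj for (pj, dj), aj in zip(pairs, a))
    return 0.5 * (1 + accepted / (1 - Q))


N, r = 6, 0.3                                                # illustrative values
f0 = fidelity_with_abstention(N, r, 0.0)
for Q in (0.0, 0.3, 0.6, 0.9):
    fq = fidelity_with_abstention(N, r, Q)
    print(Q, fq, (fq - f0) / f0)                             # relative gain
```

The greedy filling is optimal only because Δ_j increases with j, as stated above; for a different figure of merit the ordering of the blocks would have to be re-examined.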

3.2. A small number of copies

In figure 1 we plot the fidelity gain due to abstention [F(Q) − F(0)]/F(0) = ΔF/F(0) versus Q for N = 6, 8, and purities of r = 0.3 and 0.7. The structure of equation (31) is apparent from these plots: at Q = 0 (a_j = 0 for all j) there is, naturally, no gain; kinks sequentially appear at the precise values of Q where a new coefficient a_j in (31) becomes positive (and j* increases by one); the curves are convex between successive kinks, where the one a_j that has become positive, a_j*, keeps increasing. This pattern repeats until the abstention rate Q reaches a critical value Q_crit at which j* = J,

$\begin{equation} Q_{{\rm crit}}=1-p_{J}=1-{1\over r}\left(\frac{1+r}{2}\right)^{2J+1}\kern-1em+{1\over r}\left(\frac{1-r}{2}\right)^{2J+1} \end{equation} \tag{ 32 }$

(see equation (12)). Increasing Q further will not provide any additional gain, as the flat plateaus of figure 1 illustrate. This is so, since one can view the optimal abstention protocol as a filtering process where the low angular momentum components of the input state are filtered out. Hence, keeping the maximum value of j = J is the optimal filtering beyond which no further improvement is possible. Figure 1(a) shows that in noisy scenarios, e.g. r = 0.3, abstention can increase the fidelity quite notably, up to 15%. For higher purities the gain is more moderate, as shown in figure 1(b). The enhancement in this case is about 4–5% but with an abstention rate slightly above 50%. Further results are shown in figure 2, where we plot the fidelity gain as a function of the number of copies N for abstention rates larger than Q_crit, and for various values of the purity r. All the curves have a maximum at a value of N that varies with the purity. The lower the purity, the higher the value of N at which the maximum occurs (e.g. for r = 0.3 the maximum gain occurs at N = 12; for r = 0.1 the maximum is off scale at the right of the figure).

**Figure 2.** Fidelity gain as a function of N for various values of r, indicated in the legend, and for Q ⩾ Q_crit.

As we have seen, the possibility of abstaining enables us to reach values of the fidelity that otherwise we could only attain with lower levels of noise. To quantify this effective reduction of noise, let us define an effective purity r_eff by the implicit equation F(r_eff,N,0) = F(r,N,Q). That is, for an estimation setting, given by r, N and Q, r_eff is the purity of the input states that would provide the same fidelity if the standard strategy without abstention (Q = 0) were used instead. Since r is related to the probability of error η in our model of noisy measurements in (1), an increase of the effective purity corresponds to an effective reduction of the amount of noise in the measurement through the relation η_eff = (3/4)(1 − r_eff). Figure 3 shows a plot of the effective purity r_eff as a function of Q for various values of r and N. As can be seen, r_eff increases faster at low values of N, but it saturates earlier (lower Q_crit), reaching a lower value. For low N and for a wide range of purities, 0.1 ≲ r ≲ 0.9, we observe a constant effective increase of the purity, r_eff ≈ r + 0.2, for reasonable values of the abstention rate Q. As N increases one has to go to higher values of the abstention rate, Q ∼ Q_crit, to have a significant gain. Hence, a moderate abstention rate is most effective in noisy scenarios when a small, but fair, number of copies are available.
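The implicit definition F(r_eff,N,0) = F(r,N,Q) can be solved numerically. The sketch below does so by bisection, under the assumption (not proved here) that the fidelity without abstention increases monotonically with the purity; the function names, the tolerance and the example values are illustrative.

```python
from math import comb


def fidelity(N, r, Q=0.0):
    """Optimal fidelity for N copies of purity r at abstention rate Q < 1.

    Blocks are visited in ascending j and discarded until the budget Q is used
    up, as in equation (31); the rest is averaged as in equations (28)-(29).
    """
    accepted, remaining = 0.0, Q
    for twoj in range(N % 2, N + 1, 2):
        j, k = twoj / 2, (N - twoj) // 2                 # k = J - j
        n_j = comb(N, k) * (twoj + 1) // (N - k + 1)     # equation (11)
        # Eigenvalue weights ((1+r)/2)^(J+m) ((1-r)/2)^(J-m), valid up to r = 1.
        w = [((1 + r) / 2) ** (k + i) * ((1 - r) / 2) ** (N - k - i)
             for i in range(twoj + 1)]
        p = n_j * sum(w)                                 # p_j
        t = n_j * sum((i - j) * wi for i, wi in enumerate(w)) / (j + 1)
        discard = min(p, remaining)                      # a_j * p_j
        remaining -= discard
        if p > 0:
            accepted += (1 - discard / p) * t            # (1 - a_j) p_j Delta_j
    return 0.5 * (1 + accepted / (1 - Q))


def effective_purity(N, r, Q, tol=1e-10):
    """Solve F(r_eff, N, 0) = F(r, N, Q) for r_eff by bisection on [r, 1]."""
    target = fidelity(N, r, Q)
    lo, hi = r, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fidelity(N, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for Q in (0.0, 0.3, 0.6):                                # illustrative values
    r_eff = effective_purity(5, 0.3, Q)
    print(Q, r_eff, 0.75 * (1 - r_eff))                  # r_eff and eta_eff
```

The bracket [r, 1] is sufficient because the fidelity with abstention never exceeds the pure-state value (5).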

**Figure 3.** Effective purity r_eff as a function of the abstention rate Q for N = 5 (dotted), 10 (dashed) and 30 (solid), and for purities of r = 0.01, 0.1, 0.3, 0.5, 0.7 and 0.9, which can be read off from the values of r_eff at Q = 0.

Finally, let us point out that the protocol we have presented requires a projection on the total angular momentum eigenspaces. This is a non-local measurement that nonetheless can be implemented efficiently [31]. In a more extreme scenario where there are no restrictions on the abstention rate, one can attain the maximum fidelity with an even simpler strategy: make a local Stern–Gerlach measurement on every qubit (say, of the z-component of the spin) and abstain unless all outcomes agree. This strategy renders an abstention probability of Q = 1 − [(1 + r)/2]^N, which might be comparable to Q_crit in equation (32).
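To make the comparison concrete, the following sketch tabulates Q_crit from equation (32) next to the abstention rate quoted above for the local strategy. The function names and the sampled values of N and r are illustrative assumptions. Since p_J contains the term [(1 + r)/2]^N plus further non-negative ones, Q_crit cannot exceed the local value.

```python
def q_crit(N, r):
    """Equation (32): abstention rate at which only the block j = J is kept."""
    return 1 - (((1 + r) / 2) ** (N + 1) - ((1 - r) / 2) ** (N + 1)) / r


def q_local(N, r):
    """Abstention rate quoted in the text for the local Stern-Gerlach strategy."""
    return 1 - ((1 + r) / 2) ** N


for N in (5, 10, 30):                 # illustrative values
    for r in (0.3, 0.7):
        print(N, r, q_crit(N, r), q_local(N, r))
```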

3.3. The asymptotic regime

We next compute the analytical expressions of the fidelity in the large N limit. Here it is useful to define the variable x as

$\begin{equation} x={j\over J},\quad 0\leqslant x\leqslant 1, \end{equation} \tag{ 33 }$

which becomes continuous in the limit N → ∞ (J → ∞). In this case, we can replace p_j by the continuous probability distribution in [0,1] defined by

$\begin{equation} p(x)=J p_{j=x J}, \end{equation} \tag{ 34 }$

so that $\int _0^1 \mathrm {d}x\,p(x)=1$ as N goes to infinity. Equation (28) can then be approximated by its continuous version, which reads

$\begin{equation} F={1\over2}\left[1+\int_0^1 {\rm d}x\, p(x)\tilde\Delta(x)\right], \end{equation} \tag{ 35 }$

where

$\begin{equation} \tilde\Delta(x)=\tilde\Delta_{j=x J}, \end{equation} \tag{ 36 }$

with $\tilde \Delta _j$ given in equation (29). From equation (31) we see that asymptotically $\bar a_j$ becomes the step function θ(x − x*), where x* = j*/J, and we have used the standard definition

$\begin{equation} \theta(x)=\left\{ \begin{array}{@{}ll@{}} 1, & x\geqslant 0,\\ 0, & x<0 . \end{array} \right. \end{equation} \tag{ 37 }$

With this, equation (29) becomes

$\begin{equation} \tilde\Delta(x)={\theta(x-x^*)\over \skew3\bar Q}\Delta(x), \end{equation} \tag{ 38 }$

and, in turn,

$\begin{equation} F={1\over2}\left[1+{1\over\skew3\bar Q}\int_{x^*}^1{\rm d}x\,p(x)\Delta(x)\right]. \end{equation} \tag{ 39 }$

It also follows from (27) that

$\begin{equation} \skew3\bar Q=\int_0^1 {\rm d}x\, p(x)\theta(x-x^*)=\int_{x^*}^1 {\rm d}x\, p(x) . \end{equation} \tag{ 40 }$

At this point, we need to find a good approximation to p(x) that would enable us to obtain the explicit form for the asymptotic fidelity. From equation (12), and using the Stirling formula, we obtain

$\begin{equation} p(x)\simeq \sqrt{\frac{N}{2 \pi}}\frac{1}{\sqrt{1-x^2}} {x(1+r)\over r(1+x)}\;{\rm e}^{- N H(\frac{1+x}{2}\parallel\frac{1+r}{2})}, \end{equation} \tag{ 41 }$

where H(s∥t) is the (binary) relative entropy

$\begin{equation} H(s\parallel t)=s \log\frac{s}{t}+ (1-s)\log\frac{1-s}{1-t} , \end{equation} \tag{ 42 }$

and the approximation is valid for both x and r in the open unit interval (0,1). The appearance of a relative entropy³ in equation (41) can be understood as follows. Our N-copy input state (diagonal in the canonical J_n basis) can be thought of as a classical coin tossing distribution of N identical coins with a bias of (1 + r)/2. From the theory of types [30], it is well known that the probability to obtain k heads is governed by the Kullback–Leibler distance (or relative entropy) between the empirical distribution {f = k/N,1 − f} and the distribution {(1 + r)/2,(1 − r)/2}. That is, $p(k)\sim \exp \{-N H[ f \parallel (1+r)/2]\}$ to first order in the exponent. The number of heads k is in one-to-one correspondence with the magnetic quantum number, m = k − J, and the conditional probability p(j|m) is strongly peaked at m = j, as one can easily check. It follows that the probability that the input state has total angular momentum j, given by $p(j)=\sum _{m} p(j|m)p(m)$ , will be asymptotically determined by the probability distribution p(m), which has a convenient expression in terms of the typical and the empirical distribution of up/down outcomes.

From equation (41) it follows that p(x) is peaked at the value x = r, i.e. at j = rJ, as shown in figure 4 and stated without a proof in section 2. Actually, around the peak, x ∼ r, the exponent becomes quadratic and p(x) approaches the Gaussian distribution

$\begin{equation} p(x)\simeq\sqrt{N\over2\pi(1-r^2)} \,{\rm e}^{ -N \frac{(x-r)^2}{2(1-r^2)}}, \end{equation} \tag{ 43 }$

as also follows from the central limit theorem, whereas it falls off exponentially elsewhere.
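The quality of the approximations (41) and (43) can be inspected numerically. The sketch below compares them with the exact values Jp_j obtained from equation (12), for the parameters N = 100 and r = 0.5 of figure 4; the function names and the sampled values of j are illustrative assumptions, and natural logarithms are assumed in the relative entropy (42).

```python
from math import comb, exp, log, pi, sqrt


def exact_density(N, r, twoj):
    """J * p_j from equations (12) and (34), with twoj = 2j."""
    R = (1 + r) / (1 - r)
    k = (N - twoj) // 2                                  # k = J - j
    n_j = comb(N, k) * (twoj + 1) // (N - k + 1)         # equation (11)
    Z = sum(R ** (i - twoj / 2) for i in range(twoj + 1))
    return (N / 2) * ((1 - r * r) / 4) ** (N / 2) * n_j * Z


def rel_entropy(s, t):
    """Binary relative entropy of equation (42), natural logarithms."""
    return s * log(s / t) + (1 - s) * log((1 - s) / (1 - t))


def stirling_density(N, r, x):
    """Equation (41)."""
    pref = sqrt(N / (2 * pi)) / sqrt(1 - x * x) * x * (1 + r) / (r * (1 + x))
    return pref * exp(-N * rel_entropy((1 + x) / 2, (1 + r) / 2))


def gaussian_density(N, r, x):
    """Equation (43): Gaussian of mean r and variance (1 - r^2)/N."""
    var = (1 - r * r) / N
    return exp(-(x - r) ** 2 / (2 * var)) / sqrt(2 * pi * var)


N, r = 100, 0.5
for twoj in (30, 50, 70, 90):                            # illustrative samples
    x = twoj / N                                         # x = j/J
    print(x, exact_density(N, r, twoj), stirling_density(N, r, x),
          gaussian_density(N, r, x))
```

Away from the peak the Gaussian is expected to deviate from the other two columns, in line with the remark that p(x) falls off exponentially there with the rate set by the relative entropy rather than by the quadratic exponent.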

**Figure 4.** Plots of Δ(x) (blue line with solid circles) and p(x) (red line with empty circles) for N = 100 and r = 0.5. The circles represent the quantities Δ_j and Jp_j as a function of x = j/J. The shaded area indicates the acceptance region for an abstention rate Q ∼ 93%.

It is now apparent that, asymptotically, abstention has negligible impact if components with j below rJ are filtered out (x* < r), since the main contribution to the fidelity, which comes from the peak around x ≃ r, is not excluded from the integral in equation (39) (only the left exponentially decaying tail is). For the same reason (see equation (40)), $\skew3\bar Q\simeq 1$ (the abstention rate Q is exponentially small), and equation (39) yields

$\begin{equation} F = 1- \frac{1}{2N}\frac{r+1}{r^{2}}+\cdots \quad \mbox{ for }\ \ x^*< r, \end{equation} \tag{ 44 }$

which is the same expression as the asymptotic fidelity of the protocol without abstention, equation (18).

It is then clear that, in order to have a discernible improvement in the fidelity, the abstention threshold x* must lie on the right of the peak of the probability distribution. The fidelity in (39) then can be written as

$\begin{equation} F \simeq \frac{1}{2} \left[1+ {p(x^{*})\over\skew3\bar Q}\Delta(x^{*})\right] \simeq \frac{1}{2}\left[1+\Delta(x^{*})\ \right],\quad x^*> r, \end{equation} \tag{ 45 }$

where we have used that for x ⩾ x* > r and for large enough N, p(x) falls off exponentially and the integral can be approximated by the value of the integrand at its lower limit. By the very same argument equation (40) gives

$\begin{equation} \skew3\bar Q\simeq p(x^*), \end{equation} \tag{ 46 }$

which has also been used in (45). Using now (41) we obtain that in the asymptotic limit of many copies, the rate at which our protocol provides a guess is

$\begin{equation} \bar{Q}\sim \exp\left[- N H\left(\frac{1+x^*}{2}\left\| \frac{1+r}{2}\right)\right.\right] . \end{equation} \tag{ 47 }$

Recalling equations (16) and (17) we obtain the optimal fidelity

$\begin{equation} F=1- \frac{1}{2Nx^{*}}\frac{r+1}{r}+\cdots ,\quad \mbox{for $r\leqslant x^{*}\leqslant 1$}, \end{equation} \tag{ 48 }$

for a value of Q given by (47). For x* = r the results (18) and (44) are recovered, whereas for x* → 1 (Q ⩾ Q_crit) the maximum average fidelity is attained:

$\begin{equation} F_{{\rm max}}=1- \frac{1}{2N }\frac{r+1}{r}+\cdots . \end{equation} \tag{ 49 }$

The advantage provided by our estimation-with-abstention protocol can be quantified by the effective number of copies that the standard protocol without abstention would require to achieve the same fidelity: $N_{\mathrm {eff}}=(x^*\kern -.2em/r) N$ , where x*∈[r,1) is determined by the abstention rate Q through (47). For high noise levels (low purity, r ≪ 1) our protocol provides an important saving of resources/copies, as N_eff/N = 1/r ≫ 1, whereas for nearly ideal detectors the saving in this asymptotic regime is more modest.

Alternatively, the advantage discussed above can also be quantified by the effective measurement-noise reduction, or equivalently, the effective purity r_eff (see section 3.2). Using (48) one can easily find a simple expression for the effective purity in the asymptotic limit and for large abstention rate: $r_{\mathrm {eff}} = (r+\sqrt {4r+5 r^{2}})/[2 (1+r)]$ . In the limit of very low noise levels the error probability η (recall equation (1)) is effectively reduced by a factor of three, i.e. η_eff = η/3, while in the opposite limit of very noisy measurements, one finds that $r_{\mathrm {eff}}=\sqrt {r}$ .
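The closed form for r_eff and its two limits can be checked with a few lines of code. In the sketch below the expression is treated as the positive root of the quadratic obtained by equating the 1/N terms of equations (18) and (49); the function names and the sampled purities are illustrative assumptions.

```python
from math import sqrt


def effective_copies(N, r, x_star):
    """N_eff = (x*/r) N, obtained by equating equations (18) and (48)."""
    return x_star / r * N


def asymptotic_effective_purity(r):
    """Positive root of (1 + r) r_eff^2 - r r_eff - r = 0.

    The quadratic follows from equating the 1/N terms of equations (18)
    and (49): (1 + r_eff) / r_eff^2 = (1 + r) / r.
    """
    return (r + sqrt(4 * r + 5 * r * r)) / (2 * (1 + r))


for r in (0.01, 0.1, 0.5, 0.9):                       # illustrative purities
    r_eff = asymptotic_effective_purity(r)
    eta, eta_eff = 0.75 * (1 - r), 0.75 * (1 - r_eff)
    # Columns: r, r_eff, sqrt(r) (noisy limit), eta_eff/eta (1/3 at low noise).
    print(r, r_eff, sqrt(r), eta_eff / eta)
```

For intermediate purities neither limit applies and the full expression has to be used.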

3.4. Other regimes

In the previous section we have seen how a gain in fidelity can be obtained provided the 'acceptance' rate $\skew3\bar Q$ falls off exponentially as N becomes very large. Here we give an example where this gain takes place even at finite $\skew3\bar Q$ .

At a fixed noise level (purity r), the fidelity is an increasing function of N. However, one could imagine an experimental setup where the noise (purity) also increases (decreases) with N. If this is so, the asymptotic fidelity could be strictly less than one, or in other words, perfect estimation could be unattainable even with unbounded resources. This is the case in our example, where we assume that $r={a}/\sqrt N$ , a being a positive constant. Note that the threshold x* must also scale as $1/\sqrt {N}$ in order to have a reasonably low abstention rate. Therefore, it is convenient to use a new variable $\xi =\sqrt {N} x=\sqrt N\, j/J=2j/\sqrt N$ instead. Then, the probability distribution in this new variable is

$\begin{equation} p(\xi)={\sqrt N\over2} p_{j=\xi\sqrt N/2},\quad {\rm with}\ r={a\over\sqrt{N}}. \end{equation} \tag{ 50 }$

Recalling equation (12) and using the Stirling formula this equation gives

$\begin{equation} p(\xi)={{\rm e}^{-\left({\xi-{a}\over\sqrt2}\right)^2}-{\rm e}^{-\left({\xi+{a}\over\sqrt2}\right)^2}\over\sqrt{2\pi}{a}}\xi \end{equation} \tag{ 51 }$

to leading order in inverse powers of N. The subleading terms are of order N^−1/2 and will be neglected here. For a given threshold value $\xi ^* =2 j^*/\sqrt {N}$ the abstention rate is

$\begin{equation} Q=\int_0^{\xi^*}p(\xi)\,{\rm d}\xi={1\over2}\left({\rm erf}\,{\xi^*_+}+{\rm erf}\,{\xi_{-}^*}\right)-{{\rm e}^{-{\xi^*_-}^2}-{\rm e}^{-{\xi^*_+}^2}\over\sqrt{2\pi}{a}} , \end{equation} \tag{ 52 }$

where ${\xi ^*_{\pm }}=(\xi ^*\pm {a})/\sqrt 2$ and erf x is the error function.

From equations (16) and (17) we have in this same regime and at leading order

$\begin{equation} \Delta(\xi)=\Delta_{j=\xi\sqrt N/2}=1-{2\over1-{\rm e}^{2{a}\xi}}-{1\over{a}\xi} . \end{equation} \tag{ 53 }$
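Equation (53) can be put in a more compact form, which may make its limits easier to read off. Since $1-2/(1-{\rm e}^{2a\xi})=({\rm e}^{2a\xi}+1)/({\rm e}^{2a\xi}-1)$ , one has

$\begin{equation} \Delta(\xi)=\coth(a\xi)-\frac{1}{a\xi} , \nonumber \end{equation}$

that is, the Langevin function of aξ. It behaves as aξ/3 for aξ ≪ 1 and tends to 1 for aξ ≫ 1, so outcomes with larger ξ carry a larger Δ.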

With the above, the fidelity (28) (or rather, the counterpart of (35)) is

$\begin{equation} F={1\over2}\left[1+\int_0^\infty {\rm d}\xi\, p(\xi)\tilde\Delta(\xi)\right]= {1\over2}\left[1+{1\over\skew3\bar Q}\int_{\xi^*}^\infty {\rm d}\xi\, p(\xi)\Delta(\xi)\right], \end{equation} \tag{ 54 }$

where the last integral can be computed to be

$\begin{eqnarray} \Delta^*&\equiv&\int_{\xi^*}^\infty\Delta(\xi)p(\xi)\,{\rm d}\xi\\ &=&{1-{a}^2\over2{a}^2}\left({\rm erf}\,{\xi^*_-}-{\rm erf}\,{\xi^*_+}\right) +{{\rm e}^{-{\xi^*_-}^2}+{\rm e}^{-{\xi^*_+}^2}\over\sqrt{2\pi}{a}} .\nonumber \end{eqnarray} \tag{ 55 }$

We can finally write the fidelity as

$\begin{equation} F={1\over2}\left(1+\frac{\Delta^*}{1- Q}\right)+{\mathcal O}(N^{-1/2}). \end{equation} \tag{ 56 }$

As shown in (52) and (55), both Q and Δ* are functions of the filtering threshold ξ*, which is just a properly scaled version of the original threshold j*. Finding the maximum fidelity for a given rate of abstention Q requires inverting equation (52) to obtain ξ*(Q), but this cannot be done analytically and one has to resort to numerical methods.
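As an illustration of such a numerical inversion, the sketch below evaluates (52) and (55) with the standard-library error function and inverts (52) by bisection, which is applicable because Q is a non-decreasing function of ξ* (p(ξ) ≥ 0). The function names, the bisection scheme and the tolerance are assumptions made for this example, not the procedure used to produce the results reported here.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT2PI = math.sqrt(2.0 * math.pi)


def _shifted(xi_star: float, a: float) -> tuple[float, float]:
    """Return (xi*_+, xi*_-) = ((xi* + a)/sqrt(2), (xi* - a)/sqrt(2))."""
    return (xi_star + a) / SQRT2, (xi_star - a) / SQRT2


def abstention_rate(xi_star: float, a: float) -> float:
    """Abstention rate Q(xi*) from equation (52)."""
    xp, xm = _shifted(xi_star, a)
    gauss = (math.exp(-xm * xm) - math.exp(-xp * xp)) / (SQRT2PI * a)
    return 0.5 * (math.erf(xp) + math.erf(xm)) - gauss


def delta_star(xi_star: float, a: float) -> float:
    """Integral Delta*(xi*) from equation (55)."""
    xp, xm = _shifted(xi_star, a)
    erf_part = (1.0 - a * a) / (2.0 * a * a) * (math.erf(xm) - math.erf(xp))
    gauss = (math.exp(-xm * xm) + math.exp(-xp * xp)) / (SQRT2PI * a)
    return erf_part + gauss


def threshold_for_rate(q: float, a: float, tol: float = 1e-12) -> float:
    """Invert equation (52) for xi* by bisection (Q grows with xi*)."""
    if not 0.0 <= q < 1.0:
        raise ValueError("expected 0 <= q < 1")
    lo, hi = 0.0, 1.0
    while abstention_rate(hi, a) < q:  # enlarge the bracket until it contains q
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if abstention_rate(mid, a) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fidelity(q: float, a: float) -> float:
    """Leading-order fidelity of equation (56) at abstention rate q."""
    xi_star = threshold_for_rate(q, a)
    return 0.5 * (1.0 + delta_star(xi_star, a) / (1.0 - q))


if __name__ == "__main__":
    for q in (0.0, 0.25, 0.5, 0.75):
        print(q, fidelity(q, a=1.0))
```

Scanning q over [0, 1) with a = 1 gives the leading-order curve F(Q); alternatively, one can avoid the inversion altogether by tabulating the pairs (Q(ξ*), F(ξ*)) parametrically in ξ*.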

In figure 5 we plot F as a function of Q for a = 1. The increase of the fidelity in the asymptotic regime of large N is clearly seen: e.g., an abstention rate of 50% yields a rise of about 10%, and it goes up to about 30% for higher (but still reasonable) values of Q. The figure also shows the agreement between the approximate form of the fidelity given by equations (52)–(56) and the numerical evaluation of its exact expression in (28).

**Figure 5.** Plot of the fidelity as a function of Q for $r=a/\sqrt N$ , with the choice a = 1.0, N = 10⁶ (red circles). The solid line (in blue) is the leading term in equation (56) plotted as a function of Q (a parametric plot of the pairs (Q,F), as given by equations (52) and (56)).

It should be noted that in the regime described here a rise of the input size N fails to replicate the fidelity improvement that results from increasing the rate of abstention (no N_eff can be defined in this regime); thus abstention appears to be the only means by which one can improve estimation.

4. Conclusions

In this work we have addressed optimal estimation of pure qubit states when abstention from providing an outcome is allowed. We have considered a reasonably realistic multiple copy scenario, where a sample of N identically prepared systems goes through a non-ideal (noisy) process of measurement. We have shown that in the limit of zero noise, abstention does not help to improve estimation (it does not hamper it either). However, abstention turns out to counterbalance the adverse effect of errors in a noisy process of measurement. We have shown that, in general, abstention is most useful for inputs of a few copies and for error rates of the order of a few per cent. For example, for N = 6 and a value of the error probability of η = 0.5 (per qubit), one can easily attain fidelity gains of the order of 15% with an abstention rate of Q = 4/5. As N increases, one needs to allow for higher abstention rates to obtain a significant improvement. We have given analytical asymptotic expressions of the fidelity valid in the limit of a large number of copies. In this limit, abstention can have the effect of increasing the number of copies by a constant factor: N_eff/N = x*/r (x* > r), with an acceptance rate $\skew3\bar Q$ given by the relative entropy: $-(1/N)\log \bar {Q}= H[(1+x^{*})/2\parallel (1+r)/2]$ . For low levels of noise, this amounts to reducing the error probability η by a factor of up to three.

We have also considered a scenario where the noise (per qubit) increases with the number of copies in such a way that perfect estimation is unattainable (lim_N→∞F < 1). In this case one can obtain a significant enhancement of the asymptotic fidelity (a few per cent) even for finite abstention probabilities Q < 1. Moreover, in such a scenario abstention appears to be the only way to improve estimation.

In broader parameter estimation contexts, where, e.g., phase or direction information is encoded in more general many-particle states, abstention may have a much more dramatic effect. These issues will be analyzed in a separate publication [24].

Acknowledgments

We acknowledge financial support from the European Regional Development Fund. This research was supported by the Spanish MICINN through contract no. FIS2008-01236 and the Generalitat de Catalunya CIRIT through contract no. 2009SGR-0985. We thank G Chiribella for useful discussions.
