Device-independent randomness generation from several Bell estimators

Olmo Nieto-Silleras; Cédric Bamps; Jonathan Silman; Stefano Pironio

doi:10.1088/1367-2630/aaaa06

1. Introduction

In recent years, researchers have uncovered a fundamental relationship between the non-locality of quantum theory and its random character. This relationship is usually formulated as follows. Consider two (or generally k) separated quantum devices accepting, respectively, classical inputs x₁ and x₂ and outputting classical outputs a₁ and a₂. Let $p=\{p({{\rm{a}}}_{1}{{\rm{a}}}_{2}| {{\rm{x}}}_{1}{{\rm{x}}}_{2})\}$ denote the set of joint probabilities describing how the devices respond to given inputs, from the point of view of a user who can only interact with the devices through the input–output interface, but who has no knowledge of the inner workings of the devices. Suppose that given p, the expectation value of a certain Bell expression f, such as the Clauser–Horne–Shimony–Holt (CHSH) expression [1], equals f[p]. Then, it is in principle possible to compute a lower bound on the randomness generated by the devices, as quantified by the min-entropy—the negative logarithm of the maximal probability of correctly guessing the values of future outputs. This bound on the min-entropy holds for any observer, including those having an arbitrarily precise description of the inner workings of the devices, and depends only on information derived from the resulting input–output behavior through the quantity f[p]. In principle, this bound can be computed numerically for any given Bell expression f. For certain Bell expressions, such as the CHSH expression, it can also be determined analytically.

This relation between the non-locality of quantum theory and its randomness is at the basis of various protocols for device-independent (DI) randomness generation (RNG) [2, 3] and quantum key distribution (QKD) [4, 5]. The theoretical analysis of such protocols presents us with an extra challenge in that the probabilistic behavior p of the devices is not known in advance and may vary from one measurement run to the next. This implies that bounds on the randomness as a function of f [p] have to be adapted to rely instead on the value of the Bell expression f estimated from experimental data. Some DIRNG and DIQKD protocols, and their security analyses, are reliant on specific Bell inequalities (usually the CHSH inequality) [6–9] or certain families of Bell inequalities [10–12], while others may be adapted to arbitrary Bell inequalities [3, 13–17]. However, to our knowledge all DIRNG and DIQKD protocols in the literature require that a single Bell inequality be chosen in advance and its experimental violation estimated (one exception is [18], where two fixed Bell expressions are used). The length and secrecy of the final key will then depend on the observed violation of the chosen inequality.

Nevertheless, it has been pointed out in [19, 20, 22] that the fundamental relation between the randomness and non-locality of quantum theory does not necessarily need to be expressed in terms of a specific Bell inequality. It is in principle possible, at least numerically, to bound the probability of guessing correctly the outputs of a pair of quantum devices directly from the knowledge of the joint input–output probabilities p. Indeed, the amount of violation f [p] of a given Bell inequality captures the non-local behavior of the devices only partially, and better bounds on the min-entropy can be obtained if all the information about the devices' behavior is taken into account.

This observation raises the following question: can one devise a DI RNG or QKD protocol that does not rely on the estimation of any a priori chosen Bell inequality, but which instead takes directly into account all the data generated by the devices?

There are various reasons for introducing protocols of this type. First, as already mentioned, the entire set of data generated by the devices can provide more information than the violation of a specific Bell inequality, and may therefore potentially allow for more efficient protocols. Second, the choice of a Bell inequality may have a deep influence on the amount of randomness that can be certified: as shown in [21] there are devices for which the amount of randomness, as computed from the CHSH inequality, is arbitrarily small, but is maximal if computed using another Bell inequality. Third, even if a set of quantum devices have been specifically designed to maximize the randomness according to a specific Bell inequality, the optimal extraction of randomness from noisy versions of such devices, say because of degradation of the devices with time, will typically rely on other Bell inequalities [19, 20, 22]. Finally, suppose that one is given a set of quantum devices without any specification of which Bell inequality they are expected to violate. Can one nevertheless directly use them in a protocol and obtain a non-zero random string or shared key, without testing their behavior beforehand?

We show here that it is indeed possible to devise DIRNG protocols which exploit more information than the estimated violation of a single Bell inequality, particularly, DIRNG protocols which exploit the full set of frequencies obtained (i.e., the entire set of estimates of the behavior p). Specifically, we introduce a DIRNG protocol whose security holds against an adversary limited to classical side information, or equivalently, with no long-term quantum memory. (Note that such a level of security may well be sufficient for all practical purposes [14, 15].) Technically, our protocol is obtained by generalizing the security analysis introduced in [14, 15] and combining it with the semidefinite programming techniques introduced in [19, 20] for lower-bounding the randomness based on the full set of probabilities p (which cannot be directly applied to experimental data.

We start in section 2 by briefly presenting the theoretical framework of our work, its main assumptions, and the notation used throughout the paper. In sections 3–5 we present our main mathematical results. In section 3 we present the main theorem of the paper and explain in detail how to put a DI bound on the randomness produced when measuring a Bell device n times in succession, given that we have a way to bound the single-round randomness as a function of the Bell expectation, and given that we can estimate the Bell expectation with some confidence. These two sub-procedures are respectively presented in sections 4 and 5 for the general case of an arbitrary number of Bell expressions. Combining these two sub-procedures with the general approach of section 3 immediately yields a DIRNG protocol, whose various steps are summarized in section 6. In section 7 we discuss in detail the main features of our protocol, and illustrate these with a numerical example. We end with some concluding remarks and open questions in section 8.

2. Behaviors and Bell expressions

In the following we will refer to a Bell set up, that is to say, k separated 'black' boxes (quantum devices whose inner workings are unknown), as a Bell device. Each box i can receive an input x_i upon which it produces an output a_i, with x_i and a_i taking values in some finite sets ${{ \mathcal X }}_{i}$ and ${{ \mathcal A }}_{i}$ , respectively, where without loss of generality we assume that the set of outputs ${{ \mathcal A }}_{i}$ does not depend on the input x_i. We write $x=({{\rm{x}}}_{1},\,\ldots ,\,{{\rm{x}}}_{k})$ and $a=({{\rm{a}}}_{1},\,\ldots ,\,{{\rm{a}}}_{k})$ for the k-tuple of inputs and outputs, and write ${ \mathcal X }={{ \mathcal X }}_{1}\,\times \cdots \times \,{{ \mathcal X }}_{k}$ and ${ \mathcal A }={{ \mathcal A }}_{1}\,\times \cdots \times \,{{ \mathcal A }}_{k}$ for the set of all possible k-tuples of inputs and outputs. Note that we use a roman (upright) type for the inputs and outputs of a single box and an italic type for the joint inputs and outputs of all k boxes.

The behavior of a single-round use of this Bell device can be characterized by the $| { \mathcal A }| \times | { \mathcal X }|$ joint probabilities $p(a| x)$ , which we can arrange into a vector $p\in {{\mathbb{R}}}^{| { \mathcal A }| \times | { \mathcal X }| }$ . We denote by ${ \mathcal Q }\subset {{\mathbb{R}}}^{| { \mathcal A }| \times | { \mathcal X }| }$ the set of behaviors p which admit a quantum representation, i.e., the set of behaviors such that there exist a k-partite quantum state and local measurements yielding the outcomes a with probability $p(a| x)$ when performing the measurements x. It is well-known that the set ${ \mathcal Q }$ can be approximated from its exterior (from outside the set) by a series of semidefinite programs (SDP) using the NPA hierarchy [23].

We define a Bell expression as a vector $f\in {{\mathbb{R}}}^{| { \mathcal A }| \times | { \mathcal X }| }$ with components f(a, x). The Bell expression f defines a linear form on the set of behaviors p through

$\begin{eqnarray}&&f[p]=\displaystyle \sum _{a,x}f(a,x)p(a| x).\end{eqnarray} \tag{ 1 }$

We refer to f [p] as the expectation of f with respect to the behavior p.

We consider here a framework in which the information we have about a Bell device is not necessarily given by the full behavior p, but possibly only by the expectation of one or more Bell expressions. In the following, we thus assume that t Bell expressions f_α ( $\alpha =1,\,\ldots ,\,t$ ) have been selected. (The certifiable randomness will depend on this initial choice of Bell expressions; we discuss this issue later.) We denote by ${\bf{f}}=({f}_{1},\,\ldots ,\,{f}_{t})$ these t Bell expressions and by ${\bf{f}}[p]=({f}_{1}[p],\,\ldots ,\,{f}_{t}[p])$ their expectations with respect to the behavior p. As an example, in a bipartite scenario, we might only know the value of the CHSH expression, in which case t = 1 and there is a single f defined by $f(a,x)={(-1)}^{{{\rm{a}}}_{1}+{{\rm{a}}}_{2}+{{\rm{x}}}_{1}{{\rm{x}}}_{2}}$ . But the framework is also applicable when ${\bf{f}}[p]$ corresponds to the full set p of probabilities. One simply needs to consider $| { \mathcal A }| \times | { \mathcal X }|$ expressions, one for each pairing (a, x), which are defined by ${f}_{a,x}(a^{\prime} ,x^{\prime} )={\delta }_{(a,x),(a^{\prime} ,x^{\prime} )}$ , so that ${f}_{a,x}[p]=p(a| x)$ .

Of course, in a DI protocol, we are not actually given ${\bf{f}}[p];$ we must instead estimate it by performing sequential measurements. We are thus led to consider a Bell device which is used n times in succession. We write $\vec{x}=({x}_{1},\,\ldots ,\,{x}_{n})$ and $\vec{a}=({a}_{1},\,\ldots ,\,{a}_{n})$ for the corresponding sequence of inputs and outputs and ${\vec{x}}_{j}=({x}_{1},\,\ldots ,\,{x}_{j})$ and ${\vec{a}}_{j}=({a}_{1},\,\ldots ,\,{a}_{j})$ for the sequences of inputs and outputs up to, and including, round j.

We write $P(\vec{a}| \vec{x})$ for the conditional probabilities of obtaining the sequence of outputs $\vec{a}$ given a certain sequence of inputs $\vec{x}$ . Note that we use an upper-case P to denote the n-round behavior of the boxes and lower-case p's for single-round behaviors. We assume that the Bell device is probed using inputs $\vec{x}$ distributed according to a probability distribution ${\rm{\Pi }}(\vec{x})$ . We will consider, in particular, the case where at each round the inputs are selected according to identical and independent distributions π(x), so that ${\rm{\Pi }}(\vec{x})={\prod }_{j\,=\,1}^{n}\pi ({x}_{j})$ (though this condition can actually be slightly relaxed in the results that follow). The full (non-conditional) n-round probabilities are thus given by $P(\vec{a},\vec{x})=P(\vec{a}| \vec{x}){\rm{\Pi }}(\vec{x})$ . We denote by P_AX and ${P}_{A| X}$ the distributions corresponding to the probabilities $P(\vec{a},\vec{x})$ and $P(\vec{a}| \vec{x})$ , respectively.

The only assumption we make about the Bell device is that at each round it is characterized by a joint entangled quantum state and a respective set of local measurement operators for each box. Each set of local measurement operators can depend on the past inputs and outputs of all k boxes (separated boxes can thus freely communicate between measurement rounds), but does not depend on future inputs (inputs are thus selected independently of the state of the device) or inputs of the k − 1 other boxes in the same round. Mathematically, this means that we can write $P(\vec{a}| \vec{x})={\prod }_{j=1}^{n}P({a}_{j}| {x}_{j},{\vec{a}}_{j-1},{\vec{x}}_{j-1})$ , and that the (single-round) behavior at round j given the past inputs and outputs ${\vec{x}}_{j-1}$ and ${\vec{a}}_{j-1}$ , defined as ${p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}({a}_{j}| {x}_{j})=P({a}_{j}| {x}_{j},{\vec{a}}_{j-1},{\vec{x}}_{j-1})$ , should be a valid no-signaling quantum behavior, i.e., ${p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}\in { \mathcal Q }$ .

We assume that the internal behavior of the boxes may be classically correlated with a system held by an adversary. Formally, these correlations and the adversary's knowledge can be represented through the joint probabilities $P(\vec{a},\vec{x},e)$ , where e denotes the adversary's classical side information. However, in order to keep the notation simple, we do not explicitly include e in the following. All the reasonings that follow would nevertheless hold, with only minor modifications, if the adversary's classical side information e were explicitly taken into account. This can be understood by comparing our proofs with those in [14]. Alternatively, e can be formally viewed as an initial input x₀ = e.

In the following, we sometimes adopt a terminology where the k-tuples x and a are referred to as the input and output of (a single-round use of) the Bell device (though of course each consists of the inputs and outputs, respectively, of all k boxes).

3. A general procedure for DIRNG against classical side-information

In this section, we show how to quantify the randomness produced by n sequential uses of the Bell device based on the Bell expressions ${\bf{f}}$ . We follow the approach introduced in [3, 14]. This approach relies on two essential sub-procedures: a first sub-procedure to bound the randomness of single-round behaviors and a second sub-procedure to estimate a certain quantity involving the Bell expressions ${\bf{f}}$ . Given these two ingredients, the single-round randomness bound can, through some simple algebra, be adapted to the n-round scenario and related to the actual data obtained in the Bell experiment.

We provide a macro-level description of this approach, which relies only on certain general mathematical properties that these two basic sub-procedures must satisfy, but not on any specifics as to how to implement them. We will present explicit ways to carry out these sub-procedures in the next two sections.

Intuitively, the output of the Bell device exhibits randomness for some choice of input $\vec{x}$ if there is no corresponding outcome that is certain to happen, i.e., if $P(\vec{a}| \vec{x})\lt 1$ for all $\vec{a}\in {{ \mathcal A }}^{n}$ . Equivalently, we can express this condition by saying that the surprisals $-{\mathrm{log}}_{2}P(\vec{a}| \vec{x})$ are bounded away from zero: $-{\mathrm{log}}_{2}P(\vec{a}| \vec{x})\gt 0$ for all $\vec{a}\in {{ \mathcal A }}^{n}$ . Our first aim will thus be to lower-bound these surprisals without making any assumptions regarding the Bell device's behavior apart from the ones stated in section 2. We will then see how to turn this bound into a more formal statement in terms of min-entropy.

To bound the n-round randomness, we assume the existence of a function H which bounds the single-round surprisal $-{\mathrm{log}}_{2}p(a| x)$ as a function of the Bell expectations ${\bf{f}}[p]$ . This is our first ingredient. We actually require this function to non-trivially bound the surprisals $-{\mathrm{log}}_{2}p(a| x)$ corresponding to a certain subset ${{ \mathcal X }}_{r}\subseteq { \mathcal X }$ of all possible inputs. This is because for certain behaviors, some inputs x lead to less predictable outputs than those resulting from other inputs, and we would therefore prefer to focus on these inputs only. (As will be elaborated on later, the amount of certifiable randomness generally depends on the choice of ${{ \mathcal X }}_{r}$ .) Formally, the function H, on which our results are based, is defined as follows.

Definition 1. Let ${\bf{f}}[{ \mathcal Q }]=\{{\bf{f}}[p]\,:p\in { \mathcal Q }\}$ be the set of Bell expectation vectors compatible with at least one quantum behavior. A function $H\,:{\bf{f}}[{ \mathcal Q }]\to [0,{\mathrm{log}}_{2}| { \mathcal A }| ]$ is a randomness-bounding (RB) function if it satisfies the following properties:

1.
${\min }_{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}(-{\mathrm{log}}_{2}\,p(a| x))\geqslant H({\bf{f}}[p])$ for all $p\in { \mathcal Q }$ .
2.
$H({\bf{f}}[p])$ is a convex function of its argument:
$\begin{eqnarray}&&H(q\,{\bf{f}}[{p}_{1}]+(1-q){\bf{f}}[{p}_{2}])\leqslant {qH}({\bf{f}}[{p}_{1}])+(1-q)H({\bf{f}}[{p}_{2}])\end{eqnarray} \tag{ 2 }$
for any $0\leqslant q\leqslant 1$ and any ${p}_{1},{p}_{2}\in { \mathcal Q }$ .

We will also need to compute a lower bound on $H({\bf{f}}[p])$ for all behaviors $p\in { \mathcal Q }$ such that ${\bf{f}}[p]\in { \mathcal V }$ for some arbitrary region ${ \mathcal V }\subseteq {{\mathbb{R}}}^{t}$ . We thus extend our definition of H to sets, such that

$\begin{eqnarray}&&H({ \mathcal V })\leqslant \inf \{H({\bf{f}}[p]):\,{\bf{f}}[p]\in {\bf{f}}[{ \mathcal Q }]\,\cap { \mathcal V }\}.\end{eqnarray} \tag{ 3 }$

When ${\bf{f}}[{ \mathcal Q }]\,\cap { \mathcal V }=\varnothing$ , we define $H({ \mathcal V })=0$ . Furthermore, we define η to be a constant such that $\eta \geqslant {\eta }^{* }={\max }_{p\in { \mathcal Q }}H({\bf{f}}[p])$ . (In case ${\eta }^{* }$ is hard to compute, we may always use $\eta ={\mathrm{log}}_{2}| { \mathcal A }|$ .) We discuss the intuitive interpretation of H and its properties in section 5.

Given a RB function H, we can now easily lower-bound the n-round surprisals:

Lemma 1. Let $H$ be a RB function. Then, for any $(\vec{a},\vec{x})$ and any Bell device behavior ${P}_{A| X}$

$\begin{eqnarray}&&-{\mathrm{log}}_{2}P(\vec{a}| \vec{x})\geqslant {nH}\left(\displaystyle \frac{1}{n}\displaystyle \sum _{j=1}^{n}{\bf{f}}[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}]\right)-\nu (\vec{x})\eta ,\end{eqnarray} \tag{ 4 }$

where

$\begin{eqnarray}&&\nu (\vec{x})=\displaystyle \sum _{j=1}^{n}{{\mathbb{1}}}_{{ \mathcal X }\setminus {{ \mathcal X }}_{r}}({x}_{j})\end{eqnarray} \tag{ 5 }$

is the number of ${x}_{j}$ in $\vec{x}=({x}_{1},\,\ldots ,\,{x}_{n})$ which do not belong to the set ${{ \mathcal X }}_{r}$ .

Proof of lemma 1. The proof follows essentially the same steps as the proof of lemma 1 in [14]. The main differences are (a) that we express the bound equation (4) as a function of t Bell expressions, instead of a single Bell expression, and (b) that the bound considers explicitly only the randomness from the inputs in ${{ \mathcal X }}_{r}$ .

From our assumptions regarding the Bell device, it follows that for any $(\vec{a},\vec{x})$ we can write

$\begin{eqnarray}&&-{\mathrm{log}}_{2}P(\vec{a}| \vec{x})=-{\mathrm{log}}_{2}\displaystyle \prod _{j=1}^{n}{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}({a}_{j}| {x}_{j})=\displaystyle \sum _{j=1}^{n}-{\mathrm{log}}_{2}{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}({a}_{j}| {x}_{j}).\end{eqnarray} \tag{ 6 }$

If ${x}_{j}\in {{ \mathcal X }}_{r}$ , then we can bound, according to the definition of the function H, the terms $-{\mathrm{log}}_{2}{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}({a}_{j}| {x}_{j})$ by $H({\bf{f}}[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}])$ . If ${x}_{j}\notin {{ \mathcal X }}_{r}$ , it is certainly the case that $-{\mathrm{log}}_{2}{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}({a}_{j}| {x}_{j})\geqslant 0\geqslant H({\bf{f}}[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}])-\eta$ . We can thus write

$\begin{eqnarray}&&-{\mathrm{log}}_{2}P(\vec{a}| \vec{x})\geqslant \displaystyle \sum _{j=1}^{n}H({\bf{f}}[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}])-\nu (\vec{x})\eta \end{eqnarray} \tag{ 7 }$

$\begin{eqnarray}&&\geqslant \,{nH}\left(\displaystyle \frac{1}{n}\displaystyle \sum _{j=1}^{n}{\bf{f}}[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}]\right)-\nu (\vec{x})\eta ,\,\end{eqnarray} \tag{ 8 }$

where in the last line we have exploited the convexity of H.□

Lemma 1 tells us how to bound the surprisals $-{\mathrm{log}}_{2}P(\vec{a}| \vec{x})$ as a function of the quantity $\tfrac{1}{n}{\sum }_{j=1}^{n}{\bf{f}}[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}]$ , which can be understood as an n-round average Bell expectation, where the average is taken conditioned on past inputs and outputs at each preceding round. This quantity, however, is not directly observable. This leads us to introduce the following definition of a confidence region, which is the second ingredient needed in our approach.

Definition 2. $A\ 1-\epsilon$ confidence region ${ \mathcal V }(\vec{a},\vec{x},\epsilon )$ for $\tfrac{1}{n}{\sum }_{j=1}^{n}{\bf{f}}[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}]$ is a subset of ${{\mathbb{R}}}^{t}$ such that, according to any distribution ${P}_{{AX}}$ ,

$\begin{eqnarray}&&\Pr \left[\displaystyle \frac{1}{n}\displaystyle \sum _{j=1}^{n}{\bf{f}}[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}]\in { \mathcal V }(\vec{a},\vec{x},\epsilon )\right]\geqslant 1-\epsilon .\end{eqnarray} \tag{ 9 }$

We denote by $V=\{(\vec{a},\vec{x}):\tfrac{1}{n}{\sum }_{j\,=\,1}^{n}{\bf{f}}[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}]\in { \mathcal V }(\vec{a},\vec{x},\epsilon )\}$ the set of input–output sequences such that $\tfrac{1}{n}{\sum }_{j=1}^{n}{\bf{f}}[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}]$ belongs to the confidence region ${ \mathcal V }$ .

(Note that in general ${ \mathcal V }$ explicitly depends on $\vec{a}$ and $\vec{x}$ , although notation-wise this dependence is sometimes left implicit.) In other words, for small and large n, knowing the outcomes $(\vec{a},\vec{x})$ of n rounds of measurement, one can determine ${ \mathcal V }={ \mathcal V }(\vec{a},\vec{x},\epsilon )$ and assert with high confidence that $\tfrac{1}{n}{\sum }_{j=1}^{n}{\bf{f}}[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}]$ is somewhere in ${ \mathcal V }$ , even though its exact value cannot be deduced from $(\vec{a},\vec{x})$ alone. The assertion is false if and only if $(\vec{a},\vec{x})\notin V$ , which occurs with a probability smaller than by definition.

Combining equation (8) with this definition immediately implies the following:

Lemma 2. Let ${ \mathcal V }$ be a $1-\epsilon$ confidence region according to definition 2. Then for any $(\vec{a},\vec{x})\in V$

$\begin{eqnarray}&&-{\mathrm{log}}_{2}P(\vec{a}| \vec{x})\geqslant {nH}({ \mathcal V })-\nu (\vec{x})\eta .\end{eqnarray} \tag{ 10 }$

Lemma 2 tells us that the surprisal associated to the event $\vec{a}$ given $\vec{x}$ is lower-bounded by a function of $(\vec{a},\vec{x})$ , except for a subset of 'bad' events $\{(\vec{a},\vec{x})\notin V\}$ .

One way to deal with these bad events is simply to pretend that the boxes are characterized by a slightly modified behavior $\tilde{P}$ that yields a new 'abort' output $\vec{a}=\perp$ when one of the bad events is obtained (while according to P, the probability of $\vec{a}=\perp$ is zero). Effectively, $\tilde{P}$ can be thought of as post-processed version of the physical behavior P. Though this post-processed version cannot be achieved in practice by the user of the devices (since he does not know the set of bad events), it is well-defined physically (it could for instance be implemented by an adversary having a perfect knowledge of P). The relevant point is that since the probability of these bad events is extremely low for sufficiently small , the behaviors P and $\tilde{P}$ are, as shown below, close in variation distance, and analyzing the security using $\tilde{P}$ instead of P thus yields the same result up to vanishing error terms. (See [14] for a more detailed discussion.)

Lemma 3. There exists a behavior ${\tilde{P}}_{A| X}$ such that ${\tilde{P}}_{{AX}}={\tilde{P}}_{A| X}\times {{\rm{\Pi }}}_{X}$ and ${P}_{{AX}}={P}_{A| X}\times {{\rm{\Pi }}}_{X}$ are $\epsilon$ -close in variation distance, i.e.,

$\begin{eqnarray}&&d({\tilde{P}}_{{AX}},{P}_{{AX}})=\displaystyle \frac{1}{2}\displaystyle \sum _{\vec{a},\vec{x}}| \tilde{P}(\vec{a},\vec{x})-P(\vec{a},\vec{x})| \leqslant \epsilon ,\end{eqnarray} \tag{ 11 }$

and such that for any $\vec{a}\ne \perp$

$\begin{eqnarray}&&-{\mathrm{log}}_{2}\tilde{P}(\vec{a}| \vec{x})\geqslant {nH}({ \mathcal V })-\nu (\vec{x})\eta .\end{eqnarray} \tag{ 12 }$

Proof of lemma 3. The proof of this lemma is analogous to that of lemma 3 in [14]. Define ${\tilde{P}}_{A| X}$ as

$\begin{eqnarray}\tilde{P}(\vec{a}| \vec{x})=\left\{\begin{array}{ll}P(\vec{a}| \vec{x}) & \mathrm{if}\ (\vec{a},\vec{x})\in V,\\ 0 & \mathrm{if}\ (\vec{a},\vec{x})\notin V\ \mathrm{and}\ \vec{a}\ne \perp ,\\ {\displaystyle \sum }_{\vec{a}:(\vec{a},\vec{x})\notin V}P(\vec{a}| \vec{x}) & \mathrm{if}\ \vec{a}=\perp .\end{array}\right.\end{eqnarray} \tag{ 13 }$

Equation (11) follows immediately, and lemma 2 implies equation (12). □

We can now put a bound on the randomness of the Bell device as follows. Let λ denote the event that ${nH}({ \mathcal V })-\nu (\vec{x})\eta$ is greater than or equal to some a priori fixed threshold ${H}_{\mathrm{thr}}$ . Conditioned on λ occurring, we can bound the conditional min-entropy of the outputs given the inputs, ${H}_{\min }(A| X;\lambda )=-{\mathrm{log}}_{2}{\sum }_{\vec{x}}\tilde{P}(\vec{x}| \lambda ){\max }_{\vec{a}}\tilde{P}(\vec{a}| \vec{x};\lambda )$ , as follows (see [24] for a more detailed discussion of the concept of min-entropy and its relevance in our context):

$\begin{eqnarray}&&{H}_{\min }(A| X;\lambda )=-{\mathrm{log}}_{2}\displaystyle \sum _{\vec{x}}\tilde{P}(\vec{x}| \lambda )\mathop{\max }\limits_{}\tilde{P}(\vec{a}| \vec{x};\lambda )\end{eqnarray} \tag{ 14 }$

$\begin{eqnarray}&&=-{\mathrm{log}}_{2}\displaystyle \sum _{\vec{x}}\displaystyle \frac{\tilde{P}(\vec{x}| \lambda )}{\tilde{P}(\lambda | \vec{x})}\mathop{\max }\limits_{}\tilde{P}(\vec{a}| \vec{x})\,\end{eqnarray} \tag{ 15 }$

$\begin{eqnarray}&&\geqslant \,-{\mathrm{log}}_{2}\displaystyle \sum _{\vec{x}}\displaystyle \frac{\tilde{P}(\vec{x})}{\tilde{P}(\lambda )}{2}^{-{H}_{\mathrm{thr}}}\,\end{eqnarray} \tag{ 16 }$

$\begin{eqnarray}&&\geqslant \,{H}_{\mathrm{thr}}-{\mathrm{log}}_{2}\displaystyle \frac{1}{\tilde{P}(\lambda )}.\end{eqnarray} \tag{ 17 }$

In the second line we defined ${{\rm{\Lambda }}}_{\vec{x}}$ as the set of $\vec{a}$ 's such that the event λ occurs given $\vec{x}$ , and in the third line we used equation (12) and the fact that ${nH}({ \mathcal V })-\nu (\vec{x})\eta \geqslant {H}_{\mathrm{thr}}$ by the definition of λ. Comparing $\tilde{P}(\lambda )$ to some positive $\epsilon ^{\prime}$ directly implies the following result:

Theorem 1. Let $\epsilon$ and $\epsilon ^{\prime}$ be two positive parameters, let ${H}_{\mathrm{thr}}$ be some threshold, and let $\lambda$ be the event that ${nH}({ \mathcal V })-\nu (\vec{x})\eta \geqslant {H}_{\mathrm{thr}}$ , where ${ \mathcal V }$ is a $1-\epsilon$ confidence region according to definition 2. Then the behavior ${P}_{{AX}}$ is $\epsilon$ -close to a behavior ${\tilde{P}}_{{AX}}$ such that, according to ${\tilde{P}}_{{AX}}$ ,

1.
either $\Pr (\lambda )\leqslant \epsilon ^{\prime}$ ,
2.
or ${H}_{\min }(A| X;\lambda )\geqslant {H}_{\mathrm{thr}}-{\mathrm{log}}_{2}\tfrac{1}{\epsilon ^{\prime} }$ .

□

The meaning of this result is as follows. Suppose that we are able to compute a RB function according to definition 1 and, from the results $(\vec{a},\vec{x})$ of n rounds of measurements, a 1 − confidence region according to definition 2. We may thus compute the value of ${nH}({ \mathcal V })$ and check whether it is above the chosen threshold ${H}_{\mathrm{thr}}$ , i.e., whether the event λ occurred.

The given physical device that we used to generate the results $(\vec{a},\vec{x})$ is characterized by an unknown behavior P. The theorem indirectly characterizes the behavior P, by showing the existence of an -close behavior $\tilde{P}$ , where can be chosen arbitrarily small. The probability difference between the two distributions is thus at most for any event, and P and $\tilde{P}$ are almost indistinguishable. The theorem states that, assuming that the event λ occurs, the behavior $\tilde{P}$ is one of two possible kinds.

The first possibility if the event λ is observed is that the conditional min-entropy of $\tilde{P}$ is higher than ${H}_{\mathrm{thr}}-{\mathrm{log}}_{2}\tfrac{1}{\epsilon ^{\prime} }$ . This implies that $\tilde{P}$ contains extractable randomness: one can use a randomness extractor to process the raw outputs $\vec{a}$ and obtain a final string of bits, which is close to uniformly random according to $\tilde{P}$ and whose size is essentially ${H}_{\mathrm{thr}}-{\mathrm{log}}_{2}\tfrac{1}{\epsilon ^{\prime} }$ (the length and randomness of the output string will also depend on a security parameter _ext of the extractor itself) [25] . Since P is -close to $\tilde{P}$ , it follows that the output string will also be essentially uniformly random according to the actual behavior P of the device (see section III.D of [14] for details).

The second possibility is that the event λ occurred while being very unlikely: according to $\tilde{P}$ , $\Pr (\lambda )\leqslant \epsilon ^{\prime}$ , and thus, according to P, $\Pr (\lambda )\leqslant \epsilon ^{\prime} +\epsilon$ , where $\epsilon ^{\prime}$ can be chosen arbitrarily small. In this case there is no guaranteed lower bound on the conditional min-entropy. We cannot, of course, avoid such a possibility. For instance, a Bell device that simply outputs predetermined bits, which have been chosen uniformly at random by an adversary, will have zero conditional min-entropy, but may still pass any statistical test we can devise with some positive probability. Nevertheless, in this case, since λ is unlikely, the impact on the security of the protocol of (mistakenly) assuming that the conditional min-entropy bound of the theorem holds, will be negligible. We refer to section III.D of [14] for more details on how theorem 1 translates to a secure RNG protocol.

Note that more generally, one can use a sequence of thresholds ${H}_{0}\lt {H}_{1}\,\lt \cdots \lt \,{H}_{{\ell }}$ , rather than a single threshold. Theorem 1 then becomes a set of individual statements, regarding events λ_i where ${nH}({ \mathcal V })-\nu (\vec{x})\eta \in [{H}_{i},{H}_{i+1}[$ for $i=0,\,\ldots ,\,{\ell }-1$ . This means that the protocol admits intermediate thresholds of success leading to increasingly better min-entropy bounds, rather than being a single-threshold, all-or-nothing protocol.

4. Estimation

In this section we explicitly illustrate how to construct a confidence region, according to definition 2, using a straightforward estimator for the Bell expectations ${\bf{f}}[p]$ and applying the Azuma–Hoeffding inequality, as proposed in [3]. Note that it is possible to use other (tighter) concentration inequalities than the Azuma–Hoeffding inequality. In particular, we do not claim our specific choice to be optimal for a finite number of rounds n.

Let $(\vec{a},\vec{x})$ be the output–input sequence obtained in a certain realization of the n-round protocol. We define the observed frequencies as an estimation of the average behavior of the device based on the observed data,

$\begin{eqnarray}&&\hat{p}(a| x)=\displaystyle \frac{\#(a,x)}{n\pi (x)},\end{eqnarray} \tag{ 18 }$

where #(a, x) is the number of occurrences of the output–input pair (a,x) in the n rounds. As with probabilities, we refer to the full set of observed frequencies $(\hat{p}(a| x))$ as a vector $\hat{p}$ .

We define resulting estimators for the Bell expressions by substituting $\hat{p}$ for p in (1):

$\begin{eqnarray}&&{\bf{f}}[\hat{p}]=\displaystyle \sum _{a,x}{\bf{f}}(a,x)\hat{p}(a| x)=\displaystyle \frac{1}{n}\displaystyle \sum _{i=1}^{n}\displaystyle \frac{{\bf{f}}({a}_{i},{x}_{i})}{\pi ({x}_{i})}.\end{eqnarray} \tag{ 19 }$

To ease the notation, in the following we sometimes write $\hat{{\bf{f}}}$ instead of ${\bf{f}}[\hat{p}]$ . It should be kept in mind that $\hat{p}$ and $\hat{{\bf{f}}}$ are random variables, being functions of the observed event $(\vec{a},\vec{x})$ .

As shown in [3], a simple application of the Azuma–Hoeffding inequality yields the following result:

Lemma 4. For any $\alpha =1,\,\ldots ,\,t$ , let ${\epsilon }_{\alpha }^{\pm }\gt 0$ and let

$\begin{eqnarray}&&{\mu }_{\alpha }^{\pm }={\gamma }_{\alpha }\sqrt{\displaystyle \frac{2}{n}\mathrm{ln}\displaystyle \frac{1}{{\epsilon }_{\alpha }^{\pm }}},\end{eqnarray} \tag{ 20 }$

where

$\begin{eqnarray}&&{\gamma }_{\alpha }\geqslant \mathop{\max }\limits_{p\in { \mathcal Q }}\,\mathop{\max }\limits_{a,x}\left|\displaystyle \frac{{f}_{\alpha }(a,x)}{\pi (x)}-{f}_{\alpha }[p]\right|.\end{eqnarray} \tag{ 21 }$

Then

$\begin{eqnarray}&&\Pr \left[\displaystyle \frac{1}{n}\displaystyle \sum _{j=1}^{n}{f}_{\alpha }[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}]\leqslant {\hat{f}}_{\alpha }+{\mu }_{\alpha }^{+}\right]\geqslant 1-{\epsilon }_{\alpha }^{+}\end{eqnarray} \tag{ 22 }$

and

$\begin{eqnarray}&&\Pr \left[\displaystyle \frac{1}{n}\displaystyle \sum _{j=1}^{n}{f}_{\alpha }[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}]\geqslant {\hat{f}}_{\alpha }-{\mu }_{\alpha }^{-}\right]\geqslant 1-{\epsilon }_{\alpha }^{-}.\end{eqnarray} \tag{ 23 }$

Lemma 4 simply states that with high probability the n-round average $\tfrac{1}{n}{\sum }_{j=1}^{n}{f}_{\alpha }[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}]$ , conditioned on the past, is no greater (no smaller) than the observed value ${\hat{f}}_{\alpha }$ plus (minus) some deviation ${\mu }_{\alpha }^{+}$ ( ${\mu }_{\alpha }^{-}$ ). This deviation tends to zero as $1/\sqrt{n}$ and directly depends on the quantity γ_α, which represents an upper bound on the maximum possible value of $| {f}_{\alpha }(a,x)/\pi (x)-{f}_{\alpha }[p]|$ , that is to say, the maximal extent to which the random variable f_α(a, x)/π(x) can differ from its expectation f_α[p]. In other words, ${\gamma }_{\alpha }$ bounds the possible statistical fluctuations which our observations can be subject to. A specific value for γ_α is given by

$\begin{eqnarray}&&{\tilde{\gamma }}_{\alpha }=\max \left\{\mathop{\max }\limits_{a,x}\displaystyle \frac{{f}_{\alpha }(a,x)}{\pi (x)}-\mathop{\min }\limits_{p\in { \mathcal Q }}{f}_{\alpha }[p],\,\mathop{\max }\limits_{p\in { \mathcal Q }}{f}_{\alpha }[p]-\mathop{\min }\limits_{a,x}\displaystyle \frac{{f}_{\alpha }(a,x)}{\pi (x)}\right\}.\end{eqnarray} \tag{ 24 }$

The terms ${\max }_{a,x}\tfrac{{f}_{\alpha }(a,x)}{\pi (x)}$ and ${\min }_{a,x}\tfrac{{f}_{\alpha }(a,x)}{\pi (x)}$ are easy to calculate, while the terms ${\max }_{p\in { \mathcal Q }}{f}_{\alpha }[p]$ and ${\min }_{p\in { \mathcal Q }}{f}_{\alpha }[p]$ can be computed through SDP using a NPA relaxation [23].

We can combine the above upper and lower bounds for all α through a union bound to get the following confidence region:

Lemma 5. Given ${\epsilon }_{\alpha }^{\pm }\geqslant 0$ for $\alpha =1,\,\ldots ,\,t$ , let ${\hat{{\bf{f}}}}^{\pm }$ be the vector $({\hat{f}}_{1}^{\pm },\,\ldots ,\,{\hat{f}}_{t}^{\pm })$ , where

$\begin{eqnarray}{\hat{f}}_{\alpha }^{\pm }=\left\{\begin{array}{ll}{\hat{f}}_{\alpha }\pm {\gamma }_{\alpha }\sqrt{\tfrac{2}{n}\mathrm{ln}\tfrac{1}{{\epsilon }_{\alpha }^{\pm }}} & \mathrm{if}\ {\epsilon }_{\alpha }^{\pm }\gt 0,\\ \pm \infty & \mathrm{if}\ {\epsilon }_{\alpha }^{\pm }=0,\end{array}\right.\end{eqnarray} \tag{ 25 }$

with ${\hat{f}}_{\alpha }$ as defined in equation (19) and ${\gamma }_{\alpha }$ as defined in equation (21).

Let the confidence region

$\begin{eqnarray}&&{ \mathcal V }=[{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]=\{{\bf{f}}\in {{\mathbb{R}}}^{t}\,:\,{\hat{{\bf{f}}}}^{-}\leqslant {\bf{f}}\leqslant {\hat{{\bf{f}}}}^{+}\}.\end{eqnarray} \tag{ 26 }$

Then

$\begin{eqnarray}&&\Pr \left[\displaystyle \frac{1}{n}\displaystyle \sum _{j=1}^{n}{\bf{f}}[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}]\in [{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]\right]\geqslant 1-\epsilon ,\end{eqnarray} \tag{ 27 }$

where $\epsilon ={\sum }_{\alpha =1}^{t}({\epsilon }_{\alpha }^{+}+{\epsilon }_{\alpha }^{-})$ .□

In equation (26) the inequalities ${\hat{{\bf{f}}}}^{-}\leqslant {\bf{f}}\leqslant {\hat{{\bf{f}}}}^{+}$ —as all other vector inequalities in this paper—should be understood to hold component-wise, i.e., ${\hat{f}}_{\alpha }^{-}\leqslant {f}_{\alpha }\leqslant {\hat{f}}_{\alpha }^{+}$ for all α.

Note that when ${\epsilon }_{\alpha }^{+}=0$ (or ${\epsilon }_{\alpha }^{-}=0$ ), we are simply not putting any bound on $\tfrac{1}{n}{\sum }_{j=1}^{n}{\bf{f}}[{p}_{{\vec{a}}_{j-1},{\vec{x}}_{j-1}}]$ from above (or below). Indeed, it is not always useful to bound a Bell expression from both directions. Consider, for instance, the CHSH expression. It is well-known that the amount of certifiable randomness increases with the absolute value of the CHSH violation, increasing from 2 (the maximal local value) to $2\sqrt{2}$ (the maximal quantum value) and from −2 (the minimal local value) to $-2\sqrt{2}$ (the minimal quantum value). If we are estimating the randomness produced by our Bell device based only on the CHSH expression ${f}_{\mathrm{chsh}}$ , and strongly expect the CHSH expectation to be in the region $[2,2\sqrt{2}]$ , then it is certainly desirable to lower-bound it as accurately as possible. However, we have no interest in knowing that it is smaller than some value (since the randomness which can be certified is only affected by the lower bound in the region). For a given $\epsilon ={\epsilon }_{\mathrm{chsh}}^{+}+{\epsilon }_{\mathrm{chsh}}^{-}$ , we are therefore interested in setting ${\epsilon }_{\mathrm{chsh}}^{+}=0$ , so that ${\epsilon }_{\mathrm{chsh}}^{-}$ is as large as possible, and thus ${\hat{f}}_{\mathrm{chsh}}^{-}$ is as close as possible to ${\hat{f}}_{\mathrm{chsh}}$ . However, if we have no a priori reason to expect the CHSH expression to lie in one region or the other, ${\epsilon }_{\mathrm{chsh}}^{\pm }=\epsilon /2$ is the most natural choice.

5. Bounding single-round randomness

In section 3 we showed how to put a bound on the randomness produced by a Bell device which is used n times in succession, given a RB function H. We now discuss how we can explicitly compute such a function.

The function H is defined through two properties, as specified in definition 1. The first one is the condition that $-{\mathrm{log}}_{2}p(a| x)\geqslant H({\bf{f}}[p])$ for all $a\in { \mathcal A }$ , all $x\in {{ \mathcal X }}_{r}$ , and all $p\in { \mathcal Q }$ . The optimal function satisfying this first condition is simply given by

$\begin{eqnarray}\tilde{H}({\bf{f}}[p]) & = & \mathop{\min }\limits_{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}\mathop{\min }\limits_{p^{\prime} }\ -{\mathrm{log}}_{2}\,p^{\prime} (a| x)\\ & & \,\mathrm{subject}\ \mathrm{to}\qquad {\bf{f}}[p^{\prime} ]={\bf{f}}[p],\quad p^{\prime} \in { \mathcal Q }.\end{eqnarray} \tag{ 28 }$

Alternatively, we can pass the $-{\mathrm{log}}_{2}$ to the left of the minimizations, which then become maximizations, and we can thus write $\tilde{H}({\bf{f}}[p])=-{\mathrm{log}}_{2}\tilde{G}({\bf{f}}[p])$ , where

$\begin{eqnarray}\tilde{G}({\bf{f}}[p]) & = & \mathop{\max }\limits_{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}\mathop{\max }\limits_{p^{\prime} }\ p^{\prime} (a| x)\\ & & \mathrm{subject}\ \mathrm{to}\,{\bf{f}}[p^{\prime} ]={\bf{f}}[p],\quad p^{\prime} \in { \mathcal Q }.\end{eqnarray} \tag{ 29 }$

The functions $\tilde{H}$ and $\tilde{G}$ defined in this way have an intuitive interpretation. For a fixed behavior p and a fixed input $x,\tilde{H}={\min }_{a\in { \mathcal A }}(-{\mathrm{log}}_{2}p(a| x))$ is simply the min-entropy of the distribution $\{p(a| x)\}{}_{a\in { \mathcal A }}$ , while $\tilde{G}={2}^{-\tilde{H}}={\max }_{a\in { \mathcal A }}p(a| x)$ is the associated guessing probability, i.e., the optimal probability to correctly guess the output a given that we know that it is drawn from the distribution $\{p(a| x)\}{}_{a\in { \mathcal A }}$ . Both these quantities represent measures of the output randomness. However, we are generally interested in bounding the output randomness not only for a single input x, but simultaneously for a subset ${{ \mathcal X }}_{r}$ of all the inputs. In addition, we assume in the DI spirit that the full behavior p of our Bell device is generally not known, and that the device is characterized only by the Bell expectations ${\bf{f}}[p]$ . Taking the worst case of $\tilde{H}$ and $\tilde{G}$ over all inputs $x\in {{ \mathcal X }}_{r}$ and all quantum behaviors p compatible with the Bell expectations ${\bf{f}}[p]$ leads to (28) and (29).

The second requirement in definition 1 is that H should be a convex function. This property is used in lemma 1 to bound the randomness produced from n successive measurement rounds. However, the function defined by (28) is not necessarily convex. For fixed values of $a\in { \mathcal A }$ and $x\in {{ \mathcal X }}_{r}$ , let us denote ${\tilde{H}}_{a,x}({\bf{f}}[p])$ the function defined by the interior minimization, i.e., the minimum over $p^{\prime} \in { \mathcal Q }$ of $-{\mathrm{log}}_{2}p^{\prime} (a| x)$ subject to the constraint that ${\bf{f}}[p^{\prime} ]={\bf{f}}[p]$ . This is a convex minimization program and thus the functions ${\tilde{H}}_{a,x}({\bf{f}}[p])$ are all convex. However, $\tilde{H}$ is obtained by taking the point-wise minimum $\tilde{H}({\bf{f}}[p])={\min }_{a,x}{\tilde{H}}_{a,x}({\bf{f}}[p])$ of these functions, which will generally not be convex (see [20] for a specific example where this happens). Similarly, the individual functions ${\tilde{G}}_{a,x}$ defined by the interior maximization in (29) are concave, but $\tilde{G}$ will generally not be.

In order to obtain a convex function, we could simply define a function H^* as the convex envelope of the functions ${\tilde{H}}_{a,x}$ (which is also the convex hull of (28)):

$\begin{eqnarray}{H}^{* }({\bf{f}}[p])=\mathop{\min }\limits_{\{{q}_{a,x},{p}_{a,x}\}{}_{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}} & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}{q}_{a,x}{\tilde{H}}_{a,x}({\bf{f}}[{p}_{a,x}])\,\\ \mathrm{subject}\ \mathrm{to} & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}{q}_{a,x}=1,\\ & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}{q}_{a,x}{\bf{f}}[{p}_{a,x}]={\bf{f}}[p].\\ & & \quad {q}_{a,x}\geqslant 0,\ {p}_{a,x}\in { \mathcal Q }\quad \forall a\in { \mathcal A },\forall x\in {{ \mathcal X }}_{r}\end{eqnarray} \tag{ 30 }$

Similarly, the concave hull of (29) is

$\begin{eqnarray}G({\bf{f}}[p])=\mathop{\max }\limits_{{\{{q}_{a,x},{p}_{a,x}\}}_{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}} & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}{q}_{a,x}{\tilde{G}}_{a,x}({\bf{f}}[{p}_{a,x}])\\ \mathrm{subject}\ \mathrm{to} & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}{q}_{a,x}=1,\\ & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}{q}_{a,x}{\bf{f}}[{p}_{a,x}]={\bf{f}}[p].\\ & & \quad {q}_{a,x}\geqslant 0,\ {p}_{a,x}\in { \mathcal Q }\quad \forall a\in { \mathcal A },\forall x\in {{ \mathcal X }}_{r}\end{eqnarray} \tag{ 31 }$

Note that it is not true any more that ${H}^{* }=-{\mathrm{log}}_{2}G$ , but it is easy to see that ${H}^{* }\geqslant H=-{\mathrm{log}}_{2}G$ .

Though the function H^* defined through (30) is the tightest function satisfying the constraint of definition 1, it is not easy to deal with numerically because of the presence of the logarithms in the definitions of ${\tilde{H}}_{a,x}$ . We will thus instead use the lower-bound $H=-{\mathrm{log}}_{2}G$ , which obviously satisfies the first condition of definition 1 (since ${H}^{* }\geqslant H$ ) as well as the second one (since G is concave and nonnegative, $H=-{\mathrm{log}}_{2}G$ is convex). The interest is that the optimization problem (31) is simpler to evaluate than (30). Note first that (31) can be re-expressed as follows by absorbing the weights q_{a, x} in the unnormalized quantum behaviors ${\tilde{p}}_{a,x}={q}_{a,x}{p}_{a,x}$ :

$\begin{eqnarray}G({\bf{f}}[p]) & = & \mathop{\max }\limits_{{\{{\tilde{p}}_{a,x}\}}_{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}}\quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}{\tilde{p}}_{a,x}(a| x)\\ & & \mathrm{subject}\ \mathrm{to}\quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}\mathrm{Tr}[{\tilde{p}}_{a,x}]=1,\\ & & \,\displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}\,{\bf{f}}[{\tilde{p}}_{a,x}]={\bf{f}}[p],\\ & & \,{\tilde{p}}_{a,x}\in \tilde{{ \mathcal Q }}\quad \forall a\in { \mathcal A },\forall x\in {{ \mathcal X }}_{r}.\end{eqnarray} \tag{ 32 }$

In the above formulation, $\tilde{{ \mathcal Q }}$ denotes the set of unnormalized quantum behaviors, the conditions q_{a, x} ≥ 0 and ${p}_{a,x}\in { \mathcal Q }$ are equivalent to the single condition ${\tilde{p}}_{\lambda }\in \tilde{{ \mathcal Q }}$ , and the condition ${\sum }_{a,x}{q}_{a,x}=1$ becomes ${\sum }_{a,x}\mathrm{Tr}[{\tilde{p}}_{a,x}]=1$ where $\mathrm{Tr}[p]={\sum }_{a}p(a| x)$ is the norm of p (the expression $\mathrm{Tr}[p]$ is independent of the choice of x, and it is equal to 1 for normalized behaviors). Problem (32) cannot be solved in general since the set $\tilde{{ \mathcal Q }}$ is hard to characterize, but it can be replaced with one of its NPA relaxations, in which case it becomes a SDP (since apart from the condition ${\tilde{p}}_{a,x}\in \tilde{{ \mathcal Q }}$ all constraints and the objective function are linear). This will in general only yield an upper bound on the optimal value G (and thus a lower bound on H^*), but this is entirely sufficient for our purpose.

In the case where the set ${{ \mathcal X }}_{r}$ contains a single input x, the optimization problem (32) is essentially identical to the one introduced in [19, 20] and corresponds to maximizing an adversary's average guessing probability over all possible quantum strategies (the difference with [19, 20] is that we characterize the devices through an arbitrary number of Bell expectations ${\bf{f}}[p]$ , rather than a single Bell expression or the full set of probabilities $p(a| x)$ ). The general form (32), however, also applies to the case where ${{ \mathcal X }}_{r}$ contains more than one input and represents one possible way to characterize the randomness of a subset of inputs (other suggestions have been made in [20]; the main reason for the present choice is that it satisfies the mathematical properties that are needed in our n-round analysis). In the following, we refer to the function G given by (32) as the guessing probability of the behavior characterized by ${\bf{f}}[p]$ .

To apply our n-round analysis, we actually do not need to compute the value $H({\bf{f}}[p])=-{\mathrm{log}}_{2}G({\bf{f}}[p])$ for a fixed value ${\bf{f}}[p]$ , but instead its worst-case bound over all quantum behaviors $p\in { \mathcal Q }$ for which ${\bf{f}}[p]\in { \mathcal V }$ . If the confidence region ${ \mathcal V }$ is defined as an interval $[{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]$ , as in the preceding section, this can simply be cast as the following optimization problem:

$\begin{eqnarray}G([{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}])=\mathop{\max }\limits_{{\{{\tilde{p}}_{a,x}\}}_{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}} & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}{\tilde{p}}_{a,x}(a| x)\\ \mathrm{subject}\ \mathrm{to} & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}\mathrm{Tr}[{\tilde{p}}_{a,x}]=1,\\ & & \quad {\hat{{\bf{f}}}}^{-}\leqslant \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}\,{\bf{f}}[{\tilde{p}}_{a,x}]\leqslant {\hat{{\bf{f}}}}^{+},\\ & & \quad {\tilde{p}}_{a,x}\in \tilde{{ \mathcal Q }}\quad \forall a\in { \mathcal A },\forall x\in {{ \mathcal X }}_{r}.\end{eqnarray} \tag{ 33 }$

Again this problem admits a SDP relaxation through the NPA hierarchy. When $[{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]\cap {\bf{f}}[{ \mathcal Q }]=\varnothing$ , the optimization problem is infeasible. In accordance with definition 1, in that case we let $G([{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}])=1$ or $H([{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}])=0$ .

We conclude this discussion by noting that in specific cases such as that of [3], where ${\bf{f}}$ is a single CHSH expression, the symmetries under relabelings of inputs and outputs imply that the formulations (28)–(31) are equivalent, since (28) is already convex. In such cases, our RB function is the tightest function that satisfies definition 1, by virtue of (28) being the tightest function that satisfies condition 1 of the definition.

In the appendix, we provide more intuition about the above problems by considering their dual formulations. We also discuss in more detail their link with [19, 20].

6. Summary of the protocol

In the two preceding sections, we have specified a way of bounding the randomness within a (1 − ) confidence region ${ \mathcal V }=[{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]$ around the observed statistic ${\bf{f}}[\hat{p}]$ . We can thus apply theorem 1 to bound the min-entropy of the output string obtained after n uses of the device. Processing this raw string with a suitable extractor finally leads to a uniformly random and private string. The resulting protocol is summarized in figure 1.

**Figure 1.** DIRNG protocol following from theorem 1 (section 3) and from the results of sections 4 and 5.
Download figure:
Standard image High-resolution image

As we noted in section 3, one can define a similar protocol based on a sequence of thresholds ${H}_{0}\lt {H}_{1}\,\lt \cdots \lt \,{H}_{{\ell }}$ rather than a single one, introducing intermediate levels of success in the protocol. One advantage of this is that we do not need to determine what threshold we expect the device to reach and risk failing the protocol with high probability if we overestimated ${H}_{\mathrm{thr}}$ . See section III.D of [14] for details.

7. Discussion

We have introduced a family of protocols, each characterized by a choice of t Bell expressions f_α, a randomness-generating input set ${{ \mathcal X }}_{r}$ , and an input distribution π(x). This family contains as a special case the protocols introduced in [3, 14, 15], which correspond to the case where a single Bell expression f is used (t = 1) and where the RB function covers all inputs ( ${{ \mathcal X }}_{r}={ \mathcal X }$ ). The main novelty introduced in the present work is that we can take into account information from more Bell expressions $(t\geqslant 1)$ and can tailor the randomness analysis to a subset of all possible inputs ( ${{ \mathcal X }}_{r}\subseteq { \mathcal X }$ ).

In order to discuss these new aspects, in the following sections we illustrate our protocol on a concrete example. The scenario in this example has two parties (k = 2), two measurement settings per party ( ${ \mathcal X }=\{0,1\}{}^{2}$ ), and two outcome possibilities per measurement ( ${ \mathcal A }=\{0,1\}{}^{2}$ ). We consider a device behavior

$\begin{eqnarray}&&p={{vp}}_{\mathrm{ext}}+(1-v)u\end{eqnarray} \tag{ 34 }$

for v = 0.99, arising from a mixture of white noise u and the extremal quantum behavior p_ext that achieves maximal violation of the ${I}_{1}^{\beta }$ tilted-CHSH inequality as introduced in [21], with $\beta =2\cos (2\theta )/\sqrt{1+{\sin }^{2}(2\theta )}$ for θ = π/8. The tilted-CHSH expression is defined as

$\begin{eqnarray}&&{I}_{1}^{\beta }=\beta \langle {A}_{0}\rangle +\langle {A}_{0}{B}_{0}\rangle +\langle {A}_{0}{B}_{1}\rangle +\langle {A}_{1}{B}_{0}\rangle -\langle {A}_{1}{B}_{1}\rangle .\end{eqnarray} \tag{ 35 }$

The extremal behavior can be achieved by a pair of partially entangled qubits $| \phi \rangle =\cos \theta | 00\rangle +\sin \theta | 11\rangle$ measured with observables

$\begin{eqnarray}&&{A}_{0}={\sigma }_{x},\qquad \qquad \qquad {A}_{1}={\sigma }_{z},\end{eqnarray} \tag{ 36a }$

$\begin{eqnarray}&&{B}_{0}=\cos \mu \,{\sigma }_{z}+\sin \mu \,{\sigma }_{x},\qquad {B}_{1}=\cos \mu \,{\sigma }_{z}-\sin \mu \,{\sigma }_{x},\end{eqnarray} \tag{ 36b }$

with $\tan \mu =\sin 2\theta$ . (Note the difference in notation from [21]: we relabeled the inputs 1 and 2 to 0 and 1, respectively.)

The resulting correlations have the property of giving more predictable outcomes for a subset of measurement inputs. For θ = π/8, the two measurement settings that give more predictable outcomes, x = (0, 0) and x = (0, 1), have a guessing probability of about 0.775 in the ideal (v = 1) case where ${I}_{1}^{\beta }$ is maximally violated. On the other hand, the two measurement settings with less predictable outcomes, x = (1, 0) and x = (1, 1), have guessing probabilities of about 0.496.

In the analysis of randomness, we will consider two choices for the randomness generating subset ${{ \mathcal X }}_{r}$ : the full input set ${{ \mathcal X }}_{r}={ \mathcal X }$ and the more restricted choice ${{ \mathcal X }}_{r}=\{(1,0)\}$ , which is one of the two settings that give less predictable measurements in p_ext.

Furthermore, we will estimate three different Bell expressions, all defined in terms of the following correlators:

$\begin{eqnarray}&&\langle {A}_{{{\rm{x}}}_{1}}\rangle =\displaystyle \sum _{{{\rm{a}}}_{1},{{\rm{a}}}_{2},{{\rm{x}}}_{2}\in \{0,1\}}{(-1)}^{{{\rm{a}}}_{1}}{\pi }_{2}({{\rm{x}}}_{2}| {{\rm{x}}}_{1})p({{\rm{a}}}_{1}{{\rm{a}}}_{2}| {{\rm{x}}}_{1}{{\rm{x}}}_{2})\qquad \qquad {{\rm{x}}}_{1}\in \{0,1\},\end{eqnarray} \tag{ 37a }$

$\begin{eqnarray}&&\langle {B}_{{{\rm{x}}}_{2}}\rangle =\displaystyle \sum _{{{\rm{a}}}_{1},{{\rm{a}}}_{2},{{\rm{x}}}_{1}\in \{0,1\}}{(-1)}^{{{\rm{a}}}_{2}}{\pi }_{1}({{\rm{x}}}_{1}| {{\rm{x}}}_{2})p({{\rm{a}}}_{1}{{\rm{a}}}_{2}| {{\rm{x}}}_{1}{{\rm{x}}}_{2})\qquad \qquad {{\rm{x}}}_{2}\in \{0,1\},\end{eqnarray} \tag{ 37b }$

$\begin{eqnarray}&&\langle {A}_{{{\rm{x}}}_{1}}{B}_{{{\rm{x}}}_{2}}\rangle =\displaystyle \sum _{{{\rm{a}}}_{1},{{\rm{a}}}_{2}\in \{0,1\}}{(-1)}^{{{\rm{a}}}_{1}+{{\rm{a}}}_{2}}p({{\rm{a}}}_{1}{{\rm{a}}}_{2}| {{\rm{x}}}_{1}{{\rm{x}}}_{2})\qquad \qquad {{\rm{x}}}_{1},{{\rm{x}}}_{2}\in \{0,1\}.\,\end{eqnarray} \tag{ 37c }$

The weights ${\pi }_{1}({{\rm{x}}}_{1}| {{\rm{x}}}_{2})$ and ${\pi }_{2}({{\rm{x}}}_{2}| {{\rm{x}}}_{1})$ represent the two conditional local input distributions defined with respect to the joint input distribution π(x₁x₂)¹ .

The expressions we will evaluate are the CHSH expression

$\begin{eqnarray}&&{I}_{\mathrm{chsh}}=\langle {A}_{0}{B}_{0}\rangle +\langle {A}_{0}{B}_{1}\rangle +\langle {A}_{1}{B}_{0}\rangle -\langle {A}_{1}{B}_{1}\rangle ,\end{eqnarray} \tag{ 38 }$

the tilted-CHSH expression ${I}_{1}^{\beta }$ (35) and the 'optimal' expressions for the chosen device behavior (34):

$\begin{eqnarray}{I}_{p} & = & 10.610-1.859\,\langle {A}_{0}\rangle -1.733\,\langle {A}_{1}\rangle +0.499\,\langle {B}_{0}\rangle -2.196\,\langle {B}_{1}\rangle \\ & & -3.109\,\langle {A}_{0}{B}_{0}\rangle -2.945\,\langle {A}_{0}{B}_{1}\rangle -2.610\,\langle {A}_{1}{B}_{0}\rangle +4.343\,\langle {A}_{1}{B}_{1}\rangle \end{eqnarray} \tag{ 39 }$

and

$\begin{eqnarray}{I}_{p}^{\mathrm{all}} & = & 3.131+0.126\,\langle {A}_{0}\rangle -0.428\,(\langle {B}_{0}\rangle +\langle {B}_{1}\rangle )\\ & & -0.673\,(\langle {A}_{0}{B}_{0}\rangle +\langle {A}_{0}{B}_{1}\rangle )-1.002\,(\langle {A}_{1}{B}_{0}\rangle -\langle {A}_{1}{B}_{1}\rangle ).\end{eqnarray} \tag{ 40 }$

These last two Bell expressions are 'optimal' Bell expressions in the following sense. As already observed in [19, 20], the dual of problem (32) (see equation (51) in the appendix), when applied to a device characterized by its full behavior (i.e., ${f}_{a,x}[p]=p(a| x)$ so that ${\bf{f}}[p]=p$ ), finds a Bell expression I_p such that the amount of randomness certified from I_p[p] with respect to the measurement setting x = (1, 0) is equal to the amount of randomness that can be certified from the entire table of probabilities $p(a| x)$ (again, with respect to the measurement x = (1, 0)). Thus, to each device behavior p is associated a single Bell expression I_p that is optimal for p from the point of view of randomness² . Likewise, ${I}_{p}^{\mathrm{all}}$ is defined with respect to all inputs $x\in { \mathcal X }$ rather than the subset {(1, 0)}.

7.1. Bounding randomness for all inputs with one Bell expression ( ${{ \mathcal X }}_{r}={ \mathcal X },t=1$ )

Before discussing the novelties introduced in this work, let us start by briefly reviewing the case t = 1 and ${{ \mathcal X }}_{r}={ \mathcal X }$ , which corresponds to the protocols introduced in [3, 14, 15]. In this case, $\nu (\vec{x})$ , the number of inputs not in ${{ \mathcal X }}_{r}$ , is always equal to zero, and according to theorem 1, the min-entropy of the output string is roughly equal to ${nH}({ \mathcal V })$ . Furthermore, the confidence region ${ \mathcal V }$ reduces to a confidence interval $[{\hat{f}}^{-},{\hat{f}}^{+}]$ around the estimated Bell violation $\hat{f}$ . Usually, the values of $\hat{f}$ that we expect to obtain in the protocol will fall in a region where $H(\hat{f})$ is either monotonically increasing or decreasing with $\hat{f}$ , i.e., the interval is within either the upward- or downward-sloped region of the convex function $H(\hat{f})$ . For instance, if f is the CHSH expression, we may assume that the devices have been designed so that with very high probability $\hat{f}\geqslant 2$ . In that region, $H(\hat{f})$ is indeed increasing with $\hat{f}$ (i.e., the randomness increases for increasing values of the CHSH expression). Let us assume for definiteness that H is increasing (the same kind of reasoning can be done if H is decreasing). Since we are looking for the minimal value of H in the region ${ \mathcal V }$ (see lemma 2), it is then sufficient, as done in [3, 14, 15], to take a one-sided interval $[{\hat{f}}^{-},\infty [$ , and the minimal value of H in the interval will then be $H({\hat{f}}^{-})$ . Considering again our CHSH example, we are interested in a guarantee that the CHSH value is above some threshold, which determines the randomness we can certify in the worst case, but it is useless to know that it is bounded from above (see also discussion at the end of section 4). Taking the definition equation (25) for ${\hat{f}}^{-}$ , we thus get that the min-entropy of the output string is bounded (roughly speaking³ , and up to the $-{\mathrm{log}}_{2}(1/\epsilon ^{\prime} )$ correction) as

$\begin{eqnarray}&&{H}_{\min }\gtrsim {nH}\left(\hat{f}-\gamma \sqrt{\displaystyle \frac{2}{n}\mathrm{ln}\displaystyle \frac{1}{\epsilon }}\right).\end{eqnarray} \tag{ 41 }$

This is precisely the result of [3, 14, 15], whose interpretation is quite intuitive: the min-entropy after n runs is equal to n times the min-entropy for a single run, evaluated on the observed Bell violation $\hat{f}$ offset by a statistical parameter $\mu =\gamma \sqrt{(2/n)\mathrm{ln}(1/\epsilon )}$ . This correction accounts for the fact that even if a device has been built such that it produces a target Bell violation, statistical fluctuations may push the observed violation above what is expected.

This statistical correction depends on the security parameter and decreases with the number of runs n. It also depends on the prefactor γ defined in equation (21). This prefactor depends on the choice of Bell expression f, and also importantly on the input distribution π(x).

As discussed in [3, 14], the input distribution can be suitably chosen to optimize the ratio R_out/R_in of the randomness that is produced to the randomness that is consumed when choosing the inputs. The idea is that if at each run one selects with very high probability a given input x = x*, then the resulting distribution π(x) can be sampled from a small number of initial uniform bits R_in, which should improve the ratio R_out/R_in. However, this will also lower R_out because observations involving the other inputs $x\ne {x}^{* }$ will be less frequent, which will reduce the statistical accuracy. Consider for instance, as in [3, 14], the case where the input x^* is chosen with probability $\pi ({x}^{* })=1-\kappa {n}^{-\delta }$ for some constants κ and δ, and the other inputs are chosen with probability $\pi (x)=\kappa ^{\prime} {n}^{-\delta }$ , where $\kappa ^{\prime} =\kappa /(| { \mathcal X }| -1)$ for normalization. Then the initial randomness R_in required to choose the inputs according to this distribution will be of size $O({n}^{1-\delta }\mathrm{ln}{n}^{\delta })$ (i.e., roughly n times the Shannon entropy of the input distribution, see theorem 2 in [14]). On the other hand, according to equation (41), the output randomness will be of size Ω(n) as long as the statistical correction, of order $\gamma /\sqrt{n}$ , remains bounded by a constant. Since, according to equation (21), $\gamma \simeq 1/({\min }_{x}\pi (x))$ we get that the statistical correction is of order $\gamma /\sqrt{n}=O({n}^{\delta -\tfrac{1}{2}})$ and thus that we should take $\delta \leqslant \tfrac{1}{2}$ . We can thus hope at best a quadratic expansion wherein $O({n}^{\tfrac{1}{2}}\mathrm{ln}{n}^{\tfrac{1}{2}})$ initial bits are consumed and Ω(n) are produced.

Note that the initial randomness for choosing the inputs only needs to be random with respect to the devices, but can be publicly announced to the adversary without compromising the privacy of the output string [14, 27]. One can thus view the above protocols as producing private randomness from public randomness. From this perspective, the 'expansion' efficiency of the protocol is less relevant since the final and initial randomness correspond to different resources that do not necessarily have to be compared on the same footing.

We generated random samples of n input–output pairs $\vec{a},\vec{x}$ from the behavior p corresponding to equation (34) with the following input distribution

$\begin{eqnarray}\pi (x)=\left\{\begin{array}{ll}1-\tfrac{3}{2}{n}^{-1/5}\quad & \mathrm{if}\ x=(1,0),\\ \tfrac{1}{2}{n}^{-1/5}\quad & \mathrm{otherwise}.\end{array}\right.\end{eqnarray} \tag{ 42 }$

Note that as n grows, the input distribution becomes strongly biased to select x = (1,0) most of the time.

We performed this sampling independently for different values of n between 100 and 3 × 10¹⁸. For each value of n, we repeated this sampling 300 times in order to show the variation of our result over several simulations.

The corresponding min-entropy rate bound (that is, (41) divided by n) for = 10⁻⁶ is represented in figure 2 as a function of the number of runs n for different Bell expressions. The curves in this plot and the ones that follow (figures 2–5) show the values for the first simulation out of the 300, and the range of values taken over all 300 simulations is drawn as a shaded area behind each curve. In some instances, usually for high values of n, the area is invisible, which indicates a negligible variation across simulation runs. All curves are obtained by solving the program (33) in its dual form (56) (see appendix) at level 2 of the NPA hierarchy. All optimizations were performed using the Matlab toolboxes Yalmip [28] and SeDuMi [29].

**Figure 2.** Bounds on min-entropy rate H_min/n (equation (43)) with ${{ \mathcal X }}_{r}={ \mathcal X }$ , for a varying number of measurement runs n and for different Bell expressions. Note that for ${{ \mathcal X }}_{r}={ \mathcal X }$ , the error term $\nu (\vec{x})\eta /n$ equals zero. The protocol is simulated for the device behavior p in equation (34) as explained in section 7.1. The dashed and dotted lines represent the min-entropy H(p) associated to the behavior p, respectively for ${{ \mathcal X }}_{r}=\{(1,0)\}$ and ${{ \mathcal X }}_{r}={ \mathcal X }$ . The dashed–dotted line represents the min-entropy $H({I}_{1}^{\beta }[p])$ for ${{ \mathcal X }}_{r}={ \mathcal X }$ given the expectation value of the tilted-CHSH expression ${I}_{1}^{\beta }$ .
Download figure:
Standard image High-resolution image

As we can see, the expression I₁^β gives the worst results. The reason for this is that the inequality is suited to the extremal behavior p_ext rather than the imperfect behavior we simulated (i.e., in equation (34), the case of perfect visibility v = 1 rather than v = 0.99). On the other hand, the expression ${I}_{p}^{\mathrm{all}}$ is tailored to our illustrative behavior (34); it thus gives asymptotically optimal results for ${{ \mathcal X }}_{r}={ \mathcal X }$ according to [19, 20]. There is, however, no reason for it to be optimal for finite, low values of n. Indeed, we observe that the CHSH expression, while not specially suited to the behavior of our device, yields a better performance for values of n lower than 10¹⁰. The CHSH expression appears actually to be a good randomness certificate for all values of n, as it only performs slightly worse than ${I}_{p}^{\mathrm{all}}$ asymptotically.

7.2. Bounding randomness for a subset of all inputs ( ${{ \mathcal X }}_{r}\subseteq { \mathcal X }$ )

Having reviewed the case t = 1 and ${{ \mathcal X }}_{r}={ \mathcal X }$ , we proceed to consider the modifications introduced in this work. We consider first the possibility ${{ \mathcal X }}_{r}\subset { \mathcal X }$ . This means that the RB function H is only required to non-trivially bound the output probability for inputs that are in the set ${{ \mathcal X }}_{r}$ . This is an important feature because for many Bell expressions the randomness that can be certified depends on the input used. For instance, maximal violation of the tilted-CHSH inequalities may imply that the randomness is maximal for one input pair but near zero for another input pair [21]. Using a function which is simultaneously randomness bounding for all inputs $x\in { \mathcal X }$ would then be highly sub-optimal in this case. This aspect is particularly important for photonic implementations of DI protocols: recent photonic Bell tests rely on partially entangled states [30–33], for which the optimal extraction of randomness requires the use of a specific input.

According to our analysis, in the case ${{ \mathcal X }}_{r}\subseteq { \mathcal X }$ , the bound (41) becomes

$\begin{eqnarray}&&{H}_{\min }\gtrsim {nH}\left(\hat{f}-\gamma \sqrt{\displaystyle \frac{2}{n}\mathrm{ln}\displaystyle \frac{1}{\epsilon }}\right)-\nu (\vec{x})\eta ,\end{eqnarray} \tag{ 43 }$

where H is now a RB function for ${{ \mathcal X }}_{r}$ , which will generally yield an improvement over a RB function that is required to be valid for all of ${ \mathcal X }$ . Our analysis, however, introduces a penalty term of the form $\nu (\vec{x})\eta$ , where $\eta \leqslant {\mathrm{log}}_{2}| { \mathcal A }|$ is bounded by a constant and $\nu (\vec{x})$ is the number of inputs not in ${{ \mathcal X }}_{r}$ that have been observed⁴ . To keep this penalty term as low as possible, we should choose inputs in ${{ \mathcal X }}_{\bar{r}}={ \mathcal X }\setminus {{ \mathcal X }}_{r}$ with a low probability. One possibility, compatible with our previous discussion about the introduction of a bias in the input distribution, is to take $\pi (x)=\kappa ^{\prime} {n}^{-\delta }$ for $x\in {{ \mathcal X }}_{\bar{r}}$ , in which case the expected value of $\nu (\vec{x})$ would be $| {{ \mathcal X }}_{\bar{r}}| \kappa ^{\prime} {n}^{1-\delta }$ . This is negligible asymptotically with respect to the main term of (43), which is Ω(n), provided that δ > 0. The input distribution (42) chosen for our numerical example satisfies this requirement.

The corresponding min-entropy rate bound (that is, (43) divided by n) for = 10⁻⁶ and $\eta ={\mathrm{log}}_{2}| { \mathcal A }| =2$ is represented in figure 3 as a function of the number of runs n, for different Bell expressions and for two choices of ${{ \mathcal X }}_{r}$ .

Figure 3 shows that in spite of the penalty term, which quickly vanishes as n grows, the bound on the entropy rate for ${{ \mathcal X }}_{r}=\{(1,0)\}$ for the expressions I₁^β and I_p (the analog of ${I}_{p}^{\mathrm{all}}$ for this restricted ${{ \mathcal X }}_{r}$ ) is significantly better than the values obtained with ${{ \mathcal X }}_{r}={ \mathcal X }$ . This clearly shows the value of using the restricted randomness-generating input set ${{ \mathcal X }}_{r}=\{(1,0)\}$ . In particular, by combining this with the use of the optimal expression I_p corresponding to behavior (34), one can asymptotically reach the theoretical value H(p) of the min-entropy (represented by the dashed line in figure 3). Furthermore, whereas using the ${I}_{1}^{\beta }$ expression with ${{ \mathcal X }}_{r}={ \mathcal X }$ yields worse values than using the CHSH expression, taking ${{ \mathcal X }}_{r}=\{(1,0)\}$ yields an asymptotic entropy rate for the ${I}_{1}^{\beta }$ expression which is higher than using the CHSH expression. As mentioned at the beginning of this subsection, this is because ${I}_{1}^{\beta }$ is not adapted to bound randomness independently of the input (i.e., ${{ \mathcal X }}_{r}={ \mathcal X }$ ) [21].

Similarly to the difference between the curves corresponding to ${I}_{p}^{\mathrm{all}}$ and ${I}_{1}^{\beta }$ for ${{ \mathcal X }}_{r}={ \mathcal X }$ , the difference between the asymptotic entropy rates reached by using the expressions I_p and ${I}_{1}^{\beta }$ for ${{ \mathcal X }}_{r}=\{(1,0)\}$ is caused by the imperfect visibility parameter v = 0.99 in the simulated behavior (34).

Making the right choice of Bell expression and input subset ${{ \mathcal X }}_{r}$ depends not only on the device, but also on the value of n. Indeed, while I_p is an optimal expression for certifying randomness in this specific device with respect to the input subset ${{ \mathcal X }}_{r}=\{(1,0)\}$ , this is only the case asymptotically. For small n, figure 3 suggests that the CHSH expression has a better resistance to statistical fluctuations than the other expressions we considered, regardless of ${{ \mathcal X }}_{r}$ .

Note that we did not attempt to optimize the choice of input distribution and it is possible that a different choice of π(x) would lead to better bounds in figure 3 for the two curves with ${{ \mathcal X }}_{r}=\{(1,0)\}$ .

7.3. Bounding randomness from several Bell expressions (t ≥ 1)

As we have seen, the right choice of a single Bell expression in the analysis of randomness is not straightforward, except for large values of n where I_p becomes optimal. In this regime, it would seem perfectly admissible to perform tests on the device before running the actual RNG protocol, in order to estimate p and use this information to find an 'optimal' Bell expression ${I}_{p^{\prime} }$ as described above, which can afterwards be used in the RNG protocol proper. However, there are disadvantages to this method.

Firstly, to find an expression ${I}_{p^{\prime} }$ that performs comparably to the optimal I_p for the device behavior p, we must know p to a sufficiently high accuracy. In a black box scenario where imperfections cannot be ruled out, this means that a significant number of measurements must be performed in order to evaluate the behavior to great precision. Since the Bell expression needs to be fixed in advance of the protocol, those evaluation rounds cannot be taken from measurement rounds of the protocol and must instead be thrown away. In addition, the behavior of the devices may vary in time, unlike our i.i.d. choice (34), due to drifts in the experimental set-up for example. In that case, one would need to periodically estimate p and rederive the corresponding optimal Bell expression on some subset of the measurement data that needs to be thrown away. Finding an expression ${I}_{p^{\prime} }$ also requires methods of inference of the behavior of the device from a finite sample: indeed, the estimated behavior (18) cannot be used directly to find a candidate ${I}_{\hat{p}}$ , as $\hat{p}$ almost always violates the no-signaling conditions. There exist different approaches to this inference (see for instance [20, 34–37]), so a nontrivial choice must be made.

Finally, even ignoring the problem of estimating the unknown behavior p, the associated data loss, or the drift of p over time, we saw in the previous section and in figure 3 that the choice of a Bell expression is not straightforward when considering different values of n. For example, the asymptotically optimal expression I_p as formulated in [19, 20] is generally not the best for low values of n. There is thus no general method to guide the choice of a Bell expression for a given n.

In order to avoid the above problems associated to the use of a single Bell expression⁵ , we now turn to the second element introduced in this work: the possibility to estimate the randomness from t > 1 Bell expressions, and in particular from the full set of observed frequencies of occurrence $\hat{p}=\{\hat{p}(a| x)\}$ as defined in equation (18).

When we have more than one Bell expression, the bound (43) generalizes to

$\begin{eqnarray}&&{H}_{\min }\gtrsim {nH}([{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}])-\nu (\vec{x})\eta ,\end{eqnarray} \tag{ 44 }$

where the one-dimensional interval $[{\hat{{\bf{f}}}}^{-},\infty ]$ has simply been replaced with the multidimensional region $[{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]$ . As before the limits of the region depend on the security parameter , the constants γ_α, and they become smaller with the number of runs n (see equations (25) and (26)).

Increasing the number of Bell expressions can have both beneficial and detrimental consequences. We can reach an understanding of this by considering the optimization problem (33) that defines $H([{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}])=-{\mathrm{log}}_{2}G([{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}])$ . This problem essentially evaluates the randomness of a certain quantum behavior p such that ${\hat{{\bf{f}}}}^{-}\leqslant {\bf{f}}[p]\leqslant {\hat{{\bf{f}}}}^{+}$ . Each vector component of this constraint defines two affine constraints, ${\hat{f}}_{\alpha }^{-}\leqslant {f}_{\alpha }[p]\leqslant {\hat{f}}_{\alpha }^{+}$ , restricting the set of values p can take in the optimization. From a geometrical point of view, for each $\alpha =1,\,\ldots ,\,t$ , this defines two parallel hyperplanes in the space of behaviors, delimiting a region between them which we call a slab. The full constraint ${\hat{{\bf{f}}}}^{-}\leqslant {\bf{f}}[p]\leqslant {\hat{{\bf{f}}}}^{+}$ defines a polytope in the space of behaviors which is the intersection of the t slabs.

The optimization (33) identifies the worst-case bound on randomness for quantum behaviors inside this constraint polytope. We would therefore like to restrict this region as much as possible given a value of the confidence parameter . We thus see that adding Bell expressions is generally beneficial, as it cuts the constraint polytope into a smaller volume. However, as stated in lemma 5, the confidence parameter in the protocol is shared between all 2t parameters ${\epsilon }_{\alpha }^{\pm }$ , as ${\sum }_{\alpha =1}^{t}({\epsilon }_{\alpha }^{+}+{\epsilon }_{\alpha }^{-})=\epsilon$ . A consequence of this is that the more Bell expressions we have, the smaller ${\epsilon }_{\alpha }^{\pm }$ are on average. Since smaller values of ${\epsilon }_{\alpha }^{\pm }$ give thicker slabs (see equation (25)), if we are distributing evenly across all ${\epsilon }_{\alpha }^{\pm }$ for instance, this amounts to a dilation of the constraint polytope in optimization (33). Nevertheless, since the width of the slab depends on ${\epsilon }_{\alpha }^{\pm }$ only through a factor $\sqrt{\mathrm{ln}(1/{\epsilon }_{\alpha }^{\pm })}$ , we will typically find that this negative effect is outweighed by the positives of adding more Bell expressions, and the randomness bound is globally improved.

This improvement is illustrated in figure 4, where we reconsider the numerical example presented in section 7.1. We now use 8 Bell expressions that are equivalent (for quantum behaviors) to the specification of the 16 probabilities $p({{\rm{a}}}_{1}{{\rm{a}}}_{2}| {{\rm{x}}}_{1}{{\rm{x}}}_{2})$ (we will explain why we use this choice of 8 Bell expressions later).

**Figure 4.** Bounds on the min-entropy rate ${H}_{\min }/n$ (equation (44)) for a single Bell expression ( ${I}_{\mathrm{chsh}}$ or I_p), and for a 'complete' set of 8 expressions. The protocol is simulated for the device behavior p in equation (34) as explained in section 7.1. The dashed and dotted lines represent the min-entropy H(p) associated to the behavior p, respectively for ${{ \mathcal X }}_{r}=\{(1,0)\}$ and ${{ \mathcal X }}_{r}={ \mathcal X }$ .
Download figure:
Standard image High-resolution image

In the case ${{ \mathcal X }}_{r}=\{(1,0)\}$ , we see that the randomness that we can extract with multiple Bell expressions is similar to the use of the optimal expression I_p alone for about n ≥ 10¹⁰ runs, but much better for smaller numbers of runs. In fact, the improvement is even better in practice, even in regions where the use of 8 Bell expressions gives the same rate as I_p, because in plotting the rate for I_p we knew the exact behavior p from which the measurement runs are sampled. In a real experiment (in particular with drifts over time), we would instead need to infer the behavior p at regular intervals from auxiliary measurements and throw away the corresponding data. In contrast, the method based on the full set of observed frequencies achieves the same or better key extraction without throwing away any data.

In the case ${{ \mathcal X }}_{r}={ \mathcal X }$ , the use of 8 expressions is comparable to that of CHSH and, as with I_p above, it outperforms the ${I}_{p}^{\mathrm{all}}$ expression (not shown in figure 4; see figure 2 or 3). For small values of n, the CHSH expression keeps a small advantage. This can be understood from the fact that the CHSH expression itself is part of the set of 8 expressions that we used in figure 4. The difference between the two curves therefore results from a trade-off between a better estimation of randomnes from more expressions and the negative effect of wider margins ${\epsilon }_{\alpha }^{\pm }$ in the confidence region.

In the remainder of this section, we discuss in more detail how to choose a good set of Bell expressions for the protocol. For this, let us start by considering our protocol when $n\to \infty$ . In this asymptotic limit, the interval $[{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]$ narrows down towards the point $\hat{{\bf{f}}}={\bf{f}}[\hat{p}(a| x)]$ , which is just the value of the t Bell expressions ${\bf{f}}$ computed on the experimentally observed frequencies $\hat{p}(a| x)$ . If the bias towards inputs in ${{ \mathcal X }}_{r}$ is appropriately chosen (as discussed previously), then the relative contribution of the penalty term vanishes as $n\to \infty$ and the bound (44) becomes in the asymptotic limit, up to sublinear terms,

$\begin{eqnarray}&&{H}_{\min }\gtrsim {nH}(\hat{{\bf{f}}}).\end{eqnarray} \tag{ 45 }$

Furthermore, in the case where the device behaves in an i.i.d. way according to a behavior p, then, asymptotically, $\hat{{\bf{f}}}\to {\bf{f}}[p]$ . If one chooses enough Bell expressions as to fully characterize the behavior of the devices (for instance, by using an estimator for each probability $p(a| x)$ ), $\hat{{\bf{f}}}$ thus becomes equivalent to the knowledge of p and the above bound converges to the maximal min-entropy bound one can obtain from p given ${{ \mathcal X }}_{r}$ , as characterized in [19, 20]. In this sense, and as seen in figure 4, our protocol is asymptotically optimal.

Note that there are different sets of Bell estimators that are asymptotically equivalent to the knowledge of the full set of probabilities $p(a| x)$ . For instance in a bipartite Bell experiment with two inputs and two outputs there are 16 probabilities $p(a| x)=p({{\rm{a}}}_{1}{{\rm{a}}}_{2}| {{\rm{x}}}_{1}{{\rm{x}}}_{2})$ with ${{\rm{a}}}_{1},{{\rm{a}}}_{2},{{\rm{x}}}_{1},{{\rm{x}}}_{2}\in \{0,1\}$ and thus 16 associated Bell expressions ${e}_{1},\,\ldots ,\,{e}_{16}$ defined by ${e}_{\alpha }[p]=p({{\rm{a}}}_{1}{{\rm{a}}}_{2}| {{\rm{x}}}_{1}{{\rm{x}}}_{2})$ , with one value of α for each of the possible values of $({{\rm{a}}}_{1},{{\rm{a}}}_{2},{{\rm{x}}}_{1},{{\rm{x}}}_{2})$ . But since the probabilities $p(a| x)$ satisfy normalization and no-signaling, they are uniquely specified by the 8 correlators of equation (37), which constitute 8 Bell expressions ${g}_{1},\,\ldots ,\,{g}_{8}$ , where g₁ and g₂ are the first party's two marginal correlators $\langle {A}_{{{\rm{x}}}_{1}}\rangle ,{g}_{3}$ and g₄ are the second party's $\langle {B}_{{{\rm{x}}}_{2}}\rangle$ , and ${g}_{4},\ldots ,{g}_{8}$ are the four bipartite correlators $\langle {A}_{{{\rm{x}}}_{1}}{B}_{{{\rm{x}}}_{2}}\rangle$ .

Alternatively, the probabilities are also equivalent to the 8 expressions ${h}_{1},\,\ldots ,\,{h}_{8}$ with h_α = g_α for $\alpha =1,\,\ldots ,\,4$ , and h_α for $\alpha =5,\,\ldots ,\,8$ are four linearly independent permutations of the CHSH expression, generalizing (38):

$\begin{eqnarray}&&{I}_{\mathrm{chsh}}^{{{\rm{y}}}_{1},{{\rm{y}}}_{2}}=\displaystyle \sum _{{{\rm{x}}}_{1},\ {{\rm{x}}}_{2}\in \{0,1\}}{(-1)}^{({{\rm{x}}}_{1}+{{\rm{y}}}_{1})({{\rm{x}}}_{2}+{{\rm{y}}}_{2})}\langle {A}_{{{\rm{x}}}_{1}}{B}_{{{\rm{x}}}_{2}}\rangle \qquad {{\rm{y}}}_{1},{{\rm{y}}}_{2}\in \{0,1\}.\end{eqnarray} \tag{ 46 }$

As we increase the number of rounds, all these possible choices become equivalent, since the intervals $[{\hat{{\bf{e}}}}^{-},{\hat{{\bf{e}}}}^{+}],[{\hat{{\bf{g}}}}^{-},{\hat{{\bf{g}}}}^{+}],[{\hat{{\bf{h}}}}^{-},{\hat{{\bf{h}}}}^{+}]$ define constraint polytopes in the space of behaviors p that asymptotically intersect the quantum set at the same unique point. However, the choice of one set of estimators over another could make a difference for finite n.

Generally speaking, when choosing which Bell expressions to use for a fixed number t, we may prefer that as many of them as possible be linearly independent. Consider t − 1 Bell expressions and their associated slabs, which define a constraint polytope. In the absence of any meaningful information concerning the behavior of the objective function of (33) within its feasible set, the choice of a t-th Bell expression should be dictated by the resulting reduction of the constraint polytope: cutting a large volume out is more likely to reduce the maximum of (33). As n grows large and the slabs grow thinner, the best way to reduce this volume is to choose a Bell expression that is linearly independent from the t − 1 previous ones, if possible. We can easily understand this in the asymptotic limit: as we mentioned above, the optimization converges to $H(\hat{{\bf{f}}})$ , and with enough linearly independent Bell expressions, ${\bf{f}}[\hat{p}]$ uniquely defines $\hat{p}$ , hence $H(\hat{{\bf{f}}})=H(\hat{p})$ . At this point, adding more expressions only makes ${\bf{f}}[\hat{p}]$ a more redundant definition of $\hat{p}$ , which does not improve the randomness bound. On the other hand, with too few independent Bell expressions, $\hat{{\bf{f}}}={\bf{f}}[\hat{p}]$ is compatible with many values of $\hat{p}$ , and the worst value is what ends up determining $H(\hat{{\bf{f}}})$ .

In addition, we see that there is no need for Bell expressions that are purely signaling, i.e., that have a constant value for all no-signaling behaviors p. Indeed, since the feasible region of (33) is defined by the intersection of the slabs and the quantum set, constraints deriving from purely signaling expressions are trivial in this region, and therefore do not contribute to improve the randomness bound.

Combining these two conclusions also indicates that we should avoid Bell expressions that are only linearly dependent up to purely signaling terms. This implies for instance that the sets ${g}_{1},\,\ldots ,\,{g}_{8}$ or ${h}_{1},\,\ldots ,\,{h}_{8}$ should be preferred over ${e}_{1},\,\ldots ,\,{e}_{16}$ . This is indeed what we find, as illustrated in figure 5.

**Figure 5.** Bounds on the min-entropy rate H_min/n (equation (44)) with ${{ \mathcal X }}_{r}=\{(1,0)\}$ for three (asymptotically) equivalent sets of Bell expressions. The protocol is simulated for the device behavior p in equation (34) as explained in section 7.1. The dashed line represents the min-entropy H(p) associated to the behavior p for ${{ \mathcal X }}_{r}=\{(1,0)\}$ .
Download figure:
Standard image High-resolution image

**Figure 5.** Bounds on the min-entropy rate H_min/n (equation (44)) with ${{ \mathcal X }}_{r}=\{(1,0)\}$ for three (asymptotically) equivalent sets of Bell expressions. The protocol is simulated for the device behavior p in equation (34) as explained in section 7.1. The dashed line represents the min-entropy H(p) associated to the behavior p for ${{ \mathcal X }}_{r}=\{(1,0)\}$ .
Download figure:
Standard image High-resolution image

Note that the sets ${g}_{1},\,\ldots ,\,{g}_{8}$ and ${h}_{1},\,\ldots ,\,{h}_{8}$ only differ by a linear transformation, but the second set yields better results for the same (finite) number of rounds n. With respect to optimization (33), this means that the feasible set for ${h}_{1},\,\ldots ,\,{h}_{8}$ excluded the optimum obtained for ${g}_{1},\,\ldots ,\,{g}_{8}$ . This might be related to the fact that in this scenario of two parties with two inputs and two outputs, the four versions of the CHSH inequalities constitute the facets that separate local from nonlocal behaviors, and they might therefore serve as better measures of nonlocality and randomness than the correlators $\langle {A}_{{{\rm{x}}}_{1}}{B}_{{{\rm{x}}}_{2}}\rangle$ .

This phenomenon can be visualized in a simpler instance where two Bell expressions are used, namely, ${I}_{\mathrm{chsh}}^{0,0}$ and ${I}_{\mathrm{chsh}}^{0,1}$ . In figure 6, we represent the RB function $G({I}_{\mathrm{chsh}}^{0,0}[p],{I}_{\mathrm{chsh}}^{0,1}[p])$ with respect to a single input ${{ \mathcal X }}_{r}=\{(0,0)\}$ , as defined in equation (31). The figure shows the evolution of the randomness bound, from a trivial value of 1 for values of ${I}_{\mathrm{chsh}}^{0,0}[p]$ and ${I}_{\mathrm{chsh}}^{0,1}[p]$ compatible with a local hidden variable model, to nontrivial values up to approximately 0.32 reached at extremal points. The variation of the RB function along these two axes is not trivial, but as we can see, the gradient mostly points along the directions of the CHSH axes, with the exception of the regions where the local and quantum boundaries meet. To minimize this variation within a confidence region of fixed square shape in this plane, it is best to rotate the interval so that its sides are aligned with the gradient. Aligning the confidence region with the two CHSH axes is therefore a sensible choice in this case.

**Figure 6.** RB function G(f₁, f₂) according to (31) for ${{ \mathcal X }}_{r}=\{(0,0)\}$ , with ${f}_{1}={I}_{\mathrm{chsh}}^{0,0}$ and ${f}_{2}={I}_{\mathrm{chsh}}^{0,1}$ , as defined in (46). The set of local behaviors projected onto this plane defines a square region, represented with a dotted line. The quantum set ${ \mathcal Q }$ projects into the circular region ${({I}_{\mathrm{chsh}}^{0,0}[p])}^{2}+{({I}_{\mathrm{chsh}}^{0,1}[p])}^{2}\leqslant 8$ [38]. Note that while this RB function is centrally symmetric, it is not symmetric under reflection through either CHSH axis. The function was computed by solving the optimization program (31) at level 2 of the NPA hierarchy.
Download figure:
Standard image High-resolution image

**Figure 6.** RB function G(f₁, f₂) according to (31) for ${{ \mathcal X }}_{r}=\{(0,0)\}$ , with ${f}_{1}={I}_{\mathrm{chsh}}^{0,0}$ and ${f}_{2}={I}_{\mathrm{chsh}}^{0,1}$ , as defined in (46). The set of local behaviors projected onto this plane defines a square region, represented with a dotted line. The quantum set ${ \mathcal Q }$ projects into the circular region ${({I}_{\mathrm{chsh}}^{0,0}[p])}^{2}+{({I}_{\mathrm{chsh}}^{0,1}[p])}^{2}\leqslant 8$ [38]. Note that while this RB function is centrally symmetric, it is not symmetric under reflection through either CHSH axis. The function was computed by solving the optimization program (31) at level 2 of the NPA hierarchy.
Download figure:
Standard image High-resolution image

8. Conclusion and open questions

In recent years several protocols for generating randomness in a DI way have been introduced [2–17], with varying degrees of security, rate of randomness expansion, or noise robustness. They all, however, share the feature that they rely on the estimation of a single Bell expression.

As was shown in [19, 20], in an idealized setting in which the behavior of the devices is known and fixed, more randomness can in principle be certified if one takes into account the violation of several Bell inequalities, or, even better, the full set of probabilities characterizing the devices' behavior.

We have shown here that a similar reasoning applies in the context of an actual DIRNG protocol, where randomness is directly certified from experimental data. Specifically, we have combined the analysis of [19, 20] with the protocol introduced in [3, 14, 15], which generates certified randomness against an adversary with classical side information. We have in this way obtained a family of DIRNG protocols which rely on the estimation of a choice of t ≥ 1 Bell expressions. This includes the special case where the randomness is directly certified from the knowledge of the relative frequencies of occurrence of the outputs given the inputs. Asymptotically, for a given ${{ \mathcal X }}_{r}$ , this results in an optimal generation of randomness from experimental data (as measured by the min-entropy) without having to assume beforehand that the devices violate a specific Bell inequality and without the need to infer the device behavior from preliminary measurements. Furthermore, in the non-asymptotic case, the choice of an optimal Bell expression is ambiguous even if the device behavior is perfectly characterized. Our method proposes a way of bypassing this problem by directly evaluating the randomness from the observed output frequencies.

Our protocol also provides a way of treating the case where the randomness of the outcomes of the devices is much higher for some inputs than for others. This happens in particular when generating randomness from partially entangled states [21], which are used in present photonic loophole-free Bell experiments [32, 33]. Our analysis essentially amounts to consider that all the randomness has been generated from the optimal set of inputs, but corrected by a penalty term that is proportional to the number of events corresponding to non-optimal inputs. By biasing the choices of inputs towards the optimal ones, one can make this penalty term negligible asymptotically. However, for small numbers of measurement runs, we have seen that this procedure may be less efficient than an analysis based on a Bell expression that treats all inputs on the same footing, like the CHSH expression. It is possible that our way of treating non-optimal inputs could be improved, leading to more efficient protocols for small numbers of measurement runs.

Our result could be generalized in several ways. First, how to prove security against quantum side information when several Bell expressions or the full set of data generated in the experiment are taken into account? This is not a priori easy to answer since the analysis of most DIRNG protocols secure against quantum side information rely on Bell expressions with a particular structure [6, 8, 11] or, when they allow for arbitrary Bell expressions, do not optimally take into account the observed level of violation [17]. Second, we based the statistical analysis on the Azuma–Hoeffding inequality, but alternative deviation theorems [39] could be adapted to our setting. Our attempt at improving our bounds using a tighter concentration inequality from Hoeffding [40] (called McDiarmid's inequality in [39] after [41]) produced no visible difference in the plots. Another alternative, Bentkus' inequality [39, 42], involves discrete summations with around n terms, which would grow too large in our simulations to be used as-is. Finally, we note that since DIRNG is not the only task where information from several Bell estimators can be exploited, a similar approach could be developed for other DI problems, such as DIQKD.

Acknowledgments

We acknowledge financial support from the Fondation Wiener-Anspach, the Interuniversity Attraction Poles program of the Belgian Science Policy Office under the grant IAP P7-35 photonics@be, and the Fonds de la Recherche Scientifique F.R.S.-FNRS (Belgium). ONS acknowledges financial support through a grant of the Fonds pour la Formation à la Recherche dans l'Industrie et l'Agriculture (FRIA). CB acknowledges funding from the F.R.S.-FNRS through a Research Fellowship. JS was a postdoctoral researcher of the F.R.S.-FNRS at the time this research was carried out. SP is a Research Associate of the F.R.S.-FNRS. We would like to thank the anonymous referees for the time dedicated to reviewing our manuscript and for their useful comments. All plots were produced using TikZ [43] and pgfplots [44]. The heat map of figure 6 was produced using Matplotlib [45].

Appendix

The problems introduced in section 5 are typical instances of conic programming. A conic program is an optimization problem of the form

$\begin{eqnarray}\mathop{\mathrm{maximize}}\limits_{\{{x}_{i}\}} & & \quad \displaystyle \sum _{i=1}^{k}\langle {c}_{i},{x}_{i}\rangle \\ \mathrm{subject}\ \mathrm{to} & & \quad \displaystyle \sum _{i=1}^{k}{A}_{i}{x}_{i}=b\\ & & \quad {x}_{i}\in {K}_{i}\qquad i=1,\,\ldots ,\,k,\end{eqnarray} \tag{ 47 }$

where $b\in {{\mathbb{R}}}^{m},{x}_{i},{c}_{i}\in {{\mathbb{R}}}^{{n}_{i}},{A}_{i}\in {{\mathbb{R}}}^{m\times {n}_{i}}$ , and K_i are closed convex cones ( $i=1,\,\ldots ,\,k$ ). Here $\langle c,x\rangle$ denotes the usual scalar product $\langle c,x\rangle ={\sum }_{j=1}^{n}{c}_{j}{x}_{j}$ between vectors in ${{\mathbb{R}}}^{n}$ and a set K is a convex cone if q₁p₁ + q₂p₂ belongs to K for any nonnegative scalars q₁, q₂ ≥ 0 and any p₁, p₂ in K.

In other words, an optimization problem is a conic program if it involves the maximization of a linear function of the optimization variables given linear constraints on such variables and the condition that they belong to a certain family of cones. Many familiar convex optimization problems, including linear and semidefinite programming, are instances of conic programming.

The problems (32) and (33), and their NPA relaxations, are also conic programs. Indeed, the set of unnormalized behaviors $\tilde{{ \mathcal Q }}$ and all of its NPA relaxations are clearly cones and all other constraints in (32) and (33) are linear in the variables ${\tilde{p}}_{a,x}$ . As we said earlier, the NPA relaxations of these problems actually correspond to a specific type of conic programs, i.e., SDP.

Any conic program in the primal form (47) admits a dual formulation given by [46]

$\begin{eqnarray}\mathop{\mathrm{minimize}}\limits_{y} & & \quad \langle b,y\rangle \\ \mathrm{subject}\ \mathrm{to} & & \quad {A}_{i}^{T}y-{c}_{i}\in {K}_{i}^{* }\qquad i=1,\,\ldots ,\,k,\end{eqnarray} \tag{ 48 }$

where $y\in {{\mathbb{R}}}^{m}$ and ${K}_{i}^{* }=\{{z}_{i}\in {{\mathbb{R}}}^{{n}_{i}}:\langle {z}_{i},{x}_{i}\rangle \geqslant 0,\forall {x}_{i}\in {K}_{i}\}$ is the dual cone of K_i. Let α denote the value of the primal program (47) and β the value of the dual program (48). Then the strong duality theorem of conic programming says that if the primal program is feasible, has finite value α, and has points ${\tilde{x}}_{i}\in \mathrm{int}({K}_{i})$ such that ${\sum }_{i=1}^{k}{A}_{i}{\tilde{x}}_{i}=b$ , then the dual program is feasible and has finite value β = α [46].

Let us determine the dual of the optimization problem (32). From now on, we take generically $\tilde{{ \mathcal Q }}$ to denote the cone of unnormalized quantum behaviors or any of its NPA relaxations, as the analysis is identical in both cases.

Let us first rewrite (32) in a form similar to (47). For this, define ${c}_{{ax}}\in {{\mathbb{R}}}^{| { \mathcal A }| \times | { \mathcal X }| }$ as the vector with components ${c}_{{ax}}(a^{\prime} ,x^{\prime} )={\delta }_{{aa}^{\prime} }{\delta }_{{xx}^{\prime} }$ . Thus $\langle {c}_{{ax}},p\rangle =p(a| x)$ . Let $b=(1,{f}_{1}[p],\,\ldots ,\,{f}_{t}[p])\in {{\mathbb{R}}}^{1+t}$ . Let u be any Bell expression such that $u[p]=\mathrm{Tr}[p]$ for all no-signaling behaviors p, for instance $u(a,x)={\delta }_{x,{x}_{0}}$ for some input ${x}_{0}\in { \mathcal X }$ . Let A_ax be matrices in ${{\mathbb{R}}}^{1+t,| { \mathcal A }| \times | {{ \mathcal X }}_{r}| }$ with components ${[{A}_{{ax}}]}_{1,a^{\prime} x^{\prime} }=u(a^{\prime} ,x^{\prime} )$ and ${[{A}_{{ax}}]}_{1+\alpha ,a^{\prime} x^{\prime} }={f}_{\alpha }(a^{\prime} ,x^{\prime} )$ for $\alpha =1,\,\ldots ,\,t$ . Then

$\begin{eqnarray}G({\bf{f}}[p])=\mathop{\max }\limits_{{\{{\tilde{p}}_{a,x}\}}_{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}} & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}\langle {c}_{{ax}},{\tilde{p}}_{a,x}\rangle \\ \mathrm{subject}\ \mathrm{to} & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}{A}_{{ax}}{\tilde{p}}_{a,x}=b\\ & & \,{\tilde{p}}_{a,x}\in \tilde{{ \mathcal Q }}\quad \forall a\in { \mathcal A },\forall x\in {{ \mathcal X }}_{r}.\end{eqnarray} \tag{ 49 }$

The dual is then readily given as

$\begin{eqnarray}G({\bf{f}}[p])\,= & \mathop{\min }\limits_{y\in {{\mathbb{R}}}^{1+t}} & \quad \langle b,y\rangle \\ & \mathrm{subject}\ \mathrm{to} & \quad {A}_{{ax}}^{T}y-{c}_{{ax}}\in {\tilde{{ \mathcal Q }}}^{* }\qquad \forall a\in { \mathcal A },\forall x\in {{ \mathcal X }}_{r},\end{eqnarray} \tag{ 50 }$

where ${\tilde{{ \mathcal Q }}}^{* }$ is the dual cone of $\tilde{{ \mathcal Q }}$ , that is, the set ${\tilde{{ \mathcal Q }}}^{* }=\{c\in {{\mathbb{R}}}^{| { \mathcal A }| \times | { \mathcal X }| }\,:\langle c,\tilde{p}\rangle \geqslant 0,\forall \tilde{p}\in \tilde{{ \mathcal Q }}\}$ . Note that this dual cone can be identified with the set of Tsirelson inequalities for normalized behaviors, that is, the set ${{ \mathcal Q }}^{* }=\{({d}_{0},d)\in {{\mathbb{R}}}^{1+| { \mathcal A }| \times | { \mathcal X }| }\,:\langle d,p\rangle \leqslant {d}_{0},\forall p\in { \mathcal Q }\}$ . Indeed, note that by normalization $\langle u,p\rangle =1$ and thus an inequality $\langle d,p\rangle \leqslant {d}_{0}$ valid for $p\in { \mathcal Q }$ can always be rewritten in the form $\langle ({d}_{0}u-d),p\rangle \geqslant 0$ , hence in the form $\langle c,p\rangle \geqslant 0$ for some suitable c. But an inequality $\langle c,p\rangle \geqslant 0$ is clearly valid for ${ \mathcal Q }$ if and only if it is valid for $\tilde{{ \mathcal Q }}$ .

Using the explicit form of b, A_ax, and c_ax and the above interpretation of $\tilde{{{ \mathcal Q }}^{* }}$ , we can rewrite the dual (50) as

$\begin{eqnarray}G({\bf{f}}[p])\,= & \mathop{\min }\limits_{y\in {{\mathbb{R}}}^{1+t}} & \quad {y}_{0}+\displaystyle \sum _{\alpha =1}^{t}{y}_{\alpha }{f}_{\alpha }[p]\\ & \mathrm{subject}\ \mathrm{to} & \quad p^{\prime} (a| x)\leqslant {y}_{0}+\displaystyle \sum _{\alpha =1}^{t}{y}_{\alpha }{f}_{\alpha }[p^{\prime} ]\quad \forall a\in { \mathcal A },\forall x\in {{ \mathcal X }}_{r},\forall p^{\prime} \in { \mathcal Q }.\end{eqnarray} \tag{ 51 }$

The interpretation of this dual problem is immediate. We are seeking an affine function ${y}_{0}+{\sum }_{a}{y}_{\alpha }{f}_{\alpha }[p]$ satisfying the condition 1 in definition 1—this is the constraint line of the optimization problem (51). Since the function is affine it is also concave and the condition 2 in definition 1 is also trivially satisfied. We are then searching for the best function of this form, i.e., the one that gives the lowest possible value (i.e., the most randomness) for the given Bell expectations ${\bf{f}}[p]$ . Note that considering an affine function is actually not a restriction because the optimal function $G({\bf{f}}[p])$ is concave, and therefore it is equivalent to the envelope of its tangents, which are affine functions of ${\bf{f}}[p]$ .

From the point of view of implementations, the condition that $p^{\prime} (a| x)\leqslant {y}_{0}+{\sum }_{\alpha }{y}_{\alpha }{f}_{\alpha }[p^{\prime} ]$ is valid for ${ \mathcal Q }$ can be cast as a search for a sum-of-squares decomposition of this inequality. When ${ \mathcal Q }$ is one of the NPA relaxations of the quantum set, this is equivalent to solving a certain SDP, which turns out to be, as expected, the dual of the original NPA problem.

Let us now determine the dual of the noisy problem (33), which we can rewrite as

$\begin{eqnarray}G([{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}])=\mathop{\max }\limits_{\displaystyle \genfrac{}{}{0em}{}{{\{{\tilde{p}}_{a,x}\}}_{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}}{{\boldsymbol{\tau }},{\boldsymbol{\kappa }}\in {{\mathbb{R}}}^{t}}} & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}{\tilde{p}}_{a,x}(a| x)\\ \mathrm{subject}\ \mathrm{to} & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}\mathrm{Tr}[{\tilde{p}}_{a,x}]=1\\ & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}\,{\bf{f}}[{\tilde{p}}_{a,x}]+{\boldsymbol{\tau }}={\hat{{\bf{f}}}}^{+}\\ & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}\,{\bf{f}}[{\tilde{p}}_{a,x}]-{\boldsymbol{\kappa }}={\hat{{\bf{f}}}}^{-}\\ & & \quad {\tilde{p}}_{a,x}\in { \mathcal Q }\quad \forall a\in { \mathcal A },\forall x\in {{ \mathcal X }}_{r}\\ & & \quad {\boldsymbol{\tau }},{\boldsymbol{\kappa }}\in {{\mathbb{R}}}_{+}^{t}.\end{eqnarray} \tag{ 52 }$

Again, let us rewrite in a form similar to (47). For this, define $b\in {{\mathbb{R}}}^{1+2t}=(1,{\hat{f}}_{1}^{+},\,\ldots ,\,{\hat{f}}_{t}^{+},{\hat{f}}_{1}^{-},\,\ldots ,\,{\hat{f}}_{t}^{-})$ . Let A_ax be matrices in ${{\mathbb{R}}}^{1+2t,| { \mathcal A }| \times | {{ \mathcal X }}_{r}| }$ with components ${[{A}_{{ax}}]}_{1,a^{\prime} x^{\prime} }=u(a^{\prime} ,x^{\prime} )$ and ${[{A}_{{ax}}]}_{1+\alpha ,a^{\prime} x^{\prime} }={f}_{\alpha }(a^{\prime} ,x^{\prime} )$ as above, and ${[{A}_{{ax}}]}_{1+t+\alpha ,a^{\prime} x^{\prime} }={f}_{\alpha }(a^{\prime} ,x^{\prime} )$ , for $\alpha =1,\,\ldots ,\,t$ . Additionally, we define two matrices A₁, A₂ in ${{\mathbb{R}}}^{1+2t,t}$ through ${[{A}_{1}]}_{1,\alpha }=0,{[{A}_{1}]}_{1+\alpha ,\alpha ^{\prime} }={\delta }_{\alpha ,\alpha ^{\prime} },{[{A}_{1}]}_{1+t+\alpha ,\alpha ^{\prime} }=0$ and ${[{A}_{2}]}_{1,\alpha }=0,{[{A}_{2}]}_{1+\alpha ,\alpha ^{\prime} }=0,{[{A}_{2}]}_{1+t+\alpha ,\alpha ^{\prime} }=-{\delta }_{\alpha ,\alpha ^{\prime} }$ for $\alpha ,\alpha ^{\prime} \in \{1,\,\ldots ,\,t\}$ . Then

$\begin{eqnarray}G([{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}])=\mathop{\max }\limits_{\displaystyle \genfrac{}{}{0em}{}{{\{{\tilde{p}}_{a,x}\}}_{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}}{{\boldsymbol{\tau }},{\boldsymbol{\kappa }}\in {{\mathbb{R}}}^{t}}} & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}\langle {c}_{{ax}},{\tilde{p}}_{a,x}\rangle \\ \mathrm{subject}\ \mathrm{to} & & \quad \displaystyle \sum _{a\in { \mathcal A },x\in {{ \mathcal X }}_{r}}{A}_{{ax}}{\tilde{p}}_{a,x}+{A}_{1}{\boldsymbol{\tau }}+{A}_{2}{\boldsymbol{\kappa }}=b\\ & & \quad {\tilde{p}}_{a,x}\in { \mathcal Q }\quad \forall a\in { \mathcal A },\forall x\in {{ \mathcal X }}_{r}\\ & & \quad {\boldsymbol{\tau }},{\boldsymbol{\kappa }}\in {{\mathbb{R}}}_{+}^{t}.\end{eqnarray} \tag{ 53 }$

The dual is then readily given as

$\begin{eqnarray}G([{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}])\,= & \mathop{\min }\limits_{v\in {{\mathbb{R}}}^{1+2t}} & \quad \langle b,v\rangle \\ & \mathrm{subject}\ \mathrm{to} & \quad {A}_{{ax}}^{T}v-{c}_{{ax}}\in {\tilde{{ \mathcal Q }}}^{* }\qquad \forall a\in { \mathcal A },\forall x\in {{ \mathcal X }}_{r}\\ & & \quad {A}_{1}^{T}v,{A}_{2}^{T}v\in {{\mathbb{R}}}_{+}^{t}\end{eqnarray} \tag{ 54 }$

or, writing $v=({v}_{0},{v}_{1},\,\ldots ,\,{v}_{t},{v}_{1}^{{\prime} },\,\ldots ,\,{v}_{t}^{{\prime} })$

$\begin{eqnarray}G([{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}])\,= & \mathop{\min }\limits_{v\in {{\mathbb{R}}}^{1+2t}} & \quad {v}_{0}+\displaystyle \sum _{\alpha =1}^{t}{v}_{\alpha }{\hat{f}}_{\alpha }^{+}+\displaystyle \sum _{\alpha =1}^{t}{v}_{\alpha }^{{\prime} }{\hat{f}}_{\alpha }^{-}\\ & \mathrm{subject}\ \mathrm{to} & \quad {v}_{0}+\displaystyle \sum _{\alpha =1}^{t}({v}_{\alpha }+{v}_{\alpha }^{{\prime} }){f}_{\alpha }-{c}_{{ax}}\in {\tilde{{ \mathcal Q }}}^{* }\quad \forall a\in { \mathcal A },\forall x\in {{ \mathcal X }}_{r}\\ & & \quad {v}_{\alpha }\geqslant 0,{v}_{\alpha }^{{\prime} }\leqslant 0\quad \forall \alpha =1,\,\ldots ,\,t.\end{eqnarray} \tag{ 55 }$

Defining $({y}_{0},{y}_{\alpha }^{+},{y}_{\alpha }^{-})=({v}_{0},{v}_{\alpha },-{v}_{\alpha }^{{\prime} })$ and ${y}_{\alpha }={v}_{\alpha }+{v}_{\alpha }^{{\prime} }={y}_{\alpha }^{+}-{y}_{\alpha }^{-}$ , we can reformulate this as

$\begin{eqnarray}G([{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}])=\mathop{\min }\limits_{({y}_{0},{y}^{+},{y}^{-})\in {{\mathbb{R}}}^{1+2t}} & & \quad {y}_{0}+\displaystyle \sum _{\alpha =1}^{t}{y}_{\alpha }^{+}{\hat{f}}_{\alpha }^{+}-\displaystyle \sum _{\alpha =1}^{t}{y}_{\alpha }^{-}{\hat{f}}_{\alpha }^{-}\\ \mathrm{subject}\ \mathrm{to} & & \quad p^{\prime} (a| x)\leqslant {y}_{0}+\displaystyle \sum _{\alpha =1}^{t}{y}_{\alpha }{f}_{\alpha }[p^{\prime} ]\quad \forall a\in { \mathcal A },\forall x\in {{ \mathcal X }}_{r},\forall p^{\prime} \in { \mathcal Q }\\ & & \quad {y}_{\alpha }={y}_{\alpha }^{+}-{y}_{\alpha }^{-},\quad {y}_{\alpha }^{+},{y}_{\alpha }^{-}\geqslant 0\quad \forall \alpha =1,\,\ldots ,\,t.\end{eqnarray} \tag{ 56 }$

This dual formulation is analogous to (51) and can be understood in the same way. The difference lies in the objective function.

Although (56) is the best way to express the optimization for a numerical implementation, we may further simplify its expression by noting that, for a fixed value of y₀ and {y_α}, the objective function is minimized when ${y}_{\alpha }^{+}\,=\,{y}_{\alpha }$ if y_α ≥ 0 and ${y}_{\alpha }^{-}\,=\,-{y}_{\alpha }$ if y_α ≤ 0 (in short, ${y}_{\alpha }^{\pm }\,=\,(| {y}_{\alpha }| \pm {y}_{\alpha })/2$ ). We can then write the objective function as

$\begin{eqnarray}&&{y}_{0}+\displaystyle \sum _{\alpha :{y}_{\alpha }\geqslant 0}{y}_{\alpha }{\hat{f}}_{\alpha }^{+}+\displaystyle \sum _{\alpha :{y}_{\alpha }\lt 0}{y}_{\alpha }{\hat{f}}_{\alpha }^{-},\end{eqnarray} \tag{ 57 }$

and further as the following maximum:

$\begin{eqnarray}&&\mathop{\max }\limits_{{\bf{f}}\in [{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]}{y}_{0}+\displaystyle \sum _{\alpha }{y}_{\alpha }{f}_{\alpha }.\end{eqnarray} \tag{ 58 }$

Indeed, the maximum will clearly be attained at one of the extreme points of the region $[{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]$ , whose components are of the form ${f}_{\alpha }^{\pm }$ . If y_α ≥ 0, the maximum will be attained when f_α is equal to ${\hat{f}}_{\alpha }^{+}$ , if y_α ≤ 0, when it is equal to ${\hat{f}}_{\alpha }^{-}$ . All in all, (56) can thus be rewritten as the nested optimization

$\begin{eqnarray}\mathop{\min }\limits_{({y}_{0},y)\in {{\mathbb{R}}}^{1+t}}\,\mathop{\max }\limits_{{\bf{f}}\in [{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]} & & \quad {y}_{0}+\displaystyle \sum _{\alpha =1}^{t}{y}_{\alpha }{f}_{\alpha }\\ \mathrm{subject}\ \mathrm{to} & & \quad p^{\prime} (a| x)\leqslant {y}_{0}+\displaystyle \sum _{\alpha =1}^{t}{y}_{\alpha }{f}_{\alpha }[p^{\prime} ]\quad \forall a\in { \mathcal A },\forall x\in {{ \mathcal X }}_{r},\forall p^{\prime} \in { \mathcal Q }.\end{eqnarray} \tag{ 59 }$

In (56), we are thus solving a problem completely analogous to (51) except that the objective function now yields a bound on $G({\bf{f}})$ that holds on the entire region $[{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]$ . Since we minimize the objective function, we are searching for the best possible such bound.

Now, since the extreme points of $[{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]$ are not necessarily quantum (i.e., do not necessarily belong to ${\bf{f}}({ \mathcal Q })$ ), one could expect this upper bound to be strictly higher than the optimal quantum bound in $[{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]$ , contrarily to the primal formulation (33). But this is not the case, as follows from the duality property of conic programs. As we mentioned, the strong duality theorem for conic programs [46] ensures that the value of the dual (59) matches that of the primal (33), as long as the primal has a strictly feasible point. That is, a set of subnormalized probability vectors $\{{\tilde{p}}_{a,x}\}$ should exist such that all the equality constraints of the optimization are satisfied, and which lie in the interior of their respective cones, i.e., ${\tilde{p}}_{a,x}\in \mathrm{int}(\tilde{{ \mathcal Q }})$ for all $a\in { \mathcal A },x\in {{ \mathcal X }}_{r}$ . We argue that when the primal is not infeasible, this is almost always the case.

Consider the set ${\bf{f}}[\mathrm{int}({ \mathcal Q })]$ , that is, the set of Bell value vectors that are compatible with a non-extremal quantum behavior. If the intersection of this set with the confidence region $[{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]$ is non-null, we have strict feasibility of (33). Indeed, let $p\in \mathrm{int}({ \mathcal Q })$ a behavior such that ${\bf{f}}[p]\in [{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]$ . Then, any decomposition of p as a sum of points in $\mathrm{int}(\tilde{{ \mathcal Q }})$ , for instance ${\tilde{p}}_{a,x}=p/(| { \mathcal A }| \times | {{ \mathcal X }}_{r}| )$ , gives a strictly feasible point for (33).

On the other hand, if the closure $\overline{{\bf{f}}[{ \mathcal Q }]}$ has no intersection with $[{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]$ , then the primal (33) is infeasible and the dual (59) diverges to $-\infty$ , as a rather straightforward application of the hyperplane separation theorem shows.

The last possibility is that ${\bf{f}}[\mathrm{int}({ \mathcal Q })]$ has a point of tangency with $[{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]$ or, in terms of separating hyperplanes, that there exist {y₀, y_α} such that ${\sum }_{\alpha }{y}_{\alpha }{f}_{\alpha }\gt {y}_{0}$ for all ${\bf{f}}\in {\bf{f}}[\mathrm{int}({ \mathcal Q })]$ , and ${\sum }_{\alpha }{y}_{\alpha }{f}_{\alpha }\leqslant {y}_{0}$ for all ${\bf{f}}\in [{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]$ . In this case, there might only exist non-strictly feasible points for the primal, while the dual does not diverge. Without strong duality, there is no guarantee that the two resulting values are the same. However, this case is irrelevant to us, as the chances that the confidence region $[{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}]$ around our estimated frequencies is tangent to ${\bf{f}}[\mathrm{int}({ \mathcal Q })]$ are essentially zero.

Thus, in practice, the primal is either infeasible or it is strictly feasible. Hence, when the dual (59) converges, we can safely conclude strong duality, and the optimum of the dual is indeed not worsened by allowing the dual to select ${\bf{f}}\notin {\bf{f}}[{ \mathcal Q }]$ , since the same value as computed by the primal (33) is achieved by a feasible point $\{{\tilde{p}}_{a,x}\}$ such that $G([{\hat{{\bf{f}}}}^{-},{\hat{{\bf{f}}}}^{+}])=G(f[p])$ with $p={\sum }_{a,x}{\tilde{p}}_{a,x}\in { \mathcal Q }$ .