Randomness in post-selected events

Bell inequality violations can be used to certify private randomness for use in cryptographic applications. In photonic Bell experiments, a large fraction of the generated data comes from no-detection events and presumably contains little randomness. This raises the question of whether randomness can be extracted only from the smaller post-selected subset corresponding to proper detection events, instead of from the entire set of data. This could in principle be feasible without opening an analogue of the detection loophole as long as the min-entropy of the post-selected data is evaluated by taking all the information into account, including no-detection events. The possibility of extracting randomness from a short string has a practical advantage, because it reduces the computational time of the extraction. Here, we investigate the above idea in a simple scenario, where the devices and the adversary behave according to i.i.d. strategies. We show that indeed almost all the randomness is present in the pairs of outcomes for which at least one detection happened. We further show that in some cases applying a pre-processing to the data can capture features that an analysis based on global frequencies only misses, thus resulting in the certification of more randomness. We then briefly consider non-i.i.d. strategies and provide an explicit example of such a strategy that is more powerful than any i.i.d. one even in the asymptotic limit of infinitely many measurement rounds, something that was not reported before in the context of Bell inequalities.


I. INTRODUCTION
Sources of randomness have numerous applications: in algorithms, sampling, numerical simulations, gambling, and of course cryptography [1][2][3]. The last application demands sources that can be certified as being uncorrelated to any outside process or variable, i.e. private randomness. Typically, the output of a physical process (thermal noise, shot noise, ...) is considered random in this sense only if certain assumptions are made on its underlying behavior. The violation of Bell inequalities, however, certifies private randomness in a device-independent way [4,5]. From the amount of violation, one obtains a lower bound on the min-entropy H of the output string generated by the process [5][6][7]. This information is then sufficient to extract randomness: indeed, one can design seeded extractors, whose output is a string of (roughly) H bits guaranteed to be uniformly random, even according to an external adversary.
A Bell experiment, however, produces much more information than the mere violation of a single inequality. For instance, one can estimate the single-run frequencies p(a, b|x, y) of the outcomes (a, b) conditioned on the settings (x, y). When this knowledge is taken into account, higher values for the lower bounds on H can in principle be obtained [8,9]. More generally, there may be other ways to process the data that can lead to improved bounds on the randomness, as the following example illustrates.
Consider a Bell experiment running for two days, each day consisting of $N \gg 1$ runs. Suppose that, on the first day, the setup produces outcomes that violate the CHSH inequality maximally; on the second day, because of some technical glitch, the detectors don't fire, so the list of outcomes consists only of double no-detection events. Suppose that the users estimate the amount of randomness generated using solely the observed CHSH violation I, through the simple bound $H \geq 1 - \log_2\left(1 + \sqrt{2 - I^2/4}\right)$ [5]. Suppose further that they planned to extract randomness every two days. Over the two-day period, they observe an average CHSH violation of $(2\sqrt{2} + 2)/2 \approx 2.41$ (we take the convention that no-detection events are mapped to +1 outcomes), from which they deduce a randomness rate of ∼ 0.2 bit/run for Alice's outcomes, that is ∼ 0.4N bits in total for the two-day period. However, the users might have chosen to extract randomness at the end of each day instead. The same techniques now certify 1 bit/run for Alice on the first day and 0 on the second, for a total of N bits over the two days [10]. What happened is clear: the data contain the information that two processes are involved; this information was missed by the overall analysis, but was revealed by the choice of sorting the data in two blocks.
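The arithmetic of the two-day example can be checked with a few lines of Python. This is only a sketch: the function name `chsh_min_entropy` and the variable names are illustrative choices, and the bound is the CHSH-based one quoted from [5] in the text.

```python
import math

def chsh_min_entropy(chsh_value: float) -> float:
    """Lower bound H >= 1 - log2(1 + sqrt(2 - I^2/4)) on the min-entropy per run."""
    if chsh_value <= 2.0:
        return 0.0  # no violation, hence no certified randomness
    # Clamp at zero so that rounding at the Tsirelson bound cannot give a negative argument.
    inside = max(0.0, 2.0 - chsh_value ** 2 / 4.0)
    return 1.0 - math.log2(1.0 + math.sqrt(inside))

TSIRELSON = 2.0 * math.sqrt(2.0)  # first day: maximal violation
CLASSICAL = 2.0                   # second day: every no-detection is mapped to +1

# Pooled analysis: one CHSH value averaged over both days (2N runs).
pooled_rate = chsh_min_entropy((TSIRELSON + CLASSICAL) / 2.0)
pooled_total_per_N = 2.0 * pooled_rate

# Day-by-day analysis: N runs at maximal violation, then N runs with none.
daily_total_per_N = chsh_min_entropy(TSIRELSON) + chsh_min_entropy(CLASSICAL)

print(f"pooled: {pooled_rate:.3f} bit/run, {pooled_total_per_N:.2f} N bits in total")
print(f"daily:  {daily_total_per_N:.2f} N bits in total")
```

Sorting the runs by day before applying the same bound is the only difference between the two totals.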
The example is extreme, but a simple variation is very relevant: the case in which no-detection events are evenly spread over the whole duration of the experiment is a good approximation to the data produced in photonic Bell tests, in which no-detection events constitute a large fraction of the runs (see e.g. Table I in [11]). No-detection events come from two processes: the finite efficiency of the detectors, and the fact that parametric down-conversion often produces the vacuum state. The physics of both suggests that these events contain little or no randomness: it is thus tempting to sort the outcomes of the Bell test in two groups, the detections and the no-detections. As in the previous example, this may lead to certifying more randomness. Even if it does not, one may get a practical advantage by extracting randomness only from the detection events. Indeed, randomness extractors require an independent random seed: the longer the initial string, the longer the needed seed and the computational time to output the result; in fact, it is an active research direction to construct randomness extractors with short seed length [3]. Thus, it is beneficial to be able to extract randomness from a short string.
Here, we investigate the amount of randomness that can be certified in Bell tests within the subset of detection events. For this first study, our aim is simply to determine whether this is actually a viable strategy. We thus perform our analysis in the simplified scenario in which the devices and the adversary behave in an i.i.d. way and in the limit of infinitely many measurement rounds. If randomness cannot be certified in this simple scenario, then it certainly cannot be certified in the non-i.i.d. finite-statistics case.
The post-selection of detection events notoriously opens the detection loophole [12,13]. It is important to clarify that our approach does not fall into that trap. We shall compute a lower bound on the randomness that can be extracted from a subset of events, but the bound is obtained by taking into account the whole set of events. In particular, if the behavior of the devices is compatible with local realism due to the detection loophole, our method will say that no randomness can be certified in the post-selected set of detection events.
Let us remark that a similar analysis in the context of violating local realism, namely of the p-value of post-selected events that contain no no-detections, has been done recently [14].
After introducing the technique that we will use to bound randomness in Section II, we apply it to several physically-motivated examples in Section III. In Section IV we analyse more precisely the effect of post-selection in a simplified case. A glimpse beyond the i.i.d. restriction is given in Section V before the conclusion.

II. AVERAGE RANDOMNESS IN POST-SELECTED EVENTS
Consider a Bell experiment consisting of two separate devices into which the parties input x ∈ X and y ∈ Y and from which they obtain outputs a ∈ A and b ∈ B, respectively. The behavior of such devices over n successive runs can be characterized by the (generally unknown) joint probabilities $p(\mathbf{ab}|\mathbf{xy})$ to obtain the output string $\mathbf{ab} = (a_1 b_1, \ldots, a_n b_n)$ given the input string $\mathbf{xy} = (x_1 y_1, \ldots, x_n y_n)$. The information that an adversary has about the output string can be characterized by a tripartite quantum distribution $p(\mathbf{abe}|\mathbf{xyz})$, where $\mathbf{e}$ denotes the output the adversary obtains when he makes a measurement $\mathbf{z}$ on a system possibly entangled with Alice and Bob's devices. In general $\mathbf{e}$ can be a string of arbitrary size representing the total information that the adversary can get about Alice and Bob's outcomes, and $\mathbf{z}$ can be an arbitrary measurement that depends on the information available to the adversary in the protocol before his measurement.
Here we shall make the following simplifying assumptions. First, we will assume that the devices behave in an i.i.d. way and similarly that the adversary extracts his information in an i.i.d. way by performing at each run individual measurements $z_i$. We can thus write $p(\mathbf{abe}|\mathbf{xyz}) = \prod_{i=1}^{n} p(a_i b_i e_i | x_i y_i z_i)$. Second, we are going to assume that Alice and Bob's marginals p(ab|xy) at each run are known and given. In this way, we do not need to take care of estimation. With these assumptions, finding the adversary's optimal attack amounts to optimizing some quantity over all tripartite quantum distributions $p(abe|xyz) = \langle\Psi| M_{a|x} \otimes M_{b|y} \otimes M_{e|z} |\Psi\rangle$ compatible with a given bipartite marginal $p(ab|xy) = \sum_e p(abe|xyz) = \langle\Psi| M_{a|x} \otimes M_{b|y} \otimes I |\Psi\rangle$.
Let us now introduce the additional ingredient of post-selection. For this, we consider a bipartition of the joint output alphabet O = A × B into two sets V (valid symbols) and N. If the outputs (a, b) at a given round belong to V, we say that the round is valid; otherwise, if (a, b) ∈ N, we say that it is invalid. We refer to the events obtained in valid runs only as the post-selected events. Our goal is to estimate how much randomness can be extracted from these post-selected events.
A priori, an adversary trying to guess the post-selected events might not have access to the information about which run turned out to be valid or invalid, since he should not have access to the outputs observed by the parties. For simplicity, however, we'll assume here that the adversary has access to this information. This allows him to know exactly which run he should try to guess and is thus advantageous for him. The amount of randomness that can be certified in this case thus constitutes a lower bound on the amount that can be certified when the adversary is not given this information. This assumption might however be problematic in a non-i.i.d. situation (see Section V).
We are going to assume in the following that Alice and Bob use a certain pair of inputs $(\bar{x}, \bar{y})$ for randomness generation [15]. Since there is a promise on the marginal p(ab|xy) and since we do not need to consider how to estimate this quantity, we are going to assume for simplicity that Alice and Bob always measure their systems using the inputs $(\bar{x}, \bar{y})$. Suppose that by measuring n systems, they obtain m results in V and n − m results in N. The number m of valid results is a random variable with probability distribution $p(m) = \binom{n}{m} p_{\bar{x}\bar{y}}^{m} (1 - p_{\bar{x}\bar{y}})^{n-m}$, where $p_{\bar{x}\bar{y}} = \sum_{ab \in V} p(ab|\bar{x}\bar{y})$ is the single-run probability to obtain a pair of valid results when using inputs $(\bar{x}, \bar{y})$.
By the i.i.d. assumption, the min-entropy of the m-element post-selected string is $m H_{\bar{x}\bar{y}}$, where $H_{\bar{x}\bar{y}}$ is the single-run min-entropy, defined below. Applying a randomness extractor to this string then yields $m H_{\bar{x}\bar{y}}$ bits of randomness (such extractors exist up to correction, see [16,17]). The average length of the final random string is then
$$\sum_{m=0}^{n} p(m)\, m\, H_{\bar{x}\bar{y}} = n\, p_{\bar{x}\bar{y}}\, H_{\bar{x}\bar{y}}.$$
We can also interpret this last quantity as an "average" min-entropy [18]. The rate of randomness extraction per use of the device can then be defined as $p_{\bar{x}\bar{y}} H_{\bar{x}\bar{y}}$.
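As a sanity check of the averaging step, the following sketch sums m H over the binomial distribution of the number m of valid runs and compares it with n p H. The function name and the numerical values are illustrative assumptions, not quantities taken from the analysis.

```python
import math

def average_extracted_length(n: int, p_valid: float, h_single: float) -> float:
    """Sum over m of Binomial(n, p_valid)(m) * m * h_single."""
    total = 0.0
    for m in range(n + 1):
        prob_m = math.comb(n, m) * p_valid ** m * (1.0 - p_valid) ** (n - m)
        total += prob_m * m * h_single
    return total

# Illustrative values only: 200 runs, 1% valid runs, 0.8 bit per valid run.
n, p_valid, h_single = 200, 0.01, 0.8
explicit_sum = average_extracted_length(n, p_valid, h_single)
closed_form = n * p_valid * h_single

print(explicit_sum, closed_form)
```

The two numbers agree up to floating-point rounding, which is just the statement that the mean of a binomial variable is n p.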
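To complete the analysis, it remains to determine $H_{\bar{x}\bar{y}}$. By definition, the min-entropy is related to the guessing probability $G_{\bar{x}\bar{y}}$ as $H_{\bar{x}\bar{y}} = -\log_2 G_{\bar{x}\bar{y}}$, where the guessing probability is the maximal probability that the adversary correctly guesses Alice and Bob's outputs by performing an optimal measurement on his quantum side information [19]. Here, since we condition on valid runs, this quantum side information can be represented by the cq-state
$$\rho_{ABE} = \frac{1}{p_{\bar{x}\bar{y}}} \sum_{ab \in V} p(ab|\bar{x}\bar{y})\, |ab\rangle\langle ab| \otimes \rho_E^{ab},$$
where $\rho_E^{ab}$ is the adversary's state conditioned on the outputs (a, b), and the guessing probability is given by
$$\begin{aligned} G_{\bar{x}\bar{y}} = \max\ & \frac{1}{p_{\bar{x}\bar{y}}} \sum_{ab \in V} \langle\Psi| M_{a|\bar{x}} \otimes M_{b|\bar{y}} \otimes M_{ab|z} |\Psi\rangle \\ \text{s.t.}\ & \langle\Psi| M_{a|x} \otimes M_{b|y} \otimes I |\Psi\rangle = P(ab|xy). \end{aligned} \tag{1}$$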
Following [9] and introducing the bipartite subnormalized quantum correlations $\tilde{p}_{a'b'}(ab|xy) = \langle\Psi| M_{a|x} \otimes M_{b|y} \otimes M_{a'b'|z} |\Psi\rangle$, where z denotes the adversary's optimal measurement which maximizes (1), the above optimization program can be rewritten as
$$\begin{aligned} G_{\bar{x}\bar{y}} = \max\ & \frac{1}{p_{\bar{x}\bar{y}}} \sum_{a'b' \in V} \tilde{p}_{a'b'}(a'b'|\bar{x}\bar{y}) \\ \text{s.t.}\ & \sum_{a'b' \in V} \tilde{p}_{a'b'}(ab|xy) = p(ab|xy), \\ & \tilde{p}_{a'b'}(ab|xy) \in \tilde{Q}, \end{aligned} \tag{2}$$
where $\tilde{Q}$ denotes the set of unnormalized bipartite quantum correlations. The meaning of this program is intuitive: Eve prepares one of |V| systems for Alice and Bob, one for each outcome pair (a'b'). Each system is characterized by joint probabilities $p_{a'b'}(ab|xy) = \tilde{p}_{a'b'}(ab|xy)/q_{a'b'}$ and is prepared with probability $q_{a'b'} = \sum_{ab} \tilde{p}_{a'b'}(ab|xy)$. When Eve prepares system a'b', she guesses that Alice's and Bob's outputs are a'b', hence the probability that she guesses correctly on average is given by the objective function in (2). Eve's preparations should of course on average reproduce the given correlations p(ab|xy), hence the first constraint of (2). The second constraint simply expresses that Eve's preparations should be compatible with quantum theory.
Notice that the constraints in the second line of (1) and in the second line of (2) involve all outputs a, b and not only those belonging to the post-selected set V. This reflects the fact that our analysis is not subject to the detection loophole.
To summarize, for a given set of bipartite correlations p(ab|xy) characterising the behavior of the devices, the figure of merit that we are going to consider in this paper, which we call the randomness rate, is
$$p_{\bar{x}\bar{y}}\, H_{\bar{x}\bar{y}} = -p_{\bar{x}\bar{y}} \log_2 G_{\bar{x}\bar{y}}.$$
In general, it is not possible to carry out this optimization explicitly, as there is no closed form for the set of quantum correlations. However, we can upper-bound the optimal value of (2), and thus lower-bound the randomness rate, through semidefinite programming by relaxing the last condition $\tilde{p}_{a'b'}(ab|xy) \in \tilde{Q}$ and asking that $\tilde{p}_{a'b'}(ab|xy)$ belongs to some level of the NPA hierarchy [20][21][22] instead of the exact quantum set. All optimizations reported here were performed at local level 1 of the SDP hierarchy [23].
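For concreteness, the conversion from a guessing probability to the randomness rate can be sketched as follows. The function name `randomness_rate` and the numbers in the usage lines are hypothetical; in the analysis itself the guessing probability comes from the semidefinite relaxation, which is not reproduced here.

```python
import math

def randomness_rate(p_valid: float, guessing_probability: float) -> float:
    """Rate p_valid * H with H = -log2(G), in bits per use of the device."""
    if not 0.0 < guessing_probability <= 1.0:
        raise ValueError("guessing probability must lie in (0, 1]")
    # log2(1/G) equals -log2(G) and avoids printing a negative zero when G = 1.
    return p_valid * math.log2(1.0 / guessing_probability)

# Hypothetical numbers, for illustration only.
rate_one_bit = randomness_rate(p_valid=0.01, guessing_probability=0.5)
rate_no_randomness = randomness_rate(p_valid=0.01, guessing_probability=1.0)

print(rate_one_bit)        # one bit per valid run, 1% of the runs valid
print(rate_no_randomness)  # the adversary always guesses correctly
```

Since an upper bound on G translates into a lower bound on the rate, feeding this function the value returned by a relaxation gives a certified rate, not the exact one.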

III. APPROXIMATING PHOTONIC EXPERIMENTS
The natural benchmark to test our tools are the correlations expected in a Bell experiment using spontaneous parametric down-conversion (SPDC). In the single-mode case, such a pulsed SPDC source produces a state of the form
$$|\Psi\rangle = c(g, \bar{g})\, \exp\left(\tanh g\; a_H^\dagger b_V^\dagger - \tanh \bar{g}\; a_V^\dagger b_H^\dagger\right) |0\rangle,$$
where $a_{H/V}$ ($b_{H/V}$) are polarization modes for Alice (Bob), $|0\rangle$ is the vacuum state, and $c(g, \bar{g}) = \sqrt{1 - \tanh^2 g}\, \sqrt{1 - \tanh^2 \bar{g}}$ for g, ḡ being the two squeezing parameters. The parties Alice and Bob can measure this state by placing two detectors after the usual set of wave plates and a polarization beam splitter. If the detectors do not resolve the number of incident photons, four cases can then be observed: no detection, a click in the first detector, a click in the second detector, or two clicks. In the following, we label a click in the first detector as 0, a click in the second detector as 1, and the case where either no detection or double detections are observed as ∅, so that each party effectively produces one of three possible outcomes. The statistics observed in this situation as a function of the polarization measurements and the detection efficiency (or equivalently the losses between the source and the detectors) are described in [24].
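Using the program (2), we are going to compute lower bounds on the extractable randomness that can be found in the presence of these statistics in the following cases:
• (a) All outcomes are considered (no post-selection), i.e. N = N_a = {} (the empty set).
• (b) Events in which both parties output ∅ are discarded, i.e. N = N_b = {(∅, ∅)}.
• (c) Events in which at least one party outputs ∅ are discarded, i.e. N = N_c = {(a, b) : a = ∅ or b = ∅}.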
For the sake of comparison, we will sometimes also consider the case in which the measurements are performed only when at least one photon pair is produced by the source, i.e.
• (h) The source is heralded.
An example of a heralded experiment is the recent one of Hensen et al. [25]. Note that in this particular case the state is encoded in a non-photonic system and always yields a detection whenever measured.
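The discarding strategies can be made concrete with a short sketch that computes the probability of a valid run for each choice of the invalid set N. It reads strategy (b) as discarding runs where both parties output ∅ and strategy (c) as discarding runs where at least one party does, as described later in the text. The distribution used here is a toy model introduced only for illustration (an anticorrelated pair emitted with probability nu, each photon detected independently with probability eta); it is not the SPDC statistics of [24].

```python
from itertools import product

NO_CLICK = "∅"  # also stands for double clicks, following the labelling of the text
OUTCOMES = (0, 1, NO_CLICK)

# Invalid sets N for the three discarding strategies.
INVALID = {
    "a": set(),
    "b": {(NO_CLICK, NO_CLICK)},
    "c": {(a, b) for a, b in product(OUTCOMES, repeat=2) if NO_CLICK in (a, b)},
}

def valid_probability(dist, strategy):
    """Sum of p(ab) over the pairs (a, b) that are not in the invalid set."""
    return sum(p for ab, p in dist.items() if ab not in INVALID[strategy])

def toy_distribution(nu, eta):
    """Toy model (an assumption): pair emitted w.p. nu, each photon detected w.p. eta."""
    dist = {ab: 0.0 for ab in product(OUTCOMES, repeat=2)}
    for a, b in ((0, 1), (1, 0)):
        dist[(a, b)] += nu * eta ** 2 / 2
        dist[(a, NO_CLICK)] += nu * eta * (1 - eta) / 2
        dist[(NO_CLICK, b)] += nu * eta * (1 - eta) / 2
    dist[(NO_CLICK, NO_CLICK)] += (1 - nu) + nu * (1 - eta) ** 2
    return dist

dist = toy_distribution(nu=0.01, eta=0.9)
for strategy in ("a", "b", "c"):
    print(strategy, valid_probability(dist, strategy))
```

In this toy model strategy (b) keeps about 1% of the runs and strategy (c) slightly fewer, which illustrates how short the post-selected string can be compared with the raw data.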

A. Perfect detectors, variable squeezing
We first consider the case of an experiment with no loss, and with unit efficiency detectors. In this case it seems natural to try to generate a maximally entangled state. We thus set g = ḡ and vary the squeezing g. Varying g can also be understood as changing the time window τ during which detectors are monitored. Indeed, the average number of photon pairs produced within this window is given by $\nu = \sinh^2 g + \sinh^2 \bar{g} = 2 \sinh^2 g$.
Figure 1 shows the randomness per run obtained when setting the polarization measurement according to the standard CHSH settings.The various discarding strategies yield different amounts of certified randomness, the largest amount being obtained using strategy (b).
One may be tempted to infer that, for randomness extraction, SPDC sources should be operated with a detection window at ν ∼ 0.6. However, this is the amount of randomness per run, not per time. For a given pump power, decreasing the window size τ decreases the average number of photon pairs in a proportional manner: ν ∝ τ. At the same time, the number of time windows increases as ∼ 1/τ ∝ 1/ν. If f(ν) denotes the randomness rate per time window, the randomness that can be certified in a given time interval is thus given, up to a constant factor, by f(ν)/ν. This quantity is plotted in the inset of Figure 1, where one can see that the total amount of randomness certified is larger when ν is small, i.e. when the time window τ is small. Therefore, in the asymptotic limit of infinitely many runs, one should set τ → 0 to get more randomness per time against an i.i.d. adversary. In this case, the observed data set is dominated by double no-detection events, which reinforces the relevance of our post-selection approach. The regime of small ν is also the regime in which optical experiments closing the detection loophole have been performed [11,27,28], for a different reason: in the presence of losses and imperfect detectors, the Bell violation disappears if too many pairs are created, while it is preserved in the limit of small windows [29].

FIG. 1. Randomness from an SPDC source when setting the polarization measurement according to the standard CHSH settings, as a function of the average number of photon pairs produced in each detection window. No losses and unit efficiency detectors are assumed. The qualitative shape of the curves can be understood as follows: for small g, the generated state contains mostly the vacuum; for large g, the source generates several pairs, which worsens the statistics [26]. Strategies (a), (b) and (c) certify various amounts of randomness. Here and in the following figures, all the curves are normalised to the same number of runs, namely the total number of runs. Inset: Randomness certified in a given time period when the length of a time window varies (and the number of time windows varies accordingly). This curve is obtained at constant pumping g.
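The change of normalisation from randomness per window to randomness per time can be sketched as below. The helper names are illustrative, and the per-window rate f(ν) is left as an input because it comes from the numerical optimization and is not reproduced here.

```python
import math

def pairs_per_window(g: float) -> float:
    """Mean number of photon pairs per window for g = g_bar: nu = 2 sinh^2(g)."""
    return 2.0 * math.sinh(g) ** 2

def squeezing_for(nu: float) -> float:
    """Inverse of pairs_per_window: squeezing g that yields nu pairs per window."""
    return math.asinh(math.sqrt(nu / 2.0))

def randomness_per_time(rate_per_window: float, nu: float) -> float:
    """f(nu)/nu: randomness certified per unit time, up to a constant factor."""
    return rate_per_window / nu

g = squeezing_for(0.6)
print(g, pairs_per_window(g))  # round trip back to nu = 0.6
```

Dividing by ν is all that distinguishes the inset of Figure 1 from the main plot, so a maximum of f(ν) at moderate ν is compatible with f(ν)/ν being largest as ν → 0.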

B. Imperfect detectors, small squeezing
For the reasons just mentioned, we focus now on g, ḡ ≪ 1 (i.e. small ν). In this case, a large number of no-detection events is expected. In spite of this, we are going to see that strategy (b) continues to perform better than the others. Concretely, we choose to fix the average number of photon pairs per detection window as ν = 0.01. The state produced by the source can be approximated to first order in g and ḡ by
$$|\Psi\rangle \approx c(g, \bar{g}) \left( |0\rangle + \tanh g\; a_H^\dagger b_V^\dagger |0\rangle - \tanh \bar{g}\; a_V^\dagger b_H^\dagger |0\rangle \right).$$
In analogy with the partially entangled state $\cos\theta\, |01\rangle - \sin\theta\, |10\rangle$, we define the entanglement parameter of the state as $\theta = \arctan(\tanh \bar{g} / \tanh g)$.
We now introduce finite detection efficiency η and study how the certification of randomness varies with this parameter. We consider two families of correlations. In the first, the two-photon state is maximally entangled, i.e. θ = π/4, and we fix the standard CHSH polarization measurements. The expected randomness per run as a function of η is shown in Figure 2. We note that no randomness can be extracted if η ≤ 82.8%, which is known to be the boundary at which those correlations can be explained with a local model exploiting the detection loophole. The second case is that of Eberhard's famous study [12], in which the entanglement parameter θ depends on the detector efficiency η, and Alice's measurements are parametrized by two angles $\alpha_0$, $\alpha_1$ which also depend on η. These parameters are chosen to optimize the violation of a lifting [30] of the CHSH inequality, in the case where exactly one pair of photons is measured, for each value of η. The resulting randomness rate is plotted in Figure 3. Again, no randomness can be extracted at or below the known detection loophole threshold, η ≤ 66.6%.

FIG. 2. Randomness from a singlet with finite detection efficiency. Curves (b) and (h) coincide almost perfectly and approach 0 at the detection loophole limit 0.828 [13].

FIG. 3. Curves (b) and (h) coincide and approach 0 at the Eberhard limit of 2/3 [12]. Two recent experiments used these Eberhard correlations. In Ref. [28], the overall efficiencies are estimated at 78.6% for Alice and 76.2% for Bob; in Ref. [27], at 74.7% for Alice and 75.6% for Bob. Thus, strategies (a) and (b) would extract a very similar (small) amount of randomness. If efficiencies are increased in the future, strategy (b) should be preferred.
In both cases we notice again that, within a numerical precision $\sim 10^{-5}$, strategy (b) certifies the largest amount of randomness and in fact recovers the result that one would obtain with a heralded source (h). The expected proportion of discarded events is $\sim (1 - \nu) + \nu (1 - \eta)^2$, which can be substantial: it is larger than 99% in our case for all η. Strategy (c), i.e. removing all events where some no-detection occurred, results in a clearly lower randomness per run; for efficiencies lower than 86% and 85% in the two cases, respectively, no randomness at all is certified. This kind of post-selection is thus too strong if one is interested in certifying an optimal amount of randomness. Strategy (a) certifies essentially the maximum amount of randomness for efficiencies η ≲ 90%, but would become suboptimal as efficiency increases.
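The size of the discarded fraction quoted above is easy to verify numerically. The sketch below simply evaluates the expression (1 − ν) + ν(1 − η)² at ν = 0.01; the function name is an illustrative choice.

```python
def discarded_fraction(nu: float, eta: float) -> float:
    """Approximate probability of a double no-detection: (1 - nu) + nu * (1 - eta)^2."""
    return (1.0 - nu) + nu * (1.0 - eta) ** 2

nu = 0.01
for eta in (0.70, 0.80, 0.90, 1.00):
    print(f"eta = {eta:.2f}: discarded fraction = {discarded_fraction(nu, eta):.4f}")
```

Whatever the efficiency, at least a fraction 1 − ν = 99% of the runs is discarded under strategy (b), so the string handed to the extractor is at most about one hundredth of the raw data.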

IV. UNDERSTANDING WHY ONE CERTIFIES MORE RANDOMNESS FROM A SUBSET OF DATA
Let us stress again that in Figures 1-3, all the curves are normalised to the same number of runs, the total one. Thus, they show that if a suitable small fraction of the symbols is processed, a strictly larger amount of total randomness can be certified, as compared to the case where all the symbols are processed. In order to shed light on this behavior, we consider a simplified model in which the source emits a perfect maximally-entangled state with probability ν, and the vacuum otherwise (in other words, compared to the previous section, we neglect completely the possibility of double detections in each party's measurement setup). We also work at perfect detection efficiency η = 1. The statistics observed with such a source can be written as
$$p(ab|xy) = \nu\, q(ab|xy) + (1 - \nu)\, \delta_{a,\emptyset}\, \delta_{b,\emptyset}, \tag{5}$$
where q(ab|xy) denotes the correlations obtained by measuring the maximally-entangled state. Notice that, for the source efficiency ν = 1/2, these correlations can be seen as the scrambled version of the two-day extreme situation mentioned in the introduction.
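The link with the two-day example can be checked by building the statistics of this simplified source explicitly and evaluating the CHSH expression with no-detections mapped to +1. The sketch below assumes the CHSH-optimal correlators ±1/√2 for the maximally entangled state, with the sign pattern chosen so that the ideal value is 2√2; the function names are illustrative.

```python
import math

NO_CLICK = "∅"
SIGN = {0: +1, 1: -1, NO_CLICK: +1}  # convention of the text: no-detection counts as +1

def singlet_chsh_correlations(x, y):
    """q(ab|xy) with uniform marginals and correlator (-1)^(x*y) / sqrt(2) (assumed settings)."""
    e_xy = (-1) ** (x * y) / math.sqrt(2.0)
    return {(a, b): (1 + SIGN[a] * SIGN[b] * e_xy) / 4 for a in (0, 1) for b in (0, 1)}

def source_statistics(nu, x, y):
    """p(ab|xy) = nu * q(ab|xy) + (1 - nu) * [a = b = no-click]."""
    dist = {ab: nu * q for ab, q in singlet_chsh_correlations(x, y).items()}
    dist[(NO_CLICK, NO_CLICK)] = 1.0 - nu
    return dist

def chsh_value(nu):
    """CHSH expression E00 + E01 + E10 - E11 evaluated on the source statistics."""
    total = 0.0
    for x in (0, 1):
        for y in (0, 1):
            dist = source_statistics(nu, x, y)
            corr = sum(SIGN[a] * SIGN[b] * p for (a, b), p in dist.items())
            total += (-1) ** (x * y) * corr
    return total

print(chsh_value(0.5), (2 * math.sqrt(2) + 2) / 2)
```

At ν = 1/2 the CHSH value coincides with the two-day average, although here the detection and no-detection runs alternate at random instead of being grouped by day.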
In Figure 4, we show how much randomness can be certified for these statistics when ν varies. Here, the lower bound on the randomness computed from the raw data is consistently lower than the one obtained after removing double no-detections from the data. In fact, after discarding double no-detections, one recovers the same amount of randomness that could be certified if the source were heralded (i.e. it is proportional to the source efficiency ν).
We thus recover the same behaviour as discussed in Section III B and in the two-day example of the introduction. If we don't consider it an overwhelmingly improbable fluctuation, the two-day example clearly suggests a non-i.i.d. process, for which the possibility of identifying two separate processes is easy to understand. Here, on the contrary, the statistics are manifestly i.i.d., and nevertheless the extraction of randomness based on the single-run frequencies p(ab|xy) can be improved. We are going to show that the cause is the same: because of the structure of the correlations, one can actually identify the presence of two distinct processes, and the post-selection of detection events happens to capture this fact. That the alternation between the two processes is done in an i.i.d. way, instead of a disruptive way as in the two-day example, eventually does not matter.

FIG. 4. Randomness from a singlet produced with finite probability ν, with η = 1. Curves (b) and (c) are identical, since there are no events with one detection and one no-detection in the raw data (the post-selection procedures (b) and (c) are actually the same for this correlation). Curve (h), which gives the randomness from the raw string of outcomes upon the heralding of a successful preparation of the state (i.e. randomness from the correlation (5)), exactly coincides with curves (b) and (c). Curve (a) lies below the other ones.
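All together, we can thus rewrite the optimization (2) as
$$\begin{aligned} G_{\bar{x}\bar{y}} = \max\ & \frac{1}{\nu} \sum_{a'b' \in V} \nu_{a'b'}\, q_{a'b'}(a'b'|\bar{x}\bar{y}) \\ \text{s.t.}\ & \sum_{a'b' \in V} \nu_{a'b'}\, q_{a'b'}(ab|xy) = \nu\, q(ab|xy), \\ & q_{a'b'}(ab|xy) \in Q, \end{aligned}$$
where Q denotes the set of normalized quantum correlations and the $\nu_{a'b'}$ are non-negative weights. Defining $\tilde{q}_{a'b'}(ab|xy) = \nu_{a'b'}/\nu \times q_{a'b'}(ab|xy)$, we can further rewrite it as
$$\begin{aligned} G_{\bar{x}\bar{y}} = \max\ & \sum_{a'b' \in V} \tilde{q}_{a'b'}(a'b'|\bar{x}\bar{y}) \\ \text{s.t.}\ & \sum_{a'b' \in V} \tilde{q}_{a'b'}(ab|xy) = q(ab|xy), \\ & \tilde{q}_{a'b'}(ab|xy) \in \tilde{Q}. \end{aligned}$$
This optimization is nothing but the one associated to a heralded source characterized by the correlations q(ab|xy), which explains why curve (h) of Figure 4 coincides with curves (b) and (c).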

V. GOING BEYOND I.I.D. FOR THE SOURCE
In this section, we are going to relax the i.i.d. assumption for the source. We won't be able to derive bounds for the extraction of randomness from the most general non-i.i.d. source. But we are going to provide two examples of non-i.i.d. strategies that are strictly more powerful than i.i.d. strategies even in the asymptotic limit of infinitely many runs. To our knowledge, this is a feature not found in previous works on randomness from Bell tests [5,31,32] or on quantum key distribution [33].
In the strategies we found, the adversary exploits the knowledge of whether each outcome is kept or discarded. As mentioned in Section II, it would definitely be reasonable not to reveal anything, but such a scenario may introduce other security concerns (e.g. the raw key is private conditional on some other information being kept private).
Specifically, suppose that the outcomes of run k are valid, i.e. they are kept for the raw key; the adversary would like to know their value. In a non-i.i.d. case, the fact of keeping or discarding the outcome at run k + 1, information which we assume the adversary will learn, may leak some information about the outcome that is kept at run k. This is similar to the argument of [34] against reusing QKD devices in the device-independent level of characterization [35]. Notice that this behaviour does not require the adversary to have tampered with the device in a malicious way; it may simply be a fabrication defect that the adversary is aware of. For instance, suppose that the detector corresponding to outcome 0 has an inordinately long jitter time compared to the other detector: if a detection happens at run k + 1, it means that the outcome at run k was 1; if there is no detection, the outcome at run k was most probably 0.

A. First example
The simplest example we found requires both Alice's and Bob's devices to depend on the previous inputs and outputs of both sides. Note that this is not in contradiction with the basic assumption in all device-independent protocols that the two boxes are non-communicating, since this assumption must only be verified during the measurement runs. Between measurement runs, however, boxes could in principle be free to communicate. For instance, before the measurement runs, the boxes may open a door within a small time interval to let in the incoming quantum systems generated by the source. Malicious boxes could take advantage of this interval to exchange the inputs and outputs obtained in previous runs. In the next subsection, we will present a more convoluted example that does not require signalling between the boxes, and which thus also works if measures are taken to ensure that the boxes do not exchange this kind of information between measurement runs.
Consider the i.i.d. correlations obtained when the parties measure a singlet with probability ν, and nothing with probability 1 − ν. We have encountered this situation in Section IV: for any ν > 0, some randomness remains in the non-discarded outcomes (see Figure 4).
In all existing protocols, the amount of randomness that is extracted is determined from a statistical test which is based on the input and output pair counts #(x, y) and #(a, b) (or simply on the relative outcome frequencies #(a, b)/#(x, y)). However, the same statistics obtained for ν = 2/5 can be obtained with high probability when measurements are always performed on a perfect singlet, but runs with double no-detections are artificially added by using the following non-i.i.d. rule [table: singlet outcomes (a, b) versus following runs, with first entry (0, 0)], where M means that a usual measurement is performed on the perfect singlet to determine the outcome of that run. In this case, counting the number of successive discarded events fully informs about the value of both parties' outcomes. Thus, in the non-i.i.d. case, and allowing signalling from one box to the other between measurement runs, no private randomness can be certified from a non-heralded source characterized by ν ≤ 2/5 (unless some more complicated processing beyond looking at simple outcome counts is done).
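A toy simulation illustrates how such a padding rule can leak the outcomes. The specific table `PADDING` below is a hypothetical choice made for this sketch, not necessarily the rule of the original table: each measured pair (a, b) is followed by 0, 1, 2 or 3 double no-detection runs. Because the pairs (0, 0) and (1, 1) are equally likely, as are (0, 1) and (1, 0), this choice inserts 3/2 padding runs per measurement on average, so that a fraction 1/(1 + 3/2) = 2/5 of the runs are measurements. The correlator value passed to the sampler is also only illustrative.

```python
import random

NO_CLICK = "∅"
# Hypothetical rule: number of double no-detection runs inserted after each measured pair.
PADDING = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3}
DECODE = {pad: ab for ab, pad in PADDING.items()}

def sample_singlet_outcome(rng, correlator):
    """Draw (a, b) with uniform marginals and E[(-1)^(a+b)] = correlator."""
    a = rng.randrange(2)
    same = rng.random() < (1 + correlator) / 2
    return (a, a if same else 1 - a)

def run_device(n_measurements, rng, correlator=-0.7071):
    """Return the measured pairs and the full record including the padding runs."""
    measured, record = [], []
    for _ in range(n_measurements):
        ab = sample_singlet_outcome(rng, correlator)
        measured.append(ab)
        record.append(ab)
        record.extend([(NO_CLICK, NO_CLICK)] * PADDING[ab])
    return measured, record

def adversary_guess(kept_flags):
    """Recover every kept pair from the keep/discard pattern alone (at least one kept run)."""
    guesses, gap = [], 0
    for kept in kept_flags[1:]:
        if kept:
            guesses.append(DECODE[gap])
            gap = 0
        else:
            gap += 1
    guesses.append(DECODE[gap])  # the last block extends to the end of the record
    return guesses

rng = random.Random(0)
measured, record = run_device(20000, rng)
kept_flags = [ab != (NO_CLICK, NO_CLICK) for ab in record]
nu_observed = len(measured) / len(record)

print(nu_observed)
print(adversary_guess(kept_flags) == measured)
```

The adversary never looks at the outcomes themselves: the lengths of the gaps between kept runs are enough to reconstruct them, while the observed fraction of measured runs stays close to 2/5.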

B. Second example
The second example was found numerically. It is admittedly hard to find a narrative justification for it, besides the general intuition given above; but we describe it in detail since, to our knowledge, it is the first example in which a non-i.i.d. strategy actually outperforms the i.i.d. ones in a Bell scenario in the asymptotic limit.
Resources. In each run, Alice and Bob share two binary variables λ, µ ∈ {0, 1} and one out of five quantum correlations that we denote by $P_j$ with j ∈ {1, 2, 3} and $P^\lambda$. These correlations are such that Alice's box has three outcomes {0, 1, ∅}, while Bob's box has only the two outcomes {0, 1}: in other words, information about previous outcomes will be leaked out by the detection or no-detection events of Alice's box. We can write these correlations in the form of Collins-Gisin tables [36]:
$$\begin{array}{c|cc} 1 & P_B(0|0) & P_B(0|1) \\ \hline P_A(0|0) & P(00|00) & P(00|01) \\ P_A(1|0) & P(10|00) & P(10|01) \\ P_A(0|1) & P(00|10) & P(00|11) \\ P_A(1|1) & P(10|10) & P(10|11) \end{array}$$
because by no-signaling it holds that $P(a1|xy) = P_A(a|x) - P(a0|xy)$ and $P(\emptyset b|xy) = P_B(b|y) - P(0b|xy) - P(1b|xy)$; and of course $\sum_a P_A(a|x) = \sum_b P_B(b|y) = 1$. The example that we find uses:

Protocol. One starts with one of the three $P_j$'s. As long as j = 1 or j = 2, the next round will also use one of the three $P_j$'s. When $P_3$ was chosen, the next box will be $P^\lambda$ with the value of λ available in that run. Besides, if Alice's outcome from $P_3$ was a = µ, in the next run Alice uses the box $P^\lambda$; if the outcome was a = 1 − µ, in the next run Alice ignores $P^\lambda$ and outputs ∅. After this, the process starts again by selecting one of the three $P_j$'s. Now, when x = 0, either outcome 0 or outcome 1 cannot occur for each potential correlation except $P_3$; and when $P_3$ is used, its outcome is fully leaked out in the next run by the information of whether the subsequent outcome is kept or not, since $P^\lambda(\emptyset|x) = 0$.
One can check, however, that it would not be possible to fully guess Alice's outcome if the same relative outcome frequencies as the ones generated by the above process were produced by devices behaving in an i.i.d. manner. For instance, let us specify $q_1 = 0.4097$, $q_2 = 0.4992$, $q_3 = 0.0911$ as the frequencies at which the $P_j$'s are chosen; and p(λ = 0) = 1 − p(λ = 1) = 0.0013, p(µ = 0) = p(µ = 1) = 1/2. The expected relative frequencies in the asymptotic limit are then peaked around the following values
$$P = \frac{q_1 P_1 + q_2 P_2 + q_3 P_3 + q_3 \left( p(\lambda = 0)\, (P^0 + P^0_B)/2 + p(\lambda = 1)\, (P^1 + P^1_B)/2 \right)}{q_1 + q_2 + 2 q_3} = \begin{array}{c|cc} 1 & 0.6919 & 0.5000 \\ \hline 0.2800 & 0.0716 & 0.0178 \\ 0.5000 & 0.4681 & 0.3722 \\ 0.2800 & 0.0716 & 0.2621 \\ 0.5000 & 0.4681 & 0.1279 \end{array}$$
where $P^\lambda_B$ denotes the correlations obtained when Bob uses $P^\lambda$ and Alice outputs ∅. Applying our i.i.d. programme to these correlations, one can show that in case Alice uses x = 0 and the run is not discarded, the guessing probability on her outcome is upper-bounded by 0.9874.
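The bookkeeping behind the normalisation q_1 + q_2 + 2 q_3 and the meaning of the bound 0.9874 can be sketched as follows. The variable names are illustrative, and the statement that Alice outputs ∅ in half of the follow-up runs assumes, as in the protocol description, that µ is uniform and independent of her outcome.

```python
import math

q = {1: 0.4097, 2: 0.4992, 3: 0.0911}  # frequencies at which P_1, P_2, P_3 are chosen
runs_per_choice = {1: 1, 2: 1, 3: 2}   # a use of P_3 is always followed by one extra run

norm = sum(q[j] * runs_per_choice[j] for j in q)  # q1 + q2 + 2 q3
fraction_follow_up = q[3] / norm                   # share of runs that follow a use of P_3
fraction_forced_silent = fraction_follow_up / 2    # a = 1 - mu occurs half of the time

iid_guess_bound = 0.9874                           # upper bound quoted in the text
min_entropy_iid = -math.log2(iid_guess_bound)      # bits per kept run with x = 0

print(norm, fraction_follow_up, fraction_forced_silent)
print(min_entropy_iid)
```

An i.i.d. analysis of the same frequencies would thus certify a small but strictly positive min-entropy per kept run, whereas in the non-i.i.d. protocol the adversary learns Alice's outcome with certainty.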

VI. CONCLUSION
This work stems from the general remark that randomness extraction does not need to be performed on all of the raw data and can be done by blocks, or on a subset of data.In the context of randomness certification by Bell inequalities, we have investigated in a simple scenario whether this could provide an advantage when post-selecting detection events, which is relevant for pho-tonics Bell tests.Because we estimate the randomness present in a subset of data conditioned on the knowledge of the whole set of data, this certification does not open the detection loophole.
Naively, one could think a priori that "full detection" events, where a detection happens on both sides, are the most important for randomness certification and that discarding all other events would affect the randomness rate only negligibly. However, our findings show for several physically motivated models of the observed statistics that this is not the case. In particular, Figure 2 and Figure 3 show that the resistance to detection inefficiencies is substantially lower (up to 20% for the scenario of Figure 3) when the post-selected data does not contain any occurrence of a no-detection event.
The physical intuition that the double no-detection events contain almost no randomness is, however, vindicated. In some cases, the post-selection actually helps identify a better way of reading the data. From a practical perspective, our work suggests the possibility of hashing a small post-selected subset of the original data, thereby reducing the needed seed length and, ultimately, the computational time. However, one should still embed this idea within a full randomness certification protocol, in particular one that can deal with finite statistics and non-i.i.d. devices.
Regarding this last point, our numerical results in an i.i.d. setting vindicate the physical intuition that double no-detection events can safely be discarded. This should, however, be contrasted with the example of Section V, in which we prove that non-i.i.d. strategies outperform i.i.d. ones even in the asymptotic limit of infinitely many runs, something that had not been reported previously in the context of Bell inequalities. Whether these strategies are actually harmful in a more general and realistic case remains to be determined.
In particular, we recall that, for simplicity, we have performed our analysis assuming that the adversary learns which runs are kept and which ones are discarded in the post-selection. This scenario is rather artificial for randomness generation, insofar as the two boxes of the Bell experiment do not need to be in separate labs. Relaxing this assumption could increase the randomness rate and the security of the final string. Specifically, the non-i.i.d. attacks of Section V would no longer apply in this case.

\[
\frac{1}{p_{\bar{x}\bar{y}}} \sum_{ab \in V} |ab\rangle\langle ab| \otimes \rho^{ab}_E ,
\qquad \text{where} \qquad
\rho^{ab}_E = \mathrm{tr}_{AB}\!\left[ \left( M_{a|\bar{x}} \otimes M_{b|\bar{y}} \otimes I \right) |\Psi\rangle\langle\Psi| \right] .
\]

The probability that the adversary then makes a correct guess e = (a, b) of Alice and Bob's outputs a, b by performing a measurement z on his system is, averaged over Alice and Bob's possible outputs,

\[
\frac{1}{p_{\bar{x}\bar{y}}} \sum_{ab \in V} \mathrm{tr}\!\left[ M_{ab|z}\, \rho^{ab}_E \right]
= \frac{1}{p_{\bar{x}\bar{y}}} \sum_{ab \in V} \langle\Psi| M_{a|\bar{x}} \otimes M_{b|\bar{y}} \otimes M_{ab|z} |\Psi\rangle .
\]

To determine the maximal value of this guessing probability, we should maximize it over all quantum realizations R = (|Ψ⟩, {M_{a|x}}, {M_{b|y}}, {M_{e|z}}) compatible with the given marginals p(ab|xy) characterizing Alice and Bob's devices. We thus have

FIG. 2. Randomness from a singlet with finite detection efficiency. Curves (b) and (h) coincide almost perfectly and approach 0 at the detection loophole limit 0.828 [13].

FIG. 3. Randomness from Eberhard correlations. Curves (b) and (h) coincide and approach 0 at the Eberhard limit of 2/3 [12]. Two recent experiments used these Eberhard correlations. In Ref. [28], the overall efficiencies are estimated at 78.6% for Alice and 76.2% for Bob; in Ref. [27], at 74.7% for Alice and 75.6% for Bob. Thus, strategies (a) and (b) would extract a very similar (small) amount of randomness. If efficiencies are increased in the future, strategy (b) should be preferred.
FIG. 4. Randomness from a singlet produced with finite probability ν, with η = 1. Curves (b) and (c) are identical, since the raw data contain no events with one detection and one no-detection (the post-selection procedures (b) and (c) are actually the same for this correlation). Curve (h), which gives the randomness from the raw string of outcomes upon the heralding of a successful preparation of the state (i.e. randomness from the correlation 5), exactly coincides with curves (b) and (c). Curve (a) lies below the other ones.