Concise and tight security analysis of the Bennett–Brassard 1984 protocol with finite key lengths

We present a tight security analysis of the Bennett–Brassard 1984 protocol taking into account the finite-size effect of key distillation and achieving unconditional security. We begin by presenting a concise analysis utilizing the normal approximation of the hypergeometric function. Next we show that a similar tight bound can also be obtained by a rigorous argument without relying on any approximation. In particular, for the convenience of experimentalists who wish to evaluate the security of their quantum key distribution systems, we also give the explicit procedures of our key distillation and show how to calculate the secret key rate and the security parameter from a given set of experimental parameters. In addition to the exact values of key rates and security parameters, we also describe how to obtain their rough estimates using the normal approximation.


Introduction
The finite-size effect is an important issue in practical quantum key distribution (QKD) systems. The first detailed finite-size analysis for general coherent attacks was given by Hayashi [1] using the normal approximation. Later, Scarani and Renner [2] gave a simple analysis based on the quantum de Finetti theorem, but their results are valid only against collective attacks. Matsumoto and Uyematsu also gave a simple analysis [3], but again, essentially valid only for collective attacks. Later, Tomamichel et al. [4] gave a tighter bound with unconditional security by using the uncertainty relations (see, e.g., [5,6]).
In this paper, we present a concise analysis for the Bennett-Brassard 1984 (BB84) protocol [7] that takes the finite key effect into account and yields better key generation rates, with and without relying on the normal approximation. Our analysis is valid for general coherent attacks, and thus our results guarantee unconditional security. For the sake of simplicity, we consider the case where the sender, Alice, has a perfect single photon source. We also assume that Alice and the receiver, Bob, calculate an upper bound on the phase error rate of a sifted key from that of the corresponding sample bits; hence the key generation rate can vary each time Alice and Bob run the protocol.
Throughout the paper we use the security criteria with universal composability, the same criteria as used by many researchers, particularly by Renner and his coworkers [8,9]. Hence our final goal is to show that the trace distance between the actual and the ideal states can be bounded from above. However, in the mathematical analysis for obtaining upper bounds on the trace distance, we do not use Renner's approach based on the smooth minimum entropy [8]. Instead, we bound the trace distance using the argument by Shor and Preskill [10], as well as its modification by Hayashi [1]. In Section 3, by using these formalisms, we show that the trace distance can be bounded by using the decoding error probability $P_{\mathrm{ph}}$ of the virtual phase error correction; in other words, the universally composable security can be guaranteed by bounding $P_{\mathrm{ph}}$. To the best of our knowledge, our argument here is the first rigorous treatment of the universally composable security based on the Shor-Preskill formalism, applicable to linear universal hash functions with variable final key lengths.
As we shall also discuss at the end of Section 3, in order to achieve high key generation rates and strong bounds on $P_{\mathrm{ph}}$ simultaneously, it is crucial to estimate the phase error rate $p_{\mathrm{sft}}$ of the sifted key with high accuracy. Note here that the quantity $p_{\mathrm{sft}}$ cannot be measured directly in the BB84 protocol. Hence in Section 4, we solve an interval estimation problem on $p_{\mathrm{sft}}$ using the hypergeometric distribution $P_{\mathrm{hg}}$. Then by using the obtained result, we give explicit bounds on $P_{\mathrm{ph}}$ in Section 5. In particular, in order to clarify the argument, we present two versions of the analysis: we first derive simple bounds that we call the straightforward bounds (Propositions 1 and 2), and then give more complicated bounds called the Gaussian bounds (Theorems 2 and 3), which yield a better final key rate if the raw key is sufficiently large. For both types of bounds, we first present a simple analysis based on the normal approximation of the hypergeometric distribution (Proposition 1 and Theorem 2), and then show that a similarly tight bound can also be obtained by a rigorous argument without relying on any approximation (Proposition 2 and Theorem 3).
Since this paper is not aimed only at theorists, but also at experimentalists who wish to evaluate the security of their QKD systems, we include explicit procedures of security evaluation. We begin in Section 2 by explaining explicit procedures of our key distillation. Then after theoretical arguments of the security, we demonstrate in Section 6 how to use our theorems to calculate the secret key rate and the security parameter (i.e., an upper bound on the trace distance) from a given set of experimental parameters. Besides the exact values of key rates and security parameters, we also present how to obtain their rough estimates using the normal approximation.
In order to show that our rates are indeed better than those in the existing literature, e.g., Refs. [2,4], we draw in Section 7 example curves of key generation rates (Figs. 1 and 2). There are several reasons for this improvement. First, our upper bounds are close to the approximated value of the hypergeometric distribution obtained by the normal approximation, while the existing results [2,4] did not discuss the closeness to the normal approximation. Second, in our method, the adversary's information is estimated in terms of the Shannon entropy, whereas Refs. [2,4] use the minimum entropy, which is a lower bound on the Shannon entropy. Finally, we use an error margin that depends on the measured error rates of sample bits, while in Refs. [2,4] the margin is a constant.
We also treat the sacrifice bit length with the second order coding rate, which has attracted attention in the information theory community [11,12,13]. The conventional asymptotic theory treats the coding length with the first order coefficient. It is impossible to treat the approximation value of the best error probability with the first order coefficient of the coding length. However, it becomes possible if we consider the coding length up to the second order coefficient. In this paper, we derive an asymptotic approximation value of the upper bound of the universally composable security criterion when the sacrifice bit length is given in the form $n h(p_{\mathrm{smp}}) + \sqrt{n}\, g(p_{\mathrm{smp}})$ with the measured phase error rate, where a function $g(p_{\mathrm{smp}})$ of $p_{\mathrm{smp}}$ will be given in a concrete form in Section 4 (Theorem 4).
The differences from our previous papers are as follows. In Ref. [1], Hayashi simply approximated the hypergeometric distribution by the normal distribution having the same variance, without showing its validity. In this paper, we present a rigorous analysis without relying on any approximation (Proposition 2 and Theorem 3), by using upper bounds on the hypergeometric distribution obtained from Stirling's formula and inequalities proved in Refs. [14,15]. As mentioned above, we also include the first rigorous treatment of the universally composable security based on the Shor-Preskill formalism, applicable to linear universal hash functions with variable final key lengths.

Description of Our QKD Protocol
We consider the following type of the BB84 protocol. This protocol differs from existing versions (e.g., [1,2,3]) only in the phase estimation and the privacy amplification steps.
Generation of a Sifted Key and Sample Bits
Alice and Bob start the protocol with a quantum communication and obtain a sifted key of n bits and sample bits of l bits. Here we assume that raw key bits are chosen from the uniform distribution. The sample bits must be selected randomly, and a sifted key and the sample bits must be measured in different bases.
For example, suppose that Alice and Bob exchange N qubits, choosing the x basis with probability q, and the z basis with probability 1 − q. Then, on average, $Nq^2$ bits coincide in the x basis, and $N(1-q)^2$ in the z basis. By assigning the x basis for a sifted key, and the z basis for sample bits, they have $n = Nq^2$, $l = N(1-q)^2$.
Bit Error Correction
Bob corrects bit errors in his sifted key using a linear error correcting code. For example, as in Shor-Preskill's case [10], Alice may announce a random bit string XORed with her sifted key; or alternatively, as in Koashi's case [16], she may send a syndrome of her sifted key encrypted with a previously shared secret key. In either case, Alice and Bob end up with $n(1 - f h(p_{\mathrm{bit}}))$ bits of reconciled key $k_{\mathrm{rec}}$, where $p_{\mathrm{bit}}$ is the bit error rate of the sifted key. Here h(x) is the binary entropy function defined as $h(x) := -x \log_2 x - (1-x)\log_2(1-x)$, and the value f corresponds to the efficiency of the error correcting code used. For practical codes, $f \simeq 1.1$. It should be noted that here the sizes of bit error correcting codes are independent of the security, and thus Alice and Bob may perform bit error correction by dividing a sifted key $k_{\mathrm{sif}}$ of n bits into arbitrarily small blocks.
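As a rough numerical illustration of the lengths above, the following sketch evaluates the average sifted-key and sample lengths and the reconciled key length. The function names and the parameter values (N, q, p_bit) are hypothetical choices made for illustration only; f = 1.1 is the practical efficiency quoted in the text.

```python
import math

def binary_entropy(x):
    """h(x) = -x log2 x - (1 - x) log2 (1 - x), with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def expected_lengths(N, q):
    """Average sifted-key length n = N q^2 and sample length l = N (1 - q)^2."""
    return N * q ** 2, N * (1.0 - q) ** 2

def reconciled_key_length(n, p_bit, f=1.1):
    """Length n (1 - f h(p_bit)) of the reconciled key k_rec."""
    return n * (1.0 - f * binary_entropy(p_bit))

# Illustrative (assumed) parameters: 10^6 exchanged qubits, x basis with q = 0.9.
n, l = expected_lengths(N=1_000_000, q=0.9)
k_rec_len = reconciled_key_length(n, p_bit=0.02)
```

A biased basis choice (q close to 1) makes the sifted key much longer than the sample, at the price of a smaller l and hence larger statistical fluctuations in the phase error estimate discussed below.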
In many cases, one needs to guarantee the correctness of the shared keys, that is, one has to minimize the probability $\epsilon_{\mathrm{cor}}$ that Alice's and Bob's secret keys do not match and the protocol does not abort. One way of minimizing $\epsilon_{\mathrm{cor}}$ is that Alice calculates an r-bit hash value of her reconciled key $k_{\mathrm{rec}}$ using universal$_2$ hash functions. Then she encrypts it with the one-time pad using a previously shared secret key, and sends it to Bob. Bob also calculates his own hash value, and if it does not match Alice's, they abort the protocol. By doing this, we have $\epsilon_{\mathrm{cor}} \le 2^{-r}$.
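Since the correctness bound is $2^{-r}$, the hash length needed for a target correctness level follows directly. A minimal sketch, in which the function name and the target value are illustrative assumptions:

```python
import math

def hash_bits_for_correctness(eps_cor_target):
    """Smallest integer r with 2**(-r) <= eps_cor_target."""
    if not 0.0 < eps_cor_target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    return math.ceil(-math.log2(eps_cor_target))

# Assumed target: the keys differ undetected with probability at most 1e-10.
r = hash_bits_for_correctness(1e-10)
```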

Estimation of the number of phase errors in the channel
In order to use privacy amplification properly and guarantee the security of a secret key, Alice and Bob need to know an upper bound on the number of phase errors occurring in the channel. It should be noted here that the phase error is a completely different concept from the bit error mentioned above (for details, see Section 3). Since the phase error rate cannot be measured directly in practical QKD systems, we estimate its upper bound from the measured error rate of samples.
We denote the number of bit errors occurring in the sample bits by c, and the corresponding bit error rate by $p_{\mathrm{smp}}(c) := c/l$. We also call the union of a sifted key and the sample bits total bits, and denote the number of their bit errors by k. Hence the error rate of total bits is given by $p(k) := k/(n+l)$, and that of a sifted key by $p_{\mathrm{sft}}(k,c) = (k-c)/n$. Note here that measuring c corresponds to randomly sampling phase errors in the total bits, because a sifted key and the samples are measured in different bases. Due to this fact, the measured value of $p_{\mathrm{smp}}(c)$ is used to estimate an upper bound on $p_{\mathrm{sft}}(k,c)$. In the asymptotic limit $n, l \to \infty$, Alice and Bob may assume $p_{\mathrm{sft}}(k,c) = p_{\mathrm{smp}}(c)$. In practical QKD systems, however, the two values differ in general due to statistical fluctuations. Thus they obtain a statistically estimated upper bound of $p_{\mathrm{sft}}(k,c)$ as a function of the measured value c, which we denote by $\hat{p}_{\mathrm{sft}}(c)$. Throughout the paper, we make it a rule to denote an estimated upper bound of a random variable v by $\hat{v}$. The explicit functional form of $\hat{p}_{\mathrm{sft},\varepsilon}(c)$ is discussed later, and is given in Eq. (25).
Privacy Amplification (PA)
The estimated phase error rate $\hat{p}_{\mathrm{sft}}(c)$ can be used to obtain an upper bound on the amount of information that is leaked to Eve. In order to cancel Eve's information, Alice and Bob perform a classical data processing called privacy amplification on the reconciled key $k_{\mathrm{rec}}$ to generate the secret key $k_{\mathrm{sec}}$; very roughly speaking, PA randomizes and shrinks $k_{\mathrm{rec}}$ so that Eve's information is canceled by the remaining fraction that is unknown to Eve. The number of bits to be reduced in this process (sacrifice bits) is determined from $\hat{p}_{\mathrm{sft}}(c)$ in the following manner.
We set two limits c min and c max (c min ≤ c max ) on the sample bit error c, depending on which Alice and Bob change their procedures.
• If c max < c, Alice and Bob abort the protocol.
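• If $c_{\min} \le c \le c_{\max}$, Alice and Bob generate a secret key as the hash value of their sifted key by using a linear and surjective universal$_2$ hash function. The number $\alpha(c)$ of sacrifice bits, i.e., the number of bits reduced in PA, is given by
$$\alpha(c) := \lceil n\, h(\hat{p}_{\mathrm{sft},\varepsilon}(c+2)) \rceil + D. \quad (1)$$
Here $\lceil x \rceil$ denotes the smallest integer larger than or equal to x. Hence, as a result, they obtain a secret key of length
$$G := n[1 - f h(p_{\mathrm{bit}})] - \alpha(c). \quad (2)$$
Note that the key length G of (2) differs from the asymptotic case ($l, n \to \infty$) essentially only in the definition of the phase error rate $\hat{p}_{\mathrm{sft},\varepsilon}(c+2)$. Hence the estimation of $\hat{p}_{\mathrm{sft},\varepsilon}(c+2)$ is the key point of our finite size analysis.
• If $c < c_{\min}$, Alice and Bob generate a secret key in the same way as above, except that they sacrifice $\alpha(c) = \lceil n\, h(\hat{p}_{\mathrm{sft},\varepsilon}(c_{\min}+2)) \rceil + D$ bits for PA. As a result, they obtain a secret key of length G given by (2) with this value of $\alpha(c)$.
Alternatively, we can combine these three cases as follows: Define the sacrificed bit length $\alpha(c)$ to be
$$\alpha(c) := \lceil n\, h(\hat{p}_{\mathrm{sft},\varepsilon}(\max[c, c_{\min}]+2)) \rceil + D.$$
If $c \le c_{\max}$, Alice and Bob sacrifice $\alpha(c)$ bits for PA and obtain the final key of length G given by (2). If $c > c_{\max}$, they abort the protocol.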
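The three cases above can be sketched as a single function. This is a minimal illustration, assuming the sacrifice length has the form ⌈n h(p̂(c + 2))⌉ + D stated for the case c < c_min, with c replaced by max(c, c_min). The estimator is passed in as a callable; the `toy_estimator` used here is a hypothetical placeholder and not the estimate derived in Section 4, and all numerical values are assumptions.

```python
import math

def binary_entropy(x):
    """h(x) = -x log2 x - (1 - x) log2 (1 - x), with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def sacrifice_bits(c, n, D, c_min, c_max, p_hat_sft):
    """Return alpha(c), or None when the protocol aborts (c > c_max)."""
    if c > c_max:
        return None
    c_eff = max(c, c_min)  # small error counts are treated as c_min
    return math.ceil(n * binary_entropy(p_hat_sft(c_eff + 2))) + D

# Hypothetical stand-in estimator: sample error rate plus a fixed margin.
l = 1000
def toy_estimator(c):
    return c / l + 0.01

params = dict(n=1000, D=10, c_min=5, c_max=50, p_hat_sft=toy_estimator)
alpha_low = sacrifice_bits(3, **params)      # c < c_min
alpha_at_min = sacrifice_bits(5, **params)   # c = c_min
alpha_mid = sacrifice_bits(20, **params)     # c_min <= c <= c_max
alpha_abort = sacrifice_bits(60, **params)   # c > c_max
```

Clamping c to c_min means that unusually clean samples do not shorten the sacrifice length below its value at c_min.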
In practice, the most efficient implementation of PA is to use Toeplitz matrices: Alice and Bob select a bit-valued Toeplitz matrix M randomly by communicating over the public channel, multiply it with a reconciled key $k_{\mathrm{rec}}$ modulo 2, and obtain the secret key $k_{\mathrm{sec}} = M k_{\mathrm{rec}}$ (for details, see, e.g., [8,17,18]).
In this paper, we additionally require surjectivity for all hash functions. To the best of our knowledge, the most efficient implementation of linear and surjective universal$_2$ functions is by using the modified Toeplitz matrix, introduced in [1,17]; in this case we replace M above by a concatenation $M' = (I, T)$ of the (square) identity matrix I and a Toeplitz matrix T. Note that this modification $M'$ is slightly more efficient than M above. Also note that unlike $M'$, the normal Toeplitz matrix M gives a non-surjective map with a very small but nonzero probability; e.g., for M being an all-zero or all-one matrix.
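The construction $M' = (I, T)$ can be sketched in a few lines. This is a naive pure-Python illustration, not an optimized implementation; the function names are hypothetical, and the random bits are drawn locally with the standard `secrets` module, whereas in the protocol the matrix is agreed upon over the public channel.

```python
import secrets

def random_toeplitz(rows, cols):
    """Random binary Toeplitz matrix: entry (i, j) depends only on i - j."""
    diag = [secrets.randbits(1) for _ in range(rows + cols - 1)]
    return [[diag[i - j + cols - 1] for j in range(cols)] for i in range(rows)]

def modified_toeplitz(m, n_rec):
    """M' = (I, T): m x m identity followed by an m x (n_rec - m) Toeplitz block."""
    if not 0 < m < n_rec:
        raise ValueError("need 0 < m < n_rec")
    T = random_toeplitz(m, n_rec - m)
    return [[1 if i == j else 0 for j in range(m)] + T[i] for i in range(m)]

def hash_mod2(M, key):
    """Matrix-vector product over GF(2): k_sec = M k_rec."""
    return [sum(a & b for a, b in zip(row, key)) % 2 for row in M]

# Assumed toy sizes: an 8-bit reconciled key hashed down to 3 bits.
M_prime = modified_toeplitz(m=3, n_rec=8)
k_rec = [secrets.randbits(1) for _ in range(8)]
k_sec = hash_mod2(M_prime, k_rec)
```

Because the first m columns form the identity, every m-bit output is reached by some input, so the map is surjective for every choice of T.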
It should be noted here that, unlike in bit error correction, one is not allowed to perform PA by dividing $k_{\mathrm{rec}}$ and $k_{\mathrm{sec}}$ into smaller blocks, because doing so will destroy the universal$_2$ property of the (modified) Toeplitz matrix. Also note here that both key lengths, $|k_{\mathrm{rec}}| = n[1 - f h(p_{\mathrm{bit}})]$ and $|k_{\mathrm{sec}}| = G$, are of order $O(n)$. If one applies a naive multiplication algorithm, the computational complexity (i.e., the processing time) increases as $O(n^2)$ (i.e., $O(n)$ per bit), and thus becomes a severe bottleneck of the key distillation. This is in fact the most explicit impact of the finite size effect on practical QKD systems.
One way around this problem is to use an efficient multiplication algorithm for a Toeplitz matrix and a vector exploiting the fast Fourier transform (FFT) algorithm (see, e.g., [19]). The complexity of this efficient algorithm scales as $O(n \log n)$, or $O(\log n)$ per bit, which can be regarded as a constant in practice. An actual implementation shows that the throughput exceeds 1 Mbps for $|k_{\mathrm{rec}}| = 10^6$ in software, as demonstrated, e.g., in Ref. [18].

Security Criteria of the BB84 Protocol in the finite case

The security of QKD with universal composability
We employ the definition of the security of QKD with universal composability in the variable length case [20]. In order to guarantee the security for our protocol, we need to evaluate the security criteria with universal composability after the privacy amplification [9]. In this paper, we apply the above definition with the variable length case to the final state after the privacy amplification [21].
For this purpose, we describe all public information by x, including the choice of a hash function (which corresponds, e.g., to "f" of [9]), and the length of the final key (e.g., "m" of [20]). However, here we do not restrict ourselves to those two cases; x may contain other public information, e.g., the choice of a code for bit error correction. Hence the length m of the final key is of course a function of x. We denote the probability distribution of x in the actual protocol by $P_{\mathrm{pub}}(x)$.
Then we consider the Hilbert space $\mathcal{H}_A \otimes \mathcal{H}_E \otimes \mathcal{H}_X$, consisting of Alice's final key $\mathcal{H}_A$, Eve's system $\mathcal{H}_E$, and the public information $\mathcal{H}_X$. We define $\mathcal{H}_A = (\mathbb{C}^2)^{\otimes M}$ with M sufficiently large, so that when $m(x) < M$, Alice uses the (preassigned) subspace of $\mathcal{H}_A$. Also, following [8], we define the composite system of E and X to be $E'$, i.e., $\mathcal{H}_{E'} = \mathcal{H}_E \otimes \mathcal{H}_X$. We denote by $\rho_{A,E|x}$ the state of Alice and Eve after privacy amplification, conditioned on public information x. Hence, the state after privacy amplification takes the form $\rho_{A,E'} = \sum_x P_{\mathrm{pub}}(x)\, \rho_{A,E|x} \otimes |x\rangle\langle x|$.
In this notation, we consider conditional probabilities with respect to the length m of the final key. The actual protocol generates the final key of m bits with probability $P_{\mathrm{len}}(m) := \sum_{x: m(x)=m} P_{\mathrm{pub}}(x)$. The public information x obeys the conditional distribution $P_{\mathrm{pub}}(x|m) := P_{\mathrm{pub}}(x)/P_{\mathrm{len}}(m)$; hence the conditional actual state given m is the density matrix $\rho_{A,E'|m} := \sum_{x: m(x)=m} P_{\mathrm{pub}}(x|m)\, \rho_{A,E|x} \otimes |x\rangle\langle x|$. The corresponding ideal state given m is defined to be $\rho_{\mathrm{Ideal}|m} := \rho^{\mathrm{mix}}_{A|m} \otimes \rho_{E'|m}$, where $\rho^{\mathrm{mix}}_{A|m}$ is the completely mixed state in the m-qubit subsystem of $\mathcal{H}_A$, and $\rho_{E'|m} := \mathrm{Tr}_A\, \rho_{A,E'|m}$. Thus, under the condition that the final key length is m, the universally composable security can be guaranteed by bounding the trace distance of these two states, i.e., $\|\rho_{A,E'|m} - \rho_{\mathrm{Ideal}|m}\|_1$ [9].
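Parameter m is a random variable in our protocol; hence following [20], we define the universally composable security by bounding the average trace distance $\sum_m P_{\mathrm{len}}(m)\, \|\rho_{A,E'|m} - \rho_{\mathrm{Ideal}|m}\|_1$. In this case, it is convenient to define $\rho_{\mathrm{Ideal}} := \sum_m P_{\mathrm{len}}(m)\, \rho_{\mathrm{Ideal}|m}$. Then the average trace distance can be rewritten as
$$\sum_m P_{\mathrm{len}}(m)\, \|\rho_{A,E'|m} - \rho_{\mathrm{Ideal}|m}\|_1 = \|\rho_{A,E'} - \rho_{\mathrm{Ideal}}\|_1 = \sum_x P_{\mathrm{pub}}(x)\, \|\rho_{A,E|x} - \rho^{\mathrm{mix}}_{A|m(x)} \otimes \rho_{E|x}\|_1 \quad (3)$$
$$\le \sum_x P_{\mathrm{pub}}(x)\, \|\rho_{A,E|x} - \rho_{A|x} \otimes \rho_{E|x}\|_1 \quad (4)$$
$$\quad + \sum_x P_{\mathrm{pub}}(x)\, \|\rho_{A|x} - \rho^{\mathrm{mix}}_{A|m(x)}\|_1, \quad (5)$$
where $\rho_{A|x} := \mathrm{Tr}_E\, \rho_{A,E|x}$. Hence one may instead bound the sum of the second and the third lines. Here we used the fact that $\rho_{A,E'} = \sum_x P_{\mathrm{pub}}(x)\, \rho_{A,E|x} \otimes |x\rangle\langle x| = \sum_m P_{\mathrm{len}}(m)\, \rho_{A,E'|m}$ for the first equality, and $\rho_{E'|m} = \sum_{x: m(x)=m} P_{\mathrm{pub}}(x|m)\, \rho_{E|x} \otimes |x\rangle\langle x|$ for the second equality; the inequality is the triangle inequality. The quantity of (5) measures the non-uniformity of Alice's final key; i.e., it gives the averaged distance between Alice's partial state $\rho_{A|x}$ and the ideally mixed state $\rho^{\mathrm{mix}}_{A|m(x)}$. Note that these two states are equal when Alice and Bob choose a surjective hash function, because we assume that Alice's raw key obeys the uniform distribution. In particular, if Alice and Bob use a hash function family which consists only of surjective functions (such as the modified Toeplitz matrices [1,17] mentioned in the previous section), it suffices to bound (4) only.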

Decoding error probability of the virtual phase error correction
We believe that the above definition of security based on the trace distance is the same as the one used by Renner and others [8,9]. Throughout the paper we employ this definition of security. However, in the remaining part where we actually obtain upper bounds on the trace distance, we do not use Renner's approach based on the smooth minimum entropy [8]. Instead, we bound the trace distance $\|\rho_{A,E|x} - \rho_{A|x} \otimes \rho_{E|x}\|_1$ appearing in (4) using the well-known argument by Shor and Preskill [10], as well as its modification by Hayashi [1]. As we shall see shortly, in these formalisms, the trace distance is bounded from above by using the decoding error probability of the (virtual) phase error correction (i.e., the probability that the virtual decoding algorithm fails to give a correct answer), which can be identified with the privacy amplification in the actual protocol.
The first step of the proof is to consider a virtual protocol where Alice and Bob correct bit errors as well as phase errors occurring in the quantum channel (under Eve's influence) by using the Calderbank-Shor-Steane (CSS) code. By correcting these two types of errors, Alice and Bob can guarantee that their virtual channel (obtained as a result of quantum error correction) is noiseless and decoupled from Eve; thus the key they exchange there is unconditionally secure. The second step of the proof is to note that, from Eve's viewpoint, this virtual protocol is completely indistinguishable from the actual protocol. By using this indistinguishability, the security of the actual protocol follows automatically from that of the virtual protocol.
In these formalisms, phase error correction in the virtual protocol is transformed to a simple classical data processing in the actual protocol. That is, Alice and Bob do not need to perform phase error correction in the actual protocol; instead it suffices to perform a projection often called privacy amplification (PA). This is why we often identify PA with the virtual phase error correction in this paper +. (In Ref. [17], we have shown that the projection $C_1 \to C_1/C_2$ can be replaced by an ε-almost dual universal$_2$ hash function family.) The original argument of Shor and Preskill was later improved in Refs. [22,23], where it was shown that the virtual phase error correction and the bit error correction can be discussed separately. In fact the virtual phase error correction is essential for guaranteeing security, while the bit error correction is necessary only for equalizing Alice's and Bob's final keys. As a result of this observation, the trace distance $\|\rho_{A,E|x} - \rho_{A|x} \otimes \rho_{E|x}\|_1$ of (4) can be bounded as in (6) [1], where $P_{\mathrm{ph}|x}$ denotes the conditional decoding error probability of the virtual phase error correction, given public information x. By taking the average of (6) with respect to x, and by noting that the function $a \mapsto \sqrt{a}$ is concave, we obtain the averaged bound (7), where $P_{\mathrm{ph}}$ denotes the decoding error probability of the virtual phase error correction.
As to the non-uniformity of the final key given in (5), recall that we assumed that Alice's random variable obeys the uniform distribution. Then the leftover hash lemma [24,25] yields the bound (8), where $\alpha(x)$ is the number of sacrifice bits in the privacy amplification.
Hence by combining (3)-(5), (7), and (8) we obtain the bound (9). In other words, in order to guarantee the security with universal composability, it suffices to bound the quantity on the right hand side of (9). In particular, as we have noted below (5), the second term on the right hand side of (9) is exactly zero when all of the hash functions are surjective; in this case the right hand side reduces to its first term alone.
+ However, the actual protocol does not necessarily have a counterpart for any operation in the virtual protocol. For example, the actual protocol has no operation corresponding to measurement of the syndrome in the virtual protocol.
Hence, in order to guarantee the universally composable security, it suffices to bound P ph .

Conditional decoding error probability given k
In this subsection we show that, in order to bound the decoding error probability $P_{\mathrm{ph}}$ of the virtual phase error correction, it is sufficient to bound $P_{\mathrm{ph}|k}$ for all k, where $P_{\mathrm{ph}|k}$ denotes the corresponding conditional probability given k. We also show that a bound on $P_{\mathrm{ph}|k}$ can be given in a concise form using the hypergeometric distribution $P_{\mathrm{hg}}(c|k)$ and binary entropies. First note that, without loss of generality, Eve's eavesdropping strategy can be described by the probability distribution $Q_{\mathrm{Eve}}(k)$ of k, which is the number of errors in the $n + l$ total bits. Then $P_{\mathrm{ph}}$ can be rewritten as $P_{\mathrm{ph}} = \sum_k Q_{\mathrm{Eve}}(k) P_{\mathrm{ph}|k}$, where $P_{\mathrm{ph}|k}$ denotes the conditional decoding error probability given k.
Next we consider the conditional probability $P_{\mathrm{hg}}(c|k)$ of c given k; i.e., the probability that c bits of errors are found in the sample bits when there are k errors in the total bits. Since sample bits are sampled without replacement, c obeys the hypergeometric distribution for a fixed value of k:
$$P_{\mathrm{hg}}(c|k) := \binom{l}{c}\binom{n}{k-c} \Big/ \binom{n+l}{k}, \quad (11)$$
with the average $\bar{c}(k)$ and the deviation $\sigma_{n,l}(k)$ given by
$$\bar{c}(k) := \frac{lk}{n+l}, \qquad \sigma_{n,l}(k)^2 := \frac{l\, n\, k\, (n+l-k)}{(n+l)^2 (n+l-1)}. \quad (12)$$
In the following, $\sigma_{n,l}(k)^2$ is simplified to $\sigma(k)^2$. Hence the values k, c occur with probability $Q_{\mathrm{Eve}}(k) P_{\mathrm{hg}}(c|k)$. (Here sample bits are sampled without replacement simply because one cannot measure both the phase and the bit values of a qubit simultaneously, and thus Alice and Bob cannot reuse the sample bits as a sifted key. If one could somehow sample them with replacement, the hypergeometric distribution here would of course be replaced by the binomial distribution, which is much simpler.)
Finally we consider the conditional decoding error probability $P_{\mathrm{ph}|k,c}$ for fixed values of k and c. In this case, the number of phase error patterns of total bits is bounded from above by $2^{n h((k-c)/n)}$ (see, e.g., Lemma 4.2.2, Ref. [29]). Due to the construction of the protocol, the number of the sacrificed bits $\alpha(c)$ is fixed. As we have shown in Ref. [17], if Alice and Bob use a linear universal$_2$ hash function family for PA in the actual protocol, it can be considered as the situation in the virtual protocol where they use a 2-almost universal$_2$ linear code family for phase error correction (i.e., a linear 2-almost universal$_2$ hash function family is used as the syndrome function for correcting phase errors). Then the decoding error probability $P_{\mathrm{ph}|k,c}$ of the virtual phase error correction can be bounded as $P_{\mathrm{ph}|k,c} \le S_{\mathrm{pa}}(k,c)$, with $S_{\mathrm{pa}}(k,c)$ given in (13) and (14), where $[x]_- := \min(x, 0)$. It is easy to see that Inequality (13) holds when the completely random matrices (a type of universal$_2$ hash functions) are used for PA, as in Koashi's case [16]. It is also shown to hold when the Toeplitz matrices (another universal$_2$ hash function family) are used for PA, by using the fact that dual matrices of the Toeplitz matrices generate universal$_2$ hash functions [1]. More generally, in Ref. [17], we have further shown that Inequality (13) is valid when an arbitrary family of universal$_2$ functions is used for PA.
Hence, to summarize, under Eve's strategy $Q_{\mathrm{Eve}}(k)$, error numbers k, c are distributed by $Q_{\mathrm{Eve}}(k) P_{\mathrm{hg}}(c|k)$. For fixed values of k, c, the virtual phase error correction fails with a probability less than $S_{\mathrm{pa}}(k,c)$ given in (13). Combining these probabilities, we see that the decoding error probability $P_{\mathrm{ph}}$ of the virtual phase correction can be bounded as
$$P_{\mathrm{ph}} \le \sum_k Q_{\mathrm{Eve}}(k)\, S_{\mathrm{av}}(k) \le \max_k S_{\mathrm{av}}(k),$$
where $S_{\mathrm{av}}(k)$ is defined by
$$S_{\mathrm{av}}(k) := \sum_c P_{\mathrm{hg}}(c|k)\, S_{\mathrm{pa}}(k,c). \quad (17)$$
As one can see from the definition of $S_{\mathrm{pa}}(k,c)$ in (13), (14), a straightforward way of minimizing $\max_k S_{\mathrm{av}}(k)$ is to define the function $\hat{p}_{\mathrm{sft}}(c)$ so that it always gives a large value; this corresponds to the situation where, looking at c, Alice and Bob always give a pessimistic estimate $\hat{p}_{\mathrm{sft}}(c)$ that is much larger than the actual value $p_{\mathrm{sft}}(k,c)$. However, as one can see from the definition of $\alpha(c)$ in (1) and the final key length G given in the previous section, a large $\hat{p}_{\mathrm{sft}}(c)$ results in a poor key generation rate. Rather, in order to achieve high key generation rates and high-level security simultaneously, one needs to minimize $\max_k S_{\mathrm{av}}(k)$ by considering the contributions of the two factors, $P_{\mathrm{hg}}(c|k)$ and $S_{\mathrm{pa}}(k,c)$. Hence we define $\hat{p}_{\mathrm{sft}}(c)$ so that it is as close as possible to (while remaining larger than) the actual value $p_{\mathrm{sft}}(k,c)$, in the regions of k, c where $P_{\mathrm{hg}}(c|k)$ is not negligible. This is equivalent to the estimation problem of an upper bound of $p_{\mathrm{sft}}(k,c)$:
(i) For a given c, we give a suitable choice of the estimated value $\hat{p}_{\mathrm{sft}}(c)$ for the phase error rate of a sifted key. Alice and Bob use this value to calculate the value of $\alpha(c)$ of (1), and obtain the final key length G. This will be done in Section 4.
(ii) With the suitable choice of $\hat{p}_{\mathrm{sft}}(c)$, we obtain a universal upper bound on the RHS of (17) that is independent of k, and thus an upper bound of $P_{\mathrm{ph}}$ ♯. This will be done in Section 5.
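The hypergeometric distribution $P_{\mathrm{hg}}(c|k)$ used in this subsection is easy to evaluate exactly for moderate n and l. The sketch below uses the standard textbook expressions for its probability mass, mean and variance (assumed here to coincide with the paper's Eqs. (11) and (12)); the function names and the toy parameters are illustrative assumptions.

```python
import math

def p_hg(c, k, n, l):
    """P_hg(c|k): probability that c of the k errors among n + l total bits
    fall into the l sample bits (sampling without replacement)."""
    if c < 0 or c > l or k - c < 0 or k - c > n:
        return 0.0
    return math.comb(l, c) * math.comb(n, k - c) / math.comb(n + l, k)

def mean_c(k, n, l):
    """Average number of sample errors, l k / (n + l)."""
    return l * k / (n + l)

def var_c(k, n, l):
    """Variance l n k (n + l - k) / ((n + l)^2 (n + l - 1))."""
    return l * n * k * (n + l - k) / ((n + l) ** 2 * (n + l - 1))

# Assumed toy parameters: 60 sifted bits, 40 sample bits, 10 errors in total.
n, l, k = 60, 40, 10
probs = [p_hg(c, k, n, l) for c in range(l + 1)]
empirical_mean = sum(c * p for c, p in enumerate(probs))
empirical_var = sum((c - empirical_mean) ** 2 * p for c, p in enumerate(probs))
```

The factor n/(n + l − 1) in the variance is the finite-population correction that distinguishes sampling without replacement from the binomial case.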

Upper confidence limit on the phase error rate $p_{\mathrm{sft}}(k,c)$
Now let us turn to the definition of $\hat{p}_{\mathrm{sft}}(c)$. As mentioned above, since the length l of sample bits is finite in practical QKD systems, the phase error rate of a sifted key $p_{\mathrm{sft}}(k,c)$ deviates from that of sample bits, $p_{\mathrm{smp}}(c)$, due to statistical fluctuations. Hence, in order to guarantee the security by privacy amplification, instead of $p_{\mathrm{smp}}(c)$, one needs to use the estimated upper bound $\hat{p}_{\mathrm{sft}}(c)$ of $p_{\mathrm{sft}}(k,c)$, defined with the statistical effect taken into account.
As long as $p_{\mathrm{sft}}(k,c)$ is estimated larger than the actual value, i.e., $\hat{p}_{\mathrm{sft}}(c) > p_{\mathrm{sft}}(k,c)$, there is no loss of security, because then more information is erased by the privacy amplification than is actually leaked to Eve. On the other hand, however, one needs to avoid a situation where $p_{\mathrm{sft}}(k,c)$ is underestimated as $\hat{p}_{\mathrm{sft}}(c) \le p_{\mathrm{sft}}(k,c)$. In such a case, the privacy amplification of the previous section does not work since $[g(k,c)]_- = 0$. Hence, at least as a necessary condition, the function $\hat{p}_{\mathrm{sft}}$ needs to satisfy
$$\forall k, \quad \Pr{}_k\{c \mid \hat{p}_{\mathrm{sft}}(c) < p_{\mathrm{sft}}(k,c)\} \le \varepsilon, \quad (18)$$
where $\Pr_k\{c \mid Q\}$ denotes the probability that c occurs satisfying a condition Q, under the hypergeometric distribution $P_{\mathrm{hg}}(c|k)$. In order to maximize the key generation rate for fixed values of l, n, we wish to make $\hat{p}_{\mathrm{sft}}(c)$ as small as possible. In statistics, this corresponds to an interval estimation problem. That is, finding $\hat{p}_{\mathrm{sft}}(c)$ satisfying (18) is to obtain an upper confidence limit on $p_{\mathrm{sft}}(k,c)$ from an observed value of c, with significance level ε (see, e.g., [27]).
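In the following, we derive the minimum estimate $\hat{p}_{\mathrm{sft},\varepsilon}(c) = \hat{p}_{\mathrm{sft}}(c)$ satisfying the condition (18) under the normal approximation of $P_{\mathrm{hg}}(c|k)$ by employing interval estimation of k. Although there is a standard procedure found in every textbook for this analysis (e.g., [27]), we reproduce it below for the sake of explanation. First we define the normal distribution function by
$$\Phi(s) := \frac{1}{\sqrt{2\pi}} \int_s^{\infty} e^{-x^2/2}\, dx, \quad (19)$$
and $s(\varepsilon)$ as the deviation corresponding to ε, i.e., such that $\varepsilon = \Phi(s(\varepsilon))$. In what follows, we often abbreviate $s(\varepsilon)$ to s. Then, by applying the normal approximation to $P_{\mathrm{hg}}(c|k)$, we have the relation
$$\Pr{}_k\{c \mid c < \bar{c}(k) - s(\varepsilon)\sigma(k)\} \le \varepsilon \quad (21)$$
for any integer k; that is, $c \ge \bar{c}(k) - s(\varepsilon)\sigma(k)$ holds at least with probability $1 - \varepsilon$ for any integer k. Note that this condition is equivalent to $p - p_{\mathrm{smp}}(c) \le s\,\sigma(k)/l$. We rewrite this condition further as
$$p - p_{\mathrm{smp}}(c) \le 2\sqrt{\gamma\, p(1-p)}, \quad (22)$$
where $p = k/(n+l)$, $p_{\mathrm{smp}}(c) = c/l$, and
$$\gamma := \frac{s^2 n}{4\, l\, (n+l-1)}. \quad (23)$$
The condition (22) is equivalent to $p \le \hat{p}_\varepsilon(c)$, where $\hat{p}_\varepsilon(c)$ is the larger solution of $(p_{\mathrm{smp}} - \hat{p}_\varepsilon)^2 = 4\gamma\, \hat{p}_\varepsilon (1 - \hat{p}_\varepsilon)$, given by
$$\hat{p}_\varepsilon(c) := \frac{1}{1+4\gamma}\left[ p_{\mathrm{smp}}(c) + 2\gamma + 2\sqrt{\gamma\,\{p_{\mathrm{smp}}(c)(1 - p_{\mathrm{smp}}(c)) + \gamma\}} \right]. \quad (24)$$
That is, $k/(n+l) = p \le \hat{p}_\varepsilon(c)$ holds at least with probability $1 - \varepsilon$ for any integer k.
♯ A similar analysis was given by Fung et al. [26]. However, they seem to evaluate $P_{\mathrm{hg}}(c|k) S_{\mathrm{pa}}(k,c)$ without the summation. This corresponds to the probability that a certain set of values k and c occur and then the virtual phase error correction by Alice and Bob fails.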
In other words, the rate $\hat{p}_\varepsilon(c)$ gives the upper bound of one-sided interval estimation of $p = k/(n+l)$. Using this estimate, we define another function
$$\hat{p}_{\mathrm{sft},\varepsilon}(c) := \frac{(n+l)\,\hat{p}_\varepsilon(c) - c}{n}. \quad (25)$$
Then, again, the inequality $\hat{p}_{\mathrm{sft},\varepsilon}(c) \ge p_{\mathrm{sft}}(k,c) = (k-c)/n$ holds at least with probability $1 - \varepsilon$ for any integer k. As a result, by choosing $\hat{p}_{\mathrm{sft}}(c)$ as $\hat{p}_{\mathrm{sft},\varepsilon}(c)$, we can satisfy the condition (18). Throughout the paper, we will use these definitions of $\hat{p}_\varepsilon(c)$ and $\hat{p}_{\mathrm{sft},\varepsilon}(c)$ in calculating $\alpha(c)$.
Now two remarks are in order. First, if there are sufficiently many samples (i.e., with l large and thus γ sufficiently small), the error number c has roughly the same distribution, irrespective of whether the samples are picked up with or without replacement. In such a case, as we mentioned under Eq. (12), the hypergeometric distribution $P_{\mathrm{hg}}(c|k)$ can be approximated by the binomial distribution. Indeed, to the first order of $\sqrt{\gamma}$, the estimated value $\hat{p}_\varepsilon(c)$ of Eq. (24) can be approximated as
$$\hat{p}_\varepsilon(c) \simeq p_{\mathrm{smp}}(c) + \frac{s}{l}\sqrt{\frac{n}{n+l-1}}\; \sigma_{\mathrm{bin}}(c),$$
where $\sigma_{\mathrm{bin}}(c) := \sqrt{l\, p_{\mathrm{smp}}(c)(1 - p_{\mathrm{smp}}(c))}$ denotes the deviation of the binomial distribution with the error rate of the sample bits being $p_{\mathrm{smp}}(c) = c/l$. Furthermore, by using the inequality $p_{\mathrm{smp}}(c) + \frac{s}{l}\sqrt{\frac{n}{n+l-1}}\, \sigma_{\mathrm{bin}}(c) \le p_{\mathrm{smp}}(c) + \frac{s}{l}\, \sigma_{\mathrm{bin}}(c)$, and by noting that a larger $\hat{p}_\varepsilon(c)$ always gives a better security bound, we can instead use a simpler approximation given by
$$\hat{p}_\varepsilon(c) \simeq p_{\mathrm{smp}}(c) + \frac{s}{l}\, \sigma_{\mathrm{bin}}(c). \quad (26)$$
The approximated upper bound of (26) can also be obtained by an argument similar to the above, with the hypergeometric distribution replaced by the binomial distribution. This means that, for l sufficiently large, one can conclude that the phase error rate $p(k)$ of the total bits can be bounded from above by $\hat{p}_\varepsilon(c)$ of (26), which is simply the measured error rate $p_{\mathrm{smp}}(c)$ of the samples, plus s times its standard deviation $\sigma_{\mathrm{bin}}/l$. The actual value exceeds this bound only with a probability less than $\Phi(s)$; or in other words, this estimation fails only with a probability less than $\Phi(s)$.
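The estimates of this section can be evaluated numerically as follows. This is a sketch under stated assumptions: γ is taken as s²n/(4l(n + l − 1)), which is the value obtained by inserting the hypergeometric variance into the condition c ≥ c̄(k) − sσ(k); p̂_ε(c) is the larger root of (p_smp − p̂)² = 4γ p̂(1 − p̂); and p̂_sft,ε(c) is taken as ((n + l) p̂_ε(c) − c)/n. The function names and the sample parameters are hypothetical.

```python
import math
from statistics import NormalDist

def s_of_eps(eps):
    """Deviation s whose upper normal tail probability equals eps."""
    return -NormalDist().inv_cdf(eps)

def p_hat(c, n, l, eps):
    """Upper confidence limit on the total error rate p = k / (n + l)."""
    s = s_of_eps(eps)
    gamma = s * s * n / (4.0 * l * (n + l - 1))
    a = c / l  # measured sample error rate p_smp(c)
    root = math.sqrt(gamma * (a * (1.0 - a) + gamma))
    return (a + 2.0 * gamma + 2.0 * root) / (1.0 + 4.0 * gamma)

def p_hat_sft(c, n, l, eps):
    """Upper confidence limit on the sifted-key phase error rate (k - c) / n."""
    return ((n + l) * p_hat(c, n, l, eps) - c) / n

def p_hat_binomial(c, l, eps):
    """Simpler binomial-type approximation p_smp + (s / l) sigma_bin."""
    a = c / l
    sigma_bin = math.sqrt(l * a * (1.0 - a))
    return a + s_of_eps(eps) * sigma_bin / l

# Assumed example: n = 10^5 sifted bits, l = 10^4 samples, c = 200 sample errors.
n, l, c, eps = 100_000, 10_000, 200, 1e-10
estimate_total = p_hat(c, n, l, eps)
estimate_sifted = p_hat_sft(c, n, l, eps)
estimate_binomial = p_hat_binomial(c, l, eps)
```

The simpler binomial form is a first-order approximation in √γ, so for small l or very small ε it can differ noticeably from the exact root computed by `p_hat`.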

Upper bounds on the decoding error probability P ph
Throughout the paper, we assume that Alice and Bob perform the protocol specified in Section 2, using the estimated upper bound $\hat{p}_{{\rm sft},\varepsilon}(c)$ of (24) and (25), obtained in the previous section. That is, we here substitute $\hat{p}_{{\rm sft},\varepsilon}(c)$ for $\hat{p}_{\rm sft}(c)$ in (1), and as a result, Alice and Bob use sacrifice bits of $\alpha(c) = h(\hat{p}_{{\rm sft},\varepsilon}(\max[c, c_{\min}])) + D$ in the PA step. In this setting, we evaluate the decoding error probability $P_{\rm ph}$ and obtain several upper bounds.

The Straightforward Upper Bounds
In Section 3.3, we showed that, in order to bound P ph , it suffices to bound S av (k) of ( 17) for all values of k.In this subsection, we first present a simple evaluation of P ph , where we divide the summation S av (k), given in (17), into two regions of c.This method is similar to those used in preceding literature [2,3], and we call it here the straightforward method.
For each value of $k$, we set the boundary value $c_{\rm bnd}(k) := \lfloor \bar{c}(k) - s\sigma(k) \rfloor$, and divide the summation of (17) at $c = c_{\rm bnd}(k)$ into the two terms given in (28) and (29). (In what follows, we often write $\bar{c}$, $\sigma$, $s$ instead of $\bar{c}(k)$, $\sigma(k)$, $s(\varepsilon)$.) Then, by using the properties of $\hat{p}_{{\rm sft},\varepsilon}(c)$ given in the preceding section, the two terms of (29) can be evaluated as follows:

(i) The first summation of (29) is the probability that $c \le c_{\rm bnd}(k)$. As we have shown in the preceding section, this term is less than $\varepsilon$ (see (21)), if one applies the normal approximation to $P_{\rm hg}(c|k)$. To put it more explicitly, if one applies the normal approximation of the form (30), it follows that the first term of (29) is less than $\Phi(s(\varepsilon)) = \varepsilon$, where $\Phi(s)$ is the normal distribution function given in (19).
(ii) In the second term of ( 29), the function holds by the definition of psft,ε (c), given in ( 24) and (25).† † Thus from ( 14), we have For the inequality of the second line, we used the fact that α . This means that the second summation of ( 29) can be bounded by 2 −D+1 .We remark that, unlike the first term of ( 29), this upper bound is valid without relying on the normal approximation.
Note that both bounds are valid for all values of k. Hence, by combining these two upper bounds, we obtain the following proposition.
Proposition 1 For a given $\varepsilon$ (and the corresponding $s(\varepsilon) = \Phi^{-1}(\varepsilon)$), suppose that $c_{\min} \le c_{\max}$, and that Alice and Bob perform the QKD protocol specified in Section 2. Then by applying the normal approximation to $P_{\rm hg}(c|k)$, $P_{\rm ph}$ can be bounded as

$$P_{\rm ph} \le \varepsilon + 2^{-D+1}.$$

If one wishes to bound $P_{\rm ph}$ by a certain value, say $P_{\max}$, a convenient choice of parameters is $\varepsilon = 2^{-D+1} = \frac{1}{2}P_{\max}$, or equivalently, $D = 2 - \log_2 P_{\max}$ and $s = \Phi^{-1}(\varepsilon) = \Phi^{-1}\left(\frac{1}{2}P_{\max}\right)$.† Then Inequality (10) guarantees that the trace distance is bounded as $\|\rho_{A,E'} - \rho_{\rm Ideal}\|_1 \le 2\sqrt{2P_{\max}}$, where Alice and Bob use a universal$_2$ hash function family that consists of linear and surjective functions.
Further, if parameters $l$ and $n$ are sufficiently large, we can also obtain a tight bound on the first term of (29) without relying on the normal approximation of $P_{\rm hg}(c|k)$.

†† In fact, this is exactly the way we planned when we defined $\hat{p}_{{\rm sft},\varepsilon}(c)$: As mentioned in sentences below (46), the function $\hat{p}_\varepsilon(c)$ is defined so that the condition $\hat{p}_\varepsilon(\bar{c}(k) - s\sigma(k)) = p(k)$ is satisfied for all $k$. This condition is equivalent to $\hat{p}_{{\rm sft},\varepsilon}(\bar{c}(k) - s\sigma(k)) = p_{\rm sft}(k, \bar{c}(k) - s\sigma(k))$, due to the definitions of $\hat{p}_{{\rm sft},\varepsilon}(c)$ and $p_{\rm sft}(k,c)$ given in (25) and in Table 1.

† Of course, the optimal choice is to let $\varepsilon = aP_{\max}$ and $2^{-D+1} = (1-a)P_{\max}$, and then find the optimal $0 < a < 1$ that yields the largest key generation rate. However, we do not pursue this optimality in the rest of the paper, since varying $a$ contributes very little to the key rate in typical situations.
Lemma 1 If 5  4 s(ε) 2 ≤ l ≤ n, 1 ≤ k, and c max ≤ 0.12l, we have min(⌊c−sσ⌋,cmax) where µ := 1/(6n) + 1/ (12).Note that this bound holds rigorously, without relying on the normal approximation of P hg (c|k).This lemma will be proved in Appendix B. 3. Now recall that the upper bound 2 −D+1 , obtained above for the second term of ( 29), does not rely on any approximation either.Hence, besides Proposition 1, we can obtain another bound on P ph that is similarly tight, and is valid rigorously without relying on any approximation: Proposition 2 Suppose that 5  4 s(ε) 2 ≤ l ≤ n, and c max ≤ 0.12l are satisfied for a given ε (i.e., with Φ(s) = ε).Also assume that Alice and Bob perform the QKD protocol specified in Section 2. Then without using the normal approximation of P hg (c|k), we have

The Upper Bounds by The Gaussian Integration
In the above analysis of the straightforward bounds, if one wishes to bound $P_{\rm ph}$ by a certain value, say $P_{\max}$, it is necessary to let $D \ge 1 - \log_2 P_{\max}$. Hence, if one chooses a very small $P_{\max}$ in order to achieve a high level of security, this $D$ can decrease the final key length severely through the sacrifice bit length (1).
In this subsection, we derive improved bounds that hold with $D = 1$. We call them here the Gaussian bounds for the following reason. The first step of the analysis is similar to that of the previous section; i.e., we divide the summation of $S_{\rm av}(k)$ as in (28) and obtain upper bounds for each term. For the first term of (28), we use the normal approximation (30) again and bound it by $\varepsilon$. However, for the second term of (28), we employ a quite different strategy: We approximate $P_{\rm hg}(c|k)$ by using (30), and also upper bound $S_{\rm pa}(k,c)$ by an exponential function of a simple linear function of $c$ (specified below in (35)). By using this simple form, we evaluate the summation over $c$ as a Gaussian integral. As a result of this integration, instead of $2^{-D+1}$ appearing in the previous subsection, we obtain an upper bound $\delta\varepsilon$ on the second term, with $\delta$ being small for large $l$, $n$.
In order for this strategy using the Gaussian integration to work properly, parameter k must be confined to a specific region.Thus as a preparation, we consider the following three cases depending on the value of k: (i) If k is too small (i.e., 0 ≤ k ≤ nc min /l), it can be shown that S pa (k, c) is always bounded by ε, by using the properties of g(k, c).Thus S av (k) ≤ ε.
(ii) For the intermediate domain where nc min /l ≤ k ≤ (n + l)p sft,ε (c max ), the function g(k, c) (used for S pa (k, c) = 2 [g(k,c)] − +1 ) can be bounded from above by a simple function, i.e., a constant or a linear function of c.
(iii) If $k$ is too large (i.e., $(n+l)\hat{p}_{{\rm sft},\varepsilon}(c_{\max}) \le k$), we can also show that $S_{\rm av}(k)$ is less than $\sum_{c=0}^{\lfloor \bar{c} - s\sigma \rfloor} P_{\rm hg}(c|k)$.
The more precise argument will be given in Appendix C, and we have the following theorem.
Theorem 1 Let $D = 1$. If $c_{\min} \le c_{\max}$ and $2 \le s(\varepsilon)$, then $S_{\rm av}(k)$ is bounded from above as follows
• (Case 1) If $0 \le k \le nc_{\min}/l$, we have $S_{\rm av}(k) \le \varepsilon$.
• (Case 2) If $nc_{\min}/l < k \le (n+l)\hat{p}_{{\rm sft},\varepsilon}(c_{\max})$, for an arbitrary possible outcome $c$, we have where
• (Case 3) If $(n+l)\hat{p}_{{\rm sft},\varepsilon}(c_{\max}) < k$, we have $S_{\rm av}(k) \le \sum_{c=0}^{\lfloor \bar{c} - s\sigma \rfloor} P_{\rm hg}(c|k)$.
(For the proof of this theorem, see Appendix C.) We stress that the normal approximation to $P_{\rm hg}(c|k)$ is not yet applied, and thus all inequalities are rigorous at this stage.‡

Then in the rest of this subsection, we will show that the right hand side of each inequality of Theorem 1 can be bounded from above by $(1+\delta)\varepsilon$, with $\delta$ being smaller than one for sufficiently large $l$, $n$. In other words, we obtain an upper bound on $S_{\rm av}(k)$ that is valid for all $k$, and thus an upper bound on $P_{\rm ph}$ (recall the argument of Section 3.3).

‡ It is true that we used the normal approximation in deriving $\hat{p}_{{\rm sft},\varepsilon}(c)$ in (25) and (24), and that $\hat{p}_{{\rm sft},\varepsilon}(c)$ is used in the statement of Theorem 1. However, in the proof of Theorem 1 we use no approximation; thus the theorem holds rigorously, without any approximation.

Let us first discuss the easier cases, namely, Cases 1 and 3. As mentioned above, for these two cases $S_{\rm av}(k)$ can be easily shown to be less than $\varepsilon$: For Case 1, it is already proved in Theorem 1. For Case 3, if one applies the normal approximation to $P_{\rm hg}(c|k)$, $S_{\rm av}(k)$ is bounded by $\varepsilon$, as can be seen by the same argument as in the previous section (see the paragraph of (30)).
Hence it remains to evaluate Case 2, where parameter $k$ is restricted as $nc_{\min}/l < k \le (n+l)\hat{p}_{{\rm sft},\varepsilon}(c_{\max})$. As mentioned above, we here show that $S_{\rm av}(k)$ can be rewritten as a Gaussian integral in this case. In Inequality (37), the first term on the right hand side can be bounded by $\varepsilon$, with the normal approximation applied to $P_{\rm hg}(c|k)$. For the second term, which is a summation over $c$, we replace $P_{\rm hg}(c|k)$ with the normal approximation. In addition, we replace $S_{\rm pa}(k,c)$ appearing in the same summation by the right hand side of (35). Then the summation can be rewritten as a Gaussian integral: Further, in order to bound $I_2(\xi_\varepsilon(k))$ using $\varepsilon$, we introduce the inequalities where $\Phi(x)$ is the normal distribution function given in (19). (Inequalities (42) will also be proved in Appendix C.) By using (42), the integral $I_2(\xi_\varepsilon(k))$ can be evaluated further as Note here that $\sigma(k)$ is an increasing function of $k$, because $\xi_\varepsilon(k)$ is. Thus the final term of (43) is maximized at the lower boundary $k = nc_{\min}/l$, and we obtain finally with $\xi_{\min,\varepsilon} := \xi_\varepsilon(nc_{\min}/l)$. We now have the following theorem:

Theorem 2 For a given $\varepsilon$, suppose that $c_{\min} \le c_{\max}$, $2 \le s(\varepsilon)$ and $1 < \xi_{\min,\varepsilon}$ with Here $\hat{p}_{{\rm sft},\varepsilon}(c)$ is defined in Eq. (25), $\sigma$ in Eq. (12), and . Also assume that Alice and Bob perform the QKD protocol specified in Section 2. Then with the normal approximation applied to $P_{\rm hg}(c|k)$, $P_{\rm ph}$ can be bounded as where

Note here that none of $c_{\min}$, $\hat{p}_{{\rm sft},\varepsilon}(c_{\max})$ or $\gamma$ depends on $k$ or $c$, which can vary for each run of the protocol; thus $\xi_{\min,\varepsilon}$ can be calculated as a fixed value specified by the protocol. (In other words, $\xi_{\min,\varepsilon}$ is a constant and can thus be calculated at the preparation stage prior to the protocol.)

Further, as we have done in the previous subsection, if parameters $l$ and $n$ are sufficiently large, we can also obtain a similarly good bound without relying on the normal approximation of $P_{\rm hg}(c|k)$ (in Eq. (30)). By using exact upper bounds on $P_{\rm hg}(c|k)$ including Lemma 1, we obtain the following theorem:

Theorem 3 Suppose that $1 \le l \le n$, $s^2 \le c_{\min} \le c_{\max} \le 0.12l$, and $1 < \xi_{\min}$ are satisfied for a given $\varepsilon$. Also assume that Alice and Bob perform the QKD protocol specified in Section 2. Then without using the normal approximation of $P_{\rm hg}(c|k)$, we have where The proof of this theorem is given in Appendix D.

Second Order Asymptotics
Now, we roughly estimate the relation between the sacrifice bit length and the upper bound $\max_k S_{\rm av}(k)$ of the phase error. For this purpose, we focus on the asymptotic expansion for the sacrifice bit length. In the protocol discussed above, the sacrifice bit length $\alpha(c)$ is $\lceil nh(\hat{p}_{{\rm sft},\varepsilon}(c+1)) \rceil + 2$ with $\hat{p}_{{\rm sft},\varepsilon}(c) = \frac{(n+l)\hat{p}_\varepsilon(c) - l\,p_{\rm smp}(c)}{n}$ and $\hat{p}_\varepsilon(c) := \frac{1}{1+4\gamma}\left[p_{\rm smp} + 2\gamma + 2\sqrt{\gamma\{p_{\rm smp}(1-p_{\rm smp}) + \gamma\}}\right]$. When the ratio $l/n$ is $t$, we obtain the asymptotic expansion: 4t s(ε). When we use only the first term in the above expansion, the upper bound $\max_k S_{\rm av}(k)$ for the phase error converges to zero or one. Such a limit cannot be used as an approximation for the upper bound $\max_k S_{\rm av}(k)$, because the actual value of the bound lies strictly between zero and one.
However, when we use up to the second order $\sqrt{n}$ in the asymptotic expansion of $\alpha(c)$, the upper bound $\max_k S_{\rm av}(k)$ converges to a value between zero and one. In this case, we can use the limit as an approximation for the upper bound $\max_k S_{\rm av}(k)$. That is, by using the above asymptotic expansion, the virtual phase error can be bounded in the following way.
Theorem 4 For a given ε, p min , and p max , we choose c min and c max as p min l and p max l, and assume that l/n = t.Also suppose that Alice and Bob perform the QKD protocol specified in Section 2, except that the sacrifice bit length α(c) is less than Then, the maximum P ph,n,l of S av (k) with given n and t can be asymptotically characterized as lim n→∞ max l:l≥tn P ph,n,l ≤ ε. (51) The proof will be given in Appendix E.

How to use the above formulas to evaluate the security of one's QKD system
In this section we summarize what we have proved so far, and then explain how one can use Proposition 1 or 2, or Theorem 2 or 3 to evaluate the security of one's QKD system.

Summary of Our Results
As discussed in Section 3, the standard quantitative measure of the security of QKD is the trace distance $\|\rho_{A,E'} - \rho_{\rm Ideal}\|_1$ between the actual state $\rho_{A,E'}$ and the ideal state $\rho_{\rm Ideal}$, given in (3). Inequalities (9) and (10) claim that this trace distance can be bounded from above by the averaged decoding error probability $P_{\rm ph}$ of the virtual phase error correction. Throughout the paper, we are interested in bounding $P_{\rm ph}$ by using the Shor-Preskill formalism. Also in Section 3, we have shown that in order to bound $P_{\rm ph}$ under an arbitrary attack by Eve, it suffices to bound the probability $\max_k S_{\rm av}(k)$, with $S_{\rm av}(k)$ defined in (17) (or equivalently, for all $k$, one needs to bound $S_{\rm av}(k)$ by a certain value). Here the function $S_{\rm av}(k)$ gives an upper bound on the failure probability $S_{\rm pa}(k,c)$ of the virtual phase error correction, averaged with respect to the hypergeometric distribution $P_{\rm hg}(c|k)$. Our analyses of Sections 4 and 5 are devoted to obtaining an upper bound on $\max_k S_{\rm av}(k)$.
In Section 4, we determined the suitable functional form of the upper bound $\hat{p}_{\rm sft}(c)$ on the phase error rate $p_{\rm sft}(k,c)$ of the sifted key, such that we can achieve high key generation rates and high-level security simultaneously. The function $\hat{p}_{\rm sft}(c)$ is used for calculating the sacrifice bit length $\alpha(c)$ of Eq. (1), i.e., the number of bits that need to be erased in privacy amplification (PA). This problem can be reduced to determining an upper bound on parameter $k$, or equivalently, that on the phase error rate $p_{\rm sft}(k,c)$ of a sifted key. For this purpose, we derived an upper bound $\hat{p}_{{\rm sft},\varepsilon}(c)$ of Eqs. (24) and (25) on $p_{\rm sft}(k,c)$, as a function of the measured error rate $p_{\rm smp}(c) = c/l$ of sample bits. We here used the standard method of interval estimation, and the upper bound $\hat{p}_{{\rm sft},\varepsilon}(c)$ is defined so that, for any value of $k$, the undesired case $p_{\rm sft}(k,c) > \hat{p}_{{\rm sft},\varepsilon}(c)$ occurs with a probability $\le \varepsilon$ (see Eqs. (18) and (21)).
Then in Section 5, by using this $\hat{p}_{{\rm sft},\varepsilon}(c)$ and the corresponding sacrifice bit length $\alpha(c)$ given in (1), we obtained upper bounds on $S_{\rm av}(k)$ that hold for all $k$. By the argument of the paragraph of (17), this means that we have given upper bounds on $P_{\rm ph}$. For the sake of simplicity, we first gave straightforward bounds in Proposition 1 (with the approximated values of the hypergeometric distribution $P_{\rm hg}(c|k)$) and Proposition 2 (without any approximation). Next we gave the other bounds exploiting the properties of the Gaussian integration, which yield a larger final key length $G$ for sufficiently large $l$, $n$; namely, Theorem 2 (with the approximated $P_{\rm hg}(c|k)$) and Theorem 3 (without any approximation).

The Straightforward Upper Bound With The Normal Approximation (How to Use Proposition 1)
Here we present how to calculate the secret key length of one's QKD system using the straightforward upper bound on $P_{\rm ph}$ obtained in Proposition 1.

• Preparation steps:
(i) Determine one's desired upper bound $T_{\max}$ on the trace distance.
(ii) Calculate the corresponding upper bound on the phase error rate by $P_{\max} = \frac{1}{8}(T_{\max})^2$.
(iii) Let the confidence limit be $\varepsilon = \frac{1}{2}P_{\max}$. Then calculate parameter $s = \Phi^{-1}(\varepsilon)$, as the inverse value of the normal distribution function $\Phi(x)$ (see the definitions of $\Phi(x)$ and $s(\varepsilon)$ given in (19), (20)).
Under this setting of parameters, one can guarantee that $P_{\rm ph} \le \varepsilon + 2^{-D+1} \le P_{\max}$, by applying the normal approximation to $P_{\rm hg}(c|k)$ and by using Proposition 1. Then Inequality (10) guarantees that the trace distance is bounded as $\|\rho_{A,E'} - \rho_{\rm Ideal}\|_1 \le 2\sqrt{2P_{\max}} = T_{\max}$. (As specified below, we here assume that Alice and Bob use a universal$_2$ hash function family that consists of linear and surjective functions.)
• For each run of the protocol:
(vii) Perform the protocol as specified in Section 2. In particular in the PA step, for the calculation of the length $\alpha(c)$ of (1), use $\hat{p}_{{\rm sft},\varepsilon}(c)$ defined in Eqs. (24) and (25), as well as parameters $s$ and $D$ obtained in the preparation steps above.† Then use a universal$_2$ hash function family that consists of linear and surjective functions, to convert the reconciled key to the secret key.
As noted in Section 2, as a result of this protocol, Alice and Bob obtain the final key of length G = n rec − α(c) with α(c) given in (1), and n rec being the reconciled key length.
If an error correcting code with efficiency f is used, we have n rec = n(1 − f h(p bit )), with p bit being the bit error rate of the sifted key.Thus Alice and Bob obtain the final key of length G, given in (2).
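The preparation steps for Proposition 1 amount to a few lines of arithmetic. The Python sketch below follows Steps (i) through (iv) of this section, with D = ⌈2 − log₂ P_max⌉, and checks that ε + 2^(−D+1) ≤ P_max. It assumes that Φ is the upper tail of the standard normal distribution; the function name and the target value of T_max are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def prepare_proposition1(t_max):
    """Preparation steps (i)-(iv) for a desired trace-distance bound t_max."""
    p_max = t_max ** 2 / 8.0                 # step (ii): P_max = T_max^2 / 8
    eps = p_max / 2.0                        # step (iii): confidence limit
    s = -NormalDist().inv_cdf(eps)           # s = Phi^{-1}(eps), upper-tail convention (assumption)
    d = math.ceil(2 - math.log2(p_max))      # step (iv): D = ceil(2 - log2 P_max)
    return p_max, eps, s, d


# Hypothetical target, for illustration only.
p_max, eps, s, d = prepare_proposition1(1e-6)

# Proposition 1 then gives P_ph <= eps + 2^{-D+1} <= P_max.
assert eps + 2.0 ** (-d + 1) <= p_max
print(p_max, eps, s, d)
```

Rounding D up to an integer makes 2^(−D+1) at most P_max/2, so the check holds for any target value.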

The Straightforward Upper Bound Without Any Approximation (How to Use Proposition 2)
By using Proposition 2, an exact upper bound on $P_{\rm ph}$ can be obtained, without relying on the normal approximation of $P_{\rm hg}(c|k)$. In this case all the steps are the same as those given in Section 6.2.1, except for Steps (iii) and (vi):
(iii') Choose parameter $s$ such that n + l n is satisfied, where $\mu = 1/(6n) + 1/12$.
(vi') (Parameter check:) Check that 5 4 s 2 ≤ l ≤ n and c max ≤ 0.12l are satisfied.If not, set T max smaller and restart from Step (i).
As a result of Step (iii'), we have $\varepsilon = \Phi(s(\varepsilon)) \le s^{-1} \times \frac{1}{2}P_{\max}$. This means that, for a fixed value of $P_{\max}$, one needs to choose $\varepsilon = \Phi(s(\varepsilon))$ to be smaller than that obtained in Section 6.2.1, by a factor of $s^{-1}$. As a result, $s$ becomes larger, and one ends up with a smaller final key length. Note, however, that such an increment of $s$ is negligible for sufficiently large $s$ (e.g., for $s \ge 10$), because $\Phi(s)$ scales as $e^{-s^2/2}$ and thus a very small increment of $s$ compensates the factor of $s^{-1}$ in front of $\frac{1}{2}P_{\max}$. Hence the decrement in the final key length is very small. We will demonstrate this fact by a numerical calculation in Section 7.3.
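The size of this increment of s can be checked with a short calculation. The Python sketch below solves Φ(s) = P_max/2 and, as a simplified stand-in for Step (iii'), Φ(s) = P_max/(2s) by bisection. It assumes that Φ is the upper tail of the standard normal distribution and ignores the remaining prefactors of the exact condition, so the numbers are illustrative only; the value of P_max is hypothetical.

```python
import math


def upper_tail(s):
    """Phi(s): upper tail of the standard normal distribution (assumed convention)."""
    return 0.5 * math.erfc(s / math.sqrt(2.0))


def solve_sign_change(f, lo, hi, iters=200):
    """Bisection for f(s) = 0, assuming f(lo) > 0 > f(hi) with a single sign change."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


p_max = 1e-20  # hypothetical target value

# Section 6.2.1: Phi(s) = P_max / 2.
s_plain = solve_sign_change(lambda s: upper_tail(s) - p_max / 2, 1.0, 40.0)

# Simplified version of Step (iii'): Phi(s) = P_max / (2 s).
s_strict = solve_sign_change(lambda s: upper_tail(s) - p_max / (2 * s), 1.0, 40.0)

print(s_plain, s_strict, s_strict - s_plain)
```

Because the tail probability decays like a Gaussian, dividing the target by s shifts the solution by roughly ln(s)/s, which is small compared with s itself.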

How to Use The Upper Bounds by The Gaussian Integration (How to Use Theorems 2 and 3)
As mentioned in Section 5.2, if parameters l and n are sufficiently large, we can set D = 1 and still obtain similarly tight bounds on P ph as given in Theorems 2 and 3; thereby we can improve the final key length G.For these cases too, we summarize how to calculate the secret key length of one's QKD system.

The Gaussian Bound With The Normal Approximation (How to Use The Bound of Theorem 2)
For Theorem 2, the preparation steps are modified as follows:
• Preparation steps:
(i) Determine one's desired upper bound $T_{\max}$ on the trace distance.
(ii) Calculate the corresponding upper bound on the phase error rate by $P_{\max} = \frac{1}{8}(T_{\max})^2$.
(iii) Set the confidence limit $\varepsilon$ to be slightly smaller than $P_{\max}$. (For example, if $l$, $n$ are sufficiently large, $\varepsilon = 0.9P_{\max}$ is usually sufficient.) Then calculate parameter $s = \Phi^{-1}(\varepsilon)$, as the inverse value of the normal distribution function $\Phi(x)$ given in (19).
(iv) Let $D = 1$.
(v) Determine $c_{\min}$ and $c_{\max}$, such that the conditions in the first sentence of Theorem 2 are all satisfied.
(vi) (Parameter check:) Check if $\delta$ is small enough so that Inequality (46) is satisfied. If not, go back to Step (iii) and set $\varepsilon$ smaller.
After these preparation steps, Alice and Bob run the protocol as in previous sections.That is, they run the protocol as specified in Step (vii) of Section 6.2.1.

The Gaussian Bound Without The Normal Approximation (How to Use The Bound of Theorem 3)
As we have done for the case of the straightforward bounds, we also obtained in Theorem 3 the exact version of the Gaussian bound that does not rely on the normal approximation of $P_{\rm hg}(c|k)$. This theorem was derived using essentially the same idea as Theorem 2 and achieves a similarly tight bound, but it does not rely on any approximation.
For Theorem 3, the preparation steps are the same as for Theorem 2 (i.e., the same as in Section 6.3.1), except for Steps (v) and (vi):
(v") Determine $c_{\min}$ and $c_{\max}$, such that the conditions in the first sentence of Theorem 3 are all satisfied.
If not, go back to Step (iii) and set ε smaller.
After these preparation steps, Alice and Bob run the protocol as in previous sections.That is, they run the protocol as specified in Step (vii) of Section 6.2.1.

Rough Estimate of The Key Rate and The Security Parameter
We note here that if $l$, $n$ are sufficiently large, parameters $\gamma$ and $\delta$ become sufficiently small, and the approximate evaluation of the key length $G$ of (2) can be greatly simplified.
As one can see from Steps (i) and (ii) of Section 6.3, bounding $P_{\rm ph}$ is enough for the security. If $\delta$ is sufficiently small, then according to Theorem 2 (or Step (iii) of Section 6.3), $P_{\rm ph}$ can be bounded approximately by $\varepsilon$, which determines the value of $\hat{p}_{{\rm sft},\varepsilon}(c)$ via Eqs. (24) and (25). Then as we discussed in the paragraph of Eq. (26), if $\gamma$ is sufficiently small, $\hat{p}_{{\rm sft},\varepsilon}(c) = \frac{n+l}{n}\hat{p}_\varepsilon(c) - \frac{l}{n}p_{\rm smp}(c)$ can be approximated by using $\hat{p}_\varepsilon(c) \simeq p_{\rm smp}(c) + \frac{s}{l}\sigma_{\rm bin}(c)$. As a result, if the conditions of the first sentence of Theorem 2 are satisfied for a given set of experimental parameters, and if $\gamma$ and $\delta$ are sufficiently small, one has the following rough estimates. The trace distance is approximately bounded by the square root of $\varepsilon$ as $\|\rho_{A,E'} - \rho_{\rm Ideal}\|_1 \lesssim 2\sqrt{2\varepsilon}$. Parameter $s$ is the deviation measured in units of the standard deviation, i.e., $s = \Phi^{-1}(\varepsilon)$. Then this $s$ determines the final key length $G$ of (2) through $\hat{p}_{{\rm sft},\varepsilon}(c)$. We expect that these relations will be useful for experimentalists and theorists who wish to obtain a rough estimate of the key length with the finite size effect taken into account.
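The rough estimate described above can be sketched in a few lines of Python. The sketch uses the approximation (26), the relation G = n_rec − α(c) with n_rec = n(1 − f h(p_bit)), and the trace-distance estimate 2√(2ε). Three points are assumptions rather than statements of the paper: Φ is taken as the upper normal tail, the sacrifice bit length is simplified to α(c) ≈ n h(p̂) + D with D = 1 (the exact form is Eq. (1)), and the binary entropy is clamped at 1/2. All numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist


def h(x):
    """Binary entropy in bits, clamped to the range [0, 1/2] (assumed convention)."""
    x = min(max(x, 0.0), 0.5)
    if x == 0.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def rough_key_length(n, l, c, p_bit, eps, f=1.1, d=1):
    """Return (rough final key length G, rough trace-distance bound)."""
    s = -NormalDist().inv_cdf(eps)                   # s = Phi^{-1}(eps), upper-tail convention
    p_smp = c / l
    sigma_bin = math.sqrt(l * p_smp * (1.0 - p_smp))
    p_hat = p_smp + s * sigma_bin / l                # approximation (26)
    p_sft = (n + l) / n * p_hat - l / n * p_smp      # Eq. (25)
    n_rec = n * (1.0 - f * h(p_bit))                 # reconciled key length
    alpha = n * h(p_sft) + d                         # simplified sacrifice bit length (assumption)
    return n_rec - alpha, 2.0 * math.sqrt(2.0 * eps)


# Hypothetical inputs: n = l = 10^5, 2.5% errors in both bases, eps = 10^-21.
g, t = rough_key_length(100_000, 100_000, 2_500, 0.025, 1e-21)
print(g, t)
```

The estimate is meant for quick orientation only; for a security claim, the exact procedures of Section 6 should be followed.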

Numerical results.
We demonstrate the tightness of our bound with numerical results. We consider a quantum channel in the absence of an eavesdropper, and assume that it can be described as a binary symmetric channel with a given quantum bit error rate (QBER).

Case 1: Basis Choice with Probability
First, as a comparison to preceding literature [2,4], we plot key rates for the case where Alice and Bob choose the x and the z bases with equal probability. We present two types of evaluations given in Section 6; one is the analysis of Section 6.2.2 using the straightforward bound of Proposition 2, the other is that of Section 6.3.2 using the Gaussian bound of Theorem 3. Note that both these bounds are derived without using the normal approximation; thus all the key generation rates obtained in this subsection are rigorous. We assume that Alice and Bob choose both the phase basis and the bit basis with probability $q = 1/2$, and thus $n = l = N/4$. We also assume that Alice and Bob consume $r = 40$ bits of a previously shared secret key for exchanging the hash value, in order to guarantee that $\epsilon_{\rm cor} \le 10^{-12}$ (in the following, these $r = 40$ bits will be subtracted from the final key length $G$). Then we choose $P_{\max}$ to be $P_{\max} = 0.98 \times \frac{1}{8} \times 10^{-20}$, so that the trace distance $\|\rho_{A,E'} - \rho_{\rm Ideal}\|_1$ is guaranteed to be less than $T_{\max} = 2\sqrt{2P_{\max}} = 0.99 \times 10^{-10}$. By these choices of parameters, we can guarantee $T_{\max} + \epsilon_{\rm cor} \le 10^{-10}$, which is the same condition as used in Ref. [4].
Because $r = 40$ bits are consumed for guaranteeing that Alice's and Bob's final keys are equal, the effective final key length is $G(c) - r$, with $G(c)$ defined in (2). Hence in this section, we define the final key rate to be $R(c) := (G(c) - r)/n$. The efficiency of bit error correction is chosen to be $f = 1.1$.
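The parameter choices of this subsection can be verified by direct arithmetic. The following Python sketch recomputes T_max from P_max, checks the condition T_max + ε_cor ≤ 10⁻¹⁰, and defines the key rate R = (G − r)/n as used in Figure 1. The reading ε_cor = 2⁻ʳ in the comment is an assumption made for illustration; the function name is hypothetical.

```python
import math

p_max = 0.98 * (1.0 / 8.0) * 1e-20      # P_max chosen in the text
eps_cor = 1e-12                          # bound on the correctness error
r = 40                                   # bits consumed for exchanging the hash value

t_max = 2.0 * math.sqrt(2.0 * p_max)     # T_max = 2 sqrt(2 P_max)

# 2^-40 is about 9.1e-13, consistent with eps_cor <= 1e-12
# (assuming eps_cor = 2^-r, which is not stated explicitly here).
assert 2.0 ** (-r) <= eps_cor

# Total security parameter required in the comparison with Ref. [4].
assert t_max + eps_cor <= 1e-10


def key_rate(g, n, r=40):
    """Final key rate R = (G - r) / n."""
    return (g - r) / n


print(t_max, t_max + eps_cor)
```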

The Straightforward Bound
With the above choices of parameters, we perform the analysis of Section 6.2.2, and obtain the corresponding final key rate R.Here we restrict ourselves to the case where parameters l, n satisfy 125 ≤ l = n.Parameters P max and T max are already specified above.As to parameter s, we follow Step (iii') and let s = 9.9, so that n + l n According to Step (iv), we choose D = ⌈2 − log 2 P max ⌉ = 79; next according to Step (v), c min = 0.01l and c max = 0.12l.It is easy to verify that all these parameters are compatible with the parameter checks of Step (vi').
Then we assume that Alice and Bob perform the BB84 protocol (i.e., Step (vii)), in the quantum channels with QBER = 1%, 2.5%, and 5%.The corresponding key rates R(c) (with c = l × QBER) are shown in bold curves in Fig. 1, versus n + l.

The Gaussian Bound
For the same choice of parameters $q$, $r$, $P_{\max}$, $D$, and for the same ratio of $c_{\max} = 0.12l$ with respect to $l$, we perform the analysis of Section 6.3.2. The remaining parameters to be fixed are $s$ and $c_{\min}$; hence we here numerically calculate the pair of $s$ and $c_{\min}$ that gives the best key rate $R(c)$. That is, we first fix $l$ and $n$, and then search for the pair of $s$ and $c_{\min}$ that is compatible with the parameter check and gives the largest $R(c)$. (This corresponds to repeating Steps (iii) through (vi") of Section 6.3.2, by letting $\varepsilon$ smaller each time, until the largest key length $G(c)$ is obtained.) The results are shown in thin curves in Fig. 1.
As one can see from Fig. 1, if QBER = 5%, the Gaussian bound gives a better key rate than the straightforward bound for all $l$, $n$. On the contrary, for smaller QBER (1% and 2.5%), the straightforward bound becomes better for $l, n \simeq 5000$.
The dots in Fig. 1 represent the key rates obtained by Tomamichel et al. [4] under the same condition. It can be clearly seen that our key rates $R$ are better in all parameter regions. For example, Fig. 1 gives $R = 0.19$ for QBER = 5% and $n + l = 10^4$, while Tomamichel et al. gave $R = 0$ in this region [4]. As $n + l$ becomes larger, $R$ converges very fast to the asymptotic values; all three curves reach more than 80% of the asymptotic values at $n + l = 2 \times 10^5$.
As we have noted in Section 2, key distillation is quite practical even in this region. That is, the sizes of bit error correcting codes are independent of security, and thus Alice and Bob may perform bit error correction by dividing a sifted key of $n$ bits into arbitrarily smaller blocks. As to privacy amplification, one can use the efficient algorithm for the multiplication of the (modified) Toeplitz matrix and a vector.
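To make the Toeplitz-based privacy amplification mentioned above concrete, the following Python sketch multiplies a binary Toeplitz matrix by a bit vector over GF(2). It is a direct O(nm) implementation written for readability, not the efficient algorithm referred to in the text, and it uses a plain Toeplitz matrix; the modified variant and the surjectivity requirement on the hash family are not addressed. The function name, the seed layout and the sizes are illustrative choices.

```python
import secrets


def toeplitz_hash(bits, seed, m):
    """Multiply an m x n binary Toeplitz matrix by the n-bit vector `bits` over GF(2).

    The entry in row i, column j is seed[i - j + n - 1], so the matrix is
    constant along every diagonal; `seed` must contain n + m - 1 bits.
    """
    n = len(bits)
    if len(seed) != n + m - 1:
        raise ValueError("seed must have n + m - 1 bits")
    out = []
    for i in range(m):
        acc = 0
        for j in range(n):
            acc ^= seed[i - j + n - 1] & bits[j]   # addition over GF(2) is XOR
        out.append(acc)
    return out


# Illustrative sizes: compress a 16-bit reconciled key to 6 bits.
n, m = 16, 6
seed = [secrets.randbits(1) for _ in range(n + m - 1)]
x = [secrets.randbits(1) for _ in range(n)]
key = toeplitz_hash(x, seed, m)
print(key)
```

Only n + m − 1 seed bits are needed to specify the matrix, which is one reason why Toeplitz-type families are convenient for privacy amplification.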

Case 2: Optimized Basis Choice with Variable Probability q
Next, as a more practical setting, we consider the case where Alice and Bob choose the x and the z bases with varying probabilities $q$, $1 - q$ (thus, $l = q^2N$, $n = (1-q)^2N$). Then we maximize the secret fraction $F$, defined by $F := (G - r)/N$, with respect to a fixed raw key length $N$, where $G$ denotes the final key length. We use the analysis of Section 6.3.2 based on the Gaussian bound of Theorem 3 (without any approximation); hence again, all the final key rates obtained in this subsection are rigorous. Parameters $P_{\max}$, $\epsilon_{\rm cor}$ are chosen to be the same as in the previous subsection. According to Step (iii), we let $s(\varepsilon) = 10.5$ so that $\varepsilon = 4.32 \times 10^{-26} \ll P_{\max}$. The channel error rates are chosen to be QBER = 1%, 2.5%, and 5%, respectively. Under these settings, for each fixed value of $N$, we performed numerical simulations to select the optimal values of $q$ and $c_{\min}$ that give the maximum value of $F(c)$. That is, we first fix $N$, and then search for the pair of $q$ and $c_{\min}$ that is compatible with the parameter check of Step (vi") and gives the largest $F(c)$. The results are shown in Figure 2.
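The quantities used in this setting are easy to reproduce. The Python sketch below computes the block sizes l = q²N and n = (1 − q)²N, the secret fraction F = (G − r)/N as used in Figure 2, and the confidence limit ε = Φ(10.5). It assumes that Φ is the upper tail of the standard normal distribution, which reproduces the value of ε quoted above; the function names are hypothetical.

```python
import math


def block_sizes(N, q):
    """Sample size l and sifted-key size n for basis probabilities q and 1 - q."""
    return q * q * N, (1.0 - q) * (1.0 - q) * N


def secret_fraction(g, N, r=40):
    """Secret fraction F = (G - r) / N."""
    return (g - r) / N


def upper_tail(s):
    """Phi(s): upper tail of the standard normal distribution (assumed convention)."""
    return 0.5 * math.erfc(s / math.sqrt(2.0))


eps = upper_tail(10.5)                   # confidence limit for s = 10.5
p_max = 0.98 * (1.0 / 8.0) * 1e-20       # same P_max as in the previous subsection
print(eps, eps < p_max)

# For q = 1/2 the sizes reduce to n = l = N/4, as in Case 1.
print(block_sizes(1_000_000, 0.5))
```

In the optimization described above, q and c_min would be scanned for each N; that search is not reproduced here because it requires the exact bound of Theorem 3.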

Exact Bounds Versus Approximate Bounds
All the key rates of the previous two subsections are rigorous, in the sense that they are obtained without using any approximation.In this final subsection, we demonstrate that, for practical parameter regions, the key rates are almost the same, whether one uses the analysis based on the normal approximation (i.e., Proposition 1 and Theorem 2), or those without any approximation (i.e., Proposition 2 and Theorem 3).
In Fig. 3, the solid curve shows $R(c)$ obtained in Section 7.1.1 with QBER = 1%. On the other hand, the dashed curve in the same figure is the key rate $R(c)$ obtained for the same values of QBER and $P_{\max}$, $r$, $l$, $n$ by the procedure of Section 6.2.1; hence this curve is obtained by using Proposition 1, and thus relies on the normal approximation of $P_{\rm hg}$. Similarly in Fig. 4, the solid curve shows $R(c)$ obtained in Section 7.1.2 with QBER = 5%, whereas the dashed curve is obtained by using Theorem 2, which relies on the normal approximation. (Here we performed the optimization of $s$ and $c_{\min}$.)
Note that for both of these cases, the exact key rate and the approximate key rate are almost identical. These results suggest that the simple analysis using the normal approximation (i.e., Proposition 1 or Theorem 2) can be justified for the security evaluations of practical QKD systems.

Summary
In this paper, we presented a concise analysis for the BB84 protocol that takes the finite key effect into account and yields better key generation rates, with and without relying on the normal approximation. Our results are indeed an improvement over preceding literature; as we have shown in Figure 1, our analysis gives better key generation rates $R$ in practical settings than those in Refs. [2,4].
In order to serve the convenience of experimentalists who wish to evaluate the security of their QKD systems, we included explicit procedures of security evaluation in Sections 3 and 6.In particular, in addition to presenting the exact values of key rates and security parameters, we also presented how to obtain their rough estimates using the normal approximation.
For the sake of simplicity, we restricted ourselves to the simple case where Alice has a perfect single photon source. On the other hand, in order to achieve a long communication distance by a practical QKD system using a weak coherent light source, decoy pulses are necessary [28]. This situation was analyzed by one of the authors [1], relying on the normal approximation. A thorough and exact analysis in this direction without any approximation remains as future work.

where for Proof: Since h′′′(x) decreases monotonically, we have ). Here note that the constants C 1 and C 2 are different from those defined in Theorem 1 of [30].
(iv) Let D = ⌈2 − log 2 P max ⌉. (v) Determine c min and c max .(vi) (Parameter check:) No parameter check is necessary for Proposition 1.

Figure 1. (Color online) Key generation rate $R = (G - r)/n$ versus $n + l$, which is the sum of lengths of a sifted key and sample bits. Here we assume that the x and the z bases are chosen with equal probability, i.e., $q = \frac{1}{2}$. The typical QBER are chosen to be 1% (red), 2.5% (blue), and 5% (black). As to the security, we set $r = 40$ and $P_{\max} < 0.98 \times \frac{1}{8} \times 10^{-20}$, so that $T_{\max} + \epsilon_{\rm cor} \le 10^{-10}$. That is, the sum of the trace distance and $\epsilon_{\rm cor}$ is less than $10^{-10}$. We have used two types of analysis to achieve this value of $P_{\max}$: The bold curves represent the key rates based on the straightforward bound given in Proposition 2 and in Section 6.2.2. The thin curves are based on the Gaussian bound given in Theorem 3 and in Section 6.3.2. We stress that these curves are obtained without using the normal approximation. Dots of the same color are the rates obtained in Figure 2 of Ref. [4].

Figure 2. (Color online) Secret fraction $F = (G - r)/N$ versus raw key length $N$. Here we assume that Alice and Bob choose the x and the z bases with varying probabilities $q$, $1 - q$. The probability $q$ and the minimum errors $c_{\min}$ are also optimized to give maximum $F$. The typical QBER are chosen to be 1% (red), 2.5% (blue), and 5% (black). Parameters $P_{\rm ph}$, $\epsilon_{\rm cor}$ are chosen to be the same as in Figure 1, so that $T_{\max} + \epsilon_{\rm cor} \le 10^{-10}$ is satisfied.

Figure 3. Solid Curve: the same curve as the solid curve in Figure 1 with QBER = 1%. This curve is obtained by using Proposition 2, without using any approximation. Dashed Curve: The final key rate $R(c)$ obtained for the same values of QBER, $P_{\max}$, $r$, $l$, $n$, using the straightforward bounds of Proposition 1; hence this curve is obtained using the normal approximation. Note that the two curves are almost identical.

Figure 4. Solid Curve: the same curve as the thin curve in Figure 1 with QBER = 5%. This curve is obtained by using Theorem 3 without using any approximation. Dashed Curve: The final key rate $R(c)$ obtained for the same values of QBER, $P_{\max}$, $r$, $l$, $n$, using the Gaussian bound of Theorem 2; hence this curve is obtained using the normal approximation. Note again that the two curves are almost identical.

Table 1. Notations of the key lengths, total bits, and sample bits. Functions $\hat{p}_\varepsilon(c)$ and $\hat{p}_{{\rm sft},\varepsilon}(c)$ denote the estimated upper bounds of $p(k)$ and $p_{\rm sft}(k,c)$, under the condition that there are $c$ errors in sample bits. Parameter $\varepsilon$ denotes the probability that the estimation fails. See Section 4 for details.