Statistical analysis of testing of an entangled state based on the Poisson distribution framework

A hypothesis testing scheme for entanglement has been formulated based on the Poisson distribution framework instead of the positive operator valued measure (POVM) framework. Three designs were proposed to test the entangled states in this framework. The designs were evaluated in terms of the asymptotic variance. It has been shown that the optimal time allocation between the coincidence and anti-coincidence measurement bases improves the conventional testing method. The test can be further improved by optimizing the time allocation between the anti-coincidence bases.

It is very important that a method for statistical testing of maximally entangled states be established.
Quantum state estimation and quantum state tomography are known methods for identifying unknown states [3]-[5]. Quantum state tomography [6] has recently been applied to obtain full information on the 4 × 4 density matrix. However, if the purpose is testing of entanglement, it is more economical to concentrate on checking the degree of entanglement. Such a study has been done by Hayashi et al [7] in the form of optimization problems of positive operator valued measures (POVMs). However, an implemented quantum measurement cannot necessarily be regarded as an application of a POVM to a single-particle system or as multiple applications of a POVM to single-particle systems. In particular, in quantum optics the following measurement, which is not described by a POVM on a single-particle system, is often realized. The number of generated particles is probabilistic. We prepare a filter corresponding to a projection P, and detect the number of particles passing through the filter. If the number of generated particles obeys a Poisson distribution, then, as is mentioned in section 2, the number of detected particles obeys another Poisson distribution whose average is determined by the density matrix and the projection P.
In this kind of measurement, if no particle is detected, we cannot decide whether particles are not generated or particles are generated but do not pass through the filter. If we can detect the number of generated particles as well as the number of passing particles, the measurement can be regarded as multiple applications of the POVM {P, I − P}. In this case, the number of applications of the POVM is the variable corresponding to the number of generated particles. Also, we can only detect the empirical distribution. Hence, our obtained information is almost completely explained by the use of the POVM {P, I − P}.
However, if it is impossible to distinguish the two events due to imperfections, the analysis of the obtained information cannot be reduced to the analysis of POVMs. Hence, the performance of estimation and/or hypothesis testing must be analysed based on the Poisson distribution describing the number of detected particles. If we discussed only the ultimate bound of the accuracy of estimation and/or hypothesis testing, we would not have to treat such imperfect measurements. Since several realistic measurements have such imperfections, however, it is very important to optimize our measurement within this class of imperfect measurements.
In this paper, our measurement is restricted to the detection of the number of particles passing through the filter corresponding to a projection P. We apply this formulation to testing of maximally entangled states on two-qubit systems (two-level systems), each of which is spanned by two vectors, |H⟩ and |V⟩. Since the target system is a bipartite system, it is natural to restrict our measurement to local operations and classical communications (LOCC). In this paper, for a simple realization, we restrict our measurements to the number of simultaneous detections at both parties of the particles passing through the respective filters. We also fix the total measurement time t, and optimize the allocation of time for each filter at both parties. In this discussion, in order to remove the bias concerning the direction of difference, we assume equal time allocation among the vectors {|HV⟩, |VH⟩, |DX⟩, |XD⟩, |RR⟩, |LL⟩}, which correspond to the anti-coincidence events, and among the vectors {|HH⟩, |VV⟩, |DD⟩, |XX⟩, |RL⟩, |LR⟩}, which correspond to the coincidence events, where |D⟩ := (1/√2)(|H⟩ + |V⟩), |X⟩ := (1/√2)(|H⟩ − |V⟩), |R⟩ := (1/√2)(|H⟩ + i|V⟩), |L⟩ := (1/√2)(|H⟩ − i|V⟩). That is, we are interested not in the direction of error of the generated state, but in the fidelity between the generated state and the target state |Φ⁽⁺⁾⟩. Hence, the only useful information is the total count of coincidence events and of anti-coincidence events.
We obtain the following characterizations as our results. If the average number of generated particles is known, our choice is counting the coincidence events or the anti-coincidence events. When the true state is close to the target maximally entangled state |Φ⁽⁺⁾⟩ := (1/√2)(|HH⟩ + |VV⟩) (i.e. the fidelity between them is greater than 1/4), the detection of anti-coincidence events is better than that of coincidence events.
This result implies that the indistinguishability between the coincidence events and the non-generation event loses less information than that between the anti-coincidence events and the non-generation event. This fact also holds even if we take into account the effect of dark counts. Indeed, Barbieri et al [8] proposed to detect the anti-coincidence events for measuring an entanglement witness; however, they did not prove the superiority of detecting the anti-coincidence events within the framework of mathematical statistics.
However, the average number of generated particles is usually unknown. In this case, we cannot estimate how close the true state is to the target maximally entangled state from the detection of anti-coincidence events alone. Hence, we need to count the coincidence events as additional information. In order to resolve this problem, we usually use the equal allocation between anti-coincidence events and coincidence events in the visibility method, which is a conventional method for checking entanglement. However, since we measure the coincidence events and the anti-coincidence events based on only one or two bases in this method, there is a bias concerning the direction of difference. In order to remove this bias, we consider the detecting method with equal time allocation among all the vectors {|HV⟩, |VH⟩, |DX⟩, |XD⟩, |RR⟩, |LL⟩} and {|HH⟩, |VV⟩, |DD⟩, |XX⟩, |RL⟩, |LR⟩}, and call it the modified visibility method.
In this paper, we also examine the detection of the total flux, which can be realized by detecting the particles without the filter. We optimize the time allocation among these three detections. We found that the optimal time allocation depends on the fidelity between the true state and the target maximally entangled state. If our purpose were estimating the fidelity F, we could not directly apply this optimal time allocation. However, since the purpose is testing whether the fidelity F is greater than a given threshold F₀, the optimal allocation at F₀ gives the optimal testing method.
If the fidelity F is less than a critical value, the optimal allocation is given by the allocation between the anti-coincidence vectors and the coincidence vectors (the ratio depends on F). Otherwise, it is given by the allocation only between the anti-coincidence vectors and the total flux. This fact is valid even if dark counts exist. If the dark count is greater than a certain value, the optimal time allocation is always given by the allocation between the anti-coincidence vectors and the coincidence vectors.
Further, we consider the optimal allocation among the anti-coincidence vectors when the average number of generated particles is known. The optimal allocation depends on the direction of difference between the true state and the target state. Since the direction is usually unknown, this optimal allocation does not seem useful. However, by adaptively deciding the time allocation, we can apply the optimal time allocation. We propose to apply this optimal allocation by use of the two-stage method. Further, taking into account the complexity of testing methods and the dark counts, we give a testing procedure of entanglement based on the two-stage method. In addition, the proposed designs of experiments were demonstrated by Hayashi et al [9] in two-photon pairs generated by spontaneous parametric down conversion (SPDC).
In this paper, we reformulate the hypothesis testing to be applicable to the Poisson distribution framework, and demonstrate the effectiveness of the optimized time allocation in the entanglement test. This paper is organized as follows. Section 2 defines the Poisson distribution framework and gives the hypothesis testing scheme for entanglement. Section 3 gives the mathematical formulation of statistical hypothesis testing. Sections 4 and 5 give the fundamental properties of hypothesis testing: section 4 introduces the likelihood ratio test and its modification, and section 5 gives the asymptotic theory of hypothesis testing. Sections 3-5, except for the last part of subsection 4.4 and subsection 5.2, give known facts in mathematical statistics [10]. Since it is not easy to collect the minimum background in mathematical statistics needed for this paper, we review these facts here. This preparatory part contains rather complicated subsections, i.e. subsections 4.4 and 5.2. These subsections are not used except in section 11.
Sections 7-10 are devoted to the designs of the time allocation between the coincidence and anti-coincidence bases: section 7 treats the modified visibility method, and section 8 optimizes the time allocation, when the total photon flux λ is unknown. Section 9 gives the results with known λ and section 10 compares the designs in terms of the asymptotic variance. Section 11 gives a further improvement by optimizing the time allocation between the anti-coincidence bases. The appendices give details of the proofs used in the optimization.

Hypothesis testing scheme for entanglement in the Poisson distribution framework
Let H be the Hilbert space of our interest, and P be the projection corresponding to our filter. If we assume the generation process at each time to be identical and independent, the total number n of photon pairs generated during the time t obeys the Poisson distribution Poi(λt)(n) := e^(−λt) (λt)ⁿ/n! [11]. Hence, when the density matrix of the single-pair state is σ (figure 1), the probability of the number k of counts is given as Poi(λ Tr(σP) t)(k).

One may consider that the number of detected pairs should be described by another distribution, e.g. a thermal distribution. In fact, in a usual SPDC experiment, each pulse has the form Σ_{n=0}^∞ √(Nⁿ/(1+N)ⁿ⁺¹) e^(inφ) |n⟩ ⊗ |n⟩ [12], where φ is the phase factor. However, in a usual SPDC experiment, very many pulses, e.g. 8 × 10⁷ pulses, are generated in 1 s. That is, every ≅ 1.25 × 10⁻⁸ s, a single photon pair is generated with the probability N/(1+N)². Note that since N is the average number of photon pairs per pulse, it is quite a small number. Now, by letting the probability N/(1+N)² be λ′, the law of small numbers yields that the binomial distribution of the number of generated pairs among the 8 × 10⁷ t pulses is approximated by the Poisson distribution with mean 8 × 10⁷ λ′ t. Then, with λ := 8 × 10⁷ λ′, the probability of the number of detected single-photon pairs in t seconds can be approximated by the Poisson distribution Poi(λt). Here, one might be afraid that multi-photon pairs will be generated with a non-negligible probability. For each pulse, a multi-photon pair is generated with the probability N²/(1+N)². Then, in t seconds, the probability of generating a multi-photon pair is almost equal to 8 × 10⁷ t N²/(1+N)², which is a negligible probability because N is quite small. Therefore, in an SPDC experiment, it is suitable to adopt our Poisson distribution framework.

Our Poisson distribution framework is different from the usual framework in quantum hypothesis testing in the following points. Hayashi et al [7] treat testing of a maximally entangled state by optimizing the two-valued POVM {P, I − P} for a single-particle system.
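The law-of-small-numbers approximation invoked above can be checked numerically. The sketch below (pure Python, with hypothetical pulse number `n` and per-pulse probability `lam_prime`, not the experimental values) compares the binomial distribution of the number of generated pairs with its Poisson limit.

```python
import math

def binom_pmf(n, p, k):
    # probability of k generated pairs among n pulses
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(mu, k):
    return math.exp(-mu) * mu**k / math.factorial(k)

# Law of small numbers: with many pulses and a tiny per-pulse pair
# probability, the pair count is nearly Poisson with mean n * lam_prime.
n, lam_prime = 100_000, 2e-5      # hypothetical values for illustration
mu = n * lam_prime                # mean number of generated pairs
tv = 0.5 * sum(abs(binom_pmf(n, lam_prime, k) - poisson_pmf(mu, k))
               for k in range(30))
print(f"mean = {mu}, total variation distance = {tv:.2e}")
```

The total variation distance is bounded by n·(λ′)², so it is tiny in the regime of the experiment.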
In their framework, we obtain the two probabilities Tr σP and Tr σ(I − P) for the true state σ under the assumption that the state is generated. We call this framework the POVM framework. In the POVM framework, if we replace P by I − P, we obtain the same amount of information.
In our Poisson distribution framework, when we detect no particles by the filter P, we cannot discriminate between 'the particle is not generated' and 'the particle is generated, and the event corresponds to I − P'. That is, the information loss between the above two events is crucial for our framework. Hence, if we replace P by I − P, we obtain different amounts of information. Therefore, we need a different mathematical treatment for testing a maximally entangled state with our Poisson distribution framework.
Further, if we erroneously detect k′ particles with the probability Poi(δt)(k′), the probability of the number k of all detected particles is equal to Poi((λ Tr(σP) + δ)t)(k). This kind of incorrect detection is called the dark count. Further, since we consider the bipartite case, i.e. the case where H = C² ⊗ C², we assume that our projection P has the separable form P₁ ⊗ P₂.
In this paper, under the above assumption, we discuss the hypothesis testing when the target state is the maximally entangled state |Φ⁽⁺⁾⟩, while Usami et al [11] discussed state estimation under this assumption. Here, we measure the degree of entanglement by the fidelity between the generated state and the target state: F := ⟨Φ⁽⁺⁾|σ|Φ⁽⁺⁾⟩. The purpose of the test is to guarantee that the state is sufficiently close to the maximally entangled state with a certain significance. That is, we are required to disprove, with a small error probability, that the fidelity F is less than a threshold F₀. In mathematical statistics, this situation is formulated as hypothesis testing; we introduce the null hypothesis H₀ that the entanglement is not enough and the alternative H₁ that the entanglement is enough: H₀: F ≤ F₀ versus H₁: F > F₀, with a threshold F₀.

Visibility is an indicator of entanglement commonly used in experiments, and is calculated as follows: firstly, A's measurement vector |x⟩_A is fixed; then the measurement on |x⟩_A ⊗ |y⟩_B is performed by rotating B's measurement vector |y⟩_B to obtain the maximum and minimum numbers of counts, n_max and n_min. We need to make the measurement with at least two bases of A in order to exclude the possibility of classical correlation. We may choose the two bases {|H⟩, |V⟩} and {|D⟩, |X⟩} as A's bases, for example. Finally, the visibility is given by the ratio between n_max − n_min and n_max + n_min for each of A's measurement bases. However, our decision will contain a bias if we choose only two bases as A's measurement basis. Hence, we cannot estimate the fidelity between the target maximally entangled state and the given state in a statistically proper way from the visibility.
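As a small illustration of the visibility formula just described (the counts are hypothetical, not experimental data):

```python
# Visibility computed from the maximum and minimum counts obtained by
# rotating B's measurement vector while A's is fixed.
def visibility(n_max, n_min):
    return (n_max - n_min) / (n_max + n_min)

v = visibility(980, 20)   # hypothetical counts
print(v)                  # 0.96
```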
Since the equation |HH⟩⟨HH| + |VV⟩⟨VV| + |DD⟩⟨DD| + |XX⟩⟨XX| + |RL⟩⟨RL| + |LR⟩⟨LR| = I + 2|Φ⁽⁺⁾⟩⟨Φ⁽⁺⁾| (4) holds, we can estimate the fidelity by measuring the sum of the counts of the following vectors: |HH⟩, |VV⟩, |DD⟩, |XX⟩, |RL⟩ and |LR⟩, when λ is known [7, 8]. This is because the sum n₁ := n_HH + n_VV + n_DD + n_XX + n_RL + n_LR obeys the Poisson distribution with the expectation value (λ(1+2F)/6 + δ)t₁, where the measurement time for each vector is t₁/6. We call these vectors the coincidence vectors because they correspond to the coincidence events.
However, since the parameter λ is usually unknown, we need to perform another measurement on different vectors to obtain additional information. Since |HV⟩⟨HV| + |VH⟩⟨VH| + |DX⟩⟨DX| + |XD⟩⟨XD| + |RR⟩⟨RR| + |LL⟩⟨LL| = 2I − 2|Φ⁽⁺⁾⟩⟨Φ⁽⁺⁾| also holds, we can estimate the fidelity by measuring the sum of the counts of the following vectors: |HV⟩, |VH⟩, |DX⟩, |XD⟩, |RR⟩ and |LL⟩. The sum n₂ := n_HV + n_VH + n_DX + n_XD + n_RR + n_LL obeys the Poisson distribution Poi((λ(2−2F)/6 + δ)t₂), where the measurement time for each vector is t₂/6. Combining the two measurements, we can estimate the fidelity without knowledge of λ. We call these vectors the anti-coincidence vectors because they correspond to the anti-coincidence events.
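Combining the two Poisson means above, λ and F can be recovered by the method of moments. The sketch below assumes no dark counts (δ = 0) and uses hypothetical count rates; `estimate_lambda_F` is only an illustration of the algebra, not a procedure from the paper.

```python
# Moment estimates of lambda and F from the total coincidence count n1
# (measurement time t1) and anti-coincidence count n2 (time t2),
# ignoring dark counts: E[n1] = lam*(1+2F)/6*t1, E[n2] = lam*(2-2F)/6*t2.
def estimate_lambda_F(n1, t1, n2, t2):
    a = 6 * n1 / t1          # estimate of lam*(1+2F)
    b = 6 * n2 / t2          # estimate of lam*(2-2F)
    lam = (a + b) / 3        # since (1+2F) + (2-2F) = 3
    F = (a / lam - 1) / 2
    return lam, F

# Example with the exact expectations for lam = 1000, F = 0.95:
lam, F = estimate_lambda_F(1000 * 2.9 / 6 * 10, 10, 1000 * 0.1 / 6 * 10, 10)
print(lam, F)   # recovers 1000 and 0.95
```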
We can also consider a different type of measurement for λ. If we prepare our device to detect all photons, i.e. the case where the projection is I ⊗ I, the detected number n₃ obeys the distribution Poi((λ + δ)t₃) with the measurement time t₃. We will refer to this as the total flux measurement. In the following, we consider the best time allocation for estimating and testing the fidelity by applying methods of mathematical statistics. We will assume that λ is either known or estimated from the detected number n₃.

Formulation
In this section, we review the fundamental knowledge of hypothesis testing for probability distributions [10]. Suppose that a random variable X is distributed according to a probability measure P_θ identified by the unknown single parameter θ ∈ Θ ⊂ ℝ. We also assume that the unknown parameter θ belongs to one of two mutually disjoint sets Θ₀ and Θ₁. For example, the case Θ₀ = {θ₀} and Θ₁ = {θ₁} is called a simple test, and the case Θ₀ = {θ ∈ Θ | θ ≤ θ₀} and Θ₁ = {θ ∈ Θ | θ > θ₀} is called a one-sided test. When we want to guarantee that the true parameter θ belongs to the set Θ₁ with a certain significance, we choose the null hypothesis H₀ and the alternative hypothesis H₁ as H₀: θ ∈ Θ₀ versus H₁: θ ∈ Θ₁.
In fact, we chose the hypothesis that the fidelity is less than the given threshold F₀ as the null hypothesis H₀ in section 2. This formulation is natural because our purpose is guaranteeing that the fidelity is not less than the given threshold. When the probability measure P_θ is parameterized by plural parameters, the null hypothesis set Θ₀ and the alternative hypothesis set Θ₁ are given by subsets of ℝᵈ, where d is the number of parameters. Our decision method is described by a test, which is a function φ(x) taking values in {0, 1}; H₀ is rejected if φ(x) = 1 is observed and is not rejected if φ(x) = 0 is observed. That is, we make our decision only when φ(x) = 1 is observed, and do not otherwise. This is because the purpose is accepting H₁ by rejecting H₀ while guaranteeing the quality of our decision; it is neither rejecting H₁ nor accepting H₀. Therefore, we call the region {x | φ(x) = 1} the rejection region. The test φ can be defined by its rejection region. From a theoretical viewpoint, we often consider randomized tests, in which we probabilistically make the decision for a given data set. Such a test is given by a function φ mapping to the interval [0, 1]. When we observe the data x, H₀ is rejected with the probability φ(x). In the following, we treat randomized tests as well as deterministic tests.
In statistical hypothesis testing, we minimize the error probabilities of the test φ. There are two types of errors. The type one error is the case where H₀ is rejected although it is true. The type two error is the converse case: H₀ is accepted although it is false. Writing P_θ(φ) for the rejection probability of φ under P_θ, the type one error probability is given by P_θ(φ) (θ ∈ Θ₀), and the type two error probability is given by 1 − P_θ(φ) (θ ∈ Θ₁). It is in general impossible to minimize both simultaneously because of a trade-off relation between them. Since we make our decision with a guarantee of its quality only when H₁ is supported, it is definitively required that the type one error probability P_θ(φ) (θ ∈ Θ₀) be less than a certain constant α. For this reason, we minimize the type two error probability 1 − P_θ(φ) under the condition P_θ(φ) ≤ α (θ ∈ Θ₀). The constant α in this condition is called the risk probability, which guarantees the quality of our decision. If the risk probability is too large, our decision has little reliability. Under this constraint on the risk probability, we maximize the probability of rejecting the hypothesis H₀ when the true parameter is θ ∈ Θ₁. This probability is given as P_θ(φ), and is called the power of φ. Hence, a test φ of risk probability α is said to be most powerful (MP) at θ ∈ Θ₁ if P_θ(φ) ≥ P_θ(ψ) holds for any test ψ of risk probability α. Thus, the power of the MP test at θ ∈ Θ₁ gives the optimal power β_α(θ) := max_{φ: P_θ′(φ) ≤ α, θ′ ∈ Θ₀} P_θ(φ). Then, a test is said to be uniformly most powerful (UMP) if it is MP at every θ ∈ Θ₁. When the level-α UMP test φ_α exists, β_α(θ) = P_θ(φ_α). However, in general it is not easy to calculate the maximum value β_α(θ). So, we often restrict our test to a class T of tests. In this case, we focus on the maximum value β_{T,α}(θ) := max_{φ ∈ T_α} P_θ(φ), where T_α is the set of tests in T with level α.

P-values
In the hypothesis testing, we usually fix our test before applying it to data. However, we sometimes focus on the minimum risk probability among tests in a class T rejecting the hypothesis H 0 with the given data. This value is called the P-value, which depends on the observed data x as well as the subset 0 to be rejected.

In fact, in order to define the P-value, we have to fix a class T of tests. For example, we often use the class of likelihood ratio tests, which will be explained in the next section. Then, for Θ₀, the P-value P_T(x) is defined as a function of the observed data x: P_T(x) := min{α | there exists a test φ ∈ T_α with φ(x) = 1}. Since the P-value expresses the risk of rejecting the hypothesis H₀, this concept is useful for comparison among several designs of the experiment. Note that if we are allowed to choose any function φ as a test, the above minimum is attained by the function δ_x defined by δ_x(y) := 1 if y = x and 0 otherwise. In this case, the P-value is max_{θ∈Θ₀} P_θ({x}). However, the function δ_x is unnatural as a test. Hence, we should fix a class of tests to define the P-value.
From the above discussion, we have two methods for comparing the performance when we have two types of experiments for testing the given hypotheses. One is the maximum power β T,α (θ ), where T α is the set of tests in T with the level α. The other is the P-value P T (x). Assume that we have to choose one of two types of experiments before applying the experiment. In this case, it is suitable to use β T,α (θ ). Next, assume that we have done two types of experiments and obtained two data sets. In this case, it is suitable to compare the strength of the decisions based on the two obtained data sets. The strength of the decision can be measured by the P-value. Therefore, we have to choose the criterion based on the situation.

The simple hypotheses case
In mathematical statistics, likelihood ratio tests are often used as a class of standard tests [10]. These tests provide the UMP test in some typical cases. In the case where the hypotheses are simple, that is, both Θ₀ and Θ₁ consist of single elements as Θ₀ = {θ₀} and Θ₁ = {θ₁}, the likelihood ratio test φ_LR,r is defined as φ_LR,r(x) := 1 if P_θ₀(x)/P_θ₁(x) ≤ r and 0 otherwise, where r is a constant, and the ratio P_θ₀(x)/P_θ₁(x) is called the likelihood ratio. From the definition, any test φ satisfies (φ_LR,r(x) − φ(x))(r P_θ₁(x) − P_θ₀(x)) ≥ 0 for every x. When a likelihood ratio test φ_LR,r satisfies P_θ₀(φ_LR,r) = α, the test φ_LR,r is MP of level α. Indeed, when a test φ satisfies P_θ₀(φ) ≤ α, summing the above inequality over x yields r(P_θ₁(φ_LR,r) − P_θ₁(φ)) ≥ P_θ₀(φ_LR,r) − P_θ₀(φ) = α − P_θ₀(φ) ≥ 0. This is known as the Neyman-Pearson fundamental lemma.
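A minimal numerical illustration of the lemma for two Poisson distributions (hypothetical means μ₀ = 10 and μ₁ = 5): the likelihood ratio is monotone in the count, so the rejection region is a lower tail, and its level and power can be read off directly.

```python
import math

def poi(mu, k):
    return math.exp(-mu) * mu**k / math.factorial(k)

# Simple-vs-simple likelihood ratio test for Poi(mu0) vs Poi(mu1), mu1 < mu0.
# Reject H0 when P_mu0(k)/P_mu1(k) <= r; the ratio exp(mu1-mu0)*(mu0/mu1)^k
# is increasing in k, so the rejection region is a lower tail.
mu0, mu1, r = 10.0, 5.0, 0.05
reject = [k for k in range(60) if poi(mu0, k) / poi(mu1, k) <= r]
alpha = sum(poi(mu0, k) for k in reject)     # type one error probability
power = sum(poi(mu1, k) for k in reject)     # power at mu1
print(reject, alpha, power)
```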
The likelihood ratio test is generalized to the cases where Θ₀ or Θ₁ has at least two elements as φ_LR,r(x) := 1 if sup_{θ∈Θ₀} P_θ(x) / sup_{θ∈Θ₁} P_θ(x) ≤ r and 0 otherwise. Usually, in order to guarantee a small risk probability, the threshold r is chosen as r < 1.

The one-sided case: the one-parameter case
In the case where the hypothesis is one-sided, that is, the parameter space Θ is an interval of ℝ and the hypotheses are given as H₀: θ ≥ θ₀ versus H₁: θ < θ₀, we often use one-sided tests for their optimality under some conditions as well as for their naturalness. When the likelihood ratio P_θ(x)/P_η(x) is monotone increasing in x for any θ, η such that θ > η, the family is said to have a monotone likelihood ratio. In this case, the likelihood ratio test φ_LR,r between P_θ₀ and P_θ₁ is UMP of level α := P_θ₀(φ_LR,r), where θ₁ is an arbitrary element satisfying θ₁ < θ₀. Indeed, many important examples satisfy this condition. For example, when a family of probability distributions {P_θ | θ ∈ Θ} is an exponential family, i.e. there exists a random variable x such that P_θ(x) = e^(θx + g(θ)) P₀(x), the likelihood ratio e^(θ₀x+g(θ₀))/e^(θ₁x+g(θ₁)) = e^((θ₀−θ₁)x + g(θ₀) − g(θ₁)) is monotone in x for θ₀ > θ₁. Since the proof of this optimality is short, it is convenient to give it here.
From the monotonicity, the likelihood ratio test φ_LR,r has the form φ_LR,r(x) = 1 for x ≤ x₀ and 0 for x > x₀, with a threshold value x₀. Since the monotonicity implies P_θ₀(φ_LR,r) ≥ P_θ(φ_LR,r) for any θ ∈ Θ₀, it follows from the Neyman-Pearson lemma that the likelihood ratio test φ_LR,r is MP of level α.
From (13), the likelihood ratio test φ_LR,r is also a likelihood ratio test between P_θ₀ and P_η, where η is another element satisfying η < θ₀. Hence, the test φ_LR,r is also MP of level α for this alternative. From the above discussion, it is suitable to treat the P-value based on the class of likelihood ratio tests (LR). In this case, when we observe x₀, the P-value is equal to P_θ₀{x | x ≤ x₀}.

The normal distribution P_θ(x) = (1/√(2πv)) e^(−(x−θ)²/(2v)) is the most famous example of a one-parameter exponential family, where v is the variance. In this case, the UMP test φ_UMP,α of level α is given as φ_UMP,α(x) = 1 for x ≤ θ₀ − ε_α√v and 0 otherwise, where ε_α is determined by Φ(−ε_α) = α with the standard normal distribution function Φ(y) := ∫_{−∞}^y (1/√(2π)) e^(−t²/2) dt. The P-value concerning likelihood ratio tests is calculated to be Φ((x − θ₀)/√v). For θ < θ₀, the maximum power is calculated to be β_α(θ) = Φ((θ₀ − θ)/√v − ε_α). Hence, a smaller variance v gives a larger power. This property can be applied to the other cases discussed later.
The n-trial binomial distributions also form an exponential family because the parameter θ := log(p/(1−p)) satisfies Pⁿ_p(k) = (n choose k) p^k (1−p)^(n−k) = (n choose k) e^(θk − n log(1+e^θ)).
Hence, in the case of the n-trial binomial distribution, the UMP test φⁿ_UMP,α of level α is given as the randomized likelihood ratio test φⁿ_UMP,α(k) = 1 for k < k₀, γ for k = k₀ and 0 for k > k₀, where k₀ is the maximum value of k satisfying Σ_{j<k} Pⁿ_θ₀(j) ≤ α, and γ is defined as γ := (α − Σ_{j<k₀} Pⁿ_θ₀(j)) / Pⁿ_θ₀(k₀). Therefore, when k is observed, the P-value is Σ_{j≤k} Pⁿ_θ₀(j).
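The threshold k₀ and the randomization probability γ can be computed directly from cumulative binomial probabilities; a sketch with hypothetical n, p₀ and α:

```python
import math

def binom_cdf(n, p, k):
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

# Randomized UMP test for H0: p >= p0 vs H1: p < p0 at level alpha:
# reject for k < k0, reject with probability gamma at k = k0.
def ump_binom(n, p0, alpha):
    k0 = 0
    while binom_cdf(n, p0, k0) <= alpha:   # advance while the tail fits
        k0 += 1
    # now P{k < k0} <= alpha < P{k <= k0}
    below = binom_cdf(n, p0, k0 - 1) if k0 > 0 else 0.0
    pmf_k0 = math.comb(n, k0) * p0**k0 * (1 - p0)**(n - k0)
    gamma = (alpha - below) / pmf_k0
    return k0, gamma

k0, gamma = ump_binom(20, 0.5, 0.05)   # hypothetical numbers
print(k0, round(gamma, 3))             # k0 = 6, gamma ≈ 0.793
```

By construction the exact size P{k < k₀} + γ P{k = k₀} equals α.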

When n is sufficiently large, the distribution Pⁿ_θ(k) can be approximated by the normal distribution with expectation nθ and variance nθ(1−θ). Hence, the UMP test φⁿ_UMP,α of level α is approximately given by the rejection condition (k − nθ₀)/√(nθ₀(1−θ₀)) ≤ −ε_α. The P-value is also approximated to Φ((k − nθ₀)/√(nθ₀(1−θ₀))). For θ < θ₀, the power is approximately calculated to be Φ(√n(θ₀ − θ)/√(θ₀(1−θ₀)) − ε_α).

The Poisson distributions Poi(μ) also form an exponential family because the parameter θ := log μ satisfies Poi(μ)(n) = (1/n!) e^(θn − e^θ). The UMP test φ_UMP,α of level α is characterized similarly to (18). When the threshold μ₀ is sufficiently large and the hypotheses are given as H₀: μ ≥ μ₀ versus H₁: μ < μ₀, the UMP test φ_UMP,α of level α is approximately given by the rejection condition (n − μ₀)/√μ₀ ≤ −ε_α. The P-value is also approximated to Φ((n − μ₀)/√μ₀) for sufficiently large n. For μ < μ₀, when μ is sufficiently large, the power is approximately calculated to be Φ((μ₀ − μ)/√μ₀ − ε_α).
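A sketch of the large-μ₀ approximation for the one-sided Poisson test (Φ is the standard normal distribution function; the numbers are hypothetical):

```python
import math

def Phi(x):
    # standard normal cumulative distribution function via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Approximate one-sided Poisson test of H0: mu >= mu0 vs H1: mu < mu0:
# for large mu0, the P-value is Phi((n - mu0)/sqrt(mu0)).
def poisson_pvalue(n, mu0):
    return Phi((n - mu0) / math.sqrt(mu0))

p = poisson_pvalue(80, 100)   # 80 counts observed against threshold 100
print(p)
```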

The one-sided case: the binomial Poisson case
Next, as an application of the above cases, we consider testing the following hypotheses for the binomial Poisson distribution Poi(μ₁, μ₂), i.e. the joint distribution of two independent Poisson variables n₁ and n₂: H₀: μ₁/(μ₁+μ₂) ≥ θ₀ versus H₁: μ₁/(μ₁+μ₂) < θ₀ (27). In this case, as will be shown in (33) and (34) in section 4.4, the likelihood ratio test φ_LR,r is characterized by the likelihood ratio test of the binomial distributions as φ_LR,r(n₁, n₂) = φ^(n₁+n₂)_LR,r(n₁) (28).
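The characterization above says that, given the total n₁ + n₂, testing the ratio μ₁/(μ₁+μ₂) reduces to a binomial problem. A sketch of the resulting P-value (deterministic lower-tail binomial test; the counts are hypothetical):

```python
import math

def binom_cdf(n, p, k):
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

# Two independent Poisson counts n1, n2: condition on n = n1 + n2 and
# apply the one-sided binomial test to n1 with threshold theta0.
def binom_poisson_pvalue(n1, n2, theta0):
    return binom_cdf(n1 + n2, theta0, n1)

p = binom_poisson_pvalue(3, 17, 0.5)   # hypothetical counts
print(p)
```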

The one-sided case: the multi-parameter normal distribution case
In the multi-parameter case, it is not so easy to characterize likelihood ratio tests. Only the multi-parameter normal distribution P_θ(x) = (2π)^(−m/2) (det V)^(−1/2) e^(−(x−θ)·V⁻¹(x−θ)/2) is easily treated, where V is the covariance matrix. Now, for a given vector w, we treat two hypotheses given in the linear form: H₀: w·θ ≥ c₀ versus H₁: w·θ < c₀. By focusing on the random variable y := w·x, our problem can be reduced to the hypothesis testing H₀: θ ≥ c₀ versus H₁: θ < c₀ in the one-parameter normal distribution P_θ(y) = (1/√(2πv)) e^(−(y−θ)²/(2v)) with the variance v = w·Vw. Hence, the UMP test φ_UMP,α of level α is given by the rejection condition (y − c₀)/√(w·Vw) ≤ −ε_α. The P-value is calculated to be Φ((y − c₀)/√(w·Vw)). For w·θ < c₀, the maximum power is calculated to be Φ((c₀ − w·θ)/√(w·Vw) − ε_α).

Next, we consider hypothesis testing in a more general form in the multi-parameter case. In this case, the UMP test does not always exist. Hence, we have to choose our test among non-UMP tests. One idea is choosing our test among likelihood ratio tests, because likelihood ratio tests always exist and we can expect them to have good performance. Generally, it is not easy to give an explicit form of the likelihood ratio test. When the family is a multi-parameter exponential family, the likelihood ratio test has a simple form. A family of probability distributions {P_θ | θ = (θ¹, . . . , θᵐ) ∈ ℝᵐ} is called an m-parameter exponential family when there exists an m-dimensional random variable x = (x₁, . . . , x_m) such that P_θ(dx) = e^(θ·x + g(θ)) P₀(dx), where g(θ) := −log ∫ exp(θ·x) P₀(dx). However, even this form is not sufficiently simple because its rejection region is given by a nonlinear constraint. Hence, a test with a simpler form is required. In the following, we discuss the likelihood ratio test in the case of an m-parameter exponential family.
In this case, the likelihood ratio test φ_LR,r has the rejection region {x | θ̂(x) ∈ Θ₁, min_{θ∈Θ₀} D(P_θ̂(x) ∥ P_θ) ≥ −log r}, where the maximum likelihood estimator (MLE) θ̂(x) is given by the condition E_θ̂(x)[x] = x (see [13]), and the divergence D(P_η ∥ P_θ) is defined as D(P_η ∥ P_θ) := E_η[log(P_η/P_θ)] = (η − θ)·E_η[x] + g(η) − g(θ). This is because the logarithm of the likelihood function is calculated as log P_θ(x) = θ·x + g(θ) + log P₀(x), so that log(P_θ̂(x)(x)/P_θ(x)) = (θ̂(x) − θ)·x + g(θ̂(x)) − g(θ) = D(P_θ̂(x) ∥ P_θ). Hence, when Θ = Θ₀ ∪ Θ₁, the likelihood ratio test with the ratio r < 1 is given by the above rejection region.

Now, we show that in the case of the binomial Poisson distribution Poi(μ₁, μ₂), the likelihood ratio test φ_LR,r of the hypotheses (27) is characterized by the likelihood ratio test of the binomial distributions as in (28). The divergence is calculated as D(Poi(μ′₁, μ′₂) ∥ Poi(μ₁, μ₂)) = Σ_{i=1,2} (μ′ᵢ log(μ′ᵢ/μᵢ) − μ′ᵢ + μᵢ). Thus, log [sup_{θ₁∈Θ₁} P_θ₁(n₁, n₂) / sup_{θ₀∈Θ₀} P_θ₀(n₁, n₂)] = (n₁ + n₂) D(P_{n₁/(n₁+n₂)} ∥ P_θ₀) = D(P^(n₁+n₂)_{n₁/(n₁+n₂)} ∥ P^(n₁+n₂)_θ₀) when the MLE n₁/(n₁+n₂) belongs to Θ₁, where P_θ is the binomial distribution with one observation and Pⁿ_θ is the binomial distribution with n observations. Then, the likelihood ratio test is given by the likelihood ratio test of the binomial distributions.

Asymptotic theory I: the n-trial case
In the above discussion, we treated the likelihood ratio tests concerning several special cases. It is not so easy to treat the likelihood ratio tests in general models. However, when the number of observations (trials) is sufficiently large, likelihood ratio tests can be treated approximately in the one-parameter case as follows.

Fisher information
Assume that the data x₁, . . . , x_n obey independent and identical distributions from the family p_θ and that n is sufficiently large. We consider testing of the one-sided hypotheses H₀: θ ≥ θ₀ versus H₁: θ < θ₀. When the true parameter θ is close to the threshold θ₀, it is known that the meaningful information for θ is essentially given by the random variable Lⁿ_θ₀(x̄) := (1/n) Σᵢ L_θ₀(xᵢ), where the random variable L_θ₀(xᵢ) := l_θ₀(xᵢ)/J_θ₀, the logarithmic derivative l_θ₀(xᵢ) := (d/dθ) log p_θ(xᵢ)|_{θ=θ₀} and the Fisher information J_θ := E_θ[l_θ(x)²]. In this case, the random variable Lⁿ_θ₀(x̄) can be approximated by the normal distribution with the expectation value θ − θ₀ and the variance 1/(nJ_θ₀). Hence, the testing problem can be approximated by the testing of this normal distribution family [10, 13]. That is, the quality of testing is approximately evaluated by the Fisher information J_θ₀ at the threshold θ₀. The following test φⁿ_α approximately achieves the minimum type two error probability with level α: φⁿ_α(x̄) = 1 if √(nJ_θ₀) Lⁿ_θ₀(x̄) ≤ −ε_α and 0 otherwise, which is called an asymptotic likelihood ratio test. Then, the P-value concerning these kinds of tests is approximately calculated to be P_ALR(x̄) ≅ Φ(√(nJ_θ₀) Lⁿ_θ₀(x̄)). For θ < θ₀, the maximum power is calculated to be β_α(θ) ≅ Φ(√(nJ_θ₀)(θ₀ − θ) − ε_α). In a usual asymptotic setting, we focus only on points θ so close to θ₀ that J_θ can be approximated by J_θ₀. Hence, a larger Fisher information J_θ₀ gives a larger power β_α(θ).
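For the Poisson family Poi(θ) these quantities are explicit: l_θ(x) = x/θ − 1 and J_θ = 1/θ, so Lⁿ_θ₀ reduces to x̄ − θ₀. A sketch of the resulting approximate P-value with hypothetical data:

```python
import math

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Asymptotic likelihood ratio test for Poi(theta), H0: theta >= theta0:
# l_theta(x) = x/theta - 1, J_theta = 1/theta, and
# L^n = (1/(n*J)) * sum_i l(x_i), which equals mean(xs) - theta0.
def alr_pvalue(xs, theta0):
    n = len(xs)
    J = 1.0 / theta0
    L = sum(x / theta0 - 1 for x in xs) / (n * J)
    return Phi(L * math.sqrt(n * J))

xs = [4, 6, 3, 5, 4, 5, 3, 4, 6, 5]   # hypothetical counts, mean 4.5
pv = alr_pvalue(xs, theta0=6.0)
print(pv)
```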

The Fisher information matrix
This approximation can be extended to the multi-parameter case {p_θ | θ ∈ ℝᵐ}. Similarly, it is known that the testing problem can be approximated by the testing of the normal distribution family with the covariance matrix (nJ_θ)⁻¹, where the Fisher information matrix J_θ;i,j is given by J_θ;i,j := E_θ[l_θ;i(x) l_θ;j(x)] with the logarithmic derivatives l_θ;i(x) := (∂/∂θⁱ) log p_θ(x).

Now we focus on the following hypothesis testing: H₀: w·θ ≥ c₀ versus H₁: w·θ < c₀. In this case, the choice of the threshold point θ₀ with w·θ₀ = c₀ is not unique. Now, suppose that the random variable L_θ;j(x) := Σᵢ₌₁ᵐ (J_θ⁻¹)_{i,j} l_θ;i(x) and the variance w·J_θ₀⁻¹w do not depend on the choice of the threshold point θ₀. Then, we can use the random variable w·Lⁿ_θ₀(x̄), where Lⁿ_θ₀(x̄) is the vector with components (1/n) Σᵢ L_θ₀;j(xᵢ). That is, we can define the following level-α test φⁿ_α,ALR: reject H₀ when w·Lⁿ_θ₀(x̄) √(n/(w·J_θ₀⁻¹w)) ≤ −ε_α, which approximately achieves the minimum type two error probability with level α. This test is called an asymptotic likelihood ratio test. Since the variance of the random variable w·Lⁿ_θ₀ is approximately given by (w·J_θ₀⁻¹w)/n, the P-value concerning this kind of test is approximately calculated to be Φ(w·Lⁿ_θ₀(x̄) √(n/(w·J_θ₀⁻¹w))). For θ with w·θ < c₀, the maximum power is calculated to be Φ((c₀ − w·θ) √(n/(w·J_θ₀⁻¹w)) − ε_α). Hence, a smaller variance w·J_θ₀⁻¹w gives a larger power β_α(θ).

Generally, this assumption does not hold, so it is not so easy to give a suitable test in this case. However, even in the general case, if n is sufficiently large, the MLE θ̂_ML(x̄) belongs to a neighborhood U_θ₁ of the true value θ₁. In this neighborhood U_θ₁, we can regard the random variable w·Lⁿ_θ₀(x̄) and the variance w·J_θ₀⁻¹w as almost independent of the choice of the threshold point θ₀. Hence, the following test φ_α,ALR might be useful: the above test with θ₀ replaced by θ̂_{0,ML}(x̄), where θ̂_{0,ML}(x̄) is the maximum likelihood estimator within the family {p_θ | w·θ = c₀}. In this case, (35) and (36) asymptotically hold. Therefore, the variance w·J_θ₀⁻¹w roughly represents the best performance of testing.
Since the above discussion involves rather rough approximations, it is preferable to use a better testing method when the assumption does not hold. In the following, we propose testing methods in two typical instances of the multinomial Poisson case. Indeed, there exists a general asymptotic theory for one-sided hypotheses in a multi-parameter family [14]; however, it is so complicated that it is not applied in this paper.
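The multi-parameter recipe above hinges on two quantities: the directional variance $w\cdot J^{-1}_{\theta_0} w$ and the standardized statistic built from it. A minimal sketch for a $2\times 2$ Fisher matrix (the matrix, the direction $w$ and the observed value below are all hypothetical):

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def directional_variance(J, w):
    """Return w . J^{-1} w for a 2x2 Fisher information matrix J."""
    (a, b), (c, d) = J
    det = a * d - b * c
    Jinv = [[d / det, -b / det], [-c / det, a / det]]
    return sum(w[i] * Jinv[i][j] * w[j] for i in range(2) for j in range(2))

def alr_pvalue(wL, c0, v, n):
    """P-value of the asymptotic likelihood-ratio test of
    H0: w.theta >= c0, given wL = w . L^n(x) and v = w . J^{-1} w."""
    return phi(math.sqrt(n) * (wL - c0) / math.sqrt(v))

J = [[2.0, 1.0], [1.0, 2.0]]    # hypothetical Fisher matrix
w = (1.0, 0.0)                  # tested direction
v = directional_variance(J, w)  # 2/3
p_value = alr_pvalue(0.9, 1.0, v, 100)
```

A smaller $w\cdot J^{-1}w$ shrinks the denominator of the statistic, so the same deviation from $c_0$ yields a smaller P-value, in line with the power statement above.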

Asymptotic theory II: the multinomial Poisson case
In this section, we treat one-sided hypotheses in a general form for the multinomial Poisson distribution family. However, we cannot treat the general form so precisely. Hence, the following two special cases are treated later.
(i) The parameter $\vec\mu$ is written as $\vec\mu = (F\vec\mu_1 + (1-F)\vec\mu_2)t$, and the hypotheses are given by $H_0: F \ge F_0$ versus $H_1: F < F_0$. (ii) The hypotheses are given by $H_0: w\cdot\vec\mu \ge c_0$ versus $H_1: w\cdot\vec\mu < c_0$, where $w$ satisfies the condition $w_i > 0$.

The general case
In order to treat the multinomial Poisson case, we first consider the Poisson distribution family $\mathrm{Poi}(\theta t)$. In this case, the parameter $\theta$ can be estimated by $\frac{X}{t}$, and the asymptotic regime corresponds to large $t$. In this case, the Fisher information is $\frac{t}{\theta}$. When $X$ obeys the unknown Poisson distribution $\mathrm{Poi}(\theta t)$, the estimation error $\frac{X}{t} - \theta$ is close to the normal distribution with variance $\frac{\theta}{t}$; that is, $\sqrt{t}\,\big(\frac{X}{t} - \theta\big)$ approaches a random variable obeying the normal distribution with variance $\theta$. That is, the Fisher information corresponds to the inverse of the variance of the estimator.
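The identity $J = t/\theta$ can be checked by direct summation of $E[(\partial_\theta \log p)^2]$ (a sketch; the values $\theta = 2$, $t = 10$ are arbitrary):

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) for X ~ Poi(mu), computed in log space for stability."""
    return math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))

def fisher_info_poisson(theta, t, cutoff=500):
    """Fisher information of Poi(theta * t) in theta: the score is
    d/dtheta log p(x) = x/theta - t, so J = E[(x/theta - t)^2],
    evaluated by truncated summation over x."""
    mu = theta * t
    return sum(poisson_pmf(x, mu) * (x / theta - t) ** 2
               for x in range(cutoff))

J = fisher_info_poisson(2.0, 10.0)  # closed form t/theta gives 5
```

The truncation at 500 terms is far beyond the bulk of $\mathrm{Poi}(20)$, so the sum agrees with $t/\theta = 5$ to machine precision.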
Indeed, the same fact holds for the multinomial Poisson distribution family $\mathrm{Poi}(t\vec\mu)$. When $X_j$ denotes the $j$th random variable, the random variable $\sum_{j=1}^m \lambda_j \sqrt{t}\,\big(\frac{X_j}{t} - \mu_j\big)$ converges in distribution to a random variable obeying the normal distribution with variance $\sum_{j=1}^m \lambda_j^2 \mu_j$. This convergence is compact uniform with respect to the parameter $\vec\mu$. In this case, the Fisher information matrix $J_{\vec\mu,t}$ is the diagonal matrix with the diagonal elements $\big(\frac{t}{\mu_1}, \ldots, \frac{t}{\mu_m}\big)$. When our distribution family is given as a subfamily $\mathrm{Poi}(t\mu_1(\theta), \ldots, t\mu_m(\theta))$, the Fisher information matrix is $A_\theta^T J_{\vec\mu(\theta),t}\, A_\theta$, where $A_{\theta;i,j} = \frac{\partial \mu_i}{\partial \theta_j}$. Hence, when the hypothesis is given in the form (40), the testing problem can be approximated by testing within the normal distribution family with the variance (41). Therefore, when the time $t$ is sufficiently long, we can conclude that this variance expresses the best performance of testing of the hypotheses (40) in the given experiment. Hence, in this paper, we will use the variance (41) for comparing experiments. However, in order to quantify the quality of our decision, we use the P-value, whose formulae are given in subsections 4.3, 6.2 and 6.3.
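The subfamily formula amounts to $J_{ij} = \sum_k \frac{\partial\mu_k}{\partial\theta_i}\frac{\partial\mu_k}{\partial\theta_j}\frac{t}{\mu_k}$. A sketch with $t = 1$ and a hypothetical two-parameter subfamily $\mu(\theta) = (\theta_1\theta_2,\ (1-\theta_1)\theta_2)$, estimating the derivatives by central finite differences:

```python
def fisher_matrix(mu_func, theta, h=1e-6):
    """Fisher information matrix of Poi(mu_1(theta), ..., mu_m(theta))
    with observation time t = 1:
    J_{ij} = sum_k (d mu_k/d theta_i)(d mu_k/d theta_j) / mu_k,
    with derivatives estimated by central finite differences."""
    m = len(mu_func(theta))
    d = len(theta)
    A = []
    for i in range(d):
        tp, tm = list(theta), list(theta)
        tp[i] += h
        tm[i] -= h
        up, um = mu_func(tp), mu_func(tm)
        A.append([(up[k] - um[k]) / (2 * h) for k in range(m)])
    mu = mu_func(theta)
    return [[sum(A[i][k] * A[j][k] / mu[k] for k in range(m))
             for j in range(d)] for i in range(d)]

# Hypothetical subfamily: mu(theta) = (theta1*theta2, (1-theta1)*theta2).
mu = lambda th: [th[0] * th[1], (1.0 - th[0]) * th[1]]
J = fisher_matrix(mu, [0.5, 2.0])  # analytically [[8, 0], [0, 0.5]]
```

At $\theta = (0.5, 2)$ both components equal 1, and the analytic matrix is diagonal with entries 8 and 0.5, which the finite-difference computation reproduces.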

Special case (i)
The case (i) can be regarded as a generalization of the binomial Poisson distribution family. In this case, $|\vec\mu_1| := \sum_{i=1}^m \mu_{i,1}$ does not necessarily equal $|\vec\mu_2|$. For simplicity, we first consider the following testing problem: the parameter $\vec\mu$ is written as $\vec\mu = (f\vec\nu_1 + (1-f)\vec\nu_2)s$ with $f \in [0, 1]$, $s > 0$ and $|\vec\nu_1| = |\vec\nu_2|$, and the hypotheses are given by $H_0: f \ge f_0$ versus $H_1: f < f_0$. In order to treat this problem, we define the conditional distribution family $\{p_{f,N}\}$, i.e. the conditional distribution of the data when the total number $N$ is observed. The distribution family $\{p_{f,N}\}$ has only one parameter. Thus, we can apply the discussion of subsection 5.1. We define the random variable $l_f(\vec{k})$, the random variable $L_f(\vec{k})$ and the Fisher information $J_f$ accordingly. Then, the test $\phi^N_{f_0,\alpha}$ constructed from these quantities approximately achieves the minimum type-two error probability with level $\alpha$. Therefore, we propose the test $\phi_{f_0,\alpha}(\vec{k}) := \phi^{|\vec{k}|}_{f_0,\alpha}(\vec{k})$ for arbitrary $\vec{k}$. We call this kind of test a quasi-asymptotic likelihood ratio test. The P-value among these kinds of tests is approximately calculated as in subsection 5.1. For $f < f_0$ and $s > 0$, the maximum power is approximately calculated in the same way, because the total number $|\vec{k}|$ concentrates around its expectation in probability.
In the general case $\vec\mu = (F\vec\mu_1 + (1-F)\vec\mu_2)t$, the relation to the above special case is obtained by normalizing the total intensities. Hence, putting
$$ s := \big(F|\vec\mu_1| + (1-F)|\vec\mu_2|\big)t, \qquad f := \frac{F|\vec\mu_1|}{F|\vec\mu_1| + (1-F)|\vec\mu_2|}, \qquad \vec\nu_i := \frac{\vec\mu_i}{|\vec\mu_i|}, $$
we can apply the above special case. Therefore, we can use the test $\phi_{F_0,\alpha} := \phi_{f(F_0),\alpha}$. Then, the P-value among this kind of test is approximately calculated as in the special case, and for $F < F_0$ and $t > 0$ the maximum power follows in the same way. Now, we focus on the Fisher information matrix $J_{F,t}$ of the two-parameter model $\mathrm{Poi}\big((F\vec\mu_1 + (1-F)\vec\mu_2)t\big)$. As is checked by a simple calculation, the value $s(F, t)\,(\mathrm{d}f/\mathrm{d}F)(F)^2\, J_{f(F)}$ can be described by using this matrix $J_{F,t}$, where the parameter $F$ is regarded as the first element. Thus, the performance of the test based on the experiment $\mathrm{Poi}\big((F\vec\mu_1 + (1-F)\vec\mu_2)t\big)$ is characterized by $(J^{-1}_{F,t})_{1,1}$. This conclusion is consistent with that of subsection 6.1.

Special case (ii)
In order to treat the hypotheses (38), using the discussion of subsection 4.4, we treat the multinomial Poisson distributions $\mathrm{Poi}(\vec\mu) = \mathrm{Poi}(\mu_1) \times \cdots \times \mathrm{Poi}(\mu_m)$, which form an exponential family. The divergence is calculated through the divergence $D(\bar p\,\|\,\bar p')$ between the corresponding multinomial distributions $\bar p$ and $\bar p'$, as given in (42) and (43). In the following, we treat the two hypotheses $H_0: w\cdot\vec\mu \ge c_0$ versus $H_1: w\cdot\vec\mu < c_0$ with the condition $w_i \ge 0$. Using the formulae (42) and (43), we can calculate the likelihood ratio test for a given ratio $r$. Now, we calculate the P-value concerning the class of likelihood ratio tests when we observe the data $k_1, \ldots, k_m$. When $w\cdot\vec{k} < c_0$, this P-value is equal to the expression (44). Since the calculation of (44) is not so easy, we consider its upper bound. For this purpose, we define the set $B_R$ in terms of the quantities $\hat\mu_i(R)$ defined in (45) and (46), where $w_M := \max_i w_i$.

Note that $\hat\mu_i(R)$ is a monotone decreasing function of $R$. As is shown in appendix D, the P-value concerning likelihood ratio tests is then upper bounded accordingly. However, it is difficult to choose the likelihood ratio $r$ such that the P-value equals a given risk probability $\alpha$, because the set $A_R$ is defined by a nonlinear constraint. In order to resolve this problem, we propose to modify the likelihood ratio test by using the set $B_R$ instead of the set $A_R$, since $B_R$ is defined by a linear constraint while $A_R$ is defined by a nonlinear one. That is, we define the modified test $\phi_{\mathrm{mod},R}$ as the test with the rejection region $B_R$. Among these kinds of tests, we can choose the test $\phi_{\mathrm{mod},R_\alpha}$ with risk probability $\alpha$ by choosing $R_\alpha$ such that $\max_{w\cdot\vec\mu = c_0} \mathrm{Poi}(\vec\mu)(B_{R_\alpha}) = \alpha$. Indeed, the calculation of the probability $\mathrm{Poi}(\vec\mu)(B_R)$ is easier than that of the probability $\mathrm{Poi}(\vec\mu)(A_R)$ because of the linearity of the constraint condition of $B_R$.
Next, we calculate the P-value of the set of modified tests $\{\phi_{\mathrm{mod},\alpha}\}_\alpha$. For an observed data set $\vec{k}$, we define $R'(\vec{k})$ as the maximum $R$ such that $\vec{k} \in B_R$; this maximum is well defined because the left-hand side of the defining condition is monotone increasing in $R$, each $\hat\mu_i(R')$ being monotone decreasing in $R'$. Then, the P-value is equal to $\max_{w\cdot\vec\mu = c_0} \mathrm{Poi}(\vec\mu)(B_{R'(\vec{k})})$. Further, the relation (47) implies $\vec{k} \in B_{R(\vec{k})}$. Hence, $R(\vec{k}) \le R'(\vec{k})$, which implies $B_{R(\vec{k})} \supset B_{R'(\vec{k})}$. Therefore, the P-value $\max_{w\cdot\vec\mu = c_0} \mathrm{Poi}(\vec\mu)(B_{R'(\vec{k})})$ concerning the modified tests $\{\phi_{\mathrm{mod},\alpha}\}_\alpha$ is smaller than the upper bound $\max_{w\cdot\vec\mu = c_0} \mathrm{Poi}(\vec\mu)(B_{R(\vec{k})})$ of the P-value concerning the likelihood ratio tests. The test $\phi_{\mathrm{mod}}$ coincides with the likelihood ratio test in the one-parameter case. Further, as is shown later, when $\sum_{i=1}^l \mu_i$ is sufficiently large, the P-value $\max_{w\cdot\vec\mu = c_0} \mathrm{Poi}(\vec\mu)(B_{R'(\vec{k})})$ concerning the modified tests $\{\phi_{\mathrm{mod},\alpha}\}_\alpha$ is approximated by the expression (49).

Now, we prove (49). For this purpose, we apply the normal approximation to the multinomial Poisson distribution $\mathrm{Poi}(\vec\mu)$. In this case, by using $\hat\mu_i$ defined in (45) and (46), the upper bound (48) of the P-value concerning the likelihood ratio tests is approximated by a normal probability, because the convergence (39) is compact uniform with respect to the parameter $\vec\mu$. This probability can be expressed through the convex hull $\mathrm{Co}(R)$ of $(x_1(R), y_1(R)), \ldots, (x_m(R), y_m(R))$. As is shown in appendix E, this value is simplified, which gives our upper bound of the P-value concerning the likelihood ratio tests. Next, we approximately calculate the test with the risk probability $\alpha$ proposed in this section. First, we choose $R_\alpha$ by the corresponding normal-approximation condition; then, our test is given by the rejection region $B_{R_\alpha}$. Using the same discussion, the P-value concerning the proposed tests is equal to $\Phi\big(-\min_{i,j} z_{i,j}(R'(\vec{k}))\big)$.

Modification of visibility
In the following sections, we apply the discussions in sections 3-5 to the hypothesis (3). That is, we consider how to reject the null hypothesis $H_0: F \ge F_0$ with a given risk probability $\alpha$.
In the usual visibility method, we measure the coincidence events only in one or two measurement bases. However, in this method, the number of counts of coincidence events reflects not only the fidelity but also the direction of the difference between the true state and the target maximally entangled state. In order to remove the bias caused by such a direction, we propose to measure the counts of the coincidence vectors $|HH\rangle$, $|VV\rangle$, $|DD\rangle$, $|XX\rangle$, $|RL\rangle$ and $|LR\rangle$, which correspond to the coincidence events, and the counts of the anti-coincidence vectors $|HV\rangle$, $|VH\rangle$, $|DX\rangle$, $|XD\rangle$, $|RR\rangle$ and $|LL\rangle$, which correspond to the anti-coincidence events. The former corresponds to the maximum values in the usual visibility, and the latter corresponds to the minimum values. In this paper, we call this proposed method the modified visibility method. Using this method, we can test the fidelity between the maximally entangled state $|\Phi^{(+)}\rangle\langle\Phi^{(+)}|$ and the given state $\sigma$, using the total number of counts of the coincidence events (the total coincidence count) $n_1$ and the total number of counts of the anti-coincidence events (the total anti-coincidence count) $n_2$, obtained by measuring each of the vectors for the time $t/12$. When the dark count is negligible, the total coincidence count $n_1$ obeys $\mathrm{Poi}\big(\lambda\frac{2F+1}{12}t\big)$, and the total anti-coincidence count $n_2$ obeys the distribution $\mathrm{Poi}\big(\lambda\frac{2-2F}{12}t\big)$. These expectation values $\mu_1$ and $\mu_2$ are given as $\mu_1 = \lambda\frac{2F+1}{12}t$ and $\mu_2 = \lambda\frac{2-2F}{12}t$. Hence, the Fisher information matrix concerning the parameters $F$ and $\lambda$ is
$$ J = \begin{pmatrix} \dfrac{\lambda t}{(2F+1)(2-2F)} & 0 \\ 0 & \dfrac{t}{4\lambda} \end{pmatrix}, $$
where the first element corresponds to the parameter $F$ and the second one corresponds to the parameter $\lambda$. Then, we can apply the test $\phi_{\mathrm{LR}}$ given at the end of subsection 4.3. That is, based on the ratio $\frac{\mu_2}{\mu_1+\mu_2} = \frac{2}{3}(1-F)$, we estimate the fidelity from the ratio $\frac{n_2}{n_1+n_2}$ as $\hat F(n_1, n_2) = 1 - \frac{3}{2}\frac{n_2}{n_1+n_2}$.
Based on the discussion in subsection 6.2, its asymptotic variance $(J^{-1}_{F,t})_{1,1}$ is equal to $\frac{(2F+1)(2-2F)}{\lambda t}$. Hence, similarly to the visibility, we can check the fidelity by using this ratio. Indeed, when we consider the distribution under the condition that the total count $n_1 + n_2$ is fixed at $n$, the random variable $n_2$ obeys the binomial distribution with average value $\frac{2}{3}(1-F)n$. Hence, we can apply the likelihood ratio test of the binomial distribution. In this case, by the normal approximation, the likelihood ratio test with risk probability $\alpha$ is almost equal to the test with the rejection region
$$ \left\{ (n_1, n_2)\ \Big|\ \frac{n_2 - \frac{2}{3}(1-F_0)(n_1+n_2)}{\sqrt{(n_1+n_2)\,\frac{2}{3}(1-F_0)\big(1-\frac{2}{3}(1-F_0)\big)}} > \Phi^{-1}(1-\alpha) \right\} $$
concerning the null hypothesis $H_0: F \ge F_0$. The P-value of these kinds of tests is
$$ \Phi\left( -\,\frac{n_2 - \frac{2}{3}(1-F_0)(n_1+n_2)}{\sqrt{(n_1+n_2)\,\frac{2}{3}(1-F_0)\big(1-\frac{2}{3}(1-F_0)\big)}} \right). $$
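The estimator and the conditional binomial test can be sketched as follows (the counts below are hypothetical; the normal approximation to the binomial is the one used in the text):

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fidelity_estimate(n1, n2):
    """Estimate F from the total coincidence count n1 and the total
    anti-coincidence count n2 via n2/(n1+n2) ~ (2/3)(1-F)."""
    return 1.0 - 1.5 * n2 / (n1 + n2)

def visibility_pvalue(n1, n2, F0):
    """Approximate P-value for H0: F >= F0.  Given n = n1 + n2, the
    count n2 is binomial with success probability p = (2/3)(1-F);
    a large n2 (low fidelity) is evidence against H0."""
    n = n1 + n2
    p0 = 2.0 * (1.0 - F0) / 3.0
    z = (n2 - n * p0) / math.sqrt(n * p0 * (1.0 - p0))
    return 1.0 - phi(z)

n1, n2 = 900, 100                  # hypothetical counts
F_hat = fidelity_estimate(n1, n2)  # 1 - 1.5 * 0.1 = 0.85
p_value = visibility_pvalue(n1, n2, 0.9)
```

With these counts the estimated fidelity is 0.85 and the P-value against $H_0: F \ge 0.9$ is far below $10^{-4}$, so the null hypothesis would be rejected at any usual level.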

Design I (λ: unknown, one stage)
In this section, we consider the problem of testing the fidelity between the maximally entangled state $|\Phi^{(+)}\rangle\langle\Phi^{(+)}|$ and the given state $\sigma$ by performing three kinds of measurements, coincidence, anti-coincidence and total flux, for the times $t_1$, $t_2$ and $t_3$, respectively. When the dark count is negligible, the data $(n_1, n_2, n_3)$ obey the multinomial Poisson distribution $\mathrm{Poi}\big(\lambda\frac{2F+1}{6}t_1,\ \lambda\frac{2-2F}{6}t_2,\ \lambda t_3\big)$, where the parameter $\lambda$ is unknown. In this problem, it is natural to assume that we can select the time allocation under the constraint on the total time, $t_1 + t_2 + t_3 = t$.
Indeed, this research differs from other research within the Poisson distribution framework in the following point. In other papers [11, 15], one considers the projection-valued measure $\{P_i\}$ and assumes that the measurement with the filter $P_i$ is applied for the same time span for every $i$. This paper, in contrast, treats the optimization of the time allocation among the measurements with the filters $P_i$.
The performance of the time allocation $(t_1, t_2, t_3)$ can be evaluated by the variance (41). The Fisher information matrix concerning the parameters $F$ and $\lambda$ is
$$ J = \begin{pmatrix} \dfrac{2\lambda t_1}{3(2F+1)} + \dfrac{2\lambda t_2}{3(2-2F)} & \dfrac{t_1 - t_2}{3} \\[2mm] \dfrac{t_1 - t_2}{3} & \dfrac{(2F+1)t_1 + (2-2F)t_2}{6\lambda} + \dfrac{t_3}{\lambda} \end{pmatrix}, $$
where the first element corresponds to the parameter $F$ and the second one corresponds to the parameter $\lambda$. Then, the asymptotic variance is $(J^{-1}_{F,t})_{1,1}$, given in (53). We optimize the time allocation by minimizing the variance (53), which we perform by maximizing its inverse. Applying lemmas 1 and 2 shown in appendix A to the case of $a = \frac{2}{3(2F+1)}$, $b = \frac{2}{3(2-2F)}$, $c = \frac{2F+1}{6}$ and $d = \frac{2-2F}{6}$, we obtain the optimized values (54), (55) and (56). These relations give the optimal time allocations between (i) coincidence and total flux measurements, (ii) anti-coincidence and total flux measurements, and (iii) coincidence and anti-coincidence measurements, respectively. The ratio of (56) to (54) is larger than one, as shown in appendix B. That is, the optimal measurement using the coincidence and the anti-coincidence counts always provides a better test than that using the coincidence count and the total flux. Hence, we compare (ii) with (iii) and obtain (57), where the critical point $F_1 < 1$ is defined as the fidelity at which the two optimized Fisher informations coincide. The approximated value of the critical point $F_1$ is 0.899519. Equation (57) is derived in appendix C. Figure 2 shows the ratio of the optimal Fisher information based on the anti-coincidence and total-flux measurements to that based on the coincidence and anti-coincidence measurements. When $F_1 \le F \le 1$, the maximum Fisher information is attained by the allocation between anti-coincidence and total flux; otherwise, the maximum is attained by the allocation between coincidence and anti-coincidence. The optimal time allocation shown in figure 2 implies that we should allocate measurement time to the anti-coincidence vectors preferentially over the other vectors. The optimal asymptotic variance is $\frac{(2F+1)(2-2F)\big(\sqrt{2F+1}+\sqrt{2-2F}\big)^2}{6\lambda t}$ when the threshold $F_0$ is less than the critical point $F_1$. This asymptotic variance is much better than that obtained by the modified visibility method.
The ratio of the optimal asymptotic variance to that of the modified visibility method follows from the above expressions. In the following, we give the optimal test of level $\alpha$ in the hypothesis testing (6). Assume that the threshold $F_0$ is less than the critical point $F_1$. In this case, we can apply testing of the hypothesis (27). Firstly, we measure the count on the coincidence vectors for a period of

[Figure 2. The ratio of the optimal Fisher information (solid line) and the optimal time allocation as a function of the fidelity $F$. The measurement time is divided into three periods: coincidence $t_1$ (plus signs), anti-coincidence $t_2$ (circles) and total flux $t_3$ (squares), which are normalized as $t_1 + t_2 + t_3 = 1$ in the plot.]

$$ t_1 = \frac{\sqrt{2-2F_0}}{\sqrt{2F_0+1} + \sqrt{2-2F_0}}\, t $$
to obtain the total count $n_1$. Then, we measure the count on the anti-coincidence vectors for a period of
$$ t_2 = \frac{\sqrt{2F_0+1}}{\sqrt{2F_0+1} + \sqrt{2-2F_0}}\, t $$
to obtain the total count $n_2$. Note that the optimal time allocation depends on the threshold of our hypothesis. Finally, we apply the UMP test of level $\alpha$ for the hypothesis
$$ H_0: p \ge p_0 := \frac{\sqrt{1+2F_0}}{\sqrt{2-2F_0} + \sqrt{1+2F_0}} $$
with the binomial distribution family $P^{n_1+n_2}_p$ to the data $n_1$. In this case, the likelihood ratio test with the risk probability $\alpha$ is almost equal to the test with the rejection region
$$ \left\{ (n_1, n_2)\ \Big|\ \frac{n_1 - p_0(n_1+n_2)}{\sqrt{(n_1+n_2)\,p_0(1-p_0)}} < \Phi^{-1}(\alpha) \right\} $$
concerning the null hypothesis $H_0: F \ge F_0$. The P-value of this kind of test is
$$ \Phi\left( \frac{n_1 - p_0(n_1+n_2)}{\sqrt{(n_1+n_2)\,p_0(1-p_0)}} \right). $$
We can apply a similar testing for $F_0 > F_1$; it is sufficient to replace the time allocation by the optimal allocation between the anti-coincidence and total-flux measurements.
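The allocation above can be packaged as a small helper (a sketch; it only encodes the weights $\sqrt{2-2F_0}$ and $\sqrt{2F_0+1}$ appearing in the expressions for $t_1$ and $t_2$):

```python
import math

def design1_allocation(F0, t=1.0):
    """Design I time split between the coincidence (t1) and
    anti-coincidence (t2) measurements, for a threshold F0 below the
    critical point F1 ~ 0.8995: t1 : t2 = sqrt(2-2F0) : sqrt(2F0+1)."""
    a = math.sqrt(2.0 - 2.0 * F0)  # coincidence weight
    b = math.sqrt(2.0 * F0 + 1.0)  # anti-coincidence weight
    return t * a / (a + b), t * b / (a + b)

t1, t2 = design1_allocation(0.8)
# For F0 = 0.8 most of the time goes to the anti-coincidence bases.
```

For $F_0 = 0.8$ this gives $t_1 \approx 0.28\,t$ and $t_2 \approx 0.72\,t$, illustrating the preference for the anti-coincidence bases as $F_0$ grows.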

In this case, the likelihood ratio test with the risk probability $\alpha$ is almost equal to the test with the corresponding rejection region concerning the null hypothesis $H_0: F \ge F_0$, and the P-value follows in the same way. Next, we consider the case where the dark count parameter $\delta$ is known but is not negligible. Since our distribution is given by the multinomial Poisson distribution $\mathrm{Poi}\big((\lambda\frac{2F+1}{6} + \delta)t_1,\ (\lambda\frac{2-2F}{6} + \delta)t_2,\ (\lambda + \delta)t_3\big)$, the Fisher information matrix and the inverse of the asymptotic variance $(J^{-1}_{F,t})_{1,1}$ can be computed analogously. Then, we apply lemmas 1 and 2 in appendix A to this function of $(t_1, t_2, t_3)$ and obtain the optimized values (60), (61) and (62) for (i) coincidence and total flux, (ii) anti-coincidence and total flux, and (iii) coincidence and anti-coincidence, respectively. The ratio of (60) to (62) is smaller than one, as derived in appendix B. Therefore, the measurement using the coincidence and the anti-coincidence counts provides a better test than that using the coincidence count and the total flux, as in the case of $\delta = 0$. Define $\delta_1$ and the critical point $F_{\delta'}$ for the normalized dark count $\delta' = 6\delta/\lambda < \delta_1$ as in appendix C; the parameter $\delta_1$ is calculated to be 0.375. As shown in appendix C, the measurement using the coincidence and the anti-coincidence counts provides a better test than that using the anti-coincidence count and the total flux if the fidelity is smaller than the critical point $F_{\delta'}$. The optimal time allocation is given analogously to the case $\delta = 0$. The critical point $F_{\delta'}$ for the optimal time allocation increases with the normalized dark count, as illustrated in figure 3.

Design II (λ: known, one stage)
In this section, we consider the case where $\lambda$ is known. Then, the Fisher information is
$$ J_F = \frac{2\lambda^2}{3}\left( \frac{t_1}{\lambda(2F+1) + 6\delta} + \frac{t_2}{\lambda(2-2F) + 6\delta} \right). $$
Since this expression is linear in $(t_1, t_2)$, the maximum over the allocation $t_1 + t_2 + t_3 = t$ is attained by spending the whole time on the better of the two counts. The above optimization shows that when $F \ge \frac14$, the count on anti-coincidence ($t_1 = 0$, $t_2 = t$, $t_3 = 0$) is better than the count on coincidence ($t_1 = t$, $t_2 = 0$, $t_3 = 0$). In fact, Barbieri et al [8] measured the sum of the counts on the anti-coincidence vectors $|HV\rangle$, $|VH\rangle$, $|DX\rangle$, $|XD\rangle$, $|RR\rangle$ and $|LL\rangle$ to realize the entanglement witness in their experiment. In this case, the variance is $\frac{3(\lambda(2-2F)+6\delta)}{2\lambda^2 t}$. When we observe the sum of counts on anti-coincidence $n_2$, the estimated value of $F$ is given by $\hat F = 1 + \frac{3}{\lambda}\big(\delta - \frac{n_2}{t}\big)$, which is the solution of $\big(\lambda\frac{2-2F}{6} + \delta\big)t = n_2$. The likelihood ratio test with the risk probability $\alpha$ can be approximated by the test with the rejection region
$$ \left\{ n_2\ \Big|\ \frac{n_2 - \big(\frac{\lambda(1-F_0)}{3} + \delta\big)t}{\sqrt{\big(\frac{\lambda(1-F_0)}{3} + \delta\big)t}} > \Phi^{-1}(1-\alpha) \right\} $$
concerning the null hypothesis $H_0: F \ge F_0$, which is also the UMP test. The P-value of the likelihood ratio tests is
$$ \Phi\left( -\,\frac{n_2 - \big(\frac{\lambda(1-F_0)}{3} + \delta\big)t}{\sqrt{\big(\frac{\lambda(1-F_0)}{3} + \delta\big)t}} \right). $$
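A sketch of this design II procedure (the estimator solves $(\lambda\frac{2-2F}{6} + \delta)t = n_2$ as in the text; the numbers below are hypothetical):

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def design2_estimate(n2, lam, t, delta=0.0):
    """Design II (lambda known): estimate F from the total
    anti-coincidence count n2, solving (lam*(2-2F)/6 + delta)*t = n2."""
    return 1.0 + (3.0 / lam) * (delta - n2 / t)

def design2_pvalue(n2, lam, t, F0, delta=0.0):
    """Approximate P-value for H0: F >= F0, comparing n2 with its
    expectation (lam*(1-F0)/3 + delta)*t at the threshold F0."""
    mu0 = (lam * (1.0 - F0) / 3.0 + delta) * t
    z = (n2 - mu0) / math.sqrt(mu0)
    return 1.0 - phi(z)  # large n2 (low fidelity) rejects H0

F_hat = design2_estimate(100, lam=100.0, t=10.0)  # 0.7
p_value = design2_pvalue(100, lam=100.0, t=10.0, F0=0.8)
```

With $\lambda = 100$, $t = 10$, $\delta = 0$ and $n_2 = 100$, the estimate is $\hat F = 0.7$ and the P-value against $H_0: F \ge 0.8$ is tiny, so this hypothetical data set rejects the null.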

Comparison of the asymptotic variances
We compare the asymptotic variances of the following designs for time allocation, when the dark count parameter $\delta$ is zero.
(i) Modified visibility: the asymptotic variance is $\frac{(2F+1)(2-2F)}{\lambda t}$.
(iia) Design I ($\lambda$ unknown), optimal time allocation between the counts on anti-coincidence and coincidence: the asymptotic variance is $\frac{(2F+1)(2-2F)\big(\sqrt{2F+1}+\sqrt{2-2F}\big)^2}{6\lambda t}$.
(iib) Design I ($\lambda$ unknown), optimal time allocation between the counts on anti-coincidence and the total flux: the asymptotic variance is $\frac{(2-2F)\big(\sqrt{6}+\sqrt{2-2F}\big)^2}{4\lambda t}$.
(iiia) Design II ($\lambda$ known), estimation from the count on anti-coincidence: the asymptotic variance is $\frac{3(2-2F)}{2\lambda t}$.
(iiib) Design II ($\lambda$ known), estimation from the count on coincidence: the asymptotic variance is $\frac{3(2F+1)}{2\lambda t}$.
Figure 4 shows the comparison, where the asymptotic variances in (iia)-(iiib) are normalized by the one in (i). The anti-coincidence measurement provides the best estimation for high fidelity ($F > 0.25$). When $\lambda$ is unknown, the measurement with the counts on anti-coincidence and coincidence is better than that with the counts on anti-coincidence and the total flux for $F < 0.899519$. For higher fidelity, the counts on anti-coincidence and the total flux turn out to be better, but the difference is small.

Optimal allocation
The comparison in the previous section shows that the measurement on the anti-coincidence vectors yields a better variance than the measurement on the coincidence vectors when the fidelity is greater than $\frac14$ and the parameters $\lambda$ and $\delta$ are known. We will explore a further improvement of the measurement on the anti-coincidence vectors. In the previous sections, we allocated an equal time to the measurement on each of the anti-coincidence vectors. Here, we minimize the variance by optimizing the time allocation $t_{HV}, t_{VH}, t_{DX}, t_{XD}, t_{RR}, t_{LL}$ among the anti-coincidence vectors $\mathcal{B} = \{|HV\rangle, |VH\rangle, |DX\rangle, |XD\rangle, |RR\rangle, |LL\rangle\}$, under the restriction on the total measurement time $\sum_{(x,y)\in\mathcal{B}} t_{x,y} = t$. The number of counts $n_{xy}$ obeys the Poisson distribution $\mathrm{Poi}\big((\lambda\mu_{xy} + \delta)t_{xy}\big)$ with unknown parameter $\mu_{xy}$. Then, the Fisher information matrix is a diagonal matrix with the diagonal elements $\Big(\frac{\lambda^2 t_{x,y}}{\lambda\mu_{x,y} + \delta}\Big)_{(x,y)\in\mathcal{B}}$.
Since we are interested in the parameter $F = 1 - \frac12\sum_{(x,y)\in\mathcal{B}} \mu_{x,y}$, the variance (41) is given by
$$ \frac{1}{4} \sum_{(x,y)\in\mathcal{B}} \frac{\lambda\mu_{x,y} + \delta}{\lambda^2\, t_{x,y}}, $$
which expresses the performance of this allocation, as mentioned in section 6.1.
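Minimizing $\sum_i c_i/t_i$ under $\sum_i t_i = t$ is a standard Cauchy-Schwarz exercise: the optimum is $t_i \propto \sqrt{c_i}$ with minimum value $(\sum_i \sqrt{c_i})^2/t$. A sketch with hypothetical anisotropic anti-coincidence rates $c_i = \lambda\mu_i + \delta$:

```python
import math

def optimal_allocation(cs, t=1.0):
    """Minimize sum_i c_i / t_i subject to sum_i t_i = t.
    By Cauchy-Schwarz the optimum is t_i proportional to sqrt(c_i),
    with minimum value (sum_i sqrt(c_i))**2 / t."""
    roots = [math.sqrt(c) for c in cs]
    s = sum(roots)
    return [t * r / s for r in roots], s * s / t

# Hypothetical rates lam*mu_xy + delta for the six anti-coincidence
# vectors; the variance of the estimate is this objective over 4*lam^2.
cs = [1.0, 1.0, 4.0, 4.0, 1.0, 1.0]
ts, v_min = optimal_allocation(cs)
v_equal = sum(c / (1.0 / 6.0) for c in cs)  # equal time t/6 per vector
```

For these rates the optimal objective is $(1+1+2+2+1+1)^2 = 64$, versus 72 for equal allocation, so the optimized split helps exactly when the rates are anisotropic and does nothing when they are all equal.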

Conclusion
We have formulated a hypothesis testing scheme for entanglement in the Poisson distribution framework. Our statistical method can handle the fluctuations in experimental data more properly in a realistic setting. It has been shown that the optimal time allocation improves the test: the measurement time should be allocated preferentially to the anti-coincidence vectors. This test remains valid even if dark counts exist. The design is particularly useful for experimental tests because the optimal time allocation depends only on the threshold of the test; we do not need any further information on the probability distribution or on the tested state. The test can be further improved by optimizing the time allocation among the anti-coincidence vectors when the error from the maximally entangled state is anisotropic. However, this time allocation requires the expectation values of the counts, so we need to apply the two-stage method. Further, replacing $\frac{2F+1}{6}$ and $\frac{2-2F}{6}$ by $F$ and $1-F$ in section 7, we can treat the discrimination between two pure states with fidelity $F$ in the Poisson distribution framework. We believe that it is important to treat several decision problems in quantum systems within the Poisson distribution framework.