Linear approximation to the statistical significance autocovariance matrix in the asymptotic regime

Approximating significance scans of searches for new particles in high-energy physics experiments as Gaussian fields is a well-established way to estimate the trials factors required to quantify global significances. We propose a novel, highly efficient method to estimate the covariance matrix of such a Gaussian field. The method is based on a linear approximation of the statistical fluctuations of the signal amplitude. For one-dimensional searches, an upper bound on the trials factor can then be calculated directly from the covariance matrix. For higher dimensions, the Gaussian process described by this covariance matrix may be sampled to calculate the trials factor directly. This method also serves as the theoretical basis for a recent study of the trials factor with an empirically constructed set of Asimov-like background datasets. We illustrate the method with studies of a H → γγ inspired model that was used in the empirical paper.


Introduction
In high-energy physics searches for new particles that appear in the data as resonances [1, 2], one usually scans a mass region and hopes to find a peak of high significance at some mass. The significance at each mass of the scan is generally found by applying Wilks' theorem [3] to the likelihood-ratio test statistic (LRT) [4] at each point, resulting in a field of significances measured across the search region.
While the resonance may appear anywhere in the search region, the analysis usually targets the highest (local) significance, which leads to the recurring challenge of estimating the global significance of this observation. The necessity of calculating the probability for a background fluctuation to give such a peak of significance anywhere in the search region, and not simply where the significance is maximal, is commonly referred to as the look-elsewhere effect (LEE).
There have been a number of studies investigating the LEE, and in our work we pay particular attention to those describing the significance field with a Gaussian process. While some studies [5][6][7] set an upper bound on the trials factor, which converts a local p-value into a global one, and use a Gaussian process only implicitly to link the low- and high-significance regions, other studies [8] require explicit values for the Gaussian process parameters.
In this paper we establish a chain of lightweight steps from a non-linear parametric statistical model to the trials factor by estimating the covariance matrix of the significance field. To construct the estimate with only a single background-only fit to the data, we apply a linear expansion to the non-linear background shape. The calculation of the covariance matrix starting from a linear model was briefly discussed by Demortier [9, pp. 23-33]. As part of our work, we give a strict mathematical formulation of the method and demonstrate its practical application to non-linear background shapes, with the estimated covariance matrix serving as a proxy for a straightforward trials-factor estimate.
A common input for methods that quantify the LEE is a set of maximum-likelihood fits to some number of Monte Carlo generated data realizations. They may be used to estimate the trials factor in the lower-significance region, or the covariance matrix of the Gaussian process itself (the significance autocovariance). The challenge, then, is to fit enough datasets to estimate the trials factor with satisfactory precision, while keeping the number of fits as small as possible.
In high-energy physics searches for a new particle or a resonance, the likelihood-ratio test statistic is typically used to construct the p-value for each point on a search grid. In the asymptotic regime, the test statistic follows a χ² distribution.
For analyses that use a Gaussian process to model the significance, the number of degrees of freedom of the test-statistic distribution is typically 1. For this case, in section 2, we suggest a method to estimate the significance covariance matrix that makes use of a single background-only fit to the data.
We replace the set of fits that were required in our previous work with derivatives of the best-fit-to-the-data background model. Fortunately, the derivatives can often be extracted from the fit software.
Core assumptions. In section 3 we show that three quite generic requirements:
1. the background model is well approximated by its linear expansion around the best-fit parameters,
2. the data can be binned, and fluctuations in different bins of the dataset are independent,
3. the fluctuations in each bin follow a Gaussian distribution,
together are consistent with the assumptions made in the empirical study by Ananiev & Read [8], which relied on the additivity (superposition) principle for the fluctuations to empirically estimate the covariance matrix of the significances. We argue, therefore, that this work serves as a theoretical basis for the method of the set of Asimov background samples introduced in that study, and at the same time may rely on its validations.
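Requirement 1 can be checked numerically for a concrete background shape. The sketch below, assuming a hypothetical exponential background b(x) = A·exp(−kx) with illustrative parameter values (not taken from any cited analysis), compares the exact shape with its first-order expansion for a percent-level parameter fluctuation:

```python
import numpy as np

# Hypothetical exponential background b(x) = A * exp(-k * x) with
# illustrative parameters; requirement 1 asks that the first-order
# expansion around the best fit be accurate for typical fluctuations.
x = np.linspace(100.0, 160.0, 30)      # bin centres
A0, k0 = 1000.0, 0.02                  # assumed best-fit parameters
b0 = A0 * np.exp(-k0 * x)              # best-fit background

# Jacobian of b with respect to (A, k), evaluated at the best fit.
jac = np.stack([b0 / A0, -x * b0], axis=1)

# A percent-level fluctuation of the parameters, as expected for
# background-only data.
dA, dk = 0.01 * A0, 0.01 * k0
b_exact = (A0 + dA) * np.exp(-(k0 + dk) * x)
b_linear = b0 + jac @ np.array([dA, dk])

# Relative error of the linear expansion, maximized over bins; it stays
# well below the size of the parameter fluctuation itself.
rel_err = np.max(np.abs(b_linear - b_exact) / b_exact)
print(rel_err)
```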

Statistical model
The basic structure of a statistical model commonly used in high-energy physics experiments that search for a new particle or a resonance was described in detail in the empirical study [8]. For the present study, we chose the H → γγ inspired model as a benchmark, because it satisfies without approximation the second and third requirements above.
The search is conducted with the likelihood-ratio test statistic evaluated for each point m of the search grid M.

In this binned model, the expected background b_i(θ) has an exponential shape and is used as the null hypothesis H0. The shape of the expected signal s_i(m) is Gaussian, and together with the background b_i it forms the alternative H1, the expected signal + background estimate:

    n_i(μ, θ; m) = b_i(θ) + μ s_i(m),

where i enumerates bins, θ denotes the vector of nuisance parameters, and μ is the signal strength parameter. Generally, in the asymptotic regime (e.g. large sample), and neglecting constant terms, the log-likelihoods for H0 and H1 may be approximated as follows¹:

    −2 ln L0(θ) = Σ_i (d_i − b_i(θ))² / e_i²,    (1.1)
    −2 ln L1(μ, θ; m) = Σ_i (d_i − b_i(θ) − μ s_i(m))² / e_i²,    (1.2)

where i enumerates bins, m ∈ M denotes the point in the search region M of parameters which are not present under the background-only hypothesis, θ are the nuisance parameters, and d_i corresponds to the binned data with errors e_i².

Our goal is to estimate the covariance matrix Σ_mn of the statistical significances Z_m and Z_n evaluated at two different points of the search region M:

    Σ_mn = cov(Z_m, Z_n),    (1.3)
    Z_m = sign(μ̂) √t_m ∼ N[0, 1],    t_m = −2 ln [L0(θ̂0) / L1(μ̂, θ̂1; m)],    (1.4)

where t_m is the likelihood-ratio test statistic (LRT), Z_m is the so-called signed-root LRT, θ̂0 are the nuisance parameters that maximize the background-only likelihood L0, θ̂1 together with the signal strength μ̂ maximize the signal+background likelihood L1, and N[0, 1] denotes the standard normal distribution.

¹The H → γγ inspired model assumes Gaussian errors in its definition [8]. The expressions for the log-likelihoods (eqs. 1.1, 1.2) in the case of this model are, therefore, exact.

²We have assumed that the errors e_i are independent of the nuisance parameters θ. With a linear correction to e_i it is still possible to obtain a closed-form expression for the test statistic and the significance. The calculation of the covariance would then require sampling toys to average out the fluctuations. No additional fits would be required, however, so this may be a viable option for more sophisticated analyses.
To give a feeling for the H → γγ inspired model, in figure 1 we plot the shape of the background-only hypothesis b_i, one sample of data drawn from it, and the corresponding significance curve (eq. 1.4). Notice how clearly visible bumps in the data are reflected in peaks of the significance curve.
We would like to remark that for the signal+background model we fit the signal contribution as a deviation from the background-only best fit. This is essential for the proper separation of variables in the subsequent calculations.
We assume that the best fit of the background model b_i(θ) to the data d_i is available for the study as b_i(θ̂) = b̂_i. In order to simplify the notation, we make use of the freedom to choose the reference point for the model parameters θ and define the best-fit parameters to be θ̂ = 0.
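As an illustration, the binned model and the two log-likelihoods can be sketched as follows; the bin grid, background shape, and signal width are illustrative assumptions, not the values used in ref. [8]:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative binned H -> gamma gamma inspired model: exponential
# background, Gaussian signal shape, independent Gaussian bin errors.
x = np.linspace(100.0, 160.0, 30)                  # bin centres
b = 1000.0 * np.exp(-0.02 * x)                     # expected background b_i
e = np.sqrt(b)                                     # Gaussian errors e_i

def s(m, width=2.0):
    """Expected signal shape s_i(m) for a resonance at mass m."""
    return np.exp(-0.5 * ((x - m) / width) ** 2)

# One background-only data realization with independent Gaussian bins.
d = b + e * rng.standard_normal(x.size)

def nll0(theta):
    """-2 ln L0: background-only hypothesis, theta = (A, k)."""
    bkg = theta[0] * np.exp(-theta[1] * x)
    return np.sum((d - bkg) ** 2 / e ** 2)

def nll1(mu, theta, m):
    """-2 ln L1: signal + background at search point m."""
    bkg = theta[0] * np.exp(-theta[1] * x)
    return np.sum((d - bkg - mu * s(m)) ** 2 / e ** 2)

# With mu = 0 the two hypotheses coincide, so the difference vanishes.
print(nll0([1000.0, 0.02]) - nll1(0.0, [1000.0, 0.02], 125.0))
```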

Method
To simplify the notation, we redefine d_i, b_i and s_i to include the errors e_i:

    d_i → d_i / e_i,    b_i → b_i / e_i,    s_i → s_i / e_i.    (2.1)

The log-likelihoods then become:

    −2 ln L0(θ) = Σ_i (d_i − b_i(θ))²,    (2.2)
    −2 ln L1(μ, θ; m) = Σ_i (d_i − b_i(θ) − μ s_i(m))².    (2.3)

For every realization of the data, we expect the deviations of the fit parameters θ and μ from 0 to be small (in the absence of a signal), and therefore the first-order expansions of b_i(θ) and s_i(θ) around 0 to be accurate enough. The log-likelihoods then are:

    −2 ln L0(θ) ≈ Σ_i (d_i − b_i − Δ_ij θ_j)²,    (2.4)
    −2 ln L1(μ, θ; m) ≈ Σ_i (d_i − b_i − Δ_ij θ_j − μ s_i(0))²,    (2.5)

where Δ_ij = ∂b_i(θ)/∂θ_j |_{θ=0} is the Jacobian of the best-fit background model and the Einstein summation rule applies to the index j. Since the signal model s_i contributes to the log-likelihoods (eq. 2.3) only at lowest order, and is thus constant, we simplify s_i(0) to s_i from now on.

The equations that define the optimal values of θ̂0, θ̂1 and μ̂ then are:

    Δ_ij (d_i − b_i − Δ_ik θ̂0,k) = 0,
    Δ_ij (d_i − b_i − Δ_ik θ̂1,k − μ̂ s_i) = 0,    (2.6)
    s_i (d_i − b_i − Δ_ik θ̂1,k − μ̂ s_i) = 0.

To reduce the number of indices, we rewrite the expressions above in bra-ket notation:

    ⟨Δ_j | d − b − Δθ̂0⟩ = 0,    (2.7)
    ⟨Δ_j | Δθ̂0 − Δθ̂1 − μ̂ s⟩ = 0,    (2.8)
    ⟨s | d − b − Δθ̂1 − μ̂ s⟩ = 0,    (2.9)

where in eq. (2.8) we used eq. (2.7) to cancel the d − b contribution. We can solve eq. (2.7) and eq. (2.8) for θ̂0 and θ̂1 correspondingly:

    θ̂0 = (Δ⊺Δ)⁻¹ Δ⊺ (d − b),    (2.10)
    θ̂1 = θ̂0 − μ̂ (Δ⊺Δ)⁻¹ Δ⊺ s.    (2.11)

It is important to mention that, although Δ itself is generally singular, the product Δ⊺Δ is the Hessian of −2 ln L1 with respect to θ1. For the background-model best-fit point θ = 0 to be a minimum, the Hessian is required to be positive definite, and thus Δ⊺Δ is invertible.
We substitute eq. (2.10) and eq. (2.11) into eq. (2.9) and solve for μ̂:

    μ̂ = ⟨s| P |d − b⟩ / ⟨s| P |s⟩,    P = 1 − Δ (Δ⊺Δ)⁻¹ Δ⊺.    (2.12)

An interesting and important fact is that P is a projector and it is symmetric:

    P² = P,    P⊺ = P.    (2.13)

A projector is always positive semi-definite, which means that the product below is non-negative for any non-zero s:

    ⟨s| P |s⟩ = ⟨s| P⊺ P |s⟩ = |P s|² ≥ 0.    (2.14)

Let us evaluate the test statistic t_m:

    t_m = −2 ln L0(θ̂0) + 2 ln L1(μ̂, θ̂1; m) = |d − b − Δθ̂0|² − |d − b − Δθ̂1 − μ̂ s|².    (2.15)

We again use eq. (2.7) to cancel the θ̂0 contribution and eq. (2.11) to substitute the solution for θ̂1:

    t_m = ⟨s| P |d − b⟩² / ⟨s| P |s⟩.    (2.16)

The significance Z_m, as defined in eq. (1.4), is:

    Z_m = ⟨s| P |d − b⟩ / √⟨s| P |s⟩.    (2.17)

The square root in eq. (2.17) is always defined, as the product under the square root is non-negative (eq. 2.14).
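The closed-form results above involve only the Jacobian Δ and the signal shape, so they reduce to a few lines of linear algebra. A minimal sketch, assuming an illustrative exponential background and Gaussian signal (the equation numbers refer to the expressions of this section):

```python
import numpy as np

rng = np.random.default_rng(2)

# Work in units where the errors are normalized to 1 (eq. 2.1).
x = np.linspace(100.0, 160.0, 30)
b = 1000.0 * np.exp(-0.02 * x)
delta = np.stack([b / 1000.0, -x * b], axis=1)   # Jacobian of b(theta) at 0

# Projector P = 1 - Delta (Delta^T Delta)^{-1} Delta^T onto the
# orthogonal complement of the column space of the Jacobian.
P = np.eye(x.size) - delta @ np.linalg.solve(delta.T @ delta, delta.T)

s = np.exp(-0.5 * ((x - 125.0) / 2.0) ** 2)      # signal shape at m = 125
r = rng.standard_normal(x.size)                  # normalized residuals d - b

mu_hat = (s @ P @ r) / (s @ P @ s)               # eq. (2.12)
t = (s @ P @ r) ** 2 / (s @ P @ s)               # eq. (2.16)
z = np.sign(mu_hat) * np.sqrt(t)                 # eq. (2.17)
print(z)
```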
For the covariance matrix estimation, we need to average over the data. We are looking for a solution with uncorrelated fluctuations in each bin (sec. 1), and we recall that we normalized the errors to 1 in eq. (2.1); therefore, the following holds:

    E[d − b] = 0,    (2.18)
    E[(d − b)(d − b)⊺] = 1,    (2.19)

where E[·] denotes the expectation value calculated across samples of the dataset. The covariance matrix, then, is³:

    Σ_mn = E[Z_m Z_n] = ⟨s_m| P E[(d − b)(d − b)⊺] P |s_n⟩ / (√⟨s_m| P |s_m⟩ √⟨s_n| P |s_n⟩),    (2.20)

where we used the symmetry and projector properties of P.

It should be noted that, from the way the data fluctuations d − b contribute to the covariance matrix, the superposition principle relied on in ref. [8] can be derived:

    Z_m[d − b] = Z_m[Σ_k ε_k] = Σ_k Z_m[ε_k],    (2.21)

where k enumerates independent fluctuations ε_k in different bins. In summary, we can estimate the autocovariance matrix of the significance field from the signal model and the derivatives of the background model:

    Σ_mn = ⟨s_m| P |s_n⟩ / (√⟨s_m| P |s_m⟩ √⟨s_n| P |s_n⟩),    P = 1 − Δ (Δ⊺Δ)⁻¹ Δ⊺.    (2.22)
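Because the normalized background-only fluctuations are independent standard normals under the stated assumptions, the linear-approximation covariance can be cross-checked against brute-force toys. A minimal sketch with an illustrative model (not the one of ref. [8]):

```python
import numpy as np

rng = np.random.default_rng(3)

# Linear-approximation covariance versus a brute-force toy estimate;
# errors are normalized to 1, as in eq. (2.1).
x = np.linspace(100.0, 160.0, 30)
b = 1000.0 * np.exp(-0.02 * x)
delta = np.stack([b / 1000.0, -x * b], axis=1)
P = np.eye(x.size) - delta @ np.linalg.solve(delta.T @ delta, delta.T)

masses = np.linspace(110.0, 150.0, 9)
S = np.stack([np.exp(-0.5 * ((x - m) / 2.0) ** 2) for m in masses])  # s_m

SPS = S @ P @ S.T                         # <s_m| P |s_n>
norm = np.sqrt(np.diag(SPS))
cov_lin = SPS / np.outer(norm, norm)      # the closed-form covariance

# Brute-force check: significances of many background-only toys.
toys = rng.standard_normal((20000, x.size))
Z = (toys @ P @ S.T) / norm               # significance per toy and mass
cov_toys = np.cov(Z, rowvar=False)

print(np.max(np.abs(cov_lin - cov_toys)))
```

The toy-based estimate converges to the closed-form matrix at the usual 1/√N Monte Carlo rate, which is exactly the cost the closed form avoids.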

Justification of the set of Asimov background samples
In this section we compare the derived expression eq. (2.22) for the linear approximation of the significance covariance matrix to the empirical study [8] and the H → γγ inspired model introduced there. To carry out the calculations we used the SigCorr package, which we developed specifically for trials-factor studies and which now includes functionality for the linear approximation [10].
We estimate the linear approximation using eq. (2.22) with the true parameters of the model, which were predefined in the paper. The resulting matrix, shown in figure 2, is visually indistinguishable from the one presented in the empirical study. We also show, in figure 3, the difference between the linear approximation computed with the model's true parameters (figure 2) and the empirical estimate. We confirm that the empirical covariance matrix is compatible with the linear approximation suggested in this paper within the accuracy of the empirical estimate.
On the one hand, the compatibility of the linear approximation with the empirical study allows us to refer to the validations conducted in the empirical study, including those regarding trials-factor estimation, and to re-apply them to the method suggested in this paper. The direct calculation of the up-crossings from the covariance matrix, described in [8], becomes particularly appealing now, since it requires only a single fit of the statistical model to the data. The linear approximation, on the other hand, serves as the theoretical basis for the empirical set of Asimov background samples used to estimate the covariance matrix in the aforementioned work.
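For illustration, once a covariance matrix is available, the trials factor can be estimated by sampling the corresponding Gaussian process directly. The sketch below uses a hypothetical squared-exponential correlation matrix as a stand-in for the covariance of figure 2:

```python
import math

import numpy as np

rng = np.random.default_rng(4)

# Hypothetical covariance: a squared-exponential correlation over a
# 50-point search grid, standing in for the estimated covariance matrix.
n = 50
idx = np.arange(n)
cov = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 5.0) ** 2)

# Sample the significance field Z ~ N(0, cov); the small diagonal jitter
# keeps the Cholesky factorization numerically stable.
L = np.linalg.cholesky(cov + 1e-6 * np.eye(n))
fields = rng.standard_normal((50000, n)) @ L.T

# Trials factor: global p-value of the scan maximum over the local one.
z_local = 2.0
p_local = 0.5 * (1.0 - math.erf(z_local / math.sqrt(2.0)))  # one-sided
p_global = np.mean(fields.max(axis=1) > z_local)

trials_factor = p_global / p_local
print(trials_factor)
```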

Conclusion
In this work we proposed a novel method for estimating the covariance matrix of the statistical significance in new-particle searches, using a linear expansion of the statistical model around its background-only best fit to the data. In addition to the closed-form expression for the linear approximation of the significance covariance matrix, we also presented compact expressions for the best-fit signal strength and the statistical significance in this approximation.
We proved that the suggested covariance matrix satisfies the superposition principle with regard to the fluctuations of the data, which makes it a good proxy to the covariance matrix constructed with the set of Asimov background samples [8].
Finally, we compared the two approaches on the example of the H → γγ inspired model and showed that the deviations are compatible with the uncertainty of the set of Asimov background samples.
We, therefore, claim that all the validations conducted in the empirical study, including those regarding trials factor estimation, hold for the linear approximation suggested in this paper, and the linear approximation serves as a theoretical basis for the empirical set of Asimov background samples construction.

Figure 1: One brute-force toy (orange) sampled from the background-only hypothesis (blue) of the H → γγ inspired model. The signal significance curve is plotted below (green), where each point of the curve corresponds to a different choice of the signal hypothesis.

³To see the parallel with Demortier [9], one needs to think of the background model as a linear combination of the vectors in Δ. Then eq. (2.17) defines a vector |s̃_m⟩ = P |s_m⟩ / √⟨s_m| P |s_m⟩, which was introduced by Demortier and is orthogonal to each of the vectors constituting the background shape. The test statistic can then be rewritten as t_m = ⟨s̃_m | d − b⟩², and the covariance can be expressed as Σ_mn = ⟨s̃_m | s̃_n⟩.

Figure 2: The linear approximation of the significance covariance matrix, computed with the true parameters of the H → γγ inspired model.

Figure 3: The difference between the linear approximation of the significance covariance matrix computed with the true parameters of the H → γγ inspired model (figure 2) and the covariance matrix estimated with the set of Asimov background samples [8].