Abstract
This paper presents a new statistical method that enables the inclusion of systematic errors in the maximum-likelihood regression of integer-count Poisson data to a parametric model. The method is primarily aimed at the characterization of the goodness-of-fit statistic in the presence of the over-dispersion that is induced by sources of systematic error, and is based on a quasi-maximum-likelihood method that retains the Poisson distribution of the data. We show that the Poisson deviance, which is the usual goodness-of-fit statistic and is commonly referred to in astronomy as the Cash statistic, can be easily generalized in the presence of systematic errors, under rather general conditions. The method and the associated statistics are first developed theoretically, and then they are tested with the aid of numerical simulations and further illustrated with real-life data from astronomical observations. The statistical methods presented in this paper are intended as a simple general-purpose framework to include additional sources of uncertainty for the analysis of integer-count data in a variety of practical data analysis situations.

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
This article was updated on April 7, 2025 to correct a production error which led to incorrect text citations for some references.
1. Introduction: Systematic Errors and Poisson Regression
It is common practice, in the physical sciences and related disciplines, to classify uncertainties or "errors" in quantities of interest into two categories: "random" or "statistical" errors on one hand, and "systematic" errors on the other. The two types of error generally reflect the following assumptions: (1) the method of measurement of the random variable yields an inherent sampling distribution of measurements, when the experiment is repeated under the same experimental conditions (e.g., a Normal or Poisson distribution); and (2) there may be unknown sources of uncertainty, either in the measurement process itself or in the underlying theory, which may cause additional errors that do not average down with repeated measurements.
This subject is of practical importance for the measurement of physical quantities (e.g., H. H. Ku 1969; B. N. Taylor & C. E. Kuyatt 1994; ISO Guide 98-3 2008). While "random" errors have been the purview of probability and statistics since its beginnings, i.e., via the characterization of a random variable by its sampling distribution and moments, there seems to be no general consensus as to what is meant by "systematic" errors, and how to provide a probabilistic model for them. For example, comments on this subject by J. Glosup & M. Axelrod (1996), C. Eisenhart (1963), N. E. Dorsey & C. Eisenhart (1953), and H. Jeffreys (1966) suggest the presence of one or more additional random variables besides those of interest to the experimenter. These variables can be included in the statistical modeling but considered uninteresting (i.e., a nuisance parameter such as a background level; e.g., L. Lyons 2008, 2020; G. Cowan 2019), or they may not be accounted for in the experimental design at all (i.e., hidden or latent variables; e.g., A. Skrondal & S. Rabe-Hesketh 2004). In both cases, these variables have the potential to affect the measurement of the variable(s) of interest, both by biasing the estimation of the means of the target variables and by contributing additional variance (and possibly higher-order moments). This interpretation guides the model for systematic errors to be presented in this paper.
A review by D. van Dyk & L. Lyons (2023) describes several methods that are commonly used in the physical sciences to address sources of systematic error. Some of the simplest and more common methods use a one-parameter-at-a-time error propagation, or several parameters simultaneously (see, e.g., J. Heinrich & L. Lyons 2007). These methods are especially common, and have been the de facto standard for astrophysics (see, e.g., notable applications by W. L. Freedman et al. 2001; S. W. Allen et al. 2004). More complex frequentist and Bayesian methods are also available (e.g., R. D. Cousins & V. L. Highland 1992; H. Lee et al. 2011; M. A. Acero et al. 2022), depending on the data and type of application (i.e., point estimation, interval estimation, upper limits, goodness of fit). Certain such methods require subsidiary measurements to constrain the likelihood associated with the nuisance parameters, and to marginalize the posterior for the parameters of interest with respect to the nuisance parameters, e.g., see application to cosmological parameters by Planck Collaboration et al. (2016, 2014), or for the calibration of X-ray instruments by H. Lee et al. (2011) and J. J. Drake et al. (2006).
In regression applications with Normal data, it is customary to treat such systematic errors with the addition of a fiducial systematic variance to that of the data, assuming independence between measurement and systematic errors. This is the usual method to address sources of unknown systematic error in many physics and astronomy regression applications (e.g., P. R. Bevington & D. K. Robinson 2003; J. Nordin et al. 2008), especially when there is no additional information on the physical origin of the systematic error. A method popular among astronomers to include systematic errors in linear regression is the bivariate correlated errors and intrinsic scatter (BCES) method of M. G. Akritas & M. A. Bershady (1996). That method, however, applies only to linear regression, and does not explicitly provide a maximum-likelihood goodness-of-fit statistic.
For Poisson-distributed count data, on the other hand, the fixed relationship between the mean and variance of the distribution complicates the statistical treatment of systematic errors. Overdispersed count-data distributions, such as the negative binomial or the Poisson-inverse Gaussian (P-IG), are a possible solution to the problem (see, e.g., J. M. Hilbe 2011, 2014; C. Cameron & P. Trivedi 2013). However, it is often desirable to maintain the Poisson distribution of the data, both because of its simpler log-likelihood properties and the belief that the data do result from a fixed-rate Poisson process. Motivated by this challenge, a new method to deal with the regression of Poisson data with systematic errors was proposed by M. Bonamente (2023a), which is based on the modeling of systematic errors via an intrinsic model variance, instead of using an overdispersed model for the data themselves. Based on that idea, this paper presents in more detail the statistical framework for the regression of Poisson data in the presence of systematic errors, and certain asymptotic results on the Poisson deviance as its goodness-of-fit distribution.
This paper is structured as follows: Section 2 provides a review of maximum-likelihood regression with Poisson data and what is known concerning its goodness-of-fit statistic. Section 3 presents the details of the model for systematic errors for Poisson data, followed by the new proposed method of regression and its goodness-of-fit statistic in Section 4. The model is then tested in Section 5 with the aid of numerical simulations, Section 6 describes methods of use of this model, and Section 7 presents our conclusions.
2. The Maximum-likelihood Poisson Regression
This Section describes the data model and the goodness-of-fit statistic that results from the maximum-likelihood method of regression with Poisson data. Although other methods of regression are available, such as the least-squares method, the (generalized) method of moments (for a review, see, e.g., C. Cameron & P. Trivedi 2013) or the method of maximum entropy (e.g., E. T. Jaynes 1957), the maximum-likelihood criterion (e.g., R. Fisher 1925) is commonly used in the physical sciences, and it is the method considered in this study.
2.1. The Data Model
Regression of cross-sectional Poisson data via the maximum-likelihood method uses a data model of the type (xi, yi), for i = 1, ..., N independent measurements yi ~ Poisson(μi) at different values of independent variables xi. The maximum-likelihood method returns the Poisson log-likelihood as the fit statistic,

ln L(θ) = ∑i=1..N ( yi ln μi − μi − ln yi! ),    (1)

where θ = (θ1, ..., θm) are m adjustable parameters of a model y = f(x; θ) that can be nonlinear, μi = f(xi; θ) are the means of the Poisson distributions, and μ̂i are those means evaluated at the best-fit parameter values θ̂, i.e., μ̂i = f(xi; θ̂). For a textbook review of Poisson maximum-likelihood regression, see, e.g., C. Cameron & P. Trivedi (2013), M. James (2006), and M. Bonamente (2022).
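As a concrete illustration, the Poisson log-likelihood of Equation (1) is straightforward to evaluate numerically. The sketch below is an illustrative Python implementation (not part of the paper), using `scipy.special.gammaln` for the ln yi! term:

```python
import numpy as np
from scipy.special import gammaln

def poisson_loglike(y, mu):
    """Poisson log-likelihood ln L = sum_i (y_i ln mu_i - mu_i - ln y_i!),
    with the factorial term computed as gammaln(y_i + 1)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return float(np.sum(y * np.log(mu) - mu - gammaln(y + 1)))
```

By construction this agrees with the sum of `scipy.stats.poisson.logpmf` over the data bins.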
The x variable, with its xi fixed positions, represents the predictor or independent variable, and the response or dependent variable yi represents an integer number of occurrences detected as a function of the predictor. This data model applies to a variety of data in the physical sciences and related disciplines, and it is also commonly used in other fields such as econometrics (C. Cameron & P. Trivedi 2013), where the most common model used is the simple linear model with m = 2 parameters. In this paper, the term "model" is used to indicate interchangeably what astronomers might refer to as either a physical model (e.g., the type of radiation emitted by an astronomical source), or a detector model that characterizes the method of detection and how counts are collected by the instrument. The term is therefore used in this paper in a general sense to indicate a functional form for the parent mean μi for the Poisson distribution of detected counts, according to Equation (1).
In certain disciplines, it is common to use several predictors for the response, e.g., x = (x1, ..., xk) with k ≥ 1, typically using a linear model (i.e., the multiple linear regression, e.g., C. R. Rao 1973; C. Cameron & P. Trivedi 2013) or generalized linear models (GLMs; e.g., P. McCullagh & J. Nelder 1989). In applications for astronomy and the physical sciences, however, emphasis is placed on a physically motivated and often nonlinear model f(x) that is a function of just one predictor, k = 1. In astronomy, a scalar x often represents either time or the energy of a particle or a photon, and the associated data are referred to, respectively, as time series (or light curves) and spectra (see, e.g., E. D. Feigelson & G. J. Babu 2012). The class of GLMs with intercept, μi = η(Xi θT), with Xi the design vector including an intercept term and η(·) the link function (e.g., P. McCullagh & J. Nelder 1989), is included in the general class of functions f(x, θ) considered in this paper. The results to be presented in this paper apply to predictors with any dimension, k ≥ 1, although examples and simulations will only be reported for scalar predictors with k = 1.
2.2. The Poisson Deviance
The Poisson deviance is defined as twice the difference between the maximum achievable log-likelihood and that of the fitted model μ̂i,

DP = 2 [ ln L(y; y) − ln L(y; μ̂) ] = 2 ∑i=1..N [ yi ln(yi/μ̂i) − (yi − μ̂i) ],    (2)

and it is the goodness-of-fit statistic of choice for the Poisson regression, where μ̂i = f(xi; θ̂) are the best-fit Poisson means obtained for the best-fit parameters θ̂ (e.g., C. Cameron & P. Trivedi 2013). The Poisson deviance differs from Equation (1) via the subtraction of ln L(y; y), which is the log-likelihood for the saturated model (i.e., the model in which the N means are set equal to the measured data), and which is not a function of the model parameters. The Poisson deviance statistic is also referred to as the G-statistic (Y. Bishop et al. 1975).4
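In code, the deviance of Equation (2) can be sketched as follows (an illustrative Python implementation, not from the paper); the convention 0 ln 0 = 0 handles empty bins:

```python
import numpy as np

def poisson_deviance(y, mu_hat):
    """Poisson deviance D_P = 2 sum_i [y_i ln(y_i/mu_i) - (y_i - mu_i)].
    Bins with y_i = 0 contribute only 2*mu_i (the 0 ln 0 term vanishes)."""
    y = np.asarray(y, dtype=float)
    mu_hat = np.asarray(mu_hat, dtype=float)
    log_term = np.zeros_like(y)
    pos = y > 0
    log_term[pos] = y[pos] * np.log(y[pos] / mu_hat[pos])
    return 2.0 * float(np.sum(log_term - (y - mu_hat)))
```

As expected from the definition, the deviance vanishes when the fitted means equal the data (the saturated model).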
This statistic has received much attention for the goodness-of-fit assessment of high-energy astronomy count data. In this field, the DP statistic is known as the Cash or C statistic, and it is commonly referred to as C, after the pioneering work of X-ray astronomer W. Cash and others (W. Cash 1976, 1979; S. Baker & R. D. Cousins 1984). The Cash statistic is also the fit statistic of choice for the major high-energy astronomy spectral packages (K. A. Arnaud 1996; J. S. Kaastra et al. 1996; S. Doe et al. 2007).5 According to Equation (2), we define the function

C ≡ DP(y; μ̂) = 2 ∑i=1..N [ yi ln(yi/μ̂i) − (yi − μ̂i) ],    (3)

where DP(y; μ̂) is the usual Poisson deviance evaluated at the ML estimates μ̂i of the Poisson means. The definition of Equation (3) implies that yi ~ Poisson(μi) be independent measurements, as per the data model of Section 2.1.
2.3. Distribution of the Poisson Deviance
According to Equation (2), the Poisson deviance DP is a likelihood-ratio statistic, as already pointed out by W. Cash (1979). A likelihood-ratio statistic, as originally proposed by J. Neyman & E. S. Pearson (1928), is generally defined by the ratio of two likelihoods,

Λ = max_{θ ∈ Θ0} L(θ) / max_{θ ∈ Θ} L(θ),    (4)

where Θ is the full parameter space, and Θ0 ⊂ Θ is a subset of parameter space representing the null hypothesis. A general result for the asymptotic distribution of certain likelihood-ratio statistics is provided by the Wilks theorem (S. S. Wilks 1938; S. Wilks 1962).6 For a simple null hypothesis in which all of the m parameters are fixed, the Wilks theorem may be stated as

−2 ln Λ = 2 [ ln L(θ̂) − ln L(θ0) ] ~ χ2(m)    (5)

in the asymptotic limit of a large sample. In Equation (5), ln L(θ0) is the log-likelihood for a simple null hypothesis H0 represented by a set of fixed (or true) parameters θ = θ0, with θ0 ∈ Θ0; and ln L(θ̂) is the usual log-likelihood obtained by using the m best-fit maximum-likelihood parameters that were left free in the fit.
The Wilks theorem also applies to a composite hypothesis with s < m of the parameters fixed at a reference (or true) value, and the remaining m−s parameters being free to adjust to their best-fit values. In that case, the Wilks theorem results in

−2 ln Λ ~ χ2(s).    (6)
One of the key hypotheses for the applicability of the Wilks theorem is that partial derivatives of the Poisson log-likelihood with respect to θi are continuous (see, e.g., C. R. Rao 1973, Section 5f).
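The Wilks result of Equation (5) is easy to verify numerically. The sketch below (illustrative Python, not from the paper) fits the one-parameter constant model to Poisson data, for which the ML estimate is the sample mean, and checks that the average of the likelihood-ratio statistic 2[ln L(μ̂) − ln L(μ0)] is close to 1, the mean of χ2(1):

```python
import numpy as np

def wilks_statistic(y, mu0):
    """2 [ln L(mu_hat) - ln L(mu0)] for the constant model, whose ML
    estimate is the sample mean; the ln y! terms cancel in the difference."""
    mu_hat = np.mean(y)
    return 2.0 * (np.sum(y) * np.log(mu_hat / mu0) - y.size * (mu_hat - mu0))

rng = np.random.default_rng(12345)
mu0, n_bins, n_trials = 50.0, 50, 2000
stats = [wilks_statistic(rng.poisson(mu0, n_bins), mu0) for _ in range(n_trials)]
mean_stat = float(np.mean(stats))  # should be close to 1, the mean of chi2(1)
```

The simple hypothesis here has m = 1 fixed parameter, hence one degree of freedom for the asymptotic χ2 distribution.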
Within the context of likelihood ratios for DP, the term ln L(y; y) used in Equation (2) represents the log-likelihood for the saturated or perfect model, as described earlier. Unfortunately, the hypotheses of Wilks' theorem are not satisfied when the logarithm of the denominator of Equation (4) is the saturated log-likelihood ln L(y; y), as is the case for the Poisson deviance defined by Equation (2). This is because the perfect or saturated model, in which the means are set equal to the data, is not described by the same continuous parameterization θ as the null model (term ln L(y; μ̂)). Therefore, the Wilks theorem does not apply in general to the Poisson deviance DP, and thus it is not possible to assume that DP ~ χ2(N − m) in full generality, for all values of the parent mean and for any form of the null-hypothesis parametric model.
Asymptotic and approximate distributions for are available in the literature. P. McCullagh (1986) found the conditional distribution of DP for log-linear models in the extensive and sparse regime (i.e., large N and small μ) via the method of cumulants (P. McCullagh 1984), including first- and second-order corrections to the asymptotic χ2 distribution that applies to the large-mean regime. Similar results were also obtained by J. S. Kaastra (2017) and M. Bonamente (2020), however limited to the case of a simple hypothesis. An in-depth analysis of the Poisson deviance, including more accurate approximations that generalize the P. McCullagh (1986) results to a broader class of models, is presented in Y. Chen et al. (2025, in preparation). A useful approximation to the distribution of the Poisson deviance is available in the large-mean regime, where DP is approximately distributed as χ2(N − m), due to the approximation of the Poisson distribution to a Normal distribution in that regime (e.g., in the Gaussian limit of W. Cash 1979). In astrophysics literature, the large-mean regime is usually interpreted as approximately ≥10 or 20 counts per bin (e.g., see discussion in M. Bonamente 2022).
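The large-mean χ2 approximation can likewise be checked by simulation. This illustrative sketch (not from the paper) fits the one-parameter constant model (m = 1) to data with parent mean μ = 100, well within the large-mean regime, and compares the average deviance with the χ2(N − m) mean of N − 1:

```python
import numpy as np

def constant_model_deviance(y):
    """D_P for the best-fit constant model, mu_hat = sample mean."""
    mu_hat = np.mean(y)
    pos = y > 0
    log_term = y[pos] * np.log(y[pos] / mu_hat)
    return 2.0 * (np.sum(log_term) - np.sum(y - mu_hat))

rng = np.random.default_rng(0)
mu, n_bins, n_trials = 100.0, 100, 500
devs = [constant_model_deviance(rng.poisson(mu, n_bins)) for _ in range(n_trials)]
mean_dev = float(np.mean(devs))  # close to N - m = 99 in the large-mean regime
```

At smaller parent means the simulated mean deviance drifts away from N − m, consistent with the corrections discussed above.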
These complications are in contrast with the case of Normal data, where the maximum-likelihood goodness-of-fit statistic

S = ∑i=1..N (yi − μ̂i)² / σi²,    (7)

with σi² the known variance of measurement yi, is distributed as S ~ χ2(N − m) for a variety of models and situations, under rather general conditions (e.g., Cramér's theorem; H. Cramer 1946; P. Greenwood & M. Nikulin 1996). For this reason, this statistic is usually referred to as the minimum chi-squared statistic (χ²min) in astronomical applications. This simple result for Normal data is one of the main reasons why some practitioners use the minimum chi-squared statistic for parameter estimation and hypothesis testing of Poisson data, even though such a method leads to biased results (e.g., P. J. Humphrey et al. 2009).
3. A Model for the Regression of Poisson Data with Systematic Errors
This Section provides the details of the proposed method of analysis, starting with a definition of the type of systematic errors that are being considered.
3.1. Systematic Errors as Source of Overdispersion
In the development of a statistical model of systematic errors, we take the narrow view that systematics only provide an additional source of variance beyond that implied by the distribution of the data, without affecting the mean. While this assumption may seem restrictive, consider that any systematic shift or bias in the measurement of yi can in principle be accounted for by means of suitable model components that contribute to the model function f(xi; θ), provided they are a reasonably smooth function of the regressor variable x.
Practically, this view of systematic errors means that if the response variable yi is affected by systematic errors, the data analyst has corrected for the mean effect resulting from those errors. This is in fact what is routinely done in certain physics and astronomy applications, where such sources of systematics as instrumental gain or calibration are corrected with fixed factors that aim to provide a mean adjustment to the model (see, e.g., the calibration efforts for X-ray astronomy by M. Tsujimoto et al. 2011; P. P. Plucinsky et al. 2017). From an experimental viewpoint, this process results in a best-fit model that generally follows the data without systematic departures (in the mean), yet with a larger-than-expected variance.
In this work, we consider strictly independent measurements yi, as already emphasized in Section 2.1; this is a key assumption of the method. In certain applications, data may be binned with a finer resolution than the effective detector resolution, thereby introducing a certain degree of correlation among the detected counts in adjacent bins, and therefore also among the systematics. The choice of bin size is primarily a data reduction problem that is not directly addressed by this work (see, e.g., J. S. Kaastra & J. A. M. Bleeker 2016), and therefore it is assumed that bin-by-bin independence is satisfied. Nonetheless, it is important to point out that calibration, and therefore correction-in-the-mean, are often carried out over a range of the independent variable that covers several bins simultaneously (e.g., J. J. Drake et al. 2006; H. Lee et al. 2011, for X-ray astronomy examples). This is likely to introduce a degree of correlation in the systematics between adjacent bins that is not accounted for in this model of systematic errors, which would require additional considerations.
A way of specifying the effect of systematic errors on the response yi is illustrated via the compounding representation of random variables,7 with a nonnegative distribution of choice (such as the gamma) for the Poisson mean μi, leading to overdispersion. No restriction is placed on the origin of such overdispersion, i.e., on the shape of the compounding distribution for μi. In fact, we intend to develop a general-purpose model that enables (a) an estimate of the systematic variance, when the use of an a priori value is impractical; (b) a method of regression for Poisson count data in the presence of these errors; and, critically, (c) the evaluation of the distribution for the revised fit statistic (i.e., a deviance DP in the presence of systematics) for the purpose of hypothesis testing. These are the main goals of the method to be presented in this paper.
3.2. Review of Models for Overdispersion in Count Data
Some of the more common distributions to model overdispersed integer data are the negative binomial (NB) distribution, which results from the compounding of a Poisson distribution with a gamma-distributed rate parameter (e.g., M. Greenwood & G. Y. Yule 1920), or the Poisson-inverse-Gaussian (P-IG) distribution (e.g., M. C. K. Tweedie 1957; H. Sichel 1971; A. C. Atkinson & L. Yeh 1982). Such compounding (or mixing) models have been used in a variety of applications across the sciences, see, e.g., D. Karlis & E. Xekalaki (2005), L. Hurtado-Gil et al. (2017), or G. Willmot (1986) for a review of Poisson mixing distributions and applications. In addition to the gamma and inverse Gaussian distributions, some of the other notable mixing distributions include the lognormal distribution (M. G. Bulmer 1974) and the truncated Normal distribution (G. Patil 1964). The NB distribution is commonly used in GLM models (e.g., the NB1 and NB2 parameterizations; P. McCullagh & J. Nelder 1989; C. Cameron & P. Trivedi 2013).
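The gamma-Poisson compounding that generates the NB distribution is easy to demonstrate with a short simulation (an illustrative sketch, not from the paper): a gamma-distributed rate with shape k and scale θ yields counts with mean kθ and variance kθ(1 + θ), i.e., overdispersed relative to a pure Poisson:

```python
import numpy as np

rng = np.random.default_rng(1)
k, theta = 10.0, 0.5           # gamma shape and scale for the rate parameter
n_draws = 200_000

# Compound: draw a gamma-distributed rate, then a Poisson count at that rate.
rates = rng.gamma(shape=k, scale=theta, size=n_draws)
counts = rng.poisson(rates)

mean_nb = counts.mean()        # expected k*theta = 5
var_nb = counts.var()          # expected k*theta*(1 + theta) = 7.5, i.e., > mean
```

The variance-to-mean ratio 1 + θ > 1 quantifies the overdispersion induced by the mixing.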
A method to account for the additional variance induced by systematic errors is therefore the direct use of one such compounded Poisson distribution for the data, in place of the default Poisson model of Section 2.1, or any other distribution that allows overdispersion, such as the generalized Poisson distribution (e.g., P. C. Consul & F. Famoye 1992). There are two major reasons, however, to seek an alternate route for certain applications. First, there is substantial added computational cost in the use of alternative distributions such as the NB or the P-IG. For example, the P-IG distribution makes use of Bessel functions of the second kind (e.g., M. Holla 1967; A. C. Atkinson & L. Yeh 1982); likewise, the likelihood associated with the NB is more complex than for the Poisson distribution (see, e.g., J. M. Hilbe 2011). More critical, however, is the fact that the maximum-likelihood goodness-of-fit statistic becomes practically intractable, even for the relatively less complex NB model (see, e.g., C. Cameron & P. Trivedi 2013). The situation would therefore become even more dire than for the relatively simple Poisson regression statistic DP, which itself lacks the simple distributional properties of the chi-squared statistic for Normal data (see Section 2.3).
On the other hand, and moving away for a moment from "classical" compounded distributions, one could follow purely Bayesian methods for the inclusion of systematic errors in the analysis (see, e.g., J. M. Hilbe et al. 2017). Such methods would typically start with the choice of a prior distribution p(μi) to evaluate the posterior distributions for μi. In this case, assessment of the goodness-of-fit would be addressed via Bayes factors or by means of information criteria such as the Akaike information criterion (H. Akaike 1974), Bayesian information criterion (G. Schwarz 1978), or similar (for a review, see, e.g., P. Stoica & Y. Selen 2004; A. Gelman et al. 2013). Numerical evaluations are typically required for the posterior distribution, of a complexity that is probably of similar order to that of the numerical minimization of the DP statistic for the classical approach, depending on a number of factors such as the complexity of the model and that of the prior distribution of choice. The main drawback, however, is that the Bayesian approach does not have a simple goodness-of-fit statistic to directly test the null hypothesis H0 of a true model.
Certain applications may not require an accurate determination of the distribution of the goodness-of-fit statistic. In those cases, the NB or P-IG models or other compounded-variable variations, or alternatively Bayesian methods, may be the most appropriate route to model overdispersion. However, most physics and astronomy applications do require it, in that it is usually critical to perform an accurate test of a complex hypothesis as a means of validation or rejection of an underlying multiparameter theory. Examples in X-ray astronomy include most cosmological studies, such as the measurement of the Hubble constant (e.g., M. Bonamente et al. 2006; J. T. Wan et al. 2021), of the density of dark matter and dark energy (e.g., S. W. Allen et al. 2004; A. Vikhlinin et al. 2009a, 2009b; A. B. Mantz et al. 2014, 2022), of the cosmological density of baryons and the associated missing baryons problem (e.g., F. Nicastro et al. 2018; O. E. Kovács et al. 2019; D. Spence et al. 2023). This makes the classical compounded distribution method, or fully Bayesian methods, less desirable for this class of applications. The model proposed in the following aims to overcome this difficulty, by developing an approximate method of modeling systematic errors that is both computationally tractable even with limited resources (both computational and in terms of overall model simplicity), and more importantly leading to a simple and accurate method of hypothesis testing.
3.3. Proposed Model: The Intrinsic Model Variance and the Associated Mi Random Variable
We are now in a position to introduce our model for systematic errors in the regression of Poisson data. It is proposed that, after the usual Poisson regression is performed, the best-fit model be considered as a random variable, indicated as Mi, as a means to model the presence of systematic errors. The mean of this random variable is naturally set to the ML estimate μ̂i itself, and its variance is set to a newly introduced intrinsic model variance parameter σ²int,i, as proposed in M. Bonamente (2023a). This idea is illustrated in Figure 1 for a simple linear model, where the intrinsic variability of the model is represented by the blue curve, i.e., an instance of the post-fit randomization of the best-fit linear model according to the random variable Mi.
Figure 1. Illustration of the method of regression with an intrinsic model variance. Data with N = 100 and a constant parent mean μ = 100 (dots) were fit to a linear model (solid line). The blue curve represents an instance of the randomization of the best-fit model according to a Normally distributed Mi with a fixed relative systematic error fi.
Formally, the random variable Mi describes the distribution of the best-fit model μ̂i, i.e.,

E(Mi) = μ̂i  and  Var(Mi) = σ²int,i.    (8)
Any family of distribution functions with nonnegative support, such as the gamma distribution, can be used, and the choice of distribution for Mi is part of the process of modeling systematic errors. The choice of distribution for Mi is discussed in Section 4.5, and it will be shown not to be a critical task for most applications. Moreover, other choices could be made regarding the relationship between the parameters of the distribution of the Mi variable and the parameters μ̂i and σint,i. For example, the variance in Equation (8) could be further generalized to a function of the intrinsic model variance parameter, i.e., Var(Mi) = h(σ²int,i), where h(·) is a positive-valued function. It is believed, however, that the present parameterization is probably sufficiently general for most applications.
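As an illustration of one such choice, the sketch below (illustrative Python, not from the paper) draws Mi from a gamma distribution with the two moments required by Equation (8), i.e., mean μ̂i and variance σ²int,i, via the usual shape/scale reparameterization:

```python
import numpy as np

def draw_model(mu_hat, sigma_int, rng, size=None):
    """Draw M_i ~ Gamma with E(M_i) = mu_hat and Var(M_i) = sigma_int**2.
    A gamma with shape a and scale s has mean a*s and variance a*s**2,
    so a = (mu_hat/sigma_int)**2 and s = sigma_int**2/mu_hat."""
    shape = (mu_hat / sigma_int) ** 2
    scale = sigma_int ** 2 / mu_hat
    return rng.gamma(shape, scale, size=size)

rng = np.random.default_rng(2)
draws = draw_model(100.0, 5.0, rng, size=100_000)   # e.g., f_i = 5% systematic error
```

Any other nonnegative family (e.g., a truncated Normal) could be substituted here, provided its first two moments match Equation (8).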
The intrinsic model variance σ²int,i is assumed to be known a priori, e.g., from even limited knowledge of the instruments or methods used for the detection. For example, counts measured from a detector of the type used in high-energy astrophysics (e.g., H. Lee et al. 2011) are proportional to the effective area of the instrument, and an estimated calibration uncertainty on the effective area may translate linearly to an estimate of the intrinsic model error σint,i. Alternatively, σ²int,i can be estimated from the data, as will be shown below.
The intrinsic model scatter can be recast as a fraction fi of the best-fit model, where

fi = σint,i / μ̂i    (9)

is the fractional amount of systematic scatter relative to the best-fit model. This model of systematic errors only applies to small relative systematic errors, fi ≪ 1, which is the common regime for a variety of applications, such as the astrophysics data that motivated this model (D. Spence et al. 2023). In fact, large disagreements between data and model are more likely to be imputed to a genuine failure of the model, rather than to the presence of systematic errors.
The introduction of the random variables Mi according to Equation (8), and the associated intrinsic model variance σ²int,i, are the main ingredients of this model for systematic errors in Poisson data. This model could also be applied to other data, e.g., Normally distributed measurements, since the definition in Equation (8) makes no assumptions on the distribution of the data themselves. Equation (8) embodies the change of perspective in the role played by systematic errors, i.e., the systematic variance is associated with the model after the regression is performed, rather than with the data themselves. The random variable Mi plays a role that is somewhat similar to that of a Bayesian prior on the mean (as discussed in Section 3.2), in that it provides a probabilistic model that quantifies a prior belief on the intrinsic degree of variability of the parent model. The main difference, however, is that it is included in the probability model only after the regression is performed, and it does not have an effect on it. Therefore, the random variable Mi is used primarily for its effect on the distribution of the null-hypothesis goodness-of-fit statistic, as will be explained in detail in Section 4.
4. The Goodness-of-fit Statistic for the Poisson Regression with Systematic Errors
The introduction of the Mi random variable to model systematic errors is now incorporated into a statistical model for the regression of Poisson data. The main aim of this model is to derive a goodness-of-fit likelihood-ratio statistic that generalizes the usual deviance DP when systematic errors are present.
4.1. The Quasi-maximum-likelihood Estimation
The choice of introducing an intrinsic model variance and the associated random variables Mi to describe the model after the ML regression is performed (see Section 3.3) is born of convenience. With that assumption, the data yi can in fact be viewed as being distributed according to the compounded distribution

yi | Mi ~ Poisson(Mi),    (10)

with Mi given by Equation (8), in place of the usual yi ~ Poisson(μi). Specifically, the moment requirements for Mi imply that the expectation remains E(yi) = μi under H0, while Var(yi) > μi according to the usual compounding formulas; by the law of total variance, Var(yi) = E[Var(yi | Mi)] + Var[E(yi | Mi)] = μi + σ²int,i. This means that the data can be viewed as retaining the same parent means, but with larger variance (i.e., overdispersed), as was the intent of the method (see Section 3.1).
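The compounding formulas quoted above can be verified directly by simulation (an illustrative sketch, not from the paper), here using a gamma distribution for Mi with mean μ = 50 and variance σ²int = 25, so that Var(yi) = μ + σ²int = 75:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma_int = 50.0, 5.0
n_draws = 200_000

# M_i with E(M) = mu and Var(M) = sigma_int**2, via a gamma distribution.
m = rng.gamma((mu / sigma_int) ** 2, sigma_int ** 2 / mu, size=n_draws)
y = rng.poisson(m)   # y | M ~ Poisson(M), as in Equation (10)

mean_y = y.mean()    # expected E(y) = mu = 50
var_y = y.var()      # expected Var(y) = mu + sigma_int**2 = 75 > mu
```

The parent mean is unchanged by the compounding while the variance grows by σ²int, which is precisely the overdispersion the model is designed to capture.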
With this interpretation, which will be further justified in Section 4.2 below, the usual Poisson maximum-likelihood method of estimation that we use for the regression becomes a quasi-maximum-likelihood estimation (e.g., C. Cameron & P. Trivedi 2013), which is guaranteed to continue providing consistent estimates assuming that the means are correct according to the null hypothesis (as shown in, e.g., C. Gourieroux et al. 1984a). In other words, the regression can be performed by minimizing the usual DP statistic according to Equation (2), and therefore, there is no additional burden to the task of identifying the best-fit model in the presence of systematic errors.
Minimization of the DP statistic must be achieved with numerical methods, even for the simple linear model (e.g., C. Cameron & P. Trivedi 2013; M. Bonamente 2022). In fact, there is no fully analytical solution to the maximum-likelihood equations beyond the simple one-parameter constant model, for which the parameter is trivially estimated by the sample mean. A semianalytical solution to the maximum-likelihood fit for the two-parameter linear model for Poisson data was provided in M. Bonamente & D. Spence (2022), including analytical estimates of the covariance matrix (M. Bonamente 2023b), which can reduce the computational burden for the linear model. For more complex models, the Poisson log-likelihood must be maximized via numerical methods.
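As an illustration of such a numerical minimization, the following sketch (illustrative, not part of the paper's software) fits the two-parameter linear model μi = a + b·xi by minimizing DP with `scipy.optimize.minimize`, starting from an ordinary least-squares guess:

```python
import numpy as np
from scipy.optimize import minimize

def fit_linear_poisson(x, y):
    """Quasi-ML fit of mu_i = a + b*x_i by numerical minimization of D_P."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def deviance(theta):
        mu = theta[0] + theta[1] * x
        if np.any(mu <= 0):
            return np.inf          # Poisson means must be positive
        log_term = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
        return 2.0 * np.sum(log_term - (y - mu))

    b0, a0 = np.polyfit(x, y, 1)   # least-squares starting point
    res = minimize(deviance, x0=[a0, b0], method="Nelder-Mead")
    return res.x, res.fun

rng = np.random.default_rng(4)
x = np.linspace(0.0, 50.0, 51)
y = rng.poisson(10.0 + 2.0 * x)    # simulated parent model: a = 10, b = 2
theta_hat, dev_min = fit_linear_poisson(x, y)
```

The recovered parameters are close to the simulated parent values, and the minimized deviance is of order N − m, as expected in this large-mean regime.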
4.2. The Goodness-of-fit Statistic
We now turn to the goodness-of-fit statistic in the presence of systematic errors, which we refer to as DP,sys. Given that the Mi variables replace the simple estimates μ̂i in this model of systematic errors, this new goodness-of-fit statistic follows formally from Equation (3), i.e.,

DP,sys = 2 ∑i=1..N [ yi ln(yi/Mi) − (yi − Mi) ].    (11)

This statistic is therefore also immediately obtained as a likelihood-ratio statistic from the joint log-likelihood of the data distributed as in Equation (10), thus supporting our interpretation of Section 4.1.
It is also immediate to see that Equation (11) can be written as the sum of two terms. Referring to the statistic as Z for convenience, we can set

Z = X + Y,

the first term of which, X, represents the usual DP statistic according to Equation (2). The additional statistic Y is associated with the random variable Mi, and it is defined by

Y = Z − X = 2 Σi [yi ln(μ̂i/Mi) + (Mi − μ̂i)].    (13)
It is clear that, when there are no systematic errors (σint,i = 0), Y vanishes identically. Moreover, since Mi is independent of the data yi, Y is asymptotically Normal in the extensive data limit (i.e., large N) by the central limit theorem. Accordingly, we can set the asymptotic distribution of Y as

Y ∼ N(μC, σC²),    (14)

where the two parameters of the distribution represent the additional mean and variance that are induced by the presence of systematic errors.
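The decomposition of the statistic into the two contributions can be illustrated with a short simulation sketch (our own code, assuming a Normal choice for Mi with the moments of Equation (8) and a constant model):

```python
import numpy as np

rng = np.random.default_rng(2)

def deviance(y, m):
    """Poisson deviance, with the y ln y term set to 0 when y = 0."""
    log_term = y * np.log(np.where(y > 0, y, 1.0) / m)
    return 2.0 * np.sum(log_term - (y - m))

N, mu, f = 100, 100.0, 0.05
y = rng.poisson(mu, size=N).astype(float)
mu_hat = np.full(N, y.mean())           # ML fit of the constant model
X = deviance(y, mu_hat)                 # usual D_P (Cash) statistic
M = rng.normal(mu_hat, f * mu_hat)      # randomized means, moments as in Eq. (8)
Z = deviance(y, M)                      # statistic in the presence of systematics
Y = Z - X                               # additional contribution from the systematics
```

A single realization of Y fluctuates around the bias induced by the systematic errors; its distribution is studied in Sections 4.3 and 5.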
This model of systematic errors has therefore achieved our primary goal to provide a simple generalization of the goodness-of-fit statistic for the Poisson regression in the presence of systematic errors. The first contribution to the new goodness-of-fit statistic is thus the usual Poisson log-likelihood X = DP, whose distribution is assumed to be independent of the shape of the model μi = f(xi; θ). Therefore, the exact parameterization does not play a significant role in the determination of the overall distribution for Z, provided that basic requirements on model identifiability are satisfied (e.g., B. Hoadley 1971; J. R. Dale 1986; Y. Chen et al. 2025, in preparation).
The additional contribution Y has a simple Normal asymptotic distribution and, moreover, it is independent of X under the null hypothesis. This is the case for the following reasons. First, as we just remarked, the null-hypothesis distribution of X does not depend on the systematic errors. Second, Y is by construction not dependent on the data yi ~ Poisson(μi), since the Mi random variable defined in Equation (8) depends only on the best-fit means and on an intrinsic model variance that is assumed to be known a priori.
4.3. Estimation of the Parameters of Y
In order to estimate the asymptotic distribution of the Poisson goodness-of-fit statistic in the presence of systematic errors, as given in Equation (11), it is therefore necessary to simply estimate the mean and variance of Y in Equation (14). Detailed calculations to accomplish this task are provided in Appendix B, and key results are summarized below.
The moments of the Y distribution can be obtained via a Taylor series expansion of the logarithmic terms in Equation (13),
by retaining second-order terms in the expansion. Given that E(Mi) = μ̂i according to Equation (8), the expectation of the terms in the first sum of Equation (15) is null, leaving a second-order correction of
where fi = σint,i/μ̂i, with the moments of Mi given by Equation (8). The ratio fi is the fractional amount of scatter, and it is intended to be a number that is much smaller than one. This expectation will be referred to as the bias parameter μC associated with the Z statistic. This expectation can therefore be estimated from the data via

μ̂C = Σi fi² μ̂i.
The variance of Y is referred to as the overdispersion parameter σC², and it evaluates to
where kurt(Mi) is the kurtosis (or normalized fourth-order central moment) of Mi, e.g., kurt(Mi) = 3 for a Normal distribution. The second and third terms on the right-hand side are the second-order correction to the approximate results already provided in M. Bonamente (2023a). This approximation applies again for small relative errors, fi ≪ 1. This variance can be estimated from the data according to
see Appendix B for details.
Equations (17) and (19) are the main results for the estimation of the Y distribution, and they relate the key parameter of the model for systematic errors (i.e., the intrinsic model variance) to the bias and overdispersion parameters for the statistic in the presence of systematic errors. In principle, one may retain higher-order terms in the power series expansions. The accuracy of these approximations is tested in Section 5, where we show that, for typical values of systematic uncertainties, the second-order corrections presented in this paper are sufficient for most applications.
4.4. Asymptotic Distributions
In general, a simple closed form for the asymptotic distribution of Z cannot be given in all regimes, as discussed in Y. Chen et al. (2025, in preparation). However, useful approximations can be given that cover most practical data analysis cases.
In the large-mean regime, X ≈ χ2(ν) due to the approximation Poisson(μ) ≈ N(μ, μ) for the distribution of the data (this is true even for data sets with small N). Provided that Y is approximately Normally distributed for that value of N, it follows that Z = X + Y in the large-mean regime is approximately distributed like the sum of a chi-squared and an independent Normal variable. The resulting distribution is referred to as the overdispersed chi-squared distribution B,
which is a special case of the gamma-Normal family of distributions, with a probability distribution function that is the convolution of independent χ2(ν) and N(μC, σC²) distributions (see M. Bonamente & D. Zimmerman 2024, for properties of this distribution).
For applications with extensive data and in the large-mean regime, it is possible to further approximate χ2(ν) ≈ N(ν, 2ν), therefore bypassing the use of the overdispersed chi-squared distribution. Therefore, Z is asymptotically Normal with mean and variance equal to the sum of the means and variances of the two contributing statistics,

Z ∼ N(ν + μC, 2ν + σC²).    (21)
Therefore, for most applications in the large-mean and extensive data regime, this model of systematic errors for Poisson data results in a straightforward extension of the χ2(ν) distribution for the fit statistic in that regime, with the simple use of Equation (21) as the null-hypothesis distribution for the goodness-of-fit statistic.
In the low-mean regime, however, additional considerations are needed even in the asymptotic limit of extensive data. Y. Chen et al. (2025, in preparation) showed that the Poisson deviance X = DP remains asymptotically Normal, but with mean and variance that will differ significantly from the simple large-mean values assumed in Equation (21). For extensive data in the low-mean regime,

Z ∼ N(E(X) + μC, Var(X) + σC²),    (22)

where the conditional mean and the conditional variance of X are provided by Theorems 8–10 of Y. Chen et al. (2025, in preparation). In the low-mean regime, there is therefore only the additional burden of evaluating the appropriate moments for X, and the asymptotic distribution of Z remains Normal for extensive data.
4.5. Distribution for Mi and Other Considerations
Provided that the relative level of systematic errors is small, fi ≪ 1 as assumed throughout, the choice of distribution for Mi is not critical. According to the approximation of Equation (15), E(Y) depends only on the first- and second-order moments for Mi, and it is independent of other moments of the distribution of Mi. Equation (8) ensures that E(Y) is independent of the choice for the distribution of Mi. Moreover, Var(Y) has only a mild dependence on higher-order moments, according to Equations (19) or (B.16). It is nonetheless useful to illustrate in detail how the choice of distribution affects the estimate of the variance of Y. For this purpose, we use a gamma-distributed Mi variable.
Let Mi be distributed according to a gamma distribution with rate parameter α and shape parameter r, indicated by Mi ~ gamma(α, r). If the variable is required to have an expectation and a variance according to Equation (8), then the two parameters of the distribution are, respectively, α = μ̂i/σ²int,i and r = μ̂i²/σ²int,i = 1/fi². The variance according to Equation (19) then becomes
where the term 6/r = 6 fi², which reflects the excess kurtosis of the gamma distribution, has a small effect, given that fi ≪ 1. Any other choice for the distribution of Mi can be treated in a similar way, and the resulting estimates for the variance of the Y distribution are in general provided by Equation (B.16).
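A gamma-distributed Mi with the moments required by Equation (8) can be generated as in the following sketch (our own code; with shape r = 1/fi² and scale fi² μ̂i, i.e., rate α = 1/(fi² μ̂i)):

```python
import numpy as np

rng = np.random.default_rng(3)
mu_hat, f = 100.0, 0.05        # best-fit mean and fractional systematic error
r = 1.0 / f**2                 # gamma shape parameter
scale = f**2 * mu_hat          # gamma scale = 1/alpha, rate alpha = 1/(f**2 * mu_hat)
M = rng.gamma(shape=r, scale=scale, size=200_000)
# mean = r*scale = mu_hat and sd = sqrt(r)*scale = f*mu_hat, as required by Eq. (8)
```

Unlike the Normal choice discussed below, the gamma variable is guaranteed to be positive, which is a desirable property for the mean of a Poisson distribution.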
Alternatively, a Normal distribution for Mi would reflect the assumption that there may be multiple sources for this variance that are likely to yield a Normal distribution, according to the central limit theorem. However, the fact that a Normal distribution allows negative values makes this choice reasonable only in the limit of fi ≪ 1, in which case the negative values of the Normal distribution, which are untenable for the mean of a Poisson distribution, can be safely ignored.
Finally, it may be useful to remark that the Normal approximation for the Y statistic is simply a result of the central limit theorem for extensive data. This conclusion applies regardless of the choice of distribution for Mi, provided that the distribution has finite mean and variance, as is the case for most distributions, including the Normal and gamma.
5. Monte Carlo Tests
The statistical model of systematic errors presented in Sections 3 and 4 is now tested with the aid of Monte Carlo numerical simulations that are representative of real data analysis cases, with the goal of illustrating the overall accuracy of this model to predict the distribution of Z, and its limitations. Given the independence of the statistic from the parameterization of the regression model f(x), as discussed in Section 4, it is sufficient to test the results with simple analytic models. For this paper, we choose a constant model with m = 1 parameter, and a linear model with m = 2 parameters. The Monte Carlo simulations are aimed at validating the distributions for the fit statistics X, Y, and Z with these simple models.
5.1. Linear Model with 10% and 5% Systematic Errors
For these simulations, a data set with N = 100 independent data points is drawn from a constant model with a parent mean μ = 100, in such a way that the Poisson data are in the extensive and large-mean regime. The data are fit to the same constant model, and also to a linear model y = a + b x, to obtain the maximum-likelihood estimates of the model parameters.
First, the X statistic is calculated according to Equation (2). Next, in order to simulate the presence of systematic errors, the best-fit models (both full and reduced) are randomized at each point according to a Normal distribution for Mi with parameters according to Equation (8), using a fixed value of fi = 0.1 representing a 10% level of systematic errors. The fit statistics with these randomized models are referred to as Z and Zr, with "r" denoting the fit to the reduced model. The contribution to the fit statistic due to the systematic error is then Y = Z − X. The same procedure was repeated 500 times to obtain the empirical cumulative distribution function (eCDF) of the statistics Z, X, and Y.
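This Monte Carlo procedure, simplified to the constant model and the Y statistic only, can be sketched as follows (our own code, not the authors' implementation; shown for the 5% systematic error case):

```python
import numpy as np

rng = np.random.default_rng(4)

def deviance(y, m):
    """Poisson deviance, with the y ln y term set to 0 when y = 0."""
    log_term = y * np.log(np.where(y > 0, y, 1.0) / m)
    return 2.0 * np.sum(log_term - (y - m))

N, mu, f, n_sim = 100, 100.0, 0.05, 2000
Y = np.empty(n_sim)
for k in range(n_sim):
    y = rng.poisson(mu, size=N).astype(float)
    mu_hat = np.full(N, y.mean())                # constant-model ML fit
    M = rng.normal(mu_hat, f * mu_hat)           # randomized best-fit model, Eq. (8)
    Y[k] = deviance(y, M) - deviance(y, mu_hat)  # Y = Z - X
```

The empirical mean of Y should land near the bias quoted in the text for f = 0.05, i.e., approximately 25% of the expected null-hypothesis statistic E(X) ≃ 100.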
Tests of the agreement between the empirical and predicted distributions of the X, Y, and Z statistics were performed with the Kolmogorov–Smirnov one-sample statistic (A. Kolmogorov 1933). The fit statistics X and Xr in the absence of systematic errors are expected to be distributed as χ2(N − m) according to Wilks's theorem. In this limit, this distribution is well approximated by a Normal distribution of the same mean and variance. In the same limit, the Y and Yr statistics are approximated by Normal distributions with mean and variance according to Equations (17) and (19), and the Z and Zr statistics also by Normal distributions according to Equation (21). For all of the statistics X, Y, and Z, the corresponding DN statistics for the comparison between the eCDF and the expected distribution have large p-values, indicating good agreement.
Additional Monte Carlo simulations similar to the one reported above were repeated with the level of systematic errors now set at 5%, or fi = 0.05. As for the previous case, we obtained good agreement between theoretical and simulated statistics, according to the respective DN statistics. The case of 5% systematic errors better represents a typical real-life data analysis situation where systematic errors may be invoked, rather than the case of 10% systematics. In fact, for N = 100 and μ = 100, a 5% systematic error leads to a bias that is at the level of 25% of the expected null-hypothesis statistic (with E(X) = N − m ≃ 100). This figure rises to ~ 100% for the 10% systematic errors, i.e., a bias of μC ≃ 100.
5.2. Linear Model with GLM Log-link
GLMs are commonly used in a variety of disciplines such as econometrics (e.g., C. Cameron & P. Trivedi 2013), and occasionally also in astronomy (e.g., R. de Souza et al. 2015; M. W. Hattab et al. 2019; S. C. Berek et al. 2023). In GLM models, the usual linear model with the two parameters θ = (a, b) is used with a linear predictor ηi = a + b xi and with Poisson mean μi = exp(ηi), which corresponds to the canonical log-link in the GLM notation, ηi = ln μi. This is a common choice for the analysis of Poisson data, to avoid possible nonpositive means with the usual linear model of Section 5.1. We therefore performed simulations with the same parameters as for the usual linear model that was used in the previous Section (with N = 100 and μ = 100), and obtained statistically indistinguishable results from those with the traditional linear model. Quantitative results with both the traditional linear model and the log-link model are presented in Section 5.3 below.
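A log-link Poisson fit of this kind can be sketched by minimizing the deviance directly (our own code; a dedicated GLM library would normally be used in practice):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)

def deviance(y, m):
    """Poisson deviance, with the y ln y term set to 0 when y = 0."""
    log_term = y * np.log(np.where(y > 0, y, 1.0) / m)
    return 2.0 * np.sum(log_term - (y - m))

# Simulate from a log-link GLM: eta_i = a + b*x_i, mu_i = exp(eta_i)
a_true, b_true = 3.0, 2.0
x = np.linspace(0.0, 1.0, 100)
y = rng.poisson(np.exp(a_true + b_true * x)).astype(float)

res = minimize(lambda t: deviance(y, np.exp(t[0] + t[1] * x)),
               x0=(3.0, 2.0), method="Nelder-Mead")
a_hat, b_hat = res.x
```

The exponential mean guarantees μi > 0 for any parameter values, which is the practical advantage of the log-link noted above.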
A GLM model with the canonical logarithmic link corresponds to a Pareto or power-law model with Poisson mean μi = C xi,r^b, with C = e^a, when the independent variable is on an exponential scale, xi,r = e^xi. It is necessary to highlight that, for any physically based model y = f(x), the usual linear model and the GLM linear model with a canonical log-link are not equivalent. Given that the linear model with log-link is equivalent to a power-law model with exponentially spaced independent variables xi,r, the GLM log-link linear model provides additional evidence of the independence of the results from the model parameterization.
5.3. Comprehensive Tests of the Y Statistic
Comprehensive Monte Carlo tests for the mean and variance of the distribution of Y were performed as follows. A selected number of representative values for the N, μ, and f parameters were chosen, and 500 Monte Carlo realizations of a constant model were used to generate random data sets that were then fitted to the same constant model, and also to a two-parameter linear model, the same as for the simulations presented in Section 5.1.
The mean and the variance of the resulting Y statistic for the fit to the linear model are shown in Figure 2, along with the predictions from Equations (17) and (19) that are reported as dashed lines. For all parameters, the Monte Carlo estimates of μC and σC agree typically with the model predictions to within ±10% for fi ≤ 0.1, and often within just a few percent. This degree of agreement indicates that the Y random variable is sufficiently accurately described by Equation (14), in a wide range of model parameters that are typical for data analysis applications. Identical results were found for the Yr statistic for the fit to the reduced model.
Figure 2. Results of the Monte Carlo simulations of the Y variable for a data set with a constant parent model. The parent values of the Poisson mean are coded according to color, and the symbols (also color-coded) are used to distinguish the values f = 0.01, 0.02, 0.05, 0.10, and 0.20 for the fractional systematic error. Dashed lines are the theoretical curves with the approximations of Equations (17) and (19). The f = 0.20 curves with color-coded right-triangle symbols are used to illustrate the departure from the approximations when the systematic errors are too large, and are not used in applications. Data sets with N = 10, 50, 100, 500, and 1000 independent data points were simulated.
The range of parameters for the simulation was limited by design to fi ≤ 0.1, plus the case of fi = 0.2 that is used to illustrate the errors in the estimates for larger values of systematic errors. Table 1 reports the average fractional deviations ημ and ησ, and their standard deviations, according to the results shown in Figure 2. A 20% systematic error is outside the domain of the approximations presented in this paper, and therefore the deviations seen in Figure 2 and Table 1 for fi = 0.2 are simply a result of the inaccuracy of the Taylor series approximation; see Appendix Equation (B.1). In fact, large values of this parameter are not of practical use in data analysis applications, as already remarked in Section 5.1. Also, the low-mean regime μ ≤ 10 is not tested by these simulations.
Table 1. Fractional Differences ημ and ησ between the Monte Carlo Simulations of μC and σC (Figure 2) and Their Predictions According to Equations (17) and (19)
f | ημ | ησ |
---|---|---|
Linear model | ||
0.01 | −0.012 ± 0.055 | −0.055 ± 0.055 |
0.02 | −0.004 ± 0.058 | −0.030 ± 0.046 |
0.05 | 0.004 ± 0.023 | −0.003 ± 0.037 |
0.10 | 0.027 ± 0.012 | 0.002 ± 0.053 |
0.20 | 0.081 ± 0.029 | 0.184 ± 0.095 |
GLM with log-link | ||
0.01 | 0.022 ± 0.160 | −0.013 ± 0.047 |
0.02 | 0.021 ± 0.083 | −0.024 ± 0.057 |
0.05 | 0.010 ± 0.041 | 0.000 ± 0.044 |
0.10 | 0.010 ± 0.030 | 0.002 ± 0.049 |
0.20 | 0.068 ± 0.015 | 0.148 ± 0.082 |
In addition, another set of Monte Carlo simulations similar to those illustrated in Figure 2 was performed using the linear GLM with canonical link introduced in Section 5.2. In this case, we used a logarithmically spaced set of independent variables xi, with the upper bound chosen to avoid possible overflow issues when implementing the exponential mean. The results of those simulations are statistically indistinguishable from those with the usual linear model; see Table 1.
The agreement between the Monte Carlo simulations for systematic errors with fi ≤ 0.1 and the predictions according to the theory presented in Sections 3.3 and 4 validates this model as an accurate tool to estimate the goodness of fit for the regression of Poisson data in the presence of systematic errors.
6. Methods of Hypothesis Testing
In this Section we provide a summary of the use of this model for two classes of applications. In the first case, a known value for the intrinsic model variance σint can be used for the usual hypothesis testing of a measured value of the statistic. In the second case, when it is not possible or convenient to estimate the intrinsic variance a priori, the distribution of Z presented in this paper can be used to estimate the intrinsic variance itself, under the assumption that the null hypothesis is correct. We then provide an application to real-life astronomical data.
6.1. Hypothesis Testing with Known Systematic Errors
In many data analysis situations, it is possible to estimate a priori the level of systematic errors present in the measurements. This is the regime that was envisioned for the model of systematic errors developed in Section 3, where the σint,i parameters are assumed to be known. For example, inherent uncertainties associated with either or both the photon-counting device and the parent astrophysical process that generates the photons are common for the detection of high-energy photons from distant quasars, such as in the recent applications by D. Spence et al. (2023), F. Nicastro et al. (2018), and O. E. Kovács et al. (2019), among many others. In that class of applications, it is common to use a simple power law or other phenomenological models, such as a spline, to model the number of photons detected as a function of wavelength. The parent model is likely to be more complex and often unknown, giving rise to the need for a source of systematic error in the Poisson regression, as opposed to the employment of an alternative model.
For example, in D. Spence et al. (2023), we estimated a level of systematic errors in the calibration of the so-called effective area of the CCD detector at the few percent level for the main instrument used for those data. Given that the effective area of the instrument is linearly related to the measured counts, this uncertainty could be used as an a priori value of the fi parameter, say fi ≃ 0.02, to test whether the resulting fit statistic is consistent with that value of the intrinsic model variance.
In this class of applications, assuming extensive data, hypothesis testing consists of two steps:
- 1. First, the bias parameter μC and the overdispersion parameter σC² are calculated according to Equations (17) and (19), using the a priori values of the fi parameters.
- 2. Then, the relevant parent distribution for Z, as discussed in Section 4.4, is used to perform the usual one-sided null-hypothesis test with the measured value of the statistic, for a given p-value.
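The two steps above can be sketched as follows (our own code; we assume the leading-order expressions μC ≈ f²·Σμ̂i and σC² ≈ 4f²·Σμ̂i for the bias and overdispersion parameters, which are consistent with the numerical examples of Section 6.2):

```python
import math

def systematics_pvalue(z, nu, S, f):
    """One-sided p-value of a measured statistic z, using the Normal
    approximation Z ~ N(nu + mu_C, 2*nu + sigma_C2), with the assumed
    leading-order mu_C = f**2 * S and sigma_C2 = 4 * f**2 * S,
    where S is the sum of the best-fit means."""
    mean = nu + f**2 * S
    sd = math.sqrt(2.0 * nu + 4.0 * f**2 * S)
    return 0.5 * math.erfc((z - mean) / (sd * math.sqrt(2.0)))

# Numbers of the Section 6.2 example: N = 100 points, means ~100, measured z = 125
nu, S, z = 98, 10_000.0, 125.0
f_hat = math.sqrt((z - nu) / S)           # point estimate of f, cf. Equation (25)
p = systematics_pvalue(z, nu, S, f_hat)   # 0.5 by construction at the point estimate
```

Testing a different assumed value of f in place of f_hat yields the corresponding null-hypothesis probability, as in the case study of Section 6.3.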
6.2. Estimate of the Intrinsic Model Variance from Data
In certain experimental cases, it is not possible or interesting to provide an accurate estimate of the level of systematic errors present in the problem a priori. This situation arises when the data regression yields an unacceptably high value of the fit statistic, and the data analyst (a) may not have an alternative model for the regression, or (b) is interested in evaluating the level of systematic errors relative to such a model, following the belief that the data are in fact accurately described by the model at hand.
In this class of applications, the parent distribution of Z according to Equation (21), or Equation (22) in the low-mean regime, can be used to estimate the intrinsic model variance σ²int, i.e., the level of systematic errors, assuming that the regression model is the parent model. In this Section, the measured goodness-of-fit statistic will be referred to as the lower-case z, to distinguish it from the statistic Z itself. We also assume that the intrinsic level of systematic errors fi is constant among the data points, and we indicate this value as f.
A point estimate for the average level of systematic errors f can be obtained by requiring z = E(Z), i.e., that the measurement equal the expectation of the parent distribution. In general, the values of the mean and variance of X were discussed in Section 4.4, and in the large-mean regime, they are E(X) = (N − m) and Var(X) = 2(N − m). It follows that the bias parameter can be estimated as

μ̂C = z − (N − m).    (24)
Then, using Equation (17) and assuming a constant value for f, the estimate is simply obtained as

f̂ = √(μ̂C / Σi μ̂i) = √[(z − (N − m)) / Σi μ̂i].    (25)
For example, a value of z = 125 for a data set with N = 100 and a parent mean μ = 100 leads to an estimate of f ≃ 0.05 according to Equation (25), similar to the one assumed for the Monte Carlo simulations with fi = 0.05.
Confidence intervals for f can be obtained by requiring that the measurement corresponds to specified quantiles of the parent distribution for Z. For example, for a 68% confidence interval, the condition

z = (N − m) + μC(f) ± √(2(N − m) + σC²(f))

can be used to obtain the lower and upper limits of the confidence interval for f, upon which the bias and overdispersion parameters in Equation (26) depend. The equation is quadratic in f, and it can be further simplified if the overdispersion parameter is held fixed at the point estimate of Equation (19). With this approximation, confidence intervals on the f parameter are estimated via

f = √[(z − (N − m) ± α √(2(N − m) + σ̂C²)) / Σi μ̂i],    (27)

i.e., the lower and upper bounds of the confidence interval are obtained from Equation (27), with α = 1 for a 68% confidence level. Confidence intervals at a different level of probability p (i.e., p = 0.9 or 0.99) can be obtained by multiplying the last term in the numerator of Equation (27) by the proper factor α corresponding to the (1 + p)/2 quantile of an N(0, 1) distribution (e.g., α = 1.65 and 2.58 for p = 0.9 and 0.99, respectively). Continuing with the same example as above, the 68% confidence interval according to Equation (27) is given by f = (0.031, 0.067), which brackets the point estimate f ≃ 0.05. It is clear that Equation (27) may yield a negative number under the square root for the lower limit; that is an indication that no systematic errors are required in that case (i.e., a lower confidence bound of zero).
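The confidence interval of this example can be reproduced with the following sketch (our own code; the overdispersion is held at the assumed leading-order value 4f²·Σμ̂i evaluated at the point estimate):

```python
import math

def f_confidence_interval(z, nu, S, alpha=1.0):
    """Confidence bounds on a constant systematic-error level f (alpha = 1
    for a 68% interval), with S the sum of the best-fit means and the
    overdispersion held fixed at the point estimate."""
    f2 = (z - nu) / S                           # point estimate of f**2
    sd = math.sqrt(2.0 * nu + 4.0 * f2 * S)     # sd of Z at the point estimate
    lo2 = (z - nu - alpha * sd) / S
    hi2 = (z - nu + alpha * sd) / S
    f_lo = math.sqrt(lo2) if lo2 > 0 else 0.0   # negative root -> no systematics needed
    return f_lo, math.sqrt(hi2)

f_lo, f_hi = f_confidence_interval(z=125.0, nu=98, S=10_000.0)
```

For the example quoted above (z = 125, N = 100, μ = 100), this sketch reproduces the 68% interval f = (0.031, 0.067).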
In principle, it may be possible to estimate the unknown σint,i parameters following a direct maximum-likelihood method, which depends on the choice of the compounding distribution for the random variable Mi (see Section 4.5). For example, a compounding gamma distribution results in a negative binomial distribution for the data, as discussed in Section 3.3, and one could proceed with a maximum-likelihood method to estimate the overdispersion parameter of the gamma distribution (e.g., following the methods discussed in R. A. Fisher 1941; W. W. Piegorsch 1990; J.-P. Wang 2010). Other methods are also available to estimate the compounding distribution of a Poisson variable (e.g., H. G. Tucker 1963; L. Simar 1976). The approximate method to estimate the overdispersion parameter presented in this Section, on the other hand, has the advantage of being distribution-independent, provided the conditions in Equation (8) on the choice of Mi are met. This method can therefore be used immediately for virtually any application, without the additional complications associated with the use of a direct maximum-likelihood method for this task.
6.3. A Case Study with Astrophysical Data
The methods discussed in this Section are further illustrated with the same data presented in D. Spence et al. (2023) and M. Bonamente (2023a) for the spectra of the quasar 1ES 1553+113. The spectrum used in this case study corresponds to the data illustrated in Figure 4 and Table 2 of M. Bonamente (2023a), of which a narrow range is illustrated in Figure 3.
Figure 3. Zoom-in on a narrow wavelength range of the XMM–Newton spectra of the source 1ES 1553+113 analyzed by D. Spence et al. (2023). The RGS1 data were shifted upward by 20% for clarity. Vertical error bars are the square root of the number of counts, for illustrative purposes only.
The data are representative of a maximum-likelihood regression with Poisson data, in the extensive and large-count regime. Several data points have significantly lower detected counts (i.e., below 25 Å for RGS2), due to a decrease in the instrument's sensitivity, or gain. Such detector inefficiencies are calibrated and corrected in the mean (e.g., M. Tsujimoto et al. 2011), but residual uncertainties are presumed to contribute to the larger-than-expected variance, following the model of systematics presented in Section 3.1. Figure 3 illustrates how the complex parametric model generally follows the data well, without systematic deviations that could be interpreted as a failure of the model's mean to describe the data. Small-scale fluctuations of the model (of order ≤1%) are due to changes in the sensitivity of the instrument as a function of wavelength, while large-scale trends are a combination of wavelength-dependent sensitivity of the instrument and the genuine spectral distribution of the source.
The data have the following parameters: N = 1526 independent data points and a model with m = 48 free parameters, for a number of degrees of freedom ν = N − m = 1478, and a fit statistic z = 1862.7. The number of detected counts per bin varied across the wavelength range, with ranges 114–1337 for RGS1 and 133–1226 for RGS2, and with an average of 728.6 counts per bin for RGS1 and 756.2 for RGS2; therefore, these data are in the large-mean regime. The one-sided p-value of the measured statistic according to the parent distribution χ2(ν) is negligibly small, indicating that the null hypothesis of the Poisson data being drawn from the parent model is formally unacceptable. Following the present model for systematic errors in the Poisson regression to address the larger-than-expected fit statistic, Equations (25) and (27) yield an estimate of f ≃ 0.018, meaning that the larger-than-expected statistic is consistent with a systematic error of approximately 2%, which is in fact consistent with known levels of detector systematics (D. Spence et al. 2023).
An alternative analysis of these data consists of the assumption of a given value for the level of systematic errors. If one assumes the same point estimate f = 0.018 based on Equation (25), it is immediate to see that the null-hypothesis probability is 50%: this is by design, according to Equation (24). A more interesting example is to test the hypothesis of a somewhat smaller value, say f = 0.01, or 1% intrinsic variance. Then, Equations (17) and (19) yield a bias parameter μC and an overdispersion parameter σC², and the parent overdispersed χ2 distribution in Equation (21) is approximated with a Normal distribution
The measured value z = 1862.7 corresponds to a null-hypothesis probability value of p = 0.000002, indicating that this is indeed too small a value of the intrinsic variance, in accord with the estimate above of f ≃ 0.018. The same procedure can be used to test any other level of systematic errors.
7. Discussion and Conclusions
This paper has presented a simple statistical method to address the presence of systematic errors in the regression of Poisson data, even when there is no information on the physical source of such uncertainties. The method is based upon the reversal of the role played by systematic uncertainties, which are usually attributed to an overdispersion of the data-generating process. Within that context, overdispersed integer-count data may be modeled with a variety of alternative distributions, such as the negative binomial (see, e.g., J. M. Hilbe 2011, 2014). That approach, however, comes at the price of a more complex method of regression and its associated goodness-of-fit measure. Instead, retaining the relatively simpler Poisson distribution for the data and attributing the overdispersion to an intrinsic variance in the model makes it possible to retain the unbiasedness and relative simplicity of the quasi-maximum-likelihood Poisson regression (see C. Gourieroux et al. 1984b, 1984a; as discussed also in Section 3.3), and to obtain a simple generalization of the goodness-of-fit statistic.
The main result of this paper is that the goodness-of-fit statistic for the regression of Poisson data in the presence of systematic errors is the sum of two independent statistics. The first term is the usual Poisson deviance or Cash statistic X = DP, with an asymptotic Normal distribution in the extensive data regime (Y. Chen et al. 2025, in preparation). The second is a corrective term Y that is also asymptotically Normal, with simple relations given in Equations (17) and (19) for its mean and variance. As a result, hypothesis testing with this model for systematic errors is simple to implement in a variety of data analysis situations, as illustrated with the numerical simulations of Section 5 and in the case study of Section 6.3.
This new method was motivated by the need for an accurate method to include systematics in the regression of integer-count data, which are ubiquitous across the sciences. In addition to hypothesis testing, this method also permits an estimate of the intrinsic variance from the data, when no information is available on the level of systematic errors. The case study based on the astrophysical spectral data (D. Spence et al. 2023) has illustrated these methods with real-life measurements. The numerical simulations that we have performed for this paper show that the method is accurate in the Poisson large-mean limit, where we have used the chi-squared approximation for the distribution of the Poisson deviance DP statistic.
Appendix A: Reference Distributions, Compounding, and Other Formulas
A.1. The Poisson and Gamma Random Variables
The probability mass function of a Poisson random variable X ~ Poisson(μ) is

P(X = n) = μ^n e^(−μ) / n!,  n = 0, 1, 2, …,

where μ > 0 is usually referred to as the rate of the Poisson distribution. The mean and variance of X are E(X) = Var(X) = μ.
The probability distribution function (pdf) of a gamma random variable X ~ gamma(α, r) is

f(x) = α^r x^(r−1) e^(−αx) / Γ(r),  x > 0,

where Γ is the usual Gamma function and α, r are positive real numbers. The parameter α is the rate parameter (with 1/α the scale parameter), and r is the shape parameter. The mean and variance of a gamma(α, r) random variable X are E(X) = r/α and Var(X) = r/α²,
with a skewness and kurtosis of, respectively, skew(X) = 2/√r and kurt(X) = 3 + 6/r. Other properties of the Poisson and gamma distributions can be found in most textbooks on probability (e.g., R. Hogg et al. 2023; K. Siegrist 1997).
A.2. Compounding or Mixing of Distributions
Many experiments are conveniently described by random variables whose parameters are themselves variables following another distribution. This compounding of random variables is well documented in the literature, such as the Poisson distribution with a gamma-distributed rate parameter, i.e., X ~ Poisson(μ) with μ ~ gamma(α, r),
which results in a negative binomial distribution (e.g., M. Greenwood & G. Y. Yule 1920; C. I. Bliss & R. A. Fisher 1953).
Compound distributions are sometimes referred to as contagious distributions (e.g., J. Gurland 1958), based on the fact that certain models of diseases feature a (positive) correlation between the number and the probability of occurrence of certain outcomes (see, e.g., M. Greenwood & G. Y. Yule 1920). More recently, these types of distributions have been referred to as mixtures, and they are commonly used in a variety of machine learning applications, such as in the EM algorithm (T. Hastie et al. 2009; L. E. Baum et al. 1970).
General formulas for the mean and variance of a compound distribution can be given, under certain circumstances. Let X have a distribution F (e.g., Poisson), and let a parameter θ of its distribution (for the Poisson, there is only the μ parameter) be distributed according to G; then the resulting compounded distribution H is such that
E_H(X) = E_G[E_F(X | θ)] and Var_H(X) = E_G[Var_F(X | θ)] + Var_G[E_F(X | θ)],
where the subscript highlights the relevant distribution to be used (e.g., A. M. Mood et al. 1974). In the example of the Poisson distribution with a gamma-distributed rate parameter μ, which is also the mean of the Poisson distribution, this implies
E(X) = E(μ) = r/α and Var(X) = E(μ) + Var(μ) = r/α + r/α^2.
The results show that the Poisson distribution, compounded with a gamma-distributed rate parameter, leads to overdispersion relative to the pure Poisson case.
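A minimal Monte Carlo sketch of this overdispersion, with illustrative gamma parameters (α = 0.5 and r = 10 are assumptions, not values from the text): each count is drawn from a Poisson whose rate was itself drawn from a gamma, and the sample variance exceeds the sample mean by Var(μ), as the formulas above predict.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, r = 0.5, 10.0                       # gamma rate and shape (illustrative)

# compound: draw a gamma rate for each trial, then a Poisson count at that rate
mu = rng.gamma(shape=r, scale=1.0 / alpha, size=1_000_000)
y = rng.poisson(mu)

print(y.mean())   # ~ E(mu) = r/alpha = 20
print(y.var())    # ~ E(mu) + Var(mu) = 20 + 40 = 60, i.e., overdispersed
```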
A.3. Other Formulas of Interest for DP
In general, there is no guarantee that the sum of the deviations in a Poisson regression is equal to zero. When an intercept is used in a GLM with the canonical logarithmic link, i.e., log μi = θ0 + ∑j θj xij, then it is immediate to see that the sum of the ML Poisson deviations does in fact equal zero. In that case, the deviance statistic simplifies to
DP = 2 ∑ yi log(yi/μi),
and it is defined as such in certain applications (e.g., L. A. Goodman 1969). However, given the emphasis on nonlinear models that may or may not have an intercept, it is convenient to retain the usual expression of Equation (2) that applies in general.
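For the constant model, where the ML estimate is simply the sample mean, both properties can be verified in a few lines. The simulated counts below are illustrative, and the full deviance follows the standard Poisson form with the 2∑(yi − μi) term included.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.poisson(50.0, size=200)          # simulated counts, constant model

mu_hat = y.mean()                        # ML estimate for the constant model

# with an intercept, the ML Poisson deviations sum to zero ...
print(np.sum(y - mu_hat))                # ~ 0, up to floating-point rounding

# ... so the deviance reduces to the single logarithmic sum
dev_full = 2 * np.sum(y * np.log(y / mu_hat) - (y - mu_hat))
dev_simple = 2 * np.sum(y * np.log(y / mu_hat))
print(dev_full, dev_simple)              # agree to floating-point precision
```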
Appendix B: Estimates of Moments of the Y Statistics
This Appendix describes the evaluation of the moments of the Y statistic, which was defined in Equation (13).
B.1. Definition
The Y statistic is defined by
where Mi is a random variable with mean and variance ; see Equation (8). In Equation (B1), is the ML estimate of the mean of the N independent Poisson data points yi. The goal is to estimate the mean E(Y) and the variance Var(Y) as a function of the fixed parameters and σint, and as a function of the data yi or its parent means μi.
The Y random variable is approximated via the usual Maclaurin series for the logarithmic term,
With a zero-mean variable of variance , and also by design, the expectation of the higher-order terms is much smaller than the second-order term. Therefore, ignoring higher-order terms provides an accurate approximation, and hereafter the Y statistic is approximated by its first three terms as written in Equation (B2).
B.2. Approximations
It is well known that, under mild conditions on the properties of the parametric model, the ML parameters are asymptotically Normally distributed,
θ̂ ∼ N(θ, A^−1),
where A is the Fisher information matrix, and its inverse the usual covariance or error matrix. A common method to estimate the parameter variances is via the Hessian estimator , which evaluates the information matrix at without taking the expectations. An equivalent -dependence applies to any function of the parameters, in particular to the means μi(θ), and the approximate delta method, usually referred to as the error propagation method in the physical sciences, leads to
Var(μ̂i) ≃ (∂μi/∂θ)T A^−1 (∂μi/∂θ),
where θ is a column vector, and T denotes its transpose (for textbook references, see, e.g., P. R. Bevington & D. K. Robinson 2003; L. Wasserman 2010). For example, for the constant model considered in the Monte Carlo simulations of Section 5, the partial derivatives in Equation (B4) are trivially equal to the unity scalar, while for the usual linear model y = a + b x with θ = (a, b), it is ∂μi/∂θ = (1, xi), where xi is the fixed coordinate for the yi measurement.
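As a sketch of this error-propagation step for the linear model y = a + b x, the quadratic form can be evaluated directly; the parameter covariance matrix below is an arbitrary illustrative stand-in for the inverse Fisher matrix, not a value derived in the text.

```python
import numpy as np

# delta-method (error-propagation) variance for mu_i = a + b * x_i,
# given an assumed parameter covariance matrix Sigma ~ A^-1
Sigma = np.array([[0.04, -0.01],
                  [-0.01, 0.01]])        # cov of (a_hat, b_hat), illustrative

def var_mu(x_i):
    g = np.array([1.0, x_i])             # gradient d mu_i / d theta = (1, x_i)
    return g @ Sigma @ g                 # g^T Sigma g

print(var_mu(0.0))   # 0.04: at x = 0 only Var(a_hat) contributes
print(var_mu(2.0))   # 0.04 - 2*2*0.01 + 4*0.01 = 0.04
```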
For extensive data, the standard deviation of the ML means in Equation (B4) therefore decreases as , and accordingly, we approximate as fixed values for the purpose of estimating the moments of Y. It is worth pointing out that the variability of is duly accounted for in the distribution of the X statistic in Equation (12), which represents the usual Poisson deviance DP without systematic errors. This assumption was tested with the Monte Carlo simulations presented in Section 5.3.
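This shrinking spread of the ML means can be illustrated for the constant model, where the ML estimate is the sample mean and its standard deviation is √(μ/N). The rate μ = 30 and the sample sizes below are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = 30.0                                # illustrative Poisson rate

# spread of the ML estimate (the sample mean) as the sample size N grows
stds = {}
for N in (100, 400, 1600):
    mu_hats = rng.poisson(mu, size=(4000, N)).mean(axis=1)
    stds[N] = mu_hats.std()
    print(N, stds[N])                    # ~ sqrt(mu/N): halves as N quadruples
```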
B.3. Evaluation of the Moments
The assumption of independence between Mi and yi, by construction, and the assumption that have negligible variance in the extensive data regime (see above) simplify considerably the evaluation of the moments of Y.
The approximate estimates for the mean of Y are therefore given by
which requires only the given first- and second-order moments of the Mi variable for terms (A) and (B), respectively. It therefore immediately follows that
which is reported as Equation (17). Notice that the expectation of (A) is null regardless of the approximation of a constant , since . The second approximate equality is exact if the model y = f(x) has an intercept, as do both the constant and the linear models (and if is constant for all data points), since the ML Poisson regression in that case satisfies ∑(μi − yi) = 0. This approximation applies to any distribution for Mi, provided the mean and variance are set according to Equation (8).
For the variance, we start by evaluating the second-order moment
where the subscript "P" indicates the Poisson distribution for the data yi, and the subscript "int" represents the distribution for the Mi variable, which is assumed independent from the data. The cross-product terms of (A) and those between (A) and (B) vanish because of independence among the N terms and the property .
The cross-product terms of (B) for i ≠ j, indicated by C, survive as the indicated double sum, which by independence evaluates to
using Equation (B6) above, with the negative correction term arising from the j = i terms that were missing in the second sum, and indicating as usual .
It therefore follows that the general expression for the approximation to the variance of Y according to the Mi distribution is
which depends on the kurtosis of the distribution for Mi. For Normal , kurt(Mi) = 3. For a gamma distribution of the type considered in Section 4.5,
where the excess kurtosis relative to the Normal distribution provides a small correction, since fi ≪ 1.
A simplified expression for Var(Y) could have been obtained from Equation (B8) with the assumption that yi are constant, which would operationalize the variance of Y as
It is therefore reasonable to use either Equation (B8) with the replacement of μi with yi, or Equation (B9), as numerical estimators of the variance of Y. In the large-mean regime that is assumed throughout, differences between the estimators are expected to be negligible.
Footnotes
- 4
See also C. Cameron & P. Trivedi (2013), where the factor of 2 was omitted.
- 8
This is true provided each term of Y has finite mean and variance according to the choice of distribution in Equation (8), which is usually the case for most distributions.
- 9
This can be immediately seen from the property of infinite divisibility of the Normal distribution, or that the convolution of two Normal distributions remains Normal.
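The convolution property in this footnote can be spot-checked numerically: the sum of two independent Normal samples has mean and variance equal to the sums of the components' means and variances, consistent with the sum remaining Normal. The means and standard deviations below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(4)
# sum of two independent Normals: means add, variances add
a = rng.normal(1.0, 2.0, size=1_000_000)
b = rng.normal(-3.0, 1.5, size=1_000_000)
s = a + b
print(s.mean())   # ~ 1.0 + (-3.0) = -2.0
print(s.var())    # ~ 2.0**2 + 1.5**2 = 6.25
```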