GammaBayes: a Bayesian pipeline for dark matter detection with CTA

We present GammaBayes, a Bayesian Python package for dark matter detection with the Cherenkov Telescope Array (CTA). GammaBayes takes as input the CTA measurements of gamma rays and a user-specified dark-matter particle model. It outputs the posterior distribution for parameters of the dark-matter model including the velocity-averaged cross section for dark-matter self interactions 〈σv〉 and the dark-matter mass mχ . It also outputs the Bayesian evidence, which can be used for model selection. We demonstrate GammaBayes using 525 hours of simulated data, corresponding to 108 observed gamma-ray events. The vast majority of this simulated data consists of noise, but 100000 events arise from the annihilation of scalar singlet dark matter with mχ = 1 TeV. We recover the dark matter mass within a 95% credible interval of mχ ∼ 0.96–1.07 TeV. Meanwhile, the velocity averaged cross section is constrained to 〈σv〉 ∼ 1.4–2.1 × 10-25 cm3 s-1 (95% credibility). This is equivalent to measuring the number of dark-matter annihilation events to be NS ∼ 1.1-0.2 +0.2 × 105. The no-signal hypothesis 〈σv〉 = 0 is ruled out with about 5σ credibility. We discuss how GammaBayes can be extended to include more sophisticated signal and background models and the computational challenges that must be addressed to facilitate these upgrades. The source code is publicly available here.


Contents 1 Introduction
For decades there has been mounting evidence that cold, non-baryonic dark matter makes up a majority of the matter of the Universe [1][2][3].Most of the evidence for dark matter is gravitational in nature, such as of anomalous galaxy rotation curves [4,5] and gravitational lensing observations such as with the Bullet Cluster [6].While these measurements, together with observations of the CMB, show that dark matter makes up 85% of the matter in the Universe, they do not tell us the dark-matter mass m χ or how it interacts with itself and other standard-model particles [7,8].If dark matter interacts with standard-model particles, it may be possible to measure these quantities [9,10].In particular, the self-annihilation of dark matter can produce high-energy photons, from regions of high dark-matter density such as the Galactic Center.High-energy gamma-ray telescopes such as the Cherenkov Telescope Array (CTA) [11] therefore provide a powerful indirect probe of dark matter particles with masses in the m χ ≈ 10 GeV − 10 TeV range [12,13].
The CTA is an imaging atmospheric Cherenkov telescope array [11].It works by capturing images of the flashes of Cherenkov radiation emitted from extensive air showers in the atmosphere, initiated by relativistic cosmic rays [14].The CTA, which is currently under construction, will eventually consist of two separate arrays: one in the Northern Hemisphere in La Palma, Spain near the MAGIC telescope, and another in the Southern Hemisphere in Paranal, Chile near the European Southern Observatory [15].The Southern site is wellpositioned to observe the Galactic Center, which is a promising target for indirect dark matter searches [12].The Galactic Center provides a relatively nearby and deep gravitational potential well which, as structure formation simulations indicate, contains a substantial amount of dark matter.In this high-density environment, dark matter may self-interact, annihilating to particles that eventually produce gamma rays [16].The Galactic Center, however, is also rich in astrophysical gamma-ray sources.The primary backgrounds for dark matter measurements include diffuse interstellar emission and localised gamma-ray sources from active galactic nuclei, pulsar wind nebulae, supernova remnants, and pulsars [17].Weak, as-ofyet unresolved point sources are likely to be difficult to separate from a dark matter signal [13].The challenge of disentangling a dark matter signal from astrophysical sources makes it difficult to confidently attribute gamma rays from the Galactic Center to dark matter annihilation [9,18].
Therefore, it is important to design dark matter detection pipelines that carefully model the distribution of signal and background in both energy and sky location.In this work we present a Bayesian analysis pipeline to help address the challenge of detecting a dark matter signal from the Galactic Center with the CTA.The pipeline, called GammaBayes, is open source, 1 and is also on the Python package index (PyPI).
The remainder of this paper is structured as follows.In section 2 we introduce the Bayesian formalism that underpins GammaBayes.The instrument response functions (IRFs), accessed using Gammapy, are cast as likelihood functions.The emitted energy spectra and sky position distribution expected for signal and background are cast as priors.We largely follow the formalism from ref. [19], though, with more sophisticated modelling and new notation chosen to better adhere to other CTA literature. 2 In section 3 we apply GammaBayes to simulated data and demonstrate that we can correctly recover the dark-matter mass m χ and velocity weighted annihilation cross section ⟨σv⟩ for a scalar-singlet dark-matter model.We estimate the sensitivity of the pipeline for various values of m χ given the scalar singlet model.In section 4 we discuss the limitations of the current version of GammaBayes and our plans for future improvements.

CTA Data
Raw CTA data consists of images of particle showers measured by an array of optical telescopes.Each candidate gamma-ray detection is referred to as an event.The raw data for each event is pre-processed in order to obtain estimators for the reconstructed sky position Ω r and energy E r .These estimators are in contrast to the true sky position and the true energy (Ω, E) (no subscript r).Our starting point is this processed data.We denote the data for event i as d i = (Ω i r , E i r ).The sky position includes two numbers: the Galactic latitude b and the Galactic longitude l-the Galactic Center is situated at (l, b) = (0, 0).For the sake of simplicity, our simulated data is generated assuming that the CTA is pointed directly at the Galactic Center so that the field-of-view (FOV) coordinates [13,20] are identical to the Galactic coordinates.

Likelihood
The likelihood function is a model of the measurement, quantifying the probability of observing d i = (Ω i r , E i r ) conditioned on the true values (Ω i , E i ).The likelihood factorizes into Within the gamma-ray astronomy field these two components are referred to as the instrument response functions.The likelihood of the reconstructed energy E r , referred to as the energy dispersion, is the probability density of the reconstructed energy given some true energy E and true sky position Ω.The likelihood of the reconstructed sky position Ω r , referred to as the point spread function, is the probability density of the reconstructed sky position, given some true sky position Ω and some true energy E. Example plots of the latest energy dispersion and point spread functions for the CTA are shown in the left and right panels of figure 1, respectively.We use the prod5, version-0.1 instrumental response functions available on Zenodo [21]. 3

Signal prior
The particle and astrophysics assumptions on dark matter are captured by a prior distribution of dark-matter gamma-ray annihilation products in energy and sky position: It is a conditional prior, the shape of which depends on the dark-matter parameters, θ, such as mass m χ , which in this context is a hyper-parameter.In principle, this prior can depend on additional hyper-parameters as well such as particle coupling constants.Additionally parameters describing the dark matter mass distribution about the Galactic Center could be included.However, to rigorously perform this analysis would lie outside the scope of this paper.For the sake of simplicity, however, our prior from now on will only depend on m χ .The S denotes that this prior describes a specific signal model.The sky position distribution is expected to be strongly peaked toward the Galactic Center; see figure 3. The dark matter gamma ray annihilation spectrum then depends on the dark matter model presumed.For demonstration, we use the scalar singlet dark matter model, though we explain below how to adapt our framework to an arbitrary dark matter model.The scalar singlet model is one of the simplest particle dark matter models.It adds a single massive scalar field to the standard model denoted S. 4 The Z 2 invariant terms of the Lagrangian for this model are given by [22].
These terms describe the bare S mass, the Higgs-portal coupling, the S quartic self-coupling, and the S kinetic term respectively 6 .Assuming S does not obtain a vacuum expectation value, the model contains only three free parameters: µ 2 S , λ hS and λ S .The tree-level mass of the singlet particle is where v 0 = 246 GeV is the vacuum expectation value of the Higgs field.For indirect search methods we focus on the mass and Higgs-portal coupling as they predominately determine the properties of the gamma-ray spectrum produced.The scalar singlet dark-matter annihilation gamma-ray spectrum is characterized by a resonant bump followed by a hard cut-off at E = m χ ; see figure 2b, which shows the spectrum for a scalar singlet model.In our fits, we set λ hS to 0.1 and vary the mass, as variation of λ hS changes the overall shape of the spectra negligibly and essentially contributes a normalisation constant that is removed when the spectra is normalised to be used as a probability density.Using the code DarkSUSY [23,24], we generate annihilation fractions of S into various Standard Model final states for a range of masses; see figure 2. It is at this stage of the framework that one can introduce a different dark matter model by plugging in a model's annihilation ratios 7 , to construct the dark matter spectra.Using algorithms from [25,26], we combine the single-channel spectra in order to produce energy spectra for the scalar singlet dark matter.An example spectrum for the scalar singlet model, created for a scalar mass m χ = 1 TeV, is shown in figure 2. The prior for gamma-ray energy from dark matter annihilation is proportional to this spectrum.
The second component to the dark matter prior is its sky position distribution.We use the differential J-factor based on the Einasto profile [27] defined as Here l.o.s refers to the "line of sight" between the observer and the dark matter annihilation, the axis of which is ℓ(Ω).Meanwhile, ρ χ (r) is the dark matter density at position r relative Annihilation Ratio There is a small resonance for dark matter masses higher than the W boson, before a cut-off at E = m χ .
to the Galactic Center.We take the distance to the Galactic Center to be 8.5 kpc and the local dark matter density at the Sun to be 0.39 GeV/cm 3 [28]; see figure 3. We use the CTA science tools package Gammapy [29,30] to calculate these values.The Einasto profile is an example of a cuspy profile which has a large increase in the mass density near the Galactic Center as opposed to a cored profile such as the Burkert profile [31], which exhibits a near-flat density distribution close to the Galactic Center.

Background Prior
The background prior is denoted: It describes the distribution of all gamma-ray-like events, which are not due to dark-matter annihilation.In principle, the shape of the background prior can depend on hyper-parameters like the signal prior does.For example, one can imagine introducing parameters describing the shape of the astrophysical gamma-ray flux from the Galactic Center.However, for illustrative purposes, we assume the background is known precisely, and so there are no background hyper-parameters.
Our background model consists of three components8 : diffuse flux from astrophysical gamma-ray sources clustered about the Galactic plane, astrophysical point sources, and misidentified cosmic rays.For the diffuse component, the sky position distribution is based on the Fermi-LAT Galactic diffuse model used for the fourth Fermi Large Area Telescope source catalog analysis. 9The energy distribution for the diffuse model is then the same as  the power-law fit in ref. [32], based on observations within the pacman region around the Galactic Center.Graphical representations of the background model are shown in figure 4 and figure 5.
According to the HESS catalogue there are seven sources within 5 • of the Galactic Center; we include these in our background model. 10There are likely many unidentified point sources near the Galactic Center due to its complex source morphology.The source J1747-281 is associated with a pulsar wind nebula that primarily emits gamma rays through an inverse Compton scattering process initiated by electrons accelerated by a termination shock front [17,33,34].We do not mask these sources, but opt instead to model them.(Gamma rays pointing toward these point sources do not provide much support for darkmatter annihilation because they can be explained with the background model.)The energyintegrated spectrum for this component as seen by the CTA is shown in figure 4.
The final component of our background model is the prod5 version of misidentified cosmic-ray background as shown in figure 6.This component represents the background of charged cosmic rays that pass the CTAs selection cuts.This background is described in the frame of the telescope; the background varies according to the offset from the center of the pointing direction.

Effective Area
The signal and background models defined in the previous two subsections describe the distribution of gamma rays emitted by various sources in and around the Galactic Center.However, the CTA observes a different distribution due to its effective area A eff , which depends on both gamma-ray energy and also on the offset from the center of the field of   Flux [sr Background rate [1 / (MeV s sr)] Figure 6: The prod5 CTA cosmic-ray misidentification rate-an ingredient in the background prior.For large field-of-view offsets the rate significantly decreases because the atmosphere is effectively thicker for larger offset angles and so events are less likely to be detected in general.view; see figure 7. Due to this dependence, the CTA is more sensitive to high-energy gamma rays than it is to low-energy gamma rays.The observed distribution is related to the emitted distribution like so:11 π(Ω, E) observed ∝ A(E, Ω) π(Ω, E) emitted . (2.7) In the discussion that follows, all of the priors we refer to are priors for observed gamma rays, which take into account the effective area of the CTA South site for a zenith angles between 0 and 20 degrees.

Combined likelihood
We now have the ingredients to construct a likelihood function for the entire data set given the dark-matter parameters.First, we consider a single event with data d i .Since we do not know whether any given event is due to signal or background, the likelihood of d i can be expressed as a mixture model of signal S and background B: Here, ξ represents the probability that the event is drawn from the signal distribution.Meanwhile, are marginal likelihoods for the signal and background models.
Since the events are uncorrelated, the combined likelihood for all data ⃗ d = {d i } is the product of N single-event likelihoods: .11)In this context, one can interpret ξ as the fraction of events that are drawn from the signal distribution N S : (2.12) The ξ parameter is sometimes referred to as the "mixing fraction" or a "duty cycle".In the next subsection we describe how ξ relates to dark-matter parameters.
We use the combined likelihood to calculate the posterior probability distribution for parameters ξ, m χ given the data: Here, π(m χ ) is the prior on the dark-matter mass, which we take to be log-uniform on the interval (100 GeV, 100 TeV).Meanwhile, π(ξ) is the prior on mixing fraction, which we take to be uniform on the interval (0, 1).The Bayesian evidence Z( ⃗ d) is the fully marginalised likelihood.It appears in eq.2.13 as a normalization factor, but we discuss below how it can be used for model selection; see eq. 4.1.

Dark matter parameters
We relate ξ to the dark matter parameters through the relation, Here, T obs is the observation time and dΦ/dΩdE is the differential flux of gamma rays produced by dark matter annihilation per unit energy and solid angle given by, Here, S χ is a symmetry factor representing whether the dark matter particle is its own antiparticle (S χ = 1) or not (S χ = 2).The factor dJ/dΩ is as defined in eq.2.5.The summation is over all the possible dark matter annihilation final states f including gauge bosons, quarks or leptons.The annihilation fraction B f is the ratio of the annihilation cross section for final state f to the total annihilation cross section, Here σ tot refers to the "total" annihilation cross-section defined as the sum of all the individual final state cross-sections and the distribution dN f /dE represents the differential energy flux of gamma rays emitted from these final states.Finally, ⟨σv⟩ represents the thermally-averaged, velocity-weighted total annihilation cross section.We then assume the low-velocity limit which, for the scalar singlet model, allows us to approximate ⟨σ f v⟩ ≈ σ f v, so the equation becomes, dJ dΩ The essence of this subsection is that the mixing fraction ξ and mass m χ are mappable to ⟨σv⟩, and so the posterior for ξ and m χ can be converted into a posterior on ⟨σv⟩.

Computer code
We implement the analysis described above within a pip-installable Python package, called GammaBayes.It is built in such a way that it is easy-to-use and modular so that a user can change the various models and assumptions discussed above.For example one can change the dark matter model by changing the input branching fractions (differential cross-sections).
Or, if the CTA point spread function model is updated, one can swap in a new IRF.Core functions use the Gammapy Python package such as the calculation of the differential Jfactors for the Einasto dark matter mass distribution.GammaBayes, is open source, available at https://github.com/lpin0002/GammaBayes.

Demonstration
In this section we apply GammaBayes to simulated data in order to demonstrate the detection of a signal arising from the self-annihilation of scalar singlet dark matter.We simulate 525 hours of data, corresponding to approximately 10 8 gamma-ray events assuming that the backgrounds are stable, or that transient sources are excluded.More realistic analyses would require specific IRFs provided for each individual observation run.This cannot be realistically done until actual data has been collected.However, if relevant IRFs are chosen, it should not cause any significant changes to our results.The vast majority of these simulated events are drawn from the background distribution.However, there are 10 5 events drawn from the signal distribution (corresponding to a mixing fraction ξ = 10 −3 ) with dark-matter mass m χ = 10 TeV.
For each event, we simulate the reconstruction of the gamma-ray energy on the interval (100 GeV, 300 TeV) approximately corresponding to the expected observable energy range of the CTA [15].We simulate the reconstruction of the sky position in box centered on the Galactic Center and spanning 7 • along the axis of Galactic longitude and 6 • along the axis of Galactic latitude. 12Applying GammaBayes to this simulated data yields a joint posterior for mixing fraction ξ and log 10 m χ , which is shown in figure 8.The true value of each parameter is indicated using orange lines.The white curves indicate the one-, two-, and three-sigma credibility intervals.The dark matter mass is constrained to 7.9 +3.2 −2.1 TeV while the mixing fraction is constrained to 1.0 +0.2 −0.2 × 10 −3 (both 95% credibility).The mixing fraction excludes ξ = 0 with five-sigma credibility (ξ = 0 falls outside the highest probability density interval that includes all but 3 × 10 −5 of the probability).Figure 8 is therefore an illustration of a 5σ dark-matter detection.This posterior can also be converted into a posterior on ⟨σv⟩ as discussed in section 2.7, this result is shown in figure 9.
In the event that the posterior for mixing fraction is consistent with ξ = 0, then the data provides no evidence for dark-matter annihilation, and so we set an upper limit on ξ.In figure 10, we plot in blue the average 95% credibility upper limit on ⟨σv⟩ as a function of the dark-matter mass.Here, we also use 525 hrs of the CTAs central Galactic Center survey with no dark matter events.For comparison, we show in orange the expected sensitivity for the W W channel estimated in ref. [13].We caution the reader: this is an apples-to-oranges comparison, useful only for order-of-magnitude checks and qualitative features.The two analyses employ different background models, different instrument response functions, and different annihilation final states.Additionally, the blue curve is a Bayesian credible interval while the orange is a frequentist confidence interval.With that important caveat in mind, the two curves are similarly shaped with similar minimum values of ⟨σv⟩. 13The horizontal grey dashed line represents the ⟨σv⟩ value inferred from cosmological calculations for the current dark matter (relic) density.Thus, given our signal and background models, we forecast that the CTA is capable of seeing at least weak ≳ 2σ evidence of dark matter annihilation in the range of (400 GeV, 5 TeV).

Discussion and conclusions
In our demonstration we assume that our signal and background models are adequately specified.In reality, however, these models are subject to systematic uncertainty.For example, the dark matter density profile is uncertain [35].While we employ the cuspy Einasto profile, the actual distribution of dark matter may be better described by the Burkert profile, which is nearly flat for small radii [36].A misspecified model for the distribution of dark matter  −0.2 × 10 −3 (95% credibility).The null hypothesis that no dark-matter signal is present in the data ξ = 0 is excluded with five sigma credibility.could yield unreliable inferences on ξ-Bayesian inferences are as reliable as the assumptions that underpin them.The solution is to build flexible models that incorporate theoretical uncertainty, e.g., with hyper-parameters that allow for different plausible profile shapes.By marginalising over these hyper-parameters, it is possible to include systematic uncertainty in the posterior for m χ , ξ. Future GammaBayes development will include the development of such flexible models.
Another priority is to improve the computational efficiency of the analysis in order to reduce the calculation run time.Currently, our analysis is carried out on a grid of m χ , ξ.However, this is inefficient because computer cycles are wasted exploring low-likelihood regions.By employing a stochastic sampler, we aim to significantly improve efficiency.
Another area for future development is the use of GammaBayes for model selection.In addition to producing a posterior distribution, GammaBayes produces the Bayesian evidence   When the Bayes factor deviates significantly from unity, the data may be said to prefer one model over another.A common rule of thumb is that ln (|BF|) > 8, ( corresponds to a statistically significant preference [37,38].The Bayes factor compares not just the goodness of fit for each model, but also includes Occam factors, which penalize overly flexible models [39].The Occam factor is a mathematical formulation of the maxim that the simplest explanation is most likely correct, all else being equal.Using Bayesian model selection, it should be possible to test, for example, whether the data are better explained by the scalar singlet model above, a gauge singlet scalar dark matter such as in [40] or scalar boson-fermion two particle minimal extension such as [41].The latter two models can be thought of as generalizations of the scalar singlet model we use for the demonstration.A more general model (that includes other sub-models as special Figure 10: The sensitivity of the CTA to dark-matter annihilation from the annihilation of scalar single particles.In blue we plot the expected 95% credibility upper limits as a function of m χ .In orange, meanwhile, is the expected frequentist upper limits from [13].The two curves should be compared qualitatively since the two analyses employ different signal and background models; see the main text for details.The horizontal grey dashed line represents the ⟨σv⟩ value inferred from cosmological calculations for the current dark matter relic density.cases) will always produce a better fit (a higher likelihood) than its sub-models.However, this improved fit must compete with the Occam penalty incurred from the larger parameter space of the general model.

2 ]Figure 1 :
Figure 1: The two components of the CTA likelihood.(a) The likelihood for reconstructed energy; different colours correspond to different examples of true energy values.The true sky location is (0,0).(b) The likelihood for reconstructed sky position given a true energy value of 1 TeV and true sky position of (0, 0).

Figure 2 :
Figure 2: (a) The non-negligible dark matter annihilation fractions versus dark-matter mass for the scalar singlet model; we fix λ hS = 0.1.When m χ ≳ 1 TeV, the final products are consistently the W , Z and Higgs bosons.(b) Example gamma-ray energy spectra for the scalar singlet model with the single channel contributions from the W , Z and Higgs boson, which are the main final states for high masses.There is a small resonance for dark matter masses higher than the W boson, before a cut-off at E = m χ .

Figure 3 :
Figure 3: Various models for the dark matter density distribution within our galaxy.The NFW, Einasto and Moore profiles continue to increase for small radii; they are examples of cuspy profiles.On the other hand, the Isothermal and Burkert distributions flatten off, and are thus examples of cored profiles.The vertical line represents the approximate local dark matter density of 0.39 GeV/cm 3 [28].

Figure 4 :
Figure 4: Gamma-ray energy spectra contributing to the background prior.The main contribution is from misidentified cosmic rays.

Figure 5 :
Figure 5: Sky map of the diffuse Fermi-LAT background, integrated over energy.

6 Figure 7 :
Figure 7: Effective area of the CTA South site for a zenith angles between 0 and 20 degrees as a function of true energy and offset between the reconstructed sky position and true sky position.

Figure 8 :
Figure 8: A corner plot showing the posterior distribution for 525 hours of simulated CTA Galactic Center survey data corresponding to 10 8 gamma-ray events.The data contains 100000 gamma rays from m χ = 10 TeV dark-matter annihilation corresponding to a mixing fraction ξ = 10 −3 .The values of these parameters are indicated with the orange lines.The bottom-left panel shows in white the one-, two-, and three-sigma credible intervals.The log 10 of the dark matter mass m χ is constrained to 0.90 +0.11−0.15 and the mixing fraction is constrained to 1.0 +0.2 −0.2 × 10 −3 (95% credibility).The null hypothesis that no dark-matter signal is present in the data ξ = 0 is excluded with five sigma credibility.