Formulating and Critically Examining the Assumptions of Global 21 cm Signal Analyses: How to Avoid the False Troughs That Can Appear in Single-spectrum Fits

, , and

Published 2020 July 10 © 2020. The American Astronomical Society. All rights reserved.
, , Citation Keith Tauscher et al 2020 ApJ 897 132 DOI 10.3847/1538-4357/ab9a3f

Download Article PDF
DownloadArticle ePub

You need an eReader or compatible software to experience the benefits of the ePub3 file format.

0004-637X/897/2/132

Abstract

The assumptions inherent to global 21 cm signal analyses are rarely delineated. In this paper, we formulate a general list of suppositions underlying a given claimed detection of the global 21 cm signal. Then, we specify the form of these assumptions for two different analyses: (1) the one performed by the team for the Experiment to Detect the Global Epoch-of-Reionization Signature (EDGES) showing an absorption trough in brightness temperature that they modeled separately from the sky foreground and (2) a new, so-called minimum assumption analysis (MAA) that makes the most conservative assumptions possible for the signal. We show fits using the EDGES analysis on various beam-weighted foreground simulations from the EDGES latitude with no signal added. Depending on the beam used, these simulations produce large false troughs because of the invalidity of the foreground model in describing the combination of beam chromaticity and the shape of the Galactic plane in the sky, the residuals of which are captured by the ad hoc flattened Gaussian signal model. On the other hand, the MAA provides robust fits by including many spectra at different time bins and allowing any possible 21 cm spectrum to be modeled exactly. We present uncertainty levels and example signal reconstructions found with the MAA for different numbers of time bins. With enough time bins, one can determine the true 21 cm signal with the MAA to <10 times the noise level.

Export citation and abstract BibTeX RIS

1. Introduction

The highly redshifted 21 cm signal generated by the hyperfine, spin-flip transition of neutral hydrogen tracks the history of the universe after recombination and before reionization (Pritchard & Loeb 2012), from the Dark Ages, before there were any compact sources, through Cosmic Dawn, when the first stars were born. This signal comes in two forms: the power spectrum, which has both angular and spectral dependence, and the sky-averaged global signal, which is a single spectrum. While the  former is being focused on by experiments like the Hydrogen Epoch of Reionization Array (HERA; DeBoer et al. 2017), the latter, which is the focus of this paper, is actively being searched for by experiments such as the Large-Aperture Experiment to Detect the Dark Age (LEDA; Bernardi 2018), the Shaped Antenna measurement of the background RAdio Spectrum (SARAS; Singh et al. 2018), the Cosmic Twilight Polarimeter (CTP; Nhan et al. 2019), the Radio Experiment for the Analysis of Cosmic Hydrogen (REACH; de Lera Acedo 2019), and the Experiment to Detect the Global Epoch-of-Reionization Signature (EDGES; Monsalve et al. 2017a, 2018). In particular, in Bowman et al. (2018), the EDGES team reported a trough in the sky-averaged radio spectrum at 78 MHz using an analysis technique that we will examine in detail in this paper. Space-based mission concepts, such as the Dark Ages Polarimeter PathfindER (DAPPER; Burns et al. 2019; Burns 2020) and its precursor the Dark Ages Radio Explorer (DARE; Burns et al. 2017), are also being developed to measure the global 21 cm signal from the vicinity of the Moon, where experiments do not have to deal with systematic effects such as human-generated radio frequency interference (RFI) or attenuation and emission from the ionosphere or interactions of the antenna beam with the environment.

While all of these experiments must be designed carefully to have the sensitivity to measure the global signal, which is expected to be a few hundred millikelvin deep, this paper will focus on the data analysis techniques necessary to extract the signal from the beam-weighted galactic foreground emission, which, at the relevant frequencies, is generally at a level of a few thousand kelvin (for the magnitude of the beam-weighted foreground seen by EDGES, see Figure 1(a) of Bowman et al. 2018). The shape of the beam-weighted foreground in the absence of the global signal is not known a priori, so simple subtraction is not an option. Therefore, one must devise a model that is known (or at least reasonably assumed) to encapsulate the possible values of the beam-weighted foreground. This task would be easy if the beam would be achromatic, but such beams do not exist. Real antennas have beams that change with frequency over the wide bandwidth that these experiments need to measure. This beam chromaticity distorts the spectral shapes of the intrinsic foreground as measured by an achromatic beam (and that would be well fit by models such as the ones EDGES uses) in ways particular to the antenna being used. Therefore, to fit the beam-weighted foreground, a model that is particular to the given antenna and experiment location must be created. Such a model can be created via singular value decomposition (SVD) of a training set as it is in the pipeline laid out in Tauscher et al. (2018, 2020) and Rapetti et al. (2019). The method presented for making this model is similar in spirit to that defined in Switzer & Liu (2014), except the foreground basis is derived from an a priori training set instead of the data itself. It is also very similar to the method performed by Vedantham et al. (2014), although in that paper, SVD is performed on spectra at different time bins to define modes that span frequencies, while in this paper, different simulations of spectra defined at multiple time bins are used to define modes that span both frequencies and time bins. This difference is crucial because the constraining power of  our method comes from the fact that its foreground model encodes correlations between time bins, which is not true of the purely spectral modes arising from the methods of Vedantham et al. (2014). Our method, a new variant of which will be described in Section 4, connects with the idea from Liu et al. (2013) that angular information can effectively supplement spectral information in signal estimation. However, instead of requiring instruments with angular resolution, our method gleans spatial information from time-dependent drift-scan measurements from single-antenna instruments with no angular resolution, where the spectra weight the entire spatial sky into a single pixel.

This paper is especially relevant in light of the large volume of literature written by the community in their attempts to explain the trough presented by Bowman et al. (2018), especially its depth. Some have explored the possibility that it could represent a larger radio background than expected (Ewall-Wice et al. 2018, 2020; Feng & Holder 2018; Fialkov & Barkana 2019; Mebane et al. 2020), while others have hypothesized that Rutherford-like scattering of dark matter could cause the hydrogen gas to cool faster than adiabatic cooling would imply (Barkana 2018; Barkana et al. 2018; Berlin et al. 2018; Fialkov et al. 2018; Loeb & Muñoz 2018; Muñoz et al. 2018). While none of these ideas perfectly describe the detection (see, e.g., Creque-Sarbinowski et al. 2019, for a criticism of dark matter explanations), they illustrate the temptation to use the results of Bowman et al. (2018) to explore possible exotic physics. In addition to these physical explanations, some have explored possible unmodeled systematic effects present in the data (Hills et al. 2018; Draine & Miralda-Escudé 2018; Bradley et al. 2019; Singh & Subrahmanyan 2019; Spinelli et al. 2019; Sims & Pober 2020). In this paper, we lay out yet another possible explanation of the EDGES results—that the modeling performed could be mistaking beam chromaticity distortion of the foreground for signal.

In Section 2, we lay out the general presumptions necessary to perform any global 21 cm signal experiment. In Section 3, we generate and fit multiple EDGES-like simulations of beam-weighted foregrounds with the analysis method of Bowman et al. (2018). In Section 4, we propose a new minimum assumption analysis (MAA) that allows for any possible 21 cm global signal and demonstrate it on simulated spectra with a signal and beam-weighted foreground. In Section 5, we discuss and compare the results of the two analyses. Finally, we conclude in Section 6.

2. Enumerating Assumptions

The likelihood used when doing a foreground-only fit (like the one done to generate the residuals found in Figure 1(b) of Bowman et al. 2018) is

Equation (1)

where ${\boldsymbol{y}}$ is the spectrum written as a column vector, ${\boldsymbol{C}}$ the covariance matrix of the data's noise distribution, and ${\boldsymbol{F}}$ a matrix with the basis vectors used to fit the foreground as columns. The value of ${\boldsymbol{a}}$ that maximizes ${{ \mathcal L }}_{\mathrm{fg} \mbox{-} \mathrm{only}}({\boldsymbol{a}})$ is

Equation (2)

and the maximum-likelihood reconstruction of the foreground, ${{\boldsymbol{\gamma }}}_{\mathrm{fg}}$, is given by ${{\boldsymbol{\gamma }}}_{\mathrm{fg}}={\boldsymbol{F}}{{\boldsymbol{a}}}_{\mathrm{ML},\mathrm{fg} \mbox{-} \mathrm{only}}$, or ${{\boldsymbol{\gamma }}}_{\mathrm{fg}}={\boldsymbol{\Phi }}{\boldsymbol{y}}$, where

Equation (3)

is the matrix that projects vectors into the column space of ${\boldsymbol{F}}$ (i.e., the space of spectra formable by the linear foreground model). The residual, ${\boldsymbol{r}}$, unfitted by this procedure is given by ${\boldsymbol{r}}={\boldsymbol{y}}-{{\boldsymbol{\gamma }}}_{\mathrm{fg}}=({\boldsymbol{I}}-{\boldsymbol{\Phi }}){\boldsymbol{y}}$. The data are given by a sum of a foreground term ${{\boldsymbol{y}}}^{(\mathrm{fg})}$, a signal term ${{\boldsymbol{y}}}^{(21)}$, and random noise ${\boldsymbol{n}}$.

We can write ${{\boldsymbol{y}}}^{(\mathrm{fg})}={\boldsymbol{F}}{\boldsymbol{\alpha }}+{\boldsymbol{\delta }}$ where ${\boldsymbol{\alpha }}$ is the coefficient vector that best describes ${{\boldsymbol{y}}}^{(\mathrm{fg})}$ given the basis matrix ${\boldsymbol{F}}$ 5 , and ${\boldsymbol{\delta }}$ is the part of ${{\boldsymbol{y}}}^{(\mathrm{fg})}$ that cannot be fit by the foreground model, that is, the part of ${{\boldsymbol{y}}}^{(\mathrm{fg})}$ that is not in the column space of ${\boldsymbol{F}}$, which  satisfies ${\boldsymbol{\Phi }}{\boldsymbol{\delta }}=0$. This means that $({\boldsymbol{I}}-{\boldsymbol{\Phi }}){{\boldsymbol{y}}}^{(\mathrm{fg})}={\boldsymbol{\delta }}$ and the foreground-only residual ${\boldsymbol{r}}$ is given by

Equation (4)

Therefore, to solve for the signal plus foreground inaccuracy and noise, which we define as ${\boldsymbol{s}}={{\boldsymbol{y}}}^{(21)}+{\boldsymbol{\delta }}+{\boldsymbol{n}}$, we must find the general solution of $({\boldsymbol{I}}-{\boldsymbol{\Phi }}){\boldsymbol{s}}={\boldsymbol{r}}$, which is the sum of any particular solution, sp, and the general solution, sh, to the homogeneous equation, $({\boldsymbol{I}}-{\boldsymbol{\Phi }}){{\boldsymbol{s}}}_{h}=0$. We guess that ${\boldsymbol{r}}$ is a particular solution and verify by noting that $({\boldsymbol{I}}-{\boldsymbol{\Phi }}){\boldsymbol{r}}={({\boldsymbol{I}}-{\boldsymbol{\Phi }})}^{2}{\boldsymbol{y}}=({\boldsymbol{I}}-2{\boldsymbol{\Phi }}+{{\boldsymbol{\Phi }}}^{2}){\boldsymbol{y}}=({\boldsymbol{I}}-{\boldsymbol{\Phi }}){\boldsymbol{y}}={\boldsymbol{r}}$ because ${{\boldsymbol{\Phi }}}^{2}={\boldsymbol{\Phi }}$. The homogeneous solution is any vector in the null space of I −  Φ, which is equivalent to any vector that remains unchanged when multiplied on the left by ${\boldsymbol{\Phi }}$. Since ${\boldsymbol{\Phi }}$ is a projection matrix onto the columns of ${\boldsymbol{F}}$, any vector in the column space of ${\boldsymbol{F}}$ remains unchanged when being multiplied by ${\boldsymbol{\Phi }}$. Therefore, ${{\boldsymbol{s}}}_{h}={\boldsymbol{F}}{\boldsymbol{x}}$ for any ${\boldsymbol{x}}\in {{\mathbb{R}}}^{N}$. This means that the signal satisfies

Equation (5)

for some unknown vector ${\boldsymbol{x}}$. Equation (5) implies that, due to the filtering out of the foregrounds when fitting, the signal can have any foreground modes added to it without changing the quality of the fit. This shows that with only one spectrum and no more information, even if ${\boldsymbol{\delta }}=0$ (i.e., the foreground model is perfect), the 21 cm signal cannot be uniquely determined. Thus, an extra assumption about the form of the signal is required. The full set of assumptions needed to extract the global 21 cm signal with useful, rigorous uncertainties is as follows:

  • 1.  
    Sky-averaged radio data contain a sum of beam-weighted foreground emission and the global signal. This assumes a well-calibrated instrument.
  • 2.  
    The noise of the data follows a known or estimated distribution.6
  • 3.  
    The true beam-weighted foreground can be fit with the given foreground model, described by the basis matrix ${\boldsymbol{F}}$, to well below the noise level of the data. This is equivalent to ${{\boldsymbol{\delta }}}^{T}{{\boldsymbol{C}}}^{-1}{\boldsymbol{\delta }}\ll {N}_{c}$, where Nc is the number of channels in ${\boldsymbol{y}}$ and ${\boldsymbol{\delta }}$ is the unmodeled component of the foreground as above, given by ${\boldsymbol{\delta }}=({\boldsymbol{I}}-{\boldsymbol{\Phi }}){{\boldsymbol{y}}}^{(\mathrm{fg})}$.
  • 4.  
    The signal follows a specific form.

Assumption 4 can take many forms, some much stronger and more unjustified than others. In Bowman et al. (2018), the signal is assumed to follow the form of a flattened Gaussian profile. Note that this is an assumption for analyzing the data and cannot be justified by examining the data itself, especially in light of the Bayesian evidence-based analysis of Sims & Pober (2020), which showed that many possible assumed signal models lead to essentially the same Bayesian evidence. Crucially, the errors obtained via fitting a given model of the signal do not account for the unknown likelihood that the true signal can be fit by that model.

The specific flattened Gaussian profile in Bowman et al. (2018) merely represents the value of ${{\boldsymbol{y}}}^{(21)}$ from Equation (5) that best matches the assumed flattened Gaussian model. Different values of ${\boldsymbol{x}}$ along with different assumed signal models lead to equally valid interpretations of the data (see, e.g., Hills et al. 2018; Bradley et al. 2019; Singh & Subrahmanyan 2019; Sims & Pober 2020).

3. EDGES-like Analysis

3.1. Assumptions

3.1.1. Assumption 1: Calibration

The EDGES team has worked extensively on the calibration of their receiver (Monsalve et al. 2017a, 2017b). Assuming solar, weather, RFI, and other transient events are removed, due to the rigor of their lab measurements and calibration strategy, it is reasonable to assume that, to the millikelvin level, their data consist solely of beam-weighted foreground and global 21 cm signal.

3.1.2. Assumption 2: Noise Level

Although the covariance distribution of the noise is not known, the residuals presented in Bowman et al. (2018) seem to have a relatively flat magnitude of 20 mK when many smooth modes are removed. So, for our simulations we use ${\boldsymbol{C}}={\sigma }^{2}{\boldsymbol{I}}$ where σ = 20 mK and ${\boldsymbol{I}}$ is the identity matrix.

3.1.3. Assumption 3: Foreground Model

The EDGES analysis involves multiple linear foreground models, but a common model in the literature is a polynomial in $\mathrm{ln}(\nu /{\nu }_{0})$ multiplied by ${(\nu /{\nu }_{0})}^{-2.5}$ where ν is the observed frequency and ${\nu }_{0}$ is a reference frequency:

Equation (6)

This is equivalent to assuming that ${\boldsymbol{F}}$ is a matrix with the terms ${(\nu /{\nu }_{0})}^{-2.5}{[\mathrm{ln}(\nu /{\nu }_{0})]}^{k-1}$ as its columns and the foreground coefficient vector ${\boldsymbol{a}}$ is given by ${{\boldsymbol{a}}}^{T}=\left[{a}_{1}\ \ {a}_{2}\ \ \cdots \ \ {a}_{{N}_{t}}\right]$. This model is based on the Taylor series approximation of ${T}_{0}{(\nu /{\nu }_{0})}^{\beta }$ around β = −2.5, which should adequately describe the intrinsic synchrotron foreground in each sky direction (see also J. J. Hibbard et al. 2020, in preparation).

3.1.4. Assumption 4: Signal Model

In Bowman et al. (2018), the EDGES team uses a flattened Gaussian signal model of the form

Equation (7)

where

Equation (8)

In this section, we adopt the same model.

3.2. Simulations

To illustrate the potential problems with the EDGES analysis in Bowman et al. (2018), we have created simple simulations of the beam-weighted foreground expected to be seen from the Murchison Radio-astronomy Observatory (MRO), where the experiment is located.7 This section will lay out those simulations as well as apply the EDGES analysis to them.

The foreground brightness temperature, ${y}^{(\mathrm{fg})}$, in the simulations presented throughout Sections 3 and 4 is given by

Equation (9)

where $B(\nu ,\theta ,\phi )$ is the antenna beam, $T(\nu ,\theta ,\phi )$ is the foreground emission, and θ and ϕ are the spherical coordinate angles.

3.2.1. Antenna Beam

The antenna beam used in our simulations is a Gaussian beam with a frequency-dependent FWHM, $\alpha (\nu )$:

Equation (10)

The FWHMs we use are parabolas with fixed end points at $(50\ \mathrm{MHz},83^\circ )$ and $(100\ \mathrm{MHz},104^\circ )$ 8 and varying curvature c. This means they have the form

Equation (11)

where ${\alpha }_{l}(\nu )$ is the line connecting the two end points and ${\alpha }_{c}(\nu )$ is the parabola with unit curvature and zeros at the end-point frequencies. Here, ${\alpha }_{l}(\nu )$ and ${\alpha }_{c}(\nu )$ are given by

Equation (12a)

Equation (12b)

For illustration purposes, we use $c/(1^\circ /{\mathrm{MHz}}^{2})$ values of $8.0\times {10}^{-3}$, $1.6\times {10}^{-2}$, $2.4\times {10}^{-2}$, and $3.4\times {10}^{-2}$. The FWHM curves of the four beams used in our simulations are shown in the upper left panel of Figure 2. To ensure that results are not tainted by any fraction of the beam below the horizon, the beams are normalized such that

Equation (13)

where ${ \mathcal H }$ represents the half of the sphere above the horizon, that is, $0\leqslant \theta \leqslant \pi /2$ and $0\leqslant \phi \lt 2\pi $.

3.2.2. Foreground Map

For the simulations used in this paper, we use the 408 MHz map from Haslam et al. (1982), scaled to lower frequencies via multiplication by a power law with spectral index −2.5:

Equation (14)

The time dependence indicated in both T and ${T}_{\mathrm{Haslam}}$ is determined by the rotation of the Earth from the EDGES latitude as a function of sidereal time. The Haslam maps, rotated to match the zenith direction at 0, 3, and 6 hr LST (local sidereal time) at 26fdg7 S latitude, are shown in the top left, top right, and bottom left panels of Figure 1.

Figure 1.

Figure 1. Haslam map scaled to 80 MHz with a spectral index of −2.5 shown with the zenith from the EDGES latitude ($26\buildrel{\circ}\over{.} 7$ S) as the uppermost point at different LSTs. The top left, top right, and lower left panels show the cases at LST hours of 0, 3, and 6. The lower right panel shows the case where the map has been smoothly smeared between hours 0 and 6 with directions below the horizon (gray region) masked out, which is the foreground used in the simulations in Sections 3 and 4. In all panels, the celestial poles are marked by white dots.

Standard image High-resolution image

3.2.3. Observation Strategy

The simulations were performed by averaging together equally all beam-weighted foregrounds between LST hours 0 and 6 from the EDGES latitude. As the central bulge of the Milky Way is closest to the zenith between LST hours 17 and 18, this strategy amounts to averaging during low foreground times. This observation strategy leads to an effective foreground given by

Equation (15)

and shown at 80 MHz in the bottom right panel of Figure 1.

3.2.4. No 21 cm Signal Added

No 21 cm global signal was included in the simulations. All fits shown in Figure 2 are performed on the beam-weighted foreground alone with noise.

Figure 2.

Figure 2. Summary of EDGES-like simulations and analyses. Throughout the figure, curves with the same color correspond to the same beam FWHM curvature. Upper left: FWHM curves of the four simulated antenna beams as a function of frequency. Upper right: residuals when the foregrounds generated with the beams from LST hours 0–6 at the EDGES site are fitted with the foreground model of Equation (6) with Nt = 6. The rms value of each residual is shown in the legend (to be compared to noise level of 20 mK). Lower left: flattened Gaussian best fits for each beam case when the foreground and absorption trough models are fit simultaneously. Large troughs appear as the beam curvature grows. Lower right: residuals of the combined foreground and absorption trough fits that produced the signals in the lower left panel. The green dashed lines in the upper right and lower panels show results found when the Haslam map is replaced by a toy model of the galaxy that accounts only for the shape of the galactic plane (see Section 5.1.3 for details on the model). Note that the same noise realization was added to all five simulated spectra for comparison purposes.

Standard image High-resolution image

3.2.5. Random Noise

To each simulated measurement of the beam-weighted foreground, we add the same realization of noise at the 20 mK level (see Section 3.1.2) to control for the effect of noise without ignoring it.

3.3. Likelihood-maximization Techniques

For foreground-only fits, we maximize the likelihood through the analytical computation of the coefficient vector ${{\boldsymbol{a}}}_{\mathrm{ML},\mathrm{fg} \mbox{-} \mathrm{only}}$ as in Equation (2). Then, the maximum-likelihood foreground reconstruction is simply ${\boldsymbol{F}}{{\boldsymbol{a}}}_{\mathrm{ML},\mathrm{fg} \mbox{-} \mathrm{only}}$.

For fits with both the linear foreground model and the flattened Gaussian absorption trough model, we numerically explore the space of the trough parameters $\{A,\mu ,w,\tau \}$ to find the maximum-likelihood value.9 At the $k\mathrm{th}$ iteration of the optimization algorithm, the exploration is at $({A}^{(k)},{\mu }^{(k)},{w}^{(k)},{\tau }^{(k)})$. To evaluate the full likelihood at that point, we perform a foreground-only fit, as above, on the modified data spectrum given by ${\boldsymbol{y}}-{{\boldsymbol{ \mathcal M }}}_{21}({A}^{(k)},{\nu }^{(k)},{w}^{(k)},{\tau }^{(k)})$. This allows for exploration of only the nonlinear absorption trough parameters instead of including the foreground parameters, which would slow down the analysis. This technique is similar to that used in Rapetti et al. (2019), except instead of exploring the entire posterior distribution of the trough parameters with a Markov Chain Monte Carlo (MCMC) simulation, here we merely wish to find its maximum through gradient ascent.

3.4. EDGES-like Analysis Results

The beam-weighted foregrounds with noise from the simulations described in Section 3.2 were fit with the model given by Equation (6), which is meant to describe the sky foreground completely. The residuals from these foreground-only fits are shown in the upper right panel of Figure 2. While the beam-weighted foreground from the lowest curvature FWHM beam case is fit to the noise level, it is clear that the residuals grow as the curvature increases, with the largest curvature beam case yielding residuals two times higher than the noise level. These foreground-only residuals correspond to the ${\boldsymbol{r}}$ curves from Equation (4) with ${{\boldsymbol{y}}}^{(21)}=0$.

As in Bowman et al. (2018), we also performed fits that include both the linear foreground model and a flattened Gaussian absorption trough model. The resulting troughs are shown in the lower left panel of Figure 2. As the beam FWHM curvature increases, the size of the fit trough increases up to ∼800 mK, showing that inaccuracies in the foreground model (i.e., violations of assumption 3) can lead to false troughs appearing in fits. The residuals to these foreground and trough fits are shown in the lower right panel of Figure 2. In all four cases, these residuals are at the 20 mK noise level, showing that flattened Gaussian troughs can effectively complement the smooth foreground model from Equation (6) to fit the foreground down to the noise level.

The simulations presented here show that some frequency dependencies, such as curvature in the FWHM of an antenna beam (which there is some sign of in extended data in Figure 4(a) of Bowman et al. 2018, although only three frequencies are shown), can induce structure in the beam-weighted foreground that cannot be fitted by a sixth-order polynomial. Not accounting for this structure can lead to artificial troughs being found when a signal model is fit simultaneously with the beam-weighted foreground. It is important to note that the simulations proposed here use simple beams, fully characterized by the FWHM function $\alpha (\nu )$, whereas real antenna beams have many independent modes of variation corresponding to the geometry and electrical properties of antenna components. One complication in the specific case of the EDGES beam is the lack of azimuthal symmetry, leading to different E- and H-plane beam patterns.

4. Minimum Assumption Analysis

4.1. A More Robust Assumption 4

For a robust analysis, instead of assumption 4 as laid out for the EDGES-like analysis in Section 3.1.4, which is not properly motivated, we can utilize exclusively the fact—not even used in the previous analysis—that, by definition, the global 21 cm signal must be spatially uniform. If the data ${\boldsymbol{y}}$ is a concatenation of Ns spectra taken by the same instrument instead of a single spectrum, that is,

Equation (16)

then the spatial uniformity of the signal implies that the signal contribution to each spectrum is identical, that is, ${{\boldsymbol{y}}}_{k}^{(21)}={{\boldsymbol{y}}}^{(21)}$ for all k. On the other hand, the foregrounds of each spectrum will be different (unless the same sky is overhead in two or more of the spectra).

The analysis with the fewest possible assumptions about the signal would involve assuming nothing about the spectral behavior of the signal, that is, using a signal model that can fit all values of ${{\boldsymbol{y}}}^{(21)}$ in ${{\mathbb{R}}}^{{N}_{\nu }}$ where Nν is the number of frequencies. This can be achieved by computing a so-called "expansion matrix" ${\boldsymbol{\Psi }}$ (see Tauscher et al. 2018 for the initial definition) that encodes how the signal appears in the data; that is, the signal term in the full data, ${\boldsymbol{y}}$, is ${\boldsymbol{\Psi }}{{\boldsymbol{y}}}^{(21)}$. To encode the defining information aforementioned that all spectra have the same signal in them, we use ${\boldsymbol{\Psi }}={\left[{\boldsymbol{I}}{\boldsymbol{I}}\cdots {\boldsymbol{I}}\right]}^{T}$. We term an analysis with this form of the signal the MAA.

Note that ${\boldsymbol{\Psi }}$ can also be adapted to fit different experimental designs, such as full Stokes measurements (e.g., from CTP and DAPPER) or data for which a different amount of sky is blocked in each spectrum (as it can be for DAPPER due to the shifting position of the moon).

4.2. Assumption 3: Choosing Foreground Basis

4.2.1. Polynomial Bases

Using identical and independent basis sets for each spectrum leads to a degeneracy between the foreground and signal models, causing infinite uncertainties. To see this, suppose that each spectrum has its own polynomial basis, represented by the matrix ${{\boldsymbol{F}}}_{0}$. In this case, the full foreground basis matrix is

Equation (17)

Multiplying this basis by a coefficient vector of the form ${{\boldsymbol{x}}}^{(\mathrm{fg})}={\left[{{\boldsymbol{\zeta }}}^{T}{{\boldsymbol{\zeta }}}^{T}\cdots {{\boldsymbol{\zeta }}}^{T}\right]}^{T}$ leads to a foreground vector of the form ${\boldsymbol{F}}{{\boldsymbol{x}}}^{(\mathrm{fg})}={\boldsymbol{\Psi }}{{\boldsymbol{F}}}_{0}{\boldsymbol{\zeta }}$. Since, in this case, the foreground basis can generate vectors in the column space of ${\boldsymbol{\Psi }}$, the foreground and signal bases are degenerate and the uncertainties diverge to infinity. This is true even if the ${{\boldsymbol{F}}}_{0}$ basis can exactly fit the foregrounds in each spectrum. Note that this explanation holds no matter what the true beam-weighted foreground is because it only depends on the form of the beam-weighted foreground model.

4.2.2. General Choice of Basis

We wish to find a basis that satisfies ${{\boldsymbol{F}}}^{T}{{\boldsymbol{C}}}^{-1}{\boldsymbol{F}}={\boldsymbol{I}}$ and can accurately fit the beam-weighted foreground expected for each spectrum. We suggest generating a simulated training set of foregrounds of the form

Equation (18)

where ${{\boldsymbol{b}}}_{n}^{(m)}$ is the $n\mathrm{th}$ spectrum of the $m\mathrm{th}$ simulation and there are Nt total simulations. For a given number of basis vectors NF, the weighted SVD of ${\boldsymbol{B}}$ as in Tauscher et al. (2018) yields a basis ${\boldsymbol{F}}$ satisfying the normalization conditions.10 This basis will encode expected correlations between the different spectra in the foreground model directly, which is a crucial aspect of the analysis necessary to obtain finite errors (see also Tauscher et al. 2020) when allowing for any signal to be fit as discussed in Section 4.1.

4.3. Maximum-likelihood Calculations

With a data vector ${\boldsymbol{y}}$, a signal expansion matrix ${\boldsymbol{\Psi }}$, a noise covariance matrix ${\boldsymbol{C}}$, and a foreground basis matrix ${\boldsymbol{F}}$, the channel covariance ${{\boldsymbol{\Delta }}}^{(21)}$ of the signal and the corresponding mean ${{\boldsymbol{\gamma }}}^{(21)}$ are given by

Equation (19a)

Equation (19b)

where, under the normalization condition ${{\boldsymbol{F}}}^{T}{{\boldsymbol{C}}}^{-1}{\boldsymbol{F}}={\boldsymbol{I}}$, ${\boldsymbol{\Phi }}$ is the foreground projection matrix described in Section 2, given by ${\boldsymbol{\Phi }}\,={\boldsymbol{F}}{{\boldsymbol{F}}}^{T}{{\boldsymbol{C}}}^{-1}$. The calculations leading to Equations 19(a) and 19(b) are presented in Appendix A. Appendices B and C give the forms of ${{\boldsymbol{\Delta }}}^{(21)}$ and ${{\boldsymbol{\gamma }}}^{(21)}$ for the specific cases of Ns total power spectra, where the signal expansion matrix is ${\boldsymbol{\Psi }}\,={\left[{\boldsymbol{I}}{\boldsymbol{I}}\cdots {\boldsymbol{I}}\right]}^{T}$, and 4Ns spectra of all four Stokes parameters, where the expansion matrix is ${\boldsymbol{\Psi }}=\left[{\boldsymbol{I}}\ \ 0\ \ 0\ \ 0\ \ \cdots \ \ {\boldsymbol{I}}\ \ 0\ \ 0\ \ 0\right]$.11

4.4. Simulation Details

To test the MAA, we apply it using a foreground training set that is made identically to that used in Tauscher et al. (2020), which we briefly review here. Like the simulations from Section 3, the foreground map used in this training set was the Haslam map scaled down in frequency using a spatially constant spectral index of −2.5. The pointing of the antenna is set by the EDGES latitude, and the sources and beam below the horizon are set to zero as in Section 3. The beams used have FWHMs specified by a distribution of quadratic functions given by $\alpha (\nu )={\sum }_{k\,=\,0}^{2}{b}_{k}{L}_{k}\left(\tfrac{\nu -(80\ \mathrm{MHz})}{40\ \mathrm{MHz}}\right)$, where Lk are the $k\mathrm{th}$-order Legendre polynomials and the bk are normal with means ${\mu }_{0}=70^\circ $, ${\mu }_{1}=-20^\circ $, and ${\mu }_{2}=0^\circ $ and standard deviations ${\sigma }_{0}=10^\circ $, ${\sigma }_{1}=5^\circ $, and ${\sigma }_{2}=5^\circ $.

The day is split into 100 LST chunks,12 which are then averaged to the number of bins used for the given analysis. For example, in the case with five time bins, LST chunks 1–20, 21–40, 41–60, 61–80, and 81–100 are used to generate the five time bins. If the number of time bins used does not divide 100, then the 100 chunks are reduced to the largest multiple of the number of bins less than 100 before binning.

4.5. MAA Results

The results of applying MAA using the training set described in Section 4.4 are shown in Figures 3  and 4.  Figure 3 shows the 1σ uncertainty levels at each frequency using different numbers of total power spectra in the training set. These uncertainties are very large for small numbers of bins (indeed, they diverge when only one time bin is used) but approach the noise level for large numbers of bins. Note that different training sets will lead to different results and that confidence levels of 68%, 95%, and 99.7% only correspond to 1σ, 2σ, and 3σ when the training set adequately describes the beam-weighted foreground. If the training set is not adequate, any given percentage confidence interval will be wider than expected based on the σ level. Appendix D derives the so-called signal bias statistic, ε, from Tauscher et al. (2018) in the MAA case when the foreground training set is imperfect. This statistic connects percentage confidence to the σ level at which the signal is inside the interval in an rms sense.

Figure 3.

Figure 3. 1σ uncertainty levels (solid lines) under the MAA with varying number of time bins. The dashed line represents the noise level on the signal. We assume an integration time of 100 hr split evenly across all time bins.

Standard image High-resolution image
Figure 4.

Figure 4. Top: example signal reconstruction for the MAA using six time bins. While the case shown is a Gaussian in frequency, note that the MAA can be used in the same manner, obtaining equal errors to measure any possible signal spectrum. The black line shows the input signal. The blue line shows the channel mean of the reconstruction, given by ${{\boldsymbol{\gamma }}}^{(21)}$ in Equation 19(b), and the intervals show one and two times the square root of the diagonal elements of ${{\boldsymbol{\Delta }}}^{(21)}$ from Equation 19(a). Because the beam-weighted foreground model (derived from the training set) fits the given beam-weighted foreground case (which was taken from the training set), these intervals correspond to 68% and 95% confidence levels. Bottom: frequency correlation matrix defined in Equation (20). Clearly, the foreground filter generated by this training set induces positive correlations between all frequencies of the 21 cm signal estimate. As in Figure 3, both panels assume an integration time of 100 hr split evenly across all time bins.

Standard image High-resolution image

The top panel of Figure 4 shows an example reconstruction found when applying the MAA with six time bins to the sum of a beam-weighted foreground curve from the training set and an additional signal component that is Gaussian in frequency. The two intervals show the 1σ and 2σ confidence levels at each individual frequency, which are one and two times the square roots of the diagonal elements of ${{\boldsymbol{\Delta }}}^{(21)}$, respectively. Note, however, that even though a Gaussian signal is used in this example, as shown in the figure, the method would work equally well, providing the same errors, with any conceivable global 21 cm signal.

The bottom panel of Figure 4 shows the frequency correlation matrix, ${{\boldsymbol{\Lambda }}}^{(21)}$, which is a normalized form of the covariance matrix ${{\boldsymbol{\Delta }}}^{(21)}$. The correlation matrix is designed to contain only values between 1 and −1, which occur at perfect correlation and anticorrelation, respectively. It is defined through ${{\rm{\Lambda }}}_{{ij}}^{(21)}=\mathrm{Corr}[{\gamma }_{i},{\gamma }_{j}]=\mathrm{Cov}[{\gamma }_{i},{\gamma }_{j}]/\sqrt{\mathrm{Var}[{\gamma }_{i}]\,\mathrm{Var}[{\gamma }_{j}]}$, which can also be written as

Equation (20)

where ${{\boldsymbol{\Gamma }}}^{(21)}$ has the same diagonal elements as ${{\boldsymbol{\Delta }}}^{(21)}$ but has no nonzero off-diagonal elements. By construction, all diagonal elements of ${{\boldsymbol{\Lambda }}}^{(21)}$ are equal to 1. It is clear from Figure 4 that all of the frequency channels are positively correlated with each other in this particular case.

To completely utilize the power of the MAA, the full covariance matrix, which would include such correlations, should be used when defining a likelihood function to extend MAA results to constrain any given signal model.

5. Discussion

5.1. EDGES-like Analysis

5.1.1. Beam Chromaticity Distortions

The fits in Section 3 (Figure 2) show that false troughs can appear when fitting a single spectrum. Some may argue that troughs cannot be created by the foreground because the spectral dependence of the foreground does not allow it. However, one needs to keep in mind that what is measured is not the intrinsic spectral structure of the foreground radiation, but rather the spectral structure of the beam-weighted foreground. Since the angular structure of the beam weighting changes from frequency to frequency, the foreground component of the measured spectrum is composed of different ratios of each direction's radiation at each frequency.

This means that a linear model that fits the intrinsic foreground spectrum of each pixel, like the model from Equation (6), will not necessarily fit the beam-weighted foreground. To illustrate this, suppose that every sky direction's intrinsic foreground spectrum can be fit by a model using a basis matrix ${{\boldsymbol{F}}}_{0}$, that is, ${\boldsymbol{T}}(\theta ,\phi )={{\boldsymbol{F}}}_{0}{\boldsymbol{x}}(\theta ,\phi )$, where the $k\mathrm{th}$ element of ${\boldsymbol{T}}(\theta ,\phi )$ is the foreground spectrum at the $k\mathrm{th}$ frequency. If the beam is achromatic, then the beam-weighted foreground is given by

Equation (21a)

Equation (21b)

where $B(\theta ,\phi )$ is the beam pattern at every frequency that satisfies $\int B(\theta ,\phi )\ d{\rm{\Omega }}=1$. This means that a linear model that fits ${\boldsymbol{T}}(\theta ,\phi )$ for every angle also fits the beam-weighted foreground ${{\boldsymbol{y}}}_{\mathrm{achromatic}}^{(\mathrm{fg})}$; that is, the foreground model residual is ${{\boldsymbol{\delta }}}_{\mathrm{achromatic}}=({\boldsymbol{I}}-{\boldsymbol{\Phi }}){{\boldsymbol{y}}}_{\mathrm{achromatic}}^{(\mathrm{fg})}=0$. In the case of beam chromaticity, the beam is a diagonal matrix with the diagonal entries being the beam patterns at each frequency, satisfying $\int {\boldsymbol{B}}(\theta ,\phi )\ d{\rm{\Omega }}={\boldsymbol{I}}$. In this case, the beam-weighted foreground is

Equation (22a)

Equation (22b)

and the residual

Equation (23)

is not necessarily zero,13 leading to a possible failure of the assumption that the foreground model is adequate (Assumption 3). Therefore, while troughs are unlikely to appear in intrinsic foreground spectra, they may well appear when beam-weighted foregrounds are fit with a model of a trough and an intrinsic foreground spectrum model simultaneously.

5.1.2. EDGES Beam Chromaticity Factor

In several of their works (see, e.g., Bowman et al. 2018; Monsalve et al. 2018; Mozdzen et al. 2019), the EDGES team uses what they term the "beam chromaticity factor" (see Equation (14) of Monsalve et al. 2017a), defined at each LST through

Equation (24)

where the LST dependence comes from how the temperature map is rotated with respect to the beam and ${B}_{\mathrm{ref}}(\nu ,\theta ,\phi )$ and ${T}_{\mathrm{ref}}(\nu ,\theta ,\phi )$ are the assumed forms of the beam and foreground. The goal of this factor is to make the spectrum appear as if it was measured with an achromatic beam (specifically, the beam at $\nu ={\nu }_{\mathrm{ref}}$) so that, as discussed in Section 5.1.1, basis vectors that fit the intrinsic foreground spectra can be used to fit the beam-weighted foreground spectrum. If we denote a spectrum measured at a single LST by $y(\nu )$, then the EDGES beam calibration forms a corrected spectrum, ${y}^{{\prime} }(\nu )=y(\nu )/C(\nu )$. Neglecting noise and possible receiver errors for the sake of clarity, we can express the measured beam-weighted foreground spectrum through

Equation (25)

where $B(\nu ,\theta ,\phi )$ and $T(\nu ,\theta ,\phi )$ are the unknown forms of the true beam and foreground. With these definitions, the corrected spectrum is given by

Equation (26)

The intention is for the fraction on the second line of Equation (26) to be 1 so that $y^{\prime} (\nu )$ is simply the reference foreground weighted by the reference beam at the reference frequency. However, in order to assume that the fraction is 1, one must assume that ${B}_{\mathrm{ref}}(\nu ,\theta ,\phi )=B(\nu ,\theta ,\phi )$ and ${T}_{\mathrm{ref}}(\nu ,\theta ,\phi )=T(\nu ,\theta ,\phi )$, but this is not a reasonable assumption as foreground and beam models are likely imperfect.14 The beam chromaticity factor calibration method is therefore prone to introducing significant biases. On the contrary, our method of deriving modes describing the beam-weighted foreground from a large sample of simulations and observations allows uncertainties in both the beam and foreground to be included in the analysis. However, as will be elaborated on in Section 5.2, it is important to note that our method is contingent on the ability of the simulations and observations to accurately describe the data being analyzed. SVD can only provide eigenmodes in the space spanned by the training set curves.

5.1.3. Toy Galaxy Model

In Section 3, in addition to the Haslam 408 MHz map, we also tested a toy model of galactic emission described by

Equation (27)

This is a model that treats the Galactic plane as an 8° FWHM Gaussian in colatitude, θ, with a maximum given by 300 K at the Galactic center and 200 K at the anticenter and asymptotes to 25 K away from the plane. This map is then scaled by ${[\nu /(408\mathrm{MHz})]}^{-2.5}$. Because the false troughs remain after this change (see, e.g., the dashed green line in the lower left panel of Figure 2), we infer that the chromaticity of the quadratic beams used in Section 3 interacts with the galactic plane as seen from the EDGES latitude to generate residuals that are better fit with a flattened Gaussian plus foreground model than the foreground model alone, which could lead to analyses falsely concluding that there are troughs in the sky-averaged radio spectrum.

Since the false troughs found in Section 3 are fits to aspects of the beam-weighted foreground, we urge EDGES to perform the same experiment and analysis from the northern hemisphere where the galaxy behaves very differently in the sky than from the southern hemisphere. Because of the different orientation of the galactic plane, the distortions caused by chromatic beams in our simulations are more easily fit by foreground models like that in Equation (6) at northern latitudes than southern latitudes.

5.1.4. Exploring Parameter Distributions

The natural end point of the EDGES-like analysis is not a maximum-likelihood fit like the one shown in Figure 1 of Bowman et al. (2018), on which Figure 2 was based. Instead, one wishes to explore the parameter distribution using a sampling method like MCMC exploration. However, the resulting distribution is only meaningful if the pieces of the model (i.e., foreground and signal models) accurately describe their respective components. In fact, parameter distributions from global 21 cm signal fits to observations are generated by a complex combination of the following four factors: (1) the foreground model parameterization, (2) the bias of the foreground model, (3) the signal model parameterization, and (4) the bias of the signal model.

Ideally, factors 2 and 4 are unimportant because the foreground and signal models can fit the true foreground and signal to within the noise level. However, it is impossible to fully verify that this is the case in practice, so their effects must be considered. Section 3 is not meant to explain the exact form and parameter distribution of the trough observed by EDGES, but instead to show that generic polynomial foreground models like those used by EDGES do not account for beam chromaticity and may lead to untrustable results. Essentially, we are pointing out that factor 2 is likely affecting the EDGES analysis.

In addition to generating bias in fits, residuals from fits with inaccurate foreground models may have led EDGES to choose the unjustified flattened Gaussian signal model, possibly furthering the signal bias of factor 4 (with respect to a more conventional signal model) silently while significantly reducing overall residuals.15

Due to the complex interactions between the four factors in generating a posterior distribution, MCMC exploration of distributions is outside the scope of this paper and is left for an extended examination in future work.

5.2. Minimum Assumption Analysis

5.2.1. Advantages and Limitations

While moving the experiment to a different location could provide evidence that the reported trough is a product of an inadequate foreground model, this does not solve the underlying problem—that the foreground model does not account for beam chromaticity.

The MAA proposed in Section 4 avoids this problem by making a basis specific to the given antenna by performing SVD on a training set of simulations made with that antenna's beam (see Section 4.2.2).

In addition, the MAA allows any possible global signal, avoiding unwanted bias from generating a signal model that interacts with the foreground model bias to produce false results. In fact, under the assumption that the foreground model generated by SVD is adequate, the MAA signal mean ${{\boldsymbol{\gamma }}}^{(21)}$ and covariance ${{\boldsymbol{\Delta }}}^{(21)}$ constitute a direct measurement not of a signal model, but of the signal itself. In fact, since all degrees of freedom are retained, a physical model can be fit directly to the constraints implied by ${{\boldsymbol{\gamma }}}^{(21)}$ and ${{\boldsymbol{\Delta }}}^{(21)}$ (such as those shown in the bottom panel of Figure 3).

Note that, for simplicity, the cosmic microwave background (CMB) is not included in the training set used in Section 4, which merely uses the Haslam map scaled with a spectral index of −2.5. If it was included (as it should be when analyzing data from a real experiment),16 then the signal mean ${{\boldsymbol{\gamma }}}^{(21)}$ would include a 2.725 K flat spectrum component that would need to be subtracted out.

Another important note about the MAA is that it is predicated upon the beam-weighted foreground training set provided to it. The results may change if the training set is changed, even if the data being analyzed remain the same. The training set used in Section 4 contained a wide variety of beam FWHMs but only one intrinsic foreground map. A different training set built using many intrinsic foreground maps and one beam might produce errors that differ significantly from the ones presented in Figure 3. However, even as different training sets would lead to different levels of precision, the accuracy of the MAA should remain steady as the training set changes as long as the true beam-weighted foreground is encompassed by the SVD eigenmodes of the training sets.17 One aspect of this effect is that some training sets will require more modes to be included in the foreground model than others, forcing one to degrade precision in order to retain accuracy.

5.2.2. Extension to Motion-induced Dipole

Slosar (2017) pointed out that, much like with the CMB, Doppler shifting of the 21 cm background induces a dipole component that is related to the monopole. In particular, the dipole component, $\delta {T}_{b}^{(\mathrm{dip})}$, of the 21 cm signal is related to the monopole spectrum, $\delta {T}_{b}^{(\mathrm{mon})}$, through18

Equation (28)

where β is the magnitude of our velocity with respect to the CMB as a fraction of the speed of light, and ψ is the angle between that velocity and the line of sight. Based on CMB observations, $\beta \approx 1.2\times {10}^{-3}$. Deshpande (2018) suggested that Equation (28) should be considered an essential qualifying test of any measurement of the global signal. Because (by Equation (28)) $\delta {T}_{b}^{(\mathrm{dip})}$ is linear in $\delta {T}_{b}^{(\mathrm{mon})}$, it could conceivably be included in the MAA in order for any fit to pass this test by default, essentially extending the minimal signal assumption introduced in Section 4.1. To do so, we note that the sum of the monopole and dipole components is

Equation (29)

where $s(\nu )=\delta {T}_{b}^{(\mathrm{mon})}(\nu )$. To determine how this would appear to a single-antenna experiment, we must multiply by the beam and integrate over all angles. Defining ${t}_{k}(\nu )\,\equiv \int {B}_{k}(\nu ,\theta ,\phi )\ \delta {T}_{b}^{(\mathrm{mon}+\mathrm{dip})}(\nu ,\psi )\ d{\rm{\Omega }}$ as the signature of the 21 cm signal in the spectrum measured at the $k\mathrm{th}$ time period (or, equivalently, pointing angle), we see that

Equation (30)

where ${a}_{k}(\nu )=\int {B}_{k}(\nu ,\theta ,\phi )\ \cos \psi \ d{\rm{\Omega }}$.19 This can be written in matrix notation where frequency channels correspond to the elements of vectors. In this way, the signature of the 21 cm signal in the $k\mathrm{th}$ spectrum is

Equation (31)

where ${{\boldsymbol{A}}}_{k}$ is the diagonal matrix with nonzero elements given by ak${\boldsymbol{N}}$ is the diagonal matrix with the frequencies of the data channels on the diagonal, and ${\boldsymbol{D}}$ is a derivative matrix (see Appendix E for subtleties involved in taking derivatives with a matrix). Now, combining all Ns spectra into one vector leaves ${\boldsymbol{t}}={\boldsymbol{\Psi }}{\boldsymbol{s}}$, where

Equation (32)

To extend the MAA to include the motion-induced dipole, one must merely plug this ${\boldsymbol{\Psi }}$ into Equations 19(a) and 19(b). The effect of including the motion-induced dipole on the results of the MAA is left for future work.

5.3. Between the Two Extremes

In past work (Tauscher et al. 2018, 2020; Rapetti et al. 2019), we laid out an SVD-based pipeline for extracting the 21 cm global signal from the large beam-weighted foregrounds. That analysis is identical in its foreground basis generation to the MAA from Section 4. However, it differs in that the signal is restricted to a specific model, either a physically motivated one or one created from performing SVD on a signal training set. So, in a sense, the pipeline described in those works is between the two extremes discussed in this paper of strong assumptions (EDGES-like analysis) and robust assumptions (MAA).

6. Conclusions

In this paper, we formulated a list of general assumptions for global 21 cm signal analyses. These include assumptions about the calibration of the instrument (Assumption 1), the distribution of the noise (Assumption 2), the adequacy of the foreground model (Assumption 3), and the form of the signal (Assumption 4). We then contrasted two different specific forms of these assumptions: an EDGES-like analysis and a new, so-called MAA.

The EDGES-like analysis is performed on a single spectrum with a polynomial-based foreground model and a flattened Gaussian signal model. We presented fits of simulations using zero signal and the Haslam map (as seen from the EDGES latitude) scaled with a constant spectral index and weighted by four different Gaussian beams with quadratic FWHMs. Two of these fits resulted in large, flattened Gaussian troughs near 78 MHz, like the one reported by the EDGES team. These troughs also appeared when the Haslam map was replaced by a toy model of galactic emission that uses a 25 K background temperature (at 408 MHz) and models the galactic plane as a Gaussian in latitude with a peak of 300 K at the galactic center and 200 K at the anticenter, showing that the interaction of the beam with the Galactic plane introduces troughs.

This vulnerability to false troughs is due to the inadequacy of the foreground model, which does not account for beam chromaticity. While the only real solution to this problem is to modify the analysis technique, steps such as moving the experiment to the northern hemisphere, where the galaxy appears very differently, should at least modify (and potentially decrease) any false troughs that are found.

The MAA, on the other hand, is performed on many time-binned spectra (see also Tauscher et al. 2020) instead of one spectrum. Also, instead of assuming a specific model for the signal, it allows for any possible spectrum that is the same across each time bin. In addition, the foreground model accounts for beam chromaticity because the basis vectors are taken from applying SVD to a training set of foregrounds weighted by the beam of the specific antenna being used (Tauscher et al. 2018). Given the beam-weighted foreground training set employed, we found that, under these assumptions, any signal can be measured with uncertainties within an order of magnitude of the noise level.

While the MAA could be considered the most robust, conservative form of the assumptions specified in Section 2, if there are well-motivated theoretical models for the signal, the pipeline we laid out in Tauscher et al. (2018, 2020) and Rapetti et al. (2019) can be applied. That method uses training sets for both the beam-weighted foreground and the global signal to create models for each of them. Ultimately, experimenters can decide if a theoretical model of the signal (not just based on residuals from the data with respect to the intrinsic foreground model, as in the case of the flattened Gaussian) is rigorous enough to be explored using our full pipeline, for stronger constraints. However, given the current theoretical uncertainties, if the MAA is possible for the selected beam-weighted foreground model, we recommend starting with this analysis before selecting a physical model of the signal because it allows for any signal—even those that are unexpected—and may guide the model selection procedure.

We thank Neil Bassett and Joshua Hibbard for useful discussions during the development of this study. We also appreciate detailed discussions with Judd Bowman and Raul Monsolve about the EDGES data analysis approach. This work was supported by the National Aeronautics and Space Administration (NASA) under award number NNA16BD14C for NASA Academic Mission Services. This work is directly supported by the NASA Solar System Exploration Virtual Institute cooperative agreement 80ARC017M0006.

Appendix A: MAA General Calculation

In this appendix, we will compute the minimum assumption maximum likelihood for signal reconstruction and the uncertainties on that reconstruction when the signal expansion matrix is ${\boldsymbol{\Psi }}$ (i.e., the signal ${{\boldsymbol{y}}}^{(21)}$ appears in the data as ${\boldsymbol{\Psi }}{{\boldsymbol{y}}}^{(21)}$), the noise covariance is ${\boldsymbol{C}}$, and the foreground basis matrix is ${\boldsymbol{F}}$.

The signal assumption may be implemented by assuming ${{\boldsymbol{y}}}^{(21)}={\boldsymbol{A}}{{\boldsymbol{x}}}^{(21)}$ for an invertible20 basis matrix ${\boldsymbol{A}}$.21 The model for the full data is

Equation (A1a)

Equation (A1b)

This can also be written ${\boldsymbol{ \mathcal M }}({\boldsymbol{x}})={\boldsymbol{G}}{\boldsymbol{x}}$, where ${\boldsymbol{G}}=\left[\begin{array}{c}{\boldsymbol{F}}\ \ {\boldsymbol{\Psi }}{\boldsymbol{A}}\end{array}\right]$, ${\boldsymbol{x}}=\left[\begin{array}{c}{{\boldsymbol{x}}}^{(\mathrm{fg})}\\ {{\boldsymbol{x}}}^{(21)}\end{array}\right]$, and ${\boldsymbol{F}}$ is the foreground basis that applies to all of the spectra simultaneously, as opposed to the single-spectrum basis laid out in Section 2. The likelihood function is therefore

Equation (A2)

The maximum-likelihood values of ${\boldsymbol{x}}$, ${\boldsymbol{\xi }}$, and its covariance, ${\boldsymbol{S}}$, are given by

Equation (A3)

These are easiest to calculate when ${\boldsymbol{F}}$ and ${\boldsymbol{\Psi }}{\boldsymbol{A}}$ are normalized through ${{\boldsymbol{F}}}^{T}{{\boldsymbol{C}}}^{-1}{\boldsymbol{F}}={\boldsymbol{I}}$ and ${{\boldsymbol{A}}}^{T}{{\boldsymbol{\Psi }}}^{T}{{\boldsymbol{C}}}^{-1}{\boldsymbol{\Psi }}{\boldsymbol{A}}={\boldsymbol{I}}$. Under these conditions, the covariance of the signal parameters is

Equation (A4)

where ${\boldsymbol{\Phi }}$ is the projection matrix described in Section 2, which is given by ${\boldsymbol{\Phi }}={\boldsymbol{F}}{{\boldsymbol{F}}}^{T}{{\boldsymbol{C}}}^{-1}$ under the current normalization conditions. This means that the channel covariance of the signal distribution is ${{\boldsymbol{\Delta }}}^{(21)}={\boldsymbol{A}}{{\boldsymbol{S}}}^{(21)}{{\boldsymbol{A}}}^{T}$, which can be written as

Equation (A5)

The mean of the signal parameters is

Equation (A6)

and the channel mean of the signal distribution is ${{\boldsymbol{\gamma }}}^{(21)}={\boldsymbol{A}}{{\boldsymbol{\xi }}}^{(21)}$, or

Equation (A7)

To satisfy our normalization condition, the signal basis matrix ${\boldsymbol{A}}$ must satisfy ${{\boldsymbol{A}}}^{T}{{\boldsymbol{\Psi }}}^{T}{{\boldsymbol{C}}}^{-1}{\boldsymbol{\Psi }}{\boldsymbol{A}}={\boldsymbol{I}}$. Multiplying on the left by ${({\boldsymbol{A}}{{\boldsymbol{A}}}^{T})}^{-1}{\boldsymbol{A}}$ and on the right by ${{\boldsymbol{A}}}^{-1}$, we find that ${({\boldsymbol{A}}{{\boldsymbol{A}}}^{T})}^{-1}={{\boldsymbol{\Psi }}}^{T}{{\boldsymbol{C}}}^{-1}{\boldsymbol{\Psi }}$. Therefore, Equation (A5) becomes

Equation (A8)

This means that if any vector in the column space of ${\boldsymbol{\Psi }}$, that is, any vector of the form ${\boldsymbol{\Psi }}{\boldsymbol{z}}={\left[{{\boldsymbol{z}}}^{T}{{\boldsymbol{z}}}^{T}\cdots {{\boldsymbol{z}}}^{T}\right]}^{T}$, can be represented by the foreground vectors, then the uncertainties are infinite. Plugging Equation (A8) into (A7), we find that the signal channel mean is

Equation (A9)

Appendix B: MAA with Only Total Power Spectra

As mentioned in Section 4.1, when there are Ns spectra concatenated together in the data curve being fit, that is, ${\boldsymbol{y}}={\left[{{\boldsymbol{y}}}_{1}^{T}{{\boldsymbol{y}}}_{2}^{T}\cdots {{\boldsymbol{y}}}_{{N}_{s}}^{T}\right]}^{T}$, the signal expansion matrix ${\boldsymbol{\Psi }}$ is given by ${\boldsymbol{\Psi }}={\left[{\boldsymbol{I}}{\boldsymbol{I}}\cdots {\boldsymbol{I}}\right]}^{T}$. Assuming the different spectra have independent noise, we can write the covariance ${\boldsymbol{C}}$ in this case as

Equation (B1)

We split ${\boldsymbol{F}}$ into Ns different components through ${\boldsymbol{F}}\,={\left[{{\boldsymbol{F}}}_{1}^{T}{{\boldsymbol{F}}}_{2}^{T}\cdots {{\boldsymbol{F}}}_{{N}_{s}}^{T}\right]}^{T}$. This means that the normalization condition of the foreground basis matrix, ${{\boldsymbol{F}}}^{T}{{\boldsymbol{C}}}^{-1}{\boldsymbol{F}}={\boldsymbol{I}}$, becomes ${\sum }_{k=1}^{{N}_{s}}{{\boldsymbol{F}}}_{k}^{T}{{\boldsymbol{C}}}_{k}^{-1}{{\boldsymbol{F}}}_{k}={\boldsymbol{I}}$, the foreground projection matrix is

Equation (B2)

the signal channel covariance matrix is

Equation (B3)

and the signal channel mean is

Equation (B4)

Appendix C: MAA with Polarization

In this appendix, we will lay out the form of the MAA when the data vector is the concatenation of 4Nb spectra containing all four Stokes parameters at Nb time bins:

Equation (C1)

The signal appears only in the Stokes I spectra, so the signal expansion matrix is

Equation (C2)

Since the four Stokes parameters of the $k\mathrm{th}$ time bin have roughly the same noise covariance,22 ${{\boldsymbol{C}}}_{k}$, we can write the full noise covariance matrix as

Equation (C3)

In this case, the foreground basis returned by the SVD algorithm is given by

Equation (C4)

which implies that the normalization convention for the foreground basis matrix is ${\sum }_{k=1}^{{N}_{s}}{\sum }_{S\in \{I,Q,U,V\}}{{\boldsymbol{F}}}_{k,S}^{T}{{\boldsymbol{C}}}_{k}^{-1}{{\boldsymbol{F}}}_{k,S}={\boldsymbol{I}}$, and the foreground projection matrix is

Equation (C5)

Very similarly to Equation (B3), the signal channel covariance given by Equation 19(a) is

Equation (C6)

The signal channel mean is

Equation (C7)

For computation purposes, we define the total signal noise covariance, ${{\boldsymbol{C}}}_{T}$, the weighted total power basis, ${\boldsymbol{H}}$, the weighted total power data, ${\boldsymbol{w}}$, and the overlap vector of the data, ${\boldsymbol{d}}$, through

Equation (C8)

With these definitions, ${{\boldsymbol{\Delta }}}^{(21)}$ and ${{\boldsymbol{\gamma }}}^{(21)}$ are given by

Equation (C9)

These equations also work when calculating the channel mean and covariance with total power spectra only as long as the sum over S in the definition of ${\boldsymbol{d}}$ includes only I.

Appendix D: Signal Bias Statistic under MAA

In this appendix, we examine the effect of biases in the foreground model, that is, the effect of nonzero ${\boldsymbol{\delta }}$ on the uncertainties of minimum assumption analyses. To do this, we write the data as ${\boldsymbol{y}}={\boldsymbol{F}}{\boldsymbol{x}}+{\boldsymbol{\delta }}+{\boldsymbol{\Psi }}{\boldsymbol{z}}+{\boldsymbol{n}}$, where ${\boldsymbol{\delta }}$ is the foreground bias (i.e., unmodeled foreground) that satisfies ${{\boldsymbol{F}}}^{T}{{\boldsymbol{C}}}^{-1}{\boldsymbol{\delta }}=0$, ${\boldsymbol{F}}{\boldsymbol{x}}$ is the modeled foreground, ${\boldsymbol{z}}$ is the true signal, and ${\boldsymbol{n}}$ is the Gaussian noise realization with covariance ${\boldsymbol{C}}$. With these definitions, the signal channel mean ${{\boldsymbol{\gamma }}}^{(21)}$ is given by

Equation (D1)

Since ${\boldsymbol{\Phi }}{\boldsymbol{F}}{\boldsymbol{x}}={\boldsymbol{F}}{\boldsymbol{x}}$ and ${\boldsymbol{\Phi }}{\boldsymbol{\delta }}=0$, this means

Equation (D2a)

Equation (D2b)

Equation (D2c)

This means that the signal channel bias is

Equation (D3)

The so-called signal bias statistic, ${\varepsilon }^{2}$, defined through ${\varepsilon }^{2}=\tfrac{1}{{N}_{\nu }}{({{\boldsymbol{\gamma }}}^{(21)}-{\boldsymbol{z}})}^{T}{\left[{{\boldsymbol{\Delta }}}^{(21)}\right]}^{-1}({{\boldsymbol{\gamma }}}^{(21)}-{\boldsymbol{z}})$, which yields the squared number of sigma at which the signal is measured in an averaged sense across the band, satisfies

Equation (D4)

where Nν is the number of frequency channels. Assuming that ${\boldsymbol{\delta }}$ follows a normal distribution (with a singular covariance matrix), we find that the expectation value and variance of ${\varepsilon }^{2}$ are

Equation (D5a)

Equation (D5b)

where ${\boldsymbol{A}}=\tfrac{1}{{N}_{\nu }}{({\boldsymbol{I}}-{\boldsymbol{\Phi }})}^{T}{{\boldsymbol{C}}}^{-1}{\boldsymbol{\Psi }}{{\boldsymbol{\Delta }}}^{(21)}{{\boldsymbol{\Psi }}}^{T}{{\boldsymbol{C}}}^{-1}({\boldsymbol{I}}-{\boldsymbol{\Phi }})$, ${\boldsymbol{\mu }}=E[{\boldsymbol{\delta }}]$, and ${\boldsymbol{\Sigma }}=\mathrm{Cov}[{\boldsymbol{\delta }}]$. Assuming that ${\varepsilon }^{2}$ approximately follows a normal distribution, this means that, at a confidence level of p,

Equation (D6)

This equation connects the rms number of sigma, ε, to confidence levels, p, in the general case where the foreground model is imperfect.

Appendix E: Taking Derivative of Finite-dimensional Signal with Matrix

In Section 5.2.2, the frequency derivative of a signal ${\boldsymbol{s}}={\left[s({\nu }_{1})s({\nu }_{2})\cdots s({\nu }_{{N}_{\nu }})\right]}^{T}$ is written as ${\boldsymbol{D}}{\boldsymbol{s}}$. There are some subtleties to this representation. First, the discrete nature of ${\boldsymbol{s}}$ means that any derivative taken will be a finite difference approximation. Second, note that the derivative of a vector with ${N}_{\nu }$ elements is only strictly defined on the ${N}_{\nu }-1$ midpoints. The derivative matrix defined in this way is $({N}_{\nu }-1)\times {N}_{\nu }$ and in the case of equally spaced frequency channels with resolution ${\rm{\Delta }}\nu $ is given by

Equation (E1)

To define ${\boldsymbol{D}}{\boldsymbol{s}}$ in the same space as ${\boldsymbol{s}}$, the derivative must be interpolated in some way (i.e., ${\boldsymbol{D}}$ must be made square in some way). A natural interpolation scheme is to take the derivative at the first (last) end point to be the derivative at the midpoint of the first (last) pair of elements and to take the derivative at each interior point to be the average of the derivatives at the two adjacent midpoints. Defined in this way, the derivative matrix from Equation (E1) is modified to

Equation (E2a)

Equation (E2b)

This is the derivative matrix referred to in Section 5.2.2.

Footnotes

  • If the signal, ${{\boldsymbol{y}}}^{(21)}$, was zero and there was no random noise, then ${\boldsymbol{\alpha }}$ would be equal to ${{\boldsymbol{a}}}_{\mathrm{ML},\mathrm{fg} \mbox{-} \mathrm{only}}$. However, in the general case where there is a nonzero signal, ${{\boldsymbol{a}}}_{\mathrm{ML},\mathrm{fg} \mbox{-} \mathrm{only}}$ absorbs some of the signal spectrum. In addition, ${{\boldsymbol{a}}}_{\mathrm{ML},\mathrm{fg} \mbox{-} \mathrm{only}}$ is subject to small biases caused by random noise, whereas in this formulation ${\boldsymbol{\alpha }}$ is the best-fitting coefficient vector of the ideal, noiseless foreground (i.e., the one that would be measured given infinite integration time with a stable instrument).

  • Usually, this distribution is a zero-mean Gaussian specified entirely by its covariance matrix, ${\boldsymbol{C}}$, which must be known up to a constant of proportionality. The proportionality constant must be sufficiently close to unity if we also wish to measure goodness-of-fit.

  • 26fdg7 S, 116fdg6 E.

  • These end points are similar to those of the beam of the blade antenna used by EDGES (see extended data in Figure 4(a) of Bowman et al. 2018).

  • We use the minimize function from scipy.optimize to minimize the negative log-likelihood.

  • 10 

    Weighted SVD refers to a decomposition of the form ${\boldsymbol{B}}={\boldsymbol{U}}{\boldsymbol{\Sigma }}{{\boldsymbol{V}}}^{T}$ where ${{\boldsymbol{U}}}^{T}{{\boldsymbol{C}}}^{-1}{\boldsymbol{U}}={\boldsymbol{I}}$, ${{\boldsymbol{V}}}^{T}{\boldsymbol{V}}={\boldsymbol{I}}$, and ${\boldsymbol{\Sigma }}$ is a rectangular diagonal matrix with decreasing nonnegative elements on the diagonal. The basis matrix ${\boldsymbol{F}}$ is made from the first NF columns of ${\boldsymbol{U}}$.

  • 11 

    For the sake of simplicity, results using the MAA with polarization are not shown in this paper, but the equations of Appendix C will prove useful in future work.

  • 12 

    In each of these ∼15 minute chunks, the foreground is smeared as shown in Figure 1.

  • 13 

    The only situation in which ${{\boldsymbol{\delta }}}^{\mathrm{chromatic}}$ is necessarily zero for all beams, ${\boldsymbol{B}}(\theta ,\phi )$, occurs when ${\boldsymbol{I}}={\boldsymbol{\Phi }}$. But, this implies that projecting into the span of the foreground basis has no effect, which can only happen if the foreground basis is complete. In this case, the signal model will be degenerate with the foreground model and therefore cannot be extracted.

  • 14 

    If the reference beam and foreground were equal to the true beam and foreground, then one would know the exact beam-weighted foreground and should simply subtract it from the measured spectrum, leaving only noise and the 21 cm signal.

  • 15 

    If true, this would be a case of factor 2 affecting factor 3. For robustness reasons, it is important to justify the signal model in a way external to the observations themselves.

  • 16 

    More precisely, the CMB should be subtracted from the foreground model used to generate the training set so that it can be found by the signal fitting and subtracted after the fact.

  • 17 

    Here, we use precision to refer to the size of uncertainties, which is known even when the true 21 cm signal is unknown, and accuracy to refer to the size of the difference between the signal reconstruction and the true signal with respect to the size of the uncertainties.

  • 18 

    Note that this assumes that there is no intrinsic (i.e., non-motion-induced) dipole of the 21 cm signal.

  • 19 

    Note that ${a}_{k}(\nu )$ will not be known exactly because the beam will not be known exactly. However, since the dipole component will be on the order of 10 mK (see Slosar 2017) and ${a}_{k}(\nu )$ will be of order unity, the expected imprecision levels in the beam, which should be at the subpercent level, should not impact the results significantly.

  • 20 

    Note that invertible matrices are square, meaning that there are as many parameters in ${{\boldsymbol{x}}}^{(21)}$ as there are frequencies in ${{\boldsymbol{y}}}^{(21)}$.

  • 21 

    ${\boldsymbol{A}}$ could be assumed to be the identity matrix, but the analytical calculations can be completed more easily if the signal basis matrix is normalized.

  • 22 

    If Q/I, U/I, and V/I have magnitudes of order x, then the fractional difference between the noise levels on I, Q, U, and V is of order x2. See Tauscher et al. (2020) for exact calculations of Stokes parameters noise levels. For our purposes, it is best just to assume all four Stokes parameters have the same noise level.

Please wait… references are loading.
10.3847/1538-4357/ab9a3f