Abstract
The sky-averaged (global) highly redshifted 21 cm spectrum from neutral hydrogen is expected to appear in the VHF range of ∼20–200 MHz and its spectral shape and strength are determined by the heating properties of the first stars and black holes, by the nature and duration of reionization, and by the presence or absence of exotic physics. Measurements of the global signal would therefore provide us with a wealth of astrophysical and cosmological knowledge. However, the signal has not yet been detected because it must be seen through strong foregrounds weighted by a large beam, instrumental calibration errors, and ionospheric, ground, and radio-frequency-interference effects, which we collectively refer to as "systematics." Here, we present a signal extraction method for global signal experiments which uses Singular Value Decomposition of "training sets" to produce systematics basis functions specifically suited to each observation. Instead of requiring precise absolute knowledge of the systematics, our method effectively requires precise knowledge of how the systematics can vary. After calculating eigenmodes for the signal and systematics, we perform a weighted least square fit of the corresponding coefficients and select the number of modes to include by minimizing an information criterion. We compare the performance of the signal extraction when minimizing various information criteria and find that minimizing the Deviance Information Criterion most consistently yields unbiased fits. The methods used here are built into our widely applicable, publicly available Python package, pylinex, which analytically calculates constraints on signals and systematics from given data, errors, and training sets.
1. Introduction
The highly redshifted 21 cm spectrum from neutral hydrogen in the early universe has become an important observational target in the astrophysics community. This is because it fills outstanding gaps in our knowledge of the universe between the formation of the first neutral atoms a few hundred thousand years after the Big Bang (recombination) and the earliest stars seen to date by the Hubble Space Telescope, which existed a few hundred million years later. One of these knowledge gaps concerns the heating of the gas in the intergalactic medium (IGM) around the time when the first stars formed and the first black holes began accretion. This heating changes the spin temperature of the IGM gas, changing how strongly it emits the 21 cm line (Madau et al. 1997; Cohen et al. 2017; Mirocha et al. 2017a). Since the 21 cm line is a quantum transition of neutral hydrogen (Ewen & Purcell 1951), its brightness temperature is modulated by the neutral fraction of the IGM gas. The other knowledge gap addressable by the 21 cm signal is therefore the nature of the reionization of the universe's hydrogen, including both when it started and how it progressed. Furlanetto et al. (2006), Loeb & Furlanetto (2013), Morales & Wyithe (2010), and Pritchard & Loeb (2012) provide in-depth studies of 21 cm cosmology.
Experiments made to measure the highly redshifted 21 cm line come in two different varieties—power spectrum and global signal—with different science goals. The power spectrum reflects spatial anisotropies in 21 cm emission, whereas the global signal quantifies its isotropic component. For both cases, but particularly for the former, physical parameter inference is difficult because the posterior distributions are broad and the likelihood functions needed to numerically explore them often require a prohibitively long time to calculate for a Markov Chain Monte Carlo (MCMC) exploration. Recently, to avoid this challenge, Kern et al. (2017) proposed an emulation technique, which uses a Gaussian Process model to drastically speed up the evaluation of the likelihood function for the Hydrogen Epoch of Reionization Array power spectrum experiment (DeBoer et al. 2017). For a similar purpose, Schmit & Pritchard (2018) proposed a different emulation technique based on a neural network. In contrast to these approaches, which maintain a connection between curves and the physical parameters that generated them, in this paper, we explore our likelihood analytically by using linear basis functions encoding information from training sets.
Although there has not yet been a detection of the 21 cm spectrum, some recent limits have been placed on both the power spectrum (Paciga et al. 2013; Dillon et al. 2014, 2015; Parsons et al. 2014; Ali et al. 2015; Jacobs et al. 2015, see also Figure 9 in DeBoer et al. 2017 for a comprehensive review) and the global signal (Bowman & Rogers 2010; Voytek et al. 2014; Bernardi et al. 2016; Monsalve et al. 2017b; Singh et al. 2017).
Although some have suggested that interferometers may be suited to measure the global signal (Presley et al. 2015; Singh et al. 2015), most experimental efforts use single antennas. Experiments and mission concepts to measure the global 21 cm signal include the Experiment to Detect the Global EoR Signature (EDGES; Bowman & Rogers 2010; Monsalve et al. 2017a; Monsalve et al. 2017b), the Dark Ages Radio Explorer (Burns et al. 2012, 2017), the Shaped Antenna measurement of the background RAdio Spectrum (Patra et al. 2013; Singh et al. 2017), the Sonda Cosmológica de las Islas para la Detección de Hidrógeno Neutro (Voytek et al. 2014), the Zero-spacing Interferometer Measurements of the Background Radio Spectrum (Mahesh et al. 2014), the Large-aperture Experiment to detect the Dark Ages (Bernardi et al. 2015, 2016; Price et al. 2017), and the Broadband Instrument for Global HydrOgen ReioNisation Signal (Sokolowski et al. 2015).
The main impediment to all of these experiments is the vast strength of the foregrounds (∼10²–10⁴ K) relative to the expected strength of the 21 cm signal (∼10–100 mK). Since the observations must be made at VHF radio frequencies (∼20–200 MHz), the instrument making the observations will have a very large beam that will corrupt the intrinsically smooth foreground spectrum by imprinting its own spatial/spectral characteristics onto it. Likewise, the radiometer system will leave its own features in the data (some visible even after calibration). As these antenna and receiver features are characteristic of the particular instrument, we use Gaussian beams and do not include calibration errors in the results of this paper.
In order to fit or remove the immense beam-weighted foregrounds observed by global signal experiments, a foreground model must be chosen. In most analyses, this model is a polynomial or a polynomial multiplied by a power law. However, these models can also be used to fit spectral features which look like the expected 21 cm signal, leading to large covariances between the foreground and signal models and, hence, large errors and low result significance. Some have attempted to circumvent this by proposing a model that only considers polynomials that have special smoothness properties (Sathyanarayana Rao et al. 2017). This model is "loss resistant," meaning that when the model is used to fit spectra, it cannot fit shapes similar to the signal. Use of this model, however, requires all spectral features introduced into the data by the instrument which are not as smooth as the foreground to have been removed completely in some cleaning process. The effectiveness of this technique relies on knowledge of all relevant aspects of the instrument (e.g., antenna and receiver reflection coefficients, receiver noise temperature, etc.); if the knowledge is inaccurate, the cleaning process and the rigidity of the foreground model introduce biases. In contrast to this method, here we present an analysis which, instead of requiring precise knowledge of instrument parameters, requires precise knowledge of the modes in which the instrument parameters can vary (e.g., in frequency)—knowledge more feasibly extracted from simulations and lab measurements. This knowledge allows the errors introduced by imperfect cleaning of the data to be fit consistently.
In this first paper in the series, we propose a method in which we simulate a given experiment multiple times to create a "training set." Then, Singular Value Decomposition (SVD)—closely related to Principal Component Analysis and eigenvalue decomposition—is used to extract from this training set linear basis functions describing the systematics. After a similar process is performed using a global signal training set, the signal and systematics basis functions are combined to fit the spectra, yielding an estimate of the 21 cm global signal based solely on the input training sets and the data. This is an extremely fast process that also allows us to extract the signal in a model independent parameter space with a unimodal distribution. The second paper in this series will address how the pipeline takes advantage of initial results such as those found here by facilitating the initialization of an MCMC search to constrain physical parameters (D. Rapetti et al. 2018, in preparation; referred to as Paper II hereafter). Paper III (K. Tauscher et al. 2018, in preparation) will then detail how, with a realistic instrument, we can take advantage of instrument-induced polarization to differentiate the power of anisotropic sources like our Galaxy from the power of isotropic sources like the 21 cm global signal (see Nhan et al. 2017).
Others have proposed using SVD in the context of 21 cm cosmology. Chang et al. (2010) were the first to suggest that SVD could be used to fit out foregrounds of 21 cm experiments. Vedantham et al. (2014) and Switzer & Liu (2014) both put forth SVD as a possible method to model systematics in data acquired to measure the 21 cm signal. Unlike those works, we develop systematic modes from a training set based on instrumental degrees of freedom and published maps of Galactic emission, rather than inferring modes from the data themselves. Our approach is similar to that of Leistedt & Peiris (2014), who identified principal contaminant modes in a quasar survey based on known systematic maps.
One important aspect of our method is the choice of where to truncate the modes of both the signal and the systematics. We make this choice through a grid search for the global minimum of a given Information Criterion (IC). By running the extraction algorithm with many different inputs and varying which IC is minimized, we compare the statistical accuracy of the signal extractions resulting from each IC. The ICs we consider are Akaike's Information Criterion (AIC; Akaike 1974), the Bayesian Information Criterion (BIC; Schwarz 1978), the Bayesian Predictive Information Criterion (BPIC; Ando 2007), and the Deviance Information Criterion (DIC; Spiegelhalter et al. 2002, 2014). Astrophysics has seen studies of model selection using ICs in various contexts (see, e.g., Porciani & Norberg 2006; Liddle 2007). However, they come to differing conclusions about which is the best to use, with some preferring the BIC and others the DIC. Using a statistical measure of the bias in our separation of components of the data (e.g., signal and systematics), we offer a method of choosing which IC to minimize in order to better separate those components.
This work also includes a Python package named pylinex (Linear Extraction in Python), described in Appendix A, implementing the methods described in Section 2. Since its methods are so general, pylinex, which fits for signals hidden in large systematics and Gaussian noise, is a quick and efficient statistical tool with applications outside of 21 cm cosmology and even outside of astrophysics.
2. Methodology
2.1. Framework
Consider a data vector, y, which is a combination of N different data components, y_1, ..., y_N, and Gaussian noise, n,

    y = Σ_{k=1}^{N} Ψ_k y_k + n.    (1)
In our case, y_1 would be the global 21 cm signal and y_k for k ≥ 2 would include all identified systematic effects (e.g., foregrounds). The matrices Ψ_k—referred to as "expansion matrices" here since they allow the data components, y_k, to exist in different (usually smaller) spaces than the full data vector, y—encode the observation strategy used in obtaining the data. For example, if the data vector consists of the concatenation of two spectra which contain the same signal but different systematics, then the relevant expansion matrices are

    Ψ_1 = [I; I],  Ψ_2 = [I; 0],  Ψ_3 = [0; I],    (2)

where I and 0 are the identity and zero matrices of the size of a single spectrum and the semicolon denotes vertical stacking.
The expansion matrices can also be used to model situation-dependent modulation of data components or linear transformations of data. Note that if the different components of the data, y_k, have different sizes, then the corresponding expansion matrices, Ψ_k, will have different numbers of columns but the same number of rows.
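The two-spectrum example above can be made concrete with explicit block matrices. The following is a minimal numpy sketch (not part of pylinex; the function name and the toy channel count are chosen purely for illustration):

```python
import numpy as np

def expansion_matrix(num_spectra, index, n_channels, shared=False):
    """Build a block expansion matrix placing a component into a
    concatenation of num_spectra spectra. If shared, the component
    appears in every spectrum (e.g., a common signal); otherwise it
    appears only in the spectrum given by index (e.g., a per-spectrum
    systematic)."""
    eye = np.eye(n_channels)
    zero = np.zeros((n_channels, n_channels))
    blocks = [eye if (shared or i == index) else zero
              for i in range(num_spectra)]
    return np.vstack(blocks)

n = 4  # toy number of channels per spectrum
psi_signal = expansion_matrix(2, None, n, shared=True)  # [I; I]
psi_sys1 = expansion_matrix(2, 0, n)                    # [I; 0]
psi_sys2 = expansion_matrix(2, 1, n)                    # [0; I]

# The shared-signal expansion repeats the component in both spectra.
signal = np.arange(n, dtype=float)
expanded = psi_signal @ signal
```

Applying `psi_signal` to a length-4 component yields a length-8 vector whose two halves are identical, while `psi_sys1` and `psi_sys2` place their components into only one of the two concatenated spectra.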
Our task is to separate a single component of the data (say y_1) from y. To do so, we model the data less the noise, y − n, as a sum similar to Equation (1),

    M(x) = Σ_{k=1}^{N} Ψ_k F_k x_k.    (3)
In this expression, F_k is a matrix with basis vectors for fitting y_k as its columns and x_k is a column vector of coefficients modulating those basis vectors. With the definitions

    F = [Ψ_1 F_1  Ψ_2 F_2  ⋯  Ψ_N F_N]  and  x = [x_1ᵀ x_2ᵀ ⋯ x_Nᵀ]ᵀ,
the model can be written M(x) = Fx. If the model is adequate, the data vector, y, is real, the true model parameters are x_0, and the noise has covariance C, then, up to a constant, the probability of observing y is proportional to the likelihood evaluated at x_0, where

    L(x) ∝ exp[ −(1/2) (y − Fx)ᵀ C⁻¹ (y − Fx) ].    (4)
To form the posterior parameter distribution from this likelihood, we must assume a form for the prior parameter distribution. Here, we use the least informative prior possible: one that is flat over the entirety of parameter space and any transformation of it. This means that the likelihood function is proportional to the posterior distribution, which is therefore Gaussian in x with covariance S and mean ξ given by

    S = (Fᵀ C⁻¹ F)⁻¹,    (5a)
    ξ = S Fᵀ C⁻¹ y.    (5b)
The maximum likelihood reconstruction of y_k, denoted γ_k, its channel covariance, Δ_k, and its averaged root-mean-square error, RMS_k, are given by

    γ_k = F_k ξ_k,    (6a)
    Δ_k = F_k S_k F_kᵀ,    (6b)
    RMS_k = √( tr[Δ_k] / n_k ),    (6c)
where ξ_k is the portion of ξ containing the parameters modeling y_k, S_k is the diagonal block of the covariance matrix S corresponding to those parameters, and n_k is the number of data channels in the F_k basis. These expressions are also valid without the k subscript, where they represent the same quantities for reconstructions of the full data. When removing the k subscript, F_k is replaced by the full basis matrix F.
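The posterior quantities above amount to a standard weighted least squares computation. A minimal numpy sketch (with an illustrative polynomial basis standing in for SVD modes, and white noise) verifies the expressions; note that the trace identity tr[FSFᵀ] = N_p σ² makes the averaged rms error of the full-data reconstruction exactly σ√(N_p/N_c):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 50 channels, a 3-column polynomial basis, white noise.
nc, npar = 50, 3
x_grid = np.linspace(0.0, 1.0, nc)
F = np.vander(x_grid, npar, increasing=True)  # columns: 1, x, x^2
x_true = np.array([2.0, -1.0, 0.5])
sigma = 0.01
C_inv = np.eye(nc) / sigma**2
y = F @ x_true + rng.normal(0.0, sigma, nc)

# Posterior covariance S and mean xi (flat priors).
S = np.linalg.inv(F.T @ C_inv @ F)
xi = S @ F.T @ C_inv @ y

# Channel-space reconstruction, its covariance, and averaged rms error.
gamma = F @ xi
Delta = F @ S @ F.T
rms_error = np.sqrt(np.trace(Delta) / nc)
```

With this setup, `xi` recovers `x_true` to within the noise-induced parameter errors, and `rms_error` equals σ√(3/50) to machine precision.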
2.2. Training Sets
The expressions above are valid for any choice of the basis vectors in F_k. However, only sets of basis vectors, F_k, which capture the variability of their respective data components, y_k, will provide meaningful constraints. To automate the intelligent choice of such sets of basis vectors, we assume that, for all k, it is possible to simulate a set of curves which reasonably characterizes the most important modes of variation of y_k. These modes are then encoded in F_k and used in fitting the full data, y. For this reason, the set of curves is known as a training set. The training set is described by an n × m matrix, B, with the m curves as its columns. For the rest of Section 2.2, it is assumed that we are speaking about only one of the N different components of the data, so the k subscripts and superscripts are neglected.
We define the basis for the given data component as the n × η matrix F which minimizes the Total Weighted Square Error (TWSE) of the fits to all curves in the training set, given by

    TWSE(F) = Σ_{i=1}^{m} (b_i − P b_i)ᵀ W (b_i − P b_i),    (7)
where W is a diagonal matrix of weights W_i, b_i is the ith column of B, and P = F(FᵀWF)⁻¹FᵀW projects vectors onto the subspace spanned by the columns of F. The desired basis is given in terms of the weighted SVD of B, which takes the form W^{1/2}B = UΣVᵀ, where Σ is diagonal with decreasing non-negative numbers on the diagonal, UᵀU = I, and VᵀV = I (see Appendix B for implementation details). Given a number of basis vectors to choose, η, the basis defined above is given by the first η columns of W^{−1/2}U.
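The weighted SVD recipe can be sketched as follows, assuming per-channel noise weighting W^{1/2} = diag(1/σ) as reconstructed here. The toy training set is built from two underlying modes, so two retained SVD modes suffice to reconstruct every curve in it:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training set: m curves in n channels, built from 2 underlying modes.
n, m = 40, 100
x_grid = np.linspace(-1.0, 1.0, n)
modes = np.stack([np.ones(n), x_grid])        # shape (2, n)
coeffs = rng.normal(size=(m, 2))
B = (coeffs @ modes).T                        # training matrix, n x m

# Weight channels by 1/sigma, decompose, then unweight the modes.
sigma = np.linspace(0.05, 0.2, n)             # per-channel noise level
U, S, Vt = np.linalg.svd(B / sigma[:, None], full_matrices=False)
eta = 2                                       # number of modes retained
basis = sigma[:, None] * U[:, :eta]           # first eta unweighted modes

# The retained modes should reconstruct every training curve.
proj = basis @ np.linalg.lstsq(basis, B, rcond=None)[0]
```

Because the training curves live in a two-dimensional subspace, all singular values beyond the second vanish (to machine precision) and the two-mode projection reproduces the full training matrix.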
SVD is a reliable way of capturing the modes of variation of a single training set, but if it is performed independently on all components of the data, it may not yield the optimal set of basis vectors for the present purpose, which is separating the different components when they are combined in the same data set and fit simultaneously. Nevertheless, in lieu of a more sophisticated technique, we perform SVD independently on each training set.
2.3. Model Selection
To select from the models formed by different truncations of the SVD basis sets, we set up a framework within which we test figures of merit, known as information criteria, based on the competition between two terms: a goodness-of-fit term that measures the bias in the fit to the data and a complexity term that penalizes the number of parameters used in that fit. We consider the following information criteria for every truncation under consideration: the DIC (Spiegelhalter et al. 2002, 2014), the BPIC (Ando 2007), and the BIC (Schwarz 1978). For our likelihood function (Equation (4)), up to constants independent of the parameters, these are given by

    DIC = Δ + 2 N_p,    (8a)
    BPIC = Δ + 3 N_p,    (8b)
    BIC = Δ + N_p ln N_c.    (8c)
Here, N_p is the total number of varying modes across all N sets of basis vectors, N_c is the total number of data channels, Δ = δᵀC⁻¹δ is the goodness-of-fit term, and δ = y − Fξ is the residual of the fit to the data. Note that while the goodness-of-fit term remains the same across the information criteria, the complexity term varies. The AIC (Akaike 1974) and a variant of the DIC whose complexity term is based on the variance of the log-likelihood (Gelman et al. 2013, page 173) were also considered, but since our model is linear, they are both equivalent to the DIC in Equation 8(a). When truncating, i.e., selecting between our nested SVD models, we choose the model that minimizes the desired criterion. We investigate which information criterion works best in our analysis in Section 3.2.
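A toy version of the truncation search might look like the following sketch, which assumes the penalty forms quoted above (2N_p for the DIC, 3N_p for the BPIC, N_p ln N_c for the BIC) and scans nested polynomial models of a smooth "systematic" plus noise:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: a quartic "systematic" plus white noise; we scan the number
# of polynomial modes and pick the truncation minimizing each criterion.
nc = 80
x_grid = np.linspace(-1.0, 1.0, nc)
sigma = 0.05
truth = 1.0 + 2.0 * x_grid - 3.0 * x_grid**2 + 0.5 * x_grid**4
y = truth + rng.normal(0.0, sigma, nc)
C_inv = np.eye(nc) / sigma**2

def goodness_of_fit(n_modes):
    """Delta = residual chi-squared after fitting n_modes polynomials."""
    F = np.vander(x_grid, n_modes, increasing=True)
    S = np.linalg.inv(F.T @ C_inv @ F)
    delta = y - F @ (S @ F.T @ C_inv @ y)
    return float(delta @ C_inv @ delta)

results = {}
for n_modes in range(1, 11):
    Delta_fit = goodness_of_fit(n_modes)
    results[n_modes] = {
        "DIC": Delta_fit + 2 * n_modes,
        "BPIC": Delta_fit + 3 * n_modes,
        "BIC": Delta_fit + n_modes * np.log(nc),
    }

best_dic = min(results, key=lambda k: results[k]["DIC"])
```

Because the input curve genuinely requires five polynomial modes at this noise level, the DIC minimum lands at (or just above) five modes: the goodness-of-fit drop from adding a needed mode dwarfs the complexity penalty, while superfluous modes reduce Δ by only about one unit each, less than any of the penalties.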
3. 21 cm Global Signal Application
3.1. Simulated Data and Training Sets
To illustrate our methods, we propose a simple, simulated experiment to measure the global 21 cm signal using dual-polarization antennas that yield data for all four Stokes parameters at frequencies between 40 and 120 MHz. For simplicity, we ignore all systematic effects other than beam-weighted foreground emission; the neglected effects include human-generated Radio Frequency Interference (RFI), refraction, absorption, and emission due to Earth's ionosphere, and receiver gain and noise temperature variations. The experiment proposed here is similar to a pair of antennas orbiting above the lunar farside, where the ionospheric effects and RFI need not be addressed (Burns et al. 2017). An instrument training set corresponding to a realistic antenna and receiver will be included in the analyses of Papers II and III.
The data product of our simulated experiment is a set of 96 brightness temperature spectra. The spectra correspond to the four Stokes parameters and six different rotation angles for each of four antenna pointing directions. The data vector, y, consists of the concatenation of all of the spectra. The noise level of the data, σ, is roughly constant across the different Stokes parameters and is related to the total power (Stokes I) brightness temperature, T_b, through the radiometer equation, σ = T_b/√(Δν τ), with a frequency channel width of Δν = 1 MHz and an integration time of τ = 1000 hr (split between the different antenna pointing directions and rotation angles about those directions). The data are split into N = 5 different components: one for the 21 cm signal and one for the beam-weighted foregrounds (which are correlated across boresight angles and frequency) of each pointing. The signal is the same across all four pointings, while the foregrounds for each pointing only affect the data from that pointing. The expansion matrices encode this fact.
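For intuition about the noise level, the radiometer equation can be evaluated directly. The ∼1000 K sky temperature below is a rough illustrative value, and the simple formula ignores the splitting of integration time among pointings and rotation angles:

```python
import math

def radiometer_noise(t_b_kelvin, channel_width_hz, integration_time_s):
    """Idealized radiometer equation: sigma = T_b / sqrt(delta_nu * tau)."""
    return t_b_kelvin / math.sqrt(channel_width_hz * integration_time_s)

# Rough numbers: ~1000 K sky at low VHF frequencies, 1 MHz channels,
# 1000 hours of total integration.
sigma_k = radiometer_noise(1000.0, 1.0e6, 1000.0 * 3600.0)
sigma_mk = sigma_k * 1000.0  # roughly half a millikelvin
```

A sub-millikelvin noise level, comfortably below the expected ∼10–100 mK signal amplitude, illustrates why integration times of this order are targeted.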
To create the beam-weighted foreground data for each of the four antenna pointing directions, we use the simulation framework of Nhan et al. (2017), henceforth referred to as N17. Each Stokes parameter, ζ, observed by the instrument is given by an expression of the form T_ζ(ν) = ∫ B_ζ(ν, θ, φ) T_gal(ν, θ, φ) dΩ, where T_gal is the galaxy brightness temperature and dΩ is the differential solid angle. As in N17, the four relevant beams at frequency ν, polar angle θ, and azimuthal angle φ are of the form
    B_I(ν, θ, φ) = exp[ −θ² / 2σ_b²(ν) ],
    B_Q(ν, θ, φ) = B_I(ν, θ, φ) cos 2φ,
    B_U(ν, θ, φ) = B_I(ν, θ, φ) sin 2φ,
    B_V(ν, θ, φ) = 0,    (9)

where θ is measured from the boresight and σ_b(ν) is the angular extent of the beam as a function of frequency. The beam's polarization response converts intensity anisotropy into apparent polarization, while the monopole (cosmological signal) averages to zero in the instrumental Stokes Q and U channels. Measurement in the polarization channels therefore provides discrimination of foreground modes from the signal. Furthermore, rotation about the boresight is used to modulate the polarized components. N17 used this method assuming a spectrally invariant beam and a sky following a single power law in frequency. Here, with the aid of training sets, we extend the method to allow for spectrally varying beams and an arbitrary sky model. The Galaxy map used in this paper is a spatially dependent power-law interpolation between the maps provided by Haslam et al. (1982) and Guzmán et al. (2011). The beam-weighted foreground training sets are created using 125 Gaussian beams described by Equation (9) with varying quadratic models of σ_b(ν). Figure 1 shows the training set for one of the four antenna pointing directions and some of the corresponding basis functions. Although V = 0 in this work because B_V = 0, in real 21 cm global signal experiments with polarimetry, since we expect no circularly polarized light to be incident on the antenna, V can contain useful information about instrument variations.
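The key property described above can be checked numerically with a hypothetical Gaussian-beam form whose Q and U responses are modulated as cos 2φ and sin 2φ (an idealization consistent with the description in the text; all numbers are illustrative): a uniform sky (the monopole) integrates to zero in Q and U, which is the basis of the signal/foreground discrimination.

```python
import numpy as np

def gaussian_beams(theta, phi, sigma_b):
    """Axisymmetric Gaussian total-power beam with cos(2*phi)/sin(2*phi)
    polarization responses (hypothetical idealized beam model)."""
    envelope = np.exp(-theta**2 / (2.0 * sigma_b**2))
    return {
        "I": envelope,
        "Q": envelope * np.cos(2.0 * phi),
        "U": envelope * np.sin(2.0 * phi),
        "V": np.zeros_like(envelope),
    }

# Average the Q response over a full ring of azimuth at fixed theta:
# the cos(2*phi) modulation cancels, so a monopole sky yields no Q.
phi = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
theta = np.full_like(phi, 0.3)
beams = gaussian_beams(theta, phi, sigma_b=0.5)
q_response_to_monopole = beams["Q"].sum() / phi.size
```

Only an anisotropic sky, which weights different azimuths differently, produces a nonzero Q or U response under this beam model.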
The 21 cm global signal training set and a few of the SVD basis functions it provides are shown in Figure 2. The training set was created by varying the parameters of the Accelerated Reionization Era Simulations (ares) code. See Mirocha et al. (2015, 2017a, 2017b) for information on the signal models used by ares. In this paper, we ignore the parameter values that generate the signals in the training set because we focus only on measuring the brightness temperature profile of the 21 cm signal, not on the astrophysical or cosmological parameters that generated it. Paper II will address inference on these parameters.
3.2. Optimized Truncation
The quality of the weighted least squares fit to the data given by the equations in Section 2 depends on the number of parameters describing the signal and the number of parameters describing the foregrounds of each antenna pointing direction. For each combination of these numbers under consideration, we evaluated all of the information criteria in Section 2.3. A section of the 60 × 30 grid of DIC values for a typical fit where both signal and foregrounds are taken from their training sets is shown in Figure 3, with the minimum-DIC model marking the chosen truncation. See Section 3.4 for a comparison between the qualities of the fit signals generated by minimizing each of the information criteria.
3.3. Parameter Covariance
The parameter covariance matrix, S (Equation 5(a)), corresponding to the DIC-chosen model from Figure 3 is shown in Figure 4. To aid in intuition about S, the set of all basis vectors and their overlaps (with respect to the noise covariance, C) can be viewed as a complete graph—a graph with an edge connecting every pair of vertices—whose vertices belong to N distinct sets. Each vertex is characterized by a pair of numbers (k, α), where k indicates the set of basis vectors the vertex belongs to and α the specific basis vector from that set. Each edge has a weight, w_{(k,α),(j,β)}, given by the overlap between the modes under consideration,

    w_{(k,α),(j,β)} = (Ψ_k f_α^{(k)})ᵀ C⁻¹ (Ψ_j f_β^{(j)}),    (10)
where f_α^{(k)} is the αth column of F_k. For the purpose needed here, we define the weight of a path traversing many edges as the product of the weights of its edges, i.e., w_path = Π_edges w_edge. Assuming the normalization conditions (Ψ_k f_α^{(k)})ᵀ C⁻¹ (Ψ_k f_α^{(k)}) = 1, the value of the covariance of the coefficients modulating two basis vectors is the sum of the weights of all paths connecting the corresponding vertices of the graph.
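The simplest nontrivial case of this picture has two unit-normalized modes connected by a single edge. The sketch below checks that the resulting 2 × 2 parameter covariance matches the geometric path sum (with the alternating signs that arise from expanding the matrix inverse):

```python
import numpy as np

# Two modes with a controlled overlap, unit-normalized with respect to
# the noise covariance so the covariance depends on the overlap w alone.
n = 100
x_grid = np.linspace(0.0, 1.0, n)
f1 = np.ones(n)
f2 = x_grid.copy()
C_inv = np.eye(n)  # unit noise covariance for simplicity
f1 = f1 / np.sqrt(f1 @ C_inv @ f1)
f2 = f2 / np.sqrt(f2 @ C_inv @ f2)
w = f1 @ C_inv @ f2  # overlap (the edge weight)

F = np.column_stack([f1, f2])
S = np.linalg.inv(F.T @ C_inv @ F)

# For this 2x2 system, S = (I + A)^(-1) with off-diagonal overlap w, and
# expanding in powers of A gives off-diagonal -w - w^3 - ... = -w/(1-w^2).
expected_offdiag = -w / (1.0 - w**2)
```

Large mode overlaps thus inflate both the variances (the 1/(1 − w²) diagonal) and the anticorrelation between coefficients, which is the graph picture's explanation of why poorly separated bases yield large errors.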
From the above, it is clear that the parameter covariances and, hence, the errors in the fit are influenced both by the modes included in the basis matrices, F_k, and by the expansion matrices, Ψ_k. The former are controlled by the training sets and the latter are determined by the experimental design. Therefore, errors can be lowered either by modifying the training set to refine the modes used to fit both signal and foreground spectra or by designing an experiment that intrinsically separates the signal and foreground spectra (e.g., through polarization modulation, as employed here).
3.4. Results
3.4.1. Statistical Measures
To evaluate the behavior of the extraction algorithm, we calculate a few statistical measures for an ensemble of input data sets. In order to assess how accurately the signal is extracted from the data, we first define the signal bias statistic, ε, as

    ε = { (1/n_ν) Σ_{j=1}^{n_ν} [γ(ν_j) − γ_true(ν_j)]² / (Δ)_jj }^{1/2},    (11)
where n_ν is the number of frequencies at which the signal is measured, γ_true is the "true" (input) signal, and γ and Δ are given by Equations 6(a) and (b). In an averaged sense, a signal bias statistic of ε corresponds to a signal bias at the εσ level. For this reason, if the errors are to be meaningful, the signal bias statistic should be of the order of unity.
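A simplified, diagonal-covariance stand-in for the bias statistic can be written and checked directly (the statistic as used in practice may weight by the full channel covariance; the helper below is illustrative only):

```python
import numpy as np

def signal_bias_statistic(signal_true, signal_fit, channel_covariance):
    """RMS of the per-channel bias in units of the per-channel error,
    using only the diagonal of the channel covariance (a simplified
    illustrative stand-in for the full statistic)."""
    diff = signal_fit - signal_true
    variances = np.diag(channel_covariance)
    return float(np.sqrt(np.mean(diff**2 / variances)))

n_nu = 50
truth = np.zeros(n_nu)
errors = np.full(n_nu, 10.0)      # 10 mK per-channel error
cov = np.diag(errors**2)

biased_fit = truth + 20.0         # a uniform 2-sigma offset
eps = signal_bias_statistic(truth, biased_fit, cov)
```

A fit offset from the truth by two error bars in every channel yields ε = 2, while a perfect fit yields ε = 0, matching the "bias at the εσ level" interpretation above.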
The signal bias statistic is a measure of the accuracy and precision of the signal estimate, but it does not gauge the overall quality of the fit to the data. In order to do so, we define the normalized deviance, D, through

    D = δᵀ C⁻¹ δ / (N_c − N_p),    (12)
where N_p is the number of parameters in the fit, N_c is the number of data channels, and δ = y − Fξ is the bias in the fit to the data. D is a measure of how well the full set of basis functions (signal and foreground) fits the data. Its true distribution is difficult to ascertain because the number of parameters chosen varies with the inputs; but, since the likelihood is Gaussian and we expect to be able to adequately fit the non-noise component of the data, if N_p parameters are chosen for a given extraction, then (N_c − N_p)D should follow a χ² distribution with N_c − N_p degrees of freedom. Since, unlike ε, D can be calculated even for data whose split between signal and foreground is unknown, it is useful to check its value against confidence intervals generated from the expected distribution to determine if any anomalies were encountered in the extraction process or if the training sets used were inadequate.
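The claimed distribution of D is easy to verify numerically for a fixed model: for pure-noise data, the normalized deviance scatters around 1 with spread √(2/(N_c − N_p)). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

# Fixed linear model: 200 channels, 5 polynomial modes, unit noise.
nc, npar = 200, 5
sigma = 1.0
x_grid = np.linspace(0.0, 1.0, nc)
F = np.vander(x_grid, npar, increasing=True)
C_inv = np.eye(nc) / sigma**2

def normalized_deviance(y):
    """D = delta^T C^-1 delta / (Nc - Np) for the least squares fit."""
    S = np.linalg.inv(F.T @ C_inv @ F)
    delta = y - F @ (S @ F.T @ C_inv @ y)
    return float(delta @ C_inv @ delta) / (nc - npar)

draws = [normalized_deviance(rng.normal(0.0, sigma, nc))
         for _ in range(200)]
mean_d = float(np.mean(draws))
```

The ensemble mean of D sits very close to 1, and individual draws stay within a few times √(2/195) ≈ 0.1 of it, mirroring the χ² check described above.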
3.4.2. Comparing Different Inputs and Information Criteria
We calculated ε and D for 5000 input data sets, each with their own signal, foregrounds, and noise realization, for two different scenarios. In both cases, the input foregrounds are taken from the training set. In the first case (solid lines in Figures 5 and 6), the input signals were taken from the ares training set used to define the modes (shown in Figure 2). In the second case (dashed lines in Figures 5 and 6), the input signals come from a separate set that parameterizes the ionization history and HI spin temperature as hyperbolic tangent (tanh) functions of redshift (see Harker et al. 2016, and the red curves in Figure 2).
The cumulative distributions of ε for the two cases described above are shown in Figure 5. The different colors represent the different ICs from Section 2.3 that are minimized in order to choose the number of SVD modes of each type to retain. As expected, when the input signals are not taken from the training set that was used to generate the SVD modes (dashed lines), on average, the fits tend to degrade (compare the dashed with the corresponding solid lines). Whether the input signal is in the training set or not, for a given value of the statistic, we find that the DIC recovers the global 21 cm signal with ε at or below that value more often than any other IC. Thus, we adopt the DIC as our model selection criterion.
Figure 6 shows that whether the input signals are taken from the training set or are generated with the tanh model, D follows the expected distribution. Therefore, we find that, combined, the selected SVD signal and foreground modes are adequate to fit the input data down to the noise level.
3.4.3. Signal Estimates
The channel mean and covariance of the estimated global 21 cm signal (in frequency channel space), γ and Δ, are given by Equations 6(a) and (b), respectively. Figure 7 shows a representative selection of extractions of simulated signals by our analysis. In each panel of the figure, the widths of the bands at frequency ν are given by 1.7 and 3.2 times the square root of the corresponding diagonal element of Δ, making them 1.7σ and 3.2σ confidence intervals. Figure 5 shows that, even when the input signal is generated by the tanh model as opposed to being taken from the training set, signals are fit with biases of less than 1.7σ (3.2σ) about 68% (95%) of the time. Therefore, the light bands in Figure 7 show 95% confidence intervals in the sense that there is a 95% probability that the squared ratio of the bias (the difference between the red and black lines) to the error (the light red band), averaged across the spectrum, is less than 1. Note that this does not mean that there is a 95% probability that the value of the 21 cm global signal at a given frequency is within the error band. The root-mean-square widths of the 95% bands of 95% of the ares (tanh) input signals are less than 21 (28) mK.
4. Conclusions
In this paper, we have proposed a method of analysis that transforms the task of extracting a signal from data into one of composing training sets that accurately model the relevant modes of variation in all components of the data. Applying this method to a simple, simulated 21 cm global signal experiment using dual-polarized antennas, we extracted a wide variety of input signals accurately, with 95% confidence rms errors of less than ∼30 mK. We also showed that, for our purpose of extracting components of data using well separated training sets, minimizing the DIC yields less biased results than minimizing the BIC or BPIC.
For 21 cm experiments analyzed with our method, there are three strategies for decreasing the errors: changing the expansion matrices through modifications of the experimental design (e.g., implementing the polarization technique of N17), changing the basis vectors by using more separable training sets (if possible), and lowering the noise level in the data by adding integration time. The first two of these strategies decrease parameter covariances, while the last changes the normalization of the modes, leading to lower errors with the same parameter covariances (see Section 3.3).
As presented in this paper, our method assumes that the posterior distribution on the coefficients of the basis functions is Gaussian. This imposes two requirements: (1) the likelihood function and, hence, all noise must be Gaussian and (2) the different components of the data must be combined by addition. If either of these conditions is violated (e.g., if an expansion matrix, Ψ_k, is parameter-dependent), then the posterior parameter distribution is not Gaussian and therefore must be explored with more sophisticated methods such as MCMC sampling. Such an exploration, however, could still be relatively fast because, although they may be combined in nonlinear ways, the models of the individual components are still linear. Under this variation of the method, model selection as in Section 2.3 can still be performed, but different expressions for the information criteria must be derived. This more general extraction will be implemented with an MCMC sampler in future versions of pylinex. In addition to generalizing our methods, in future analyses, we will include multiple Galaxy maps in the foreground training set and we will use calibration errors generated by a realistic instrument to generate an instrument training set.
As will be described in Paper II, we have also created a pipeline built around the methods described here to perform inference on the physical parameters of the 21 cm signal. After achieving a signal fit like those shown in Figure 7, we use it to initialize the walkers of an MCMC sampler tightly packed around or near the likelihood peak. In this MCMC, the 21 cm signal model using SVD basis functions is replaced with one that uses physical parameters. Since there are essentially no constraints on the majority of signal parameters and most realizations of the 21 cm global signal take a non-negligible amount of time to compute, such a first guess is important to avoid local minima and is very advantageous for the convergence of an MCMC chain.
Therefore, in this paper, we have presented methods that bypass the potential traps and wasted time of the initial phase in which an MCMC algorithm wanders through parameter space searching for features in the likelihood, skipping directly to the stage where the MCMC explores systematically around the likelihood peak. In a simplified yet general setting, we have shown that, with well-characterized training sets and a well-designed experiment, a wide variety of realistic 21 cm global signals can be extracted from simulated sky-averaged observations in the VHF range.
We thank Raul Monsalve, Jordan Mirocha, Richard Bradley, Bang Nhan, Licia Verde, and Nicholas Kern for useful discussions. This work was directly supported by the NASA Solar System Exploration Research Virtual Institute cooperative agreement number 80ARC017M0006. This work was also supported by grants from NASA's Ames Research Center (NNA09DB30A, NNX15AD20A, NNX16AF59G) and by a NASA ATP grant (NNX15AK80G). D.R. is supported by a NASA Postdoctoral Program Senior Fellowship at NASA's Ames Research Center, administered by the Universities Space Research Association under contract with NASA. Resources supporting this work were provided by the NASA High-End Computing (HEC) Program through the NASA Advanced Supercomputing (NAS) Division at Ames Research Center through the award SMD-16-7501.
Appendix A: Software Release
pylinex14 is a Python package implementing the methods presented in Section 2. It is split up into five modules, the most important of which are the expander, basis, and fitter.
The expander module contains classes that efficiently implement the expansion matrices. This is particularly useful when there is a large amount of data involved in the calculations, as storing all elements of an expansion matrix could be unnecessarily memory-intensive, especially when the matrix is sparse.
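The general idea of applying an expansion matrix implicitly, without materializing it, can be sketched with scipy's `LinearOperator`. This is not pylinex's actual expander API; the "repetition" expander below (stacking weighted copies of one spectrum, as when the same signal appears in several sky regions) is a hypothetical example.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator

def repetition_expander(nchan, weights):
    """Implicit expansion matrix mapping one length-nchan spectrum to
    len(weights) stacked, weighted copies, without storing the dense matrix.
    (Illustrative only; pylinex's expander classes have their own API.)"""
    nreg = len(weights)

    def matvec(v):
        # forward action: stack w_i * v for each region
        return np.concatenate([w * np.asarray(v).ravel() for w in weights])

    def rmatvec(u):
        # transpose action: weighted sum of the stacked blocks
        blocks = np.asarray(u).reshape(nreg, nchan)
        return sum(w * b for w, b in zip(weights, blocks))

    return LinearOperator((nreg * nchan, nchan), matvec=matvec,
                          rmatvec=rmatvec, dtype=float)

E = repetition_expander(4, [1.0, 0.7])
print(E.matvec(np.arange(4.0)))   # stacks v and 0.7 * v
```

Only the matrix's action (and its transpose's action) on vectors is stored, so memory scales with the number of data channels rather than with the number of matrix elements.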
The basis module stores classes representing many different types of basis functions, including simple polynomials, Legendre polynomials, Fourier series, and, most importantly for the current work, SVD basis sets.
The fitter module contains the main products of the software: objects that perform fits using the basis sets and expansion matrices from the other modules. The most useful and comprehensive class is the Extractor class, which takes as inputs training sets, data with errors, and expansion matrices and yields posterior channel means and errors for each data component as outputs.
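The Extractor-style workflow can be sketched end to end on invented toy data: derive SVD basis vectors from a training set, fit the data with an increasing number of modes, and keep the number minimizing an information criterion. This sketch is not the Extractor class itself, and it uses an AIC-like criterion, chi^2 + 2k, as a simplified stand-in for the criteria (such as the DIC) discussed in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
nchan, ncurves = 80, 500
x = np.linspace(0.0, 1.0, nchan)

# Toy training set spanned by four smooth shapes (so 4 modes suffice).
shapes = np.stack([np.ones(nchan), x, x ** 2, np.sin(2 * np.pi * x)], axis=1)
training = shapes @ rng.normal(size=(4, ncurves))

# SVD of the training set gives basis vectors ordered by importance.
U = np.linalg.svd(training, full_matrices=False)[0]

sigma = 0.1
truth = shapes @ np.array([1.0, -2.0, 0.5, 2.0])
y = truth + rng.normal(0.0, sigma, nchan)

# Fit with k = 1..10 modes and keep the k minimizing chi^2 + 2k,
# an AIC-like stand-in for the information criteria in the text.
best = None
for k in range(1, 11):
    F = U[:, :k]
    coeff = np.linalg.lstsq(F, y, rcond=None)[0]
    chi2 = np.sum(((y - F @ coeff) / sigma) ** 2)
    ic = chi2 + 2 * k
    if best is None or ic < best[0]:
        best = (ic, k)
print("selected number of modes:", best[1])
```

Too few modes leave a large chi^2 penalty; too many fit the noise, and the 2k term discourages them, so the criterion settles near the true rank of the training set.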
pylinex is compatible with both Python 2.7+ and Python 3.5+ and is implemented efficiently through use of the numpy, scipy, and matplotlib packages.15 Future versions of pylinex will implement extraction when the model is nonlinear and/or the likelihood function is non-Gaussian, as described in Section 4.
Appendix B: Weighted SVD Implementation
This section describes our implementation of weighted SVD using only standard SVD algorithms. By weighted SVD, we refer to the decomposition of a real matrix, $A$, of the form $A = U \Sigma V^T$, where $\Sigma$ is diagonal and of the same shape as $A$, $U^T C^{-1} U = I$, $V^T W V = I$, and $C$ and $W$ are symmetric positive definite matrices. If $A$ is a matrix with training set curves as columns, then $C$ represents the noise covariance used in fits and $W$ is a matrix that weights the different training set curves. To begin, with standard SVD, we decompose the matrix $C^{-1/2} A W^{1/2}$ into $\tilde{U} \tilde{\Sigma} \tilde{V}^T$, where $\tilde{\Sigma}$ is diagonal, $\tilde{U}^T \tilde{U} = I$, and $\tilde{V}^T \tilde{V} = I$. Solving for $A$, we find the weighted SVD of $A$,

$$A = \left(C^{1/2} \tilde{U}\right) \tilde{\Sigma} \left(W^{-1/2} \tilde{V}\right)^T.$$

Therefore, the matrices in the weighted decomposition are $U = C^{1/2} \tilde{U}$, $\Sigma = \tilde{\Sigma}$, and $V = W^{-1/2} \tilde{V}$. In most cases, $C$ and $W$ are diagonal and the matrix square root is equivalent to the element-wise square root.
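The reduction of weighted SVD to a standard SVD can be checked numerically. In this sketch the labels A (training-set matrix), C (noise covariance), and W (curve weights) are our own, and both weight matrices are taken diagonal so the matrix square root is element-wise.

```python
import numpy as np

rng = np.random.default_rng(2)
nc, nt = 30, 20  # data channels, training set curves

A = rng.normal(size=(nc, nt))              # training set curves as columns
C = np.diag(rng.uniform(0.5, 2.0, nc))     # noise covariance (diagonal)
W = np.diag(rng.uniform(0.5, 2.0, nt))     # training-curve weights (diagonal)

# Standard SVD of C^(-1/2) A W^(1/2); for diagonal C and W the matrix
# square root is just the element-wise square root.
Chalf, Whalf = np.sqrt(C), np.sqrt(W)
Ut, St, Vth = np.linalg.svd(np.linalg.inv(Chalf) @ A @ Whalf,
                            full_matrices=False)

# Matrices of the weighted decomposition A = U Sigma V^T.
U = Chalf @ Ut
Sigma = np.diag(St)
V = np.linalg.inv(Whalf) @ Vth.T

print(np.allclose(U @ Sigma @ V.T, A))                      # True
print(np.allclose(U.T @ np.linalg.inv(C) @ U, np.eye(nt)))  # True
print(np.allclose(V.T @ W @ V, np.eye(nt)))                 # True
```

The three checks confirm, respectively, that the product reproduces the original matrix and that the two weighted orthonormality conditions hold.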
Empirically, we find that the weighted SVD algorithm described above scales as , where Nt is the number of training set curves used and Nc is the number of data channels in each training set curve.
Appendix C: Priors from Training Sets
Under certain circumstances, it could be beneficial to include prior information on one or more of the data components derived from their training sets. This technique relies on a few key assumptions that limit its applicability. First, the training set must encompass the true multivariate distribution of the observed curves. Second, the prior must be Gaussian for the posterior to be Gaussian, which it must be for simple, analytical expressions to be written for the signal estimate and error. If either of these assumptions is invalid, priors will bias the results.
To generate priors from the training sets, each curve in the training set can be fitted with the SVD modes defined by that training set. Then, the sample mean, μ, and covariance, Λ, of the points defined by the coefficients of each fit can be found and used to define a multivariate Gaussian prior distribution. Defined in this way, μ and Λ are given by

$$\mu = \frac{1}{N_b} G \mathbf{1}, \qquad \Lambda = \frac{1}{N_b - 1} \left(G - \mu \mathbf{1}^T\right) \left(G - \mu \mathbf{1}^T\right)^T,$$

where $G$ is the matrix whose columns are the coefficient vectors of the fits and $\mathbf{1}$ is a column vector composed of $N_b$ ones. If η basis vectors are used (as in Section 2.1), the coefficient matrix is $G = \Sigma_\eta V_\eta^T$, where $\Sigma_\eta$ is the top left $\eta \times \eta$ block of $\Sigma$ and $V_\eta$ is the first η columns of $V$ (where $\Sigma$ and $V$ are defined as in Equation (13)). Including this prior in the fit, the posterior parameter covariance and parameter mean given by Equations 5(a) and (b) are modified to

$$S = \left(F^T C^{-1} F + \Lambda^{-1}\right)^{-1}, \qquad \xi = S \left(F^T C^{-1} y + \Lambda^{-1} \mu\right),$$

where $F$ is the basis matrix and $y$ is the data vector.
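The prior construction and the resulting posterior update can be sketched on a toy training set. The variable names below are illustrative; the basis has orthonormal columns, so the least-squares fit of each training curve reduces to a projection, and the update is the standard Bayesian linear-regression formula with a Gaussian prior.

```python
import numpy as np

rng = np.random.default_rng(3)
nchan, ncurves, eta = 60, 400, 3
x = np.linspace(0.0, 1.0, nchan)

# Toy training set of smooth curves (rank 3 by construction).
shapes = np.stack([np.ones(nchan), x, x ** 2], axis=1)
training = shapes @ (rng.normal(size=(3, ncurves)) +
                     np.array([[2.0], [0.0], [-1.0]]))

# SVD basis from the training set, truncated to eta modes.
F = np.linalg.svd(training, full_matrices=False)[0][:, :eta]

# Fit every training curve; the sample mean and covariance of the fit
# coefficients define the Gaussian prior N(mu, Lam).
coeffs = F.T @ training     # least-squares fits (F has orthonormal columns)
mu = coeffs.mean(axis=1)
Lam = np.cov(coeffs)

# Posterior update including the prior, for data y with noise
# covariance C = sigma^2 I (standard Bayesian linear regression).
sigma = 0.1
y = training[:, 0] + rng.normal(0.0, sigma, nchan)
Lam_inv = np.linalg.inv(Lam)
S = np.linalg.inv(F.T @ F / sigma ** 2 + Lam_inv)     # posterior covariance
xi = S @ (F.T @ y / sigma ** 2 + Lam_inv @ mu)        # posterior mean
print(xi)
```

Here the training set spans the observed curve, so the data dominate and the prior only mildly pulls the coefficients toward the training-set mean; a curve outside the training distribution would instead be biased, as cautioned above.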
Footnotes
- 5
For example, if the signal is expected to be $1 - \kappa$ times as strong in the second spectrum as in the first (e.g., if a fraction κ of the source of the signal is blocked in the second spectrum), then the signal expansion matrix is modified by stacking the identity matrix $I$ on top of $(1 - \kappa) I$, so that the same signal appears at full strength in the first spectrum and suppressed by $1 - \kappa$ in the second.
- 6
For instance, setting an expansion matrix equal to the Discrete Fourier Transform (DFT) matrix allows one to model the DFT of the corresponding data component.
- 7
If the data are complex, transpose operations in this paper should be replaced by Hermitian conjugates.
- 8
See Appendix C for both the assumptions required to include informative priors derived from the training sets, and the equations necessary to do so.
- 9
Note that even though such a prior is referred to as improper, i.e., it cannot be normalized, the posterior distribution is well-behaved.
- 10
The only exception to this statement occurs when two or more of the basis vectors are degenerate with one another.
- 11
We generally use .
- 12
- 13
As is true here, the maximum number of parameters considered for a given data component should be greater than or equal to the number of modes necessary to fit each curve in the corresponding training set down to the noise level of the data component, which is determined by the noise covariance matrix $C$. Recall that $C$ is the covariance of the noise, $n$, in the data (see Equation (1)).
- 14
- 15