
A MARKOV CHAIN MONTE CARLO ALGORITHM FOR ANALYSIS OF LOW SIGNAL-TO-NOISE COSMIC MICROWAVE BACKGROUND DATA


Published 2009 May 1 © 2009. The American Astronomical Society. All rights reserved.
Citation: J. B. Jewell et al. 2009, ApJ, 697, 258. DOI: 10.1088/0004-637X/697/1/258


ABSTRACT

We present a new Markov Chain Monte Carlo (MCMC) algorithm for cosmic microwave background (CMB) analysis in the low signal-to-noise regime. This method builds on and complements the previously described CMB Gibbs sampler, and effectively solves the low signal-to-noise inefficiency problem of the direct Gibbs sampler. The new algorithm is a simple Metropolis–Hastings sampler with a general proposal rule for the power spectrum, Cℓ, followed by a particular deterministic rescaling operation of the sky signal, s. The acceptance probability for this joint move depends on the sky map only through the difference of χ² between the original and proposed sky sample, and is close to unity in the low signal-to-noise regime. The algorithm is completed by alternating this move with a standard Gibbs move. Together, these two proposals constitute a computationally efficient algorithm for mapping out the full joint CMB posterior, in both the high and low signal-to-noise regimes.


1. INTRODUCTION

Since the detection of anisotropy in the cosmic microwave background (CMB; Smoot et al. 1992), there has been an emphasis on likelihood or Bayesian methods for the inference of cosmological parameters and their error bars, or more generally, their confidence intervals. CMB analysis is most suitably addressed in a Bayesian, as opposed to frequentist, framework, simply because the observed microwave sky is interpreted as a single realization of a spatial random process.

Early measurements of the CMB were limited to signal-to-noise ratios (S/Ns) of order unity at relatively low angular resolution, where direct evaluation of the likelihood for the power spectrum or cosmological parameters is possible. However, the ${\cal O}(N^{3})$ scaling of computational expense with pixel number N prohibits direct likelihood evaluation for current and future CMB observations. Motivated by the scientific potential of CMB data with increasingly high spatial resolution, yet beset with systematics including partial sky coverage and foregrounds, an iterative method of sampling from the Bayes posterior, using a special case of Markov Chain Monte Carlo (MCMC) known as Gibbs sampling, was introduced by Jewell et al. (2002, 2004). The method was later independently discovered and applied to COBE data (Wandelt et al. 2004), numerically extended to high resolution on the sphere (Eriksen et al. 2004), applied to an analysis of the Wilkinson Microwave Anisotropy Probe (WMAP; Bennett et al. 2003; Hinshaw et al. 2007; Page et al. 2007) data (O'Dwyer et al. 2004; Eriksen et al. 2007a, 2007b), as well as generalized to include inference of foreground model parameters (Eriksen et al. 2008a, 2008b).

While Gibbs sampling provably converges to the Bayes posterior over the entire range of angular scales probed by the data, the run time required to generate enough independent samples at the low signal to noise, small angular scale regime was found to be prohibitive (Eriksen et al. 2004). The reason for this is that typical variations in the power spectrum from one sample to the next are determined by cosmic variance alone, whereas the posterior itself is given by both cosmic variance and noise. This results in a long correlation length in the sequence of spectra in the low signal-to-noise regime, thus requiring a very long run time to generate a sufficient number of independent samples.

In this paper, we generalize the original Gibbs sampling algorithm to include a new type of MCMC step alternating with standard Gibbs sampling, which solves this problem of slow probabilistic convergence in the low signal-to-noise regime. This method therefore makes possible an exact Bayesian approach to CMB analysis over the entire range of angular scales probed by current and future experiments.

The paper is organized as follows. We first review the CMB Gibbs sampler and describe the associated numerical difficulties in analysis at small angular scales. We then introduce the new MCMC step to the Markov chain, designed specifically to allow large variations in the high-ℓ CMB spectrum, precisely where S/N ⩽ 1. We derive the required Metropolis–Hastings acceptance probability, and prove its correctness, in Appendix A, and numerically demonstrate the method in Section 4, for both temperature and polarization. Finally, we summarize and conclude in Section 5.

2. REVIEW OF GIBBS SAMPLING

2.1. The Joint Posterior

We begin by assuming that the observed data may be modeled by a signal and a noise term,

${\bf d} = {\bf A}\,{\bf s} + {\bf n} \qquad (1)$

where d is a vector containing the data (at every pointing of the detectors), the matrix A involves both pointing and beam convolution (for this paper we assume symmetric beams and neglect the details of this operation), and n is additive noise (here in the pixel domain). We assume both the CMB signal and noise to be Gaussian random fields with vanishing mean and covariance matrices S and N, respectively. In harmonic space, where ${\bf s} = \sum_{\ell m} a_{\ell m} Y_{\ell m}$, the CMB temperature covariance matrix is given by $\hbox{C}_{\ell m, \ell ^{\prime } m^{\prime }} = \langle a_{\ell m}^* a_{\ell ^{\prime } m^{\prime }}\rangle = C_{\ell } \delta _{\ell \ell ^{\prime }} \delta _{m m^{\prime }}$, $C_\ell$ being the angular power spectrum. A generalization to polarization merely requires the replacement of the signal matrix diagonal elements with 3 × 3 matrices of the form

${\bf S}_\ell = \begin{pmatrix} C_\ell^{TT} & C_\ell^{TE} & C_\ell^{TB} \\ C_\ell^{TE} & C_\ell^{EE} & C_\ell^{EB} \\ C_\ell^{TB} & C_\ell^{EB} & C_\ell^{BB} \end{pmatrix} \qquad (2)$

For the discussion in this section, we focus on the temperature case, but note that the generalization to polarization is straightforward and discussed by Larson et al. (2007).

Given these assumptions, our goal is to quantify what has been learned about the underlying power spectrum of the CMB given the data, or how well the data constrain the cosmological parameters. One proceeds then, in a Bayesian framework, by writing down the posterior given the data,

$P(C_\ell\,|\,{\bf d}) \propto \mathcal{L}({\bf d}\,|\,C_\ell)\; P(C_\ell) \qquad (3)$

Here, $\mathcal {L}({\bf d}| C_{\ell })$ is the likelihood and $P(C_\ell)$ is a prior on $C_\ell$.

In order to derive the functional form of the likelihood, one imagines randomly choosing any relevant model (here a power spectrum drawn from P(C)), and asks what sequence of effects needs to be modeled in order to simulate the data. Here, simulation is understood as conditioning on the chosen model, and leads to a joint density

$P(C_\ell, {\bf s}, {\bf d}) = P({\bf d}\,|\,{\bf s}, C_\ell)\; P({\bf s}\,|\,C_\ell)\; P(C_\ell)$
$\phantom{P(C_\ell, {\bf s}, {\bf d})} = P({\bf d}\,|\,{\bf s})\; P({\bf s}\,|\,C_\ell)\; P(C_\ell) \qquad (4)$

where the last line follows directly from our data model through the assumption of additive noise. Specifically, the factors in the above are

$P({\bf s}\,|\,C_\ell) \propto \frac{e^{-\frac{1}{2} {\bf s}^t {\bf S}^{-1} {\bf s}}}{\sqrt{|{\bf S}|}}, \qquad P({\bf d}\,|\,{\bf s}) \propto \frac{e^{-\frac{1}{2} ({\bf d} - {\bf A}{\bf s})^t {\bf N}^{-1} ({\bf d} - {\bf A}{\bf s})}}{\sqrt{|{\bf N}|}} \qquad (5)$

which follow from the assumption that both the signal and noise are independent Gaussian processes.

The idea of a "simulation chain" provides a conceptually clear approach to constructing a joint density, from which we immediately have the Bayesian posterior

$P(C_\ell, {\bf s}\,|\,{\bf d}) \propto P({\bf d}\,|\,{\bf s})\; P({\bf s}\,|\,C_\ell)\; P(C_\ell) \qquad (6)$

The relevance of the above for this paper lies in relating what we refer to as the joint posterior, P(C, s|d), and the more familiar likelihood $\mathcal {L}({\bf d}| C_{\ell }) \propto P(C_{\ell } | {\bf d}) / P(C_{\ell })$.

Although we can analytically compute the integral of the joint posterior over the signal for the Gaussian signal and noise processes considered here, and therefore simply write down the functional form of the likelihood, it is too expensive to evaluate it for any specified Cℓ given high-resolution data. Furthermore, for more complicated data models (i.e., including foreground model uncertainties) we will not be able to perform the integrals over the additional degrees of freedom. Both situations then instead motivate sampling from the joint posterior, and thereby generating samples from P(Cℓ|d) without ever evaluating P(Cℓ|d). We now discuss the original Gibbs sampling approach proposed and implemented by Jewell et al. (2004), Wandelt et al. (2004), and Eriksen et al. (2004), and then introduce a new MCMC step which directly addresses the previously reported slow probabilistic convergence in the low signal-to-noise regime (Eriksen et al. 2004).

2.2. The CMB Gibbs Sampler

As stated above, our goal is to sample from the joint posterior

$P(C_\ell, {\bf s}\,|\,{\bf d}) \propto \frac{e^{-\frac{1}{2}\chi^2}}{\sqrt{|{\bf N}|}}\; \frac{e^{-\frac{1}{2} {\bf s}^t {\bf S}^{-1} {\bf s}}}{\sqrt{|{\bf S}|}}\; P(C_\ell) \qquad (7)$

For notational convenience, we have here dropped constant factors of 2π, and also defined

$\chi^2 = ({\bf d} - {\bf A}{\bf s})^t\, {\bf N}^{-1}\, ({\bf d} - {\bf A}{\bf s}) \qquad (8)$

One approach to sample from this posterior is to use an algorithm known as Gibbs sampling, where we can alternately sample from the respective conditional densities,

${\bf s}^{i+1} \leftarrow P({\bf s}\,|\,C_\ell^{i}, {\bf d}) \qquad (9)$
$C_\ell^{i+1} \leftarrow P(C_\ell\,|\,{\bf s}^{i+1}, {\bf d}) \qquad (10)$

Here, ← indicates sampling from the distribution on the right-hand side. After some burn-in period, during which all samples must be discarded, the joint samples (s^i, C^i) will be drawn from the desired density. Thus, the problem is reduced to that of sampling from the two conditional densities P(s|C, d) and P(C|s, d).
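As a concrete illustration, the Gibbs alternation of Equation (9) can be sketched in a toy single-multipole model. This is a hypothetical setup (all numbers invented), using the standard forms of the two conditional draws described in the following subsections: a Gaussian draw about the Wiener-filtered data for s, and an inverse-gamma draw for Cℓ given the observed power.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-multipole model (all numbers invented): d = s + n with
# 2l+1 harmonic modes, true power Cl_true, and uniform noise power Nl.
ell, Nl, Cl_true = 10, 1.0, 4.0
nmodes = 2 * ell + 1
d = rng.normal(0, np.sqrt(Cl_true), nmodes) + rng.normal(0, np.sqrt(Nl), nmodes)

def sample_s(Cl):
    """Draw s from P(s | Cl, d): Gaussian about the Wiener-filtered data."""
    var = 1.0 / (1.0 / Cl + 1.0 / Nl)        # (S^-1 + N^-1)^-1, per mode
    mean = var * d / Nl                      # Wiener filter applied to d
    return mean + rng.normal(0, np.sqrt(var), nmodes)

def sample_Cl(s):
    """Draw Cl from P(Cl | s): inverse-gamma via a scaled chi^2 draw."""
    sigma = np.mean(s**2)                    # observed power sigma_l of s
    return nmodes * sigma / rng.chisquare(nmodes - 2)

# The Gibbs alternation: s and Cl are drawn in turn from their conditionals
Cl, chain = 1.0, []
for _ in range(2000):
    s = sample_s(Cl)
    Cl = sample_Cl(s)
    chain.append(Cl)
chain = np.array(chain[200:])                # discard burn-in samples
```

In the real analysis each of these draws involves full sky maps rather than a single multipole, but the alternation has exactly this structure.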

We now describe the sampling algorithms for each of these two conditional distributions, starting with P(C|s, d). First, note that P(C|s, d) = P(C|s) which follows directly from the construction of the joint density of "everything" above. This is also intuitively easy to understand since if we already know the CMB sky signal, the data themselves tell us nothing new about the CMB power spectrum. Next, since the sky is assumed to be Gaussian and isotropic, the distribution reads

$P(C_\ell\,|\,{\bf s}) \propto \prod_\ell \frac{e^{-\frac{2\ell+1}{2}\, \frac{\sigma_\ell}{C_\ell}}}{C_\ell^{\frac{2\ell+1}{2}}}\; P(C_\ell) \qquad (11)$

which, when interpreted as a function of C, is known as the inverse Gamma distribution. In this expression, $\sigma _{\ell } = \frac{1}{2\ell +1} \sum _{m} |a_{\ell m}|^2$ denotes the observed power spectrum of s. Fortunately, there exists a simple textbook sampling algorithm for this distribution (e.g., Gupta & Nagar 2000), and we refer the interested reader to the previous papers for details. For an alternative, and more flexible, sampling algorithm, see Eriksen & Wehus (2009).

In order to describe the sky signal sampling step, we first define the mean-field map (or Wiener-filtered data) to be $\hat{{\bf s}} = ({\bf S}^{-1} + {\bf N}^{-1})^{-1} {\bf N}^{-1} {\bf d}$, and note that the conditional sky signal density given the data and Cℓ can be written as

$P({\bf s}\,|\,C_\ell, {\bf d}) \propto e^{-\frac{1}{2} ({\bf s} - \hat{{\bf s}})^t ({\bf S}^{-1} + {\bf N}^{-1}) ({\bf s} - \hat{{\bf s}})} \qquad (12)$

Thus, P(s|C, d) is a Gaussian distribution with mean $\hat{{\bf s}}$ and covariance matrix (S−1 + N−1)−1.

Sampling from this Gaussian distribution is straightforward, but computationally somewhat cumbersome. First, draw two random white noise maps ω0 and ω1 with zero mean and unit variance. Then solve the equation

$({\bf S}^{-1} + {\bf N}^{-1})\, {\bf s} = {\bf N}^{-1} {\bf d} + {\bf S}^{-\frac{1}{2}} \omega_0 + {\bf N}^{-\frac{1}{2}} \omega_1 \qquad (13)$

for s. Since the white noise maps have zero mean, one immediately sees that $\langle {\bf s}\rangle = \hat{{\bf s}}$, while a few more calculations show that 〈sst〉 = (S−1 + N−1)−1.

The problematic part of this sampling step is the solution of the linear system in Equation (13). Since this is a ∼10^6 × 10^6 system for current CMB data sets, it cannot be solved by brute force. Instead, one must use a method called Conjugate Gradients (CG), which only requires multiplication by the coefficient matrix on the left-hand side, not its inversion. For details on these computations, together with some ideas on preconditioning, see Eriksen et al. (2004).
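To make the CG step concrete, here is a minimal, self-contained sketch (our own illustration, with toy diagonal covariances and invented values) that solves the linear system of Equation (13) using only products with the coefficient matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64                                    # number of (toy) harmonic modes

# Hypothetical diagonal covariances: signal power S and noise power N per mode
S = 1.0 / (1.0 + np.arange(n))            # falling "power spectrum"
N = np.full(n, 0.1)
d = rng.normal(0, np.sqrt(S + N))         # simulated data, one value per mode

# Right-hand side of the constrained-realization equation:
# (S^-1 + N^-1) s = N^-1 d + S^-1/2 w0 + N^-1/2 w1
w0, w1 = rng.normal(size=n), rng.normal(size=n)
A_diag = 1.0 / S + 1.0 / N                # coefficient "matrix" (diagonal here)
b = d / N + w0 / np.sqrt(S) + w1 / np.sqrt(N)

def cg_solve(apply_A, b, tol=1e-10, maxiter=500):
    """Plain conjugate gradients: needs only products with A, never A^-1."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Ap = apply_A(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

s = cg_solve(lambda v: A_diag * v, b)
```

In the real problem the matrix product involves spherical harmonic transforms and is never formed explicitly; the diagonal toy is used here only so the answer can be checked against the direct solution b / A_diag.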

2.3. Convergence Issues in the Low Signal-to-Noise Regime

As originally applied to high-resolution CMB data, the Gibbs sampling algorithm as described above has very slow convergence at the high-ℓ, low signal-to-noise part of the spectrum. The reason for the slow convergence is easy to understand in light of the above: when sampling from P(C|s), the typical step size is given by cosmic variance at all angular scales. In the high signal-to-noise regime, cosmic variance dominates the noise variance, and we are able to explore the full width of the posterior in only a few Gibbs iterations. However, in the low signal-to-noise end, cosmic variance is far smaller than the posterior variance, and it takes a prohibitively long time to converge probabilistically. This problem of "slow mixing" of the Gibbs sampler is illustrated in Figures 1 and 2. The long correlation length starting at signal to noise of unity leads to extremely long run times in order to produce a reasonable number of uncorrelated samples.


Figure 1. Comparison of C chains produced by standard Gibbs sampling (black) and by the Gibbs+MCMC hybrid (red) for three selected multipole bins. The simulation was based on full sky coverage and uniform noise. See the text for full details.


Figure 2. Comparison of chain correlation functions for standard Gibbs sampling (blue) and Gibbs+MCMC (red), computed from the full-sky uniform noise temperature data set. Note that while the correlation length goes to infinity with increasing ℓ (or equivalently, low signal to noise) for standard Gibbs sampling, it is ≲40 everywhere for the MCMC hybrid case.


3. A LOW SIGNAL-TO-NOISE MCMC SAMPLER

When sampling from the true posterior, the goal is to produce as many independent samples from P(C, s|d) as possible. One might intuitively guess that it should be straightforward to establish good approximations to the posterior in the low signal-to-noise regime, since in the limit of vanishing signal to noise we simply recover the prior. This suggests that we look for a sampling scheme in which we first sample a new spectrum from some approximation to the true posterior, independently of the current spectrum and CMB map, followed by sampling the CMB map from the conditional P(s|C, d). The problem with such a direct scheme is that the accept probability will involve a ratio of determinants which are too expensive to compute.

We are therefore motivated to look for a sampling scheme in which we can make a large variation in C in the low signal-to-noise regime, and make an associated deterministic change in the CMB map, while still maintaining a reasonably high acceptance rate. The motivation for a deterministic change is that it will avoid introducing ratios of determinants which we cannot compute.

3.1. Proposal Rule and Acceptance Probability

Assume that we have defined a deterministic sampling scheme for s, and that our new CMB map is given by some function

${\bf s}^{\prime} = F(C_\ell^{\prime}, C_\ell)\; {\bf s} \qquad (14)$

Then the condition of detailed balance for our MCMC sampler requires that

$F^{-1}(C_\ell^{\prime}, C_\ell) = F(C_\ell, C_\ell^{\prime}) \qquad (15)$

or, in other words, that the inverse function is given by exchanging the order of the spectra in the function F. One simple function which has this property is

$F(C_\ell^{\prime}, C_\ell) = \left(\frac{C_\ell^{\prime}}{C_\ell}\right)^{\frac{1}{2}}, \quad \mathrm{i.e.,} \quad a_{\ell m} \rightarrow \sqrt{C_\ell^{\prime}/C_\ell}\; a_{\ell m} \qquad (16)$

The total proposal matrix is then

$w(C_\ell^{\prime}, {\bf s}^{\prime}\,|\,C_\ell, {\bf s}, {\bf d}) = w(C_\ell^{\prime}\,|\,C_\ell, {\bf d})\; \delta\!\left({\bf s}^{\prime} - F(C_\ell^{\prime}, C_\ell)\,{\bf s}\right),$

and the "reverse" proposal is

$w(C_\ell, {\bf s}\,|\,C_\ell^{\prime}, {\bf s}^{\prime}, {\bf d}) = w(C_\ell\,|\,C_\ell^{\prime}, {\bf d})\; \delta\!\left({\bf s} - F(C_\ell, C_\ell^{\prime})\,{\bf s}^{\prime}\right).$
The condition of detailed balance including deterministic moves requires the consideration of some technical points which we leave to Appendix A. There we show that the full Metropolis–Hastings accept probability reads

$A = \min\left[1,\; \frac{w(C_\ell\,|\,C_\ell^{\prime}, {\bf d})}{w(C_\ell^{\prime}\,|\,C_\ell, {\bf d})}\; e^{-\frac{1}{2}\left[\chi^{2}({\bf s}^{\prime}) - \chi^{2}({\bf s})\right]}\right] \qquad (17)$

The significance of the above is that we can make relatively large changes to the power spectrum in the low signal-to-noise regime, where N−1 is getting small, since the χ2 is affected only very mildly by changes in any low signal-to-noise mode.
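The deterministic rescaling of Equation (16), and the detailed-balance requirement that swapping the two spectra inverts it, are easy to verify numerically. A minimal sketch with invented spectra:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy harmonic coefficients a_lm for a few multipoles (hypothetical setup)
ells = np.array([2, 2, 2, 3, 3])          # multipole of each coefficient
a_lm = rng.normal(size=ells.size)

def rescale(a_lm, ells, C_old, C_new):
    """Deterministic move of Equation (16): a_lm -> sqrt(C_new/C_old) a_lm."""
    return a_lm * np.sqrt(C_new[ells] / C_old[ells])

C_old = np.array([0, 0, 2.0, 1.0])        # indexed by ell (first two unused)
C_new = np.array([0, 0, 0.5, 3.0])

a_prop = rescale(a_lm, ells, C_old, C_new)
a_back = rescale(a_prop, ells, C_new, C_old)   # swapped spectra: inverse map
```

Applying the move with the two spectra exchanged returns the original map exactly, which is the property demanded by Equation (15).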

We note the interesting point (discussed more completely in Appendix B) that if one changes variables in the joint posterior from CMB maps, s, to whitened maps, ${\bf x}= {\bf C}_{\ell }^{-\frac{1}{2}} {\bf s}$, and then Gibbs sample in the new variables (C, x), the resulting accept probability for changes in Cl conditioned on x is numerically identical to the above. While at first there might seem to be a conceptual distinction between MCMC schemes in a change of variables or those involving partial deterministic proposals, it turns out that this numerical equivalence holds more generally, and in Appendix B we establish a relation between both approaches. We bring this point to the attention of the reader as there could be other deterministic proposal schemes or another change of variables which lead to improvements over the approach presented in this paper.

For the numerical demonstration of the MCMC algorithm presented in this paper, we use a simple symmetric Gaussian proposal, truncated at C > 0 (or, for polarization, the region where the resulting CMB covariance matrix is positive definite), for the power spectrum,

$w(C_\ell^{\prime}\,|\,C_\ell) \propto e^{-\frac{(C_\ell^{\prime} - C_\ell)^2}{2\tau_\ell^2}}, \qquad C_\ell^{\prime} > 0 \qquad (18)$

where τ is a measure of the typical step size taken between two samples. Note that because this proposal density is symmetric, the ratio of C proposals cancels, and the acceptance probability is entirely determined by the change in χ2.

It should be noted that while the above MCMC step satisfies detailed balance, it is not irreducible: there is not a nonvanishing probability of reaching any state from any other state in a finite number of MCMC steps, because the phases of the sky map are unchanged in each MCMC step. However, alternating these steps with a traditional Gibbs sampling step gives a combined "two-step" MCMC algorithm which is irreducible, and which therefore provably converges to the joint posterior. Once again, the details are left to the Appendices for the interested reader.
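Putting the pieces together, a toy single-multipole version of the joint (Cℓ, s) move might look as follows (all numbers invented; note that this move alone leaves the phases unchanged, so a real analysis alternates it with a Gibbs step):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy model at S/N ~ 1 (all numbers hypothetical): one multipole with
# 2l+1 modes, signal power Cl_true and uniform noise power Nl.
ell, Cl_true, Nl = 50, 4.0, 4.0
nmodes = 2 * ell + 1
d = rng.normal(0, np.sqrt(Cl_true + Nl), nmodes)

def chi2(a):
    return np.sum((d - a) ** 2) / Nl

def joint_move(Cl, a, tau=0.5):
    """One Metropolis move: Gaussian Cl proposal (truncated at Cl > 0),
    deterministic rescaling of the map, acceptance on the chi^2 change."""
    Cl_prop = Cl + tau * rng.normal()
    if Cl_prop <= 0:
        return Cl, a, False
    a_prop = a * np.sqrt(Cl_prop / Cl)         # deterministic rescaling
    if rng.random() < min(1.0, np.exp(0.5 * (chi2(a) - chi2(a_prop)))):
        return Cl_prop, a_prop, True
    return Cl, a, False

# Start from the Wiener-filtered map as a convenient (hypothetical) state
Cl, a = 1.0, d * Cl_true / (Cl_true + Nl)
chain, accepted = [], 0
for _ in range(4000):
    Cl, a, ok = joint_move(Cl, a)
    chain.append(Cl)
    accepted += ok
chain = np.array(chain)
accept_rate = accepted / 4000
```

Because the proposal in Cℓ is symmetric, the acceptance test reduces to the χ² difference alone, exactly as described in the text.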

3.2. Optimization of the MCMC Sampler

A general advantage of the Gibbs sampler is that it is free of tunable efficiency parameters. The same is not true for the Metropolis–Hastings MCMC algorithm; for satisfactory sampling performance, it typically has to be tuned quite extensively. In this section, we describe three specific features that help in this task, namely (1) step size tuning, (2) slice sampling, and (3) binning.

First, we have to ensure that the step size of our Gaussian proposal density roughly matches the width of the target distribution, in order to maintain both a reasonable acceptance rate and high mobility (for a general discussion of these issues and technical considerations of how to tune proposals, we refer the interested reader to Roberts & Rosenthal 2001). We do this by performing an initial test run, typically producing a few hundred Cℓ samples, and computing the standard deviation of these samples for each ℓ. These are then adopted as the proposal widths for the main run, scaled by some number less than unity, typically between 0.05 and 0.5. For the initial test run, we approximate the posterior width by the noise variance alone,

$\sigma(C_\ell) \simeq \sqrt{\frac{2}{2\ell+1}}\; \frac{N_\ell}{b_\ell^{2}} \qquad (19)$

because the MCMC sampler is used only in the low signal-to-noise regime. In this expression, Nℓ is the power spectrum of the instrumental noise alone, and bℓ is the product of the Legendre transform of the beam and the HEALPix pixel window function.
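As an illustration of this tuning rule, the following sketch evaluates the noise-only width of Equation (19) for an invented white-noise level and Gaussian beam (none of these numbers come from the paper):

```python
import numpy as np

ells = np.arange(2, 1001)

# Hypothetical instrument model: white noise and a Gaussian beam (made up)
sigma0 = 40e-6                                    # noise RMS per pixel [K]
pix_rad = np.radians(7.0 / 60)                    # toy pixel size
Nl = np.full(ells.size, (sigma0 * pix_rad) ** 2)  # flat noise power spectrum
fwhm_rad = np.radians(0.35)                       # toy beam FWHM
bl = np.exp(-0.5 * ells * (ells + 1) * (fwhm_rad / 2.355) ** 2)

# Noise-only approximation to the posterior width (Equation (19))
tau = np.sqrt(2.0 / (2 * ells + 1)) * Nl / bl**2

# Scale by a tuning factor below unity before use as proposal widths
tau_proposal = 0.3 * tau
```

The beam suppression makes the width grow rapidly at high ℓ, which is why the proposal must be tuned per multipole rather than globally.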

Next, the Metropolis–Hastings MCMC algorithm becomes inefficient when too many free parameters are proposed jointly. For this reason, we divide the power spectrum coefficients, C, into subsets, each typically containing only 10–20 multipoles. We then propose changes to one subset at a time, while keeping all other multipoles fixed. Finally, we loop over subsets, and thus effectively implement a multipole slice Gibbs sampler for the full power spectrum.

This is computationally feasible, because a single MCMC proposal only requires a single χ2 evaluation, which has a computational cost of a single spherical harmonic transform. Since drawing a full sky map from P(s|C, d) in the classical Gibbs sampling step requires $\mathcal {O}(10^{2})$ spherical harmonic transforms, we can indeed afford to perform many MCMC proposals for each Gibbs step, without dominating the total cost.

Nevertheless, for very high resolution analysis, it is often beneficial to bin several C together, both to increase the signal to noise of the joint coefficient and to decrease the number of parameters that need to be sampled by MCMC. We implement this by defining a new binned spectrum, weighted by ℓ(ℓ + 1)/2π, as follows:

$C_b = \frac{1}{N_b} \sum_{\ell \in b} \frac{\ell(\ell+1)}{2\pi}\, C_\ell \qquad (20)$

Here, b = [ℓmin, ℓmax] denotes the current bin, and Nb = ℓmax − ℓmin + 1 is the number of multipoles within the bin. These new (and fewer) coefficients are then sampled with the above MCMC sampler, after which the original spectrum coefficients are given by

$C_\ell = \frac{2\pi}{\ell(\ell+1)}\, C_b, \qquad \ell \in b \qquad (21)$
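The binning of Equation (20) and the inverse mapping of Equation (21) can be sketched as follows (the helper functions are our own, not from any released code):

```python
import numpy as np

def bin_spectrum(Cl, lmin, lmax):
    """Equation (20): average of l(l+1)Cl/2pi over the bin b = [lmin, lmax]."""
    ells = np.arange(lmin, lmax + 1)
    return np.mean(ells * (ells + 1) * Cl[ells] / (2 * np.pi))

def unbin_spectrum(Cb, lmin, lmax):
    """Equation (21): map a binned amplitude back to Cl inside the bin."""
    ells = np.arange(lmin, lmax + 1)
    return 2 * np.pi * Cb / (ells * (ells + 1))

# Round trip on a spectrum that is flat in l(l+1)Cl/2pi (amplitude 5.0)
ells = np.arange(0, 200)
Cl = np.zeros(200)
Cl[2:] = 2 * np.pi * 5.0 / (ells[2:] * (ells[2:] + 1))

Cb = bin_spectrum(Cl, 100, 120)
Cl_back = unbin_spectrum(Cb, 100, 120)
```

For a spectrum that is flat in ℓ(ℓ + 1)Cℓ/2π, the binned amplitude simply reproduces that flat level, and unbinning recovers the original coefficients within the bin.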

4. TESTING AND VALIDATION

We have implemented the new sampling step described above in the previously described Gibbs sampling code called "Commander" (Eriksen et al. 2004, 2008b), and in this section we demonstrate its advantages over the old sampling algorithm. We consider two different cases, namely high-ℓ temperature and low-ℓ polarization analysis. For the temperature case, we analyze data both with and without a sky cut: the full-sky case allows us to verify the results against an analytically known answer, while the cut-sky case demonstrates that the sky cut does not degrade the sampling efficiency.

4.1. Temperature Analysis

The high-ℓ temperature simulation is designed to mimic the five-year WMAP temperature data (Hinshaw et al. 2009) with one exception, namely that the noise is assumed spatially uniform, in order to facilitate analytic comparison. Specifically, the CMB realization was drawn from the best-fit ΛCDM model derived from WMAP alone (Komatsu et al. 2009), including multipoles up to ℓmax = 1000, and then smoothed with the instrumental beam of the WMAP V1 differencing assembly, and pixelized at HEALPix resolution Nside = 512. Finally, uniform noise of σ0 = 40 μK RMS was added to each pixel. This corresponds to an S/N of unity at ℓ ∼ 550, roughly similar to the five-year WMAP data. We analyze this simulation both with and without the WMAP KQ85 sky cut (Gold et al. 2009).

In both analyses, we adopted the Gaussian proposal density with tuned variances, as described above. We also bin the power spectrum in progressively wider bins, starting at ℓ = 600, to maintain a reasonable signal to noise per sampled power spectrum parameter. Ten bins were sampled jointly per proposal, while all others were kept fixed.

In the full-sky case, we produced a total of 31,800 samples over 60 chains, and in the cut-sky case a total of 6800 samples. The cost for producing one sample in the latter, and by far most expensive, set was 2.5 CPU hours, for a total of 17,000 CPU hours. The number of MCMC steps per Gibbs step was one in the former case and 20 in the latter. (Since the signal sampler dominates the cut-sky Gibbs chain, one can perform more low-S/N steps without slowing down the overall code significantly.) In addition to these two main sample sets, we also produced two longer chains of 3500 samples each for the full-sky case, with and without the new MCMC step turned on, in order to compare the Markov chain correlation lengths before and after including the MCMC sampler.

We first consider the full-sky data set, and in Figure 1 we show a segment of each of the two longer chains for three selected multipole bins. The top panel shows ℓ = 600, which is the first bin to be sampled by MCMC, the middle panel shows ℓ = 732–742, where there is still some signal in the data, and, finally, the bottom panel shows ℓ = 855–1000, which is strongly noise dominated. Starting with the top panel, we see that the red curve (Gibbs+MCMC) scatters significantly faster than the black curve (Gibbs only), implying more efficient sampling. This trend becomes even stronger with lower signal to noise, until the last case, where the Gibbs-only chain essentially does not move at all, while the MCMC sampler does probe the full range. Note, however, that even the MCMC sampler has a significant correlation length in this range, and this implies that there is still some room for improvement to be made in defining our proposals.

Next, these considerations are quantified in Figure 2, where we plot the Markov chain correlation function as a function of distance in the chain, for six bins with and without the MCMC sampler. As first reported by Eriksen et al. (2004), we see that the Gibbs-only correlation length increases dramatically with decreasing signal to noise, rendering the algorithm essentially useless in this regime. However, we also see that the new MCMC step effectively resolves this issue, as the correlation length (here defined as the distance at which the correlation drops below 0.2) is now less than ∼40 steps. This is a dramatic improvement, and makes the algorithm useful even in this range. Nevertheless, we once again point out that it is possible to make further improvements by establishing better proposal densities.

In Figure 3, we consider the convergence properties of the ∼30,000-sample set by computing the Gelman–Rubin statistic R (Gelman & Rubin 1992) as a function of ℓ. Typically, one recommends that R should be less than, say, 1.2 in order to claim convergence. We see that this holds everywhere for this sample set, and typically R is even less than 1.05. Note also the step at ℓ = 600, showing clearly the beneficial effect of the MCMC sampler.
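For reference, a minimal version of the Gelman–Rubin statistic used for such convergence checks (our own sketch, following the standard split into within- and between-chain variance, not the Commander implementation):

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin R for an array of chains with shape (n_chains, n_samples)."""
    m, n = chains.shape
    means = chains.mean(axis=1)
    B = n * means.var(ddof=1)               # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    var_plus = (n - 1) / n * W + B / n      # pooled variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(4)
# Well-mixed chains drawn from one distribution should give R close to 1
good = rng.normal(size=(4, 2000))
# Chains stuck at different levels should give R well above 1
bad = good + np.arange(4)[:, None]
```

With R below the conventional threshold of 1.2 (and ideally near 1.0), the chains are considered converged.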


Figure 3. Gelman–Rubin statistic for the full-sky, uniform noise temperature analysis. Note the feature at ℓ = 600, which marks the transition between standard Gibbs sampling and Gibbs+MCMC.


Next, in Figure 4, we compare the marginal distributions derived from this sample set with the analytic result,


Figure 4. High-ℓ temperature marginal posteriors computed with Gibbs+MCMC from the full-sky, uniform noise temperature data set, compared to analytic results.


$P(C_b\,|\,{\bf d}) \propto \prod_{\ell \in b} \frac{e^{-\frac{2\ell+1}{2}\, \frac{\sigma_\ell^{S+N}}{C_\ell b_\ell^{2} + N_\ell}}}{\left(C_\ell b_\ell^{2} + N_\ell\right)^{\frac{2\ell+1}{2}}} \qquad (22)$

Here, b = [ℓmin, ℓmax] indicates a given multipole bin, bℓ denotes the product of the instrumental beam and the HEALPix pixel window, and σℓ^{S+N} is the power spectrum of the noisy data map. We see that the new algorithm reproduces the analytic distributions very well, and this verifies the overall method.

Finally, the cut-sky power spectrum with one-sigma confidence regions is shown in three panels in Figure 5, focusing on different ℓ-ranges, namely all ℓ, the S/N ∼ 1 transition region, and the low S/N region. This completes the high-ℓ temperature analysis validation.


Figure 5. Temperature power spectrum estimated from cut-sky temperature data. The panels show the same spectrum, but emphasizing different multipole ranges (full-range; S/N ∼ 1 transition region; and high-ℓ, low S/N).


4.2. Polarization Analysis

We now consider polarization analysis, and construct a new low-ℓ simulation for this purpose. This simulation does not mimic any planned experiment, but is rather designed to highlight the analysis method itself. Specifically, we drew a new CMB realization from the best-fit WMAP ΛCDM spectrum that includes a nonzero tensor contribution, including multipoles up to ℓmax = 150, and convolved this with a 3° FWHM Gaussian beam, and pixelized it at Nside = 64. Uniform noise of 5 μK RMS was added to the temperature component, and 1 μK RMS to the polarization components. The five-year WMAP polarization sky mask was imposed on the data.

We allowed for nonzero CTT, CTE, CEE, and CBB spectra, but fixed CTB = CEB = 0. These spectra were then individually binned to maintain a reasonable signal to noise per bin. (Details on how to introduce individual binning of each power spectrum were recently described by Eriksen & Wehus 2009.) Again, a tuned Gaussian proposal density was used in the MCMC step. A total of 12,000 samples were produced over 12 chains, and the CPU time per sample was 55 s, for a total of ∼200 CPU hours.

In Figure 6, we show one C chain for each of the four sampled spectra, for the last (and therefore most difficult) bin in each case. Note that the CEE and CBB spectra have essentially vanishing signal to noise, and these chains therefore extend down to zero. Clearly, the mixing properties of these chains are satisfactory, and the correlation lengths are quite short.


Figure 6. C chains generated by Gibbs+MCMC hybrid for the cut-sky polarization data set. Only the highest multipole bin for each spectrum is shown (ℓ = 108–150 for TT, ℓ = 88–150 for TE, ℓ = 101–150 for EE, and ℓ = 61–150 for BB).


In Figure 7, we show the Gelman–Rubin statistics for each of the four power spectra, and with the single exception of the very last bin of CEE, all R values are well below 1.1. Thus, all spectra have converged well everywhere.


Figure 7. Gelman–Rubin statistic for cut-sky polarization analysis.


Finally, in Figure 8, we show the reconstructed marginal power spectra for each polarization component, overplotted on the input spectrum. The agreement is very good. Note, however, that these spectra are direct marginals, and not a joint maximum likelihood estimate. They are therefore not individually unbiased estimators. In particular, the marginal CEE power spectrum is biased slightly high because of the combination of the CTT CEE − (CTE)² > 0 positivity constraint and the relatively low signal to noise. Consideration of the joint polarization posterior, which is an unbiased estimator, is postponed to a future publication.


Figure 8. Marginal C power spectra (red curves) estimated from cut-sky polarization data. Gray bands indicate 68% confidence regions, and the black lines show the input spectrum. (Note that the marginal spectra shown here are not individually unbiased estimators because of the correlations between TT, TE, and EE. Proper treatment of the full joint polarization density will be considered separately in a future publication.)


5. CONCLUSIONS

We have presented a new MCMC algorithm for the high-ℓ, low signal-to-noise limit of the joint posterior which solves the slow probabilistic convergence of the traditional Gibbs sampler in this regime. This, in principle, allows sampling from the joint posterior P(Cℓ, s|d) over the entire range of angular scales probed by current and future CMB experiments. The limiting computational burden is now entirely in the map-making step of Gibbs sampling, for which the cost per Gibbs iteration scales with the expense of multiplication by the inverse noise matrix N−1. Assuming pixel-uncorrelated (but scan-weighted) noise as a good approximation at small angular scales, the cost of an N−1 multiplication is that of a forward and inverse spherical harmonic transform, or ${\cal O}\big(\ell _{{\rm max}}^{3}\big)$. Future work will attempt to push the generalized Gibbs+MCMC sampling scheme presented here to smaller angular scales, ultimately limited by the degree to which we can compute harmonic transforms.

We acknowledge use of the HEALPix software (Górski et al. 2005) and analysis package for deriving the results in this paper. H.K.E. acknowledges financial support from the Research Council of Norway. The research described in this paper was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.

APPENDIX A: INCLUDING DETERMINISTIC PROPOSALS IN MCMC

Here we review the derivation of the accept probability in MCMC when using deterministic proposals (or proposals in which some of the degrees of freedom are specified as deterministic functions of the past state and/or of proposed variations in some other degrees of freedom). We first briefly review the Metropolis–Hastings MCMC algorithm and the proof of its convergence, and then turn to the special case involving deterministic proposals. Much of the review of the MCMC algorithm here follows Sokal (1989). We also note that similar technical considerations involving deterministic elements in proposals are presented by Green (1995), in the context of MCMC algorithms in which the dimension of the state space itself is included as a random variable to be sampled over.

The goal is the construction of a transition matrix T(Cℓ, s|C'ℓ, s', d) such that, after initializing the Markov chain with a sample from any probability density p0(Cℓ, s|d), we generate samples from a sequence of probability densities

Equation (A1)

$$p_{n+1}(C_\ell, s\,|\,d) = \int dC_\ell'\, ds'\; T(C_\ell, s\,|\,C_\ell', s', d)\; p_n(C_\ell', s'\,|\,d)$$

which eventually converge to an equilibrium density π(Cl, s|d)

Equation (A2)

$$\lim_{n\to\infty} p_n(C_\ell, s\,|\,d) = \pi(C_\ell, s\,|\,d)$$

We remind the reader of the sufficient conditions to establish convergence of an MCMC algorithm: stationarity, which means that the MCMC transition matrix satisfies

Equation (A3)

$$\int dC_\ell'\, ds'\; T(C_\ell, s\,|\,C_\ell', s', d)\; \pi(C_\ell', s'\,|\,d) = \pi(C_\ell, s\,|\,d)$$

and irreducibility, which means that for any two states, there is a finite number of iterations giving a nonvanishing probability of transitioning from one state to the other. It is well known that these two properties are sufficient to establish convergence, as can be seen simply from the triangle inequality

$$\int dC_\ell\, ds\; \big|p_{n+1}(C_\ell, s|d) - \pi(C_\ell, s|d)\big| \;\le\; \int dC_\ell\, ds\; \big|p_n(C_\ell, s|d) - \pi(C_\ell, s|d)\big|$$

i.e., the total variation distance to the equilibrium density is nonincreasing under the transition matrix.

The Metropolis–Hastings MCMC algorithm is one method of constructing such a transition matrix. We choose any proposal matrix w(Cl, s|C'l, s', d) and then accept the proposed move with a probability

Equation (A4)

$$A(C_\ell', s'\,|\,C_\ell, s) = \min\left\{1,\; \frac{\pi(C_\ell', s'\,|\,d)\; w(C_\ell, s\,|\,C_\ell', s', d)}{\pi(C_\ell, s\,|\,d)\; w(C_\ell', s'\,|\,C_\ell, s, d)}\right\}$$

while rejecting the proposed move with probability 1 − A leads to a "null transition" where the next state in the Markov Chain remains the same. Application of this algorithm then leads to the sequence of probability densities which satisfy

Equation (A5)

$$p_{n+1}(C_\ell, s|d) = p_n(C_\ell, s|d)\left[1 - \int dC_\ell'\, ds'\; A(C_\ell', s'|C_\ell, s)\, w(C_\ell', s'|C_\ell, s, d)\right] + \int dC_\ell'\, ds'\; A(C_\ell, s|C_\ell', s')\, w(C_\ell, s|C_\ell', s', d)\, p_n(C_\ell', s'|d)$$

where the first term is the contribution to the probability density pn+1 if we reject any proposed move, while the second term is the contribution from accepting the proposed move from any possible previous state. If we demand that, for a chosen proposal matrix, the accept probability satisfies

Equation (A6)

$$A(C_\ell', s'|C_\ell, s)\; w(C_\ell', s'|C_\ell, s, d)\; \pi(C_\ell, s|d) = A(C_\ell, s|C_\ell', s')\; w(C_\ell, s|C_\ell', s', d)\; \pi(C_\ell', s'|d)$$

then we see that the Metropolis–Hastings MCMC algorithm satisfies stationarity, i.e., denoting by T°π the density resulting from one application of the transition matrix to π, we have directly from detailed balance

Equation (A7)

$$(T\circ\pi)(C_\ell, s\,|\,d) = \pi(C_\ell, s\,|\,d)$$
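As a minimal illustration of this construction, the following sketch runs Metropolis–Hastings with a symmetric Gaussian proposal against a one-dimensional Gaussian target (a toy stand-in for π; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(x):
    # Unnormalized log-density of the equilibrium distribution pi (here N(0, 1)).
    return -0.5 * x * x

def mh_step(x, step=1.0):
    """One Metropolis-Hastings transition with a symmetric Gaussian proposal w."""
    x_prop = x + step * rng.normal()
    # Accept with probability min(1, pi(x')/pi(x)); the symmetric w cancels.
    if np.log(rng.random()) < log_target(x_prop) - log_target(x):
        return x_prop
    return x  # null transition: reject and keep the current state

x, samples = 0.0, []
for _ in range(20000):
    x = mh_step(x)
    samples.append(x)
samples = np.array(samples)
print(f"sample mean {samples.mean():.2f}, variance {samples.var():.2f}")
```

The chain of samples converges to draws from the target, illustrating the stationarity and irreducibility conditions discussed above.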

We now turn to the case where our proposal is of the form

Equation (A8)

$$w(C_\ell', s'\,|\,C_\ell, s, d) = \delta\big[s' - F(s, C_\ell', C_\ell)\big]\; w(C_\ell'\,|\,C_\ell, d)$$

where we randomly propose a new power spectrum, possibly in a manner conditionally dependent on the current spectrum and the data, and then deterministically compute a new CMB map with some function

Equation (A9)

$$s' = F(s, C_\ell', C_\ell)$$

To satisfy detailed balance with a nonvanishing accept probability, our function must satisfy

Equation (A10)

$$F\big(F(s, C_\ell', C_\ell),\, C_\ell,\, C_\ell'\big) = s$$

or, that the inverse function is equivalent to interchanging the order of the power spectrum arguments

Equation (A11)

$$F^{-1}(s, C_\ell', C_\ell) = F(s, C_\ell, C_\ell')$$

In this paper, we have chosen one such function, given by

Equation (A12)

$$F(s, C_\ell', C_\ell) = [C']^{1/2}\,[C]^{-1/2}\, s$$

where interchanging the spectra in the function above does in fact give the inverse function itself.
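For diagonal covariances, this involution property is easy to verify numerically; the sketch below uses made-up variances:

```python
import numpy as np

rng = np.random.default_rng(2)

def F(s, Ca, Cb):
    # Deterministic rescaling F(s, Ca, Cb) = [Ca]^{1/2} [Cb]^{-1/2} s
    # for diagonal covariances Ca, Cb (one variance per mode).
    return np.sqrt(Ca / Cb) * s

C, C_new = rng.uniform(0.5, 2.0, 10), rng.uniform(0.5, 2.0, 10)
s = rng.normal(size=10)

s_prop = F(s, C_new, C)        # forward move C -> C_new
s_back = F(s_prop, C, C_new)   # same function with spectra interchanged
assert np.allclose(s_back, s)  # interchanging the arguments inverts F
print("F with interchanged spectra is the inverse of F")
```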

Our job now is to derive the accept probability such that we satisfy stationarity (as discussed above). For the proposal with deterministic changes to some of the degrees of freedom, stationarity is satisfied if

Equation (A13)

In order to determine the integral over the δ-function in the accept term above, we recall the identity for δ[G(x)], where G(a) = 0,

Equation (A13)

$$\delta\big[G(x)\big] = \frac{\delta(x - a)}{\big|\det\, \partial G/\partial x\big|_{x=a}}$$

In our case, we can identify

Equation (A14)

which vanishes at F−1(s, Cl, C'l) = F(s, C'l, Cl). We also have the Jacobian

Equation (A15)

(i.e., G(s') is considered a function of s' with the other CMB map s considered fixed) which therefore gives

Equation (A16)

Inserting this into the condition for stationarity, we have

where in the second line we again used the property that the inverse F−1 is equivalent to F with the spectra arguments interchanged. We see from the above that a sufficient condition for stationarity is

Equation (A17)

An accept probability which satisfies this condition therefore gives cancellation of the integrals over the δ-functions for both the reject and accept contributions, leaving us exactly with T°π = π. We therefore have the accept probability

Equation (A18)

$$A = \min\left\{1,\; \frac{\pi(C_\ell', s'\,|\,d)\; w(C_\ell\,|\,C_\ell', d)}{\pi(C_\ell, s\,|\,d)\; w(C_\ell'\,|\,C_\ell, d)}\;\left|\det\frac{\partial F(s', C_\ell, C_\ell')}{\partial s'}\right|^{-1}\right\}$$

We give the expression above for the general case of any deterministic change in the CMB map with a function which satisfies F(s, Cl, C'l) = F−1(s, C'l, Cl). We now explicitly evaluate this accept probability for the functional form chosen for this paper.

Since we have F(s′, Cℓ, C′ℓ) = [C]^{1/2}[C′]^{−1/2} s′, we have

Equation (A19)

$$\left|\det\frac{\partial F(s', C_\ell, C_\ell')}{\partial s'}\right| = \frac{|C|^{1/2}}{|C'|^{1/2}}$$

Reminding the reader of the functional form of the joint posterior in Equation (7), we have the accept probability given by

Equation (A20)

$$A = \min\left\{1,\; \frac{w(C_\ell\,|\,C_\ell', d)}{w(C_\ell'\,|\,C_\ell, d)}\; e^{-\frac{1}{2}\left[\chi^2(s') - \chi^2(s)\right]}\right\}$$

where the last line follows from the invariance of the quadratic form under the functional mapping, s′^†[C′]^{−1}s′ = s^†C^{−1}s. Finally, we note that for the special case of a symmetric proposal matrix, where w(C′ℓ|Cℓ, d) = w(Cℓ|C′ℓ, d), the accept probability is completely determined by the (exponentiated) change in χ²

Equation (A21)

$$A = \min\left\{1,\; e^{-\frac{1}{2}\left[\chi^2(s') - \chi^2(s)\right]}\right\},\qquad \chi^2(s) = (d - s)^\dagger N^{-1}(d - s)$$

As emphasized earlier in the main text, the above allows large changes to the spectrum precisely where the signal-to-noise ratio is small, since χ² does not change much in this regime.
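This behavior can be illustrated numerically in a toy diagonal model (illustrative values, not the paper's data): for a fixed fractional change in C, the quantity min{1, exp(−½Δχ²)} stays near unity when noise dominates and collapses when signal dominates.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 1000  # number of independent modes

def mean_accept(C, Nvar, factor=1.1, trials=200):
    """Average min(1, exp(-0.5 * dchi2)) for the rescaling move C -> factor*C,
    with s drawn from its conditional posterior given simulated data d."""
    acc = []
    for _ in range(trials):
        s_true = rng.normal(0.0, np.sqrt(C), k)
        d = s_true + rng.normal(0.0, np.sqrt(Nvar), k)
        var = 1.0 / (1.0 / C + 1.0 / Nvar)
        s = rng.normal(var * d / Nvar, np.sqrt(var))   # posterior draw of s
        s_prop = np.sqrt(factor) * s                   # deterministic rescaling
        dchi2 = (np.sum((d - s_prop) ** 2) - np.sum((d - s) ** 2)) / Nvar
        acc.append(min(1.0, np.exp(-0.5 * dchi2)))
    return float(np.mean(acc))

low = mean_accept(C=1.0, Nvar=100.0)   # low S/N: chi^2 barely changes
high = mean_accept(C=100.0, Nvar=1.0)  # high S/N: chi^2 changes enormously
print(f"mean accept, low S/N: {low:.2f}; high S/N: {high:.2e}")
```

A 10% change in C is routinely accepted when the noise dominates, but essentially never when the signal dominates, which is why the rescaling move complements, rather than replaces, the Gibbs move.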

APPENDIX B: RELATION TO GIBBS SAMPLING IN A CHANGE OF VARIABLES

We note here another interesting approach: an MCMC algorithm in a different set of variables, which also allows large moves in the spectrum in the low signal-to-noise regime. We define the CMB map

Equation (B1)

$$x = C_\ell^{-1/2}\, s$$

We therefore have the joint posterior in the new variables according to

Equation (B2)

$$p(C_\ell, x\,|\,d) = p(C_\ell, s\,|\,d)\,\left|\det\frac{\partial s}{\partial x}\right| = p\big(C_\ell, C^{1/2}x\,\big|\,d\big)\;|C|^{1/2}$$

which is explicitly, up to a normalization constant

Equation (B3)

$$p(C_\ell, x\,|\,d) \propto \exp\left[-\frac{1}{2}\,x^\dagger x\right]\exp\left[-\frac{1}{2}\big(d - C^{1/2}x\big)^\dagger N^{-1}\big(d - C^{1/2}x\big)\right]$$

Traditional Gibbs sampling in the new variables then leads to the following accept probability for a change to the spectrum given the change-of-variables map x:

Equation (B4)

$$A = \min\left\{1,\; \frac{w(C_\ell\,|\,C_\ell', x, d)\; p(C_\ell'\,|\,x, d)}{w(C_\ell'\,|\,C_\ell, x, d)\; p(C_\ell\,|\,x, d)}\right\}$$

where the proposed variation in the spectrum may now be conditionally dependent on the current change-of-variables map x. Assuming a symmetric proposal, or one conditionally independent of x, leads to an accept probability which is numerically the same as (A20), and which has the same property: large moves in the spectrum are possible in the low signal-to-noise regime. As a side note, log p(Cℓ|x, d) is quadratic in Cℓ^{1/2}, which suggests a proposal given by a Gaussian in Cℓ^{1/2}. However, there are two problems with this scheme: sampling in Cℓ^{1/2} reintroduces a Jacobian factor given by the ratio |C′|^{1/2}/|C|^{1/2}, which typically results in low acceptance probabilities, and we cannot afford to compute the local "Fisher" covariance matrix exactly for each x. Because of these difficulties, we in general need to produce a proposal for Cℓ and then compute the accept probability above.
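The numerical agreement with (A20) can be checked directly in a toy diagonal model (all values illustrative): for a symmetric proposal, the conditional accept ratio in the new variables equals the Δχ² factor obtained from the deterministic rescaling of s = C^{1/2}x.

```python
import numpy as np

rng = np.random.default_rng(4)
k = 20
Nvar = rng.uniform(1.0, 4.0, k)   # diagonal noise covariance
d = rng.normal(0.0, 2.0, k)       # toy data vector
x = rng.normal(size=k)            # change-of-variables map x
C, C_new = 1.3, 2.1               # current / proposed signal variance

def log_p_C_given_x(Cval):
    # log p(C_l | x, d) up to a constant, from the posterior in (C_l, x);
    # the prior term exp(-x^T x / 2) does not depend on C_l and drops out.
    r = d - np.sqrt(Cval) * x
    return -0.5 * np.sum(r**2 / Nvar)

# Accept ratio for a symmetric proposal in the new variables ...
ratio_new_vars = np.exp(log_p_C_given_x(C_new) - log_p_C_given_x(C))

# ... equals exp(-0.5 * dchi2) for the deterministic rescaling of s = C^{1/2} x.
s, s_new = np.sqrt(C) * x, np.sqrt(C_new) * x
chi2 = lambda sv: np.sum((d - sv)**2 / Nvar)
ratio_rescaling = np.exp(-0.5 * (chi2(s_new) - chi2(s)))

assert np.allclose(ratio_new_vars, ratio_rescaling)
print("conditional move in x-variables matches the deterministic rescaling move")
```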

We now explore the relation between MCMC steps which involve deterministically changing some of the degrees of freedom in the original variables and conditional steps in a new set of variables. It turns out that a change of variables can always be used to generate MCMC steps in the original variables where some of the degrees of freedom are changed deterministically, and where the accept probabilities are numerically equivalent.

For notational convenience, we will assume the state space is separated into two sets of variables (x, y), i.e., for the CMB sampling context we have (s, Cl). Now, consider a "global" change of variables of the form

Equation (B5)

$$u = F(x, y), \qquad v = y$$

with Jacobian given in block form (see, e.g., Gantmakher 1959)

Equation (B6)

$$\frac{\partial(u, v)}{\partial(x, y)} = \begin{pmatrix}\dfrac{\partial F}{\partial x} & \dfrac{\partial F}{\partial y}\\[4pt] 0 & 1\end{pmatrix}, \qquad \left|\det\frac{\partial(u, v)}{\partial(x, y)}\right| = \left|\det\frac{\partial F}{\partial x}\right|$$

A Gibbs sampling step varying v with u fixed has accept probability

where in the last line we have expressed the numerical value of the accept probability in terms of the original variables (x, y), and where in the above we have the constraint

Equation (B7)

$$F(x', y') = F(x, y)$$

Now consider an MCMC step in the original variables of the form

Equation (B8)

$$w(x', y'\,|\,x, y, d) = \delta\big[x' - H(x, y', y)\big]\; w(y'\,|\,y, d)$$

with general accept probability, according to the discussion above

Equation (B9)

Interestingly, this suggests that we can set H to be the function

Equation (B10)

$$H(x, y', y) = F^{-1}\big(F(x, y),\, y'\big)$$

Does this function have the correct properties for its inverse? Assuming we have computed the forward direction x′ = H(x, y_{n+1}, y_n), we can invert to find x by computing sequentially

Equation (B11)

where the last line follows from definition of the forward H. Since we have, by definition

we therefore have shown that

Equation (B12)

$$H\big(H(x, y', y),\, y,\, y'\big) = x$$

as required for a nonvanishing accept probability. To numerically evaluate the accept probability involving the deterministic move, we need to compute the Jacobian of H(x, y_n, y_{n+1}) = F^{−1}(F(x, y_{n+1}), y_n), which is

Substituting into the accept probability in the original variables with the deterministic move gives

Equation (63)

But under the change of variables v = y, and provided that the ratio of proposal densities is numerically the same

Equation (B13)

the accept probability using a deterministic change in x is numerically equivalent to the accept probability for the move in the new variables holding u fixed! We therefore see a numerical equivalence in the accept probability between a conditional step in a change of variables and an associated move in the original variables, in which x is deterministically changed to hold F(x, y) constant. For the case studied in this paper, we have explicitly

Equation (B14)

$$F(s, C_\ell) = C_\ell^{-1/2}\, s \quad\Longrightarrow\quad H(s, C_\ell', C_\ell) = [C']^{1/2}\,[C]^{-1/2}\, s$$

which is exactly the functional form used for the deterministic MCMC steps.
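A quick numerical check of this correspondence in a toy diagonal model (illustrative values): composing F^{−1}(F(s, C), C′) with F(s, C) = C^{−1/2}s reproduces the rescaling [C′]^{1/2}[C]^{−1/2}s.

```python
import numpy as np

rng = np.random.default_rng(5)
k = 8
s = rng.normal(size=k)
C, C_new = rng.uniform(0.5, 2.0, k), rng.uniform(0.5, 2.0, k)

def F(s, C):
    # Change of variables x = C^{-1/2} s (diagonal C).
    return s / np.sqrt(C)

def F_inv(x, C):
    return np.sqrt(C) * x

# Deterministic move holding u = F(s, C) fixed while C -> C_new:
s_new = F_inv(F(s, C), C_new)

# This is exactly the rescaling used for the deterministic MCMC move:
assert np.allclose(s_new, np.sqrt(C_new / C) * s)
print("holding C^{-1/2} s fixed reproduces the deterministic rescaling")
```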
