CMB-S4: Forecasting Constraints on Primordial Gravitational Waves

CMB-S4---the next-generation ground-based cosmic microwave background (CMB) experiment---is set to significantly advance the sensitivity of CMB measurements and enhance our understanding of the origin and evolution of the Universe, from the highest energies at the dawn of time through the growth of structure to the present day. Among the science cases pursued with CMB-S4, the quest for detecting primordial gravitational waves is a central driver of the experimental design. This work details the development of a forecasting framework that includes a power-spectrum-based semi-analytic projection tool, targeted explicitly towards optimizing constraints on the tensor-to-scalar ratio, $r$, in the presence of Galactic foregrounds and gravitational lensing of the CMB. This framework is unique in its direct use of information from the achieved performance of current Stage 2--3 CMB experiments to robustly forecast the science reach of upcoming CMB-polarization endeavors. The methodology allows for rapid iteration over experimental configurations and offers a flexible way to optimize the design of future experiments given a desired scientific goal. To form a closed-loop process, we couple this semi-analytic tool with map-based validation studies, which allow for the injection of additional complexity and verification of our forecasts with several independent analysis methods. We document multiple rounds of forecasts for CMB-S4 using this process and the resulting establishment of the current reference design of the primordial gravitational-wave component of the Stage-4 experiment, optimized to achieve our science goals of detecting primordial gravitational waves for $r>0.003$ at greater than $5\sigma$, or, in the absence of a detection, of reaching an upper limit of $r<0.001$ at $95\%$ CL.


INTRODUCTION
Determining the origin of structure in the Universe is one of the most important open problems in cosmology.CMB anisotropies sourced by early-Universe density perturbations are currently the most powerful observational probe of the earliest mechanisms of structure formation.It is possible that the same processes that produced the density perturbations also sourced tensor perturbations, or primordial gravitational waves (PGWs).If this is the case, detecting a PGW signal would yield insight into physics far earlier than the epoch of recombination, and allow us to build an unprecedented understanding of the earliest moments of time.
PGWs leave imprints on the polarization of the CMB (Seljak & Zaldarriaga 1997;Kamionkowski et al. 1997;Hu & White 1997).In particular, the sensitivity of CMB measurements to gravitational waves arises from the generation of polarization at the surface of last scattering: to first order, scalar perturbations produce only even-parity E -mode polarization, while tensor perturbations produce odd-parity B -mode polarization as well.Thus, a measurement of primordial B -mode polarization in the CMB, parametrized by the tensor-to-scalar ratio r, is a direct measurement of the amplitude of tensor perturbations.Detecting r has profound implications for high-energy physics and the quantum nature of gravity (Krauss & Wilczek 2014), and the potential to shed light on the mechanism that produced these primordial perturbations.
Cosmic inflation is our current leading paradigm for what occurred in the very early Universe.It was first put forward to explain the lack of observed magnetic monopoles and to solve the flatness and horizon problems (Starobinsky 1980;Kazanas 1980;Sato 1981;Guth 1981;Linde 1982Linde , 1983;;Albrecht & Steinhardt 1982) and has since been an active field of research.The theory describes a period of exponential expansion in which quantum fluctuations are magnified to cosmic size and become the seeds for all structure in the Universe (Mukhanov & Chibisov 1981, 1982;Hawking 1982;Guth & Pi 1982;Starobinsky 1982; Bardeen et al. 1986;Mukhanov 1985).In addition to the production of PGWs (for a recent review, see Kamionkowski & Kovetz 2016), inflation makes several predictions, most of which-superhorizon fluctuations, Gaussian perturbations, adiabatic fluctuations, spatial flatness, and a nearly scale invariant scalar spectral tilt-have been confirmed, most recently by the Planck collaboration (Planck Collaboration X 2018).
There are currently a number of ground-based experiments measuring the CMB polarization to high precision on a range of scales, and attempting to constrain the tensor-to-scalar ratio, including ACT (Aiola et al. 2020), BICEP/Keck (BICEP2/Keck Array Collabora-tions X 2018), CLASS (Harrington et al. 2016), POLAR-BEAR/Simons Array (Suzuki et al. 2016;Hasegawa et al. 2018), and SPT (Bender et al. 2018;Sayre et al. 2020), with Simons Observatory to follow soon (Ade et al. 2019).Additionally, there are current and future balloon and satellite missions such as SPIDER (Gualtieri et al. 2018) and LiteBIRD (Hazumi et al. 2019), which we expect to complement ground-based measurements.The current best constraints are r(k s = 0.05 Mpc −1 , n t = 0) < 0.06 at 95% confidence (BICEP2/Keck Array Collaborations X 2018), where k s is the scalar pivot scale and n t is the spectral index of the tensor modes.Ongoing efforts in both the Atacama desert and at the South Pole, between now and the start of CMB-S4, will steadily improve these constraints while continuing to prove the methodologies on which CMB-S4 will rely.
CMB-S4, anticipated to start observations in 2027, is intended to be the definitive ground-based CMB polarization experiment.It is designed to cross critical thresholds in constraining the B -mode polarization signature of primordial gravitational waves and in sensitivity to new light relics, while also improving our understanding of the nature of dark energy and General Relativity on large scales (Abazajian et al. 2016).To achieve these goals requires a significant increase in sensitivity, from 2-4 × 10 4 detectors in Stage 3 experiments to roughly 5 × 10 5 detectors, and an unparalleled control over other sources of signal (e.g., Galactic foregrounds, gravitational lensing, etc.) and of systematics.Therefore, CMB-S4 will require telescopes at multiple frequencies, each with a maximally outfitted focal plane of pixels utilizing superconducting, photon-noise-limited detectors, and likely novel analysis techniques.To understand the optimal design for achieving the desired science goals, in particular focusing on primordial gravitational waves, we present the development of the CMB-S4 r forecasting framework and its application towards determining the CMB-S4 baseline r survey.
The sensitivity achieved by a CMB experiment, which observes for a given number of years with a given number of detectors, is subject to a number of efficiency factors.These include the fraction of detectors that are actually functional, the achieved sensitivity per detector versus model prediction, the fraction of days per year spent observing, the fraction of observing time spent "on field", and the fraction of data passing weather and other cuts.As an illustrative example, if each of the above efficiency factors were 0.65 then the product is 0.65 5 ∼ 0.1.Forecasts for the sensitivity to r, where excess low frequency noise or systematic contamination can lead to additional (potentially large) sensitivity loss, are particularly challenging.Most previous forecasts of the sensitivity of CMB experiments to r have been ab initio-

DC Map Synthesis
Section 3 Figure 1.Schematic representation of the CMB-S4 r forecasting loop.Green boxes represent inputs, purple boxes represent outputs, yellow boxes represent large code frameworks, and gray boxes represent iterable code modules.For each stage of the loop we identify the sections of this paper in which more detail is available.We start with the achieved performance of Stage 3 datasets, in the form of full covariance matrices and noise spectra, and a set of scalable instrument specifications, as well as a fiducial sky model.These are fed as inputs to the semi-analytic optimization framework, yielding an optimized detector allocation and a baseline survey definition.Based on this definition, we develop standard data challenge (DC) noise maps, as well as a suite of signal maps with various degrees of complexity.We proceed by analyzing these maps with multiple independent component-separation analysis methods, and check for parameter recovery and the presence of biases.If the results suggest a necessary change in survey definition, or additional studies are required, the process is iterated as needed.
requiring assumptions as to the value of all efficiency factors.Possibly due to the human tendency towards optimism, many of these forecasts have not in practice been achieved once the data have been taken and analyzed: ABS (Essinger-Hileman 2011; Kusaka et al. 2018); BI-CEP/Keck (Hivon & Kamionkowski 2002;Yoon et al. 2006;Nguyen et al. 2008;BICEP2/Keck Array Collaborations VI 2016); EBEX (Reichborn-Kjennerud et al. 2010;Abitbol et al. 2018); Planck (Planck Collaboration 2006; Planck Collaboration X 2018); POLARBEAR (Lee et al. 2008;Kermish et al. 2012;Polarbear Collaboration et al. 2019); QUIET (Lawrence et al. 2004;Araujo et al. 2012); SPIDER (Montroy et al. 2006;Fraisse et al. 2013;Gualtieri et al. 2018);SPTpol (McMahon et al. 2009;Austermann et al. 2012;Sayre et al. 2020).Therefore for CMB-S4 we take an alternate approach, scaling from the overall performance achieved in the best available existing experimental results.We scale directly from published B -mode noise spectra and bandpower covariance matrices derived from multi-year maps that have passed systematics null tests.This bypasses the need for an unbiased, individual accounting of the various efficiency factors, and naturally incorporates all effects which impact real-world experiments.This also differentiates our current approach from forecasts other groups have made for CMB-S4 prior to the formal existence of our collaboration (Wu et al. 2014;Errard et al. 2016;Barron et al. 2018).

The CMB-S4 r Forecasting Loop
The CMB-S4 baseline r survey, described below, has been continuously evolving by translating science requirements to measurement and instrument requirements, based on our understanding of the impact of astrophysical foregrounds, instrumental systematics, delensing non-idealities, and analysis methodology.To maintain forecasting realism as complexity increases, our general approach has been an iterative one.We rely on a closed forecasting loop, presented in Figure 1, to tie the semi-analytic tools (which allow for fast optimizations) to map-based studies (which can include multiple layers of additional complexity).To establish our measurement requirements and the baseline experiment configurations that can achieve them, we perform multiple passes through this loop.In the figure, the section number accompanying each stage of the loop indicates the paper section in which that stage is described in detail.
The main steps describing this process are as follows.
1. Develop a semi-analytic power-spectrum-level forecast, assuming noise performance that is scaled from analyses of real experiments.
2. Use this forecasting tool to optimize the allocation of detector effort across observing frequencies, de-termining certain "checkpoints" in survey definition space.
3. Use the checkpoint configurations to create standardized, version-numbered map-based data challenges (DCs) for validation.
4. Estimate science parameters from the DC maps with independent component-separation analysis methods.
5. Check that independent analyses show recovery of science parameters from these challenge maps that match analytic forecasts, either in terms of variance or bias.If they do not, we revise the forecasts accordingly.
6. Iterate steps  Developing the forecasting machinery for CMB-S4, and increasing the robustness and realism of its results, has been an ongoing, ever-evolving, group effort, performed under the auspices of the CMB-S4 r Forecasting Working Group.This work has undergone three major iterations, with results presented in the CMB-S4 Science Book (Abazajian et al. 2016), the CMB-S4 Concept Definition Task force (CDT) Report (Lawrence et al. 2017), and most recently in the CMB-S4 Reference Design Report (Abazajian et al. 2019).
Each stage in the evolution of this framework has been collectively considered, and carefully documented in the CMB-S4 Simulation and Forecasting Logbook. 1 The resulting baseline survey definitions have been translated to publicly available, version-numbered map-based Data Challenges (DCs).2Thus far, two independent groups have participated in testing the strengths and shortcomings of different analysis methods on these simulations, as well as improving the simulations themselves, as described in Sections 5 and 6.
Given the difficulty in describing a continuouslyevolving process, for this paper we present several snapshots of our forecasting work and describe the relevant details.Sections 2 to 6 represent a full pass through the forecasting loop, as presented in Figure 1.In Sections 2 and 3 we present the full semi-analytic forecasting framework and optimization process, as used in the CMB-S4 CDT Report.In Section 4 we discuss the creation of the DC4 simulation suite, corresponding to the baseline presented in that report.Sections 5 and 6 describe two independent analysis methods and the results obtained from applying these methods to DC4.In Section 7 we describe several recent updates to the framework and the resulting findings presented in the CMB-S4 Reference Design Report.We conclude this paper with a discussion of the path forward.

SEMI-ANALYTIC FORECASTING FRAMEWORK
For the CMB-S4 Science Book (Abazajian et al. 2016), we developed a semi-analytic forecasting framework specifically targeted towards optimizing sensitivity to the tensor-to-scalar ratio, r, in the presence of Galactic foregrounds and gravitational lensing of the CMB.Currently, only small aperture telescopes (SATs) have reached the level of systematics control and noise performance necessary to pursue a ground-based, high-precision measurement of B -mode polarization down to low multipoles ( 30), targeting the 80 peak from the polarization signature generated by PGWs at the epoch of recombination.Therefore, to forecast the performance of next-generation SATs, this machinery is based on scaling the bandpower covariance matrices (BPCMs) and noise power spectra (N ) of current published SAT analyses, such as those from BICEP/Keck (BICEP2/Keck Array Collaborations VI 2016; BICEP2/Keck Array Collaborations X 2018).This automatically builds into the forecast all real-world inefficiencies including (but not limited to): imperfect detector yield; non-uniform detector performance; read-out noise; observing inefficiency; losses due to timestream filtering; beam smoothing; and nonuniform sky coverage.
At its core, this code is based on the BICEP/Keck parametric power-spectrum-based likelihood analysis.Such types of parametric analyses have also been extensively used by the Planck collaboration for ≥ 50 (Planck Collaboration XV 2014;Planck Collaboration XI 2016;Planck Collaboration V 2019).We validate this approach using map-based simulations as part of the forecasting loop and present detailed results in Section 6.Our confidence in the projections is grounded in the connection to achieved performance and published results.
Figure 2 presents a schematic representation of the semi-analytic forecasting framework, identifying the user inputs, code modules, and outputs.The subsections that follow describe this framework in detail.

Fisher Formalism
Given a likelihood function of the form where d are the data bandpowers, θ are the theory parameters, and µ(θ) and Σ(θ) are the bandpower expectation values and the bandpower covariance matrix given the parameters, we can calculate the expectation value of the log-likelihood curvature, evaluated at the position of the best fit model: This quantity, called the Fisher information matrix, measures how steeply the likelihood falls as we move away from the best-fit model, and F −1 can be thought of as the best possible covariance matrix for the measurement errors on the parameters θ i .It can be shown that  (F −1 ) ii is the minimum obtainable standard deviation on the desired parameters (e.g., Cramér 1946;Kendall 1979;Tegmark et al. 1997).
Inserting Equation 1 into Equation 2 yields We then calculate our parameter constraints as In all the projections below, for each fiducial model considered, we choose to fix the covariance matrix with respect to the theory parameters, i.e., Σ(θ) = Σ, making the second term of Equation 3 identically zero.Equation 3 provides a clear picture of how the construction of the covariance matrix Σ impacts the final constraints, and how its misestimation could lead to constraints that are far too optimistic.It is with this in mind that we have decided to compute our BPCMs by directly scaling the values in achieved covariance matrices (see Section 2.5).

Forecasting Inputs
In this section we briefly describe the set of inputs to the forecasting code, shown as green boxes in Figure 1.We discuss the key inputs in more detail in Sections 2.3 and 2.4.
Achieved Performance: The code takes signal and noise simulations of the BICEP/Keck datasets (BI-CEP2/Keck Array Collaborations VI 2016; BI-CEP2/Keck Array Collaborations X 2018) and derives the bandpower covariance matrix and the ensembleaveraged signal and noise bandpowers.These inputs contain information about the actual on-sky map noise achieved from multiple receivers, over multiple years, at 95, 150, and 220 GHz, including all real-world penalties.A more detailed description of these simulations is available in Appendix H of BICEP2/Keck Array Collaborations X (2018).Similar information from another experiment could easily be substituted.To project the performance of CMB-S4 channels, we assume that we can scale down the achieved noise based on increased detector count and integration time and that we can apply beam-size and noise-equivalent temperature (NET) rescalings to account for the differences in experimental design.
Scalable Instrument Specification: To specify the forecast instrument we start by selecting a set of observing frequency bands, with bandpass functions describing the response of each band.Then, for each observing band, we must provide the beam size, number of detectors, and ideal per-detector NET.The last two items can be used to make an idealized calculation of the instrument sensitivity in each band.We emphasize that we use these ideal performance numbers only for scaling between frequency bands, by comparing to similarly-calculated ideal sensitivities of BICEP/Keck.The scaling factors are ultimately applied to the achieved sensitivities, as described in Section 2.5, to obtain performance-based sensitivities in our desired bands.We note that for the frequency bands in which we do not currently have existing data, we extrapolate from the closest frequency for which we do.
Fiducial Sky Model : A fiducial parametric model describing the foreground and CMB signal.Our standard model, which has 14 parameters, is discussed in Section 2.4 and in Appendix A.
Priors: If we have external prior knowledge for a given parameter θ i , we can introduce this information by adding P i = 1/σ 2 i to the diagonal of the Fisher matrix, where σ i is the width of the prior.

CMB-S4 Scalable Instrument Specification
To span the four atmospheric windows available to ground-based experiments (Figure 3) and have enough channels to mitigate against complex foregrounds, we assume eight channels at 30, 40, 85, 95, 145, 155, 220, and 270 GHz, which are placed on small-aperture telescopes.For these low-resolution instruments, we pick 0.52-m apertures, motivated by proven SAT Stage-3 experiments (Kang et al. 2018), yielding the beams presented in Table 1.In addition, we also include a 20-GHz channel on a large-aperture telescope (LAT).  .Calculated atmospheric brightness temperature spectra (at zenith) for the South Pole at 0.5 mm precipitable water vapor (PWV) and Atacama at 1.0 mm PWV (both are near the median values).Atmospheric spectra are generated using the am Atmospheric Model (Paine 2017).The top-hat bands, in red and blue, are plotted on top of these spectra, with the height of each rectangle equal to the band-averaged brightness temperature using the South Pole spectrum.Details about the bands, such as fractional bandwidths, are presented in Table 3-1 of Abazajian et al. (2019).
We derive the split in each window by separating the overlapping bands as far as possible while still keeping the calculated per-detector NET within 10-15% of the NET for a detector that spans the full window.The ideal per-detector NETs were calculated with NETlib.py4,5 at the South Pole and Chile, using the 10-year MERRA2 median atmospheric profiles (Gelaro et al. 2017).We use the average of NET calculated for detectors at the two sites, which are 214,177,224,270,238,309,331,747 and 1281 µK CMB √ s for our nine channels at 20-270 GHz, respectively.6 .These NETs are lower than similar BICEP/Keck channels (which are 287.6,313.1 and chrotron residuals.To mitigate against such biases the reference design was updated to include this additional channel.Placing this low-frequency band on a SAT would result in a very broad beam, which would dominate the noise at the relevant scales; to circumvent this, as mentioned above, we place it on a LAT.This means that while the scaling of the noise for this channel is still calculated from achieved performance, we use noise parameters (see Equation 7) that are more in tune with what has been achieved by LATs at the time of this forecasting (Louis et al. 2017;Henning et al. 2018), i.e., the 1/f noise component is characterized by an knee = 200, keeping γ -the slope of this componentthe same as for the small-aperture noise, and a beam of Θν = 11 FWHM (assuming a 6-m aperture).

µK CMB
√ s for 95, 150 and 220 GHz) predominantly because they are calculated for a 100-mK thermal bath, as opposed to 250 mK (which was also used in the CMB-S4 Science Book forecasts).
We also fold in information from two WMAP channels, 23 and 33 GHz, and seven Planck channels, 30,44,70,100,143,217,and 353 GHz,though this extra information is only relevant in the early stages of CMB-S4 observation.
Since the forecasts use scaled BICEP/Keck bandpower statistics, they also use the same bandpower window functions and uniform binning: nine multipole bins with ∆ = 35, spanning a multipole range of 21 ≤ ≤ 335.

Delensing Treatment
One of the main challenges for detecting primordial B modes is the lensing B -mode contribution from the weak lensing of E modes as the CMB photons travel to us.For instrument noise levels below 5 µK-arcmin, this lensing signal becomes an important source of contamination and its sample variance significantly worsens our constraining power on PGWs (Smith et al. 2012).Unlike Galactic foregrounds, the lensing signal is achromatic and cannot be mitigated with multi-frequency observations.However, its contribution can be reduced by knowing the cumulative gravitational lensing potential φ along the line of sight, and having a high-fidelity E -mode map.Together, the two can be combined to form a lensing Bmode template by lensing the E -mode map with the φ field and subtracting this template from the measured Bmode map.This technique is known as delensing (Knox & Song 2002;Seljak & Hirata 2004;Carron et al. 2017;Carron & Lewis 2017).
Reconstructing φ with high S/N requires highsensitivity, high-angular resolution CMB polarization maps (Lewis & Challinor 2006).Therefore, in addi-tion to the low-resolution effort, we assume a separate high-resolution large-aperture instrument dedicated to measuring the intermediate-and small-scale information necessary to delens the B -mode map.This instrument is assumed to have 1-arcminute resolution and noise performance equivalent to the 145-GHz channel from the smallaperture telescopes.These experiment specifications are revised for the CMB-S4 Reference Design Report and updated in Section 7. The translation between detector effort and map noise in the delensing instrument is based on the method used for the low-resolution instrument (as described in Section 4.1 and Equation 7), but using SPTpol achieved performance (Sayre et al. 2020), i.e., without incurring penalties from non-idealities specific to low-resolution instruments and low-analysis, such as low-mode filtering and non-uniform coverage.Following the iterative formalism in Smith et al. (2012) 7 , using min = 300 and max = 4000 for the φ reconstruction and min = 30 for the E -mode map, we convert the map noise in the delensing survey to a delensing efficiency, or equivalently a fractional residual in the lensed B -mode power, specified by setting the residual lensing amplitude A L to the corresponding level.
The detector effort dedicated to the delensing instrument comes out of the total detector effort budget for the r survey, and the distribution of effort between the low-resolution and delensing instruments is part of the optimization process, as shown in Figures 4 and 5.

Multicomponent Theory Model
Our model includes a CMB component parametrized by r and the residual lensing amplitude, A L , and components of polarized dust and synchrotron emission.We assume that the synchrotron scales as a simple power law in both frequency and .For the dust we assume a power-law scaling in and a modified blackbody spectral energy distribution (SED).We allow for spatially correlated synchrotron and dust, parameterized by a single correlation parameter (Choi & Page 2015;Krachmalnicoff et al. 2018); the effective frequency scaling of this correlation depends on the relative strength of the two components.In addition, we also consider dust and synchrotron frequency decorrelation parameters, which allow their spatial pattern to change with frequency, suppressing the correlation of foreground signals between observing bands.A detailed description of the full parametric model is presented in Appendix A. The current model is easily extendable to accommodate additional complexities that have not yet been captured; alternatively, other models could be substituted here as well.
The model parameters are: -r, tensor-to-scalar ratio, at pivot scale k s = 0.05 Mpc −1  and spectral index of the tensor modes n t = 0; -A L , residual lensing amplitude; -A dust , dust amplitude, in µK 2 CMB , at 353 GHz and = 80; 7 Delensing estimators which are technically more optimal have been introduced in Carron (2019) and Millea et al. (2020), and we are currently developing tools to test their feasibility in further iterations of our forecasting.
For a given auto-or cross-spectrum, we step through the model components, combine the appropriate amplitude functions for the two bands contributing to the spectrum, and apply the bandpower window functions to obtain the binned expectation values.Finally, we sum over model components to find the total expectation value for that spectrum.In addition, since a Fisher forecast requires knowledge of the response of the model expectation values with respect to the model parameters, we also output the derivatives of the model expectation values.
The Fisher matrix that we consider is usually 10dimensional.The parameters we constrain are r, A dust , β d , α d , ∆ d , A sync , β s , α s , ∆ s , .We fix T d =19.6 K because this parameter is mostly degenerate with A dust for observations below 300 GHz, where the SED is in the Rayleigh-Jeans limit.The parameter A L is assumed to be known, but its value is adjusted to represent varying levels of delensing, as discussed in Section 2.3.1.The EE/BB ratios are not relevant for calculations presented here because we are focusing on constraints from the BB spectrum only; however, these parameters are left in for possible future forecasting.The fiducial model used for forecasting is centered at either r = 0 or 0.003, with A dust = 4.25 µK 2 CMB (bestfit value from BICEP2/Keck Array Collaborations VI 2016) and A sync = 3.8 µK 2 CMB (95% upper limit from BICEP2/Keck Array Collaborations VI 2016).In Sections 2-6 we assume no variation of these foreground amplitudes over the sky fractions relevant to this study, i.e., they are always pinned to the values listed above.We revisit this assumption in Section 7. The spatial and frequency spectral indices are centered at the preferred Planck and WMAP values (Planck Collaboration Int. XXII 2015;Fuskeland et al. 2014): β d = 1.59 (with Gaussian prior of width 0.11); β s = −3.10(with a Gaussian prior of width 0.30); α d = −0.42;α s = −0.6;and the dust/synchrotron correlation is centered at = 0. Following Planck Collaboration Int.L (2017), the central dust correlation value is taken to be ∆ d = 0.97 (3% decorrelation) and the synchrotron correlation value is assumed to be ∆ s = 1 (no fiducial synchrotron decorrelation).Unless otherwise stated, the parameters have flat unbounded priors.

Signal Scaling
The output model expectation values are also useful in the construction of our bandpower covariance matrix.To construct the BPCM components, we use lensed-ΛCDM + BICEP/Keck noise simulations.However, because we have the individual signal-only, noise-only, and signal×noise terms, we can record all the individual BPCM components: Here S are signal simulations, N are noise simulations, and the indices i, j, k, l run over the experimental frequency channels.
While calculating the covariances from the signal and noise simulations, we also record the average signal bandpowers from the simulations.For a new signal sky model, we can calculate the new bandpower expectation values, and rescale the signal components in the bandpower covariance matrix by the appropriate power of the ratio of the recorded average signal bandpowers and the newly calculated expectation values.The full BPCM construction and rescaling procedure is presented in Appendix B.
When we do this, we set to zero any term that has an expectation value of zero (under the assumption that signal and noise are uncorrelated, and different signals are uncorrelated) to reduce the Monte Carlo error in the resulting covariance matrix, given the relatively modest number of 499 realizations used.We also set to zero the covariance between bandpowers that are separated by more than one bin in , but, importantly, preserve the covariance between the auto-and cross-spectra of the different frequency bands.
It is worth noting that this procedure allows us to have different numbers of degrees of freedom per bandpower for noise than for signal, which is a byproduct of signal and noise entering differently in a real analysis.This complication is often ignored in other forecasts by setting the noise and signal degrees of freedom to be identical.
The ability to estimate a BPCM for any model means that only a single set of simulations is necessary, and one does not have to run simulations for any and all conceivable scenarios.As already mentioned, in all the projections below we choose to fix Σ(θ) = Σ and hence we only apply the rescaling step once per fiducial model considered, i.e., we do not rescale our BPCM at every step along the way.

Noise Scaling
In addition to scaling from one signal model to another, recording all the covariance terms allows us to rescale the noise components as well.Given a dataset for which we have simulations, the noise scaling can be performed in one of two ways.The first is to take a frequency present in the dataset and scale down the noise in the BPCM by the desired amount.In particular, each BPCM component is scaled independently by N ,S4 /N ,BK for each factor of N present.The second way is to add an additional frequency, for which we do not yet have data, by taking the covariance structure of an existing frequency, scaling down the noise by the amounts described above, and then expanding the BPCM by filling it in with the appropriate variance and covariance terms between the new band and all the existing ones.These tools facilitate the construction of a new data structure to explore any combination of frequency bands, with any sensitivity in each band.
To obtain N values for a CMB-S4 channel by scaling the achieved N values of a Stage-3 channel, we have to first scale by the ratio of their respective survey weights and then scale by the ratio of beam window functions, B 2 .The survey weight is defined as w = 2f sky /σ 2 map , where f sky is the effective sky area and σ map is the Q/U map noise level.The input simulations use measured non-Gaussian B shapes, but we rescale based on Gaussian approximations that are close to the true functions.We can write the noise spectrum of a CMB-S4 channel as where , Θ ν is the full width at half maximum (FWHM), in radians, of the Gaussian beam, and w i,achieved is the achieved integrated survey weight of a particular instrument.
To obtain the projected achieved survey weight for any of the CMB-S4 channels, we rescale the achieved survey weights as where n det−yr

S4
is the number of detector-years assumed for CMB-S4 at any particular frequency, and n det−yr BK is the number of detector-years in the BK15 dataset, with the instruments in their final state (BICEP2/Keck Array Collaborations X 2018).
The implicit assumption in this step is that the performance of this new CMB-S4 frequency channel falls short of idealized performance by the same factor as the real map from which we are scaling.The survey weight scaling is always performed from the closest frequency for which we have available simulation inputs: 20-95-GHz are scaled from BICEP/Keck 95-GHz data; 145 and 155 GHz are scaled from BICEP/Keck 150-GHz data; and 220 and 270 GHz are scaled from BICEP/Keck 220-GHz data.
Since we are using BICEP/Keck products, which are calculated with a particular sky mask m (with f BK sky =

Ωpix 4π
i m i 1%, where the sum is over the pixel i), we must also scale these products appropriately to evaluate the effect of different sky fractions.We propagate the effects of f sky in the noise spectra and BPCM in two ways: first, we inflate the N l values by a factor β = f S4 sky /f BK sky , which boosts the (signal × noise) and (noise × noise) terms of the covariance matrix by β and β 2 , to take into account the redistribution of the achieved sensitivity onto a larger patch.Note that the (signal × signal) component remains unchanged in this step.Second, we scale down the entire covariance matrix by a factor of β to increase the number of degrees of freedom in the BPCM, accounting for the fact that we are now observing more modes.This procedure scales the signal and noise degrees of freedom independently, preserving the relative effects that filtering and non-uniform coverage have on the covariance structure.We revisit the way this scaling is performed in Section 7, where we separately take into account the impact of realistic observing strategies on the various components.
With the N scalings in hand, we can perform the aforementioned BPCM operations to arrive at a scaled CMB-S4 BPCM that encompasses the intricacies of realistic observing conditions.

OPTIMIZED FORECASTING FOR r
In this section, we answer the following question: given a fixed amount of effort and the instrument specifications offered in the previous sections, what is the optimal distribution of effort for foreground cleaning and delensing such that the tightest constraint on r is achieved?To do this, we set up an optimization process that calculates the steepest descent through the ten dimensional space (effort in the nine single-frequency low-resolution channels plus one high-resolution channel for delensing).
We operate in discrete units of effort, with a single unit defined to be equivalent to 500 detector-years at 150 GHz (similar to a BICEP/Keck 150-GHz receiver observing for 1 year).For other channels, the number of detector-years per unit of effort is calculated as n det,150 ×(ν/150 GHz) 2 .We define "effort" in these units because it is proportional to focal plane area, which is one of the strongest drivers of the overall project cost.
At each step of the algorithm, we allocate a unit of effort in each dimension.For each separate allocation, we rescale the BPCM, compute a new Fisher matrix, and calculate the resulting σ(r).We then compare the constraints and permanently assign one half of the unit of effort to the channel that produces the largest improvement in σ(r) and the other half to its atmospheric window counterpart (the groupings are 30/40, 85/95, 145/155, and 220/270 GHz).This last step, motivated by earlier iterations through the forecasting loop, is enforcing a split in order to reduce the foreground biases obtained under various foreground models.Projections run to a total of 6000 units of effort, which would be equivalent to 500,000 150-GHz detectors operating for six years.Stage-4 scale surveys seem likely to be in the range of 1-3×10 6 detector-years, an order of magnitude increase from Stage-3 experiments.Though it is generally prohibitive to calculate the entire 10-dimensional hypercube of σ(r), we have validated our approach with a full-grid calculation at various points in the optimization.
Figure 4 shows the optimized constraints on r as a function of total effort, as well as the fraction of effort spent on removing the lensing sample variance and the resulting map rms lensing residual, for the no-detection scenario (i.e., r = 0).To reach the desired science goal of σ(r) = 5 × 10 −4 , for f sky = 3% (see discussion on sky fraction below), requires 1.2 × 10 6 150-GHz equivalent detector-years (or 1.8 × 10 6 when including marginalization over the decorrelation parameters).
Upon obtaining the optimized detector count distribution, we obtain the input noise spectra according to Equation 5. To use these spectra to create noise simulations (discussed in Section 4.1), it is useful to distill them to a few input parameters.To that end we fit them to the formula and obtain the map depth σ map , slope γ, and knee values.For the small-aperture data, we find knee = 50-60 with γ of −2 to −3, depending on the frequency.The optimal distribution of effort is presented in Figure 5 and the configuration that achieves the science goal in 4 years (or 6 years when marginalizing over decorrelation) is summarized in Table 1. .Top: optimized constraints on r for 3% sky fraction as a function of total effort.We include in solid black the case with delensing, allowing for decorrelation of the foregrounds, in solid gray the case without delensing, in dotted gray the case where no decorrelation is allowed in the model (with delensing), and in dashed black the raw sensitivity in the absence of foregrounds and lensing.Bottom: for the delensed case (with decorrelation), we show the fraction of effort spent on removing the lensing sample variance and the resulting rms lensing residual.
As mentioned, it is also necessary to optimize the sky fraction.The trade-off between raw sensitivity, ability to remove foregrounds, and ability to delens results is .Optimized map depth in each of the small-aperture channels as well as in the delensing channel, for f sky = 3%, corresponding to the delensed (with decorrelation) case in Figure 4.
a complicated optimization problem with respect to sky coverage.Figure 6 shows the r sensitivity forecast for CMB-S4 as a function of the observed sky fraction for the case that we only have an upper limit (r = 0).In this case, the optimization prefers a deep survey that targets as small an area as possible.This conclusion, of course, depends on the forecasting assumptions; to that end we would like to draw attention to several key factors.
First, holding the desired constraint on r fixed, the level to which we rely on delensing to decrease the sample variance increases at smaller sky fractions, as expected.For example, as shown in Figure 6, achieving the forecasted sensitivity on r for a survey targeting 1% of the sky will require a > 80% reduction in the map rms level of the CMB lensing B -modes.While from a sensitivity standpoint it is possible to achieve these levels, the extent to which systematic effects and small-scale foregrounds will need to be constrained may become too stringent (Carron et al. 2017;Polarbear Collaboration et al. 2019).
Second, the current optimization assumes uniform foreground behavior across the sky (with amplituide equal to that in the BICEP2/Keck region), while in reality the average amplitude, and possibly the complexity of foregrounds increase as larger sky areas are targeted.This effect would steepen the optimization curve at larger sky fractions and increase our preference for small amounts of sky.
Third, in the case of a detection, a practical consideration for the robustness of the final r result is its reproducibility across the sky.It is therefore useful to observe multiple 1% patches from which we can derive and compare separate cosmological constraints.
Finally, technical aspects of E/B separation of CMB maps may heavily disfavor patches smaller than about 1% of the sky due to cut-sky effects (Bunn et al. 2003;Smith & Zaldarriaga 2007;BICEP2/Keck Array Collaborations VII 2016).
Balancing the forecasting results with these concerns, we have chosen 3% as the default sky fraction for CMB-S4 r constraints (assuming a true value of r = 0).This choice was made for both the CMB-S4 Science Book (Abazajian et al. 2016) and CMB-S4 CDT Report (Lawrence et al. 2017) versions of the forecasts.As mentioned in Section 2.5.2, we revisit the issue of sky coverage in Section 7 with updated assumptions about our survey strategy and how we perform the f sky scaling from the achieved products.

RMS lensing residual
Delensing/Total effort Figure 6.Top: optimized constraints on r as a function of sky fraction, for a fixed effort of 1.2×10 6 150-GHz-equivalent detectoryears.We include in solid black the case with delensing, allowing for decorrelation of the foregrounds, in solid gray the case without delensing, in dotted gray the case where no decorrelation is allowed in the model (with delensing), and in dashed black the raw sensitivity in the absence of foreground and lensing.Bottom: for the delensed case (with decorrelation), we show the fraction of effort spent on removing the lensing sample variance and the resulting rms lensing residual.

MAP-BASED SIMULATIONS
Using simulations to optimize the design of a CMB experiment inevitably involves a trade-off between the degree of detail that the simulations are able to capture and the computational cost of generating and analyzing them.This trade-off includes the choice of the domain in which the simulation is generated, ranging from the most detailed (but most expensive) time domain, through the map domain, to the most simplified (but most flexible) spectral domain.Inclusion of additional detail can help validate semi-analytic results, explore their sensitivity to assumptions about foreground models, sky coverage, and instrumental noise and systematics, and in more mature stages of design can inform specific instrument and survey strategy choices.
Here we review the methods used to explore the parameter space for the PGW survey, including map-level noise simulations, sky models, and observation strategy.We also describe our approach to modeling instrumental systematics, the delensing survey, and the analysis methods.
In addition, we use these simulations to validate the spectral domain forecasts for configurations where the approaches are directly comparable, thereby closing the loop presented in Figure 1.

Map Noise Simulations
We use Equation 7to obtain the desired noise prescription, for a fixed total effort, and then generate 499 Gaussian noise realizations at each band.As in the Science Book, we have mostly used a circular sky area of 3%.Small-aperture cameras have a very wide instantaneous field of view and hence the observed sky region necessarily has a large edge taper.For the nominal 3% sky coverage simulations, we assumed a circular sky patch centred at RA=0 • , Dec=−45 • (slightly below the BICEP/Keck patch) with full coverage out to radius θ < 12 • and "relative hits" tapering to zero with a cosine-squared shape for 12 • < θ < 27 • .The noise realizations are divided by the square root of this coverage pattern such that the noise "blows up around the edge" as it does in real maps.We also assume an min = 30 cut-off below which we do not recover any information.
For the CMB-S4 Reference Design Report (and Section 7), we include an explicit scan strategy on the sky and produce more realistic sky coverage patterns, but for the moment we regard the above as a reasonable compromise between idealism and reality.

Foreground Models
To make simulated sky maps, we add realizations of lensed CMB (both with and without a PGW component) to models of the Galactic foregrounds.So far, we have run simulations with seven foreground models, which we now describe.0. Simple Gaussian realizations of synchrotron and dust with power-law angular power spectra at amplitudes set to match the observations in the BI-CEP/Keck field, and simple uniform SEDs (power law for synchrotron, modified blackbody for dust).
1.The PySM8 model a1d1f1s1, where the letters refer to anomalous microwave emission (AME), dust, free-free, and synchrotron, respectively, and the numbers are the base models described in Thorne et al. (2017).Free-free and AME are assumed to be unpolarized in this model and thus do not affect the analysis in this paper.
2. The PySM model a2d4f1s3, where the models have been updated to variants that are also described in Thorne et al. (2017).Note that these include 2% polarized AME, a curvature of the synchrotron SED, and a two-temperature model for dust.
3. The PySM model a2d7f1s3, where the dust model has been updated to a more sophisticated physical characterization of dust grains as described in Hensley (2015).This model is interesting in that it does not necessarily conform to the modified blackbody SED.
4. The dust in Model 3 is replaced by a model of polarized dust emission that incorporates Hi column density maps as tracers of the dust intensity structures, and a phenomenological description of the Galactic magnetic field as described in Ghosh et al. (2017).The model is expanded beyond what is described in that paper to produce a modest amount of decorrelation of the dust emission pattern as a function of frequency motivated by the analysis of Planck data in Planck Collaboration Int.L (2017).
5. A toy model where the strong dust decorrelation suggested in Figure 3 of Planck Collaboration Int.L ( 2017) is taken at face value (∆ 217×353 = 0.85, at = 80) and scaled to other frequencies using the functional form given in appendix B of Vansyngel et al. (2017), with a linear scaling in .While such a model is not ruled out by current data, it appears to be very hard to produce such strong decorrelation in physics-based models.We also note that Sheehy & Slosar (2018) and Planck Collaboration XI (2018) have re-analyzed the same Planck data and, while they find that the high level of decorrelation in this model is still consistent with the data, their best fit to that same data has no decorrelation.
6.A model based on MHD simulations (Kritsuk et al. 2017) of the Galactic magnetic field, which naturally produces non-Gaussian correlated dust and synchrotron emission.
Models 1 to 4 use the large-scale modes of the real sky as measured above the noise in the Planck data.This means that these models are intrinsically "singlerealization," and this must be borne in mind when interpreting the results.Models 4 and 6 are not based on Planck data, but still contain a fixed signal realization.Models 0 and 5 have different seeds for each signal map and include the (Gaussian) sample variance.The PySM models fill in the small-scale structure with power-law Gaussian extrapolations, while Models 4 and 6 naturally produce non-Gaussian small-scale structure.However, all of these models are consistent with current data, and the more complex models are not necessarily more accurate reflections of reality.

Instrumental Systematics
Control of instrumental systematics is a critical design consideration.However, predicting and modeling these effects realistically is a difficult task that is dependent on actual instrument and survey design details, and furthermore, their impact on actual results comes not through the modeled effects but through unmodeled residuals.Many existing CMB experiments have published in-depth studies that use calibration data and simulations to set upper limits on a wide variety of effects, e.g.Keisler et al. (2015) For this study, in the absence of detailed instrument and survey designs, we have taken the first steps in simulating various generic classes of additive systematic by injecting additional noise-like components into the maps and then re-analyzing them without knowledge of what was put in.We have experimented with components that are both correlated and uncorrelated across frequency bands, and that have white, 1/ , and white + 1/ spectra, at varying levels compared to single-frequency map noise or, for correlated cases, combined map noise.The leading-order effects of such components can be mitigated via explicit modeling or filtering, but they may still produce map-level residuals.Examples of mechanisms in this class include bandpass mismatches, beam and pointing variations, calibration variations, cross-talk effects, half-wave-plate leakage, ground pickup, and readout irregularities.
To assess the impact of instrumental systematics on measurement requirements, for the purpose of determining both the required survey depths and the maximum allowable levels of systematic effects in the final singlefrequency survey maps, our general procedure is to feed parameterizations of various systematic effects into semianalytic forecasts and judge at what levels classes of systematics introduce parameter biases or additional uncertainties that are significant compared to the science targets for those parameters.
Other classes of systematics could be simulated by manipulating the analysis procedure only.Examples of such effects include uncertainties in the bandpasses, polarization angles, calibration, and beam shapes.Such examples are not presented here and are left for future work.

Delensing
We have generated high-resolution simulated maps on which we intend to run explicit lensing reconstruction and then include that information in the analysis.While we are currently working on this analysis, this process has not yet converged, and so for the present, we approximate delensing by scaling down the ΛCDM lensing signal by the appropriate factor, as described in Section 2.3.1.

SIMULATION ANALYSIS METHODS
To make simulated maps, the noise realizations described in Section 4.1 are added to the sky models described in Section 4.2.For each realization, one then has a stack of multi-frequency I, Q, U maps containing nonuniform noise, foregrounds, and signal, and the challenge is to re-analyze them to recover the parameter of interest (in this case r).This can be done by different teams using different methods and could be carried out in a blind manner, although we have not done this yet.
So far, we have experimented with two methods.The first is a spectral internal linear combination (ILC) method (e.g., Tegmark et al. 2003), which determines the linear combination of multipole coefficients that minimizes the foreground and noise power without altering the CMB contribution.This method only relies on the frequency dependence of the CMB and does not rely on assumptions about the spectral dependence of foreground components.Determination of cleaning coefficients in each bin (∆ = 31) leads to the smallest foreground residuals, but yields foreground residuals that are difficult to model.Relying on a single cleaning coefficient across all bins leads to foreground residuals that are easy to model but large.As a consequence, we use the same cleaning coefficients in three neighboring bins, which results in residuals that are acceptable and can still be marginalized in the power spectrum likelihood analysis.For the marginalization, since foreground residuals are typically dominated by dust, we assume an -dependence of the residuals across the three bins that share a common cleaning coefficient that is consistent with thedependence observed by Planck for dust.For models that exhibit decorrelation between different dust components, this model is no longer correct.This could be improved with better understanding of foregrounds, but no attempts were made to do so in this analysis method.
The second method is an evolution of the parametric multi-component fit to the ensemble of auto-and cross-spectra, as used for the BICEP/Keck analysis to date (BICEP2/Keck Array and Planck Collaborations 2015; BICEP2/Keck Array Collaborations VI 2016; BI-CEP2/Keck Array Collaborations X 2018).This method fits the observed bandpowers to a model composed of the lensing expectation plus dust and synchrotron contributions and a possible r component.Dust and synchrotron each have an amplitude (A dust and A sync ), a spatial spectral parameter (α d and α s ), and a frequency spectral parameter (β d and β s ).We also allow a dust/synchrotron correlation ( ), and decorrelation of the foreground patterns over frequency (∆ d and ∆ s ).This model is equivalent to the one described in Section 2.4.
Both of these analysis methods are only close to optimal when the foreground behavior is close to uniform across the observing field.For analysis of larger fields, algorithms that fit more complex behavior will likely be required, for example, modeling the frequency spectral indices individually in (large) pixels.

MAP-BASED RESULTS
Table 2 summarizes the results of the analysis for simulations of the optimized configuration obtained in Section 3 (1.2 × 10 6 150-GHz-equivalent detector-years) and residual lensing power A L = 0.1.The lensing residual is expected for iterative EB delensing according to Smith et al. (2012) for the sensitivity and angular resolution of the delensing survey.The results from the parametric analysis naturally depend on whether a marginalization over decorrelation is performed, while the ILC analysis did not attempt to capture the effects of decorrelation on the recovery of r and σ(r).This is evidenced by the large bias for the ILC method for Model 5 when compared to the parametric analysis that directly accounts for a possible decorrelation (last column).In general, we see that for r = 0 the simple Gaussian foreground Model 0 gives σ(r) ≈ 5 × 10 −4 , exactly as expected from the semi-analytic formalism.As we progress to the more complex foreground models, σ(r) is generally in the range 5-8×10 −4 .
The level of biases is generally below 1.0σ for all the models.These simulations are sets of 499 realizations, so the statistical uncertainty on the bias is approximatively 0.04σ.However, the strong decorrelation in Model 5, as well as the high-significance detection of decorrelation in the parametric analysis of Model 4, do significantly increase σ(r) and the level of bias.While the parametric method is able to account for the decorrelation, by construction information is lost, and in fact if one believed in such a scenario, a different re-optimization to concentrate the sensitivity at closer-in (less decorrelated) frequencies would be called for.
Table 3 shows results on detection significance for the CDT Report configuration for sky Model 6.For r = 0, the 95% upper limit is about 2.1σ(r).The value of the tensor-to-scalar ratio for which we expect a 5σ detection after 4 years of operation is r = 0.004.9For a tensorto-scalar ratio of r = 0.003, the median detection significance after 4 years is expected to be 4σ.If a detection were to be emerging at this point, extending the run time to 8 years would be justified in order to reach a 5σ detection.
While σ(r) can be precisely forecast for given assumptions, the true achieved detection level for r depends on the particular realization of the B-mode field in the observed patch of sky.Therefore we can only forecast a distribution of detection levels.For a tensor-to-scalar ratio of r = 0.003 and 8 years of observing we expect to achieve more than a 3σ detection with a probability of 0.99, more than 4σ with a probability of 0.93, more than 5σ with a probability of 0.53, and more than 6σ with a probability of 0.14.For simplicity, we focus on σ(r), and on median detection levels as well as median 95% confidence upper limits to state the typical outcome.
The numbers in Table 2 clearly show dependence on the foreground model used in the simulation.If the actual foregrounds are substantially different than any of these cases, then the biases could be larger.To obtain some understanding of how large the biases could be, and what instrument modifications might help to reduce them, we have also looked at ILC biases in the extreme case that the foreground residuals are not modeled or marginalized over, but simply absorbed into the estimated B -mode power spectrum.Doing so with dedicated simulations based on sky Model 6 increases the magnitude of the bias on r to 4.1 × 10 −4 .The dominant contribution to the bias comes from synchrotron residuals, which motivated placing one lower-frequency channel on the LAT (reducing the magnitude of the bias to 1.8 × 10 −4 ).
Table 4 summarizes the results of the analysis of simulations including additive systematic effects on top of foreground Model 3 (note: these simulations correspond to DC3).Different combinations of uncorrelated and correlated contamination with varying spectra are considered.The levels of systematic contamination for these simulations were chosen to predict biases on r of 1×10 −4 in semi-analytic forecasts.We can see that the different combinations explored increase the bias on r by amounts that typically vary from 0.5 to 1.5×10 −4 for the two separate analyses, over the different cases.We find that to restrict the bias on r to this level, the sum of additive contamination effects needs to be controlled to 3-7% of the single-frequency survey noise, or (in the case of correlated systematics) 6-11% of the total combined noise levels.Such percentages are consistent with the upper limits currently achieved for residual additive systematic contamination compared to survey noise by smallaperture experiments (e.g., BICEP2/Keck Array Collaborations VI 2016).Assuming that CMB-S4 will include a sustained effort to continue to control, understand, and model systematic effects down to levels limited by survey noise, these percentages provide reasonable benchmark requirements.
Results of simulating systematic errors in the determination of bandpasses vary by analysis method.The construction of the ILC method makes it largely insensitive to such errors.The parametric analysis, which includes parametrized models of the frequency spectra of different foregrounds, shows biases on r at the 1 × 10 −4 level for uncorrelated random deviations in band-center determination of 0.8%, or for correlated deviations of 2%; we adopt these as reasonable benchmark requirements to accommodate a variety of both blind and astrophysical foreground modeling approaches.

DESIGN
In this section, we describe updates to the framework and the reference design, which attempt to take into account the impact of realistic observing strategies, realistic focal-plane layouts and mapping onto optics tubes, as well as a more conservative approach to our delensing forecasts.We also seek to answer the question of siting, with the South Pole and Chile being the two choices considered.The work in this section has led to the forecasts and plans presented in the most recent CMB-S4 document-the CMB-S4 Reference Design Report (Abazajian et al. 2019).While this section is selfcontained, we direct the reader to the aforementioned source for more details.

Table 2
Results of two analysis methods applied to map-based simulations assuming the CMB-S4 CDT Report (Lawrence et al. 2017) configuration and our suite of sky models (DC4).All simulations assume an instrument configuration including a (high-resolution) 20-GHz channel, a survey of 3% of the sky with 1.2 × 10 6 150-GHz-equivalent detector-years, and A L = 0.1.2.5 a An extreme decorrelation model-see Section 4.2.In the right column the parametric analysis includes a decorrelation parameter.No attempt is made in the ILC analysis to model the decorrelation.The middle columns shows the parametric analysis when we do not include deccorelation parameters.

Table 3
Results on detection significance for the CMB-S4 CDT report (Lawrence et al. 2017) configuration, using the two analysis methods.For the r = 0 model we report the 95% confidence level upper limit (CL UL).

Table 4
Map-based simulation results for dedicated simulations containing systematics (DC3).Simulations here assume the Science Book Configuration (Abazajian et al. 2016), i.e., an instrument configuration including a (low-resolution) 20-GHz channel, a survey of 3% of the sky with 1.0 × 10 6 150-GHz-equivalent detector-years, and A L = 0.1.We report sky Model 3 and r = 0, with additive systematic effects in varying combinations, the amplitudes of which are specified as percentages of survey noise, for the white (A) and 1/ (B) components.In previous versions of our forecasting, we have had the ability to choose the number of detectors in each frequency band in a continuously variable manner, as shown in Figure 5.For the reference design, a mapping of detectors into dichroic optics tubes has been carried out, while seeking to maintain the band distribution as determined in the optimization calculations.In scaling the achieved performance from the existing monochromatic instruments to dichroic detectors and optics, no degradation of optical performance has been assumed at this stage, but this assumption should be verified with upcoming data.This results in the configuration described in Abazajian et al. (2019) with 18 SAT tubes, observing for 7 years.We use this configuration to scale the BICEP/Keck noise bandpower covariance matrix in the same way as described in Section 2.5.A further re-scaling is applied to account for sky coverage, as explained be-

Sky Coverage Effects
The semi-analytic calculations of Section 2.5 assumed a simplified re-scaling for sky area, while the map based simulations of Section 6 assumed an idealized circular sky patch, which is not actually achievable with a practical instrument from a site at any latitude.Figure 7 compares our prior assumptions to more realistic hit patterns.
From the South Pole, it is possible to concentrate the coverage onto a compact region of sky, but from Chile the region that can be observed is affected by Earth's rotation, resulting in more extended coverage.The large instantaneous field of view of the SAT telescopes means that there is a minimum field size that can be achieved, and that there is always a strong edge taper in the coverage pattern.
We have performed a calculation that attempts to optimize simulated SAT observations from Chile to produce the densest possible coverage on a 3% patch of lowforeground sky, resulting in the overall pattern shown in Figure 7 as "Chile full."We segment this into its deepest part, which we call "Chile deep," and the remainder, which we call "Chile shallow." From the South Pole, one can scan the same patch at all times of the day and year at the same observing elevation, with the size of the observed patch controlled by the length of the scan throw in Right Ascension.A minimal-length scan results in the pattern shown in the figure as "Pole deep."Lengthening the scan while remaining in the low foreground sky results in the pattern "Pole wide."In the results below, "Pole deep" and "Pole wide" are therefore "either/or" options.
Because the noise increases in regions with less observing time, the effective sky area for noise is larger than the effective sky area for signal-and both of these also depend on the weighting applied when analyzing the maps.The patterns shown in Figure 7 have the effective sky fractions reported in Table 5, assuming inverse noisevariance weighting.
We can take into account the above effects by rescaling the BICEP/Keck BPCM in a more sophisticated manner.First, we need to scale the noise due to distributing the effort on a patch of sky larger than the original BI-CEP/Keck one.The noise is scaled by the effective noise 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

Table 5
Effective sky fractions for signal, noise, and signal-cross-noise, as percentages, for the observation patterns shown in Figure 7, and the case of inverse noise-variance weighting (i.e., Equations 9-11 with w i = h i ).Here "SP" is the South Pole and "CH" is Chile.Note: the CH Shallow numbers appear larger than CH Full due to the effects of the weighting.
SP deep SP wide CH deep CH shallow CH full f sig sky 1.9 4.3 2.4 10 5.9 f noi sky 2.9 6.5 3.4 20 18 f cross sky 2.5 5.5 3.0 16 12 For comparison, the BICEP2/Keck and BICEP3 values are (1.0, 1.3, 1.1) and (1.9, 2.7, 2.3) for signal, noise, and signal-cross-noise respectively. factor where Ω pix is the solid angle of a single pixel, w i are the weights for pixel i, and h i are the hit counts.
Second, since we observe a different number of modes, we need to scale the signal, noise and signal-cross-noise contributions of the BPCM by the factors We also need to take out the effect of these factors from the original BPCM.In the BICEP/Keck analysis, the weights are the inverse noise variance, i.e., w i = h i .In the case of CMB-S4, we will never be noise-dominated, either due to an actual primordial signal, or due to the lensing and foreground residuals, so here we use the inverse variance of the total signal and noise to determine the optimal weighting.Lastly, while the scanning strategy used here has been optimized to mostly observe the cleanest available sky, some regions with high Galactic emission are also observed.Realistically, we would mask such regions when analyzing the maps.To assess this effect, we used masks based on a smoothed Planck 353 GHz polarized intensity map, keeping the cleanest 30% or 60% of the full sky (28% and 58% after apodization).We then used these masks to disregard some of the pixels, resulting in a degradation of the constraints on r.

Delensing Revised
The CMB-S4 PGW science goal can only be achieved if the majority of the lensing B modes can be removed.The optimization in Section 3 assumed a single frequency channel assigned to the higher resolution delensing observations.The strength of polarized foregrounds at small angular scales is currently poorly constrained by data; the reference design, therefore, includes some additional coverage at higher and lower frequencies. 10o forecast the delensing performance, we proceed in two steps.For a given LAT configuration and sky coverage, we derive the noise levels for an ILC that minimizes the variance of components with a frequency dependence that differs from that of a blackbody (Tegmark & Efstathiou 1996).In this step, we assume that polarized foreground emission is dominated by Galactic synchrotron and thermal dust emission.Using the ILC noise power spectrum, we then forecast the performance expected for iterative EB delensing (Smith et al. 2012).
The LAT reference design, established independently from this work, and described in Abazajian et al. (2019), includes 2 LATs in Chile for the wide field survey targeting science goals other than PGWs, and 1 LAT at the South Pole for the PGW delensing survey.For the Chile LATs, and a wide-area survey covering 70% of the sky, the two-step procedure predicts that 73% of the lensing power can be removed in the "Chile shallow" region after 7 years of observation.Similarly, for the single LAT at the South Pole dedicated to delensing of the approximately 3% "Chile deep" and "Pole deep" regions, we expect to be able to remove close to 90% of the lensing power after 7 years of observation.
The numbers given above assume identical hits maps for the South Pole LATs and SATs, and assume inverse noise-variance weighting rather than a weighting scheme that accounts for both signal and noise.For the lensing residuals achieved by the reference design, inverse noisevariance weighting for the SAT maps is suboptimal because the signal (e.g., for r = 0 lensing residual after foreground removal) is measured with signal-to-noise ratio above unity over a region that extends beyond the region that dominates the analysis in inverse noise-variance weighting.As a consequence, in all the forecasts presented below we employ weights that account for both signal and noise, which significantly increases the number of modes that contribute to the analysis, especially for small, deep patches.Even though, for the same survey, this leads to slightly higher noise and lensing residuals, the increase in the number of modes overall reduces the lensing sample variance contribution to σ(r).In practice we determine the optimal weights iteratively, accounting for the effect on the SAT and LAT analyses and find rapid convergence.

Results
The covariance matrices calculated as described above are used to produce the results given in this section, where the number, siting, and coverage patterns of the SATs are varied.In all cases, a delensing LAT at the South Pole is assumed to concentrate its coverage on a small patch of sky, while delensing over larger sky areas is assumed to be available from the Chilean LATs.
As mentioned earlier, we split the Chilean coverage shown as "Chile full" in Figure 7 into a deep patch, "Chile deep", which overlaps with the "Pole deep" region, and call the remainder "Chile shallow."We then make separate forecasts for each Chilean sub-region us- We assume an instrument with 18 SAT optics tubes and an observation time of 7 years, with the five masks defined in Section 7.
Each band shows different Galactic cuts, based on Planck polarized foregrounds: the upper edge uses the cleanest 28% of the full sky, whereas the lower edge uses the 58% We explore the effect of turning on foreground decorrelation in the forecasting.We also explore adding a foreground bias, in quadrature, with a 1% value of the equivalent r of the current foreground minimum of the BK15 data at = 80.
ing the appropriate delensing level for that sub-region.
To obtain forecasts for the full region we add the σ(r) results in simple inverse quadrature, thereby making the approximation of independence of the measured modes.When we combine these with South Pole observations, we mimic a joint analysis over the overlapping region by taking the sum of the "Pole deep" and "Chile deep" coverage maps and computing the corresponding weights and lensing residuals, and add the "Chile shallow" results in inverse quadrature.
We also explore the possibility of unmodeled foreground residuals contributing residual power to the cleaned maps, assuming a foreground cleaning to 1% at 95 GHz.We do so by adding in quadrature to the r forecasts a foreground bias equal to 1% of the equivalent r of the foreground minimum of the BK15 data at 95 GHz and = 80.
Since some parts of the "Chile shallow" coverage lie closer to the Galactic plane, we boost the foreground level in the shallow region, and the foreground bias if it is included, by a factor of 3 with respect to the deep patch.This scaling is based on the dust amplitude measured in the map-based simulations for surveys observed from the South Pole or from Chile, using a map with spatially varying foregrounds based on Planck data.
In Figure 8, we show the dependence of σ(r) on r for the different coverage masks.We calculate these constraints for r = 0, 0.003, 0.01, and 0.03, for the different Galactic cuts, and show the linear interpolation between these points, sampled on a high-resolution linearlyspaced grid.We find that the survey strategy from the South Pole is always favored in the limit of small r, with the crossover point depending on the specific assumptions.This comes as a direct result of the fact that while a larger fraction of the sky can be observed from Chile, one can concentrate the available sensitivity more deeply from the South Pole (see Figure 7).We note that the delensing requirements are more stringent for the smaller surveys (such as the South Pole one), highlighting the importance of periodically revisiting the assumptions made for the delensing survey of CMB-S4 as new data from Stage-3 experiments become available.
In Tables 6-7 we present a set of σ(r) results for seven years of observations and r = 0, and in Tables 8-9 we present a set of significance of detection levels for r = 0.003, while varying the number of optics tubes at the South Pole and Chile over a wide range of options.We show results for two different variants: (i) with no marginalization over the decorrelation parameters and for the 28% cleanest polarized sky; and (ii) the same as (i), but marginalizing over the foreground decorrelation parameters.
In all these calculations we assume that the observing efficiency from Chile is 100% equivalent to that from the South Pole.Despite long histories of CMB observations at both sites, it is still quite challenging to make a clean comparison of their observing efficiencies.Results from the BICEP/Keck program are responsible for the leading constraints on r for the last decade, but it is still not currently possible to disentangle the role of the observing site from other factors that have contributed to the success of that program, such as detector performance, instrument design, observing strategy, and operations management.However, note also that even as- 3.6 4.9 5.3 5.6 5.9 6.4 18 4.4 5.3 5.7 5.9 6.2 6.6 30 5.4 6.0 6.2 6.4 6.6 6.9 suming equal observing efficiency at both sites (Table 6), there is still a difference between σ(r) obtained for equivalent numbers of optics tubes in Chile versus Pole, due to sky fraction, foreground complexity and delensing.In the CMB-S4 Reference Design Report we also show forecasting results that assume an observing efficiency from Chile equivalent to 50% of South Pole efficiency.Note that the results where the total number of SAT optics tubes is different from the assumed number of 18 are subject to a caveat: the delensing effort is assumed to be held fixed to the reference design-one LAT at the South Pole and two in Chile.In principle, as the total effort is varied away from the reference design, one should re-optimize the fraction of delensing effort as per Figure 4.

CONCLUSIONS
In this paper we have presented the CMB-S4 r forecasting framework and its evolution through three major iterations.To ensure forecasting realism, we have chosen to directly ground our forecasting in the end-to-end on-sky achieved performance of Stage 2-3 experiments, rather than rely on ab initio assumptions.Additionally, we have checked the semi-analytic calculations with mapbased simulations of increasing complexity, thereby creating a closed forecasting loop (presented in Figure 1).This approach allows us to confidently incorporate all the effects that impact current real-world experiments, to flexibly trace the end-to-end effects of changes to experimental design or assumptions, and to iterate over the framework as more sophisticated sky and instrument models become available.
The semi-analytic calculations described in Sections 2 and 3 indicate that for a 3% sky fraction, 1.2 × 10 6 150-GHz-equivalent detector-years (or 1.8 × 10 6 when including marginalization over the decorrelation parameters) are required to reach the science requirement of σ(r) = 5 × 10 −4 , with 30% of this effort assigned to the delensing survey.The resulting optimal distribution of detectors across frequency bands is summarized in Table 3.
The map-based simulations described in Sections 4-6 confirm the σ(r) results from the semi-analytic calculations.These simulations also indicate that bias in the recovered r value is within 1σ for a suite of different foreground models.However, we note that foregrounds remain a serious issue that must be periodically revisited as the project progresses.
Additional map-based simulations indicate that the systematic bias on r can be controlled to < 1σ, provided that fractional contamination levels similar to those already achieved by small aperture telescopes can be maintained.
Mapping the requirements from the semi-analytic calculations onto realizable instruments (see Section 7) results in the reference design described in Abazajian et al. (2019) for a seven-year survey period, and in the constraints presented in Tables 6-9.
In Section 7 we have also extended the semi-analytic calculations to account for realistic observation patterns and probed the dependence of σ(r) on r, as a function of experiment siting, finding that the survey strategy from the South Pole is always favored in the limit of small r.We note that surveys with smaller footprints (such as the South Pole survey) depend more tightly on the levels of achieved delensing.Therefore, revising the delensing assumptions will be important as new studies of smallscale foregrounds become available.
Going forward, as the CMB-S4 project matures, the collaboration will need to converge on increasingly spe-cific instrumentation choices.To quantitatively discern between the different options, we intend to continue using and developing the closed-loop framework presented here.Among others, we anticipate adding complexity to the following directions: delensing treatment; foreground simulations; foreground parametrization; survey strategies; and instrumental systematics.Additionally, we expect to apply new analysis methods to our data challenges and incorporate new achieved performance levels at multiple additional frequencies from multiple sites as these data become available.The iterative nature of our framework can easily accommodate these revisions, achieving increased realism with each iteration.Laboratory is supported by the U.S.

MULTICOMPONENT THEORY MODEL
The forecasting framework uses a parametric model to describe the bandpower expectation values as a combination of cosmological and foreground signals.This parametrization follows the one presented in Appendix G of BICEP2/Keck Array Collaborations X (2018).
The model includes signals from the CMB (lensed-scalar and tensor contributions), Galactic dust, and Galactic synchrotron, with the possibility of spatial correlation between dust and synchrotron.Contributions from dust and synchrotron to the BB spectrum between maps at frequencies ν 1 and ν 2 can be written in the following form, after computing the expected spectrum as a function of , and applying bandpower window functions are applied to calculate binned bandpower expectation values: . (A1) Parameters A dust and A sync specify the dust and synchrotron power in units of µK 2 cmb at angular scale = 80.These are defined at pivot frequencies of 353 GHz for dust and 23 GHz for synchrotron.The dust and synchrotron components scale as power laws in with slopes α d and α s , respectively (note that slope parameters are defined for D ≡ ( + 1) C /2π).The parameter specifies the level of spatial correlation between dust and synchrotron; this correlation coefficient is assumed to be constant across all .If either A dust or A sync are negative, the contribution of the correlated component to the expectation value flips sign.Negative foreground amplitudes are technically nonphysical, but this analytic continuation becomes important when we explore entire parameter phase-spaces, for instance in the simulation re-analysis steps.
In a similar vein, we define the parameter ∆ s , which describes decorrelation of the synchrotron pattern.We do not include foreground decorrelation parameters in the dust-synchrotron correlated component.A complete foreground model would include all correlations between dust and synchrotron foregrounds across observing frequencies, but the current data do not provide useful guidance about the form of such correlations.
In addition to foregrounds, we include CMB scalar and tensor contributions.We make the simplifying assumptions that the tensor BB spectrum is given by a template scaled by parameter r and the BB spectrum from lensed CMB is given by a template scaled by parameter A L .Using CMB temperature units, the CMB contribution to the BB spectrum is given by where D tensor ,BB is the BB template for a tensor signal with r = 0.1 and D lensing ,BB is the expected lensing BB spectrum for ΛCDM.These are obtained using the CAMB11 package (Lewis & Challinor 2011).

BPCM CONSTRUCTION AND RESCALING
The bandpower covariance matrix (BPCM) construction relies on analytic rescaling of simulations.We calculate and store the bandpower covariance of signal and noise simulations for a particular sky and instrument model; rescaling from these covariance matrices eliminates the computational cost of creating large suites of simulations for each desired model.In the rescaling process, we explicitly set to zero terms that have zero expectation value (under the assumption that signal and noise are uncorrelated, and physically unrelated signals are uncorrelated) to reduce Monte Carlo error in the covariance matrix that stems from the modest number (499) of simulation realizations used.This framework is described in detail in Buza (2019), but we review it briefly here.
In a general case, let m a , m b , m c , and m d denote the four maps included in our analysis, with m a × m b denoting the cross-spectrum between those maps.Each map is the sum of independent signal components (CMB and foregrounds) plus a noise contribution, i.e., m a = i s ai + n a .Since the synchrotron and dust foregrounds could be spatially correlated, we choose to divide them into three mutually-independent components-uncorrelated synchrotron, uncorrelated dust, and the correlated part of synchrotron and dust.Using the properties that these signal and noise fields are independent and have zero mean, the expectation values for many spectra can be set to zero: s xi × s yj = 0 for i = j (independent signal fields) and s xi × n y = 0 (signal is independent of noise).Then, a generic bandpower covariance term can be written as where the first three terms are signal-only covariances, the next four terms are covariances between select signal-noise cross-spectra, and the last term is a noise-only covariance.Additional terms can be set to zero if we make the further assumption that the noise is independent in each map, i.e., n x × n y = 0 for x = y.We also set to zero the covariance between any bandpowers that are separated by more than one bin in (for bins with ∆ = 35), since these correlations are very small and not well measured.Rather than running simulations with a complicated combination of CMB and foreground signals, we calculate the above covariance terms for a simple signal model and then rescale to obtain the bandpower covariance matrix for an arbitrary model.In practice, the signal simulations are CMB realizations for a lensed ΛCDM model.Using s x0 to denote the simulated signal component of map m x , the rescaled bandpower covariance matrix term is given by Cov (m a × m b , m c × m d ) = Cov (s a0 × s b0 , s c0 × s d0 ) i s ai × s ai s bi × s bi s ci × s ci s di × s di s a0 × s a0 s b0 × s b0 s c0 × s c0 s d0 × s d0 The factors of 1/2 in the second and third terms are necessary to account for the factor of 2 difference between the variance of an auto-spectrum and the variance of the cross-spectrum between two uncorrelated fields.
An important feature of this rescaling method is that it considers signal and noise separately, rather than rescaling a combined signal-plus-noise covariance matrix.This is important because, for the highly non-uniform hit patterns (see Figure 7) that result from a large field-of-view telescope making deep maps, the spatial distribution of signal and noise in the map are very different.This leads to a significant difference in the number of signal versus noise degrees of freedom, and therefore different amounts of bandpower variance for the same overall power.This rescaling method does not attempt to account for different degrees of freedom between the different signal types, but that is a much smaller effect.

Figure 2 .
Figure 2. Schematic representation of our semi-analytic forecasting framework.Green boxes represent user inputs, yellow boxes represent code modules, and purple boxes represent outputs.BPCM = bandpower covariance matrix.

Figure 3
Figure3.Calculated atmospheric brightness temperature spectra (at zenith) for the South Pole at 0.5 mm precipitable water vapor (PWV) and Atacama at 1.0 mm PWV (both are near the median values).Atmospheric spectra are generated using the am Atmospheric Model(Paine 2017).The top-hat bands, in red and blue, are plotted on top of these spectra, with the height of each rectangle equal to the band-averaged brightness temperature using the South Pole spectrum.Details about the bands, such as fractional bandwidths, are presented in Table3-1 ofAbazajian et al. (2019).
Figure4.Top: optimized constraints on r for 3% sky fraction as a function of total effort.We include in solid black the case with delensing, allowing for decorrelation of the foregrounds, in solid gray the case without delensing, in dotted gray the case where no decorrelation is allowed in the model (with delensing), and in dashed black the raw sensitivity in the absence of foregrounds and lensing.Bottom: for the delensed case (with decorrelation), we show the fraction of effort spent on removing the lensing sample variance and the resulting rms lensing residual.
Figure5.Optimized map depth in each of the small-aperture channels as well as in the delensing channel, for f sky = 3%, corresponding to the delensed (with decorrelation) case in Figure4.

Figure 7 .
Figure 7. Hit patterns on the sky for small aperture telescope surveys.Top panel: the actual BICEP3 2017 hit pattern (peak normalized).Second panel: idealized circular pattern as used in Section 4. Third panel: simulated "Chile full" pattern, Fourth panel: simulated "Pole wide" pattern.Fifth panel: simulated "Pole deep" pattern.Each pattern is normalized to the same hit sum as in the top panel, and the color scales are the same.The "Chile deep" and "Chile shallow" regions referred to in the text are subregions of the "Chile full" pattern.

Figure 8 .
Figure8.Constraints on r as a function of the value of r.We assume an instrument with 18 SAT optics tubes and an observation time of 7 years, with the five masks defined in Section 7.Each band shows different Galactic cuts, based on Planck polarized foregrounds: the upper edge uses the cleanest 28% of the full sky, whereas the lower edge uses the 58% We explore the effect of turning on foreground decorrelation in the forecasting.We also explore adding a foreground bias, in quadrature, with a 1% value of the equivalent r of the current foreground minimum of the BK15 data at = 80.
Cov (m a × m b , m c × m d ) = (m a × m b )(m c × m d ) − m a × m b m c × m d (B1) = i Cov (s ai × s bi , s ci × s di ) + i j =i Cov (s ai × s bj , s ci × s dj ) + i j =i Cov (s ai × s bj , s cj × s dj ) + i Cov (s ai × n b , s ci × n d ) + i Cov (s ai × n b , n c × s di ) + i Cov (n a × s bi , s ci × n d ) + i Cov (n a × s bi , n c × s di ) + Cov (n a × n b , n c × n d ) ,(B2)

Table 1
Optimized instrument configuration for the PGW survey, as presented in the CMB-S4 CDT Report.

Table 6
Combined 10 4 × σ(r) values (smaller numbers are better), assuming r = 0 after 7 years of observation, keeping only the 28% cleanest part of the sky, assuming no decorrelation and an observing efficiency in Chile the same as at the South Pole.The bolded cells conserve the 18 SAT optics tube count of the reference design, while non-bolded cells explore other counts at the South Pole and in Chile.

Table 8
Combined detection significance (larger numbers are better) for r = 0.003 after 7 years of observation, keeping only the 28% cleanest part of the sky, assuming no decorrelation and an observing efficiency in Chile the same as at the South Pole.The bolded cells conserve the 18 SAT optics tube count of the reference design, while non-bolded cells explore other counts at the South Pole and in Chile.

Table 9
Same as Table 8, but assuming additional foreground decorrelation paramaters.
Department of Energy, Office of Science, Office of Basic Energy Sciences under Contract No. DE-AC02-76SF00515.This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.In the US, work on CMB-S4 by individual investigators has been supported by the National Science Foundation (awards 1248097, In Canada, RH is supported by the Discovery Grants program from NSERC, and acknowledges funding from CIFAR, the Sloan Foundation and the Dunlap family.In Italy, CB acknowledges support under the ASI COSMOS and INFN INDARK programs.In the Netherlands, DM acknowledges NWO VIDI award number 639.042.730.In Switzerland, JC is supported by a SNSF Eccellenza Professorial Fellowship (No. 186879).In the United Kingdom, AL, GF, JC are supported by the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement No. [616170].AL also acknowledges STFC award ST/P000525/1.SM is supported by the research program Innovational Research Incentives Scheme (Vernieuwingsimpuls), which is financed by the Netherlands Organization for Scientific Research through the NWO VIDI Grant No. 639.042.612-Nissanke and the Labex ILP (reference ANR-10-LABX-63) part of the Idex SUPER, received financial state aid managed by the Agence Nationale de la Recherche, as part of the programme Investissements d'avenir under the reference ANR-11-IDEX-0004-02.Some computations in this paper were run on the Odyssey cluster, supported by the FAS Science Division Research Computing Group at Harvard University.