The Atacama Cosmology Telescope: map-based noise simulations for DR6

The increasing statistical power of cosmic microwave background (CMB) datasets requires a commensurate effort in understanding their noise properties. The noise in maps from ground-based instruments is dominated by large-scale correlations, which poses a modeling challenge. This paper develops novel models of the complex noise covariance structure in the Atacama Cosmology Telescope Data Release 6 (ACT DR6) maps. We first enumerate the noise properties that arise from the combination of the atmosphere and the ACT scan strategy. We then prescribe a class of Gaussian, map-based noise models, including a new wavelet-based approach that uses directional wavelet kernels for modeling correlated instrumental noise. The models are empirical, whose only inputs are a small number of independent realizations of the same region of sky. We evaluate the performance of these models against the ACT DR6 data by drawing ensembles of noise realizations. Applying these simulations to the ACT DR6 power spectrum pipeline reveals a ∼ 20% excess in the covariance matrix diagonal when compared to an analytic expression that assumes noise properties are uniquely described by their power spectrum. Along with our public code, mnms, this work establishes a necessary element in the science pipelines of both ACT DR6 and future ground-based CMB experiments such as the Simons Observatory (SO).


ABSTRACT
The increasing statistical power of cosmic microwave background (CMB) datasets requires a commensurate effort in understanding their noise properties. The noise in maps from ground-based instruments is dominated by large-scale correlations, which poses a modeling challenge. This paper develops novel models of the complex noise covariance structure in the Atacama Cosmology Telescope Data Release 6 (ACT DR6) maps. We first enumerate the noise properties that arise from the combination of the atmosphere and the ACT scan strategy. We then prescribe a class of Gaussian, map-based noise models, including a new wavelet-based approach that uses directional wavelet kernels for modeling correlated instrumental noise. The models are empirical, whose only inputs are a small number of independent realizations of the same region of sky. We evaluate the performance of these models against the ACT DR6 data by drawing ensembles of noise realizations. Applying these simulations to the ACT DR6 power spectrum pipeline reveals a ∼ 20% excess in the covariance matrix diagonal when compared to an analytic expression that assumes noise properties are uniquely described by their power spectrum. Along with our public code, mnms, this work establishes a necessary element in the science pipelines of both ACT DR6 and future ground-based CMB experiments such as the Simons Observatory (SO).

INTRODUCTION
Increasingly precise cosmic microwave background (CMB) instruments and data products have the potential to place state-of-the-art constraints on astrophysical, cosmological, and theoretical models. Currently, ground-based instruments collect data at both small angular scales, including the Atacama Cosmology Telescope (ACT) (Fowler et al. 2007; Thornton et al. 2016;Henderson et al. 2016) and the South Pole Telescope (SPT) (Benson et al. 2014;Anderson et al. 2018); and large angular scales, including POLAR-BEAR (Suzuki et al. 2016), CLASS (Dahal et al. 2022), and BICEP/Keck (Kang et al. 2018). While these instruments observe the sky with O(10 3 − 10 4 ) detectors, next-generation projects, such as the Simons Observatory (SO) (Galitzki et al. 2018), the BICEP Array (Schillaci et al. 2020), and AliCPT (Salatino et al. 2020) will increase this count by an order of magnitude once fully operational. Thereafter, the CMB-S4 experiment (Abitbol et al. 2017; Barron et al. 2022) plans to deploy yet another order of magnitude increase in detector count, to several hundred thousand. With growing statistical power, the need for accurate modeling of instrument systematics and noise properties will become more urgent.
Accurate, computationally tractable noise models are a critical component for statistical inference with these new data. For instance, accurate noise models are necessary for an unbiased reconstruction of the CMB power spectrum covariance matrix (e.g., Hivon et al. 2002;Li et al. 2021); for the subtraction of dominant bias terms in CMB lensing reconstruction (e.g., Kesden et al. 2003;Hanson et al. 2009;Madhavacheril et al. 2021); to check for biases and quantify the variance of internal linear combination (ILC) foreground mitigation pipelines (e.g., Delabrouille et al. 2009;Madhavacheril et al. 2020); for unbiased Bayesian/parametric component separation pipelines (e.g., Eriksen et al. 2008;Dunkley et al. 2009;Svalheim et al. 2020); for optimal Wiener filtering or inverse-covariance filtering used in, for example, bispectrum (Smith & Zaldarriaga 2011) or lensing (Smith et al. 2007) estimation. A unique difficulty in the noise modeling of ground-based CMB instruments, however, is the dominance of complicated, spatially-red atmospheric noise from broad microwave water lines (e.g., Dünner et al. 2013;Takakura et al. 2017;Morris et al. 2022) at large scales. Importantly, increasing detector counts do not circumvent this noise component. Therefore, our experience with current CMB experiments offers a realistic preview of the noise modeling challenge facing future surveys.
Existing noise modeling frameworks either do not scale well to ground-based CMB data volumes, or simply ignore prominent features in the data. A reasonably successful, though resource-intensive, approach has been to model instrumental noise in the time-domain, and then propagate noise timestream realizations into map space (e.g., Planck Collaboration III et al. 2020;Akrami et al. 2020;Chown et al. 2018;Bleem et al. 2022) through a mapmaking pipeline. For Planck, 25 million CPU hours were spent processing the FFP8 (full focal plane) simulations (Planck Collaboration XII et al. 2016). As the size of current and future ground-based experiments' timeordered data (TOD) exceeds that of Planck, 1 this route will grow more costly. Many analyses also make use of analytic noise models from mapmaker products, such as low-resolution, dense pixel-pixel noise covariance matrices from WMAP (Jarosik et al. 2007) or per-pixel noise variance maps from Planck (Planck Collaboration I et al. 2020). For high-resolution ground-based experiments, the former approach is numerically intractable and the latter approach neglects the strongly correlated atmospheric noise. Other analyses have taken a hybrid approach to model both spatially-varying, uncorrelated noise variance and large-scale noise correlations simultaneously (e.g., De Belsunce et al. 2021). More recently, ACT (Louis et al. 2017;Choi et al. 2020) and SPT (Millea et al. 2021) have developed noise models that are more appropriate for ground-based instruments. In particular, both collaborations used a model that captures spatially-varying, uncorrelated noise variance, as well as a stationary, two-dimensional (2D) "stripy" component (defined in either Fourier space or spherical harmonic space) resulting from the scan strategy. 2 Nevertheless, by limiting the non-white noise to a stationary representation, these models do not generalize to surveys with wide sky coverage, such as the complete ACT survey, or the SO and CMB-S4 planned surveys.
ACT's sixth data release (ACT DR6) will deliver microwave sky maps in three frequency bands (f090, f150, and f220), covering ∼40% of the sky to a cumulative depth of 15 µK-arcmin ( 20 µK-arcmin) in temperature (polarization). In this paper, we develop novel noise models for ACT DR6, which aim to capture the observed, complicated map space noise structure inherent to ground-based CMB instruments. These models are empirical: they are built directly from ACT maps and make few a priori assumptions about particular map properties. We begin with an adapted version of a tiled noise model from Naess et al. (2020), which builds a stationary, 2D noise model within interleaved tiles on the sky. We generalize the formalism of this model into a class of Gaussian noise models in arbitrary bases; this allows us to prescribe an isotropic wavelet noise model and a directional wavelet noise model. The latter two wavelet approaches draw on existing wavelet formalism in Fourier space (e.g., Freeman & Adelson 1991) and harmonic space (e.g., Wiaux et al. 2008;Leistedt et al. 2013;McEwen et al. 2018;Chan et al. 2017), which have been applied to CMB data (e.g., Delabrouille et al. 2009;Planck Collaboration XLVIII et al. 2016;Rogers et al. 2016;Coulton et al. in prep.), large-scale structure (e.g., Allys et al. 2020;Cheng et al. 2020) and Galactic foregrounds (e.g., Regaldo-Saint Blancard et al. 2021). While the isotropic model uses the wavelets of Wiaux et al. (2008); Leistedt et al. (2013), we also prescribe a new, tunable, directional wavelet set in Fourier space. In addition, our directional wavelets allow the lossless transform to and from wavelet space. To our knowledge, these models represent the first application of 2D wavelet analysis in the context of astrophysical instrument noise modeling. The models' primary user interface is the drawing of simulated noise maps, and we make our codemnms 3 -publicly available. These simulations are also consumed by downstream ACT DR6 analyses, to be described in upcoming publications (e.g., Qu et al. in prep.; Coulton et al. in prep.). This paper is structured as follows. In §2, we summarize the ACT instrument and DR6 data products that enter the noise modeling. In §3, we define the mapbased noise and establish its most salient properties in the ACT data. In §4, we detail the construction of the three noise models, including their mathematical formalism. In §5, we test the noise models' ability to reproduce the complex noise covariance structure in the data, and apply them to the ACT DR6 CMB power spectrum pipeline. We discuss implications and future directions for noise modeling, and conclude in §6.
Unless otherwise stated, in this paper, we refer to scalar quantities by italicized variables of any case, vector quantities as boldface variables of lower case, and matrix quantities as boldface variables of upper case.  Table 2. Properties of the ACT DR6 maps for all arrays, bands, and Stokes polarization components. In addition to the data maps, these also apply to the inverse-variance and cross-linking maps. The max gives the Nyquist sphericalharmonic bandlimit of the pixelization.
We also define multiplication ( * ) and division (/) of vectors to signify element-wise operations.

THE ACT INSTRUMENT AND PRODUCTS
The Atacama Cosmology Telescope (ACT) observed the polarized microwave sky from an altitude of ∼5,200 m in the Atacama Desert, Chile. Its receiver has gone through several upgrades since its initial deployment in 2007; this paper -and DR6 products generally -focus on data collected since its most recent upgrade, Advanced ACT, in 2016 (Henderson et al. 2016). 4 In particular, we focus on data collected by three dichroic detector modules, known as "polarization arrays" (PAs, for short): PA4 (deployed 2016 -2022) observing in the f150 and f220 frequency channels (Ho et al. 2017), PA5 (deployed 2017PA5 (deployed -2022 observing in the f090 and f150 frequency channels, and PA6 (deployed 2017 -2019), observing the same frequency channels as PA5 (Choi et al. 2018). Additional properties of these detector arrays are given in Table 1. In 2019, PA6 was exchanged with a new detector array, PA7, observing in the f030 and f040 frequency channels. These "low frequency" measurements are not discussed in this paper. For arrays PA4 -6, we use the DR6 "night only" maps, which includes only data taken between 23:00 and 11:00 UTC;  however, the methods we develop are equally applicable to ACT daytime data and null test maps. 5 We make use of three ACT DR6 map-based products in this paper as follows.
Sky Maps: The mapmaking for DR6 will be described in a forthcoming paper; the following is a summary. The detector timestreams of each array are reduced into Stokes I, Q, and U sky maps by a maximum-likelihood mapmaking algorithm (Dünner et al. 2013;Naess et al. 2014;Louis et al. 2017;Aiola et al. 2020). The ACT mapmaker assumes the following data model: where d t are TODs, P is a projection matrix mapping the pixelized sky m from map space to TOD space (using telescope pointing), and n t is additive, Gaussian noise in each detector sample. Each data vector in Equation 1 contains intensity and polarization components. The maximum-likelihood map under this model is: 5 Distinguishing features of datasets such as their beam, overall depth, sky footprint, etc. make no difference to our empirical modeling. Indeed, the methods are not even restricted to maps of the CMB.
where N t is the covariance of n t . TODs are grouped from the same array and frequency band into the same vector d t , thus producing per-array, per-frequency, polarized (Stokes I, Q, U) maps. In ACT DR6, Equation 2 is solved in eight disjoint "splits" of the timestreams: one-day-long TOD blocks are distributed exclusively into eight sets, such that the noise in each split map is independent. As in ACT DR4 Mallaby-Kay et al. 2021), the maps are produced in the Plate-Carrée cylindrical projection. Further properties of the DR6 maps are given in Table 2. 6 Inverse-Variance Maps: In addition to maps of the microwave sky signal, the mapmaking algorithm produces auxiliary products used later in this paper. Of particular importance are "inverse-variance" maps, which give the inverse of the white-noise covariance be-tween Stokes components in each pixel. Following the ACT convention (Choi et al. 2020;Madhavacheril et al. 2020), we label them h. The inverse-variance maps are analogous to detector hit-count maps, but also include the effect of intrinsic detector properties per detector hit (Mallaby-Kay et al. 2021). Because the TOD noise power spectrum is red, with a "white-noise floor" at high frequencies (Dünner et al. 2013;Morris et al. 2022), the inverse-variance maps are only accurate at small scales. Although the Stokes covariance in each pixel is formally a 3×3 matrix, for the same reasons as in DR4 (see Aiola et al. (2020)), DR6 includes only the Stokes intensityintensity (II) element . This has little effect on our noise models, however. As we will show in §4, our models still account for correlated Stokes components in the DR6 maps. An example inverse-variance map is shown in the top panel of Figure 1 Cross-Linking Maps: The mapmaker also produces "cross-linking maps," which we use as a reference in §3.2 and to guide our modeling decisions in §4. These are maps which contain information about the ACT receiver's local scanning direction (see e.g., Aiola et al. 2020;Choi et al. 2020;Mallaby-Kay et al. 2021, for a more complete description). In particular, the "polarization fraction," f p , of the cross-linking map encodes the degree to which a given pixel is observed by many different scanning directions. The cross-linking maps are produced in such a way that f p = 0 corresponds to maximal cross-linking (many crossing scans), and f p = 1 corresponds to no cross-linking (all scans in the same direction). Therefore, we use the quantity 1 − f p in later sections, such that high values of 1 − f p correspond to high cross-linking and vice versa. An example crosslinking map is shown in the bottom panel of Figure  1. Both the inverse-variance and cross-linking maps are produced with the same projection and parameters (see Table 2) as the sky maps.

MAP-BASED NOISE IN ACT DATA
We begin our analysis by exploring the map-based noise in the ACT dataset. To do so, we first define "map-based noise" and prescribe how we measure it.

Definition and Measurement
Each pixelized, polarized (Stokes I, Q and U) split map has an independent noise realization: where m i is the map solution in TOD split i, s is the underlying sky signal (assumed constant over splits), and n i is additive noise in each pixel (we suppress indices labeling the array, frequency, and polarization). The form of Equation 3 is not arbitrary -it follows from the additive time-domain noise in the definition of d t (Equation 1) and the linearity of the mapmaking solution (Equation 2). That is, n i is the mapped time-domain noise. This paper aims to model the noise, n i , in each ACT split. Of course, we can never measure n i directly, or else we could produce noiseless maps. Rather, we must use a proxy for the noise. In particular, we use differences of map splits, relying on the assumption that each m i in Equation 3 measures an identical signal. We give the prescription by which we construct these difference maps as follows.
The first step is to construct an unbiased estimate of the signal, s, in the split maps. We do this by taking the inverse-variance-weighted average of m i over splits i, which we call the map-based "coadd," c: where h i is the inverse-variance map for split i and Σ ≡ i h i . Next, we subtract c from each of the m i to construct a "raw" difference map, d i : If c were exactly s, then d i would equal n i in that split. Instead, because c contains an average of the noise from each split, so too does d i . We can see this by combining Equations 3, 4, and 5: The upshot, though, is that d i is a decent proxy for n i : it contains only noise, and "mostly" (which we will quantify in Equation 10) noise from split i. Indeed, this is the proxy used in previous ACT analyses (Choi et al. 2020;Madhavacheril et al. 2020;Naess et al. 2020).
In this paper, we perform one more step by applying a correction factor to the raw difference maps. Specifically, we correct for the fact that the variance in d i is less than the variance in n i . To evaluate V ar(d i ), first recall from §2 that V ar(n i ) = 1/h i only in the white noise, uncorrelated-pixel limit. We work in that limit for now, and also define the difference map inverse-variance along the same lines: V ar(d i ) ≡ 1/w i . In Appendix B, we show that w i satisfies: That is, the left-hand side -the variance of d i -is less than that of n i by exactly 1/Σ. This motivates the following correction to the raw difference maps: Figure 2. Ratios of BB pseudospectra between the corrected difference maps, νi, and the individual split maps, mi, for each array and frequency channel in ACT DR6. All pseudospectra are measured within this paper's "pseudospectrum mask" described in §4.5.3. The dashed black line is the nominal result (ratio of unity), the solid lines (shaded bands) are the means (1σ range) over the eight splits of each dataset. To mitigate any expected BB signal, in addition to masking the Galaxy, all maps are point-source subtracted and filtered using the ACT DR4 ground pickup filter (Choi et al. 2020), where we remove Fourier modes with |kx| ≤ 90 and |ky| ≤ 50. Spectra are smoothed with a tophat kernel of size ∆ = 250 to better visualize their overall trend.
such that the variance of ν i is the same as that of n i : Both when we examine map-based noise properties of a given split ( §3.2), and build noise models ( §4), we will always be referring to the corrected difference map noise measurements, ν i , in Equation 8.
We emphasize that ν i and n i are distinct but related quantities. Their correlation is given by (Appendix B): which approaches 1 as the number of splits increases, assuming splits contain reasonably uniform TOD volumes. For example, given eight approximately uniform splits, Equation 10 yields a correlation of ∼93.5%. Because we can never directly measure n i in a map, we resort to using ν i as a highly correlated tracer. Fortunately, we can test the two assumptions that the uncorrected difference maps (Equation 5) contain pure noise, and that the corrected difference maps (Equation 8) have accurate variance. From the best-fit model of the signal from ACT DR4 (Choi et al. 2020), the expected BB power from the lensed CMB and Galactic dust (with the Galaxy masked, as in this paper's pseudospectrum mask, see §4.5.3) should be at the sub-percent level compared to the ACT DR6 per-split noise. In other words, the BB power spectra from raw, non-differenced split maps, m i , should be a reasonably clean measure of the map-based noise; in particular, they should match BB power spectra from the corrected-power difference maps, ν i . Such a test is not trivial: Planck (Planck Collaboration XII et al. 2016) found a several percent discrepancy in their High Frequency Instrument (HFI) channels when performing a similar comparison, which was attributed to systematics in their pre-processing.
As shown in Figure 2, ACT DR6 exhibits good agreement. All arrays and frequencies achieve an average difference map (ν i ) to raw map (m i ) BB spectra ratio of within 0.1% of nominal. Recall, in constructing the difference maps (Equation 8), we had to work in the white noise, uncorrelated-pixel limit where the inverse-variance maps, h i , are expected to be accurate. As discussed more in §3.2, this limit is reached only on small angular scales. However, the performance of the w i /h i correction factor is more robust: the BB ratios are consistent with nominal at the ∼ 1% level for scales as large as ∼ 300. While the observed percent-level deviations for 300 do matter when evaluating our noise models in §5, overall this result strongly supports the scheme by which we measure the noise in a given map split.

ACT Map-Based Noise Properties
We motivate our noise model design choices by enumerating five salient features of the ACT map-based noise. We take these five features to define the scope of our modeling in §4; however, we do not expect this list to be exhaustive in reality. Nevertheless, we do expect Figure 3. Top, large-scale noise correlations: Noise power spectra by polarization autocorrelation (TT, EE, BB), for each detector array and frequency band. The power spectra are evaluated on the corrected difference maps (Equation 8), after applying the apodized pseudospectrum mask described in §4.5.3. Shaded bands give the 1σ range over the eight split maps for each array/frequency. All spectra are smoothed with a tophat kernel of size ∆ = 25. Both the TT and polarization (EE, BB) spectra are normalized by the value of the TT spectrum at = 10, 800 to suppress the effect of variable data volume between arrays. Polarization spectra plateau at a value of ∼2, owing to each polarized component receiving approximately half the TOD volume as intensity. Both the intensity and polarization spectra are steep and red at large scales. Bottom, correlated frequencies: Correlation spectra (r , Equation 11) evaluated from the mean cross-power spectrum of the eight difference maps, for each detector array. The same mask is applied as in the top panel, and shaded regions again correspond to the 1σ range over the eight split maps. For legibility, we only plot the cross-spectra between different frequency bands -all crosses are between the same polarization components (i.e., TT, EE, or BB). The colored series indicate cross-spectra between frequency bands on the same physical detector array, while the grey series indicate the cross-spectra between bands on different physical detector arrays. The only significant cross-spectra come from the same physical detector array. the noise properties in this section to be (a) the most significant for downstream analyses, and (b) common to ground-based, scanning microwave telescopes.

Large-Scale Noise Correlations
Large-scale noise correlations are a dominant noise feature in ground-based microwave observatories (Dünner et al. 2013;Henning et al. 2018). Their origin is typically attributed to fluctuating, nearly unpolarized emission by atmospheric water vapor (Errard et al. 2015;Morris et al. 2022), which can leak into polarized timestreams via optical and detector systematics (Takakura et al. 2017). We see their effect in the power spectra of the DR6 difference maps, as shown in the top panel of Figure 3. Both the intensity and polarized map-based noise power spectra are steep and red, spanning several orders of magnitude in power from their white-noise level at small scales to their peaks at large scales. The correlated noise in intensity maps is higher than in polarization, mainly due to a smaller transition scale -or, higher knee -from red, correlated noise to white noise. The power-law index of the correlated noise -the slope in log-log space, as plotted in the top panel of Figure 3 -is similar between intensity and polarization. As is expected for thermal water emission, the intensity spectra increase monotonically with the frequency band of the map. This scaling is not as clear in the polarization spectra, highlighting the contribution from non-atmospheric effects to the large-scale polarized noise. Likewise, the convergence of the PA4 f220 TT spectrum to the f150 spectra at low-requires more work to understand its origin.

Correlated Frequencies
Because the ACT detector arrays are dichroic, maps of each frequency band on a given detector array observe correlated fluctuations. We probe this by forming "correlation spectra" of the two frequency maps: where C ab denotes the cross-spectrum of two fields a and b. The bottom panel of Figure 3 shows how the correlation spectra vary by detector array, polarization, and angular scale. The TT correlation spectra between frequency maps on the same array stand out in particular: the two frequencies are ∼25% -50% correlated at scales where the atmospheric noise is dominant ( 3, 000, see the top panel Figure 3). Likewise, the polarization (EE, BB) correlation spectra between frequency maps, again on the same array, appear significant to similar levels on larger scales (though more for PA4 and PA6 than PA5). Owing to the increased distance between detectors, correlations of any component between frequency maps on different physical detector arrays are considerably reduced. Other details apparent in Figure 3 are less intuitive. It is not clear why the TT correlation peaks at ≈ 1, 000 but declines for larger scales especially in PA4, nor why PA5 polarization correlations are considerably different from those of PA4 and PA6. Further work is needed to understand these features. Nevertheless, our models treat these empirical noise correlations in the ACT maps as a given.

Map-Depth Inhomogeneity
Map-depth inhomogeneity 7 is a commonly modeled property of map-based noise in the literature. For ACT, this information is encoded in the inverse-variance maps h discussed in §2. The inverse-variance maps contain considerable morphology, as is seen in the top panel of Figure 1. In particular, we see map depth varies spatially on both large and small scales, generally being deeper closer to the celestial equator and shallowest near the Galactic plane. As discussed, we expect this spatial modulation to only be accurate for the high-ACT noise. Furthermore, we can discern the telescope scan strategy inducing coherent arcs across the observed footprint. Between declinations of ∼ 0 • to 15 • , the ACT scan strategy tends to have an "x-like" directional pattern, while declinations of ∼ −45 • to −15 • have a uniformly "vertical" scanning pattern. The scanning pattern is reflected in the bottom panel of Figure 1, which contains a cross-linking map, 1 − f p (see §2). Regions where the cross-linking is low are those where the ACT scans are uniformly vertical; regions where the crosslinking is high are those where the ACT scans are "xlike."

Spatially-Varying Noise Anisotropy
Unfiltered ACT data have historically shown stripy noise features (Louis et al. 2017;Choi et al. 2020), and the DR6 maps are no exception. Like the large-scale correlations (Figure 3), stripy noise patterns are a manifestation of the atmospheric noise. Given a particular low-frequency atmospheric signal in a TOD, as the telescope scans across the sky, the signal will manifest as a one-dimensional stripe in the map. Thus, not only are the largest spatial scales dominated by atmospheric  Because the 2D spectra are normalized by their overall radial profile, the colorscale is unitless. As is explained in the text, the noise exhibits stripy patterns that align with the local scan strategy. In Fourier space, these patterns manifest as bars perpendicular to the local scan strategy. We outline the primary features from the scanning strategy in black. The left (right) panel outlined region contains an average power excess of 8% (25%). Secondary features -such as detector-detector correlations on the top and bottom of the right panel -are also visible. The regions of low noise power near the center result from the removal of the radial, or "isotropic," part of the power spectrum.
signals, but those signals also tend to "trace" the ACT scan pattern. We confirm this intuition by examining the 2D Fourier noise power in the previously mentioned declination bands, as shown in Figure 4. In order to emphasize the "anisotropic" noise power, we first divideout the "isotropic" radial profile (i.e., the 1D spectra from Figure 3) that would otherwise dominate the full 2D power. Near the celestial equator (the left panel), we recover a strong "x-like" pattern in the relative 2D noise power, as expected from our by-eye assessment of the ACT scan pattern. Likewise, in the southern region with vertical, poorly cross-linked scans (the right panel), we find the 2D noise power to be concentrated in a horizontal "bar" in Fourier space. In other words, the 2D noise power exhibits long-wavelength (small wavenumber) correlations in the y-direction, but is approximately featureless (white) in the x-direction, as expected. Furthermore, the poorly cross-linked region exhibits stronger overall anisotropies than the well cross-linked region. To demonstrate this, in Figure 4, we measure the power excess within the dominant "x-like" or "bar" features in the 2D spectra. The outlined "x-like" portion of the well cross-linked 2D spectrum contains an average noise power excess of 8%, whereas the same as measured in the outlined "bar" of the poorly cross-linked 2D spectrum is 25%. Thus, the ACT map-based noise contains anisotropies whose structure and prominence varies spatially.

Scale-Dependent Map Depth
As discussed above, the ACT map-based noise is spatially modulated by the inverse-variance maps at small scales. At large scales where the atmospheric correlations dominate, however, the map depth spatial morphology resembles the amount of cross-linking.  Figure 5. Scale-dependent map depth: "Inverse-variance maps" as measured directly from the PA5 f090 difference maps. Top: After filtering νi for small scales, 2000 ≤ ≤ 5000, and averaging over Stokes Q and U in only the first split. Bottom: After filtering νi √ hi for large scales, 200 ≤ ≤ 500, and averaging over Stokes Q and U in all eight splits. We square the filtered maps to assess local noise variance, smooth them for legibility (Gaussian FWHM = 4π/5000 rad ≈ 0.14 o and FWHM = 2π/500 rad ≈ 0.72 o , respectively), and take their inverse. We also multiply by pixel area to account for declinationdependent loss of power from the filtering. Because we invert the variance, small values indicate larger noise power. As is explained in the text, the small-scale noise power is modulated by the mapmaker inverse-variance, and the large-scale noise power traces the mapmaker cross-linking.
decreases only with the number of detector hits. This does not apply to large-scale correlated modes observed simultaneously by many detectors. Instead, variance in the estimation of large scales decreases with the number of independent scans that observe the mode (heuristically, for atmospheric correlations on the order of the array size, the array acts as if it has only one detector, not hundreds). All scans are not equally effective at reducing variance in the estimate, however. In a single scan, large-scale, correlated signals on the sky are degenerate in Equation 2 with low-frequency, correlated atmosphere in the TOD. Observations of a fixed sky position with differing scan directions act to lift this degeneracy, while parallel scans do not. Therefore, in addition to averaging down with the number of independent scans, the large-scale map-based noise averages down with increased map cross-linking as well.
To isolate this cross-linking-like morphology, we should first normalize the difference maps by the number of array scans at each point. Maps of these statistics are not readily available, so we use the inverse-variance maps as a proxy. 9 To be precise, we first multiply the difference maps, ν i , by √ h i , such that the variance of ν i on small-scales is uniform. Borrowing from Equation 9, we have on small-scales: Then we filter ν i √ h i for large-scales and look for residual structure in the noise power across the map. √ h i after filtering for large scales as just discussed. Comparing to Figure 1, the empirical small scale power clearly traces the mapmaker-produced inverse-variance map, while the empirical large-scale power resembles the mapmaker-produced cross-linking map -with greater noise power (less inverse-variance) corresponding to less cross-linking, and vice versa. In sum, different angular scales have their own distinct "inverse-variance maps," resulting directly from the ACT scanning strategy.
Note that all of the above noise properties are each manifestations of the same underlying process: the propagation of correlated time-domain noise and the ACT scanning strategy through the mapmaking. Further study of atmospheric emission (Morris et al. 2022) and instrumental noise sources (Dünner et al. 2013) may reveal new, subtle map-based noise properties not explored in this paper.

NOISE MODELS
In this section we describe three different noise models that capture the ACT map-based noise properties from §3.2. Implicit in Equation 3 is the assumption that the particular map based noise realizations in our data, n i , are zero-mean random vectors. We also make the assumption that the n i are Gaussian distributed. 10 Therefore, a statistically complete "noise model" for each n i is given by its covariance matrix, N i . We build estimates of this matrix using the difference maps, ν i , and use it to draw noise simulations.
The long-distance noise correlations from the previous section suggest that N i is dense in the map basis. Combined with the large pixel count in the ACT DR6 maps (N pix = 445, 824, 000, see Table 2), the construction of an explicit N pix × N pix N i is intractable . Instead, we leverage the statistical properties in §3.2 to build a sparse, but still approximately complete, covariance matrix. The key to a sparse representation is the isolation of stationary regions in some projection of the noise. For example, given map space fields a and b at locations x and x , and their Fourier transformsã andb at modes k and k , the cross-correlation theorem states: If : then : 10 This is well motivated because the time-domain noise is already well described by Gaussian noise. The noise properties stay Gaussian when the linear mapmaking equation (Equation 2) is applied. Furthermore, each each pixel is is built up from many independent time samples (∼ 97%/99%/98% of pixels have greater than 100 time sample hits in each split map of PA4/5/6) which is expected to further reduce any non-Gaussianity.
where C ab is the map space covariance of a and b, and C ab is its Fourier transform. 11 In other words, if the covariance of two fields is stationary in map spacethat is, exhibits translational invariance (even while being correlated) -then in Fourier space it is uncorrelated.
In particular, only the diagonal of the Fourier covariance -the (cross) power spectrumC ab -needs to be measured. Therefore, for each noise model discussed in this section, we will first filter the ACT DR6 noise maps to promote their stationarity, and then transform them to permit a sparse representation of the covariance matrix. Following a brief discussion of some additional notation in §4.1, we begin in §4.2 with an updated version of the "tiled" noise model from Naess et al. (2020), including the algorithms by which we build the model and draw simulations. We describe the same for two novel noise models: an "isotropic wavelet" (or "wavelet") noise model in §4.3 and a "directional wavelet" (or "directional") noise model in §4.4. Throughout §4.2, §4.3, and §4.4, we will introduce proper matrix notation for all operations performed on our map vectors. This will aid users incorporating the noise covariance matrices into their analysis program, as well as clarify how we draw noise simulations. Lastly, for all three models, we describe map preprocessing steps and additional implementation details in section §4.5.

Linear Algebra Notation
Before diving in, we establish some notation accompanying our map vectors and matrices. So far, we have understood our proxy noise maps, ν i , to have N pix elements. Henceforth, we will add an additional index to the maps which iterates over frequency bands (2) and polarization indices (3) on a given detector array. That is, we should now take the vector ν i to refer to a concatenation of the (6) polarized frequency maps from each array, with 6N pix elements. In this way, our noise models will be able to capture the observed frequency-frequency cross-correlations from Figure 3. 12 When needed, we will refer to a particular frequency and polarization subset of the data by its superscript index, a, ranging from 0 to 5 (e.g., for PA4, the a = 0 − 2 blocks of ν a i would refer to the PA4 f150 band, I, Q, and U, respectively). This notation extends to other map vectors like h i , and any matrix objects we construct from the outer-product of two vectors -when needed, we will refer to matrix 11 Due to the unitary property of the Fourier transform, Equation  13 is equally valid if we exchange map space and Fourier space. 12 While we do not observe significant polarization-polarization noise cross-correlations in DR6, in keeping with previous ACT convention (Choi et al. 2020;Madhavacheril et al. 2020), we allow for them in the models.
block elements using superscript indices a and b. Each map vector and noise model will continue to be distinct for each detector array and split, i. We also define matrices for the map transformsdiscrete Fourier transforms (DFTs) and spherical harmonic transforms (SHTs) -used in our modeling and simulation algorithms. We denote the forward DFT by the matrix F, which is unitary (F −1 = F † ). Following Reinecke, M. & Seljebotn, D. S. (2013); Seljebotn et al. (2019), we denote spherical harmonic synthesis (from harmonic to map space) by the N pix × N a lm matrix Y, and spherical harmonic analysis (from map to harmonic space) by Y † W, where W is a square diagonal matrix containing per-pixel "quadrature weights" used in the numerical integration. In general, it is the case that N pix = N a lm : SHTs are, unlike DFTs, rectangular operations. Unless otherwise stated, we always perform SHTs to the Nyquist bandlimit of the input pixelization. Following the default SHT implementation in pixell 13 and healpy 14 (Górski et al. 2005;Zonca et al. 2019), when applied to a spin-2 pair of Stokes Q and U maps, we will take Y † W to be the linear combination of spin-2 SHTs that outputs E-and B-mode harmonic coefficients (Zaldarriaga & Seljak 1997). 15 Thus, if s is a polarized map with Stokes I, Q, and U components, the vector a = Y † Ws consists of I (also referred to as T), E, and B harmonic coefficients.
Finally, we make use of a compact notation to describe products of matrices and vectors with incompatible shapes. The notation is similar to the "broadcasting" of vector and matrix objects used in numpy, the main scientific computing package for the Python programming language (Harris et al. 2020). The use of this notation helps save ink, prevents redundant definitions of the same linear operation, and directly corresponds to the implementation of these operations in code. For example, under this notation it makes equal sense to write Fm for the DFT of an intensity-only map m as it does Fν i , where, as discussed, ν i is a six-component map. In the latter case, the DFT matrix F is understood to be promoted to a 6 × 6 square block-diagonal matrix with the original F along the block-diagonal. Likewise, a = Y † Wm is a spin-0 harmonic transform, while a i = Y † Wν i is a six-component transform (e.g., for PA4, the a = 0 − 2 blocks of a a i would refer to the PA4 f150 band, T, E, and B respectively).

Tiled Noise Model
Previous ACT analyses (e.g., Louis et al. 2017) noted that the stripy noise pattern appeared to be spatially constant across the released maps. In ACT DR4, the relatively small (compared to DR6) sky footprint of each mapped region was referred to as a sky "patch" Mallaby-Kay et al. 2021), and thus downstream analyses, notably Choi et al. (2020) and Madhavacheril et al. (2020), built a sparse, 2D Fourier space model of the noise covariance in each patch. In addition, these analyses weighed the Fourier space models by the mapmaker inverse-variance to capture the spatial dependence of the map depth. Thus, within a given patch, the ACT DR4 noise models captured isotropic noise correlations (the "radial" direction in 2D Fourier space), correlated frequencies (by taking cross-spectra in Equation 13), map depth inhomogeneity (by inverse-variance weighting difference maps), and spatially-constant anisotropy (constant across the patch).
The main goal of the tiled noise model is to reproduce the spatially-varying anisotropy in the ACT noise, as shown in Figure 4. Our model is based on the "constant correlation" model of Naess et al. (2020), which built an inverse-variance weighted, 2D Fourier power spectrum in interleaved 4 • × 4 • tiles that are cut out of the larger ACT maps. In this sense, the noise model in each tile resembles that for each DR4 patch. We highlight important changes from Naess et al. (2020), but for further background, we refer the reader to §4 of that paper, especially Figures 7 and 10 -12.

Model Estimation
We outline how to measure the tiled noise model, starting from a set of difference maps, ν i . We follow a similar algorithm as Naess et al. (2020), but make a key change in Step 2 to improve performance. This also establishes a general prescription for a class of Gaussian map-based noise models, since the wavelet and directional noise models will proceed analogously.
1. As discussed above, we first want to filter the noise maps to maximize their stationarity. But we know, a priori, that their covariance is not translationally invariant in map space, because the map depth has spatial structure. Thus, before anything else, we remove the covariance's dependence on absolute sky position -due to the map depth, at leastby multiplying ν i with √ h i . We already saw the effect of this in  Figure 6. A 1D cross-sectional profile of the "forward" and "backward" tile kernel for a tile of pitch p. The grey band has width p and is centered on the tile. In this paper, we use p = 4 • . The forward and backward profiles smoothly extend into adjacent tiles; the dashed lines indicate the overlapping kernels for the tile immediately to the right. The full 2D tile kernel is the formed by the outer product of perpendicular 1D profiles.
where H 1/2 i is a square diagonal matrix whose diagonal entries are populated by √ h i . Thus, H 1/2 i is a "stationarity filter" acting on ν i .
If we were strictly following the Naess et al. (2020) prescription, we would next break up H 1/2 i ν i into small tiles and evaluate their 2D Fourier power spectra. Doing so in the context of this paper, however, would introduce some negative side-effects. As we will see in Step 3, the tiling of the maps essentially consists of multiplying the maps with small masks. Multiplication by these local masks in map space effects a convolution by a broad kernel in harmonic space. This would result in a modecoupling bias -power being "smeared" across scales -since the noise power spectra are very steep. Therefore, we first perform a "demodulation" step, in which we divide-out the isotropic part of the noise power spectrum (the curves in the top panel of Figure 3) prior to tiling. This results in much flatter power spectra that are less sensitive to mode-coupling. In Appendix G, we 16 Equations 8 and 12 are together equivalent to the first step of the noise prescription of Choi et al. (2020): demonstrate the degree to which mode-coupling induced bias can dominate the tiled noise model, especially for large-scale polarization, without the inclusion of the demodulation step. We proceed as follows.
2. First, we measure the noise power pseudospectra within a large, smooth "pseudospectrum estimate mask", µ est . 17 That is, we measure all the auto-and cross-pseudospectra of the vector µ est * √ h i * ν i , which we store in the object C est . For example, for PA5, C 0,3 est (the a = 0, b = 3 part of C est ) is the f090 T× f150 T crosspseudospectrum: the only off-diagonal parts of C est are the frequency and polarization crosses indexed by a and b. Thus, one can also think of C est as containing N a lm 6×6 matrices at each harmonic mode. We note that C est is a symmetric matrix. Then we construct C −1/2 est , where the matrix exponent acts on each 6 × 6 matrix of C est separately. Finally, we filter the weighted noise maps, H This new vector is still in map space, and it still has size 6N pix . On small scales, it has uniform per-pixel variance. But now, at least as measured within the apodized mask µ est , it also has flat autospectra, 18 and has been decorrelated -its crossspectra have been nulled. As discussed, the goal of flattening the spectra is to mitigate mode-coupling when we tile these maps in Step 3. The benefit of decorrelating them here will become apparent when we outline the algorithm of drawing a noise simulation: in that case, we will start with a map of easy-to-produce white noise, and by applying the inverse of C −1/2 est -that is, C 1/2 est -we will emerge with a map with properly-correlated spectral components.
within small sky patches is constant, but varies over larger distances. We follow the same tiling scheme as Naess et al. (2020), wherein the ACT footprint is divided into M ∼ O(1, 500) equalsized square tiles with a 4 • pitch. In practice, each tile is actually larger than 4 • × 4 • to allow for a 1 • cosine apodization around its border. 19 As a result, adjacent tiles partially overlap. We fold the apodization into the definition of the tile "profile," or tile "kernel:" cutting a tile out of a map consists of multiplying the map by the tile kernel. A cross-section of an arbitrary tile kernel is shown in Figure 6, displaying the apodized edge and overlap with neighboring kernels. We name this operation a "forward tiling transform;" the "backward" transform, which combines tiles back into one map, will be defined when we draw simulations.
We write the matrix form of the forward tiling transform. Each tile has significantly fewer pixels than the full map: N pix,tile N pix . Thus, the first step in applying a given forward tile -say, tile j -to a map vector is to "crop" the map vector, retaining only those pixels contained in tile j. The next step is to multiply the cropped map vector by the apodized tile kernel itself. Together, these operations look like: where P j is a N pix,tile × N pix cropping matrix consisting of N pix,tile rows of unique unit vectors that select the j th tile pixels, and τ j,f is a N pix,tile × N pix,tile forward tile matrix, a square diagonal matrix with the j th tile kernel values along its diagonal. The forward tile transform then consists of all of these tile operations acting on the same map vector: where T f is a block-column-vector of M forward tiles. Putting it all together, our input noise maps 19 As with µest, the apodization prevents strong ringing when taking the per-tile DFT in the next step. Some tiles extend past the edge of the ACT footprint. For such a tile, we simply redefine its new border to be the intersection of the old border and the ACT footprint, and apodize the new border instead.  have been transformed into weighted, demodulated, tiled noise maps: 4. Assuming that within each tile the noise covariance is stationary, we take our cue from the crosscorrelation theorem (Equation 13). That is, by transforming to Fourier space, we will assume that each Fourier mode in the resulting vector is uncorrelated with all other Fourier modes: where u i is the weighted, demodulated, tiled, 2D Fourier projection of the map-based noise, and F is understood to be a square block-diagonal matrix of DFTs (one for each tile). For each Fourier mode, however, we still retain the six correlated map fields: two frequency bands and three Stokes components.
5. With the construction of u i , most of the hard work is complete. The last steps use u i to build a sparse (diagonal) covariance matrix, just as in Equation  13. For each tile, we evaluate its 2D Fourier noise power spectra: where k is a given 2D Fourier mode, denotes the real part (relevant when a = b), u a i,j (k) is the j th tile and a th component of our transformed noise map for split i, and f j,2 corrects for the loss of power from the apodization in the tile (Choi et al. 2020;Madhavacheril et al. 2020;Naess et al. 2020): which is just the average value of the squared tile kernel. Thus, in each tile, the covariance matrix N i,j is block-diagonal in each Fourier mode k, with a single block being a 6×6 matrix N (k) i,j . The total covariance matrix N i is block-diagonal in tiles, with a single block being N i,j : This highly diagonal form is the sparse representation we originally sought: it has O(N pix ) nonzero elements near its diagonal, rather than being dense with O(N 2 pix ) nonzero elements.  2020), we smooth each tile's power spectrum estimate N (k) ab i,j over nearby modes k to reduce sample variance. Unlike the smoothing procedure for the two wavelet models (discussed in §4.5.4 and Appendix F), for the tiled model we cannot reliably distinguish between random fluctuations and true features in the tiled 2D power spectra. We thus select the same smoothing width in Fourier space as was used in Madhavacheril et al. (2020) and Naess et al. (2020), ∆k = 400 in both the x-and y-directions. We perform the smoothing via convolution with a tophat kernel directly applied to the 2D power spectra. 20 The smoothing width is fixed both within a given tile, and for each tile. An example smoothed, tiled 2D Fourier power spectrum is shown in Figure 7. 7. Lastly, in preparation for drawing noise simulations, we note that the actual objects we save to disk are not the tiled 2D Fourier covariance matrices, but rather their matrix square-root. This is the matrix analog of obtaining the standard deviation from the variance in preparation for sampling a scalar Gaussian random variable. Thus, the object we save to disk is actually N 1/2 i . As in Step 2, the only off-diagonal parts of N i are the frequency and polarization crosses indexed by a and b. Therefore, the matrix exponent acts on each Figure 8. Because it is also used in drawing simulations, we save C 1/2 est to disk as well.
These steps generate the tiled model square-root covariance N 1/2 i for the purposes of this paper. A schematic depiction of all these steps is given in the top row of Figure 8.

Simulations and Covariance Matrix
The ultimate aim of this paper is to create realistic simulations of the map-based noise. Here, we explain how we draw such a such a simulation from the model we have just described. Along the way, we also write down the full map-based noise covariance matrix. The following steps are essentially applicable to the wavelet models as well; in §4.3.2 and §4.4.2 we only discuss minor differences from what follows here. Batches of these map-based noise simulations are the main product of this paper.
There are two key steps in extracting a realization of the map-based noise from the tiled model. We start by drawing a realization in tiled 2D Fourier space. Doing so is tractable thanks to the highly diagonal structure of the tiled 2D Fourier covariance, N i . Then we need to transform the realization back to map space, which essentially means reverting the various operations that built u i from ν i in Equation 19. 1. For each tile, draw Gaussian white noise in tiled 2D Fourier space and take the inner product with the square-root covariance: where η i is complex-valued white noise with the same shape as u i . Again, the only nontrivial (off-diagonal) part of this product is in the frequency/polarization index. By definition, The orange filters represent the "forward" tiling kernels in Figure 6. Each operation is labelled by the corresponding step number in the model estimation algorithm. The input difference map is PA5 f090, T (a = 0), first split (i = 0), and the outputs are 2D square-root Fourier power spectra, TT (a = b = 0). We see the noise anisotropy varies by tile. Bottom: The sequence of filters and other operations involved in drawing a simulation of the map-based noise in the tiled model. The first column denotes independent, Gaussian white noise. Importantly, the orange filters now represent the "backward" tiling kernels in Figure 6, while all other operations revert those of the top row. Each operation is labelled by the corresponding step number in the simulation algorithm. The output is a tiled noise simulation of PA5 f090, T, first split.
N 1/2 i η i is a sample from the tiled covariance: by construction. This demonstrates why we save to disk the square-root covariance.
2. With our tiled 2D Fourier space noise realization in hand, we proceed to revert Equation 19. So, first we transform out of tiled Fourier space and back to tiled map space with a per-tile inverse DFT: 3. Next, we need to stitch tiles back together into a single map. That is, we need to apply a "back-ward" tiling transform. At this point in the algorithm, each tile contains an independent realization of its own stripy noise pattern. Therefore, as noted in Naess et al. (2020), naively pasting each tile next to one another with a step-function edge would produce sharp discontinuities in the synthesized map. Just like the forward transform, the solution is to apodize and overlap their borders such that the transition across tiles is continuous. A good illustration of this is shown in Naess et al. (2020), Figure 13. Unfortunately, we cannot just reuse the tile kernels from the forward transform: we should specifically design the apodization to preserve the per-pixel noise variance in the tile overlap. We show the exact constraint that must be satisfied in Equation 28 and derive it in Appendix C, but, in short, the forward tile profile does not satisfy it. Instead, we use the backward profile shown in Figure 6, which is the squareroot of the bilinear interpolation from Naess et al. (2020).
The matrix form of the backward tiling transform is identical to that of the forward tiling transform (Equation 17): The only difference is that the diagonal of each N pix,tile × N pix,tile backward tile matrix, τ j,b , contains the values of the backward tile kernels. The projection matrices P j are unchanged: by applying T † b , they perform a "paste" operation that is the exact opposite of the "crop" operation from the forward transform. Thus we have: where the vector now has the same shape as an input noise map, ν i , and likewise lives in the original, untiled map space.
As discussed, when stitched together in the synthesized map, the backward tile profiles should preserve the total per-pixel noise variance. For example, for a given pixel belonging only to one tile, it should be the case that the value of that tile's backward kernel is 1 -or else we would artificially increase or decrease that pixel's variance. In general -including for pixels in an overlapping region between multiple tiles -the quadrature sum of all contributing tile kernels should likewise be 1: where the term in parentheses is just the squared value of the backward tile kernels, and the sum over projection matrices performs the tile "stitching." We derive Equation 28 more rigorously in Appendix C. Continuing with the simulation algorithm: 4. We need to apply the inverse of the demodulation filter from the model estimation. That is, we multiply the simulation in harmonic space by C 1/2 est : This reapplies the steep noise auto-and crossspectra to the simulation.
5. The last step in drawing a noise simulation is to divide by the square-root of the inverse-variance map, or multiply element-wise by 1/ √ h i . This is analogous to multiplying by the per-pixel standard deviation. In matrix form, a simulation from the tiled noise model is therefore defined as: These steps generate a tiled model noise simulation, ξ i , for the purposes of this paper. A schematic depiction of all these steps is given in the bottom row of Figure 8. We also have the necessary ingredients to write the full map-based noise covariance matrix in the tiled model. By definition, the map-based noise covariance matrix is: Combining Equations 24, 30, and 31 we have: This section has given a detailed accounting of how we both measure the tiled noise model and draw a tiled noise simulation. While thorough, it is an investment that will pay off in our discussion of the two wavelet models: in spite of the wavelets' orthogonal approach to modeling the map-based noise, their actual mechanics are highly similar to the tiled model.

Isotropic Wavelet Noise Model
Our isotropic wavelet, or "wavelet," noise model especially targets the scale-dependent map depth of the ACT DR6 noise, as shown in Figure 5. The model works by taking Figure 5 at face value: maps of the per-pixel noise variance appear different depending on the filtered range of angular scales. That is, each range of angular scales could be assigned its own "inverse-variance" map, where each map still operates in the white noise, uncorrelated-pixel limit. This idea is especially well motivated by the observation that the transition from the small-scale to the large-scale map depth in Figure  5 is reasonably smooth (recall that the bottom panel shows the morphology after normalizing by the mapmaker inverse-variance). In other words, we do not need especially high-resolution "filters" in harmonic space to capture the simultaneous scale-and spatial-dependence of the map depth. We therefore assume that the set of scale-dependent inverse-variance maps forms a complete and sparse description of the map-based noise.
Applying the core principle of the cross-correlation theorem (Equation 13), a pixel-wise uncorrelated covariance in map space inspires us to seek stationary regions in harmonic space. We isolate these regions by tiling the harmonic plane, where now we call the harmonic space kernels "wavelets:" a set of smooth, interleaved profiles defined over , each of which filters for a compact range of angular scales. 21 Thus, the swapping of map space and harmonic space is the main algorithmic difference between the tiled model and the wavelet model, and we can proceed nearly identically as in §4.2. We first prescribe how we construct a sparse representation of the map-based noise covariance under the wavelet model in §4.3.1. For reference, this procedure is similar to the construction of the total data covariance in needlet ILC (NILC) analyses (e.g., Delabrouille et al. 2009;Planck Collaboration XLVIII et al. 2016;Coulton et al. in prep.). Then, in §4.3.2, we outline how we draw a simulation from the wavelet model.

Model Estimation
As before, the goal of the model estimation algorithm is to first filter the ACT DR6 noise maps to promote their stationarity, and then transform them to permit a sparse representation of the covariance matrix. In the wavelet model, we work in harmonic space first and then transform into map space. 1. We start by considering the harmonic transform of the noise maps. Thus, our input noise vectors are given by: which is now a 6N a lm element vector instead of a 6N pix element vector.
2. Just as in the tiled model, at this stage, we have a priori knowledge that the noise covariance is not translationally invariant in harmonic space. Instead, the noise variance depends on the absolute position in harmonic space -that is, depends on -due to the steep noise power spectra ( Figure  3). Therefore, we still need to apply a "stationarity filter," only this time in harmonic space instead of map space. We perform the same filtering operation as in §4.2.1, Step 2, wherein we divide-out  the square-root noise spectra 22 as measured in the same pseudospectrum estimate mask, µ est : Whereas in the tiled model, this operation functioned as a demodulation filter, now the flattened power spectra help promote the noise stationarity in harmonic space. Again, we further explore the necessity of this filter in Appendix G.
In principle, we could now perform the harmonic space equivalent of a "demodulation" step to avoid modecoupling. Recall, in the (map space) tiled model, this involved a filter operating in harmonic space -dividingout the square-root noise pseudospectra. The effect of this was to suppress mixing power across angular scales when tiling the maps. Therefore, for the (harmonic space) wavelet model, demodulation would involve a filter operating in map space -namely, multiplying by the square-root inverse-variance maps (as in Equation  12). Analogous to the tiled model, this would suppress mixing power across regions of the map when breaking the harmonic plane into wavelets in the next step: multiplication by a local wavelet profile in harmonic space effects a convolution by a broad kernel in map space. Nevertheless, in practice we found this demodulation step was not necessary for performance of the wavelet model, and we chose to forego it here. Although we do not include it, we discuss the drawbacks and benefits of this demodulation step further in §6.
3. After flattening the spectra, our goal is still to isolate stationary regions in the transformed noise. As discussed, we achieve this by breaking the noise into a set of "wavelets," which are analogous to tiles defined in harmonic space. For this purpose we use the "scale-discrete" wavelets of Wiaux et al. (2008); Leistedt et al. (2013) with a scaling parameter λ = 1.3 (λ sets the logarithmic wavelet width), lowest wavelet scale J 0 = 9 (which sets the max of the lowest-wavelet), and highest wavelet scale J = 33 (which determines the overall number of wavelets). These parameters generate the M = 26 wavelet kernels shown in the top panel of Figure 9. 23 The wavelet log-spacing is well suited 22 The Cest measured in the wavelet model will not be the same as the tiled model, since in the tiled model we had already multiplied the noise maps by the square-root inverse-variance before measuring the spectra. 23 We augment the highest-wavelet kernel to extend to arbitrarily high-at a value of 1.
to the roughly power law scaling of the noise spectra. In addition, the smooth overlap of the wavelet profiles plays a similar role as the apodized tile edges in §4.2.1, Step 3: it prevents strong ringing of the map-space wavelet kernels when taking the per-wavelet SHT in the next step. Said another way, it sacrifices some of each wavelet's resolution in harmonic space to increase its resolution in map space. The bottom panels of Figure  9 demonstrate just that: the map-space wavelet kernels are compact, and can therefore resolve local noise features in map space. We also observe that because the higher-wavelets are wider in harmonic space, their map-space kernels are commensurately tighter. Thus, the wavelet decomposition is capable of simultaneously localizing noise properties over angular scale and sky position.
We again denote this operation a "forward tiling transform." As before, the first step in applying a given wavelet profile to a map vector is to "crop" the noise vector. Only now, we retain all harmonic modes up to the wavelet max , which is to say we bandlimit the noise vector for each wavelet. Then, we multiply the set of bandlimited vectors by the corresponding wavelet profile. For a given wavelet j, the matrix form is identical to Equation 16, where P j is a N a lm ,wav × N a lm bandlimiting matrix, and τ j,f is a N a lm ,wav × N a lm ,wav square diagonal matrix with the j th wavelet kernel values along its diagonal. Then the forward tiling transform, T f has the exact same form as Equation 17. Our input noise maps thus become transformed, weighted, tiled noise maps: 4. We again are guided by the cross-correlation theorem (Equation 13): we now transform into map space, where we assume that each pixel in the resulting vector is uncorrelated with all other pixels: Thus, u i contains a set of M bandlimited "wavelet maps," where each map has six correlated frequency/polarization components, but whose pixels are uncorrelated.
5. Using u i , we evaluate the pixel-diagonal wavelet power maps:  where x is a given wavelet map pixel, u a i,j (x) is the j th wavelet map and a th component of our transformed noise vector evaluated at x, and f j,2 (x) corrects for the power suppression due to the modes lost when multiplying by each wavelet profile (Dahlen & Simons 2008): where A, the pixel area in steradians, accounts for the spherical geometry of the wavelets. The total wavelet covariance matrix, N i , is thus the same form as Equation 22. Each block along the diagonal, N i,j , is itself block-diagonal in map pixel x, with a single block being a 6 × 6 matrix N (x) i,j ; the elements of N (x) i,j are given by Equation 37. Again, this highly diagonal form is our desired sparse representation, containing O(N a lm ) nonzero elements near its diagonal.
6. We smooth each wavelet power map N (x) ab i,j over nearby pixels x to reduce sample variance. Some examples of smoothed wavelet power maps are shown in Figure 10. We see the wavelet model is suggestive of cross-linking-like noise morphology at large scales, and inverse-variance-like noise morphology at small scales. Unlike the tiled model, we do not use the same smoothing width for all wavelets. Instead, we select a Gaussian smoothing kernel with a FWHM of f ( max )π/ max radians, where f is some function of and max is the maximum support of the given wavelet. Because the max of our SHTs are set to the Nyquist bandlimit of the corresponding map pixelization, π/ max for a given wavelet is the pixel size (in radians) of the corresponding wavelet map. Thus, f counts the number of pixels in the wavelet power maps across the smoothing FWHM. As shown in §4.5.4, Figure 16 Figure 11. Top: The sequence of filters and other operations involved in measuring the sparse covariance in the wavelet noise model. Large boxes denote filters or operations applied to the entire map, while small boxes denote filters or operations after tiling the harmonic plane. The orange filters represent the "forward" wavelet kernels in Figure 9 ("forward" and ""backward" kernels are the same in this model). Each operation is labelled by the corresponding step number in the model estimation algorithm. The input difference map is PA5 f090, T (a = 0), first split (i = 0), and the outputs are square-root wavelet power maps, TT (a = b = 0). We see the spatial variation in map depth also varies by wavelet scale, with high-to low-running top to bottom. Bottom: The sequence of filters and other operations involved in drawing a simulation of the map-based noise in the wavelet model. The first column denotes independent, Gaussian white noise. The orange filters represent the "backward" wavelet kernels in Figure 9, while all other operations revert those of the top row. The output is a wavelet noise simulation of PA5 f090, T, first split.

Simulations and Covariance Matrix
Just as the wavelet noise model estimation algorithm followed directly from §4.2.1, so too does its simulation algorithm carry over from §4.2.2. That is, the procedure for generating a noise simulation from the wavelet model consists of first drawing a realization in wavelet map space, and then reverting the operations that transformed ν i into u i in Equation 36.
1. For each wavelet, draw Gaussian white noise in wavelet map space and take the inner product with the square-root covariance: where η i is real-valued white noise with the same shape as u i .
2. Next, transform out of wavelet map space and back to wavelet harmonic space with a per-wavelet SHT: That is, this vector contains the SHT of the M = 26 independent wavelet noise realizations.
3. Just as in a tiled noise simulation, we now need to stitch the wavelet realizations back together, only this time the stitching occurs in harmonic space. The same considerations carry over from the tiled model as well: we should avoid naively pasting each wavelet next to one another, which would result in sharp discontinuities in the synthesized harmonic transform. Thus, we first multiply the wavelets by smoothly tapered profiles in harmonic space. For this purpose, we reuse the same profiles, or "kernels," from the forward transform, as shown in Figure 9. As before, we also require that the sum over wavelets of the per-harmonic-mode noise variance is preserved; equivalently, that the wavelet kernels satisfy Equation 28. Conveniently, the wavelets of Wiaux et al. (2008); Leistedt et al. (2013) do so by construction. We then simply add up the 26 smoothly tapered wavelet realizations in harmonic space, resulting in a single harmonic transform. Together, this constitutes the "backward tiling transform." The matrix form of the backward tiling transform is identical to Equation 26. Since we have reused the same wavelet profiles as in the forward transform (T b = T f ), we therefore have: where the vector now has the same shape as the harmonic transform of ν i , and likewise lives in the original, "untiled" harmonic space.
4. We apply the inverse of the stationarity filter from the model estimation. That is, we reapply the square-root noise auto-and cross-spectra directly in harmonic space: 5. Finally, transforming the entire simulation back into map space completes the noise simulation: These steps generate a wavelet model noise simulation, ξ i , for the purposes of this paper. A schematic depiction of these steps is given in the bottom panel of Figure 11. As before, it is easy to use Equation 43 to write the full map-based noise covariance matrix in the wavelet model: where we have used that C 1/2 est and W are Hermitian. This completes our discussion of the isotropic wavelet model. The direct parallels to the tiled model construction are apparent. By tiling harmonic space instead of map space, the wavelet model is well-adapted to capture the scale-dependence of the map depth; however, by construction, the isotropic wavelets are circular in map space (Figure 9) -they are insensitive to any local noise stripiness.

Directional Wavelet Noise Model
Our directional wavelet, or "directional," noise model in principle captures all enumerated aspects of the ACT map-based noise, including the spatially-varying stripy patterns and scale-dependent map depth, simultaneously. Its key feature is a decomposition using a set of directional wavelets: each wavelet filters for a particular range of angular scales (like the "isotropic" wavelet model), as well as a particular direction on the sky (unlike the "isotropic" wavelet model). That is, by assuming the noise is stationary within a "directional wavelet" with both a well-defined angular scale and direction, the decomposition naturally leads to scale-and directiondependent "inverse-variance" maps. Moreover, because the main difference between the wavelet model and directional model is just the definition of their wavelet profiles, their model estimation and simulation algorithms are nearly identical. The directional model can therefore be viewed as a generalization of the wavelet model.
The directional model's only algorithmic change is that we build the wavelets in 2D Fourier space instead of harmonic space. 24 Doing so admits speed improvements, increased flexibility in wavelet design, and generally simpler implementation. One consequence of defining wavelets in Fourier space is that their map space kernels change physical size over declination -that is, their size in pixels stays fixed as the physical sizes of the pixels themselves change. In principle, then, the wavelets could mix physical scales. In practice, however, the effect of this is limited: the map depth morphology undergoes a smooth transition from small to large scales (again, see Figure 5). We should also note that the Fourier space wavelets would encounter regularity issues at the poles; however, this too is not relevant for the ACT footprint, whose largest declination is −63 • . Fourier space aside, again we can heavily borrow from §4.3.1 and §4.3.2 in what follows.

Model Estimation
Again, our goal is to construct a sparse representation of the noise covariance matrix. We give an abridged version of the model estimation algorithm from the wavelet model in §4.3.1, since the steps are highly similar. We take care, however, to explain notable differences in the implementation, especially in relation to the design of the directional wavelets.
1. We apply the same stationarity filter as in §4.3.1, Step 2: flattening the noise power spectra as measured in µ est . Again, this helps remove the absolute dependence of the noise variance on angular scale, . We still apply this filter to the maps in harmonic space: 2. In preparation for the wavelet decomposition, we need to transform the maps from harmonic space to 2D Fourier space. To do so, we first return to map space, and then take the DFT to Fourier space: We note that this Fourier space noise vector still contains 6N pix elements, since the DFT, F, is a square matrix.
3. As in §4.3.1, Step 3, our goal is to isolate stationary regions in the transformed noise. Now, we do this by breaking the noise into a set of wavelets defined in Fourier space. We construct the wavelets to be separable radially and azimuthally in the Fourier plane. Modes with comparable radii, k ≡ |k|, map to comparable angular scales, , as well. For instance, had we constructed the isotropic wavelets from §4.3 in 2D Fourier space instead of harmonic space, they would have appeared as concentric annuli. Thus, for the radial part of the directional wavelets we reuse the scale-discrete wavelet profiles of Wiaux et al. (2008); Leistedt et al. (2013), only now with a scaling parameter λ = 1.6 (i.e., we have increased the logarithmic wavelet spacing), J 0 = 5, and J = 19, yielding 16 profiles (see the top panel of Figure 12). Compared to the isotropic model, the directional model has fewer and wider radial wavelet kernels. Also note, since we are in Fourier space, we consider the wavelets to be functions of k instead of . By augmenting the wavelets with an azimuthal part, we create wavelets local not only in their angular

6
Azimuthal Kernels, Radial Index = 11 Figure 12. Top: Radial slices of the Fourier directional wavelet kernels over angular scale. These profiles are again from the scale-discrete wavelet family (Wiaux et al. 2008;Leistedt et al. 2013). The blue and orange dashed wavelet around k ∼ 100, labeled by a "radial" index of 7, corresponds to the azimuthal slices shown in the middle panel; the green and red dashed wavelet around ∼ 1, 000, labeled by a "radial" index of 11, corresponds to the azimuthal slices shown in the bottom panel. Middle and Bottom: We construct a wavelet set with increasing directional locality as a function of k. For example, 13 wavelets tile the azimuthal axis for the lower-k wavelet, while 37 kernels tile the azimuthal axis at the higher-k wavelet. A full 2D wavelet is given by the outer product of the radial and azimuthal slices. Numeric labels are for comparison with Figure 13. scale, k, but in their direction on the sky as well. The azimuthal profile shapes (middle and bottom panels of Figure 12) are defined in Appendix D; their outer product with the radial profiles yields M ∼ O(300) 2D Fourier space wavelets. The full 2D wavelets are shown in Figure 13. As before, these 2D Fourier-space wavelet profiles smoothly overlap to prevent ringing and increase their mapspace resolution.
The forward tiling transform associated with the directional wavelets is similar to that of §4.3. We note that the action of "cropping" out each wavelet from the full 2D Fourier space is akin to the maxbandlimiting from the isotropic wavelet model, except that we are restricted to a rectangular geometry (as in the tiled model). That is, rather than retaining all harmonic modes up to the wavelet max , we retain all Fourier modes up the wavelet |k y,max | and |k x,max |. Thus, each projection matrix P j is a N pix,wav × N pix cropping matrix, and τ j,f is a N pix,wav × N pix,wav is a square diagonal matrix with the j th wavelet kernel values along the diagonal. The forward tiling transform matrix form is again identical to Equation 17, yielding the following weighted, transformed, tiled noise vectors: In Figures 12 and 13, we clearly observe what makes these wavelets directional: by isolating a compact range of azimuthal angles in Fourier space, their map space kernels are stripy with a well-defined orientation on the sky. Additionally, those figures demonstrate a critical design principle: the "zero-sum" tradeoff in scale, azimuthal (i.e., directional), and spatial locality. In Figure  13, for instance, the left panel shows that at large scales (near the "center" of Fourier space), each wavelet exhibits a high degree of scale locality and low azimuthal locality. Correspondingly, their map space kernels (in the top row of the right panel) are spatially extended overall, but are proportionally compact along their orientation. At small scales (far from the "center" of Fourier space), each wavelet exhibits successively less scale locality, and successively more azimuthal locality. Correspondingly, their map space kernels (in the bottom row of the right panel) are spatially compact overall, but are proportionally extended along their orientation. This behavior is reminiscent of the "parabolic scaling" in Chan et al. (2017), wherein the smaller-scale wavelets become progressively more extended. We discuss the engineering of this tradeoff in more detail in Appendix D.

We again transform from the space in which the
noise is assumed to be stationary (Fourier space) into the space where we take the noise to be diagonal (map space), this time with an inverse DFT. Thus, we assume each pixel in the following vector: is uncorrelated with all other pixels.
5. We evaluate the same pixel-diagonal wavelet power maps, N (x) ab i,j , as in Equation 37. The only change is due to the rectangular geometry, the normalization factor f j,2 is the same form as the tiled model (Equation 21), with N pix,wav replacing N pix,tile . The highly diagonal wavelet covariance matrix, N i , is likewise the same form as in the isotropic wavelet model. 6. We apply the same smoothing method with the same definition of f as in the isotropic wavelet model ( §4.3.1, Step 6) to reduce sample variance.
7. We save to disk the square-root matrices N 1/2 i and C 1/2 est .
These steps generate the wavelet model square-root covariance N 1/2 i for the purposes of this paper. We do not show the directional wavelet model maps, but they are qualitatively similar to Figure 10. The top panel of Figure 11, showing the schematic model estimation steps, is likewise similar for the directional wavelets after substituting spherical harmonic transforms for Fourier transforms.

Simulations and Covariance Matrix
Apart from the wavelet construction, the process of drawing a noise simulation from the directional model is, again, almost identical to the wavelet model. We give an abridged version of the simulation algorithm here.
1. For each wavelet, draw Gaussian white noise in wavelet map space and take the inner product with the square-root covariance: 2. Transform out of wavelet map space and back to wavelet Fourier space with a per-wavelet DFT: 3. Apply a "backward tiling transform" in Fourier space. That is, again, we need to stitch the Figure 13. Left: Contour-levels of the full 2D wavelet kernel set. The blue (7, 0), orange (7, 2), green (11, 0), and red (11, 6) 2D wavelets, formed by the product of the 1D profiles in Figure 12, are highlighted (numeric labels match those of Figure 12). The contours are at the half-maxima of the squared wavelets. We see the wavelets are attuned to "radially oriented" features in 2D Fourier space. Right: The map space wavelet kernels for the same blue, orange, green, and red wavelets. Wavelets (7, 0) and (7, 2) pick out low-frequency features by design and are spatially nonlocal. Wavelets (11, 0) and (11, 6) pick out high-frequency features by design and are spatially local overall. The opposite rotation sense in map space versus Fourier space results from the plotting convention (in map space, the R.A. decreases to the right).
wavelets back together to form a single noise Fourier transform. For the same reasons as the isotropic wavelet model, we choose to reuse the forward tiling transform -T b ≡ T f -noting that we construct our wavelets (Appendix D) to satisfy Equation 28. We therefore have: where the vector now has the same shape as the Fourier transform of ν i , and likewise lives in the original, "untiled" Fourier space.
4. We reapply the noise auto-and cross-spectra in harmonic space. To do so, we first transform out of Fourier space and into map space, and then into harmonic space, and then multiply by the squareroot noise pseudospectra: 5. Transforming the entire simulation back into map space completes the noise simulation: These steps generate a directional model noise simulation, ξ i , for the purposes of this paper. The schematic depiction in the bottom panel of Figure 11 is highly representative of the simulation algorithm. As before, we use Equation 53 to write the full map-based noise covariance matrix in the directional model: where we have used that C 1/2 est and W are Hermitian. This completes our discussion of the directional wavelet model. By generalizing the wavelets to isolate a particular direction on the sky in addition to a range of angular scales, the directional model naturally captures the spatially-varying noise anisotropy and the scale-dependent map depth simultaneously.

Preprocessing and Implementation
Now that we have defined the core features and formalism of the models, we complete their discussion with some final implementation details. In Sections §4.5.1 and §4.5.2, we describe the data preprocessing we perform on the noise maps, ν i , prior to using them as inputs to the noise models. This is necessary to mitigate residual map signal that does not cancel in Equation 5. In Section §4.5.3, we define the "pseudospectrum estimate mask," and other masks, used throughout the analysis. Sections §4.5.1, §4.5.2, and §4.5.3 are common to all three models; section §4.5.4 gives further details on the model smoothing step for the wavelet and directional models only.

Source Subtraction and Inpainting
Each of the preceding models sacrifices some degree of map space locality in order to simultaneously resolve features in harmonic/Fourier space. The raw ACT data, however, contain spatially compact point sources; importantly, the brightest of these sources do not exactly cancel when we construct difference maps in Equation 5. This is likely due to a variety of underlying causes: many sources are themselves time-variable, or the telescope pointing varies slightly between splits, both of which would leak their signal into the map-based noise as defined in Equation 1. They may be bright enough that small errors in gain calibration between splits can also induce significant signal-to-noise leakage in Equation 5. The upshot is that raw difference maps contain spatially compact point-source residuals, which, when left untreated, can bias our noise models.
We mitigate these point-source residuals in two ways. First, we use "point-source subtracted" maps Naess et al. 2020). These are maps in which both time-domain and map-space models of point sources, either from a catalog or matched filter applied directly to the ACT DR6 data, have been fit and subtracted from the raw map products Naess et al. 2020). Use of the point-source subtracted maps as inputs eliminates most of the difference map residuals, but some particularly bright sources remain (see left and center panels of Figure 14).
To further increase robustness against point-source residuals, we also inpaint the difference maps in small regions around the sources. For computational simplicity, we do not use a constrained-realization (e.g., Bucher & Louis 2012) inpainting routine. Rather, we develop a simple, sufficient procedure which leverages the overall smoothness of the noise and compactness of the pointsource residuals. For a given point-source location in the difference map, we inpaint only the pixels within a six arcmin radius -the "inpainted region" -as follows: 1. Set the value of all pixels within the inpainted region equal to the mean value of pixels in the annulus between six and nine arcmin. This removes the point-source residual and replaces it with a reasonable constant value. While this is better than having the large spike from the point-source residual, it will leave a discontinuity in the noise properties at the border of the inpainted region.
2. We would like the large-scale noise features to be continuous across the inpainted region. To achieve this, we first copy the 120 arcmin square region centered on the point source from the updated difference map -that is, the map resulting from the first step. We then smooth this copied cutout with a six arcmin FWHM Gaussian kernel. Finally, we adopt the values of this smoothed cutout only in the inpainted region of the difference map. This adds some large-scale modes from outside the inpainted region on top of the constant value from the first step. 3. We would also like the small-scale noise features to be continuous across the inpainted region. For this, we simply add a white-noise realization drawn from the per-pixel difference map variance (w −1 i ) to the difference map inpainted region. These three steps ensure that in a small region around inpainted point-source residuals, the noise features are reasonably continuous. The results of this procedure for a sample point source are shown in the right panel of Figure 14. We inpaint around all sources in the compact source catalog described in Qu et al. (in prep.), which contains 1,779 point sources. While only a fraction of this catalog is likely bright enough to cause a noticeable residual, our inpainting procedure is benign and we only inpaint 0.12% (0.28%) of the sky (ACT footprint).

Map Downgrading
We typically complete our preprocessing of the difference maps by downgrading their resolution by a factor of two or four, which also reduces their bandlimit proportionately. There is no fundamental reason for this step; rather, we do so only to save disk space and computation time of noise realization ensembles. We resample in Fourier space to perform the downgrading. This has several benefits over pixel-block-averaging. First, the pixel centers in the downgraded pixelization align with pixel centers in the raw pixelization. This is necessary for ensuring SHTs of the downgraded maps are lossless (i.e., the downgraded maps still follow the Clenshaw-Curtis quadrature rule, see Reinecke, M. & Seljebotn, D. S. (2013)). Second, we do not introduce any additional pixel window or aliasing beyond what is already present in the full-resolution maps. Lastly, unlike harmonic space resampling, Fourier space resampling preserves the entire Fourier plane, which is necessary for the directional wavelet decomposition to work. In principle, bandlimiting the difference maps could introduce ring-ing around bright features, but in practice these features are already suppressed when we take map differences in Equation 5 and inpaint the difference maps.

Pseudospectrum and Boolean Masks
Our models also rely on two input masks. The first is an apodized mask that leaves a restricted portion of the sky used to measure the noise pseudospectrum and construct the isotropic filter for the noise models. The second is a boolean mask defining the larger area we model and simulate.
This first mask is only used to measure the noise pseudospectra which enter the isotropic filter in §4.2.1 Step 2, §4.3.1 Step 2, and §4.4.1 Step 1. The goal of this mask is produce "representative" pseudospectra: downweighting or excluding the noisiest data which would otherwise dominate the spectra, while retaining a reasonable fraction of the ACT footprint. Thus we use a slightly modified version of the nominal ACT DR6 CMB lensing analysis sky mask (Qu et al. in prep.). This is the intersection of two other masks: a root-mean-square threshold cut on the ACT DR6 f090 and f150 coadd maps of 70 µK-arcmin, and a Planck 60% Galactic mask (Planck Collaboration I et al. 2020). We modify it by then cutting pixels not observed by all splits of all detector arrays. This ensures we can use this one mask version for all of our noise models. Finally, we apodize the mask by three degrees with a cosine profile. The outline of the resulting pseudospectrum mask, µ est , is shown as the blue region in Figure 15, overlayed on top of a cross-linking map. To maximize noise model accuracy one might be tempted to select the same mask here as is used in any downstream analysis, but the ensuing proliferation of noise models could be cumbersome from a data management perspective. Instead, we note that our noise models are reasonably robust to O (1) variations about the noise spectra as measured in our pseudospectrum mask (see Appendix G). 25 The boolean mask is used to define which data enter our noise models and are simulated. Ideally, we would include all observed pixels in a given difference map, such that we model and simulate all the available data. In practice, however, we found it improved performance to prohibit pixels with the lowest cross-linking from entering the noise models. We detail the motivation and construction of this boolean mask, µ bool , in Appendix E. Once defined, we apply it directly to the difference maps (via µ bool * ν i ) before the model estimation algorithms, and directly to the simulations (via µ bool * ξ i ) after the simulation algorithms. The outline of this boolean mask is shown as the orange curve in Figure 15. We see it excludes the regions at the edge of the ACT footprint with practically no cross-linking. Importantly, the boolean mask does not cut any data used in the main DR6 cosmology analyses (the CMB power spectrum and CMB lensing).

Model Smoothing
The preceding noise models are themselves noisy approximations of an underlying, true covariance matrix. This statistical "model scatter" is made especially acute by the fact that we build models using a single realization of the input data -a model for each split map. Absent any suppression of the statistical scatter in the estimated covariance matrices, the models will be biased by over-fitting to the data: random fluctuations in the estimated covariance matrix will be recorded as part of the true covariance matrix. Of course, the preceding algorithms do address this issue by smoothing the tiled 2D Fourier power spectra ( §4.2.1, Step 6) or wavelet power maps ( §4.3.1, Step 6 and §4.4.1, Step 6). These smoothing steps are essentially averages over adjacent modes or pixels in each tiled or wavelet covariance (we discuss whether also averaging noise models over data splits is a viable alternative in §6). While necessary to avoid overfitting to the data, however, over-smoothing the noise models may remove structure in the true noise covariance. For example, in the wavelet power maps of Figure  10, a given "blob" of high variance might be a statistical fluctuation, such that we would want to smooth it away, or might be real feature in the true noise covariance, such that we would not want to smooth it away. Here, we attempt to balance these competing potential biases.
Starting with the wavelet models, we need to specify the function f ( ) used in §4.3.1, Step 6 and §4.4.1, Step 6. Recall, in those smoothing steps we take each wavelet power map, corresponding to a wavelet with a given max , and smooth it with a Gaussian kernel whose FWHM in map pixels is given by f ( max ). Thus, we want f ( ) to lie in the "Goldilocks zone" of not too little, but not too much, smoothing.
Fortunately, we have a priori insight into what the noise power maps should resemble: at high-, the mapmaker inverse-variance maps, and at low-, the mapmaker cross-linking maps (recall Figures 1 and 5). 26 We use this knowledge to estimate the optimal smoothing width, f opt , at a given max , according to the following procedure: 1. Take the raw mapmaker inverse-variance, initially with a Nyquist bandlimit max = 21, 600, to be the "true" noise covariance. Draw a white-noise realization from this covariance to represent a single realization of the map-based noise in the data.
2. With varying amounts of smoothing, construct white-noise models using this realization of the map-based noise as the input. This is done by simply squaring the observed noise in each pixel, and smoothing with a Gaussian kernel whose FWHM is f pixels.
3. Compare the agreement between the "estimated" white-noise covariance in the models with the original, "true" noise covariance. The optimal smoothing, f opt , corresponds to the model that maximizes this agreement.
4. Repeat these steps by varying what is taken to be the true noise covariance, by downgrading the inverse-variance map to a lower max and/or including the cross-linking map. Find f opt in each case.
The fourth step above is important to ensure the fiducial input covariance is as realistic as possible. First, the wavelet power maps probe noise structure at different resolutions, as set by their max . Consider the limiting case of a white-noise covariance defined by the mapmaker inverse-variance map. In this case, even though there is a "true" noise covariance at full resolution, such a covariance has different spatial structure when downgraded to lower resolutions. Thus, the optimal smoothing level at those downgraded resolutions may not be the same as at full resolution. Second, we observed in Figure 5 that the spatial noise structure resembles the mapmaker cross-linking at large scales. Thus, by including the (downgraded) cross-linking in our input covariance at lower values of max , the resulting optimal smoothing level is appropriate for models of the actual data. This procedure balances the tension between preserving the true covariance structure and suppressing random fluctuations in realistic wavelet power maps. We show the results in Figure 16. 27 In general, the optimal smoothing factor, f opt , increases with max . At large scales -that is, scales dominated by correlated noise -the cross-linking governs the spatial noise structure, and f opt ∼ 5 − 10. At small scales, where the inversevariance dominates the spatial noise structure, f opt increases toward ∼ 20 − 25. The open green diamond and cross in Figure 16 correct for the fact that our wavelet noise models are not simple white-noise models (also explained further in Appendix F). They indicate the optimal smoothing for the wavelet noise models is slightly less than implied by our simple white-noise procedure.
In light of these results, we opt for a simple ad-hoc form of f ( max ) in all arrays and frequencies, using the model shown in Figure 16. 28 This respects all arrays' preference for low smoothing factors at low , with a shallow increase toward larger smoothing factors at large . This curve also remains ∼ 10 − 30% below the white- 27 We give the detailed version of this approach and define a quantitative measure of "model-to-truth" agreement in Appendix F. 28 The lines from (0, 2) to (1,350, 10), and then through (5,400,16) in ( , f ( )) space.
noise results for compatibility with our wavelet models.
We also see no significant effect in the results of §5.1 when varying f ( max ) by ∼ 5 at all but the largest scales. Therefore, in setting the form of f ( max ) we are primarily concerned with capturing the overall trend than specific, minor variations between arrays. We plan to improve our smoothing procedure in future work. For the tiled model, there is no mapmaker product that gives an a priori understanding of what the 2D Fourier power spectra should look like. Thus, as we discussed in §4.2.1, Step 6, we simply adopt the same smoothing scale as was used in Madhavacheril et al. (2020) and Naess et al. (2020).

MODEL EVALUATION AND RESULTS
Having constructed the tiled, isotropic wavelet, and directional wavelet noise models, we examine their ability to reproduce ACT map-based noise statistics. As discussed in the previous section, we do so by averaging over large ensembles of noise simulations. A data difference map and example noise simulations for each model are shown in Appendix A. For ease of computation, we have chosen to show results from models bandlimited to max = 5, 400, and averages of 30 simulations. Because of the density of the data and model noise covariance in map space, a direct comparison of those matrices is not possible. As an informative substitute, we instead examine the noise properties of §3.2 as captured in simulations. Thus, in §5.1, we compare simulations to the data by: (a) isotropic noise auto-and crossspectra (large-scale noise correlations), (b) 2D Fourier power spectra (spatially-varying noise anisotropy), and (c) effective "inverse-variance" at different angular scales Figure 17. Comparisons of PA5 noise auto-and cross-spectra between the simulations and the data. The solid lines (shaded bands) are the means (1σ range) over the eight splits. The top row shows f090 power spectrum ratios. The bottom row shows f150 power spectrum ratios. The middle row shows correlation spectra differences: the correlation is calculated by considering the ratio of the f090 × f150 cross-spectra to the geometric mean of the auto-spectra (Equation 11). For all three rows, the nominal result is a value of 0. All spectra are measured in the same apodized pseudospectrum mask from §4.5.3. Recall the models are bandlimited to max = 5, 400. Temperature spectra perform well at all scales, exhibiting no more than few-percent deviations at low-. As discussed in §6 , polarization spectra appear robust for 300, but exhibit ∼10% deviations for 300.
(scale-dependent map depth). Each comparison is analogous to viewing the full map-based noise covariance matrix along a different "slice," or through a different "lens:" it visualizes a particular salient aspect of the matrix structure, since doing so for the entire matrix at once is difficult. In §5.2, we apply the noise simulations to a realistic analysis pipeline. In particular, we examine the covariance matrix of the ACT DR6 power spectrum bandpowers -that is, the determination of the errors on the power spectrum. We compare a common analytic prescription for the covariance matrix and a Monte Carlo estimate from simulations alone.

Noise Covariance Model-Data Comparisons
We examine the noise properties from §3.2 as captured by our simulations and compare to the data. Thus, we directly probe the extent to which each model achieves its design goals. As discussed, each property visualizes a slice of the full covariance matrix, either constructed from the single available realization of the data, or averaged over an ensemble of simulations. If we have built a good noise model, the covariance slice measured from the data should be consistent with a single noisy realization of the ensemble average covariance slice. Said another way, the ensemble average slice is each model's postulated long-run average of the data slice, if it were possible to rerun the experiment many times. Thus, for a successful model, we expect the data slice to "scatter" around the smoother long-run average provided by the simulations, as is the case in a typical goodness-of-fit analysis. Indeed, in future work, we aim to add a full χ 2 consistency test of data and simulations, which would  Figure 4. The first/second/third/fourth columns correspond to the data/tiled/wavelet/directional models. The model bandlimit ( max = 5, 400) causes the oval power loss in the simulations. The bandlimit is nearly circular in 2D Fourier space in the top row, since this is from a region near the equator. The bandlimit becomes "squished" in the x-direction in the bottom row, since this region is at a higher declination. That is, the pixels in the x-direction are physically closer together at these latitudes, and so the 2D Fourier space samples higher frequencies. Note we account for this in our definition of the 2D Fourier coordinates. The tiled and directional models capture the main anisotropies observed in the data, with the tiled model resolving more fine features. The wavelet model has no ability to measure anisotropies by construction.
require application of the full inverse noise covariance matrix, N −1 i , to map vectors (see §6 for a discussion of the inverse covariance matrix). At any rate, we should keep this signature of good model performance in mind in what follows.

Frequencies
As discussed, extended atmospheric signals are what produce most of the non-trivial map-based noise properties, including pixel-pixel correlations, and correlations between frequencies on the same detector array. Figure 17 demonstrates the noise models' performance at reconstructing the noise auto-spectra and cross-spectra between frequencies. Generally, all models perform similarly well in reconstructing the noise TT, EE auto-and TE cross-spectra. Their mean values are within 1% of nominal (zero for the correlation spectra differences and the auto-spectra ratios, since in Figure 17 we have subtracted 1 from the ratios). Departures from nominal are mainly confined to large scales ( 300), especially in polarization. One possible explanation for this behavior is the pseudospectrum filter, which is applied to all of our models. This filter places a lower bound on the map space locality of features in the simulated noise. As we show in §6, this lower bound happens to be about three times higher in polarization than temperature. In other words, it is more difficult to contain locally noisier regions of the sky in polarization, and we would interpret Figure 17 as showing some noisy, large-scale polarized regions bleeding into the analysis mask. We note, however, that ACT DR4 power spectra were limited to min = 350 in polarization and min = 600 in temperature (Choi et al. 2020). Thus, we take the results of Figure 17 as a marker of good noise model performance across all three model types.

Spatially-Varying Noise Anisotropy
While the tiled and directional models are designed to capture the location-dependent stripiness in the ACT noise, the wavelet model is not. To evaluate their performance in this regard, we simply repeat the analysis in Figure 4, substituting an average over simulations for the data. The success, or lack thereof, in each model is visually apparent in Figure 18. Both the tiled and directional models succeed at reconstructing the primary noise stripiness in each region: the "x-like" pattern near the equator, and the horizontal "bar" in the poorly cross-linked region. The tiled model appears to capture some secondary features in the data noise that are not oriented radially; recall the directional kernels ( Figure  13) are primarily designed around radial features. The wavelet model stands out: by construction, it has no ability to measure local noise anisotropy. We also observe that the high frequency "covariance scatter" in the data 2D power spectrum is substantially suppressed in the simulation spectra. This is expected: as discussed, in the simulation spectra we visualize the true, underlying covariance structure postulated by each model, of which the data spectrum is a single realization. This also inspires confidence in the model smoothing from §4.5.4: the models are not obviously over-fit to the random fluctuations in the data covariance. In sum, the tiled model and directional models succeed at capturing the primary features of the spatially-varying noise anisotropy in the data, while the wavelet model captures no such features.

Scale-Dependent Map Depth
In addition to spatial correlations and stripiness, we showed how the atmosphere and scan strategy together induce a scale dependence to the map depth in §3.2. In particular, we showed that the small-scale noise power appears to follow the morphology of the mapmaker inverse-variance maps, while the large-scale noise power is sensitive to the scan cross-linking. We compare the ability of the noise models to reproduce these effects in Figures 19 and 20. When filtered for small scales (Figure 19), all three models succeed at reproducing the mapmaker inverse-variance-like morphology in the data. When filtered for large scales (Figure 20), the two wavelet-based models succeed at reconstructing the mapmaker cross-linking-like morphology, for which they were nominally designed.
The tiled model, while not explicitly designed for this purpose, also performs moderately well. This is because a scale-dependent map depth is equivalent to a spatiallydependent noise power spectrum. The tiles do model a spatially dependent (2D Fourier) power spectrum, but limited to a resolution of the tile size (∼ 4 • ). As a result, some features in the tiled large-scale spatial noise power are not resolved -for instance, the fine structures near a declination of −20 • . Other structures are "squaredoff," such as the transitions between high-and low-crosslinking regions.
For the large-scale maps, we also note that all three models show slightly less inverse-variance (more noise power) than the data. This is a restatement of the result in Figure 17 that the polarized noise power spectra ratios are elevated for 300. Finally, as for the 2D Fourier spectra of Figure 18, we observe the same suppression of high-frequency scatter in the simulated averages as compared to the data. This is again expected, and further supports the model smoothing of §4.5.4. In sum, the two wavelet models perform somewhat better than the tiled model at capturing the scale-dependent map depth in the data.
In short, the models behave as designed. All three models reconstruct the overall noise auto-and crossspectra. The tiled model is best at capturing spatiallyvarying noise stripiness; it performs less well at capturing scale-dependent map depth. The wavelet model cannot capture stripiness; however, it is best at capturing scale-dependent map depth. Finally, the directional model performs well in both analyses.

ACT DR6 Power Spectrum Covariance
In this section, we examine the implications of our noise simulations on the estimation of the covariance matrix for the CMB angular power spectrum. Specifically, we examine the matrix, Ξ, given by: is the residual of the recovered power spectrum in -bin b, between fields W and X (one of T, E, or B), with respect to a fiducial model. An accurate covariance matrix is a required deliverable of any CMB experiment: inaccuracies would directly propagate into cosmological parameter uncertainties and could also skew power spectrum pipeline null tests.
While the exact calculation of Equation 55 is difficult (see e.g. Planck Collaboration XI et al. 2016), several approximations exist. These incorporate knowledge of a fiducial sky model, instrumental noise, and incomplete sky coverage into computationally tractable, analytic expressions of the covariance matrix (e.g., Brown et al. 2005;Efstathiou 2006;Couchot et al. 2017;García-García et al. 2019;Louis et al. 2020). These analytic expressions may also be augmented with simulations to correct for some of their assumptions, for example, by adding variance at low-to account for masking point sources with small holes (Planck Collaboration V et al. 2020;Li et al. 2021). As far as their treatment of instrumental noise, these semi-analytic methods adopt one of two approaches. The first approach assumes the noise is diagonal in the harmonic domain -that is, completely described by the noise power spectrum. This is the de- . The same small-scale "inverse-variance map" as the top panel of Figure 5, for both data and noise simulations. The first/second/third/fourth rows depict the data/tiled/wavelet/directional models. As is discussed in §4.5.3, we see that simulations are masked with the boolean mask that removes sky regions with the lowest cross-linking. All models do a good job reproducing the high-map depth in the data. . The same large-scale "inverse-variance map" as the bottom panel of Figure 5, for both data and noise simulations. The first/second/third/fourth rows depict the data/tiled/wavelet/directional models. All three models exhibit slightly less inverse-variance (more noise power) than the data, as is expected from the elevated polarized power spectra ratios in Figure 17 for 300. The wavelet and directional models capture fine structures in the data; the tiled model resolution at large scales is limited to the tile size -map features are "squared-off" or unresolved.  Figure 21. Ratios of the power spectrum covariance matrix diagonal elements between a Monte Carlo construction using 300 noise simulations, and an analytic approach common in the literature. The nominal result is a value of 1. TT (EE) ratios exhibit an excess of up to ∼ 15% (∼ 20%). The "Gaussian" simulations are a sanity check: they are drawn directly from the measured noise power spectra and should match the analytic covariance. Significant deviations of the "Gaussian" simulations from nominal (i.e., in TT for 300) suggest discounting this comparison at those scales. Data points are plotted at the center of the multipole bins, and show the per-bin mean and 1σ error in the mean coming from the 300 simulations.
fault approach in the NaMaster 29 (García-García et al. 2019) and pspy 30 (Louis et al. 2020) codes, which together form the SO power spectrum pipeline, PSpipe 31 . The second approach generalizes the first by combining knowledge of the noise power spectrum and the inhomogeneous white-noise depth (e.g., Planck Collaboration XI et al. 2016;. In the following subsections we compare Equation 55 estimated from an analytic expression following the former approach (assuming diagonal noise in the harmonic domain), to a Monte Carlo estimate from simulations alone. We will examine analytic expressions following the latter approach in future work. We focus on the PA6 f150 band, whose results are representative of the full DR6 data.

Analytic and Monte Carlo Covariances
We construct an analytic covariance matrix assuming the signal and noise are completely described by their respective power spectra -that is, assuming uniform white-noise depth. In particular, we use the methods of pspy available in PSpipe. Our implementation adopts the best-fit ΛCDM cosmology from Planck Collaboration VI et al. (2020) as the primary CMB power spectrum signal, and adds a foreground model power spectrum consistent with Dunkley et al. (2013), derived from 29 https://github.com/LSSTDESC/NaMaster 30 https://github.com/simonsobs/pspy 31 https://github.com/simonsobs/PSpipe maps with a 15 mJy source flux cut. We incorporate the noise pseudospectra estimated directly from the DR6 data (i.e., the unnormalized version of the spectra in Figure 3), following Equation 8 of Louis et al. (2020) (see also Equation 2.55 of García-García et al. (2019) and Equation 10 of Li et al. (2021)). To keep our analysis straightforward and avoid issues with point source holes, we reuse the pseudospectrum estimate mask, µ est , from the previous sections (shown in Figure 15).
Next, we prescribe the simulations from which we build a Monte Carlo covariance matrix. We first draw signal map realizations from the same CMB-plusforeground signal model that entered the analytic matrix, and convolve them with the PA6 f150 beam. We then add noise realizations in map space using the methods of §4. We evaluate the power spectra of 300 such simulations, for each of the eight PA6 f150 splits, and directly compute the Monte Carlo covariance matrix (Equation 55) by averaging the residuals of the estimated power spectra compared to the input signal power spectrum. As a sanity check we also generate a set of "Gaussian" Monte Carlo simulations whose noise realizations are simply drawn from the noise power spectra, and estimate their covariance matrix following the same method. Both the analytic and Monte Carlo matrices account for the mode-coupling induced by the analysis mask, 32 and are binned in to improve their conditioning.

Comparison
We compare the diagonals of the Monte Carlo and analytic matrices in Figure 21. In terms of Equation 55, that is, we only consider the elements of Ξ with b = b and W = X = Y = Z. As expected, the "Gaussian" simulations produce a Monte Carlo matrix whose diagonal agrees with the analytic expression to a few percent, demonstrating that our pipeline is not significantly biased. We then note that in spite of their different methods, our three noise models deliver remarkably similar results. In particular, the TT (EE) block of the analytic covariance appears to underestimate its diagonal by up to ∼ 15% (∼ 20%) for 2, 000 (1, 000). Such discrepancies exceed what has been found in comparable analyses of Planck data. For instance, Couchot et al. (2017) compared the diagonals of a Monte Carlo covariance estimate and an analytic expression assuming uniform white-noise depth and found errors of at most a few percent. Meanwhile, Li et al. (2021) performed a similar comparison and suggest errors of order ∼ 10%. At a minimum, our noise simulations suggest increasing the ACT DR6 power spectra covariance by up to ∼ 20% in polarization.
The agreement of our three noise models in Figure 21 hints at a common underlying mechanism. As discussed, one ACT noise property not accounted for in the analytic covariance of this section is the spatially-varying map depth (regardless of whether the map depth is simultaneously scale-dependent). Thus, the natural next step would be to evaluate an analytic expression that includes the inhomogeneous ACT white-noise depth (Efstathiou & Gratton 2021), as was used for Planck (Planck Collaboration XI et al. 2016;Planck Collaboration V et al. 2020), assessing whether it could sufficiently capture the additional variance observed in our simulations. These results will inform the approach to constructing a power spectrum covariance matrix for ACT DR6, and will be explored in future work.

DISCUSSION AND CONCLUSIONS
For ACT DR6, the advent of three separate noise models allows downstream analysis pipelines to evaluate results against a range of map-based noise properties. For instance, we showed how propagation of noise simulations through the ACT DR6 power spectrum pipeline can help measure the extent to which analytic covariance matrix expressions accurately capture the nontrivial ACT noise properties (e.g., Li et al. 2021;. Likewise, the ACT DR6 lensing analysis (Qu et al. in prep.) utilizes these models to demonstrate that the lensing inference is robust with respect to the complicated ACT noise. While checking pipelines for robustness against different noise models can be useful, it is too early to say whether one model is the most performant for particular applications. Finally, consumption of large noise simulation ensembles by ACT and community users requires efficient code and a simple interface to those products. This codemnms -is made public with this paper, and is out-of-the-box applicable to future ground-based CMB experiments such as SO and CMB-S4. Nevertheless, it is worthwhile to evaluate key model assumptions and suggest directions for future development.
Measuring noise from split differences: Throughout this paper, we have assumed ν i to be an accurate noise estimate. In §3.1 we demonstrated this is a reasonable assumption when averaged over a large sky fraction. However, in regions of especially compact or bright foregrounds, we encounter residual signal in our difference maps. As discussed in §4.5.1, we mitigate point-source residuals with inpainting. However, we do not handle bright, extended source residuals, which can be prominent in the Galactic plane (see Figure 23). The underlying cause of these residuals is likely similar as for point sources: a combination of pointing variation in the telescope and small relative calibration errors between splits. Our noise models will therefore be biased in these regions. Future work could include more sophisticated inpainting than our implementation in §4.5.1 to handle bright, extended regions.
Non-identically distributed splits: As noted, unlike ACT DR4, we do not assume that the noise in separate splits is identically distributed. Therefore, we do not average noise models over splits in their sparse basis. Whether to do so is tightly coupled to the model smoothing: averaging over the eight splits of ACT DR6 would reduce the amount of required model smoothing. However, if splits contain different noise structure, averaging over them would bias every model. In fact, we know a priori that the noise is not identically distributed over splits because their inverse-variance maps vary considerably. We indeed observe percent-level improvement in low-power spectra ratios when treating splits separately versus when averaging over splits. In principle, including inverse-variance maps as a mode- Figure 22. "Beam-like" real-space profiles corresponding to the transform of the noise autospectra from the top panel of Figure 3 for PA5 f090. The profiles have been peaknormalized. Solid lines (shaded bands) denote the mean (1σ range) over splits.
decoupling filter in the wavelet models (as we discussed, and forewent, in §4.3.1 and §4.4.1) would eliminate the barrier they present to averaging over splits.
Noise pseudospectrum filtering: Each of our noise models includes the same noise pseudospectrum filter. The benefits of this filter were discussed in §4 and are explored further in Appendix G. However, this filter has a downside: it limits our ability to model high-resolution noise features in map space, especially in polarization. To see how, consider that the noise pseudospectra in the top panel of Figure 3 downweight small scales relative to large scales. Thus, the noise pseudospectrum filter has a similar effect in the noise models as the instrumental beam has on a given sky signal: by suppressing small scales it "blurs" compact features. We can see this by examining the "beam-like" real-space profiles corresponding to the transform of noise pseudospectra. Results are shown in Figure 22 for the PA5 f090 band. The FWHM of the EE and BB real-space profiles are both ∼ 5 • , whereas the TT profile is more spatially compact, with a FWHM of ∼ 1 • . In other words, given a point-like noise feature in both temperature and polarization, we would expect our simulations to resolve it to ∼ 1 • in temperature and ∼ 5 • in polarization. This may explain, for instance, the greater deviations observed in the lowpower spectra ratios ( Figure 17) for EE than TT: noisy regions from farther outside the analysis mask in polarization than temperature are smoothed into the mask, skewing the ratios. The different real-space widths in Figure 22 are driven mainly by the low-shape of the noise pseudospectra in Figure 3: in some bands, the TT noise power at the largest scales ( 100) level off and even decrease. This does not occur in polarization. We will seek to better understand the shape of the lownoise power spectra in future work. Powers of noise covariance matrix: In this paper, we only interact with the full noise covariance matrix by drawing samples from the Gaussian distribution it defines. Downstream analyses may wish to raise the matrix to an arbitrary power and then apply it to a map vector. In particular, Monte Carlo methods, inversecovariance filtering, and Wiener filtering require the full inverse noise covariance matrix. Such an implementation is beyond the scope of this work; nevertheless, we can comment on its feasibility. Unfortunately, due to the overlap of tiles in map space (likewise, of wavelets in harmonic/Fourier space) the class of noise covariance matrices prescribed here are not analytically invertible. An exception would be the case of non-overlapping (i.e., tophat) tiles or wavelets, but with a detrimental impact to their performance as noise models (as noted in the text) from their lack of smoothness. In principle, the exact application of the inverse noise covariance matrix to a map vector is still possible with a conjugate gradient method, but may be slow. We will explore this and other alternatives in future work.
Prospective: These results offer concrete insights for the design of next-generation, ground-based CMB experiments. Foremost among them is the central role of the map cross-linking in modulating large-scale noise power. Combined with the tight relationship between cross-linking and the noise anisotropy pattern, spatial uniformity of cross-linking is as, if not more, important to realizing a successful map-based noise model than uniformity of detector hits. We acknowledge that achieving uniform cross-linking may encounter more observational constraints than uniform detector hits (e.g., the dynamics of the telescope and receiver). In a similar vein, projects seeking to use these noise modeling methods should design scan strategies that facilitate identically distributed map splits. While formally difficult, maximizing the uniformity of heuristics across splits (again, e.g., cross-linking, inverse-variance) would go a long way toward allowing noise models to be averaged over splits. Projects should also seek to avoid unnecessary "hard edges," in their scan strategies. This includes not only the observation footprint boundary, but also adjacent regions of disparate cross-linking or inverse-variance levels. Such boundaries within the ACT DR6 footprint proved especially challenging to handle from a noise modeling perspective.
Finally, we emphasize the utility of a fast and flexible noise model implementation, as is released in this paper with mnms. Using ACT DR6 data with mnms, this paper has constructed empirical noise models and drawn large ensembles of ACT noise simulations. For future experiments undergoing design and forecast activities, such as SO, mnms can also serve as an emulator of otherwise expensive TOD simulations. Thus, mnms could relieve a bottleneck in the simulation-to-forecast feedback loop, enabling iterative optimization of experiment design.

B. DIFFERENCE MAP ALGEBRA
We explicitly show the algebra relating to the construction of the difference maps, ν i . We first prove Equation 7. Starting from Equation 6, we have: Now impose the independence of the noise in different splits and the definition of the inverse-variance in split i, E(n i * n j ) = 1/h i * δ ij . Also substitute the definition Σ ≡ i h i : We then prove Equation 10. First, we note that because ν i ∝ d i , the correlation of ν i and n i is the same as that of d i and n i . Next, we evaluate the covariance of d i and n i : Now evaluate the correlation coefficient directly: Note that all of these Equations are simply generalizing the familiar "N − 1" factors from the unbiased sample variance of a set of equally-weighted observations to the case of unequally-weighted observations.

C. DERIVATION OF TILING CONSTRAINT
We offer a derivation on the backward tiling transform constraint (Equation 28) that each noise model must satisfy. We will show that it emerges from a desire to not bias the diagonal entries of the full noise covariance matrix. We begin by considering a simpler version of the noise simulation equations (Equations 30,43,53). Specifically, we ignore any "exterior" filtering operations, and we replace any DFTs or SHT matrices with a "universal" transform matrix U (we also drop the split index i for brevity). In this way, a noise simulation from all three noise models appears identical: where we have generalized the bases of these objects: the simulation is in the "operational" basis where we define and apply the tiling transform profiles -for the tiled/wavelet/directional model this is map space/harmonic space/Fourier space -and the model covariance is in the "sparse" basis. The matrix U thus projects a vector from the "sparse" to the "operational" basis -it is a DFT or a SHT matrix. We want to determine the conditions under which the operational space noise covariance matrix, N oper = ξ oper ξ † oper , is unbiased: where we have identified UN j,sparse U † with N j,oper . Now consider a single matrix element indexed by x, x -that is, the covariance between "pixels" (or "modes") in the operational basis, N x,x oper . Such an element is formed by taking the x th row of P † j τ † j,b and x th column of τ j,b P j , each of which have at most one nonzero element. Thus, for a given j, N x,x oper receives a contribution from at most one element of N j,oper ; let us index that element by e and e : In the case of x = x (i.e., the diagonal of the map-based covariance matrix), Equation C7 recovers Equation 28. We had to use two sleights-of-hand in the second and third lines of Equation C7. In the second line, we moved N e,e j,oper out of the sum, where it became N e,e oper . In the third line, we asserted that N x,x j,oper = N e,e j,oper . To do so, we assume that any given element of the covariance matrix on the left-hand-side is equal to each of its occurrences in each tile or wavelet, j. In reality, each tile or wavelet covariance will disagree on the exact value of a given shared element, but if we assume that these separate estimates are unbiased, then satisfying Equation C7 will likewise yield an unbiased full covariance matrix.
There are a few more caveats of note. Working backwards, it is difficult to construct useful tiles or wavelets that satisfy Equation C7 for x = x , and indeed for the kernels in this paper this sum approaches zero as the difference between x and x increases. The upshot is that we cannot model correlations at scales larger than the tile or wavelet size within the tiling scheme itself. Such "long-distance" correlations instead are modeled by the filtering steps that we excluded from the preceding argument. Likewise, we assume that our sparse-basis covariance blocks N j,sparse are unbiased estimates, but that is likely not true. For instance, Equation 21 is the zeroth-order correction to the modecoupling induced by the forward tiling transform. Lastly, we must assume that the sparse-basis covariance blocks are themselves uncorrelated, but we know that is not true, since the forward tiling kernels from which the blocks are measured also overlap. In sum, while satisfying Equation 28 is necessary for a performant noise model, it alone is not sufficient.

D. DIRECTIONAL WAVELET KERNELS
As discussed in §4.4, the directional model utilizes radially-and azimuthally-separable wavelets in the Fourier plane: where k is the 2D Fourier mode, k = |k|, φ = arg(k), κ are the wavelet kernels of Wiaux et al. (2008); Leistedt et al. (2013) with the parameters given in §4.3.1, η are the azimuthal kernels, and we have expanded the wavelet index j into a "radial" index u and "azimuthal" index v. Evaluated at each discrete k in the map Fourier transform, τ j,f becomes the diagonal of the forward wavelet kernel matrix τ j,f . The fact that η also carries the "radial" index denotes that each radial kernel has its own distinct set of associated azimuthal kernels.
We define the azimuthal kernels as follows: where: with n ≥ 0, 0 ≤ p ≤ n, 0 ≤ v ≤ n, and all indices are integers. One should interpret this as follows. For each radial index u, we choose two parameters for the set of associated azimuthal kernels: n and p. n sets the number (n + 1) of azimuthal kernels, and p is the kernel "shape" parameter. v is the kernel index, and it generates n + 1 copies of the kernel equally spaced in azimuth (at each φ 0,k,n,v ). The kernel shape is the p-th power of a cosine centered on φ 0,k,n,v , and truncated at either side of φ 0,k,n,v where the cos(...) p function intersects the x-axis. The width of the cosine is set by ω n,p . The normalization ensures the kernels satisfy Equation 28 (with τ j,b = τ j,f * ) and respect reality conditions when applied to the Fourier-transformed map in §4.4.1, step 3. Finally, the sum over k ensures that this piecewise function is periodic in π. Example azimuthal kernels are shown in the top row of Figure 25. The guiding principle in the construction of η u,v is the tradeoff in locality between map space and Fourier space: a kernel compact in map space will necessarily be extended in Fourier space, and vice versa. Taking the radial kernels as a given (see Figure 12), all else being equal the large-scale wavelets will be therefore be extended in map space, while the small scale wavelets will be compact in map space. We counteract this by tuning the n and p parameters for each radial kernel. For a given number of azimuthal kernels n + 1, increasing the shape parameter p decreases the compactness of the kernel in Fourier space (and increases its compactness in map space). We note that p = 0 corresponds to n + 1 equally spaced, non-overlapping tophat functions, while p = n corresponds to the steerable basis of Freeman & Adelson (1991). On the other hand, for a given p, increasing n increases the compactness of the kernels in Fourier space (and decreases its compactness in map space). This tradeoff is evident in the bottom row of Figure 25. Therefore, we select low n and high p values for the large-scale wavelets, and the opposite for the small scale wavelets, subject to 0 ≤ p ≤ n. Meanwhile, computational interests favor fewer, more-compact Fourier space kernels at high u.
We found the parameters in Table 3, which we used in §4.4, to be a good balance of all considerations. There are fewer radial kernels (16 vs. 26) compared to the isotropic wavelet model to make up for the increased total kernel count. The choice of n and p recalls the "parabolic scaling" in Chan et al. (2017), such that the 2D kernels become increasingly directional (narrow) at larger u (see Figure 13), except at the highest radial kernel. This allows the resolution of sufficient detail in 2D Fourier space (e.g., Figure 18) without sacrificing large-scale detail in map space (e.g., Figure 20). We note that we accomplished this by increasing n in large steps, while decreasing p within each step. A similar narrowing of the 2D kernels with increasing radii could have been accomplished, for example, by more smoothly increasing n while holding p constant.  As discussed in §4.5.1, we exclude some regions of the ACT footprint from our noise models and simulations. We do this through the application of a boolean mask, µ bool . We note that µ bool can only have a value of 0 or 1 in a given pixel (corresponding to omitting the pixel, and permitting the pixel, respectively). Here, we motivate and define this boolean mask.
The most natural boolean mask is one that selects exactly the pixels that have non-zero hits; however, we construct a slightly more restrictive mask. In addition to unobserved pixels, we mask regions with low cross-linking that abut otherwise well cross-linked/deep regions. These regions are predominantly located on the left and right sides of the southern half of the ACT footprint (the solid black areas Figure 15). We found that masking these regions significantly improved the low-power spectra ratios with respect to the data (see §5). This can be understood as a fundamental limitation of our modeling: as with residual point sources, our models cannot resolve the step-function-like change from high cross-linking to approximately no cross-linking at the edges of these masked regions. In other words, if we did not mask these shallow, poorly cross-linked regions, their noise power would mix into adjacent deep, well cross-linked regions in the noise models. Importantly, these deep, well cross-linked regions are exactly those retained in the DR6 CMB power spectrum and CMB lensing pipelines -the main consumers of our noise simulations. Therefore, we try to optimize noise model performance in these priority regions, while sacrificing as little data as possible.
We construct µ bool separately in two disjoint regions. The first (northern) region includes all pixels north of −6 • Dec between 180 • and 125 • R.A., north of 3 • Dec between 125 • and −100 • R.A., and again north of −6 • Dec between and −100 • and −180 • R.A.. The second (southern) region is the complement of the northern region. First, in both regions, we set µ bool = 1 in pixels observed by all eight splits in both frequencies of a given modeled detector array, and 0 otherwise. Then, only in the southern region, we also set µ bool = 0 in pixels whose cross-linking statistic is less than 0.001 in the point-source subtracted coadd map of any DR6 array or frequency. A considerable fraction (0.65%) of µ est in the first region would be cut by the cross-linking threshold, hence why we only apply the cross-linking cut to the second region. In the second region, the fraction of µ est cut by the cross-linking threshold is only 0.04%. We add these pixels in the second region back to µ bool such that µ bool fully encompasses µ est . Together, this procedure is effective at removing the large areas of approximately zero cross-linking on the left and right sides of the southern region, while preserving the data used in downstream analyses. For PA5, for instance, this results in the blue outline in Figure 15. For the other arrays, the mask is visually indistinguishable.
The somewhat ad-hoc procedure of this section is only necessary given the properties of the ACT scanning strategy cross-linking. Future surveys can mitigate the need for a boolean mask by ensuring more uniform cross-linking.

F. MODEL SMOOTHING
This appendix expands on our discussion of how and why we smooth our wavelet noise models. As discussed in the text, we use a white-noise map space covariance, modulated by either the mapmaker cross-linking and/or inversevariance maps, as the fiducial input to our optimization procedure. This simplification has a twofold advantage. First, we showed in §3.2 that the cross-linking and inverse-variance maps approximately capture the spatial noise structure at large and small scales, respectively. Since we do not know the true noise covariance of the DR6 noise, these products serve as realistic proxies. In particular, by downgrading these maps to specific, lower resolutions (and Nyquist bandlimits), we can emulate the noise covariance at specific wavelet scales. Second, by assuming the noise is white, we can evaluate the agreement between noise models and the input covariance using an analytic likelihood.
We proceed as follows: 1. For a given array and frequency, select either the raw inverse-variance map alone, h, or the combination of h and the raw cross-linking map, f p . Using Fourier space resampling, downgrade all maps by a given factor d, such that the Nyquist bandlimit of the maps is max = 21, 600/d. In both cases, invert the downgraded inverse-variance map to obtain a "variance" map, 1/h.
2. In the first case, let the variance map define an an underlying, true, white-noise covariance map, σ 2 0 = 1/h. This map only has high variance when the inverse-variance is small. In the second case, let the underlying, true, white-noise covariance map be defined as σ 2 0 = f p /h. Relative to the inverse-variance map, this map has high power when the cross-linking is also poor (f p ∼ 1), following the discussion of Equation 12.
3. Mask σ 2 0 with the same boolean mask for the selected array as in §4.5.3.  Figure 26. Model likelihoods (−lnL) as a function of smoothing factor f . Solid lines (shaded bands) denote the mean (1σ range) over the Ni = 3 noise realizations from the fiducial input covariance. Curves have been recentered and normalized relative to their minima to facilitate direct comparison. Left: Likelihoods using the PA5 f150 cross-linking and inverse-variance combination map (σ 2 0 = fp/h) as the input covariance. Different colors denote different choices of downgrade factors. Right: Likelihoods for the set of noise modeling routines enumerated in the appendix, using the PA5 f150 inverse-variance map (σ 2 0 = 1/h) downgraded by a factor of four as the input covariance. The data labeled "White (MC)" ("White (Model)") refer to the white-noise model's s 2 i,MC map (s 2 i,model map). The agreement between the curve and overplotted crosses demonstrates that 30 simulations are sufficient to reach convergence of Monte Carlo estimates. For the other models, we only have the option of using the Monte Carlo estimates. For both panels, the relatively small scatter over the Ni = 3 noise realizations indicates the optima are robust.
4. Draw an uncorrelated, Gaussian realization from N (0, σ 2 0 ), called s i . Let this be analogous to the hypothetical i-th independent realization of the noise in the data. 5. Feed s i into one of several noise modeling routines, utilizing a fixed FWHM smoothing factor f in all models: • A white-noise model which is simply the pixel-wise square of s i , s 2 i,model . • The isotropic wavelet model of §4.3, without the pseudospectrum-filtering step.
• The directional wavelet model of §4.4, without the pseudospectrum-filtering step.
6. Draw N sim = 30 simulations, s i,sim , and form the Monte Carlo estimated, white-noise covariance map s 2 i,MC = 1/N sim s 2 i,sim . For the two wavelet models, this is necessary because it is not possible to directly compare the model itself, which is in the wavelet basis, to the input covariance, σ 2 0 , which is in the map basis. For the white-noise model, as noted above, we retain both the Monte Carlo estimate, s 2 i,MC , as well the model itself, s 2 i,model , since the model is in the map basis. By comparing results using s 2 i,MC to s 2 i,model in the next step, this allows us to check that N sim = 30 simulations is sufficient for the Monte Carlo estimates to converge (e.g., note the agreement between the blue crosses to the blue line in Figure 26). 7. For the white-noise model, the isotropic wavelet model, and the directional wavelet model, evaluate the likelihood of s 2 i,MC given the true covariance σ 2 0 , L(s 2 i,MC |σ 2 0 ). For the white-noise model only, also evaluate L(s 2 i,model |σ 2 0 ). To avoid the poorly-observed edges of σ 2 0 , only perform the evaluation using pixels where the pseudospectrum mask from §4.5.3 is nonzero. 8. Accumulate results over N i = 3 input noise realizations.
The likelihood, L(s 2 i,MC |σ 2 0 ), follows the gamma distribution in each pixel x: −lnL(s 2 i |σ 2 0 ) = − x P gamma (s 2 i |α = N sim /2, θ = 2σ 2 0,x /N sim ) where P gamma is the gamma probability density with shape parameter α and scale parameter θ. That is, for a given pixel with variance σ 2 0 , we evaluate the likelihood that the average of N sim = 30 independent samples of that variance equals the observed value. We minimize −lnL as a function of the smoothing factor f , for various modeling routines and downgrade factors d. Example likelihoods are shown in the left panel of Figure 26. As a motivating example, we dispel the misconception that noise models with more wavelet kernels, and with thus with more parameters, require increased model smoothing. This view arises from the observation that, in the case of a single-frequency, unpolarized map, the directional model, with its 280 wavelet kernels, contains a factor ∼53 more parameters than the input data dimension. Compare that to the white-noise model we enumerated above, which has exactly the same number of parameters as the input data dimension. However, this argument overlooks that these 280 wavelet kernels are compact in Fourier space: their corresponding wavelet maps will therefore be smooth, with highly-correlated parameters. In fact, they will be correlated by exactly the right amount to guarantee that the number of degrees-of-freedom will be equal to the input data dimension. This is a consequence of Equation 28: with more kernels supporting a single element of the map space covariance matrix, Equation 28 averages over equally more terms, and with proportionally smaller weights.
To confirm this expectation, we perform the above procedure for each of the listed noise modeling routines. We use inverse-variance maps downgraded by a factor of four as inputs, and examine the results in the right panel of Figure  26. On the whole, we find the likelihood is generally insensitive to the noise modeling routine. In fact, the likelihood slightly prefers less smoothing as the number of wavelet kernels increases. This preference can also be understood as a consequence of the spatially-smooth wavelet kernels: they have a fundamentally limited ability to localize features in map space, and so enter the regime of "over-smoothing" sooner than a perfectly-local white-noise model. We factor this limitation into our final choice of smoothing factors in §4.5.4.

G. BENEFITS OF PSEUDOSPECTRUM FILTER: MODE-DECOUPLING AND STATIONARITY
The noise pseudospectrum filter is necessary for the construction of minimally biased noise models. We demonstrate that its exclusion leads to significant deviations from nominal in ratios of simulated versus input power spectra. We also demonstrate that our model estimation and simulation algorithms -when including the filter -suppress these deviations even if the filter is suboptimal. We examine the performance of a tiled and wavelet model in both cases.
We proceed as follows: 1. We draw a full-sky, isotropic realization from an input fiducial noise power spectrum (bandlimited to max = 5, 400).
• For the supoptimal-filter case, both models, we construct the input spectrum from scratch. The spectrum is much closer to flat than the actual noise spectra, but is not exactly flat (see Figure 27). It is equivalent to the situation where we have drawn from the same unfiltered PA5 f090 noise spectra, and then applied an imperfect noise pseudospectrum filter, leaving some residual -dependence to the filtered noise realization.
2. We mask this realization with the same boolean footprint mask as the actual PA5 f090 data (see §4.5.3).
3. From these realizations we generate a noise model as in §4, with some simplifications: • For the tiled model, we exclude the inverse-variance map filter step since the input noise realization is isotropic.
• For all cases, we exclude the pseudospectrum filter step.
• For all cases, we do not smooth the models at all. This is especially important for the tiled model whose sparse basis is in Fourier space: smoothing directly mixes power across scales and could also produce deviations in output/input power spectra ratios. For consistency, we do not smooth the wavelet model.
4. From these models, we draw a single simulation as in §4, with the analogous simplifications.
5. We compare the pseudospectra of the ouput noise simulation with the input noise realization, as measured in the apodized pseudospectrum mask of §4.5.3.
The downsides of ignoring the pseudospectrum filter are self-evident in Figure 27. As discussed in §4.2.1, multiplying each tile by an apodized mask in map space mixes modes in Fourier space. When the power spectrum of the signal Output/Input Power Spectra, Suboptimal-Filtering Wavelet ratio Tiled ratio Figure 27. Top-left: Noise pseudospectra from the actual data used as input power spectra for the no-filtering case. They are the mean over splits from the same curves as Figure 3. The black dashed line indicates an arbitrary normalization to the spectra which has no effect on the results. Top-right: Power spectra ratios between tiled and wavelet simulations, and their input noise realizations. The input realization from the tiled (wavelet) model is drawn from the EE (TT) spectrum. The tiled model deviation has been divided by 10 to fit on the same plot. Power spectra entering the ratios are smoothed with a tophat kernel of size ∆ = 25. Bottom-left: Toy spectrum used as input for the suboptimal-filtering case. Now the black dashed line signifies that this spectrum represents the effect of an imperfect filter. Bottom-right: Power spectra ratios for tiled and wavelet simulations, built off the same toy realization. The tiled model deviation is no longer divided by 10. Again, power spectra entering the ratios are smoothed with a tophat kernel of size ∆ = 25.
in the tile is especially steep, as in the case of the PA5 f090 EE noise, a given mode receives excess power from nearby modes at slightly larger scales. In the simple demonstration here, we observe an excess in simulation power of ∼500% near = 200 relative to the input noise power. Meanwhile, as discussed in §4.3.1, the wavelet model assumes stationary noise in harmonic space; this assumption is clearly violated by the strong -dependence of the PA5 f090 TT noise power spectrum. Instead, the model projects out the stationary part of the noise -that is, without an absolute -dependence -within each wavelet kernel. This produces a "staircase-like" noise model in harmonic space, with a flat step on each wavelet kernel, rather than the smoothly decreasing input spectrum. Thus, we observe an oscillating ratio between the simulation and the input realization By including a (sub)optimal pseudospectrum filter we suppress these issues. As Figure 27 shows, even if the power spectrum of the input noise contains O(1) variations over , simulations drawn from the noise models are less biased over a large range of angular scales. Such a power spectrum could plausibly arise in §4 after the pseudospectrum filtering step in regions of sky whose local noise power spectrum does not match the one used in the filter. We judge these improvements to be worth the cost of the filter's limited map space resolution discussed in §6. Finally, we note that when applied to the real data, the pseudospectrum filter does not guarantee unbiased simulated power spectra, even if the filter perfectly matches the actual noise power spectrum. Rather, this appendix goes to show that such a filter is necessary, but not sufficient, for a performant noise model.