Velocity reconstruction with the cosmic microwave background and galaxy surveys

The kinetic Sunyaev Zel'dovich (kSZ) and moving lens effects, secondary contributions to the cosmic microwave background (CMB), carry significant cosmological information due to their dependence on the large-scale peculiar velocity field. Previous work identified a promising means of extracting this cosmological information using a set of quadratic estimators for the radial and transverse components of the velocity field. These estimators are based on the statistically anisotropic components of the cross-correlation between the CMB and a tracer of large scale structure, such as a galaxy redshift survey. In this work, we assess the challenges to the program of velocity reconstruction posed by various foregrounds and systematics in the CMB and galaxy surveys, as well as biases in the quadratic estimators. To do so, we further develop the quadratic estimator formalism and implement a numerical code for computing properly correlated spectra for all the components of the CMB (primary/secondary blackbody components and foregrounds) and a photometric redshift survey, with associated redshift errors, to allow for accurate forecasting. We create a simulation framework for generating realizations of properly correlated CMB maps and redshift binned galaxy number counts, assuming the underlying fields are Gaussian, and use this to validate a velocity reconstruction pipeline and assess map-based systematics such as masking. We highlight the most significant challenges for velocity reconstruction, which include biases associated with: modelling errors, characterization of redshift errors, and coarse graining of cosmological fields on our past light cone. Despite these challenges, the outlook for velocity reconstruction is quite optimistic, and we use our reconstruction pipeline to confirm that these techniques will be feasible with near-term CMB experiments and photometric galaxy redshift surveys.


I. INTRODUCTION
Measurements of the Cosmic Microwave Background (CMB) radiation are entering an unprecedented era of high-resolution and low noise. Existing experiments such as the Atacama Cosmology Telescope [1] (ACT) and South Pole Telescope [2] (SPT), near-term experiments such as Simons Observatory [3] (SO), and future experiments such as CMB-S4 [4] and CMB-HD [5] will map the small-angular scale anisotropies in CMB temperature and polarization with steadily increasing precision. Many of the new opportunities on this frontier arise from CMB secondaries: temperature and polarization anisotropies associated with the gravitational or electromagnetic scattering of CMB photons from structure at low redshift. For example, existing data from ACT has been used to reconstruct the lensing potential at high fidelity over a significant fraction of the sky [6], and future experiments will enable precision cosmological constraints on e.g. neutrino masses using information from these reconstructions. Simultaneously, galaxy surveys will map ever larger volumes with increasing numbers of spectroscopic redshifts and photometric redshifts of improving precision; near-term surveys include The Rubin Observatory Legacy Survey of Space and Time [7] (LSST), Dark Energy Spectroscopic Instrument [8] (DESI), and Euclid [9]. Existing galaxy surveys have been combined with CMB datasets to make a (statistically) significant detection of the kinetic Sunyaev Zel'dovich (kSZ) effect [10][11][12][13][14][15][16], temperature anisotropies induced by the scattering of CMB photons from free electrons in bulk motion. Again, future experiments will allow for precision science using the kSZ effect (as described in more detail below). Beyond lensing and kSZ, there are other CMB secondaries in temperature, including the moving lens (ML) effect, [17,18], other non-linear components of the integrated Sachs Wolfe effect [19][20][21], patchy reionization [22], and rotational kSZ [23]; additional effects exist in the polarized component of the CMB (e.g. [24,25]). Assessing the detectability of CMB secondaries and their utility in improving our understanding of cosmology is an active area of investigation, to which the present paper aims to contribute.
In this paper, we focus on the kSZ and the ML effects. Schematically, the kSZ temperature anisotropies are given by a line-of-sight integral over the product of the radial component of the peculiar velocity field 1 and the electron density; the ML temperature anisotropies are a line-of-sight integral of the transverse gradient of the gravitational potential and the transverse components of the peculiar velocity field. These secondaries contain significant cosmological information owing to their dependence on large-scale bulk velocities.
One way of accessing this information is through the technique of kSZ tomography [26][27][28][29]: tomographic reconstruction of the radial velocity using the CMB and a tracer of large scale structure. A quadratic estimator for the radial velocity field was developed in Ref. [28], which relies on the anisotropic crosspower between a galaxy redshift survey and the kSZ component of the CMB. Such a reconstruction can be used as a general-purpose cosmological observable, and is in principle a powerful probe of primordial non-Gaussianity [30], relativistic effects in galaxy surveys [31], modified gravity [32], CMB anomalies [33], and isocurvature perturbations [34,35].
A quadratic estimator for the transverse velocity based on the correlation between a galaxy redshift survey and the ML component of the CMB temperature anisotropies was presented in Ref. [36]. For consistency of terminology we will refer to the tomographic reconstruction transverse velocities using the ML effect as 'ML tomography'. The ML effect is roughly an order of magnitude smaller than kSZ on small angular scales (∼ arcmin), so while kSZ has been detected at the greater than 5σ level, current datasets are not yet sufficient to have made a detection of the ML effect. Even so, the ML effect will be measured with high signal-to-noise with upcoming CMB experiments [36][37][38]. As discussed in Ref. [39], transverse velocities from ML tomography, when combined with galaxy measurements, can provide competitive constraints on the combination of the linear-theory growth rate f and the amplitude of matter fluctuations σ 8 on the scale of 8h −1 Mpc, comparable to the scenario where the redshift-space distortions (RSDs) are modelled with high accuracy. This parameter is useful for studying a large range of physics, including dark energy [40], modified gravity [41] and the effects of neutrino mass [42]. Precision measurement of f σ 8 also allows one to use the kSZ effect to learn about astrophysics, such as the characteristics of the electron density profiles around halos, by breaking degeneracies in kSZ between the electron-scattering optical depth and the growth rate [29].
The formalism for velocity reconstruction using the kSZ effect is reasonably mature, and several approaches have been developed to analyze simulations and forecast the capabilities of kSZ tomography in future experiments. In the 'Box Picture' of [29], the kSZ effect is estimated from the momentum field along one direction in a 3D box at the median redshift of a galaxy survey. This formulation is convenient since it sidesteps spherical projection effects and allows one to work in the familiar Fourier domain. This approach is a good approximation for reconstruction on relatively small sky areas and over limited ranges in redshift. The study of various systematics is straightforward in the Box Picture, and Ref. [29] estimated the impact of redshift space distortions and photometric redshift errors on velocity reconstruction. Subsequent work explored important contributions to the variance of the quadratic estimator [43] and explored the effect of the optical depth bias [29,43,44], the uncertainty in the reconstruction induced by imperfect modelling of the correlation between the distribution of electrons and galaxies. Ref. [43] validated the quadratic estimator in the Box Picture using a suite of N-body simulations, and demonstrated that the constraints on primordial non-Gaussianity forecasted in Ref. [30] could be realized in practice. It is, however, cumbersome to accurately incorporate redshift evolution, relativistic contributions to the kSZ effect, and large sky area in the Box Picture.
In the 'Light Cone Picture' introduced in Refs. [26,28], kSZ tomography is formulated in terms of observables on our past light cone. In the Light Cone Picture, it is straightforward to incorporate redshift evolution and relativistic contributions to the radial velocity field (promoting it to the remote dipole field). However, it is less straightforward to incorporate photometric redshift errors and other systematics. Another drawback of the Light Cone Picture are the cumbersome projection integrals, which make computing observables more expensive. The quadratic estimator in the Light Cone Picture was validated using N-body simulations in Ref. [45], where the impact of gravitational lensing, redshift space distortions, and the non-linear evolution of structure were taken into account. In an application to future datasets, the Light Cone Picture has a number of advantages. Perhaps most importantly, it is formulated in terms of direct observables (e.g. fields on the sphere), making contact between theory and observation precise. The signal to noise of the reconstruction is largest on the largest scales, where the redshift evolution and projection effects captured by the Light Cone Picture are most important.
A primary goal of this paper is to further develop the formalism for the Light Cone Picture for both kSZ and ML tomography. We begin by developing a self-consistent theoretical framework based on the Halo Model [46] for predicting the auto and cross spectra of fields on the light cone, including dark matter density, electron density, velocities, galaxy number counts, the Newtonian potential and its time derivative, as well as the frequency-dependent contribution to the CMB from the thermal Sunyaev Zel'dovich (tSZ) effect and the Cosmic Infrared Background (CIB). We employ a coarse-graining scheme in radial distance along the light cone based on Haar wavelets and use this complete basis to perform line of sight integrals for the kSZ, ML, ISW, lensing, tSZ, CIB, and binned galaxy number counts. The contributions we include for the CMB represent the most important (in terms of the amplitude of power spectra) blackbody and frequency dependent components. A challenging aspect of kSZ and ML tomography is that large scale fields are reconstructed from small angular scale anisotropies, implying one must model both small and large scales well. We develop tools to accurately compute spectra from the dipole down to sub-arcminute angular scales. We quantify the level of coarse graining given a fiducial CMB experiment and galaxy survey that will be necessary to capture the relevant cosmological information accessible using kSZ and ML tomography.
Another goal of this paper is to assess the impact of various foregrounds and systematics on kSZ and ML tomography. Previous work has largely neglected these effects in forecasts. We assess the impact of extragalactic foregrounds by forecasting the level of residuals in auto and cross-spectra given a fiducial CMB experiment and the resulting effect on the variance of the quadratic estimators. We develop a formalism, analogous to bias hardening in CMB lensing [47], to remove biases associated with photometric redshift errors, and compute the variance of the resulting unbiased estimators in both a redshift-binned and principal component basis. The kSZ and ML quadratic estimators are biased by other sources of statistical anisotropy in the CMB-galaxy cross-power such as CMB lensing. We confirm that these biases are small enough to be neglected in near-term experiments. A related systematic arises from redshift calibration errors on large angular scales, or any other effect that modulates the amplitude of the underlying statistically isotropic CMB anisotropies or galaxy number counts. This leads to a statistically anisotropic modulation of the CMB-galaxy cross-power that can bias the kSZ and ML quadratic estimators. While this is the dominant source of estimator bias, we determine that it is below the estimator variance for the fiducial CMB experiment we consider.
To assess the impact of partial sky coverage for the CMB experiment and galaxy survey, we work with simulations in map space. We develop a numerical framework to produce sets of properly correlated CMB maps and redshift binned galaxy number counts, assuming the underlying fields are Gaussian. We derive and implement a set of real space quadratic estimators and an associated pipeline to reconstruct the radial and transverse velocity fields from an ensemble of simulated CMB maps (including both blackbody components and foreground residuals) and correlated binned galaxy number count maps. We find that no significant bias is introduced by masking, and confirm that the reconstructed power spectra can be approximated by simply scaling by the fraction of the sky that remains unmasked. For ML tomography, we highlight a challenge posed by numerical errors associated with spherical harmonic transforms in the low signal to noise regime that near-term experiments will operate in.
Our results suggest that the main limitations on kSZ and ML tomography will be various modelling errors that give rise to a biased reconstruction of the velocity field. The largest among these are biases introduced by photometric redshift errors and mis-modelling of the galaxy-electron cross-power spectra, highlighting areas for future work. With a fixed experimental setup, improvements in the fidelity of the reconstruction can be made through better foreground removal techniques on small angular scales. However, our investigations have not found any effect that seriously impacts the performance of kSZ and ML tomography as presented in previous literature, which is good news for these techniques. As a companion to this paper, we have released a publicly available code: (Re)construction (C)ode for (C)osmological (O)bservables, ReCCO 2 . This code can be used to compute various power spectra, generate properly correlated Gaussian mock maps, and perform radial and transverse velocity reconstruction. We hope that this is a useful tool for future forecasts and data analysis.
The plan of the paper is as follows. In Sec. II, we outline the formalism for kSZ and ML tomography and in Sec. III we outline our construction of various contributions to the CMB and galaxy density field. In Sec. IV we perform a detailed forecast for a set of fiducial datasets including various biases to the reconstruction. We present and validate a simulation and reconstruction pipeline for the radial and transverse velocity fields in Sec. V, and conclude in Sec. VI. We summarize various technical details in several appendices.

II. FORMALISM
In this section, we describe the formalism for kSZ and ML tomography (velocity reconstruction) in the Light Cone Picture. We begin by reviewing the projection of cosmological fields onto our past light cone. Coarse grained cosmological fields on the light cone constitute the inputs to our estimator formalism, and we outline a coarse graining scheme as well as the statistics of the coarse grained fields. We outline the quadratic estimator formalism, proceeding from the simplest to the most realistic scenario.
A. Continuous fields on the light cone Continuous fields defined on our past light cone constitute the basic building blocks of our formalism. A simple way of constructing a field on the light cone is to take the projection of an underlying 4-dimensional space time field U(η, x), where η is the conformal time and x are the comoving spatial coordinates. If one parametrizes the light cone with a unit direction vectorn (the line-of-sight) and a comoving distance χ, then the projected field is defined by: F(n, χ) ≡ U(η(χ), x = χn). (1) In many cases, it is convenient to express F(n, χ) in terms of the spatial Fourier moments of the field U, defined by: which gives F(n, χ) = d 3 k (2π) 3Ũ (η(χ), k) e ik·χn .
The line-of-sight dependence of the light cone field can be expanded in terms of spherical harmonics: and we will refer to the coefficients F m (χ) as the light cone moments (LC moments for short) of the field F. The LC moments can be expressed as: which we can further simplify to: where j (kχ) is a spherical Bessel function. It is possible to define fields on the light cone using a more complex projection of the underlying field U than the one used in Eq. 1. For example, the projection could depend on the directionn or introduce weights depending on the conformal time η. A more general expression for the LC moments is then: where K (χ, k) is an integral kernel determined by the particular observable and typically containing linear combinations of spherical Bessel functions.

B. Integrated and coarse grained fields on the light cone
A second type of building block of our formalism are line-of-sight integrals of continuous fields on the light cone. Given a generic window function W (χ) we define the windowed F field: and its spherical harmonic moments: where F m (χ) are the LC moments defined in the previous section. Given a finite portion of the light cone determined by an interval [χ min , χ max ], we can consider a complete set of normalized functions µ i (χ) and expand the LC moments of the field F: where the coefficients F i m are obtained using Eq. 9 with W (χ) = µ i (χ). From now on, we refer to these coefficients as the µ-binned LC moments of the field F. In this paper, we choose to expand the LC moments in the radial direction using the Haar basis. Haar wavelets are defined on the interval χ min ≤ χ ≤ χ max by: with s = 2 p + q − 1 for integer p, q for s > 0; for s = 0, the Haar wavelet is h 0 (χ) = 1 √ χmax−χmin . The scale is determined by p and the location is determined by q; for each value of s there is a unique choice of p, q. The Haar basis functions are orthonormal over the interval χ min ≤ χ ≤ χ max : We choose the Haar basis to expand the LC moments because conveniently the truncated Haar expansion up to s = N − 1 is equivalent to representing the LC moments by their average values in comoving bins of equal size ∆χ = χmax−χmin N : where Π α (χ) = 1 ∆χ , χ min + j∆χ ≤ χ < χ min + (j + 1)∆χ, 0, otherwise (14) and F α m are the Π-binned LC moments (we reserve Greek letters to index the Π-binned LC moments and Latin letters to index the Haar-binned LC moments). This property allows us to express the LC moments in a way that coarse-graining in the radial direction is clear: where the first sum represents the 'coarse' or 'bulk' radial modes and the second sum, orthogonal to the first, represents the 'fine' modes that don't contribute to the bulk averages. We note that the spherical Fourier-Bessel decomposition (see e.g. [48]) could have been chosen instead of the Haar basis used here. In the context of galaxy redshift surveys, a comparison between the spherical Fourier-Bessel decomposition and the redshift-binned approach employed here can be found in Ref. [49]. Exploring the advantages of various choices of basis is deferred to future work.

C. Statistically isotropic correlations
The statistically isotropic correlations between µ-binned and integrated light cone moments can conveniently be expressed in terms of a set of angular auto-and cross-spectra that depend on and the window labels only. Let's consider two fields F and G on the light cone, constructed from underlying 4-dimensional fields U F and U G as described in above, and integrated on the line of sight with windows W and W respectively. The cross-spectra is: where we have assumed a statistically isotropic cross-correlation power spectrum between the underlying fields: Although a brute force computation of the integrals in Eq. 16 is feasible for certain values of and certain χ ranges, the oscillatory behaviour of the integral kernels make such an approach cumbersome if accuracy across a wide range of multipole moments and redshifts is desired. This is exactly our case, as we aim to have consistent modelling of large-angle and small-angle observables across a large redshift range. The Limber approximation (see e.g. [50]) can be used to simplify the oscillatory integrals and provide accurate spectra under certain circumstances. For our purposes, an implementation of the Limber approximation is challenged by several factors: first, part of our calculations require narrow window functions, which can drive the Limber approximation beyond its regime of validity if the multipole is not high enough. Second, the Limber approximation only picks up the equal-time contribution to the cross-correlation power spectra (χ 1 = χ 2 ), and does not capture non-negligible contributions from unequal-time correlations [51]. Here we adopt the 'Beyond Limber approximation' method from [52], which separates Eq. 16 into a term suitable for the Limber approximation and a term with separable structure that allows for fast Bessel integrations. We briefly summarize this method in Appendix A.

D. Statistically anisotropic cross-correlations
We now discuss our modelling for anisotropic cross correlations between the temperature field and a windowed density tracer on the light cone. We write the observed temperature field as the sum of two contributions: where the first term I(n), analogous e.g. to the primary CMB, represents all the contributions to the temperature coming from integrated light cone fields: and the second term, analogous to various CMB secondaries such as kSZ, consists of the line of sight integration of the product of two light cone fields. Consider as well a tracer of large scale structure obtained as a line of sight integration of a density field δ(χ,n) on the light cone: We assume that I(n), M (χ,n), G(χ,n) and δ W (n) are isotropically correlated among each other as described in Sec. II C. The second term in Eq. 18 leads to a statistical anisotropy when the temperature is correlated with the density field: Let's construct a quadratic sum of temperature and density multipoles with the following structurê and choose weights G M α W L such that the estimator is unbiased: and has minimum variance. The first condition translates to The minimum variance estimator can be found using the Lagrange multiplier method subject to the constraint Eq. 34, which gives: where C ΘΘ is the full temperature power spectrum. A complete derivation can be found in Appendix B. In the computation of the estimator variance used to derive G M α W L we have only included the disconnected part of the temperature-galaxy-temperature-galaxy four-point function. Under this approximation, the estimator variance is given by: However, looking in more detail at the estimator variance: we see that there is a dependence on a six-point function of the underlying fields. Therefore, even if all the fields are Gaussian, the disconnected four-point function is not a complete description of the estimator variance -one must in principle include the 15 terms that contribute to the disconnected six-point function.
Fortunately, as we describe in Appendix C, for the observables in this paper the relevant components of the six-point function do not yield any significant additional variance beyond the terms in Eq. 36. This additional contribution to the variance was computed for kSZ tomography in the box formalism in Ref. [43], where in analogy with lensing reconstruction, it was referred to as the N (1) bias and was found to be negigably small. For non-Gaussian fields, one must additionally compute the connected part of the six-point function. This was also computed in Ref. [43], where it was shown that this 'N (3/2) bias' is far larger than the N (1) bias, and can even become comparable to A M α L at sufficiently high SNR. A full computation of these additional contributions to the variance within the light cone picture will appear in future work. Moving forward, we will only consider the contribution from C M α M α L and A M α L in our estimator variance.

Case 2: Single modulating field and multiple Π modes
The next step to add more realism is to have multiple Π-binned LC moments from a single modulating field. Recall that the number of Π bins corresponds to the degree of coarse graining in the line of sight integral for the temperature: and We want to construct N unbiased quadratic estimators, one for each modulating source M α 1m1 . The strategy we choose is to first construct N biased estimators, by taking N versions of the single field estimator Eq. 32 described in Case 1: where the weights A M α L , G M α ,W L are chosen exactly as if M α 1m1 was the only source of statistical anisotropy. These estimators will be biased: . . .
We can define a 'rotation matrix": with indices X, Y in (M 0 , . . . , M N −1 ). With this matrix we write the system Eqs. 41 as: If the rotation matrix is invertible, we can now define unbiased estimatorsM LM for M LM : The procedure above serves as an example of bias hardening the quadratic estimators [47] in the presence of multiple sources of statistical anisotropy in the temperature-density cross-correlation 3 . The two point function for the unbiased estimator is: where M LMM † LM is the two point function of the biased estimator; note that this is a matrix containing all auto-and cross-spectra. Similarly to Case 1, the biased two point function can be written in terms of 4-point and 6-point functions and broken down into signal and noise terms: where X, Y are indices in (M 0 , . . . , M N −1 ) and X m , Y m are the vectors: The dominant terms of the 2-point function of the biased estimator are: where C M M L is the modulating field covariance matrix and the elements of the N 0 L matrix are given by: It is easy to check that the diagonal elements satisfy (N 0 L ) αα = A M α . This is to be expected because we constructed the unbiased estimators as collections of the Case 1 estimator. Eq. 49 has further contributions from the 6-point function: some relatively simple terms proportional to the covariances C BB L ,C δ W δ W L ,C δ W B L , etc., and more complicated terms coming from various contractions of the 6-point function. Again, these contributions are expected to be small enough to be neglected.

Case 3: Single modulating field, multiple Π and Haar modes
The estimators constructed in Case 2 ignore the contributions to the temperature multipoles and temperature-density statistical anisotropy coming from the Haar-binned LC moments of the field M (n, χ). We have where k max is chosen such that the Haar expansion is mostly converged. If we follow the same steps as in Case 2 to construct N biased estimators, the Haar modes lead to an additional bias: The relevance of this bias depends on the truncation number N , which in principle can be chosen to be high enough such that the contribution from Haar modes can be ignored. Thus, quantifying the size of these terms as a function of N is a useful way of determining the level of coarse graining that we need to model our observables with. The 2-point function of the estimator can be computed using the same expression Eq. 47 as in Case 2 just by expanding the vectors X, Y defined in Eqs. 47-48: The resulting 2-point function contains three terms which we identify as dominant: where R L is the rotation matrix defined in Eq. 43, C M M L is the modulating field covariance matrix, N 0 L is computed exactly as in Case 2, and N M M f ine L is given by: We call this term the fine mode noise, as it is sourced by the Haar modes of the modulating field above the truncation number N . N M M f ine L can become comparable to N 0 L for low enough N . Conversely, one can find a high enough truncation number N such that the fine mode noise can be neglected. In a realistic scenario, the truncation number is limited by the details of the 3-dimensional large scale structure survey that is being used for the reconstruction. In further sections we will show the size of the fine mode noise in the estimation of the radial velocity and transverse velocity Π-binned LC moments.

Case 4: Multiple modulating fields, multiple Π and Haar modes
Generalizing the results from the previous cases to the multiple field case is straightforward. The temperature and the statistical anisotropy are: Similar to the previous cases, we can construct N biased estimators for the Π-binned LC moments of the M field. The 2-point function is calculated using Eq. 47 as in Case 2 just by expanding the vectors X m , Y m to: The number of terms contributing to the 2-point function will clearly increase with the introduction of new modulating fields. In addition to the terms from Case 3, each new modulating field Q will introduce a term N QQ L given by: as well as other less significant terms with a similar structure. In principle, there is no immediate way to determine if N QQ L are negligible with respect to N 0 L and N M M f ine L , as this depends on the specific details of the modulating fields giving rise to the temperature signal. We show examples in Sec. IV, where we compute the reconstruction noise for the radial and transverse velocity.

Multiple density windows
We have thus far discussed the construction of estimators for Π-binned LC moments of a light cone field M (n, χ) using only one window window function W (χ) for the density tracer. We remind the reader that the window function shows up in the coupling Eq. 27 through the cross-spectra C B α δ W ≡ B α m δ W m , where B(n, χ) is the field integrated together with M (n, χ) to form a temperature secondary. Since the estimator relies on large multipoles ( , ), where correlations along the light cone are relatively small, the cross-spectra will be non-negligible only if the window function W (χ) overlaps with Π β (χ). A density window function with wide support on the light cone will lead to well defined couplings and estimators for all the Π-bins, but at the same time leads to an increased mixing of the 3-dimensional information we are trying to reconstruct. In contrast, more localized density window functions will be better at isolating contributions coming from different redshifts, but can lead to ill-defined estimators for Π-bins with zero overlap with the density window. One can remedy this last issue by constructing estimators with a numer of density window functions such that Π α (χ) and W α (χ) overlap: One intuitive choice is to take W α (χ) ≡ Π α (χ), which could be a possibility if one has 3 dimensional measurements of the large scale structure that can be separated into custom redshift bins 4 . The rotation matrix is defined similarly to Eq. 43, where the only difference comes from changing the window functions to match that used in the estimators: where W X is the density window function associated to the observable X. The 2-point function introduced in Case 2 can be easily generalized to include the varying density window functions: The advantage of using multiple localized density window functions over fewer, wider ones, is clear: more 3-dimensional information of the light cone fields is retained. In this paper we will explore a case with many window functions and a case with a single broad window function to compare these two scenarios.

Principal component analysis
Consider a set of estimators for the Π-binned LC moments of a field M (n, χ), constructed used the methods described above. The variance of the estimator is given by: where N is the sum of all sources of noise. Although the Π basis is useful when it comes to separation of scales and localization on the light cone, it can be less useful when it comes to separating the independent information contained in the 2-point function of the estimator. Using a principal component analysis, we can find the uncorrelated linear combinations of bins that yield the highest signal to noise. We do so by the following set of transformations: • Transform to a basis in which the noise matrix is diagonal.
• Perform a second transformation to a basis in which the noise matrix is the Identity.
• Perform a third transformation to a basis in which the signal matrix is diagonal. The noise matrix, due to being equal to the identity, is unchanged by the third transformation. The resulting signal matrix C pp L is diagonal and contains the signal to noise for the different uncorrelated principal components. The linear combinations of bins associated to the principal components can be found using the transformation matrices T 1 , T 2 , T 3 , R L : where X = (M 0 LM , . . . , M N LM ). The j-th principal component then is characterized by a set of N coefficients c jβ L such that: The signal to noise per mode for the j-th principal component is simply given by the diagonal element (C pp L ) jj . We define a signal to noise per harmonic LM mode as We define the total signal to noise as a sum over all principal components and harmonic modes:

Multiplicative bias from theory modelling
In order to construct the quadratic estimators, one has to assume a model for the couplings Eq. 27, which depend on C B α δ W (χ). If an incorrectC B α δ W is used instead of the true physical C B α δ W , a multiplicative bias will be introduced: whereR L is the rotation Eq. 64,∆ is the reconstruction noise, and Γ XY L is the multiplicative bias: where f Y W Y L is the true physical coupling. The bias Eq. 72 is not symmetric in the indices X, Y so in principle there are N 2 bias parameters at each scale L. In the context of kSZ velocity reconstruction, where the B-field is the optical depth, this multiplicative factor is commonly referred as the optical depth bias; see e.g. Refs. [29,43,44,53]. As discussed in more detail in Sec. IV A 2, the bias does not depend on the scale L over the relevant range for reconstruction, leaving a total of N 2 bias parameters to account for. Note that in the absence of off-diagonal terms in the rotation matrix Eq. 64 (or if these terms are very small), there would only be N bias parameters. This is the assumption that has been made in previous literature utilizing the light cone picture to forecast cosmological constraints, e.g. Refs. [31][32][33]. We comment on this assumption and the general problem of mitigating the optical depth bias in Sec. IV A 2.

III. MODELING OF OBSERVABLES
In this paper, we will be interested in statistically anisotropic correlations between various contributions to the observed CMB and a tracer of large scale structure (LSS). Our goal is to use such statistical anisotropies to reconstruct (on large angular scales) a set of modulating fields -here, our focus is on the radial and transverse velocity fields. Our prototype tracer is a photometric galaxy redshift survey, as considered in e.g. Refs. [26,[28][29][30]; other tracers such as spectroscopic surveys [29,30], the Cosmic Infrared Background (CIB) [54], or line-intensity maps [35,55] are other interesting candidates. Throughout the paper, we assume a fiducial cosmological model consistent with Planck 2018 [56]; in particular, we set: {10 9 A s = 2.2, n s = 0.965, Ω m = 0.31, Ω b = 0.049, H 0 = 68 km s −1 Mpc −1 , τ = 0.06}. There is no strong dependence on cosmological parameters for any of our conclusions.
When necessary, we present relations in the Newtonian gauge, where at late-times when we can neglect anisotropic stress; the metric is: Because our Halo Model code calculates perturbations in the synchronous gauge, it is sometimes necessary to relate Newtonian gauge quantities to synchronous gauge ones. For late-times, simple relations can be written for the Newtonian gauge gravitational potential Ψ and the peculiar velocity field v of dark matter in terms of the synchronous gauge dark matter perturbations in Fourier space: where H is the Hubble rate and f is the growth rate, defined as a D dD da , with D(a) the linear theory growth factor of dark matter perturbations. Given that perturbations of the galaxy and electron fields are only needed on small scales for the reconstruction procedure, we approximate δ . Unless stated otherwise, we work in natural units with = c = G N = 1.

A. Constructing observables
All of the observables presented below are constructed from a set of fundamental cosmological fields, which we compute using linear cosmological perturbation theory and the Halo Model for large scale structure. Combining Eqs. 7 and 9, a field on the light cone can be characterized by specifying a window function W (χ), an integral kernel K F (χ, k), and an underlying perturbation field in Fourier spaceŨ F (η(χ), k) : Below, we refer to these functions as the 'building functions' and we specify them for each observable we construct.  B. The CMB The de-beamed CMB temperature measured in a frequency band ν through an instrument with an isotropic beam (B Θ ) ν and noise n ν m has contributions from a variety of sources. As a baseline model, we take: There are frequency-dependent extragalactic contributions (Θ XG m ) ν , whose dominant components for the experimental configurations considered below include the CIB and the thermal Sunayev Zel'dovich effect (tSZ). Finally, there is a frequency-dependent galactic component (Θ G m ) ν . Below we describe in detail the components which have significant cross-correlation with late-time tracers of LSS, since such components must be computed in a self-consistent way. The SW, Doppler, and early ISW contributions Θ pCM B m to the primary CMB do not contribute to the cross correlation with tracers of LSS; we compute their power spectra using CAMB [57]. The reionization kSZ component Θ ReI m is modelled as a Gaussian field with power spectrum 2 C rei /2π = 1µK 2 . If higher redshift tracers of LSS are considered, then the reionization kSZ can be used to reconstruct the radial velocity field as described in Ref. [58]; in this paper, we focus on tracers of LSS that do not have any significant cross-correlation with reionization kSZ. For the fiducial CMB experiments considered below, including reionization kSZ does not affect any of our results, and we therefore neglect this contribution in our analysis. Galactic foregrounds on the small angular scales relevant to velocity reconstruction are generally sub-dominant to extragalactic foregrounds on a line of sight away from the galactic plane (see e.g. Ref. [59]). We assume that regions with significant galactic contamination can be masked. Other than considering the effect of a mask, we therefore neglect galactic foregrounds. We model instrumental noise n ν m as a frequency-dependent constant and the beam (B Θ ) ν as a Gaussian with a frequency-dependent Full Width at Half-Maximum (FWHM) θ ν FWHM . Our fiducial CMB experiment is consistent with the properties of the Simons Observatory Large Area Telescope [3], with: where N red describes the level of 1/f atmospheric noise and N white describes the sensitivity of the frequency band. The frequencies, beam, and noise levels we use in our analysis below are collected in Table I. We assume an observation time of 5 years when computing the level of 1/f noise, and choose the 'baseline' values for noise found in [3]. In Fig. 1, we summarize the blackbody components of our CMB model: the primary CMB, late-time ISW, lensed CMB, kSZ, and ML. Each of these contributions is discussed in more detail in the following sub-sections. At low-, the dominant components are the primary CMB and late-time ISW effects. Crucially, at high-( 4000), kSZ is the dominant blackbody component of the CMB. In the left panel of Fig. 2, we show the frequency-dependent components of our CMB model, including extragalactic foregrounds and instrumental noise. In the right panel of Fig. 2, we compare the effective noise obtained by using multifrequency information with the blackbody component of the CMB and with the noise and foregrounds in the 'cleanest' channel (for velocity reconstruction) of our fiducial experiment, at 145 GHz. Note that the blackbody CMB dominates the noise and foregrounds below 3000. We assume that multifrequency information can be used to clean foregrounds using a standard harmonic space internal linear combination (ILC) procedure, described in more detail below. Such a procedure can reduce the level of noise and foregrounds by roughly a factor of 2 at high-when compared to the 145 GHz channel. Unless otherwise specified, in the analyses to follow we will use the ILC-cleaned CMB generated with the specifications in Table I; we consider a maximum value of max = 6000 which roughly corresponds to maps with a HEALPix 5 resolution NSIDE of 2048. We now describe in more detail how we model the various CMB components listed above.

The kSZ effect
The contribution to the CMB temperature from the late-time kSZ effect is: whereτ (n, χ) is the differential optical depth and is the remote dipole fieldthe locally observed CMB dipole at points on our past light cone, projected along the line of sight. The dominant contribution to the remote dipole comes from the radial peculiar velocity of electrons denoted as v, and small corrections to the observed dipole come from the intrinsic dipole anisotropy of the CMB. In general, the correction from the intrinsic anisotropies can be safely neglected, only becoming significant when inspecting the correlations of the remote dipole field on ultra-large scales. For simplicity, we will only consider the dominant kinetic term in this paper, and use the terminology radial velocity in lieu of remote dipole. Henceforward, we approximate: The differential optical depth isτ where σ T is the Thompson cross section, a(χ) the scale factor,n e (χ) the average electron density, and δ e (n, χ) the electron overdensity field. We will focus on the late-time kSZ effect here, where the limits of integration extend from the origin out to a radial comoving distance χ max after reionization ended. We assume a fiducial value of χ max = 8.1 Gpc, which corresponds to a redshift z max = 5 in our fiducial cosmology. Computing the multipoles of the kSZ temperature anisotropies Eq. 79 in terms of the Haar-binned LC moments, we have: For the radial velocity we use the following building functions: where H(χ) is the Hubble rate and f (χ) is the growth rate, defined as a D dD da , with D(a(χ)) the linear theory growth factor of dark matter perturbations. For the differential optical depth we use the following building functions: The kSZ temperature power spectrum is Keeping the disconnected parts of the four-point function only, the power spectrum is: where (C vv ) ss 1 , (Cττ ) ss 2 , (C vτ ) ss 1 , and(C vτ ) ss 2 are calculated using the building functions and Eq. 16. Focusing on 1, the majority of the power will come from 1 where 2 ∼ . In this regime, we also expect that there is little bin-bin correlation in the differential optical depth, so we can take (Cττ ) ss 2 (Cττ ) ss 2 δ ss . Finally, the first term in parentheses above will dominate the second on small angular scales. In this limit, the kSZ power can be approximated by: Taking s max → ∞, this is equivalent to the expression: which is consistent with previous literature [60]. In Fig. 3 we show the coarse grained kSZ power spectrum for χ max = 8.1 Gpc (corresponding to a redshift z max = 5 in our fiducial cosmology) with s max = 32, 64, 128, 256, and 512 bins (corresponding to ∆χ = 263, 131, 66, 33, 17 Mpc). We compare with the continuum expression Eq. 94. It can be seen from this figure that ∼ 512 bins, corresponding to a coarse graining scale of ∼ 17 Mpc, is sufficient to capture the majority of the kSZ power. Based on this, we take 512 bins to correspond to the continuum limit below.

Late-time ISW (linear)
Gravitational potentials that evolve in time induce a temperature anisotropy known as the integrated Sachs-Wolfe (ISW) effect. The late-time ISW contribution to the CMB is given by where χ max is a fiducial maximum range in comoving distance large enough to capture the majority of the late-time decay of the potential due to the presence of a cosmological constant. The building functions (see Eq. 76) for the linear late-time ISW effect are: where δ (lin) m are the Fourier modes of the linear dark matter perturbations. The power spectrum of the linear late-time ISW is calculated using Eq. 16.

Moving lens effect (non-linear ISW)
In the non-linear regime, the ISW effect can be sourced on small-scales by the long-wavelength peculiar velocities of dark matter halos. Taking the limit k k, the non-linear evolution of the gravitational potential due to the coupling of small-wavelength density fluctuations and long wavelength velocity modes can be approximated as:Ψ which translates in real space to:Ψ The ML effect is sourced by motions transverse to the line of sight: where v ⊥ is the peculiar (comoving) transverse velocity and ∇ ⊥ is the gradient on 2-sphere. In this work, we assume that the large-scale velocity is pure-gradient, and therefore the transverse velocity component can be expressed as v ⊥ = ∇ ⊥ Υ(n, χ). We refer to Υ as the transverse velocity potential. In spherical harmonics, the effect on the CMB temperature takes the form where and we will refer to this quantity as the moving lens potential. We can expand the signal in terms of the Haar-binned LC moments of Υ and ψ: For the transverse velocity potential we use the following building functions: For the moving lens potential we use the following building functions: The angular integral in Eq. 104 is given by: The ML power spectrum can be calculated in terms of the auto-and cross-spectra of the Haar-binned moments of Υ and φ, where (C ΥΥ ) ss , (C ψψ ) ss , (C Υψ ) ss , and (C Υψ ) ss are calculated using the building functions and Eq. 16.
To evaluate Eq. 112, it is necessary to truncate the sum at some s max . In Fig. 4, we plot the moving lens power spectrum for χ max = 8.1 Gpc (corresponding to a redshift z max = 5 in our fiducial cosmology) with s max = 32, 64, 128, 256, and 512 bins (corresponding to ∆χ = 263, 131, 66, 33, 17 Mpc). For 512 bins, the moving lens power spectrum is nearly converged for < 1000, but still missing some power at large-. Unfortunately, 512 bins is already challenging to compute, so we adopt 512 bins as our model for the continuum limit of ML. As we demonstrate below, the missing fine-grained information is not relevant for velocity reconstruction with ML for near-term experiments, so this is not an important restriction. Although we do not use it in the analysis below, we note that it is possible to find an approximate formula for the continuum ML power spectrum. In the squeezed limit where , we can set and neglect cross-correlations between the bins s and s as well as the cross-power between Υ and ψ: Defining θ ⊥ ≡ 1 2 ∇ 2 ⊥ Υ, and taking the continuum limit of the sum, we obtain: This expression may be of interest in future analyses of the ML effect.

Lensing of the primary CMB
We approximate the lensing of the primary CMB by the first order term: where ∇ ⊥ is the angular gradient in the unit 2-sphere and φ(n) is the lensing potential, defined as: In terms of the multipole moments of the primary CMB and the lensing potential, the lensed CMB contribution is written as: where the primary CMB is computed using CAMB and the lensing potential moments are computed using the building functions: The power spectrum calculation is similar to the one for the moving lens effect, and yields:

Extragalactic foregrounds
There are a number of extragalactic foregrounds that contribute to the CMB, whose relative importance depend on the frequency and scale being observed. At low frequencies ( 150 GHz) on arcminute scales, the thermal Sunyaev Zel'dovich (tSZ) effect and radio point sources dominate. At high frequencies ( 150 GHz) on the same scales, the CIB is the dominant extragalactic foreground. Below, we assume that enough radio point sources can be masked to make tSZ the dominant source at low frequencies. With this assumption, we include the tSZ and CIB only in our extragalactic foreground model.
We model the CIB and tSZ using the Halo Model for large scale structure, combining elements of the models described in Refs. [29,54,[61][62][63][64]. Our assumptions are outlined in detail in Appendix D. In Fig. 2, we show angular power spectra of the CIB and tSZ at several frequencies for our fiducial CMB experiment. Since all observables are computed within the same Halo Model, it is possible to capture the correlations between the CIB, tSZ, and galaxy number counts -e.g. the spectra in Fig. 2 include the CIB-tSZ cross-power. We discuss the detailed properties of the galaxy-foreground cross spectra below in Sec. III F 1.

C. Foreground cleaning of the CMB
To access the blackbody components of the CMB necessary for velocity reconstruction, we can estimate how well one can use the multifrequency information in the CMB to clean the extragalactic foregrounds in our model. Here, we use the harmonic Internal Linear Combination (ILC) algorithm [65].
We write the covariance between the de-beamed CMB at different frequencies as a matrix: where C T T contains the blackbody components of the CMB (primary CMB, kSZ, ML, etc.), e = {1, 1, 1, . . .}, C XG contains the CIB, tSZ, and the cross-correlations between these various components at the measured frequencies, and (B −1 N) is the de-beamed instrumental noise covariance (assumed diagonal). Following the ILC method in harmonic space [65], we estimate the blackbody component as: where the weights w that minimize the variance of the resulting multipole momentsΘ m are given by: The ensemble averaged power spectrum of the cleaned map is: To the extent that the second term is small, we have successfully isolated the blackbody component of the CMB in the resulting map. Note that the residuals represented by the second term include both foreground residuals as well as an effective noise for the linear combination of maps. In the right panel of Fig. 2, we show C clean for our fiducial CMB experiment. For the fiducial experimental parameters we choose, from the left panel of Fig. 2, we see that the experimental noise is somewhat larger than the extragalactic foregrounds. Therefore, much of the improvement of the cleaned CMB over the 145 GHz channel comes from a lower effective noise rather than the removal of extragalactic foregrounds. We can also estimate the cleaned galaxy-Temperature cross power: Here, because the CMB noise is uncorrelated with the galaxy field, there is no effective noise term. The ILC algorithm in this case reduces the variance of the cross-power due to the removal of extragalactic foregrounds.

D. Galaxy number counts
We now consider a tracer of the electron overdensity field, which for the purposes of the present paper we take to be the galaxy overdensity field, measured using a photometric redshift survey. Other tracers such as the redshifted 21cm Hydrogen line (or transitions such as CII) measured by line intensity mapping surveys [35,66], the CIB [54], or the dispersion measure of Fast Radio Bursts [44] have been considered as well. Spectroscopic surveys were considered in Refs. [29,30,43], which may be more computationally feasible to analyze in the box picture; we defer a discussion of spectroscopic surveys in the light cone picture to future work.
For the purposes of velocity reconstruction, the three dimensional information in a galaxy redshift survey is used to construct a series of 2-dimensional fields that are later cross-correlated with CMB temperature anisotropies. In harmonic space, these 2-dimensional fields can be expressed as integrals over redshift space: where z o denotes the observed redshift for the galaxies in the survey, g m (z o ) are spherical harmonic coefficients of the measured 3-dimensional galaxy overdensity field, and W α (z o ) is the window function used to construct the average. The equation above is not immediately related to the comoving space integral Eq. 9 of a light cone field introduced in Sec. II. First, the observed redshift z o of a galaxy may be subject to instrumental errors and therefore different from the actual redshift z . Second, due to redshift space distortions (RSD), the redshift z can be different from the background cosmological redshift z of the galaxy (which is simply related to the comoving distance χ). The second issue can be safely ignored for high enough multipoles, where the RSD correction to the power spectrum is unimportant [29,67]. Since only small angular scale galaxy data is necessary for velocity reconstruction, we don't include RSD in our modelling (for the impact of RSD on correlations between velocity reconstruction and number counts see [31]) and will treat the actual redshift z as the cosmological redshift z. The issue of measurement errors is discussed below in the context of a redshift galaxy survey subject to photometric redshift errors. In our analysis below, we consider two prototype galaxy surveys: a LSST-like survey with many photometric redshift bins and a WISE-like survey with a single wide photometric redshift bin. For velocity reconstruction, these two surveys will be used as prototypes for the 'multiple density window' and 'single density window' cases for the quadratic estimators described in Sec. II E.

LSST-like survey
For the LSST-like survey, we consider Gaussian errors on photometric redshifts, with the probability of assigning redshift z o to a galaxy with true redshift z (following Ref. [7]) given by: where σ z = σ 0 (1 + z) with σ 0 parametrizing the size of the photometric errors. We assume a fiducial value of σ 0 = 0.05. With this probability distribution, the galaxy average for the window W α (z o ) can be expressed as an integral over the actual redshifts z: and in terms of the comoving distance where we have defined the effective window function and g m (χ) are the light cone moments of the underlying galaxy overdensity field. The angular power spectrum between two galaxy redshift bins coming from a photometric survey can then be expressed using Eq. 16 plus a shot noise term: where P gg (χ 1 , χ 2 , k) is the galaxy-galaxy power spectrum computed using the Halo Model (consistent with Refs. [29,30]; see Appendix D for a summary), K g (χ, k) = 4πi j (kχ) is the galaxy projection kernel from three dimensional Fourier space onto the sky, andn g α is the number of galaxies per steradian in redshift bin α. We assume shot noise that is uncorrelated between redshift bins, and compute the number density per bin assuming the galaxy number density n(z) per square arcmin is [7]: with z 0 = 0.3 and n g = 40/arcmin 2 . We construct the effective window functions using: with Π α (χ) defined as in Eq. 14. In the limit of σ 0 → 0, where photometric redshift errors can be neglected, these window functions correspond to normalized top-hat windows in comoving space. We show the effects of the photometric errors in the galaxy-galaxy covariance matrix in Fig. 5. Bin-bin correlations are enhanced as expected and the auto-power at a particular bin is reduced due to the contamination from distant bins. The principal components of the galaxy survey can be found using the procedure described in Sec. II E 6 just by appropriately replacing the signal and noise matrices. Fig. 6 compares the effect of different photometric redshift error levels on the total signal to noise Eq. 70 of the galaxy survey as a function of the number of bins N . As expected, we observe that the photometric errors put a limit on how much radial resolution our galaxy survey can have. For our fiducial value of σ 0 = 0.05, the signal to noise is mostly saturated for more than 32 redshift bins.

unWISE-like survey
For the unWISE-like survey considered in this paper, we model the 'blue' sample used in Refs. [68][69][70] from the unWISE catalogue [71], which is based on data from the WISE mission [72]. This sample is characterized by a median redshift ofz ∼ 0.6 and is reasonably uniform over a redshift range of ∆z ∼ 0.3. The number density of the resulting map isn 0.95/arcmin 2 .
Following Ref. [68], we model the unWISE blue sample as a linearly biased tracer of dark matter plus shot noise. In particular, we model the galaxy-galaxy angular power spectrum as where the power spectrum is: Here, P mm (χ 1 , χ 2 , k) is computed in the Halo Model as described in Appendix D. The galaxy window function W ef f is simply the normalized comoving galaxy density where the redshift distribution of galaxies dN/dz is defined to be normalized as 1 = dz dN/dz and is reasonably uniform within a range ∆z ∼ 0.3 of the median redshfitz ∼ 0.6; the redshift distribution is shown in Fig. 15. The total number of galaxies in the survey is ∼ 1.4 × 10 8 , yielding a shot noise of 1/n g = 9.2 × 10 −8 . When performing velocity reconstruction, we must also compute the cross-power with the Π-binned optical depth and potential. In these cases, it is convenient to expand the observed moments of the galaxy overdensity as: and define a set of window functions We then define a set of binned galaxy moments as in Eq. 132 using the window functions Eq. 142. These binned galaxy moments are used to compute the cross-power with other Π-binned LC moments.

E. Galaxy survey systematics
Aside from the photometric redshift errors described above, one must consider a wide variety of systematics associated with a galaxy survey, many of which manifest on large angular scales (see e.g. [73][74][75][76][77]). Systematics that modulate the observed number counts of galaxies are the most problematic for velocity reconstruction, as they lead to a statistically anisotropic cross-power between the galaxy overdensity and CMB temperature that mimics the signal of interest. Additive effects that are uncorrelated with extragalactic sources, e.g. mis-identified stars included in the sample, are less problematic, adding only noise to the estimators but not bias. Starting from the observed number counts (following Refs. [75][76][77]), we model systematics effects as: where N W α g (n) are the number counts of galaxies in a bin (e.g. sample) defined by the window function W α . The modulating field c(n) encodes calibration errors which we might expand as a sum of effects associated with the instrument/observation strategy, extinction due to galactic dust, etc. Defining the underlying galaxy overdensity field g W α (n) by N W α (n) =N W α (1 + g W α (n)), withN W α the mean number of objects on the sky, the moments of the observed galaxy overdensity field are: where g W α m is defined as above in Eq. 132 and is the correction to the mean number counts from each moment c m . Below, we neglect this correction to the mean. To model the form of the large-angular scale systematics, we assume that the modulating field c(n) is a Gaussian random field with power spectrum: where for a LSST-like experiment we set the fiducial value for the amplitude A c such that the variance of c(n) satisfies: which corresponds to a level of calibration error somewhere between the best current data sets and futuristic data sets (see Fig. 3 of [78]). For unWISE, we use a value of 10 −2 for the variance of the calibration error field.
F. CMB temperature-galaxy cross power As discussed above, there are a number of components of the CMB temperature that are correlated with tracers of large scale structure, such as the galaxy surveys considered above. Some of these contributions, such as the late-time ISW and extragalactic foregrounds, have a statistically isotropic cross-power. On the other hand, secondary components of the CMB such as lensing, kSZ, and ML will have a statistically anisotropic cross-power with the galaxy survey. Indeed, this statistical anisotropy is the basis for velocity reconstruction. We now consider these two cases in turn.

Statistically isotropic cross-correlations
The observed CMB anisotropies have contributions that are isotropically correlated with galaxies, including: extragalactic foregrounds (CIB, tSZ) and the late time linear ISW effect. To calculate isotropic cross-correlations we use Eq. 16: and therefore we need to specify the window functions, integral kernels, and underlying power spectra for each of the temperature-galaxy signals. For the galaxies, we use the window functions introduced in Sec. III D and the integral kernel K g (χ, k) = 4πi j (kχ). Extragalactic foregrounds are themselves tracers of large scale structure, and therefore are well-correlated with binned galaxy density. We assume that extragalactic foregrounds can be described by random Gaussian fields. For these signals we use trivial window functions W (χ 1 ) = 1, kernels K extra (χ, k) = 4πi j (kχ), and underlying spectra P ν CIBg (χ, k) and P ν tSZg (χ, k) computed at each frequency ν using the Halo Model. In the left panel of Fig. 7, we show the cross-correlation between the extragalactic foregrounds at different frequencies and a LSST-like galaxy survey in the redshift bin z = (0.20, 0.26). We show as well the crosspower between the ILC cleaned temperature discussed in Sec. III C and the galaxy survey in that same bin. The right panel shows the cross-power between cleaned temperature and galaxies at different redshift bins together with the cross-power between the linear late-time ISW signal and galaxies. The ISW-g correlation is calculated using Eq. 16 with the corresponding building functions.

FIG. 7:
Left panel: Extragalactic foregrounds cross galaxies at redshift bin z = (0.20, 0.26) and ILC cleaned temperature cross galaxies (solid line). Right panel: ILC cleaned temperature cross galaxies for several redshift bins compared to the linear-ISW cross galaxies. At low-, the ISW component is relevant, while at high-it can be safely neglected.

Anisotropic cross-correlations
The main focus of this paper are the statistically anisotropic cross-correlations between the CMB and galaxy surveys, as these are what allow us to perform velocity reconstruction. We work in the basis introduced in Sec. II E, expanding in terms of Π and µ-binned LC moments to define the 'bulk' and 'fine' modes, respectively. For the kSZ-galaxy cross power we have: where the bulk mode couplings f v α W 1 and the fine mode couplings f v k W 1 are given by: and For the Moving Lens-galaxy cross power we have: where the bulk mode couplings f Υ α W 1 and the fine mode couplings f Υ k W 1 are given by: and The kSZ and ML cross-power with galaxies form the basis of the estimators used for velocity reconstruction. However, there are additional sources of statistical anisotropy in the cross-power that potentially introduce biases on the reconstructed velocity fields. Here, we focus on CMB lensing and large angular scale calibration error in the galaxy survey. For the CMB lensing-galaxy cross power we have: where the couplings f ΘW 1 are given by: For the calibration error contribution we have: where the couplings f cW 1 are given by: Other effects leading to a statistically anisotropic cross-power which we anticipate will be less important, and which we do not compute here, include: relativistic aberration of the CMB [79] (similar effect as calibration error, but with a smaller magnitude), SZ effects at higher order in velocity and temperature (see e.g. Refs. [80][81][82][83][84][85]), anisotropic/ill-characterized beam patterns in the CMB experiment (see e.g. Ref. [86] for an assessment of the impact on lensing reconstruction), and perhaps others. In the case of CMB lensing, note that the modulating field is the primary CMB temperature. Although we do not explore it further here, we note that a quadratic estimator for the low-primary CMB can be formulated from the CMB-galaxy cross power using the formalism introduced in Sec. II E. A similar estimator was introduced in Ref. [87] as a means to reconstruct the primary CMB dipole, which is not directly measurable due to the contribution from our local peculiar velocity.

IV. RECONSTRUCTION ANALYSIS
In Sec. II E, we discussed the details involved in constructing quadratic estimators for fields sourcing a statistical anisotropy in the CMB-LSS cross-correlation. We showed that information about these fields can be reconstructed up to a series of noise terms. The purpose of this section is to analyze the relations between signal and noise for the reconstruction of the Π-binned LC moments of the radial velocity v(n, χ) and the transverse velocity potential Υ(n, χ), sources for the kSZ-LSS and ML-LSS statistical anisotropies respectively. We estimate the signal and noise for a reconstruction using the modelling for the CMB, LSS, and their correlation presented in Sec. III.

A. Radial velocity reconstruction for SO x LSST
Applying the formalism of Sec. II E to the reconstruction of v α LM leads to a collection of estimators with the following correlation function: where the various terms are defined by: • R L C vv L (R L ) † : the covariance matrix of Π-binned LC moments of the radial velocity field. The rotation matrix R L defined in Eq 64 encodes the bin-bin mixing of the signal covariance due to the redshift error in the galaxy survey.
• N 0 L : the Gaussian reconstruction noise Eq. 50, with coupling functions defined by Eq. 149. This term comes from the disconnected contractions in Eq. 65 (e.g. ΘΘ δδ and Θδ Θδ ). Note that we do not include the non-Gaussian contributions to the estimator noise in the present analysis (e.g. the N (3/2) and N (1) noise terms, in the terminology of Ref. [43]); see Appendix C for discussion.
• N f ine L : the estimator variance coming from the fine mode bias Eq. 57, with coupling functions for the bulk and fine modes of the radial velocity field defined by Eq. 149 and 150, respectively. The relative importance of this term decreases with an increasing number of bins; we explore this in detail below.
• N Υ L : the estimator variance due to the moving lens effect, defined by Eq. 62 using the coupling function for the transverse velocity Eq. 149.
• N cal L : the estimator variance due to galaxy survey calibration error systematics, defined by Eq. 62 using the coupling function for the calibration error Eq. 156.
• N lens L : the estimator variance due the lensing of the primary CMB, defined by Eq. 62 using the coupling function for the lensing potential Eq. 154.
Note that we refer to the contribution R L C vv L (R L ) † as the 'signal' and all other terms as the 'noise' in the discussion that follows.
In Fig. 8 we show a few diagonal elements (i.e. α = β) of Eq. 157 for a near bin at z ∼ 0.5 and far bin at z ∼ 1.5 for SO x LSST with 32 bins. The dominant source of reconstruction noise is the N 0 L term, followed by the fine mode and calibration error contributions to the variance. The variance arising from the transverse velocity potential and lensing are negligibly small compared to the Gaussian estimator noise; we therefore neglect these terms in our analysis below.
There are significant bin-bin correlations in the estimator variance both due to the signal and the various noise terms. Photometric redshift errors in the galaxy surveys lead to mixing of radial information that contributes to the bin-bin correlation, and this radial mixing is captured by the rotation matrix R L . In Fig. 9 we show, for fixed L, the radial mixing for a set of redshift bins and illustrate how the mixing decreases when the photometric errors are smaller. The rotation matrix is found to be largely independent of the multipole L for L 200.
In Fig. 10 we show, for fixed L, the contributions to the bin-bin covariance from the various noise terms. The Gaussian reconstruction noise is correlated between bins, mainly due to the the correlation between structures in nearby bins induced by the redshift error in the galaxy survey. This is the largest contribution to the bin-bin covariance in nearby bins, independent of L. There is a less significant, but non-negligible, short-range correlation induced by the fine-mode noise which is most important at low-L. We observe that the bias from the calibration error induces long-range bin-bin correlations in the estimator variance, as expected due to our assumption that the calibration error is the same for each bin. Had we assumed a different calibration error in each bin, there would be no such correlation.

Principal components
In light of the significant bin-bin covariance present at all scales L in both the signal and the noise terms in the bin basis, it is instructive to consider the principal component basis where there is no covariance. The transformation to the principal component basis was outlined above in Sec. II E 6 and defined by: Note that we employ the full signal covariance and all noise terms in Sec. 157 to define the principal components. In Fig. 11 we show the j = 1, 2, 3 principal component coefficients c jβ L as a function of bin β for N = 64 bins at L = 1, 2, 5, 10. Note that at each scale L, the weight for the most significant principal components receives support primarily from lowest redshifts. This is where the galaxy density is relatively high (hence the reconstruction noise is minimized) and the amplitude of velocities is relatively large (e.g. due to linear growth). In addition, the number of nodes along the radial direction increases with L and for the lower signal to noise principal components at fixed L.
In the principal component basis, we can define a measure of the total signal to noise per mode LM by We evaluate this quantity in Fig. 12. Each panel of that figure compares, for a reconstruction with N redshift bins, the effect of adding the N f ine L and N cal L compared with the Gaussian reconstruction noise N 0 L . As expected, with an increasing number redshift bins, the fine mode contribution becomes less important and the calibration error becomes the leading correction to the Gaussian reconstruction noise. The reconstruction of the Π-binned LC moments of the radial velocity suffer a considerable loss of signal to noise per mode as we reduce the number of bins even when only including the Gaussian reconstruction noise. For our binning scheme, N = 64 corresponds to redshift bins of equal comoving size of approximately 110 Mpc. The coherence length of the velocity field is around 70 Mpc, and therefore it makes sense that the fine modes become more relevant for N = 32 and smaller, as the size of the bins are considerably larger than 70 Mpc. Even with N = 64 bins, comparing the orange and green curves, we see that calibration error leads to a significant degradation in SN LM of greater than 10%. Efforts to mitigate systematics in galaxy surveys on large angular scales can therefore meaningfully impact the fidelity of the reconstruction. Regardless, we see that velocity reconstruction with SO x LSST will have exceedingly high SNR on large angular scales, with SN LM > 1 for L < 30 with the most significant principal component.

Optical depth bias
As discussed in Sec. II E 7, incorrect modelling of the correlation between the electron and galaxy density leads to a multiplicative bias on the reconstructed radial velocity, commonly referred to as the optical depth bias. We illustrate how this bias shows up in our formalism by considering a one parameter toy model for the electron-galaxy correlation function, based on the Halo Model. If we fix the model parameters determining how galaxies inhabit dark matter halos, the electron-galaxy cross-power is determined by the model for the electron density profile inside dark matter halos (see Appendix D for more details). Due to physical processes such as AGN feedback, baryonic matter does not trace dark matter inside of halos. In Fourier space, this translates into an electron density profile u e (k, M, z) that is different from the dark matter density profile u(k, M, z). To explore a one-parameter family of models, we construct the following toy model for the electron density profile: where A is a continuous parameter that interpolates between the dark matter profile (A = 0) and our fiducial model (A = 1) that incorporates feedback. If we take our fiducial model to be the true model for the electron density, we can explore the optical depth bias when 'incorrect' A = 1 profiles are used for velocity reconstruction. Fig. 13 shows the behaviour of the elements of the bias matrix Eq. 72 as a function of the angular scale (left panel) and redshift (right panel). We find that the bias is practically independent of L for large angular scales L 200 (in accordance to what was found in [43]) and that it is less significant at higher redshifts, where the difference between electron and dark matter perturbations is less pronounced. The off-diagonal elements of the bias matrix are similarly scale-independent and approach 1 at high redshift. The L independence on large angular scales is a feature that we expect to be robust independently of the models under consideration. Note that the bias is always less than 1 for this family of models. This is because feedback in the fiducial model causes the electron halo profiles to be more diffuse than their host dark matter halos, leading to a power suppression at high-k, and therefore on small angular scales where the sums in Eq. 72 receive the most weight. Even in the extreme case where baryons are assumed to trace dark matter, the magnitude of the bias over the entire range of redshifts lies within a reasonably small range 0.6 Γ XY L < 1. To obtain cosmological constraints from velocity reconstruction using future datasets, it will be necessary to incorporate the optical depth bias into the analysis. For example, if we wish to obtain constraints on a set of cosmological parameters m appearing in the radial velocity power spectrum C vv L (m), it is necessary to compare (via e.g. a likelihood function) the measured velocity spectra to the model: where (N L ) αβ includes the most relevant noise terms (e.g. N (0) , fine-mode, calibration) and (ΓR) ij = Γ ij (u e )R ij (σ z ), where we assume the optical depth bias and rotation matrix are independent of L (a good approximation, as shown above) and indicate explicitly the dependence on the electron profile u e and redshift error σ z . To get access to the cosmological information contained in C vv L (m) it is necessary to encapsulate the redshift errors and electron profile into a set of nuisance parameters that can be marginalized over. In the absence of any modelling, there are N 2 bin nuisance parameters. This is the same number of independent entries in C vv L (m) that determine redshift-redshift correlations, implying that the only residual cosmological information is in the shape of the velocity angular power spectrum. This impedes one's ability to learn about e.g. the growth function using the reconstructed velocity field. But this scenario is far too pessimistic, as it does not incorporate information from other sources, or physical constraints present in the modelling.
Optimistically, it may be sufficient to characterize redshift errors and the electron profile by a small number of model parameters. For example, the fiducial model of Gaussian redshift errors considered above contains a single parameter σ 0 . Assuming for the moment that this is an accurate model for LSST redshift errors, there is a single model parameter associated with the rotation matrix. In addition, one can put a prior on the ranges this parameter might take by using other available information: the galaxy-galaxy power spectrum itself, simulations, comparing with a spectroscopic survey, etc. Likewise, if Eq. 160 is a reasonable description of the range of possible electron profiles, then a single model parameter would determine the optical depth bias. Again, one could incorporate additional measurements to provide a prior on A, for example by independently measuring the galaxy-electron cross power using Fast Radio Bursts [44] or by correlating the reconstructed velocity field with the galaxy survey [29,31,88] or the reconstructed transverse velocity field [39]. In reality, there are likely more than two model parameters to consider to fully characterize redshift errors and the electron profiles. But by evaluating a range of physical models and finding complementary observations, one can likely put an informative prior on the N 2 bin degrees of freedom in the product of the rotation matrix and optical depth bias.
In a number of previous analyses, e.g. Refs. [31][32][33][34], it was assumed that the rotation matrix was diagonal, and therefore that the optical depth bias consisted of N bin parameters to be marginalized over. In the presence of photometric redshift errors, we have seen that this assumption does not hold; the off-diagonal nature of the rotation matrix gives rise to greater than N bin parameters. How many additional parameters need to be incorporated depends on how dominant the diagonal terms in (N L ) αβ are compared with the off-diagonal terms, since small off-diagonal terms can be neglected. This depends primarily on the magnitude of redshift errors, so more accurate photometric redshifts, or spectroscopic redshifts can simplify the analysis of the reconstructed velocity field. In future work, cosmological forecasts and analyses should take into account the off-diagonal terms in the optical depth bias, either through a physical model or by marginalizing over a sufficient number of degrees of freedom. Bias tends to 1 for higher redshifts as electrons trace dark matter more closely at earlier times.

B. 'Double' SO and pre-reconstruction vs post-reconstruction cleaning
In this section, we investigate two scenarios related to the effect of foregrounds on the reconstruction. First, we investigate whether or not additional frequency channels can help with mitigating the effect of foregrounds on the reconstruction. To do so, we define a hypothetical experiment we refer to as 'Double' SO, which has a set of channels (in GHz) at: 47,52,63,75,91,109,131,158,190,228,275, 330, 397, 478, 575, 691, 831, 1000. The boundaries and spacing of this selection were chosen to minimize residuals in the cleaned CMB temperature spectrum for our foreground model. Including frequencies below ∼50 GHz and above ∼1000 GHz provides no improvement for removing extragalactic foregrounds. Our choice of 12 frequency channels in the relevant range is somewhat arbitrary, and is simply meant to be representative of a reasonable number of detectors as compared to SO. To define the noise properties of Double SO, we first take the SO LAT TT noise model [3] assumed above and define a linear interpolating function on the three free parameters in the noise model, extrapolating when necessary to higher frequencies that are not in the SO selection. We then analyzed the reconstruction noise for the fiducial N = 64 bin case assumed for SO x LSST above. In Fig. 14 we show the signal to noise for the first two principal components over a range of scales for SO and Double SO x LSST. It can be seen that the Double SO experiment (true to its name) yields a signal to noise that is about twice as good as SO. This is due to a combination of a lower effective noise in the auto-power as well as a reduction of foreground residuals in the cross-power. This result illustrates the great room for progress in velocity reconstruction with future instruments. Above, we considered the scenario where a linear combination of CMB maps was used to remove foregrounds before velocity reconstruction. It is also possible to perform velocity reconstruction on each frequency map, and then find the linear combination of reconstructions that minimizes the variance of the level of the reconstruction. To do so, we describe the power spectrum of the reconstruction as We can then apply the same harmonic space ILC method defined above to find a map that minimizes the variance due to reconstruction noise: where the ILC weights w are defined using the reconstructed spectra: Since we know the signal C vv we can subtract this from the reconstruction in the pre-reconstruction cleaning scenario to arrive at the residual noise, which we compare directly to w † N w from the post-reconstruction cleaning scenario. We find that the residuals for pre-reconstruction cleaning are smaller than the residuals for post-reconstruction cleaning. Therefore, we focus on the scenario where foregrounds are mitigated before reconstruction is performed.

C. Radial velocity reconstruction for SO x unWISE
We now turn to the second scenario we consider, where velocity reconstruction is performed with SO and data currently available from the unWISE blue sample. Here, there is a single galaxy window function, which is plotted in Fig. 15. We consider a reconstruction using 8 bins in the redshift range between 0.2 < z < 1.5 corresponding to a comoving bin width of ∆χ 450 Mpc. To compute the fine mode noise, we use 512 bins in the same redshift range. We increase the calibration error from our LSST framework by a factor of 10 2 to account for the difference in precision of redshift measurement between the two experiments. Because there is a single galaxy window function, the reconstructed velocity field and reconstruction noise will be highly correlated among the 8 bins in which we perform the reconstruction. Therefore, it is crucial in this case to use the principal component basis. Fig. 15 shows the α = 1 principal component coefficients c αβ L both with and without the inclusion of fine-mode noise and calibration errors for L = 1 and L = 5. Note that for L = 1, the first principal component roughly traces out the unWISE window function dN/dz when the fine-mode noise and calibration errors are neglected. However, the first principle component becomes oscillatory once the additional noise is included. This is due to the redshift-redshift correlations of the noise terms obscuring the redshift correlations in the signal. The signal to noise of the first principal component at L = 1 drops from 25 to 16.4 as the additional noise terms are added. At L = 5, the first principal component has an oscillatory structure in redshift both with and without the additional sources of noise. The signal to noise of the first principle component at L = 5 is 1.7 and 1.3 with and without the additional noise terms, respectively. Therefore, most of the signal lies at the lowest L. Analyzing the higher principle components, they make an insignificant contribution to the signal to noise at all scales. We therefore can focus on the first principle component only.
We explore the effect of changing the number of bins used in the analysis by computing the signal to noise SN LM defined in Eq. 69, summing over principal components at fixed L. We find that it is numerically difficult to consider greater than 8 bins. Large bin-bin correlations in the signal covariance and Gaussian reconstruction noise, especially in bins where the redshift distribution is small, lead to poorly conditioned rotation matrices (Eq. 43) that spoil the construction of the principal component basis. Therefore, we consider scenarios with 4 and 8 bins. The result for the signal to noise per mode, summed over principal components, is shown in Fig. 16. Here, the dependence of the signal to noise on the number of bins is less dramatic than for SO x LSST. This is to be expected, since not much information is gained by finer sampling in redshift due to the fact that there is a single wide galaxy window function. In this figure, we also demonstrate the effect of fine mode noise. For 4 bins, there is a significant correction beyond the Gaussian reconstruction noise. However, we see that for 8 bins, we are able to improve on the signal to noise in the presence of fine mode noise.
Finally, in Fig. 14 we compare the SN α LM attainable for SO x unWISE compared with SO x LSST. The signal to noise per mode for the first principle component for SO x unWISE is roughly an order of magnitude lower than for SO x LSST; for the second principle component the difference is three orders of magnitude. Although there is a significant galaxy density in the unWISE sample, yielding a small Gaussian reconstruction noise (at least over some range in L for the first principal component), there is little redshift information. We therefore can only expect to obtain coarse-grained knowledge of the velocity field from such an analysis. Nevertheless, this is in principle important information, and the reconstruction of the first principle component at signal to noise greater than unity can be obtained for L 10. This represents a modest, but non-trivial, number of well measured modes.

D. Transverse velocity potential reconstruction for SO x LSST
In this section, we discuss reconstruction of the transverse velocity potential Υ using the ML effect for the fiducial case of SO x LSST. As for the case of the radial velocity field, we must consider both the signal covariance as well as multiple sources of bias and noise: These various terms are defined as for the radial velocity estimator variance Eq. 157, using the coupling function for the transverse velocity Eq. 149. The term (N v L ) αβ is the bias induced by radial velocity in the kSZ effect, defined by Eq. 62 using the coupling function Eq. 149. In Fig. 17, we plot the diagonal components of the various terms in Eq. 165 for the same two redshift bins as the radial velocity estimator. For SO x LSST with N = 32 bins, the fidelity of the transverse velocity potential reconstruction in each bin is not high, with the signal covariance below the reconstruction noise for all redshifts and scales L. Note that the contribution to the estimator variance from lensing is significant, comprising roughly 10% of the signal in the high redshift bin shown in the right panel of Fig. 17. We also explore the off-diagonal correlations fo of the various noise terms, as shown in Fig. 18. The fine-mode and reconstruction noise lead to small and fairly localized contributions to the off-diagonal estimator variance; calibration error in the case of ML is negligible (in contrast to the case of radial velocity reconstruction, where calibration error was significant and led to long-range correlations). Lensing, which is the most significant contribution after the reconstruction noise and signal covariance, leads to long-range correlations between bins. Given the degree of bin-bin correlation, it is useful to define a principal component basis for the transverse velocity potential estimator. The coefficients for the first few principal components are shown in Fig. 19 over a variety of scales L. The first principal component, which is a weighted average of the transverse velocity potential over a reasonable range in redshift, has a significant signal to noise over a reasonable range of scales L 10. The next principal components have an increasing number of nodes in the radial direction. There is less structure generally in the radial direction than for the radial velocity estimator, owing mainly to the lower signal to noise. In Fig. 20 we show the total signal to noise per mode SN LM at scales L summed over all principal components. As can be seen in this figure, the signal to noise per mode is most significant at the largest scales, falling below unity at L ∼ 15. In contrast to radial velocity reconstruction, the increase in signal to noise per mode does not increase dramatically with an increasing number of bins.

V. VELOCITY RECONSTRUCTION PIPELINE
In this section we assess the performance of velocity reconstruction with future datasets using a suite of simulations and a reconstruction pipeline based on the quadratic estimators described in previous sections. Previous work has demonstrated the effectiveness of the quadratic estimator for reconstruction of the radial velocity field using N-body simulations both on the light cone [45] and in the box geometry [43]. There has been no previous work demonstrating the feasibility of transverse velocity potential reconstruction with the moving lens effect, a gap which we fill with the present work. Here, we focus on simulated data that consists of properly correlated random Gaussian fields including: the velocity field, galaxy number counts with photometric redshift errors, the electron density field, the primary CMB, the kSZ and moving lens contributions to the CMB, and extragalactic foreground contributions to the CMB. We develop a reconstruction pipeline for the radial and transverse velocity fields using fast real-space versions of the quadratic estimators described above. Theoretical modelling is an important component of velocity reconstruction, since it appears in the estimator for the Π-binned moments of the velocity fields and also in the rotation matrices required to de-bias the estimators. This makes a combined pipeline including both the simulation of the maps and the application of the estimators essential.
The benefit of using a Gaussian simulation framework is that the ensemble-average properties of the estimator are well-understood on the full sky using the results of previous sections, which allows us to validate the analysis pipeline. Another benefit is that we can isolate and investigate the effect of mapbased systematics such as masking on the reconstruction to compare with results on the full sky. Since the generation of correlated random Gaussian fields is far less computationally intensive than running a suite of N-body simulations, it is possible to explore ensemble averages, and quantify underlying numerical inaccuracies or biases. A disadvantage of this approach is that we miss important non-linear contributions to the reconstruction. As shown in Ref. [43], for radial velocity reconstruction this includes a contribution to the reconstruction noise analogous to the N (3/2) bias in lensing reconstruction [89]. At the resolutions considered in Ref. [43], this was in fact larger than the Gaussian contributions to the reconstruction noise by a factor of ∼ 2. At the somewhat lower resolution and higher instrumental noise we consider, we expect this contribution to be smaller, and sub-dominant to the Gaussian contributions. Another nonlinear effect included in Ref. [45] is redshift space distortions, which were found to have minimal impact on the reconstruction at the resolutions simulated. All previous work has relied on dark matter-only N-body simulations, making the approximation that baryons follow the N-body particles. This assumption will fail on the small scales relevant for velocity reconstruction. Under the assumption of statistical isotropy made above, this mis-modelling of baryons shows up as a multiplicative bias (see Sec. IV A 2 for a discussion), but at the non-linear level there may be additional effects. Future work with simulations should certainly include baryonic effects to explore the impact on simulations at the non-linear level.

A. Simulations
In this work, we approximate the primary CMB, galaxy number counts, components of the velocity field, electron density, moving lens potential, and extragalactic foreground contributions to the CMB as correlated random Gaussian fields. Using the complete set of spectra and cross-spectra between these fields, we can construct a multivariate Gaussian distribution from which to drawn properly correlated realizations. These realizations can be used to compute the kSZ and moving lens contributions to the CMB. Signals constructed this way will show the expected statistical anisotropy when correlated with the galaxy density. For each set of realizations, the quadratic estimator for the underlying radial velocity or transverse velocity potential can be applied, allowing us to validate the statistics of the estimators by averaging over many realizations.
Here is the list of steps we take to generate a suite of Gaussian simulations for radial velocity and transverse velocity potential reconstruction: 1. Determine fields for simulation: The first step we take is to determine which fields need to be simulated 'simultaneously", that is, from a single multivariate Gaussian distribution capturing all the crucial correlations. Ideally, all of the cosmological fields we consider in this paper should be simulated simultaneously. This can become a difficult computational task if we want to simulate the Π-binned moments of various fields in many redshift bins, which translates into large covariance matrices with non-vanishing off-diagonal terms and many high-resolution maps. The smaller the covariance matrix, the more likely it is that numerical errors can be avoided, so there is good motivation to be as economical as possible. We can ask ourselves, for example, which fields are necessary for a simulation of the kSZ signal. Certainly, joint simulations of the Π-binned moments of the radial velocity v, the differential optical depthτ , and the galaxy fields g are necessary if we want to ensure that the kSZ-g crosscorrelation has the correct statistical anisotropy. Having identified these completely necessary fields for velocity reconstruction, we can ask ourselves if the fields that source other forms of temperaturegalaxy statistical anisotropy should also be considered in the multivariate Gaussian distribution. The analysis of Sec. IV answers this question for us: the bias introduced by the moving lens effect and the CMB lensing are negligible. This means that, for simulating radial velocity reconstruction, the moving lens and the CMB lensing signals can be simply treated as 'effective' sources of CMB anisotropies with no statistical correlation with the galaxy distribution. Finally, we ask ourselves if the fields that are isotropically correlated with the galaxy distribution need to be simulated together with v,τ and g. These are the linear late-time ISW signal and the frequency cleaned extragalactic temperature foregrounds. The isotropic correlation between temperature and galaxies appears in the estimator weights Eq. 35. A quick inspection shows that, for our fiducial experimental noise levels, the relative difference in these weights when the small-angle ( > 200) temperature-galaxy cross-power is ignored is at most 3%. Thus, we consider it to be safe to ignore all isotropic correlations between temperature and galaxies in the reconstruction pipeline. Summarizing, we only need to generate simultaneous simulations of the Π-binned moments of v,τ , and g in order to capture all the important correlations for radial velocity reconstruction. The non-kSZ CMB anisotropies can be simulated separately from a single temperature spectra and later combined with the kSZ map from the joint simulations. The considerations above also apply for the transverse velocity potential reconstruction from moving lens.
2. Simulate the fields: Once we determine which fields have to be simulated from a single multivariate Gaussian distribution, we construct a joint covariance matrix C for 0 < ≤ max including all spectra and cross-spectra. Our fiducial resolution is N bin = 32 and max = 6144 (corresponding to the bandlimited multipole for a HEALPix map of N side = 2048). The choice of number of bins and angular resolution are fixed by computational resources; it would be desirable to include more bins when possible to incorporate the effects of fine-mode noise. At each we find the Cholesky decomposition L L † = C and generate data vectors a m corresponding to the 2 + 1 spherical harmonic coefficients at fixed of for all maps using the relation: where X is a vector of random Gaussian numbers with zero mean and unit variance. In general we find good agreement between the ensemble-average spectra and cross-spectra from simulations and the input spectra. At 30 the built-in routine for generating Gaussian maps in HEALPix (which generates realizations using a Cholesky decomposition as above) performs somewhat better than our algorithm. For large , the HEALPix algorithm performs worse than ours. We therefore employ a hybrid method, generating low-moments using HEALPix and high-moments using our algorithm. In either case, it is necessary to compute spectra and cross-spectra at sufficiently high accuracy to ensure that C is numerically positive definite and therefore the Cholesky decomposition well-defined. In this respect, our code for computing spectra is sufficiently accurate at the resolutions we have explored, but we expect increasingly accurate spectra are necessary for larger numbers of radial bins.
3. Construct the temperature signals: For radial velocity reconstruction, the kSZ signal is constructed from products of the simulated maps as: The part of the CMB temperature that we approximate as uncorrelated with galaxies is simulated as above, using a single temperature spectrum including: primary CMB, lensing contribution, moving lens contribution, and ILC cleaned extragalactic + intrumental noise components. The total temperature map is the sum of the uncorrelated map and the kSZ signal.
For transverse velocity potential reconstruction, the moving lens signal is constructed from products of the simulated maps as: The part of the CMB temperature that we approximate as uncorrelated with galaxies is simulated as above, using a single temperature spectrum including: primary CMB, lensing contribution, kSZ contribution, and ILC cleaned extragalactic + instrumental noise components. The total temperature map is the sum of the uncorrelated map and the moving lens signal.
For both reconstruction scenarios, we apply a mask corresponding to an SO-like experiment, consisting of a cut between a declination of -70 degrees and +20 degrees and a Galactic mask that removes ∼ 30% of the sky. The total sky fraction covered by the joint mask is f sky = 0.45.

4.
Run the estimator pipeline: The galaxy maps and temperature maps generated using steps 1-3 above are processed using the real space estimators described below and compared with the expected results based on the input fields.

B. Real-space estimators
The harmonic-space quadratic estimators for the radial velocity and transverse velocity potential cannot be implemented efficiently at the resolutions we wish to explore. We therefore derive mathematically equivalent real-space estimators that take advantage of fast forward-and inverse-spherical harmonic transforms, which can be efficiently implemented. A version of the real-space quadratic estimator for radial velocity reconstruction in the absence of foregrounds appeared in Ref. [45]. Here, we derive the equivalent estimator including foregrounds, and present a real-space version of the quadratic estimator for the transverse velocity potential.

Radial velocity estimator
To derive an efficient real-space estimator for the radial velocity, we start from the harmonic space estimator defined in Eq. 63. First, re-write G α L as: (169) Next, we use the relation and the definitions: Substituting into Eq. 63, the real-space estimator is given by: For the simulations presented below, where we effectively set C g W α Θ = 0 by not including the statistically isotropic correlations between the galaxy and temperature fields (as argued above, these contributions are insignificant for our fiducial CMB experiment), we can work at the n = 0 level. For different experimental configurations, it may become necessary to consider higher order terms if ( Note that the fields ξ α n (n), ζ α n (n),ξ α n (n) andζ α n (n) are convolutions of an azimuthally symmetric function and the moments of the CMB and galaxy density maps. In pixel-space, we can therefore write: where the 'beam' B ξ α n (|n −n |) for the field ξ α n (n), and the beams for the other filtered fields, are given by: B ζ α n (θ) = 2 + 1 4π and Bξ α n (θ) = 2 + 1 4π Bζ α n (θ) = 2 + 1 4π Some insight into the map-based properties of the estimator can be gained by examining the shape of these functions, which we plot in Fig. 21. The beams receive support only over a scale of ∼ 4 arcmin for the experimental parameters considered here, corresponding to ∼ 3 pixels at HEALPix resolution N side = 2048. This implies that the quadratic estimator is highly local, and that systematic errors due mixing information from masked or contaminated regions of the sky will be minimal. Unlike the case of CMB lensing (see e.g. [90]), we therefore expect that there is only a very small bias from the mask on the reconstructed velocity field.

Transverse velocity potential estimator
Following the derivation of the real space radial velocity estimator, we expand G α L , use the coupling function f Υ α W α L defined in Eq. 151, and use the definition Eq. 170 to obtain: + [ξ α 1;n (n)ζ α 2;n (n) −ξ α 2;n (n)ζ α 1;n (n)]−[ξ α 2;n (n)ζ α 1;n (n) −ξ α 1;n (n)ζ α 2;n (n)] , where the auxiliary functions are given by where we have defined the factor for convenience. As described in the previous subsection for the radial velocity estimator, these auxiliary functions can be viewed as a convolution in map space (Eq. 174) with a set of azimuthally symmetric 'beams' defined by B ξ α 1;n = 2 + 1 4π Bξ α 1;n = 2 + 1 4π Bξ α 2;n = ( + 1) 2 + 1 4π Note the terms that appear with an additional factor of ( + 1) in the beams defined above, which cause the transverse velocity potential estimator to be more localized than the radial velocity estimator. This is demonstrated in Fig. 21, where B ξ α 2;n=0 (θ) is seen to have support on smaller angular scales than the corresponding auxiliary field B ξ α n=0 (θ) for the radial velocity. As we describe further in Sec. V D, this property makes the transverse velocity potential estimator more susceptible to numerical errors. The ingredients of the real space quadratic estimator are formed by convolving these beams with the temperature or galaxy maps. For comparison, the size of a HEALPix pixel at N side = 2048, the default map resolution employed in later sections, is ∼ 1.7 armin. The beams are quite local, implying that the estimators will be rather insensitive to masking or local contaminants. This same property can lead to significant numerical errors, which we discuss further below.

C. Reconstruction on simulated maps: Radial velocity
We now present the results obtained by applying the real-space estimators derived above to the simulated maps for the fiducial data combination of SO x LSST employed in previous sections. We generated a set of 30 realizations, each with 32 bins in the redshift range 0.2 ≤ z ≤ 5 and output resolutions of NSIDE 2048. The resolution and number of simulations were dictated by available computational resources. Note that our reconstructions do not include fine-mode noise, since the simulations are constructed using a limited number of bins. We show examples of the reconstruction in Fig. 22, where we compare the rotated true velocities R L · v LM to the output of the estimatorv LM at two representative redshift bins located at z ∼ 0.5 and z ∼ 1.5. These maps have been filtered to show only the largest angular scales, where the reconstruction is signal-dominated. A visual inspection of these maps indicates that a successful reconstruction of the radial velocity has been achieved. A comparison of the angular power spectra for the reconstruction and the masked actual velocity field indicate good quantitative agreement. Before undertaking a full quantitative analysis of the full set of realizations, and comparing to theoretical expectations, we take a brief digression to discuss the effects of the mask on the reconstruction.

FIG. 22:
Top panels: Low-pass filtered maps of the true rotated velocities for an example realization in a low-redshift and high-redshift bin. Middle panels: Low-pass filtered reconstructed maps for the same example realization. Bottom panels: power spectra comparison between true and reconstructed maps in the two redshift bins. At the level of the maps and the power spectra, we see that large scales can be reconstructed with SO x LSST.
As we discussed in Sec. V B 1, we expect contamination from masked regions to extend only a few pixels from the mask boundary due to the local nature of the radial velocity estimator. We corroborate this by fixing the realization and subtracting the full-sky reconstruction noise (defined as the reconstruction minus the actual radial velocity) from the masked-sky reconstruction noise and studying the residuals, which we refer to as the mask bias. We can see from Fig. 23 that the dominant effect of mask is concentrated on the edge of the unmasked region as expected, and that this contamination can be removed post-reconstruction by extending the mask by a few pixels. Comparing the maps in the top and middle panels of Fig. 23, extending the mask by a single pixel removes the most contaminated regions of the map. In the bottom panel of Fig. 23, we see that the mask bias is always below the reconstruction noise, and that extending the mask by one pixel decreases the mask bias at the level of the power spectrum by orders of magnitude. We conclude that one need not worry about mask bias for reconstruction of the radial velocity. Note that the story presented here will become more complicated for apodized maps, since apodization will introduce a statistical anisotropy that may be picked up by the estimator and which must be accounted for in the reconstruction. In addition, from the perspective of the reconstruction, a smaller bias will be incurred by fitting and subtracting point sources rather than masking them. Returning to our ensemble of simulations, we now confirm that the statistics of the ensemble are as expected from the analytic estimates presented in Sec. IV A. On the full sky this can be thought of as a validation exercise for our simulations and reconstruction pipeline, since in the absence of numerical errors, the agreement should be perfect. On the masked sky, we determine what the effect of the mask is on the reconstructed power spectrum. To mitigate the mask bias, we extend the mask post-reconstruction by one pixel in the results presented below.
The top panel of Fig. 24 compares the ensemble average reconstruction signal and noise to the theoretical expectations on the full sky. Comparing the theory signal (blue) to the simulated signal (green dashed), there is excellent agreement in both redshift bins. Comparing the power spectrum of the reconstruction (red dot-dashed) to the theory signal and the theory noise (orange), we see excellent agreement in both the signaldominated and noise-dominated regimes. To obtain a reconstruction noise, we compute the power spectrum of the reconstruction minus the actual signal (purple dots) and the result of applying the estimator to a temperature map whose kSZ component is uncorrelated with the galaxy density (brown squares). In both cases, the agreement with the theory reconstruction noise is excellent, aside from some excess at low-L, at the level of about one percent of the signal. We attribute this residual to numerical error in the reconstruction. This result is a powerful validation of our simulation and reconstruction pipeline.
The bottom panel of Fig. 24 shows the comparison between the theory and ensemble-averaged reconstruction on the masked sky. We make no attempt here to de-project the mask from the power spectra (e.g. using the methods of Ref. [91]), and simply find the power spectra of the masked maps. The theory curves (signal and noise) have been multiplied by f sky = 0.44 to account for the loss of variance from masking.
Comparing the theory signal multiplied by f sky (blue) to the simulated masked signal (green dashed) the scale-dependent effect of the mode-coupling with the mask is evident, especially at low redshift. Nevertheless the factor of f sky gives a reasonable estimate of the power spectrum of the velocity field on the masked sky. Comparing the reconstruction on the masked sky to the masked actual velocity and the theory noise reduced by f sky , we see good agreement at both high-and low-L. Checking this in more detail by finding the difference between the reconstruction and the actual masked signal (purple dots), we again find good agreement with the expected reconstruction noise aside from a few percent excess at low-L and low-redshift. The collection of reconstructed maps are not statistically independent of each other due to velocity correlations along the light cone. On the full-sky, harmonic moments with different multipoles L, M are independent from each other and only correlated in the radial direction. Using the principal component decomposition described in Sec. II E 6, we can construct N bin linear combinations of Π-binned harmonic multipoles a j LM = α c jβ L a β LM such that a j LM and a i LM are uncorrelated for all i = j. Note that the transformation coefficients depend only on the multipole L. The maps constructed using these rotated harmonic moments at each L constitute the principal components of the radial velocity on the light cone. The power spectrum of the principal components at each multipole L is diagonal and is obtained by rotating the bin basis power spectrum c L C L (c L ) † , where c L is the matrix defined by the coefficients c jβ L . Fig. 25 shows the true maps, reconstructed maps and spectra for the 2 highest signal to noise principal components from a single realization. The 'unit' of the principle component power spectra and maps is signal to noise, since the noise has been normalized to unity. Clearly, the fidelity of the principle component reconstruction is far higher than for the single bins presented in Fig. 22 (although the information in the set of principle component maps is equivalent to the information in the set of bin maps), making the principle component basis desirable for a visual representation of the results. In the top panel of Fig. 26, we show the ensemble-averaged power spectra of the first two principle components without masking. Performing the comparisons described above in the bin basis between theoretical expectations and data from the reconstructions, we again find excellent agreement, aside from sub-percent level effects at the lowest L. The principal component analysis discussed here is subject to some complications when a mask is introduced. Statistical isotropy is broken for the masked map, which introduces L = L correlations within and between bins. Using the c jβ L coefficients defined for the full-sky scenario would not lead to statistically independent maps. A more rigorous procedure to find the uncorrelated combinations of the data could be done in pixel space rather than harmonic space. Such a procedure would involve the construction and diagonalization of a matrix containing the covariance between every pair of pixels at every pair of redshift bins, a task that is computationally demanding. Here, we consider a 'pseudo' principal component transformation at the level of the power spectra of the reconstructions with a mask. As we discussed in the previous subsection, the power spectrum of reconstructed maps with a mask traces the underlying full-sky spectra up to a factor of f sky and some scale-dependent corrections due to convolution with the mask. For multipoles where the scale dependent correction is small at all redshifts, the masked C L is approximately proportional to the unmasked one and therefore can be diagonalized using the full-sky transformation matrix c L . In the bottom panel of Fig. 26 we compare the theory signal and noise (reduced by a factor of f sky ) to the reconstructed velocity for the first two principal components. Despite the complications from mode coupling with the mask, there is reasonable agreement with the theory curves. Finally, we can explicitly check that the rotation associated with the principal components c L take the reconstructed spectra v LMv † LM to a nearly diagonal form.
This is shown in Fig. 27.

D. Reconstruction on simulated maps: Transverse velocity
Next, we focus on applying the real-space estimator for the transverse velocity potential defined in Eq. 179 to our simulations. The story is a bit more complicated than the radial velocity estimator due to numerical errors. We generate a set of 30 realizations of moving lens temperature maps and LSST-like galaxy density, using 32 bins in the redshift range 0.2 ≤ z ≤ 5 (the same binning used for radial velocity reconstruction). Maps are output at HEALPix NSIDE of 2048. As described in more detail below, we perform a reconstruction of the transverse velocity potential with and without the primary CMB in order to characterize numerical errors and confirm the estimator in some limited regimes. In all cases, we apply the mask described above with f sky = 0.45.   We begin with a reconstruction of the transverse velocity potential from temperature maps containing only the moving lens effect. The result for one realization in a redshift bin centred on z ∼ 1 is shown in Fig. 28.
The result obtained when using max = 6144 (= 3×NSIDE) in the estimator is shown in the upper-left map in this figure. Comparing with the actual rotated transverse velocity potential (e.g. R L · Υ LM ) in the lower-right map, there is no visible agreement. In the bottom left panel of this figure, we compute the power spectra of the reconstruction (purple dotted line) and actual rotated transverse velocity potential (blue solid line) averaged over all 30 realizations. It can be seen that the reconstruction (which is mostly noise) has roughly an order of magnitude more power over all scales. Computing the ensemble-averaged correlation coefficient in the bottom right panel, we see that there are some traces of the true map in the reconstruction, but nothing close to what is to be expected in this essentially noise-free example. To compute the real-space estimator Eq. 179 it is necessary to go from map to harmonic space and back again to construct the auxiliary fields, and then one must go back to harmonic space to obtain the estimated transverse velocity potential. We attribute the observed catastrophic numerical error to information-loss incurred when performing these spherical harmonic transforms between map and harmonic space, as harmonic transforms in HEALPix are not information-preserving 6 . We conjecture that this error is more severe here than for radial velocity reconstruction due to the fact that transverse velocity reconstruction relies on information from smaller angular scales (as illustrated in Fig. 21), where we expect the spherical harmonic transform to be less accurate.
To mitigate the numerical errors described above, we low-pass filter the maps before applying the real space estimator by cutting out all multipoles greater than some max . The result is shown in Fig. 28 for two choices max = 5744, 4944. At the level of the single realization maps and the ensemble averaged power spectrum and correlation coefficient, it can be seen that as max is decreased, the numerical error decreases. For the temperature maps containing only moving lens analyzed here, it is sufficient to take max ∼ 5000 to obtain a high-fidelity reconstruction. Even in this case, we still observe a residual error at the lowest multipoles. We saw something similar for the radial velocity reconstruction e.g. in Fig. 24, which we believe can be attributed to same form of numerical error.
The low-pass filter introduced to mitigate the numerical errors described above is not problematic when the temperature map contains only moving lens, but of course, the observed CMB contains more than just moving lens. Including the primary CMB, noise, and foreground residuals in the temperature maps is more problematic. Removing small-scale information in this case degrades the signal to noise of the reconstruction. For the case of SO x LSST, we demonstrated in Sec. IV D that the total signal to noise for transverse velocity reconstruction was only of order ∼ 50, with the reconstruction in each bin being noise dominated. Losing information on the transverse velocity potential from small scales is a big deal in this case. In Fig. 29, we show the results of performing transverse velocity potential reconstruction including all contributions to the fiducial SO x LSST case. To reduce numerical noise, severe cuts of order max ∼ 3000 are necessary. Unfortunately, there is very little signal to noise in the reconstruction after so much small-scale information is removed, as demonstrated by the low correlation coefficients. In particular, comparing the expected reconstruction noise (orange dashed lines) to the power spectrum of the reconstruction, we see that there is a large amount of residual power on large angular scales. Performing the reconstruction on the full sky does not provide any significant improvement, as shown in the bottom panels of Fig 29. We conclude that it will be necessary to identify alternative methods to mitigate numerical noise associated with spherical harmonic transforms before an analysis of future datasets can be undertaken. We leave this task to future work.

VI. CONCLUSIONS
This paper has outlined the formalism for velocity reconstruction in the light cone picture using CMB experiments and galaxy surveys. One of the main goals of developing this formalism has been to explore some of the challenges posed by systematics and foregrounds for velocity reconstruction. The range of effects we have explored include: properly correlated extragalactic foregrounds, large-angular scale systematics in the galaxy survey, photometric redshift errors, masking of regions contaminated by galactic emission, modelling errors in the galaxy-electron correlation function (the optical depth degeneracy), biases introduced due to additional physical effects that lead to a statistically anisotropic CMB-galaxy correlation (e.g. lensing of  the primary CMB), biases introduced by coarse graining on the light cone (e.g. the 'fine mode' noise), CMB instrumental noise and beam, choice of frequency channels for cleaning extragalactic foregrounds in the CMB, and the effect of performing foreground cleaning on reconstructed maps. We have developed a numerical pipeline to compute the properly correlated auto-and cross-spectra necessary to assess this range of effects. We have also developed a real-space reconstruction pipeline that we have validated using Gaussian simulations. This pipeline was used to assess the impact of systematics in real-space such as masking. The good news is that none of the systematic effects we have explored seriously degrade the fidelity of the reconstruction, indicating the promise of velocity reconstruction for extracting new cosmological information from future datasets. Our fiducial datasets were a LSST-like galaxy survey and an SO-like CMB experiment. We also considered the data combination of the existing unWISE galaxy catalogue with SO. These choices determine factors such as: redshift error, depth of the survey, galaxy shot noise, the level of large-angular scale systematics, frequency channels assumed for the CMB experiment, the associated level of instrumental noise and resolution, and sky coverage. For these datasets, some of the take-away points of our analysis include: • The total information available in the reconstructed velocity fields (quantified by the total signal to noise) is limited mainly by the redshift error, sky coverage, and factors contributing to the Gaussian reconstruction noise (CMB instrumental noise and beam, foreground residuals, and the level of galaxy shot noise).
• It is essential to incorporate the 'fine mode' noise associated with coarse-graining fields on the light cone into the estimator formalism for velocity reconstruction. This source of bias can be mitigated by ensuring that velocity reconstruction is performed in a sufficient number of bins along the radial direction. For radial velocity reconstruction in the fiducial SO x LSST scenario, it is necessary to use around 64 bins to mitigate this bias and include most of the signal to noise in the reconstruction; for transverse velocity reconstruction 32 bins is sufficient.
• For radial velocity reconstruction, large-scale systematics in the galaxy survey have a significant (∼ 10%-level) effect on the total signal to noise due to the additional bin-bin correlations it introduces. There is negligible impact of this systematic on transverse velocity reconstruction.
• The biases induced by CMB lensing and moving lens on radial velocity reconstruction are negligible; the bias introduced by CMB lensing on transverse velocity reconstruction is significant at the ∼ 10% level and should be taken into account, while the bias from kSZ is negligible.
• From the principle components of the radial and transverse velocity reconstructions (see Figs. 15,19), most of the signal to noise in the reconstruction comes from large angular scales L 30 and redshifts z 1.
• For SO x unWISE, it is possible to reconstruct a single principal component on the very largest angular scales. This demonstrates that even for a single broad photometric redshift bin it is possible to reconstruct the large-scale radial velocity field. For the SO x unWISE data combination, it will not be possible to reconstruct the transverse velocity field.
• The real space estimators for velocity reconstruction are highly local, and contamination from masking is restricted to the region in close proximity to the mask. It is therefore possible to remove the mask bias by extending the mask post-reconstruction.
• We have validated a pipeline for velocity reconstruction using Gaussian simulations. While the numerical errors for radial velocity reconstruction were small, they are significant for transverse velocity reconstruction. Resolving the observed problems with numerical errors will require improved techniques for accurate harmonic transforms. This pipeline can serve as a prototype for analysis of future datasets, which we will pursue in future work.
• On the full sky, we demonstrated the utility of the principal components of the reconstruction. The principal component maps are signal dominated over a wide range of angular scales with a redshift distribution that is easy to visualize. To produce similar maps on the masked sky, it will be necessary to develop methods in map-space as opposed to the harmonic-space methods introduced here, which we will pursue in future work.
• The code used to produce the results presented in this paper, ReCCO, can be found at: https:// github.com/jcayuso/ReCCO.
A fundamental assumption of the quadratic estimator formalism explored in this paper is that the underlying fields are Gaussian. On the small-scales that contribute the most to the quadratic estimators, this assumption is far from accurate. One consequence is the presence of the 'N (3/2) bias', explored by Ref. [43] in the box picture. We have not performed an analysis of this contribution in the light cone picture, although a similar computation based on the Halo Model could in principle be performed (albeit with complex projection integrals to contend with). This may be a necessary ingredient for using velocity reconstruction in the light cone picture to measure and constrain cosmological parameters, although Ref. [43] showed that it can be neglected for constraints on primordial non-Gaussianity. A more comprehensive assessment of the effect on a variety of cosmological parameters should be performed. Aside from this bias, both Refs. [43] and [45] demonstrated that velocity reconstruction essentially works as advertised even for non-linear N-body simulations. An important future analysis will be to directly compare the importance of the various systematic effects discussed in this paper between Gaussian and non-linear N-body simulated datasets.
In the future, our framework can be extended to assess velocity reconstruction using different tracers such as the CIB [54] or intensity maps [35]. Other extensions include velocity reconstruction using reionization kSZ and 21cm maps [58] or reconstruction of the remote quadrupole field [28,92]. Having a unifying framework, or at least a unifying basis, allows one to combine the cosmological information from these various probes. Examples where this may be important include constraints on modified gravity [32] and various early-Universe scenarios [33].
We foresee a cosmological paradigm shift, in which reconstruction of the lensing potential, velocity fields and the remote quadrupole field will provide the most precise tests of fundamental physics. This use of CMB secondaries to extract cosmological information through various cross-correlations is in some sense 'free information', since these analyses rely on data from already planned CMB experiments and galaxy surveys. However, as these techniques mature, they may motivate an even stronger push towards the lownoise, high-resolution frontier with future CMB experiments. We continue to explore the possibilities in future work.
We review the 'Beyond Limber approximation' method from [52], which we use to evaluate angular power spectra that take the form Eq. 16: The method aims to separate the integral above into a piece suitable for the Limber approximation, and a piece that can be expressed as a simple Hankel transform. The separation occurs at the level of the power spectrum P F G (χ 1 , χ 2 , k) by defining a 'non-linear' power spectrum where P F G (χ 1 , χ 2 , k) is the full power spectrum we calculate using the Halo Model described below and P (lin) F G (χ 1 , χ 2 , k) is the linear theory power spectrum. The non-linear power spectrum defined this way is negligible on large scales and starts becoming important for scales and redshifts at which non-linearity kicks in. It is argued in [52] that the Limber approximation of the nonlinear correction term (P F G − P (lin) F G ) is sufficiently accurate in realistic cases and therefore the angular power spectrum integral can be rearranged as: The linear power spectrum can be related to its value at redshift zero using a growth factor. Ignoring any scale dependent growth for the moment, the linear power spectrum can be expressed as: which allows us to separate the χ 1 and χ 2 dependence in the second term of Eq. A2: For kernels K (χ, k) of the form f 1 (χ)f 2 (k)f 3 ( )j (kχ), where f 1 , f 2 , f 3 are arbitrary functions and j (kχ) is a spherical Bessel function 7 , the χ space integrals between brackets can be expressed in terms of Hankel transforms, which can be calculated much faster and with more accuracy than brute force integrations of spherical Bessel functions. If the growth factors g F , g G are scale dependent, then the terms in brackets in the second line of Eq. A4 cannot be expressed as Hankel transforms. The authors of [52] work around this problem by splitting the χ space integrations into narrow enough bins such that the evolution of the scale dependence inside each bin can be ignored. Inside each bin, the growth factor can be approximated as : whereχ is the mean χ in the bin and the approximation comes from ignoring the evolution of the kdependence. With this, we can approximate the linear power spectrum for χ 1 and χ 2 inside bins with mean χ 1 andχ 2 as: and this allows us to restore the separability necessary to construct the Hankel transforms: where the sums are over the auxiliary bins constructed to do the approximation. Since there is no limitation on how small these auxiliary bins can be, this approximation can be made as accurate as necessary.

Our implementation
In our implementation of the Beyond Limber method, we define the non-linear piece of the power spectrum in the following ways depending on the particular observables involved: • For power spectra involving only dark matter, electrons or galaxies, we define the non-linear spectrum as: where P 1h+2h F G (χ 1 , χ 2 , k) is the full power spectrum computed using the Halo Model containing the 1halo term and 2-halo term (see Appendix D), P lin mm (χ 1 , χ 2 , k) is the linear dark matter power spectrum from CAMB, and b X (χ, k) is the large scale linear bias function computed with the Halo Model, only different from 1 for galaxies (we assume electrons trace dark matter for linear modes).
• For power spectra involving at least one power of the CIB or tSZ,we define the non-linear spectrum as: where P 2h F G (χ 1 , χ 2 , k) is the 2-halo term computed using the Halo Model. Effectively, we are treating the 1-halo term as the non-linear piece and the 2-halo term as the linear piece. This is not entirely correct, because the 2-halo term does account for part of the non-linearities on small scales and it is not strictly separable as in Eq. A5. However, a detailed inspection reveals that the 2-halo term is separable on the scales for which the second term of Eq. A4 finds most of its support, and that the 1-halo term dominates the regime for which the Limber approximation is adequate.
where the form of f M α W 1 depends on the observable. The quadratic estimator is of the form: Our goal is to find the appropriate weights G M α W L such that we minimize M α LMM α LM subject to the constraint M α LM = M α LM . First, we find the mean of the estimator: We now use: as well as the following properties of 3j symbols: Substituting these relations into the estimator, we obtain: Aside from the monopole, we can make the estimator unbiased so long as: We can now fix G M α W L by minimizing the variance of the estimator. We compute: The four-point function can be decomposed into a connected and disconnected piece: Here, we minimize the variance considering the disconnected contribution only: Contributions to the connected piece are discussed in Appendix C. Plugging this expression into the variance: We now perform the sums over m. Using Eq. B7, the second term in parentheses contributes only to the monopole. We neglect this term in the following. To evaluate the third term, we use Changing dummy indices, the variance is Using Eq. B8, we can perform the sums over m 1 , m 2 to obtain: To minimize the variance, we can use the Lagrange Multiplier method. First, let's define The variance can therefore be written as: We want to minimize the variance subject to the constraint that the estimator is unbiased, which is enforced by Eq. B11. This condition translates to: We therefore want to evaluate: where λ is the Lagrange multiplier. Evaluating the derivative yields times Eq. B32 with permuted indices 1 ↔ 2 we obtain: For now, let's define a new function: so that: Multiplying by f M α W 1 L 2 and using the no bias condition Eq. B29, we can solve for the Lagrange multiplier: Substituting this into Eq. B35, we have and so we can identify G M α W 1 2L = h 1 2L as the choice that minimizes the variance:

Estimator mean
The mean of the estimators M α LM depend on the two-point function Θ m δ W m . Quadratic estimators for the radial and transverse velocity fields are based on non-Gaussian contributions to the correlation functions Θ kSZ m δ W m and Θ M L m δ W m . Because the kSZ and moving lens temperature anisotropies depend on the product of density (contrast or gradient) and velocity, these are in fact three-point functions. Above, we considered the squeezed limit of these correlators, where the velocity mode is of much larger wavelength than the density modes. Additionally, we treated the velocity and density fields as Gaussian. When the velocity mode is of comparable wavelength to the density modes, there will be a contribution to the three-point function due to gravitational collapse. We expect this to be important at high L, beyond the regime of interest for velocity reconstruction. Likewise, the contribution from lensing Θ L m δ W m will receive contributions from non-linearities on small scales, but since the leading order bias from lensing is small, we expect these additional contributions to be completely negligible on scales of interest. In Sec. III E we also considered large angular scale systematics that modulate the observed density field. Similar systematics in a CMB experiment will lead to similar effects although we expect their magnitude to be far smaller.
Another contribution to the mean of the estimators, which was not considered above, arises from nonlinear terms in Θ XG m δ W m . On scales , 1 where the estimator receives most of its weight, we must include non-linear contributions to the galaxy density field as well as the extragalactic foregrounds (here, the CIB and tSZ). At second order in perturbation theory, schematically we must consider correlators of the form tδδ where t is the large-scale tidal field (see e.g. Ref. [94,95]). It is difficult to imagine this term being larger than the bias induced by calibration error, which takes a similar form, and which is likely far larger in magnitude than the large-scale tidal field. Related to the tidal field, systematics associated with the intrinsic aligment of galaxies lead to a large-scale statistical anisotropy in the galaxy number counts [96]; again, it is difficult to imagine that the amplitude of this effect is large enough to cause a significant bias. We defer a detailed estimate of these and other effects to future work.

Estimator variance
Above, we considered only the disconnected contributions to the estimator variance M α LMM β LM . There are a number of additional contributions to the variance, arising from the non-Gaussian nature of the kSZ effect as well as other non-Gaussian contributions to the CMB temperature and galaxy survey. Concentrating on non-kSZ, non-Gaussian contributions to the estimator variance Eq. B12, we conjecture that the most important terms arise from extragalactic foregrounds and CMB lensing: Note that the relevant shape of the four point function for the estimator variance is the 'collapsed' configuration where 1 ∼ 2 and 1 ∼ 2 , since the relevant scales are L 1 , 2 , 1 , 2 . The terms in Eq. C1 should be calculable analytically within the Halo Model since the collapsed four-point function typically has a simple form [97]. For example, similar computations have been performed in the context of the CIB have been performed [98]. Roughly speaking, we expect the disconnected four-point function to dominate the connected four-point function by a power of the matter power spectrum. Therefore, including the connected four-point function will most likely not make a large contribution to the estimator variance. We leave a detailed computation to future work.
Because the kSZ temperature anisotropies arise due to the product of the optical depth and radial velocity, evaluating kSZ contributions to the estimator variance involves computing a six-point function: To compute the six point function we must consider both connected and disconnected components. There are a total of 15 terms in the disconnected six point function. We can use the fact that the four-point function Eq. C2 takes the collapsed configuration, together with the property that the velocity power spectrum falls rapidly with to argue that the relevant scales are¯ 1 1 ,˜ 1 1 , 1 ∼ 2 , and 1 ∼ 2 . From the 3j symbols in the coupling functions R 1 2 m1m2−m , this in turn implies that¯ 2 ∼ 2 and˜ 1 ∼ 2 . Therefore, correlators involving the velocity (which is relevant at low-) and eitherτ or δ W (which are relevant at high-) will not make a significant contribution to the disconnected six-point function. We can therefore make the approximation: As we now show, the first term gives rise to the signal covariance, the second term reproduces the Gaussian estimator variance, and the third term yields the 'N (1) bias' from Ref. [43]. Substituting the first term into the estimator variance Eq. B12, we obtain: (2 1 + 1)(2 2 + 1)(2L + 1) 4π This is the signal covariance rotated into the basis defined by the estimators.
Moving to the second term: This term combines with the non-kSZ disconnected components of the temperature galaxy four-point function to yield the estimator noise. The third term is somewhat more complicated, Cτ s δ W 2 2¯ 1 + 1 4π (2 1 + 1)(2 2 + 1)(2 1 + 1)(2 2 + 1) ¯ 1 2 1 0 0 0 ¯ 1 2 1 0 0 0 (−1) L+¯ 1 2L + 1 When L is much smaller than the other factors in the 6j symbol, we can simplify using: (2 2 + 1)(2 1 + 1) ¯ 1 1 2 0 0 0 2 (C13) This is the N (1) bias first computed in [43]. Evaluating it, we find, in agreement with [43], that this term is negligible compared to the Gaussian estimator noise. Another contribution to the estimator variance arises due to the connected six-point function, the 'N (3/2) bias', which was found in Ref. [43] to be even larger than the Gaussian estimator noise in the high signalto-noise regime. A full computation of this term is beyond the scope of this paper, but will be necessary for a complete analysis in the future.
whereŴ (k, R) is the Fourier transform (in k) of a top-hat function with radial extent R.
The values of the parameters {β, γ, φ, η} are listed in Table 4 of [99] with a mild redshift dependence given in Eqs. 9-12 of [99]. The value of α results from applying the z-dependent normalization condition to be discussed below in Sec. D 1 d.

b. Halo bias
Halos are biased with respect to the underlying dark matter power spectrum; in particular, the power spectrum of halos of masses M at redshift z P hh can be written (on large scales, where the bias is scaleindependent) as P hh (k, M, z) = b h (M, z)P lin (k, z), where P lin (k, z) is the linear dark matter power spectrum and b h (M, z) is the halo bias. In our Halo Model, we use the halo bias of [99], which is parametrized as The values of the parameters {A, a, B, b, C, c} are listed in Table 2 of [99].

c. Halo density profile
The halo densty profile ρ( r, M, z) gives the density at a displacement r from the centre of a halo and thus governs the distribution of dark matter within a halo. For spherically symmetric halos, ρ( r) = ρ(r). We take ρ(r) to be Navarro-Frenk-White (NFW) [101], ie ρ(r) = ρ S r r S 1 + r r S 2 (D8) with r S the scale radius, a parameter which is related to the halo radius r M by the concentration parameter c = r M r S ; the scale density ρ S defines the density of the halo, and can be eliminated in favour of the virial radius and mass by using the definition of mass M = r M 0 4πr 3 ρ(r)dr. We use the halo concentration parametrization found in [102], which parametrizes the concentration of halos as The values of A, B, C depend on the definition of the halo mass one is using and can be found in Table 1 of [102] (in particular we take the Sample-F redshift 0-2 row). Note that Ref. [102] provides different values for the parameters depending on the definition of the halo mass considered; we take M to be the mass within the radius R 200m for which the mean density of the halo is 200 times the mean matter density (labeled M mean in Ref. [102]). In power spectra, the normalized Fourier transform of ρ(r) u(k, M, z) ≡ is used. The halo radius R at which we cut off the integral is R 200m .

d. Dark matter power spectrum
Within the Halo Model, power spectra are split into a term sourced by correlations in different halos (inter-halo correlations), and correlations within a single halo (intra-halo correlations). These terms are known as the 2-halo and 1-halo power spectra respectively, so we have P mm (k, z) = P 2h mm (k, z) + P 1h mm (k, z) (D11) where P 2h mm and P 1h mm denote the 2-halo and 1-halo dark matter power spectra respectively, and P mm is the total dark matter power spectrum. Each term is an integral over all halo masses: (D13) On large scales, it is a requirement that P 2h mm (k, z) = P lin (k, z); this is a consistency condition that ensures that all dark matter resides in halos, and that it is unbiased with respect to itself. This results in the following normalization condition: b(ν)f (ν)dν = 1. (D14) This consistency condition results in a z-dependent constraint on the normalization of the halo mass function: it fixes the value of the parameter α in Eq. D2.

Large-scale structure tracers
The large-scale structure tracers we are interested in are the galaxy density g, electron density e, the CIB flux density I ν at frequency ν, and the tSZ temperature anisotropy Θ tSZ,ν at frequency ν. Below, we summarize for each tracer the essential details necessary for constructing auto-power and cross-power spectra in the Halo Model.
a. Galaxy density Galaxies are distributed in halos according to a halo occupation distribution (HOD). In our HOD, we assign one 'central" galaxy to the centre of halos in a mass-dependent way, and additional 'satellite" galaxies which are distributed throughout the halo according to the dark matter distribution. Thus, the number of galaxies in a halo of mass M at redshift z is where N cen denotes the number of central galaxies (always 0 or 1) and N sat the number of satellite galaxies. We use the same HOD as Ref. [29]. The mean number density of galaxies at z is then Electrons are distributed inside dark matter halos according to a radial density profile ρ e (r). As a fiducial model, we choose the 'AGN' gas profiles from [53]. The Fourier space density profile for electrons is then: where M AGN is the AGN 'mass" M AGN = R 0 4πr 2 ρ AGN (r)dr, with R the cutoff radius at which we cut off the NFW profile Eq. D8 (R 200m for us).

c. CIB flux density
The CIB flux density at frequency ν I ν is given by an integral over the CIB emissivity density j ν (n, χ): I ν (n) = dχa(χ)j ν (n, χ). (D18) This can be written as an integral over galaxies with different luminosity densities: the mean emissivity density isj where L (1+z)ν is the luminosity density and dN dL (1+z)ν is the halo luminosity function defined in analogy with the halo mass function; the factor of (1 + z) in the frequency accounts for the fact that the photons that we receive have been redshifted. Neglecting scatter between M and L ν , this can be written as an integral over the halo mass functionj As all luminosity is sourced by galaxies, L ν can be separated into that sourced by the central galaxies and the satellite galaxies: Point sources can be identified and removed form CIB maps by imposing a flux cut and removing sources above this; in our CIB calculations we impose a flux cut of 400mJy at all frequencies.

d. tSZ temperature
The tSZ temperature anisotropy at frequency ν is given by ∆T tSZ T (n, χ) = g ν y(n, χ) where y(n, x) is the Compton y-parameter and g ν is the spectral function of the tSZ with the dimensionless variable x given by x ≡ hν k B TCMB (where h is Planck's constant; k B is the Boltzmann constant; and T CMB is the temperature of the black-body CMB). The Compton y-parameter is a line-of-sight integral over electron pressure y(n, χ) = σ T m e c 2 dχa(χ)P e (n, χ) where P e (n, χ) is the electron pressure at (n, χ), and where σ T is the Thompson scattering cross section; m e is the electron mass; and c is the speed of light.
To calculate the power spectrum of y, we need the three-dimensional Fourier transform of P e (r); for spherically symmetric halos this allows us to define the profile y(k, M, z) ≡ 4πσ T a m e c 2 drr 2 sin (kr) kr P e (r).

Poissonian noise
In all galaxy-galaxy, CIB-CIB, and galaxy-CIB power spectra, we must also include the scale-independent Poissonian noise (or shot noise). wheren(z) is the total galaxy number density in the map in a redshift bin. For the CIB, the shot noise is where S ν represents flux measured at frequency ν.