Direct Optimal Mapping for 21cm Cosmology: A Demonstration with the Hydrogen Epoch of Reionization Array

Motivated by the desire for wide-field images with well-defined statistical properties for 21cm cosmology, we implement an optimal mapping pipeline that computes a maximum likelihood estimator for the sky using the interferometric measurement equation. We demonstrate this direct optimal mapping with data from the Hydrogen Epoch of Reionization (HERA) Phase I observations. After validating the pipeline with simulated data, we develop a maximum likelihood figure-of-merit for comparing four sky models at 166MHz with a bandwidth of 100kHz. The HERA data agree with the GLEAM catalogs to<10%. After subtracting the GLEAM point sources, the HERA data discriminate between the different continuum sky models, providing most support for the model of Byrne et al. 2021. We report the computation cost for mapping the HERA Phase I data and project the computation for the HERA 320-antenna data; both are feasible with a modern server. The algorithm is broadly applicable to other interferometers and is valid for wide-field and non-coplanar arrays.


INTRODUCTION
Observations of the 21 cm spectral line of neutral hydrogen during the epoch of reionization (EoR), cosmic dawn, and the Dark Ages have the potential to transform our understanding of the universe. Goals of current experiments are to map the process of the formation and evolution of the first stars, galaxies and black holes, to further constrain the prevailing ΛCDM cosmology (Bennett et al. 1996(Bennett et al. , 2013Hinshaw et al. 2013;Planck Collaboration et al. 2020), and to search for evidence of physics beyond the ΛCDM paradigm. For reviews see (Liu & Shaw 2020;Mesinger 2019;Morales & Wyithe 2010;Pritchard & Loeb 2012). Current and recent interferometric experiments aiming to detect cosmological 21 cm signals include CHIME (Bandura et al. 2014), HIRAX (Newburgh et al. 2016), PAPER (Parsons et al. 2010), MWA (Tingay et al. 2013), LOFAR (van Haarlem et al. 2013), and HERA (DeBoer et al. 2017).
The ultimate goal of precision cosmology with the 21 cm line is a quantitative comparison between theoretical predictions of neutral hydrogen structures and their measurements at radio wavelengths. This confrontation between theory and experiments using interferometers is beginning with two-point statistics (two-point correlation function or power spectrum), and is likely to develop further with higher-order statistics and, finally, direct evaluation of properties of 3D (two angular dimensions and one frequency dimension) image cubes.
Radio interferometers measure the coherence, or visibility, between signals received by pairs of antennas in an array. For co-planar arrays and small fields of view, the relationship between the measured visibilities and the brightness distribution on the sky is approximately a 2D Fourier transform (Thompson, Moran, & Swenson 2017), and is therefore closely related to the power spectrum (Morales & Hewitt 2004). Wider fields of view can be accommodated with the 2D Fourier technique by implementing corrections for neglecting the "w-term" in the interferometric phase (Cornwell et al. 2005(Cornwell et al. , 2012; Barry et al. 2019;Ye et al. 2021). Current limits on the 21 cm power spectrum have been derived by analyses that make use of the image-visibility Fourier relationship (Dillon et al. 2014(Dillon et al. , 2015aBeardsley et al. 2016;Trott et al. 2016;Patil et al. 2017; Barry et al. 2019;Li et al. 2019;Rahimi et al. 2021) and by analyses that work directly with the visibility data through the delay spectrum approach (Parsons et al. 2012;Kolopanis et al. 2019;The HERA Collaboration et al. 2021). The two approaches are complementary, essentially measuring different statistics in the sky (Morales et al. 2019).
Images mapped from visibilities, and convolved by the array synthesized beam, are called dirty images. To deconvolve the dirty images, the CLEAN (Högbom 1974;Clark 1980;Cornwell 2008;Rau & Cornwell 2011) deconvolution algorithm is frequently used. The resulting CLEAN model is a list of deconvolved bright sources, which are the focus of astronomy and astro-physics. However, the focus of 21 cm precision cosmology is the faint diffuse emission which is not corrected by CLEAN. A deconvolution approach is required to treat all pixels equally across the image. Even for bright sources, CLEAN is a heuristic, iterative process, and the statistical properties of the resulting image are not well known. For cosmological studies, we need an algorithm that can map wide fields and that can reconstruct bright point sources and faint diffuse emission with equally well understood statistics. Other ways to image and deconvolve images have been explored. Some examples are Fast Holographic Deconvolution and other forward modeling techniques (Sullivan et al. 2012;Bernardi et al. 2013) and m-mode mapping (Shaw et al. 2014;Eastwood et al. 2018). For a recent review, see Liu & Shaw (2020).
In this paper we explore direct optimal mapping (DOM) of interferometric data, and we apply it to HERA data as a demonstration. By "direct", we mean that we do not make the assumptions that lead to the two-dimensional Fourier transform relationship between data and image, and we take "optimal" to mean that the mapping process does not lose information of model parameters. The optimal mapping approach was first explored in the context of CMB observations (Tegmark 1997). It has more recently been extended to interferometric imaging for 21 cm cosmology (Morales & Matejek 2009;Sullivan et al. 2012;Dillon et al. 2015b;Zheng et al. 2017a), and considerations of optimal mapping are incorporated in the HERA design (Dillon & Parsons 2016).
Benefits of this approach include a data product in the image domain that potentially covers the full celestial sphere, full knowledge of the point spread function in all directions, and full knowledge of the covariance matrix relating map pixels. Linear deconvolution, implemented through matrix operations, is in principle possible, treating point sources and extended emission equally. With the direct inversion of the instrument response, it is not necessary to grid the data (Barry & Chokshi 2022), it is not necessary to correct for neglect of the w-term, and the configuration of the antennas need not be coplanar.
The paper is organized as follows. We introduce the DOM general mathematical formalism in Section 2, and how we apply the formalism to HERA data in Section 3. We validate the algorithm with simulated data in Section 4. In Section 5, we use DOM to map HERA data, and evaluate four sky models. In Section 6, we assess the computational costs of DOM. We conclude in Section 7.

FORMALISM
Interferometers measure intensity as complex visibilities, defined as: whereŝ is the unit vector pointing to a certain point on the sky, B ij (ŝ) is the product of two primary beams from the i − j baseline 1 , I(ŝ) is the specific intensity, b ij is the baseline vector, ν and c are the observation frequency and the speed of light.
If we denote the sky as a map vector m and the measured visibilities to form a data vector d, the mapping can be expressed by one matrix multiplication where A is the measurement matrix and n is the noise vector for the visibilities. The A matrix maps the sky to visibilities. With this data model, an optimal estimator of the sky is obtained with the following operation: where N = nn † is the noise covariance matrix, D in general transforms the raw map to the final estimation of the true sky, andm is an estimate of the true sky.
In this paper, we only format D as a normalization to physics units; we leave more complicated D formats, say including deconvolution, to future publications. Even as a normalization matrix, D does not have a set form, and we discuss our choice of D in Section 3.2. DOM calculates direction-dependent PSFs for all pixels, expressed in a N pixel × N pixel matrix P (Dillon et al. 2015b): For interferometers, pixel noise is highly correlated. Therefore, the pixel covariance matrix is critical for quantitative interpretation of the measured sky map. One advantage from DOM is that it provides a pixel covariance matrix. The covariance matrix C is closely related to the PSF matrix (Dillon et al. 2015b): To use DOM, we first need to construct the measurement matrix A and the noise matrix N. We divide the sky into HEALpixels (Górski et al. 2005) Figure 1. Diagram of the direct optimal mapping (DOM). The two major modules, data conditioning and mapping, are displayed on the left; the detailed steps are shown in the flowchart. More details of each step are described in Section 3. summation: where the subscript n represents all visibilities (folding ij into one index), the subscript k represents all pixels, d n is an element of the data vector d,ŝ n,k is the directional vector in the horizon coordinate system, and m k is the flux within one pixel calculated as m k = I k · ∆Ω (∆Ω is the solid angle of one HEALpixel). Please note that approximating spatially extended flux to the center of HEALpixels leads to errors, especially when the in-pixel flux distribution is strongly skewed by bright off-center point sources. This will be discussed later in the treatment of the GLEAM catalogs in Section 5.1. In the above equation, we assume all the antenna pairs have the same primary beam B, andŝ n,k implicitly depends on time because of sky rotation.
Then the A matrix is written element-wise: = B(ŝ n,k ) · Φ n,k , where the second line shows that the beam term B(ŝ n,k ) and the fringe term Φ n,k are separable. We will use this feature in Section 3.2 for normalization.

APPLICATION TO HERA
In this paper, we use HERA data from the Phase I observation season, with 39 operational antennas. This is the dataset that was used to derive the Phase I EoR power spectrum limits, and we refer the reader to Kern et al. (2019), Dillon et al. (2020), Kern et al. (2020a), Kern et al. (2020b), and The HERA Collaboration et al.
(2021) for a detailed discussion of the data preparation. We only map the east-west polarization for this demonstration because Byrne et al. (2021) shows that this patch of the sky is predominantly unpolarized.
Briefly, the dataset spans the dates of December 10th -28th, 2017, and the data were calibrated, flagged, and binned by local sidereal time (LST). Certain instrumental systematics, including crosstalk and cable reflection, were modeled and subtracted. HERA is designed to have a redundant array configuration (Dillon & Parsons 2016), the resulting non-redundant baseline groups poorly sample the uvw space, making HERA not an ideal instrument for imaging. However, the focus of DOM is to make images with well understood statistical properties, regardless of the resolving power of the images themselves.
We have developed a software package 2 for the DOM algorithm. The diagram in Figure 1 illustrates steps of the algorithm. Although the package is initially implemented in HERA data, it can be easily applied to other interferometric data.

Calculating A and N
The sky pixels are defined in the equatorial coordinate system, meaning that the pixels are independent of Earth rotation. For a zenith-pointing telescope, both the primary beam B and the unit vectorsŝ n,k are natively defined in the horizon coordinate system. At each time integration, the pixel locations are converted from the equatorial coordinate system to the horizon coordinate system (RA/Dec→Az/El). Then with the baseline vectors (b n in Equation 7), all the elements of the A matrix are calculated.
The input data are first conditioned by selecting visibilities, estimating the visibility noise, removing flagged data, and averaging redundant baselines 3 . We estimate the noise for each visibility according to the radiometer equation (Thompson et al. 2017;Kern et al. 2020a) where σ n is the estimate of the noise of the visibility between antenna i and j, V ii and V jj are autocorrelations of antenna i and j, ∆ν is the bandwidth, and ∆t is the integration time. When visibilities are redundant-averaged, the estimated noise for each redundant-averaged baseline is calculated as where the summation runs over visibilities from redundant baselines within one redundant group. The noise matrix N is constructed by filling the diagonal elements with the visibility noise squared, assuming off-diagonal elements are zero: where σ n is the noise of the nth visibility.

Map Normalization
The direct optimal mapping formalism only requires the D matrix being non-singular to preserve all information of model parameters. In practice, different normalization can be applied for different map-based analysis.
Without normalization, the mapping equation m = A † N −1 d adds up contribution from all visibilities as a sum. To calculate the average, we need the effective weights, which varies across pixels because of the primary beam. In addition, sky drift complicates the weighting because it moves the primary beam on the sky. Therefore, we need a weighting of the visibilities considering a moving primary beam for each sky pixel. For HERA, the instrument observes the sky naturally with one primary beam applied. After that, noise is introduced in visibilities. In direct optimal mapping, we apply another primary beam from multiplying A † in the mapping equation (Equation 3). Therefore, the recovered sky has the primary beam applied twice-one from observation and one from mapping. However, the noise only has the primary beam applied once from mapping.
Because of the difference, there does not exist an obvious way to correct for the primary beam effect in normalization. We may correct for the primary beam twice, and the recovered sky will have no primary beam applied; however, the noise will have one extra primary beam corrected, blowing up the noise far away from the beam center. Alternatively, if we correct for the primary beam once, the recovered sky will still have one primary beam applied, but the map noise will be free of primary beam effect. Here we choose to correct for the primary beam once, and define the normalization matrix D accordingly.
Inspired by Barry et al. (2019), we use the optimal mapping equation but replace the visibility data with a vector of ones for counting. Then, we dividem by the weight map for normalization. However, the weight map has zero values because of the small-scale fringe term in the A matrix, which are numerically unstable in the denominator. Instead, we use only the primary beam term in the A matrix to construct the weight map.
We first construct another version of A with only the beam term (i.e., setting the exponential term equal to unity), which we call A B : We then calculate the weight map where 1 is an all-ones vector counting all visibilities. The weight map is constructed following the exact sampling of the visibility measurement and the noise weighting. We divide the weight map out ofm to turn the visibility summations to averages. Figure 2 shows an example of the weight map. Finally, we define the D matrix as where m w,i is the ith element in m weight . This definition accounts for the differing contribution of the visibilities to each point on the map given the primary beam. However, the primary beam effect, from observation, is still left in the map, meaning that sources away from the zenith are attenuated by the primary beam.
With A, N, D and the conditioned HERA data d, the map estimatorm is calculated with Equation 3. The PSF and the covariance matrices (P and C) are also calculated with Equation 4 and Equation 5.

ALGORITHM VALIDATION
In this section, we use simulated data to validate the direct optimal mapping algorithm.

Map Validation
We generate simulated data with the HERA Phase I array configuration, through a pipeline independent of

RA 2h
RA 1h RA 3h Dec. -30°F igure 2. An example of the weight map defined in Equation 13. The map spans two hours in RA. The fact that there are more effective weight on the left side of the region is due to lower noise (higher weighting) when observing that region, which results from longer integration time and lower intrinsic sky noise.   Figure 5, the difference increases to 5% of the original map with the GSM08 sky model. direct optimal mapping. The simulated data are verified to be consistent with the pyuvsim simulation (Lanman et al. 2019) to machine-precision (Kim et al., 2021. We did not use pyuvsim because our simulator is tested to be more computationally efficient. We use the Global Sky Model (GSM08) (de Oliveira-Costa et al. 2008) as our model of the true sky for one twenty-second integration at 166 MHz. Then we use DOM to map the simulated visibilities, and compare the map to convolved GSM08. Specifically, the convolved GSM08 is calculated by multiplying the P matrix and  The full-width-at-half-maximum (FWHM) of the synthesized main beam at the field center is 50-60 arcminutes, close to the diffraction limit defined by the longest baseline. Grating lobes are seen in hexagonal patterns, which are distorted as we move away from the field center, especially in the 75 • -away map. As shown, DOM provides direction-dependent PSF information across the sky. The grating lobes cause the difference between the original and the convolved GSM08 in Figure 5. The peak values change because of the primary-beam attenuation at the four pixels. The color bars saturate at 10% of the peak values to illustrate faint sidelobe structures. the GSM08 sky model. Without noise, the two maps are consistent at 10 −7 levels ( Figure 5). Getting to the 10 −7 consistency, we quantitatively characterized various factors that affect the mapping results: 1. Primary beams: the residual map is very sensitive to the primary beam. We use the CSTsimulated single-antenna primary beam (Fagnoni et al. 2021), which describes the beam pattern in electric fields with 1 • angular grid and 0.5 MHz frequency resolution. However, our mapping requires  (Figure 4), the convolved GSM08 looks different from the original GSM08. Middle: the DOM map from simulated visibilities before and after adding noise. Bottom: the residual maps after subtracting the convolved GSM08 from the middle row. The one without noise shows 10 −7 residuals. The one with noise shows random patterns at ∼ 10% level. The reduced-χ 2 value (Section 4.2) is also shown in the noise residual. resolution. We need to interpolate the simulated beam pattern in both angular and frequency space. We use the UVBeam object within the pyuvdata package (Hazelton et al. 2017) to perform the interpolation. When the beam pattern is rotated by 90 • , the residual increases up to 30% of the original map. Moreover, interpolating in electric fields or in power also lead to 10 −4 differences in the residual maps. This sensitivity to the primary beam indicates the possibility to constrain the primary beams with DOM maps.
2. Horizon contribution: the simulation includes all signals above horizon. We found that if we only convolve sky signals within 50 • around the zenith, the residual is 30 -50% of the peak value. Furthermore, Figure 3 (on the left) shows the map difference when a typical point sources is added near the horizon (85 • away from the zenith). The source leaks into the zenith field at 10 −4 level. Considering the significant solid angle around the horizon, this demonstrates the necessity to correctly include signals within the entire observable hemisphere (Pober et al. 2016), perhaps even accounting for the terrain near the horizon (Bassett et al. 2021), to model the foreground.
3. Coplanarity: we estimate the effect of array coplanarity by comparing the mapping results with and without assuming the antennas are on a plane. The HERA dishes deviate randomly from a perfect plane by about 4 cm, and Figure 3 (on the right) shows the resulting 5%-level difference from ignoring the non-coplanarity.
DOM calculates the direction-dependent PSFs across the field. Figure 4 shows PSFs in four pixels from the field center to near-horizon. The synthesized beam and the grating lobe pattern become increasingly distorted as the pixel moves away from the field center, illustrating the importance of considering direction-dependence PSFs.
After mapping the noiseless simulation, we add noise to the visibilities. Specifically for a simulated visibility d n , we draw independent random noise with the amplitude of σ n / √ 2 for the real and imaginary parts. The noisy visibilities are then mapped to compare with the convolved sky model. Figure 5 shows the results with and without adding noise to the simulations. Without noise, the residual map is at 10 −7 levels. With noise, the residual map amplitude is ∼ 10%, six orders of magnitude higher than the noiseless residual map. Further investigation shows that the residual pattern changes 0 100 200 300 400 500 Eigenmode 10 13 10 8 10 3 Eigenvalue w Figure 6. Eigenvalues of C in descending order for the oneintegration simulation data. A clear drop is seen around the 180th eigenmode. Thus, we choose the first 180 eigenvalues for the figure-of-merit calculation. The red-shaded region shows the selected eigenmodes.
for each random noise realization, confirming that it is random-noise dominated.

Pixel Covariance Matrix
The coherent noise pattern ( Figure 5, bottom right) shows the correlated noise in map space. One feature of DOM is that it provides a robust noise covariance matrix C to capture the correlation. The pixel correlation results from the instrument's incomplete uvw space sampling. There are patterns in the sky to which the array configuration is not sensitive.
The noise covariance matrix C is closely related to the PSF matrix P by multiplying D T on the right (Equation 5). With C, we define a maximum likelihood figureof-merit (FoM) to evaluate residual maps using the pixel covariance: where ∆m is the map vector of one residual map. Since C is not invertible, we first eigendecompose C where V and W are the eigenvectors and eigenvalues. Since C is singular, W has zero elements. Figure 6 shows the eigenvalues in descending order. A clear drop is seen around the 180th eigenvalue, this is related to the number of nonredundant modes HERA Phase I array measures. In addition, the fact that the simulation data only has one time integration leads to the sharp drop in eigenvalue spectrum. Eigenvalue spectra look different when we map multiple time integrations in Section 5.2. Because C is symmetric, we can choose V to be orthonormal

Now we plug Equation 16 into Equation 15
FoM where ∆m T V is the map projections onto the eigenvectors. Since W has zero values, W −1 is not computable. Therefore, we only consider the dominating eigenvalues in Equation 18. For this simulated data, we choose the first 180 eigenvalues, indicated in Figure 6. We write ∆m T V and W element-wise for the first dominating 180 eigenvalues and the FoM calculation can be more clearly expressed as This FoM is a χ 2 -statistic, and the reduced-χ 2 is χ 2 ν = χ 2 /d.o.f. 4 For a residual map from noise-dominated visibilities, we expect the FoM follows a χ 2 distribution with 180 degrees of freedom. For reduced-χ 2 , we expect χ 2 180 ∼ 1. The simulation map in Figure 5 is measured χ 2 180 = 0.93. For different noise realizations, χ 2 180 ∼ 1, validating that the covariance matrix provides an accurate description of correlated pixel noise in map space.
Back to the interferometric setting, eigenvectors represent emission patterns in the sky. The measured visibilities are sensitive to different patterns at different levels, which is characterized by the magnitude of the eigenvalues. The eigenvectors with very small eigenvalues are essentially invisible to the interferometer, which should be excluded for the FoM calculation. Therefore, by selecting the nonzero eigenvalues, we only use the emission patterns visible to the array in calculating the FoM.

MAPPING HERA DATA
In this section, we map a small fraction of HERA data and evaluate sky models against the HERA measurement. Aguirre et al. (2021) report a calibration bias in the calibration of this dataset, so we correct the bias by multiplying the visibilities by a factor of 1.04 (The HERA Collaboration et al. 2021) before mapping. We select the best fraction of the HERA Phase I data, the central region of Field 1 defined in The HERA Collaboration et al. (2021). This data set contains 20 twenty-second time integrations. We randomly select one HERA frequency channel around 166 MHz with a bandwidth of 100 kHz and map the corresponding visibilities. Meanwhile, we calculate the P matrix to cover the entire hemisphere (90 • from the zenith) to convolve four sky models and compare with HERA data: The HERA map and convolved GLEAM catalogs are shown in Figure 7. The GLEAM catalogs match the HERA map in point-like morphology and amplitude, while missing some faint diffuse emission. A similar comparison between GLEAM and HERA was performed in Carilli et al. (2020) with the Common Astronomical Software Applications (CASA) CLEAN imaging.
We subtract the GLEAM catalogs out of the HERA map (left plot of Figure 8) to further study the diffuse structures. With the bright point sources subtracted, the diffuse emission starts to emerge. We compare sky models with the measured diffuse emission. Figure 8 shows this comparison by further subtracting sky models out of the GLEAM-subtracted residual. The middle column of Figure 8 shows three convolved sky models. The GSM08 model shows signs of the two bright sources and vertical stripes. These stripes originate from the Haslam survey (Haslam et al. 1982), which is used to construct GSM08 (Remazeilles et al. 2015). GSM16 does not show signs of bight point sources nor obvious signs of vertical stripes after their de-sourcing and de-striping processes (Zheng et al. 2017b). But neither GSM08 nor GSM16 shows a diffuse emission pattern that resembles the measurement. However, the Byrne21 map shows similar emission patterns to the measurement, especially at large scales.
After further subtracting the three sky models, the final residual maps are shown on the right column of Figure 8. Both GSM08 and GSM16 residual maps show morphology close to GSM08 and GSM16 themselves, with negative amplitudes. This means that neither GSM08 nor GSM16 map reduces the measured diffuse pattern, instead they impose their intrinsic patterns in  (Zheng et al. 2017b), and convolved Byrne21 map (Byrne et al. 2021). GSM08 contains the two brightest point sources, while neither GSM16 nor Byrne21 contains point sources. GSM08, less so for GSM16, also shows vertical stripes. Right column: Residual maps after further subtracting the GSM08, GSM16, and Byrne21 maps. Both the GSM08 and the GSM16 residuals increase the amplitude of the residual map. The morphology of their residual maps resemble that of the sky models themselves, indicating that they do not represent the observed diffuse emission. The Byrne21 map, however, shows a similar diffuse pattern as observed in the GLEAM-subtracted residual at large scales. Reduced-χ 2 values are also presented for each of the residual maps as in Table 1. the final residual maps. However, the Byrne21 residual does show reduced large-scale diffuse pattern, with the point sources more prominent in the final residual map.

Reduced-χ 2 of Residual Maps
To calculate χ 2 ν of the maps, we first examine the eigenvalues of the covariance matrix C. Because we are mapping 20 time integrations, the sky rotation increases the array sampling compared to one time integration in Section 4.2. The additional information leads to the smoothing of the eigenvalue spectrum in Figure 9. With-out a clear drop, it is not obvious to determine a cut for dominating eigenvalues. We choose to use the first 1000 eigenvalues for this covariance matrix. We are exploring more systematic and robust ways to determine the number of dominating eigenvalues for future work.
Similar to Section 4.2, we measured χ 2 ν of the residual maps in Figure 8 Positive Negative Figure 9. Full eigenvalue spectrum of C from mapping the HERA data. The eigenvalues are displaced in descending order, with blue for the positive and orange for the negative.
The negative values have the absolute values at the machineerror levels, and are artificially ordered to the right side of the plot instead of scattering around zero with the positive eigenvalues. Different from Figure 6, there is not a clear-cut in this eigenvalue spectrum. This is because we combined 20 time integrations in this mapping. The sky rotation adds information beyond what the same array measures at one time, smoothing the drop we saw in Figure 6. We choose to include the first 1000 eigenvectors (red shaded region) for the reduced-χ 2 calculation. Note-Reduced-χ 2 statistics are calculated for the map from noise simulation and different residual maps. Column (2) shows χ 2 ν as introduced in Section 4.2.
Looking at the χ 2 1000 numbers, we can see that only subtracting GLEAM gives a relatively low value, while further subtracting GSM08 and GSM16 increases χ 2 1000 back up. However, further subtracting Byrne21 does not significantly change the χ 2 1000 value. This numerical indication is consistent with the conclusion we visually drew in Figure 8. The overall χ 2 1000 1 values quantify the apparent differences between HERA measurement and the sky models. Contribution factors to the difference include errors in HERA's primary beam modeling (Fagnoni et al. 2021), in instrument calibration (Dillon et al. 2020;Kern et al. 2019Kern et al. , 2020a in sky models (Wayth et al. 2015;de Oliveira-Costa et al. 2008;Zheng et al. 2017b;Byrne et al. 2021).

COMPUTATION COST
Previous analysis focuses on a map from 20 twentysecond integrations. Figure 10 shows a wide-field map at 166 MHz from 300 twenty-second integrations, including the entire Field 1 (The HERA Collaboration et al. 2021). This map covers 215 deg 2 , 5 • around the RA 1 h -3 h sky path at -30.7 • declination. The covariance matrix for pixels in this map is also available. Widefield maps will be our final product. With the wide-field maps, Liu et al. (2016) showed that power spectra, especially in large angular scales, can be measured with spherical Fourier-Bessel formalism. However, mapping wide-field maps is computationally expensive. With this mapping configuration, we investigate whether HERA Phase I data, and eventually the full HERA array data, are computable with DOM.
Here we discuss mapping within 5 • around the zenith, which seems to contradict Section 4.1 where we emphasized the contribution from the entire hemisphere, especially near the horizon. In fact, when we map the visibilities, including a larger fraction of the sky does not change the pixel flux already mapped in a small field. Considering the attenuation of the primary beam, the visibilities contain information mainly within a small field around the zenith. However, if we attempt to compare the map with a sky model, we need to convolve the sky model over the entire hemisphere, as we did in Section 5.1. This will be relevant for foreground removal in future cosmological analyses.
Even for a small field around the zenith, the DOM algorithm is computationally expensive both in terms of CPU time and RAM consumption. For one frequency bin, the A matrix has the dimension of N visibility ×N pixel . N visibility is of the order of 10 4 − 10 5 , considering base-lines at multiple integrations; N pixel is of the order of 10 5 or more, depending on pixel size and sky coverage. Storing the entire matrices requires 10 1 − 10 2 gigabytes of memory.
However, since the N matrix and the D matrix we adopt are both diagonal, we can analyze the visibilities piecewise and add their contribution to the final map and covariance matrix. This approach will reduce the RAM requirement.
We first write the mapping equationm = DA † N −1 d element-wise, focusing on mapping the kth pixel where n runs over all the baselines and t covers all the time integrations, d n,t is the data, and σ n,t is the noise. The above equation shows that mapping essentially sums all the visibilities together with different coefficients that are independent of other visibilities (i.e. there are no cross terms). The inner summation sums all the visibilities within one integration, and the outer summation sums all the time integrations. The reason that the simple format can be obtained is that both N and D are diagonal. Now we take a close look at diagonal elements in D which also sums over baselines and time integrations. The subject to be summed is the inverse-variance weighted primary beam (Section 3.2). We plug Equation 23 into the Equation 22 The numerator and denominator are completely separate, we can calculate them individually before the division. Furthermore, since there are no cross terms, the summations can also be performed visibility-byvisibility, avoiding large matrix multiplication. Similarly, we write down the calculation of C elementwise Although the equation is complicated, the logic is clear: the numerator can be added visibility-by-visibility because there are no cross terms, and the denominator is the product of two diagonal elements of D.
In practice, we do not really analyze the visibilities one-by-one, we analyze visibilities in manageable batches, say per time integration. This operation loosens the requirement for N: as long as there are no inter-batch visibility noise correlation, the calculation can still be conducted piecewise, although the calculation may not be written down as simply as illustrated above.
To summarize, provided we keep track of the normalization, the final results from this piece-wise calculation are the same as if we analyzed all the visibilities simultaneously. This piece-wise calculation frees us from storing one large A matrix in RAM; we are now only limited by storing the noise covariance matrix C. The summations in the piecewise calculation also shows that the computation scales linearly with N baseline and N integration . Since the expressions are for one pixel or one matrix element, the computation scales with N pixel for the maps while scales with N 2 pixel for the noise covariance matrix. At last, the above discussion is about mapping one frequency channel, the computation also scales linearly with number of frequency channels. Table 2 shows the computation time for mapping the HERA Phase I data and the upcoming HERA data with 320 antennas. In summary, it takes 2.8 single-core CPU hours and 0.7 GB of RAM to generate the map in Figure 10, along with the covariance matrix. Computing maps across 300 frequency channels takes 840 single-core CPU hours, which can be done in one day with a 40-core server. Projecting to the 320-antenna HERA data, we will map 2.5 times more baselines and four times more pixels (from increasing map resolution). It will take 28 single-core CPU hours and 10 GB to calculate one map along with the covariance matrix. Calculating 300 frequency channels requires 8,400 single-core CPU hours, which can be done in < 9 days with a 40-core server.

CONCLUSION
We have introduced the direct optimal mapping algorithm for interferometric data and applied the algorithm to HERA Phase I data. The algorithm is designed to recover faint diffuse emission on a wide-field with welldefined statistics. It relaxes the requirements of small fields and coplanar antennas, which will be useful for future interferometer arrays, including space missions.
The algorithm is validated with simulated data for one twenty-second integration, showing that it achieves 10 −7 precision with noiseless data. After adding noise, we show that, with the noise covariance matrix, the map noise follows χ 2 ν ∼ 1 distribution. The χ 2 ν calculation accounts for the pixel correlation, which is essential for an interferometer array with incomplete uvw coverage like HERA.
After correcting the calibration bias (The HERA Collaboration et al. 2021), we map HERA Phase I data and compare the map with the GLEAM point source catalog and three different global sky models. The GLEAM catalogs (Hurley-Walker et al. 2017, 2019 match the point sources in the map. After subtracting the GLEAM point sources, the residual diffuse emission is best represented by the Byrne21 map (Byrne et al. 2021). Neither GSM08 (de Oliveira-Costa et al. 2008) nor GSM16 (Zheng et al. 2017b) provides a sky emission distribution consistent with the HERA measurement.
Finally, we presented an example of wide-field mapping along with calculating the pixel covariance matrix. We examined the computation cost and found that, with diagonal visibility noise matrix N and normalization matrix D, the mapping and covariance calculation can be conducted in batches of visibilities. This reduces the memory requirement for manipulating all visibility data at once. In addition to memory, the computation cost for mapping the HERA Phase I data and 320-antenna HERA data is one day and one week with a modern server.
With the direct optimal mapping algorithm, we are able to map visibilities into image cubes along with covariance matrices and point spread functions. We aim to cover followup analyses in future publications.