Evidence of Ultra-faint Radio Frequency Interference in Deep 21 cm Epoch of Reionization Power Spectra with the Murchison Widefield Array

We present deep upper limits from the 2014 Murchison Widefield Array (MWA) Phase I observing season, with a particular emphasis on identifying the spectral fingerprints of extremely faint radio frequency interference (RFI) contamination in the 21 cm power spectra (PS). After meticulous RFI excision involving a combination of the SSINS RFI flagger and a series of PS-based jackknife tests, our lowest upper limit on the Epoch of Reionization (EoR) 21 cm PS signal is $\Delta^2 \leq 1.61 \times 10^{4}\ \mathrm{mK}^2$ at $k = 0.258\ h\,\mathrm{Mpc}^{-1}$ at a redshift of 7.1 using 14.7 hours of data. By leveraging our understanding of how even fainter RFI is likely to contaminate the EoR PS, we are able to identify ultra-faint RFI signals in the cylindrical PS. Surprisingly, this signature is most obvious in PS formed with less than an hour of data, but is potentially subdominant to other systematics in multiple-hour integrations. Since the total RFI budget in a PS detection is quite strict, this nontrivial integration behavior suggests a need to more realistically model coherently integrated ultra-faint RFI in PS measurements so that its potential contribution to a future detection can be diagnosed.


INTRODUCTION
The Epoch of Reionization (EoR) is one of few periods in the Universe's history that is still largely unconstrained. Since the neutral Hydrogen 21-cm line can directly track the remaining neutral Hydrogen at a given redshift, high redshift intensity maps of 21-cm radiation will allow us to directly map the reionization process. For reviews, see Furlanetto et al. (2006), Morales & Wyithe (2010), and Pritchard & Loeb (2012). A first step towards understanding cosmic reionization is to measure the power spectrum of redshifted 21-cm brightness temperature fluctuations, which is a particularly natural task for radio interferometers. Examples of interferometers that have produced upper limits on the reionization power spectrum signal are the Giant Metrewave Radio Telescope (GMRT; Paciga et al. (2013)), the LOw Frequency ARray (LOFAR; van Haarlem et al. (2013)), the Murchison Widefield Array Phase I (MWA Phase I; Tingay et al. (2013)) and Phase II (MWA Phase II; Wayth et al. (2018)), the Precision Array for Probing the Epoch of Reionization (PAPER; Parsons et al. (2010)), and the Hydrogen Epoch of Reionization Array (HERA; DeBoer et al. (2017)). Lessons learned from analysis of early results have been specifically incorporated into newer generations of EoR-focused telescopes, such as the Hydrogen Epoch of Reionization Array, and the MWA Phase II. As the knowledge of 21-cm power spectrum analysis has grown, upper limits at various redshifts have gradually improved (Beardsley et al. 2016; Patil et al. 2017; Barry et al. 2019b; Li et al. 2019; Mertens et al. 2020; Trott et al. 2020; Rahimi et al. 2021; The HERA Collaboration et al. 2022a,b).
Corresponding author: michael.wilensky@manchester.ac.uk
Theoretical models of the reionization signal suggest that it is 4-5 orders of magnitude fainter than typical astrophysical radio sources. Due to the extreme dynamic range requirements of these experiments, a myriad of data analysis challenges exist. Most prominently, power spectrum analyses require exquisite understanding of the spectral variations in the data in order to make a high quality measurement. Such variations have numerous origins: incomplete or inaccurate calibration models (Barry et al. 2016; Patil et al. 2016; Ewall-Wice et al. 2017; Byrne et al. 2019; Dillon et al. 2020; Byrne et al. 2021), intrinsic instrument bandpass response (Ewall-Wice et al. 2016a; Trott & Wayth 2016; Barry et al. 2019a; Li et al. 2019; Fagnoni et al. 2021), internal cable reflections (Ewall-Wice et al. 2016b; Kern et al. 2019, 2020), and digital nonlinearities (Benkevitch et al. 2016). Additionally, failing to model refraction of incoming radiation through the ionosphere can significantly disrupt measurements relying solely on direction-independent calibration (Morales & Matejek 2009; Jordan et al. 2017; Trott et al. 2018). These effects can be mitigated using direction-dependent calibration techniques (Vedantham & Koopmans 2016; Hurley-Walker & Hancock 2018), however such techniques can sometimes induce signal loss (Mouri Sardarabadi & Koopmans 2019; Mevius et al. 2022).
Even if other instrumental effects can be handled, an interferometer is naturally chromatic because interferometric baselines measure different angular wave modes at different frequencies. This gives rise to a well-studied wedge-shaped contamination in cylindrical power spectra known as the foreground wedge (Datta et al. 2010; Trott et al. 2012; Morales et al. 2012; Vedantham et al. 2012; Hazelton et al. 2013). Generally the spectral resolution of typical reionization-focused instruments allows for sampling of larger line-of-sight wave modes compared to perpendicular wave modes, where sampling of the latter is determined by baseline lengths and orientations. Consequently, there is only a narrow region in this space that is both outside the wedge and accessible by typical baselines that is free of foreground contamination. However, systematic effects may still contaminate it. This area is known as the EoR window (Liu et al. 2014a,b).
An exhaustive review of systematic effects is outside the scope of this work. We refer the reader to the comprehensive review of data analysis in 21-cm cosmology presented in Liu & Shaw (2020). The purpose of this work is to present an upper limit of the cosmological 21-cm power spectrum signal derived from the second observing season of the MWA Phase I. Of primary importance in the analysis leading to this limit was the nature of a particular systematic effect ubiquitous to all radio experiments called radio frequency interference (RFI), considered here as any non-astrophysical radio signal that is observed by the telescope. Typical RFI sources in EoR and cosmic dawn experiments are anthropogenic, such as frequency modulated (FM) radio broadcasts, digital television broadcasts (DTV; Wilensky et al. (2019)), and ORBCOMM satellite transmissions (Line et al. 2018). In concept, RFI poses a unique threat to EoR measurements in that most sources demonstrate sharp spectral variation over the band used for an EoR power spectrum measurement. Wilensky et al. (2020) found that the contamination in the power spectrum produced by as little as 1 mJy apparent RFI flux can be comparable to the EoR signal on some wave modes. Bright RFI can be easily identified and removed in the raw visibilities. This process is called "flagging." A profusion of flagging algorithms exists in the radio astronomy literature. Two that have been applied to MWA Phase I data are AOFlagger (Offringa et al. 2015) and SSINS (Wilensky et al. 2019). In Barry et al. (2019b), it was found that removing observations identified by SSINS to contain RFI that went undetected by previous excision methods made a measurable difference in the power spectrum upper limit, particularly for the most highly affected observations.
In our analysis, we find it valuable to distinguish between different types of RFI based on the level of sensitivity required to statistically identify it. We separate it into four categories:
I. Visibility level: RFI that can be detected in the visibilities of a single baseline, where most 21-cm EoR analyses do their flagging.
II. Full array: RFI that requires the sensitivity of the full array (combined coherently or incoherently). More analyses are moving towards flagging at this level, e.g. Barry et al. (2019b); Li et al. (2019); Rahimi et al. (2021), The HERA Collaboration et al. (2022a,b), and this one. The SSINS algorithm, used in the aforementioned MWA limits and this work, is designed to find full array RFI.
III. Ultra-faint: RFI that requires coherent averaging of some significant subset of data in consideration for an upper limit, but significantly less than the entire volume of data. It may also require the full array sensitivity in order to detect it. In this work we observe this in the Fourier domain, i.e. in power spectra, when coherently combining approximately 2-5% of the full analysis data set.
IV. Unexcisable: RFI that is only detectable when coherently combining an extremely large subset of the data in a full-array manner, such that exclusion of the affected data would significantly alter the sensitivity of the analysis (e.g. half the analysis volume). We propose this type of RFI as a distinct possibility that has not been ruled out, but do not claim to have necessarily observed it in this analysis.
In this work we show that RFI yet fainter than that detected by SSINS (ultra-faint RFI) exists within MWA observations, and its effect on the power spectrum measurement is observable when less than an hour of data is coherently integrated. We accomplish this by splitting the data into subsets that are either highly likely or highly unlikely to contain this ultra-faint RFI, and comparing their cylindrical power spectra. We confirm that the distinct signature is indeed associated with the RFI by constructing power spectra with and without RFI flags applied, and noting an enhancement of the signature when RFI flags are not applied. We also note, however, that this signature is not necessarily observed in our deepest integrations, suggesting that RFI may have favorable integration properties for EoR measurements. This leaves the status of RFI in EoR science unclear, since it may yet appear at a level fainter than our current integrations, but brighter than the cosmological 21-cm signal.
This paper is laid out as follows. In §2 we describe the instrument and data acquisition, pre-processing, calibration, and power spectrum estimation pipeline. From the flagged and subsequently calibrated data, we derive statistics about the RFI content and conduct a jackknife test to determine the effect of ultra-faint RFI on power spectrum measurements, presented in §§3-4. In §5, we show our final power spectrum upper limit, and in §6 we present our conclusions.

INSTRUMENT AND OBSERVATION DESCRIPTION
Here we briefly describe the properties of the MWA that are relevant to our analysis and provide some details about the choice of observational parameters. For a more thorough description of the MWA Phase I, see Tingay et al. (2013). We also describe some preliminary data cuts based on metrics that have been used similarly in previous MWA upper limits.

Instrument and Observations
To construct this upper limit, we used data from the second season of the MWA Phase I. This is a radio array located in the radio-quiet Murchison Radio Observatory in Western Australia. The Phase I configuration correlates 128 pseudorandomly located receiving elements to form interferometric visibilities. Each receiving element of the MWA is a square tile of 16 crossed-dipole antennas that have been beamformed to produce a single voltage signal. Using analog delays between individual dipoles within a tile, we can coarsely point the tile to different locations on the sky.
We select MWA observations of the "EoR0" field, which is centered on a right ascension of 0 hours and a declination of −27 degrees. Throughout each observing night, the array is repointed roughly every 30 minutes in order to track this field. This field has been studied in previous MWA EoR limits from the first observing season (Beardsley et al. 2016; Barry et al. 2019b), as well as in other seasons, array configurations, and observing bands (Li et al. 2019; Trott et al. 2020). We focus on five pointings centered on zenith as the field transits from East to West. This is based on analysis in Beardsley et al. (2016), where it was found that the galaxy's presence in the array sidelobes produces extreme contamination in the EoR window. This analysis was reconfirmed with a similar quality metric using the Phase II configuration in Li et al. (2019).
The voltage stream of each tile is channelized by a two-stage polyphase filter bank (PFB; Prabu et al. (2015); McSweeney et al. (2020)). This divides the 30.72 MHz observing band into 24 coarse channels of width 1.28 MHz, each of which is divided into 32 fine channels of width 40 kHz. The observing band for this analysis extends from 167.1-197.8 MHz. Each observation is 112 seconds in length, with an integration cadence of 2 seconds.
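The channelization and observation figures quoted in this section are mutually consistent, which the following minimal sketch checks. It uses only numbers stated in the text; the variable and function names are illustrative and do not correspond to any MWA software.

```python
# Bookkeeping for the MWA Phase I channelization, using only values from the text.
COARSE_CHANNELS = 24        # coarse channels across the observing band
COARSE_WIDTH_MHZ = 1.28     # width of one coarse channel
FINE_PER_COARSE = 32        # fine channels per coarse channel
OBS_LENGTH_S = 112          # length of one observation
INTEGRATION_S = 2           # integration cadence

band_mhz = COARSE_CHANNELS * COARSE_WIDTH_MHZ            # total bandwidth
fine_width_khz = COARSE_WIDTH_MHZ * 1e3 / FINE_PER_COARSE  # fine channel width
n_fine = COARSE_CHANNELS * FINE_PER_COARSE               # fine channels in the band
n_integrations = OBS_LENGTH_S // INTEGRATION_S           # time samples per observation


def total_hours(n_obs, obs_length_s=OBS_LENGTH_S):
    """Total observing time in hours for n_obs observations (hypothetical helper)."""
    return n_obs * obs_length_s / 3600.0
```

Evaluating `total_hours` for the 3168 observations analyzed in this section gives the total data volume in hours.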
In total we analyze 3168 observations from the second season of the MWA Phase I, which amounts to 98.56 hours of data. However, the vast majority of these are removed from the final limit by a strict ionospheric cut and subsequent RFI-based cuts. We describe these cuts in significantly more detail in the following sections and summarize the results in Table 1. In brief, about 1/3 of the observations have less-than-excellent ionospheric quality, another 1/4 could not have its ionospheric quality collected (usually a sign of poor observational quality), and the rest were cut strictly based on inferred RFI content. Among observations whose ionospheric quality could be gathered, we do not find any obvious correlation between total RFI occupancy and the magnitude of the ionospheric metric. We do observe a slight excess of observations with high RFI occupancy among observations whose ionospheric metric could not be gathered, however this only accounts for a tiny minority of such observations. It could be that excess RFI is causing the failure in the metric gathering in some instances (e.g. one night of extremely bad RFI exists within this subset), however there exist numerous examples within the data for high-occupancy observations with collectable ionospheric metrics, so this hypothetical relationship is tenuously supported at best.
Previous limits using a highly similar analysis pipeline on the MWA Phase I analyzed the first season of data (Beardsley et al. 2016; Barry et al. 2019b). We expected the second season of data may be better for power spectrum measurements due to the correction of the "digital gain jump" from the first season as well as reduced digital nonlinearities. Both of these effects arise in the receiver chain. The digital gain jump is caused by an interaction between non-linearities in the digital signal chain (arising from quantization) and the choice of digital multipliers (called "digital gains") used to flatten the bandpass between the receivers and the correlator (Prabu et al. 2015). The digital gains are removed (by dividing out by their value) after correlation, but in some cases the digital non-linearities of the signal chain make it impossible to perfectly remove the digital gains. In the first season of MWA data, the digital gains that were chosen resulted in a single large jump at 188.16 MHz. This large jump interacted strongly with the digital non-linearities and as a result there was a residual discontinuity (called the "digital gain jump") in the data even after the digital gains correction. Data above the digital gain jump were excluded in Barry et al. (2019b). For the second season, the digital gains were chosen to have much smaller changes across the band, resulting in significant improvements in the bandpass smoothness after the digital gains correction. In addition we correct for the remaining digital non-linearities using a Van Vleck correction (Benkevitch et al. 2016), explained more in §2.3. See Barry (2018) for a detailed description and analysis of these effects. Finally, we will likely benefit in future analyses by combining multiple seasons of data, and therefore stand to gain from analyzing multiple seasons with a similar pipeline.

Analysis Overview
As with all 21-cm power spectrum estimation pipelines, ours involves several analysis steps. A flowchart describing the process from data acquisition to power spectrum estimation is shown in Figure 1. Before calibration, imaging, coherent averaging, and power spectrum estimation, the raw visibilities must be preprocessed (cyan nodes in Figure 1). Compared to previous MWA limits that use the same calibration, imaging, and power spectrum software as this analysis (Beardsley et al. 2016; Barry et al. 2019b; Li et al. 2019), we use an enhanced pre-processing pipeline. We describe specific enhancements to the pipeline in the next subsection.
Pre-processing involves several corrections to the raw visibilities. The phases of the visibilities are adjusted according to the differences in cable length between the tiles and receivers, and also to specify a phase center on the sky. Since the alteration to the bandpass resulting from the PFB is theoretically known, this shape and the digital gains for each coarse channel are divided out. Due to quantization nonlinearities, this does not fully correct the bandpass, and residual frequency structure is left over. We address nonlinearities due to quantization error during pre-processing using a Van Vleck correction (Benkevitch et al. 2016), which in general corrects for quantization artefacts by solving an integral equation that relates the analytic correlation coefficient to the output of a digital correlator. This is a new technique in our pipeline, and we describe it in more detail in §2.3. We also flag RFI using Sky Subtracted Incoherent Noise Spectra (SSINS; Wilensky et al. (2019)), as well as 80 kHz around each coarse channel edge. Finally, we downsample to 80 kHz frequency resolution to save disk space and processing time.
The pre-processed visibilities are then calibrated, gridded, and imaged (at each frequency) in HEALPix format (Górski et al. 2005) using Fast Holographic Deconvolution (FHD; Sullivan et al. (2012); Barry et al. (2019a)). The calibration strategy is almost identical to that used for Barry et al. (2019b). However, we use the autocorrelations to establish a bandpass shape in the manner of Li et al. (2019), rather than how they were used in Barry et al. (2019b). Cable reflection systematics are handled during calibration by fitting the amplitude, phase, and delay in a hyperresolved Fourier basis (Barry et al. 2019a). We use a modified gridding kernel, as described in Barry et al. (2019a,b). We also grid model visibilities generated from the GLEAM catalog (Hurley-Walker et al. 2017), allowing us to form "dirty" (just the data), model, and residual (data minus model) HEALPix cubes, which are downsampled to 160 kHz frequency resolution. We can also propagate weights and variance information about the measurement by gridding as if each visibility is equal to 1 using the beam and the square of the beam, respectively. HEALPix cubes from multiple observations can then be coherently averaged before passage to Error Propagated Power Spectrum with InterLeaved Observed Noise (εppsilon; Barry et al. (2019a)), where power spectrum estimates are formed. We form separate cubes for the even and odd time integrations within each observation, so that εppsilon can form a power spectrum estimate without a thermal noise bias. Important features of εppsilon include the use of a generalized Lomb-Scargle periodogram as well as multiple forms of noise metrics, described in Barry et al. (2019a).
The in-situ simulation capabilities of FHD allow for robust tests of signal loss. By simulating visibilities for a fiducial EoR signal and estimating a power spectrum from them under various conditions, Barry et al. (2019b) largely verifies that there is no appreciable signal loss in the power spectrum estimation pipeline used in this work. More specifically, in the absence of direction-independent calibration errors (note there is no direction-dependent calibration in the pipeline, an effect known to cause signal loss in some instances), we are able to recover an input EoR signal in an in situ simulation, i.e. residuals from gridding and other effects are beneath the expected signal. This and our end-to-end error propagation allow us to confidently place upper limits on the EoR power spectrum signal in our measured power spectra.
We also selected data based on ionospheric quality using a metric described in Jordan et al. (2017). This metric is based on a combination of the median ionospheric offset as well as a PCA-based measure of offset isotropy to determine ionospheric quality. Lower scores indicate less active ionospheres. We perform a relatively harsh cut, removing all observations from the analysis with an ionospheric quality assurance metric of 5 or greater, cf. Trott et al. (2020), which amounted to 33.26 hours. We also cut any observation for which the ionospheric metric could not be gathered, of which there were 25.32 hours. This usually results from a failure of the observation to calibrate in the MWA Real-Time System (RTS; Mitchell et al. (2008)), which is a requirement for assessing the ionosphere. In some instances, this can indicate poor observational quality. Since we wanted to isolate the effect of RFI on power spectrum measurements, we resolved to remove such observations outright, thus avoiding a potential source of confusion. Combined, these cuts remove 58.58 of the 98.56 hours of data from the original selection that was based solely on field, observing band, and pointing. We show the result of this cut in Figure 2.

Pipeline Enhancements
The majority of the pipeline enhancements in this analysis are related to RFI flagging and analysis of RFI statistics to inform data cuts. We briefly describe our flagging enhancements and changes here, and devote §3 to describing the statistical analysis. We also describe our digital nonlinearity correction and calibration enhancements here.
In Barry et al. (2019b), a major improvement to the limit came from excluding observations that contained DTV RFI. These observations were identified by SSINS, which is designed to identify full-array RFI (§1). SSINS is described in full detail in Wilensky et al. (2019), but we also briefly describe it here. As implemented in this work, it begins by time differencing the uncalibrated visibilities of a single 2-minute observation at the 2-second integration cadence, which subtracts out all slowly varying components such as the cosmological and astrophysical signals. It then incoherently averages these visibility differences (averages their amplitudes, discarding their phases) over all the baselines to produce a single dynamic spectrum. In the absence of RFI, this remaining spectrum should consist of Gaussian noise. Assuming weak stationarity (unchanging mean and covariance function), the mean of this Gaussian at each frequency is determined by a time-average of each channel. Due to the properties of the thermal noise in the visibility, the standard deviation of the noise at this stage is proportional to its mean, so we can obtain a z-score for each sample in this way. We then use an iterative match filter to search for pre-determined shapes within the spectrum that belong to known sources of interference. This can be thought of in a Bayesian setting as generating flags using the maximum a posteriori decision rule amongst a small dictionary of rectangular signals (top-hat power spectral density) of a given strength. The resulting flags in this spectrum are propagated across all baselines and in time according to the visibilities that formed the corresponding differences.
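The first stages of this procedure (time differencing, incoherent baseline averaging, and z-scoring) can be sketched as follows. This is a minimal illustration, not the SSINS package API: the function names and the (times, baselines, frequencies) array layout are assumptions. The proportionality constant between the standard deviation and the mean is also an assumption here; it follows from treating each visibility difference as circular complex Gaussian noise, whose amplitude is Rayleigh distributed. The match filter and flag propagation are omitted.

```python
import numpy as np


def incoherent_noise_spectrum(vis):
    """Time-difference the visibilities, then average amplitudes over baselines.

    vis: complex array of shape (n_times, n_baselines, n_freqs).
    Returns a real array of shape (n_times - 1, n_freqs).
    """
    diffs = np.diff(vis, axis=0)          # removes slowly varying sky signal
    return np.abs(diffs).mean(axis=1)     # discard phases, average over baselines


def z_scores(spectrum, n_baselines):
    """Z-score each sample against the per-channel time-averaged mean.

    Assumes Rayleigh-distributed amplitudes, for which std / mean is
    sqrt(4 / pi - 1); averaging over n_baselines independent baselines
    reduces the standard deviation by sqrt(n_baselines).
    """
    mean = spectrum.mean(axis=0, keepdims=True)
    std = mean * np.sqrt(4.0 / np.pi - 1.0) / np.sqrt(n_baselines)
    return (spectrum - mean) / std
```

In this toy model, a signal that is constant in time cancels in the differencing step, while a transient present in a single integration appears in the two differences that include that integration.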
While SSINS is capable of producing flags at the time and frequency resolution of the visibilities, the actual RFI flags from SSINS were not applied in the Barry et al. (2019b) limit. Rather, observations with known RFI contaminants were excluded from the analysis altogether, including data within those observations that were not classified as contaminated by SSINS. In this analysis, we apply the SSINS flags to the data, extending them in frequency so as to avoid excess EoR window power generated by chromatic flags (Offringa et al. 2019; Ewall-Wice et al. 2021; Wilensky et al. 2022a). Actually applying the SSINS flags could allow for a measurement with reduced thermal uncertainty compared to Barry et al. (2019b) since it may allow for more data to be used. However, in §4 we analyze power spectra with these flags applied at small integration depths and resolve to fully remove any observations identified as containing RFI in developing the final limit, as was done in Barry et al. (2019b).
In this limit, we do not apply AOFlagger flags, unlike every other MWA EoR limit so far. Upon examining the outputs of each flagging pipeline, we found a modest number of RFI events that were caught by AOFlagger and not by the pre-extended SSINS flags. After frequency extension, many of these events were coincidentally excised. Additionally, some samples that were flagged by AOFlagger and not by SSINS may be false positives that result from AOFlagger's morphological detection algorithm, which can overextend flags for broad features (Offringa et al. 2012). The AOFlagger flags have a small random false-positive rate (percent-level, depending on the night; Offringa et al. 2015) that appears uniformly distributed throughout the times and frequencies of the data. If we extend these flags in frequency to avoid excess power in the window, we incur massive data loss due to the false positives. In principle some of these data could be recovered by implementing some of the strategies in Offringa et al. (2019), but we did not do this. Since we cannot generically extend the AOFlagger flags in frequency without causing massive data loss, we opt to entirely forego them at the possible expense of a few extra false negatives in the entire season.
We apply a Van Vleck correction for the nonlinear distortion present in the raw visibilities during pre-processing. The MWA digital signal pipeline involves several stages of quantization. The Van Vleck correction employed in this work corrects the nonlinearity associated with the final round of quantization. Corrections for other quantization stages are forthcoming. The actual correction involves solving an integral equation that relates the quantized and analog visibilities. This equation has no closed-form inverse for the MWA bit depth. In order to generate fast solutions, the equation is approximately inverted using Chebyshev polynomials. The Van Vleck correction reduces calibration errors due to spectral structure from the digital nonlinearities that cannot be handled by a calibration that assumes a linear gain.
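To illustrate the idea of approximately inverting a quantization relation with Chebyshev polynomials, the sketch below uses the textbook one-bit (two-level) case, where the quantized correlation coefficient is (2/π) arcsin(ρ). This is an assumption made purely for illustration: it is not the MWA relation, which involves a higher bit depth and has no closed-form inverse, and the function names, grid, and polynomial degree are hypothetical choices.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def one_bit_quantized_correlation(rho):
    """Forward relation for a one-bit correlator (arcsine law); illustration only."""
    return (2.0 / np.pi) * np.arcsin(rho)


# Tabulate the forward relation on a grid of true correlation coefficients,
# then fit the inverse map (quantized -> true) with a Chebyshev series.
rho_true = np.linspace(-1.0, 1.0, 2001)
rho_quantized = one_bit_quantized_correlation(rho_true)
inverse_fit = Chebyshev.fit(rho_quantized, rho_true, deg=11, domain=[-1.0, 1.0])


def van_vleck_correct(rho_q):
    """Approximate true correlation from quantized correlation via the fit."""
    return inverse_fit(rho_q)
```

Because the one-bit case does have an exact inverse, sin(π ρ_q / 2), the quality of the polynomial approximation can be checked directly; for a multi-bit correlator the tabulated forward relation would instead come from numerically evaluating the integral equation.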
Most of the calibration proceeds identically to Barry et al. (2019b). To speed calibration convergence by roughly a factor of 2 relative to what was used in Barry et al. (2019b), we employ a Kalman filter during calibration. Within iterations of the calibration solver, a new guess is determined based on a weighted combination of the previous guess and the current one. The filter is a Bayesian optimization technique where a prediction about the likely location of the solution guides the weighting of the two guesses so as to speed up calibration (as opposed to e.g. a strictly gradient-based approach). The 2013 MWA data had a digital gain jump approximately 2/3 the way through the EoR highband, which required calibration to operate separately on the sections of the band to either side of this discontinuity. The data we selected in 2014 did not require separating the band into two pieces for calibration since the digital gain jump has been corrected in data acquisition. For estimation of the bandpass, we use the autocorrelations in the style of Li et al. (2019), described thoroughly in §5 of that reference. This method avoids pitfalls associated with using a particular reference antenna, which may cause idiosyncrasies of that antenna (e.g. imperfect cable reflection fitting) to be imprinted on the other tiles.

ANALYSIS OF RFI STATISTICS
We perform an occupancy analysis based on the SSINS flags. We find several interesting features about the time dependence of the RFI occupancy. We also use these flags in order to study the brightness statistics of RFI after calibration. We then use these time and brightness properties in order to conduct jackknife tests with cylindrical power spectra. We find that there is a definite signature of ultra-faint RFI in the cylindrical power spectrum that is most visible in sub-hour length integrations. The effect on the power spectrum is the main topic of §4, however we comment briefly here on possible reasons for an enhanced RFI signature in the power spectrum at certain integration depths.
Our sub-hour integrations tend to consist of observations from the same night or relatively few nights, and oftentimes observations that are contiguous with one another. If these contiguous observations are flagged due to a common RFI reflector, then it is possible that the RFI signal might integrate coherently for those observations. For example, if an airplane moving due N-S is reflecting DTV interference, then the E-W baselines in particular will observe no fringing of this source and thus average coherently. In longer integrations, we are likely to combine more RFI sources since we include more nights of data. If different RFI sources tend to average incoherently, then deep integrations may dilute the RFI signal and therefore produce smaller power spectrum contamination (Wilensky et al. 2020). Our observations therefore seem to support the idea that independent RFI sources tend to average incoherently. This is our leading hypothesis as to why we seem to see more contamination in sub-hour integrations compared to longer ones.
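The distinction between coherent and incoherent averaging can be made concrete with a toy model, offered here as an illustrative assumption rather than a description of the actual contamination. Suppose each of $N$ averaged observations contributes an RFI term of equal amplitude $A$ and phase $\phi_j$ to a given visibility; the symbols $A$, $N$, and $\phi_j$ are introduced only for this sketch.

```latex
\begin{align}
  \bar{V} &= \frac{A}{N} \sum_{j=1}^{N} e^{i\phi_j}, \\
  \text{coherent } (\phi_j = \phi \ \forall j): \quad
    |\bar{V}|^2 &= A^2, \\
  \text{incoherent } (\phi_j \text{ independent, uniform on } [0, 2\pi)): \quad
    \mathrm{E}\!\left[|\bar{V}|^2\right]
    &= \frac{A^2}{N^2} \sum_{j,k} \mathrm{E}\!\left[e^{i(\phi_j - \phi_k)}\right]
     = \frac{A^2}{N}.
\end{align}
```

In the incoherent case only the $j = k$ terms survive the expectation, so the residual power falls as $1/N$, whereas a coherently integrating source retains its full power regardless of $N$. In this toy model, a small number of contiguous observations sharing a source behaves like the coherent case, while many nights with independent sources behave like the incoherent case.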

Summary of Flagging Settings
The match-filter in SSINS can be programmed to search for occupants that span arbitrary contiguous frequency ranges in the observing band. We generally reserve this feature for physically or empirically motivated frequency ranges, since searching every possible frequency range creates massive computational overhead. The most common occupants for this observing band are digital television and whole-band "streaks" of uncertain origin (Wilensky et al. 2019). [6] Hereafter, we will refer to these whole-band streaks as "broadband" RFI. Based on the Western Australian digital television allocations, we search for four 7-MHz-wide DTV signals, all adjacent, starting at 174 MHz; these are designated as channels 6-9. Channel 9 begins at 195 MHz, so we only observe 2.8 MHz of its allocated bandwidth. We also search for narrowband occupants and broadband RFI.
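The DTV frequency ranges follow directly from the numbers above, as the following minimal sketch shows. The constant names, the helper functions, and the dictionary layout are illustrative assumptions and do not reflect how shapes are configured in SSINS itself.

```python
# DTV allocations searched by the match filter, derived from values in the text.
BAND_MHZ = (167.1, 197.8)     # observing band edges
DTV_START_MHZ = 174.0         # lower edge of channel 6
DTV_WIDTH_MHZ = 7.0           # width of each DTV allocation
DTV_CHANNELS = (6, 7, 8, 9)   # adjacent channel designations


def dtv_allocations():
    """Return {label: (low_MHz, high_MHz)} for each adjacent DTV channel."""
    allocations = {}
    for i, chan in enumerate(DTV_CHANNELS):
        low = DTV_START_MHZ + i * DTV_WIDTH_MHZ
        allocations[f"TV{chan}"] = (low, low + DTV_WIDTH_MHZ)
    return allocations


def observed_bandwidth(low, high, band=BAND_MHZ):
    """Width in MHz of the overlap between an allocation and the observing band."""
    return max(0.0, min(high, band[1]) - max(low, band[0]))
```

Applying `observed_bandwidth` to the TV9 entry shows how the upper band edge truncates that allocation, while channels 6-8 fall entirely inside the band.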
Recall from §2.3 that the SSINS z-scores are calculated from the 2-minute time-average of incoherently baseline-averaged visibility differences, and reflect a probability under a null hypothesis for pure thermal noise to achieve a deviation from the mean at least as large as the data's. Under that null hypothesis, we expect less than 1 datum with z-score greater than about 4.2 in a single observation's SSINS. For all shapes other than broadband RFI, we use a significance threshold of 5, i.e. a (potentially frequency-averaged) sample whose z-score is 5 or greater is considered as contaminated. For broadband RFI, we use a significance threshold of 10. We use a greater significance threshold for broadband RFI since the data exhibit lightly nonstationary thermal noise as a result of the receiver refrigeration cycle, which results in a significant number of false broadband detections at z-scores less than 10 (Wilensky 2021).
[6] The uncertainty stems from the fact that the contaminants are not band-limited, and can therefore come from a number of hypothetical sources. Some occurrences have been clearly associated with extremely bright ORBCOMM (Wilensky 2021), but this only accounts for a small fraction of instances.
Table 2. Association of shapes in the match filter, their frequencies, and the significance thresholds used in the filter. The TV9 shape is allocated all the way to 202 MHz, but our observing band cuts off at 197.8 MHz. Narrowband shapes are 1 fine channel wide and allowed anywhere in the band.
The SSINS match filter proceeds iteratively in that when it detects an RFI event, it excises that data and recalculates the z-scores of the remaining data before searching for the next strongest RFI event. Before recalculating the z-scores, we flag a frequency channel for the entire observation if its occupancy is greater than 60% and extend flags in frequency. This combination means that if any RFI occupant is found for 60% of an observation, the entire observation is flagged. We summarize these settings in Table 2.
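The per-channel occupancy rule can be sketched as below. This is a hedged illustration of the rule as stated in the text, not the SSINS implementation; the function name and the (times, frequencies) boolean flag layout are assumptions, and the frequency-extension step is omitted.

```python
import numpy as np


def flag_high_occupancy_channels(flags, threshold=0.6):
    """Flag a channel for all times if its flagged fraction exceeds threshold.

    flags: boolean array of shape (n_times, n_freqs), True where flagged.
    Returns a new boolean array; the input is not modified.
    """
    occupancy = flags.mean(axis=0)            # flagged fraction per channel
    out = flags.copy()
    out[:, occupancy > threshold] = True      # flag whole channel in time
    return out
```

When this step is followed by extending flags across the band, a single heavily occupied channel is enough to remove the whole observation, which is the combined behavior described above.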

RFI Occupancy Analysis
Any time data is excised by the SSINS match filter, the time, frequencies, and shape label of the most recently identified RFI event are recorded. We analyze this flagging metadata to understand the RFI content of the data set. First we discuss a few important caveats about this flagging metadata related to misclassification, which are explored in substantially more detail in Wilensky (2021). Then we present the occupancy results.

Flagging Nuances
Below we list a few nuances about the SSINS flags that are important to understand when interpreting trends within the flagging results.

1. Prioritizing excision over classification: While SSINS does associate each RFI event with a label, there can be misclassifications that produce identical flags, and we do not formally check for this in every possible instance. For example, 3-4 DTV events occurring simultaneously may be classified as a broadband event since combinations of DTV events are not checked in the match filter. This will produce the same flags after frequency extension. We judge the rate of this misclassification to be low based on hand-grading the SSINS flags for each observation (prior to frequency extension).
2. Assumption of stationary noise: The z-scores in SSINS are calculated assuming weak stationarity (specifically constant mean and variance in this case). When bright sources enter or exit sensitive parts of the beam, the variance of the thermal noise can change significantly on timescales associated with the beam crossing time, though this effect is insignificant for the 2-minute observations in this work. In the MWA Phase I, there are 16 receivers, each of which is responsible for digitizing and channelizing the voltage signals of 8 tiles. A temperature-dependent gain in the amplifiers (Barry 2018) causes smooth temporal variations in the noise variance on time scales associated with the receiver refrigeration cycle (a few minutes; Wilensky 2021). Since SSINS flags are generated on uncalibrated data, this cycle tends to impart a gradient in the z-scores that is minuscule but coherent in the SSINS over the entire observing band. The coherency leads to an overreporting of broadband events at a given significance. We counter this by increasing the significance threshold for broadband events. We expect this produces an average false positive rate of less than 2%.

3. Insufficient time samples: In heavily contaminated observations, having fewer time integrations with which to estimate the mean per frequency and polarization prevents the z-scores from being approximately Gaussian. What we generally find is that the z-scores calculated by SSINS for simulated stationary Gaussian noise are more concentrated than a standard normal random variable, which can lead to false negatives. For this reason, we completely flag all observations where greater than 60% of integrations have been flagged.

Occupancy Results
In this section we discuss the RFI occupancy of the data set as reported by SSINS. We find that the overall distribution of RFI occupancy is relatively unaffected by the ionospheric cut, and therefore only show distributions before the ionospheric cut so that we have a larger sample size.
As an example of fairly typical flagging results for moderately bright DTV contamination, we show Figure 3. This shows the raw SSINS and its z-scores for a 112-second observation in the data set. There is an initial flagging mask that flags the edges of the coarse channels as well as the first integration. This latter flagging set is a holdover from previous analyses, where the correlator would occasionally produce oddities in the data at the very beginning of the observation. We did not check whether this behavior was still present, and flagged the beginning of each observation just in case. The DTV contamination is most obvious in the mean-subtracted version, where it is clear that it lasts at least 20 seconds. The z-scores shown are based on a time-average of the unflagged data, which biases the z-scores low for the rest of the integrations in the channels occupied by the DTV. Since the flagger recalculates z-scores after each event is identified (Wilensky et al. 2019), this initial bias does not lead to extra false detections. The middle row shows the results of the SSINS match filter, indicating that the event lasted roughly 40 seconds. In imaging experiments for DTV events that appear in SSINS similarly to these, we find that they are usually some sort of aircraft reflection, often appearing in the second southern sidelobe of the beam. The bottom row shows flags after frequency extension. We use these extended flags to develop the flagging time series used in the statistical analysis in this section.
In Figure 4, we show a histogram of the TV occupancies for all of the 112-second observations before the ionospheric cut. We observe that channels 6, 7, and 8 have very similar occupancy distributions, while channel 9 is much rarer to observe, appearing in only 3.5% of observations. Examining broadcasting stations in Western Australia, we find that channels 6, 7, and 8 are used relatively equally, while channel 9 is used less often. Since we only observe 2.8 MHz of channel 9's allocation, SSINS is also less sensitive to its presence.
Since the majority of the observing band is allocated for DTV, we expect very few narrowband events, which is consistent with what we observe. Curiously, when we do observe narrowband events that are clearly true positives, they are often within the DTV allocation. Some of these events are extremely bright, and may be a result of out-of-band RFI clipping the ADC or causing intermodulation products. Alternatively, there could be locally generated RFI.
Since we suspect a significant number of the RFI events recorded by SSINS are due to reflections from aircraft (Wilensky et al. 2019), we seek to understand the timelike properties of the RFI occupancy. First, we notice that the different TV channels are sometimes flagged simultaneously, and flags tend to appear in clusters within a night. By constructing the correlation function of the time series of flags belonging to different channels, we sometimes observe that flags for different TV channels are strongly correlated within a night; however, not all nights have such correlations. Not all broadcasting stations in Western Australia transmit at the same frequencies, and there is probably a range of aircraft trajectories that intercept these various stations. Furthermore, direct reception of the signals is possible.
For instance, there is a night with extremely high TV occupancy on August 27, clearly visible in the occupancy scatter in Figure 5. The SSINS of this night show extremely strong, persistent DTV interference. Since the RTS could not calibrate these observations, no ionospheric metric could be gathered to assess the possibility of long-range ionospheric-based reception such as "sporadic-E" propagation. This could be a tropospheric ducting event allowing for long-range direct reception of the RFI (Sokolowski et al. 2016); however, we have not cross-referenced any sort of weather database or performed any other type of analysis that allows us to understand the exact origin of this extreme event.
Combining all of the aforementioned properties, we expect that we sometimes observe different DTV signals simultaneously, but not always. The flags reflect this in their two-point statistics and this expectation is corroborated by manual inspection of the SSINS.
In Figure 5, we show the total occupancy of each observation with the flags extended. Here, the clustering behavior of the flags is more apparent. To also illustrate this clustering, we show a histogram of interarrival times between SSINS flags (time elapsed between flagged integrations) in Figure 6. We do not include the night with strong, persistent interference. We also show two models with parameters estimated from the data: a Poisson process based on the mean flagging rate and a Markov model based on the observed transition rates between flagged and unflagged data. Over 80% of the mass of this histogram is contained at the sample spacing. If we model each night as its own Poisson process (i.e. each night has a different false positive rate, and the flags are dominated by false positives), the resulting Poisson mixture model fails to predict the abundance of mass at the sample spacing and also decays much faster than the data, which means it underpredicts long gaps between flagged data. This suggests clustered flagging with a range of gap sizes rather than a simple mixture of Poisson processes. If we perform a Kolmogorov-Smirnov test, we obtain a test statistic value of about 0.7, which corresponds to the discrepancy at short interarrival times, highlighting the Poisson model's failure to predict clumps. This corresponds to a p-value that is effectively 0 at floating-point precision, i.e. there is almost no probability that the empirical CDF of a collection of random samples from the Poisson mixture model would appear at least as discrepant as the data.
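A rough single-rate illustration shows why a Poisson-like model cannot reproduce the mass at the sample spacing. This is a simplification for intuition only, not the per-night mixture used in the comparison above. If each integration were flagged independently with probability $p$, the interarrival time $k$, in units of the sample spacing, would follow a geometric distribution:

\[
P(k) = p\,(1-p)^{k-1}, \qquad P(k=1) = p .
\]

For a flag occupancy of order 10%, only about 10% of the interarrival times would fall at the sample spacing, compared with the more than 80% observed.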
To probe the clumping properties of the flags, we examine the transition probabilities in the time series. What we find is that unflagged data is almost always followed by more unflagged data at a rate of 98%, while flagged data is followed by more flagged data at a rate of 80%. Since the flag occupancy is relatively low (∼10%), this manifests as long runs of unflagged data with small-to-medium sized clumps of flagged data. We find that such clumps can be modeled using a discrete time Markov chain with transition rates matched to the data. Each night we generate a time series using that night's transition rates, and then we histogram the resulting interarrival times. Specifically, we begin with an initial seed state, $x_0$, which takes the value 1 or 0 (1 indicates that it represents a flagged datum, while 0 indicates otherwise). We then sample from a Bernoulli distribution with probability of success equal to $P(X_1 = x_0 \mid X_0 = x_0)$. For example, if $x_0 = 1$ and the Bernoulli sample is a success, then $x_1 = 1$ as well, otherwise $x_1 = 0$. We then repeat this process to generate the chain for each night. While the realization of the season of interarrival times is noisy, it appears to emulate the interarrival time distribution. This quantitatively supports that the apparent clumping of the flags is indeed real.
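The chain construction described above can be sketched in a few lines. This is a minimal illustration, assuming the season-level transition rates quoted in the text (98% and 80%) rather than per-night rates; the function names are hypothetical and this is not the analysis code.

```python
import numpy as np

def simulate_flag_chain(n, p_stay_unflagged=0.98, p_stay_flagged=0.80, x0=0, seed=0):
    """Two-state Markov chain: 1 = flagged, 0 = unflagged."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n, dtype=int)
    chain[0] = x0
    for i in range(1, n):
        prev = chain[i - 1]
        # Bernoulli trial whose success means "stay in the current state".
        p_stay = p_stay_flagged if prev == 1 else p_stay_unflagged
        chain[i] = prev if rng.random() < p_stay else 1 - prev
    return chain

def interarrival_times(chain):
    """Gaps (in samples) between consecutive flagged samples."""
    return np.diff(np.flatnonzero(chain))

chain = simulate_flag_chain(200_000)
gaps = interarrival_times(chain)
occupancy = chain.mean()                 # long-run flagged fraction
frac_adjacent = np.mean(gaps == 1)       # mass at the sample spacing
# Stationary occupancy implied by the two transition rates.
stationary = (1 - 0.98) / ((1 - 0.98) + (1 - 0.80))
```

With these rates the implied stationary occupancy is about 9%, and roughly 80% of the interarrival times sit at the sample spacing, in line with the occupancy and histogram mass quoted in the text.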
The RFI flag time series for each night is not entirely consistent with the Markov chain; they differ in their two-point statistics. The Markov chain is memoryless, and so any given realization generally does not produce an autocorrelation function with strong peaks away from zero lag (difference in time between samples). On the other hand, many of the RFI flags exhibit discernible peaks at a range of lags, though some nights with comparatively few flags appear somewhat consistent with the Markov realization. We show an example in Figure 7. This suggests that flags from SSINS are not entirely random point processes with only short-range correlations. These additional peaks in the autocorrelation function may occur when reflectors transit different sidelobes of the beam, or may suggest multiple reflectors with some lag between them such as multiple aircraft on the same flight path. The additional peaks also appear to have a characteristic width from night to night, possibly indicating a typical clump size related to the speed of reflector transit through the beam.
It is clear from the transition probabilities, interarrival times, and two-point statistics of the RFI flag time series that RFI events tend to temporally cluster. This motivates a particular style of jackknife test, where observations with no RFI flags are compared to observations that contained some RFI flags. The logic is that an observation with RFI flags is more likely to contain RFI just beneath the sensitivity of SSINS than observations with no flags at all. In the next section, we develop a series of jackknife tests based on the brightness of the flagged emitters with this physically motivated assumption in mind.
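For reference, a stationary two-state Markov chain has an autocorrelation function that decays geometrically with lag. Writing $a = P(X_{t+1}=0 \mid X_t=0)$ and $b = P(X_{t+1}=1 \mid X_t=1)$ (notation introduced here for illustration), the standard result at lag $\ell$ is

\[
\rho(\ell) = (a + b - 1)^{\ell} .
\]

Taking the season-level rates quoted earlier as illustrative values ($a = 0.98$, $b = 0.80$; per-night rates will differ) gives $\rho(\ell) = 0.78^{\ell}$, which drops below 0.1 by a lag of 10 samples and has no secondary peaks.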

RFI Brightness Analysis
The excess contamination in the power spectrum from RFI responds quadratically to its flux. Since SSINS uses incoherently averaged time-differenced visibilities, it is not immediately apparent how to exactly relate the brightness in the SSINS to the brightness in the visibilities. However, the amplitude of a visibility difference can be no more than the sum of the brightnesses of the two samples that were differenced. In more detail, if the two samples are out of phase by exactly π, this upper bound is achieved. However, if the difference in phase and amplitude is small, then the amplitude of the difference will also be small. This relationship may not be constant over the set of baselines; however, the samples identified by SSINS will fall more often in the former case than the latter with the exception of extraordinarily bright RFI (i.e. SSINS will select for large phase or amplitude differences). Therefore, we can use the SSINS amplitudes as a proxy for the brightness in the visibilities.
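The bound described above can be written explicitly. For two consecutive visibility samples $V_1 = A_1 e^{i\phi_1}$ and $V_2 = A_2 e^{i\phi_2}$ (notation introduced here for illustration),

\[
|V_1 - V_2|^2 = A_1^2 + A_2^2 - 2 A_1 A_2 \cos(\phi_1 - \phi_2) \le (A_1 + A_2)^2 ,
\]

with equality when $\phi_1 - \phi_2 = \pi$. When $A_1 \approx A_2$ and $\phi_1 \approx \phi_2$, the cosine term nearly cancels the first two terms and the difference amplitude is small.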
In Figure 8, we show histograms of SSINS amplitudes as well as z-scores for the entire season, separated by those samples identified as contaminated (flagged) and those that were not (unflagged). In the top panel we show SSINS amplitudes. Due to the central limit theorem, we expect that the per-frequency distribution of the purely thermal samples in the SSINS follow a Gaussian distribution. However, when all frequencies and many observations are combined in mixture, we observe a highly non-Gaussian distribution. In the bottom panel, the unflagged z-scores, which are calculated per frequency and per observation, are highly Gaussian, reflecting the statistical assumptions underlying the SSINS pipeline. The unflagged samples have brightness between 33 and 52 Jy. On the other hand, the flagged samples have amplitudes that range from 30 to 450 Jy. The z-scores of the flagged samples exhibit a highly non-Gaussian distribution.
Many of the flagged z-scores are far below the significance threshold. This occurs when the sample was flagged as a part of a shape in the match filter. These shapes are determined from a combination of official allocations from the Australian Communications and Media Authority⁷ and empirical verification of the presence of such RFI shapes in our data (Offringa et al. 2015; Barry 2018; Wilensky et al. 2019). Some shapes are significantly broader than others. Broader RFI can be found at a lower average flux density compared to narrower shapes due to the fact that broad clusters of positive (or negative) outliers would exist only with vanishing probability in a purely thermal SSINS. Thus, the z-scores at the single channel level can be very low when their corresponding measurements are identified as part of a broad shape.
These data are perhaps better visualized in Figure 9, where we show the same quantities as in Figure 8, but emulate the match filter process by summing over frequency and histogramming the maximum absolute deviation across the polarization axis of the averaged scores. The top panel now appears obviously bimodal for both the flagged and unflagged data. Remaking this plot per pointing shows that the mode above 40 Jy comes from the eastern-most pointing we selected (designated -2 in this work), while the other mode is a combination of the other four pointings. The thermal background in the bottom panel now takes the form of an extreme value distribution for a Gaussian of sample length equal to 4 (number of polarizations), and the recalculated flagged z-scores are offset from zero. Many still exist below the significance threshold. This happens because we flag uncalibrated data, but these data have been calibrated before being histogrammed. This can slightly change the z-score calculation since calibration corrects relative amplitude variations between different tiles, ultimately affecting the time dependence of the SSINS and therefore the z-scores (Wilensky 2021).
We argue that these are still likely to be true positives. The frequency-summing step in the SSINS match filter reduces the sample size in consideration by the number of frequency channels in the observation. This means that substantially fewer extreme values are expected in sub-band sums, and so a significance threshold of 5, which was set based on the sample size before summing, gives a wider margin of outliers compared to what is expected from the thermal background. More importantly, we find that observations with only these low significance values are still correlated with excess EoR window power compared to their clean counterparts. We show these in the next section by constructing jackknife tests designed to investigate the possibility of residual RFI in observations flagged by SSINS.
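The extreme value form of the thermal background can be checked numerically. The sketch below assumes the four polarization scores are independent standard normal variables, which is an idealization; the function name `max_abs_cdf` is introduced here for illustration. Under that assumption the maximum absolute value of $n$ scores has CDF $[\mathrm{erf}(x/\sqrt{2})]^n$.

```python
import math
import numpy as np

def max_abs_cdf(x, n=4):
    """CDF of max(|z_1|, ..., |z_n|) for independent standard normal z_i."""
    return math.erf(x / math.sqrt(2.0)) ** n

# Monte Carlo check: maximum absolute score across 4 simulated polarizations.
rng = np.random.default_rng(1)
samples = np.abs(rng.standard_normal((100_000, 4))).max(axis=1)
empirical = float(np.mean(samples <= 2.0))   # empirical CDF at x = 2
analytic = max_abs_cdf(2.0)                  # analytic CDF at x = 2
```

Comparing `empirical` with `analytic` at a few values of `x` gives a quick sense of how far a recalculated score must sit from zero before it is unusual for purely thermal data.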
The level to which residual RFI contaminates a power spectrum is determined by its apparent brightness and frequency structure. The apparent brightness is determined by the transmitter brightness, the circumstances of propagation, and the relative position of the telescope primary beam. The larger population centers of Geraldton and Perth are located to the Southwest and South of the array, respectively. Either of these locations may receive flights from countries Northwest of Australia, which would put reflectors in a more sensitive part of a Westerly pointed beam. Smaller cities and towns are also located due West, such as Carnarvon and Denham.
Since transmitters tend to be located close to population centers, we expect that RFI contamination should generally increase when the telescope is pointed more to the West.
To probe these different variables, we establish a series of jackknife tests where we group observations according to different RFI-related variables and examine their integrated power spectra. Some observations in the data set have no flags reported by SSINS, while some are reported to contain RFI. In some of our tests, we purposefully disable flags before calculating power spectra. It is standard to refer to data that has been identified as suspicious as "flagged" and data that has not been flagged as "unflagged." The latter term has a particularly confusing linguistic structure that seems to imply "undoing flags that were already present" but more often refers to data that was found to be without a need for flagging, i.e. deemed non-suspicious after a rigorous inspection (and indeed this is how we meant it in §3). Since we need to make use of both senses of the word "unflagged," we instead use the following nomenclature in this section when categorizing observations based on their SSINS:
• Pure: Observations for which SSINS found no RFI.
• Absolved: Observations for which SSINS found RFI, and we have applied the flags so that the contaminated data has been excised.
• Repentant: Observations for which SSINS found RFI, but for investigative purposes we have elected not to use the calculated flags so that the contamination is still present.
Because the EoR power spectrum is sensitive to very faint RFI contamination, we want to test whether the data immediately adjacent in time to detected RFI is also contaminated but at a brightness below SSINS' sensitivity. To make this comparison, we compare the power spectra of absolved observations (RFI detected and excised to the best of our ability) to those of LST-matched pure observations with no detected RFI. At this stage, we only consider observations that pass the ionospheric cut. We jackknife by dividing the absolved observations into integration subsets separated by RFI shape reported by the match filter (narrowband, broadband, or any TV channel), telescope pointing (integers from -2 to +2), and reported z-scores of the RFI events (bins with edges 3, 5, 10, 100, 1000, 10000). For each absolved observation in a subset, we find a matching pure observation at a similar sidereal time. We then integrate these subsets, form power spectra, and examine them side by side. This produced 160 power spectra per polarization (80 pure/absolved pairs) over the 1285 observations that passed the ionospheric cut. Before examining our jackknife axes, we examined each power spectrum individually, paying special attention to the EoR window where we expect RFI contamination to be most obvious if at all present. Of these 160 observation sets, we identified 46 observation sets, 32 absolved and 14 pure, that demonstrated clear excess power in the lower left corner of the EoR window. The excess power is "signal-like" in the sense that it appears to vary smoothly as a function of $k_\perp$ and $k_\parallel$ until it seemingly becomes noise dominated at very high $k_\parallel$. This feature can vary in power anywhere from $10^9$ to $10^{11}$ mK$^2$ $h^{-3}$ Mpc$^3$, with stronger features generally appearing in shallower integrations. Since most of these integration subsets are relatively short (approximately 1 hour or less), a signal-like component in the window is surprising and indicative of systematic contamination.
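Two of the bookkeeping steps above, binning by z-score and pairing each absolved observation with a pure one at a similar sidereal time, can be sketched as follows. The helper names are hypothetical, LSTs are assumed to be in hours, and nearest-neighbor matching with reuse allowed is an assumption; the text does not specify the matching rule.

```python
import numpy as np

# Bin edges for the reported z-scores, as listed in the text.
Z_EDGES = np.array([3, 5, 10, 100, 1000, 10000])

def zscore_bin(z):
    """Index of the z-score bin (0 for [3, 5), ..., 4 for [1000, 10000))."""
    return int(np.digitize(z, Z_EDGES)) - 1

def match_lst(absolved_lst, pure_lst):
    """For each absolved LST (hours), index of the nearest pure LST on a 24 h circle."""
    a = np.asarray(absolved_lst, dtype=float)[:, None]
    p = np.asarray(pure_lst, dtype=float)[None, :]
    d = np.abs(a - p) % 24.0
    d = np.minimum(d, 24.0 - d)      # wrap around midnight in sidereal time
    return d.argmin(axis=1)

bins = [zscore_bin(z) for z in (3.0, 7.0, 50.0, 2500.0)]
pairs = match_lst([23.9, 1.0], [0.0, 1.2, 12.0])
```

Here `bins` evaluates to `[0, 1, 2, 4]`, and `pairs` assigns the 23.9 h observation to the pure observation at 0.0 h, since the two are only 0.1 h apart on the circle.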
This feature almost exclusively appeared in power spectra made from only E-W polarized dipoles and has similar morphology across all identified sets. For the 14 pure observation sets we identified, 12 of their absolved counterparts were also identified in the 46 identified sets (that is, 24 of the 46 sets were LST-matched pure/absolved pairs). The power is usually substantially worse in the absolved sets. The significantly larger severity and prevalence of this feature in the absolved sets suggests that this signature is likely to be residual RFI unidentified by SSINS.⁸ To further assess the probability of this hypothesis, we co-examined pure/absolved pairs across axes of our predetermined subsets. If the severity or prevalence of the effect is enhanced for choices of RFI parameters that should enhance them, then the evidence for the RFI hypothesis is increased.

[Figure caption, continued] The depth of integration ranges by about a factor of 2.5. All of these power spectra appear noise-dominated in the EoR window (region above the solid black line) in that there is a roughly even speckling of positive and negative data. Bottom: Corresponding absolved power spectra, where any RFI type is included so long as the observation has RFI z-scores that lie between 10 and 100. We see a signal-like component in the EoR window. Rather than an even speckling of positive and negative data, we see consistent, smoothly varying positive power up to and occasionally past the first coarse band harmonic. This suggests residual RFI that was uncaught by SSINS. This contamination is generally worse in the later pointings that point more West. There is no clear correlation between integration depth and contamination levels.

Power Spectrum Jackknife Test Analysis
In Figures 10-15, we show several representative examples of such comparisons across each of our jackknife axes. In each comparison, we hold two of the three jackknife axes fixed and compare across the third (e.g. Figure 10 holds z-score and RFI shape fixed and lets pointing vary). While some comparisons clearly support our premise, others are more anomalous though not inexplicable. We show one more test in Figure 16 in which we compare the absolved and repentant forms of a single DTV-affected observation set (i.e. we make power spectra with and without flags for the same observations), and note an obvious enhancement of the contamination in the repentant power spectrum. The enhancement has the same morphology as the feature we identify in our other jackknife tests, even those with more dubious results. Overall we find the results of these tests to be highly compelling and supportive of the idea that this power spectrum feature is the signature of ultra-faint RFI. In what follows, we walk through the specific comparisons associated with each figure in more detail.

[Figure caption, continued] This comparison demonstrates two features that were common among our 2d power spectrum jackknife tests. First, excess power is more noticeable for observations in which the identified contaminants had relatively low SSINS z-scores. Since the number of contributing observations ranges from 1 to 32, there is significant variation in the noise levels of each integration, which makes the significance of excess power from residual RFI harder to discern from this Figure alone. Figure 13 makes this claim more evident. Second, the +2 pointing appears to be the most obviously contaminated pointing, and contamination can sometimes even be seen in the pure observations. For example, the corresponding pure integration for the power spectra whose SSINS z-scores fall between 10 and 100 as well as the one for z-scores between 3 and 5 appear to have moderate excess window contamination, though less than their absolved counterparts.

Pointing Test
In general, we found that this excess power becomes more noticeable as the array points more Westerly. As an example, we show a sequence of pointings for integrations over observations with any RFI type in them whose z-scores were all between 10 and 100 in Figure 10. The excess window power suggests that absolved observations possess residual RFI uncaught by SSINS, i.e. ultra-faint RFI, that makes a noticeable effect on the power spectrum measurement. In the full set of integrated power spectra, we see the pointing trend occur regardless of RFI type, and in most z-score bands. We also note that 13 of the 14 pure subsets we initially identified as containing the excess window contamination came from the 2 most Western pointings. Supposing this is caused by ultra-faint RFI, we expect more prevalence for Western pointings physically since there are more transmitter sites and population centers towards the West of the array than towards the East, and so pointing the telescope beam more West will put associated RFI events in a more sensitive location in the beam.
Since showing every integration comparison available to us in the manner of Figure 10 would be impractical, we show a summary statistic in Figure 11 to demonstrate our claimed trends. For each integration subset, we take the spherically averaged power spectra, excluding contributions from the foreground wedge and coarse band harmonics, and divide the power at each wave mode by its noise level as propagated by εppsilon. This choice of weighting allows us to more easily compare integrations with varying amounts of contributing observations.
For the top two panels, we calculate the empirical cumulative distribution function for the absolute value of this quantity, collated by pointing and whether or not the observation set is pure or absolved. In other words, for each distribution function, the pointing and pure/absolved status is fixed, and the brightness within the SSINS and RFI type are unrestricted. We expect that the noise is, at least approximately, zero-mean and Gaussian distributed (Wilensky et al. 2023). Therefore if the data were strongly noise-dominated, we would expect this distribution to resemble a folded standard normal distribution (i.e. a normal distribution with mean zero and standard deviation 1, folded over the vertical axis since we are looking at the absolute value). What we observe is that neither the pure nor absolved observation sets appear noise-dominated; deviations from zero are greater than expected under that hypothesis. Importantly, the absolved data (top row) are clearly ordered by pointing from East to West for weighted power values greater than about 1 (though zenith and +1 track each other almost perfectly). The pure data, on the other hand, are not clearly ordered by pointing until the very highest power values. In summary, absolved power spectra from more Westerly pointed data appear consistently more outlying in the EoR window than spectra from more Easterly pointed data, and power spectra from pure data do not show this as obviously or for as great a range of powers.
In the bottom panel, we compute the (signed) difference between the inverse-noise-weighted spherical power between corresponding absolved and pure power spectra and form the cumulative distribution functions collated in the same way as the top two panels, but without taking the absolute value. In this case, we would expect that if the pure and absolved sets were biased by the same amount relative to the noise (potentially indicating similar residual systematics, though this comparison is complicated), then the difference would be noise-like and appear normally distributed with standard deviation equal to $\sqrt{2}$. We find in this panel that the mass of the data is almost exclusively to the right of this null hypothesis, except in the -2 pointing, indicating that absolved power spectra are generally more positively outlying than pure power spectra. Thus, there is a clear statistical difference between the pure and absolved data, and we suggest that this difference is caused by residual RFI.
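Both null expectations used in Figure 11 can be reproduced with simulated noise. The sketch below assumes independent, unit-variance Gaussian noise-weighted powers for the pure and absolved sets, which is an idealization; the function names are introduced here for illustration.

```python
import math
import numpy as np

def folded_normal_cdf(x):
    """CDF of |z| for a standard normal z (the noise-dominated expectation)."""
    return math.erf(x / math.sqrt(2.0))

def ecdf(samples, x):
    """Empirical cumulative distribution function evaluated at x."""
    return float(np.mean(np.asarray(samples) <= x))

rng = np.random.default_rng(2)
pure = rng.standard_normal(50_000)       # simulated noise-weighted powers
absolved = rng.standard_normal(50_000)   # independent realization

ecdf_at_1 = ecdf(np.abs(pure), 1.0)          # top/middle panel analogue
diff_std = float(np.std(absolved - pure))    # bottom panel analogue
```

Under these assumptions `ecdf_at_1` approaches `folded_normal_cdf(1.0)` (about 0.68) and `diff_std` approaches $\sqrt{2}$. A measured distribution shifted to the right of these curves is what the text interprets as excess power.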
We generally find that the excess power is far more noticeable in the East-West polarization than the North-South. We know from inspection of the SSINS that RFI is often seen as unpolarized, i.e. roughly equal strength in all four instrumental polarizations. In some specific instances it can appear polarized. Usually if it appears polarized, it appears stronger in the East-West dipoles. Occasionally it will appear stronger in the North-South dipoles. The apparent polarization of the RFI source is a combination of its broadcast polarization, some propagation effects, and perhaps most importantly, its geometric location relative to the array.
Since a dipole has no sensitivity along its axis, RFI from the Southern or Northern horizon will appear to the array as if it is East-West polarized, and vice-versa for the Western and Eastern horizons. There are a number of digital television transmitter sites in Western Australia, and therefore a range of possible propagation directions for DTV RFI. The strongest transmitters are in Perth, which is to the South and slightly West. However, direct reception is extraordinarily unlikely due to the extreme remoteness of the MWA. It is more likely that reflections off of aircraft or other transient phenomena make up the bulk of DTV receptions. For example, the largest population center near the MWA is Perth, and most flights that could reflect DTV to the MWA are probably headed towards there (or perhaps towards Geraldton, which is slightly Southwest of the array). Some of these reflections occur when the aircraft is near its destination, and so the RFI source will appear in the Southern sidelobes, e.g. Wilensky et al. (2019). Since this is also where the strongest transmitters are, we expect more noticeable power spectrum contamination in the East-West polarization.

Figure 13. Cumulative distribution functions in the same style as Figure 11, but this time collated by the z-score of the events found by SSINS. In this case, the absolved observations appear to fall into two classes. The observations with extremely bright events do not appear to produce significantly outlying power spectra, whereas observations with only faint or moderately bright RFI appear to produce more strongly outlying power spectra. The pure power spectra also seem divided in this way, although the distinction is less obvious. Since observations with lower SSINS z-scores are more plentiful, this trend that is present in both the top and middle panels may indicate a systematic that integrates somewhat coherently over the time scales used in these jackknife tests. We again see that absolved power spectra are more often positively outlying than pure ones.

SSINS z-score Test
In Figure 12, we show power spectra in the +2 pointing, which is the most commonly contaminated pointing in our jackknife tests, separated by the SSINS z-scores of the identified events without discriminating between RFI types. Interestingly, we see that integrations of observations with high SSINS z-scores do not display the characteristic excess power in the window. However, this may be due to the fact that such high z-scores are relatively rare, and that there is significant variation in the noise levels between different members of the jackknife test. It may be that the noise must be integrated down before an obvious RFI signal appears in the window. We also remark that all integrations with RFI z-scores below 100 have a corresponding power spectrum made from pure observations that demonstrates some excess power in the window, though not as severely as the power spectrum for absolved observations. This suggests the presence of false negatives in the SSINS pipeline that can have a significant effect on the power spectrum measurement.
To summarize the jackknife tests across the SSINS z-score axis, we show another collection of cumulative distribution functions in Figure 13. This is the same data as in Figure 11, but it has now been collated according to the z-score of the events that were found by SSINS. We find that brighter events in the SSINS do not correlate with stronger outliers in the EoR window. It seems that, within the absolved sets, observations with faint to moderately bright events in the SSINS tend to produce the most strongly outlying power spectra. Interestingly, the LST-matched pure observations also seem to at least weakly display this trend. Since there are generally more observations with fainter events, the power spectra corresponding to the fainter z-score bins usually have more integration time. One might expect this behavior in the presence of a systematic that averages coherently at moderate integration depths. While the bottom panel of Figure 13 shows that window power is generally more outlying in the absolved power spectra, we speculate all of the observations are affected by a residual systematic that averages coherently in sub-hour length integrations. In a test discussed later in this section, we find that the window contamination displayed in the 2d power spectra so far is enhanced when RFI flags are turned off. Given the similarity of contamination between the two rows of Figure 12, we suspect that this coherently integrating systematic is indeed residual RFI that is generally worse in the absolved observations but still present in the pure observations.

RFI Shape Test
From the theoretical discussion in Wilensky et al. (2020), we expect that different RFI types might have different power spectrum contamination shapes. In Figure 14, we show a jackknife test over RFI shape as determined by SSINS. In general, we find no obvious power spectrum difference among the shapes, contrary to expectation. For instance, we expect narrowband contamination to be approximately constant as a function of k ∥. Even in more dramatic cases with substantially more noticeable residual interference than is shown in Figure 14, the narrowband RFI does not produce a noticeable constant pattern. If we compare to Wilensky et al. (2020), we see that narrowband RFI power spectra at a given flux density are generally lower than their DTV counterparts, but also that they are constant in k ∥. Since the noise levels are approximately constant in k ∥, we expect that if this were only due to narrowband contamination, then the entire EoR window would show clear contamination.
If there is residual RFI in this power spectrum, it is not narrow in frequency. For this to be true, it must be something broader that SSINS pathologically misses altogether. It could be a relatively stationary source of DTV interference, such as a distant reflector that is coincidentally moving towards the array or nearly so. This reflector may be simultaneously reflecting multiple RFI signals, some of which are narrow in frequency and identified by SSINS. Alternatively, while the RFI appears narrow in frequency in the SSINS, it is possible that the signal has structure outside of the frequencies where most of its power is concentrated. This would lead to non-constant power spectrum contamination, but would require frequency sidebands bright enough to produce such structure, yet faint enough that SSINS cannot flag it as a broad shape. All of our narrowband subintegrations contained less than 16 minutes of data, and all subintegrations that showed window contamination appeared similarly to the narrowband panel of Figure 14, but had just 2-4 minutes of contributing data. This was the only type of RFI that showed any obvious smooth window contamination at such a shallow integration depth. Since so little narrowband interference is observed, it is hard to speculate confidently about its effect on the power spectrum measurements. However, given that this particular shape does seem associated with absolved observations that have narrowband RFI, we are inclined to think this is some sort of RFI effect.
Figure 14. A power spectrum jackknife test for zenith-pointed observations, where spectra are separated by RFI type. These all have RFI events with z-scores between 10 and 100. We find that the region of contamination seems unrelated to the type of RFI. This may be due to multiple RFI types cohabiting these observations.
Since the broadband events in the jackknife test shown in Figure 14 showed no obvious excess power, we show a power spectrum for a group of absolved observations in the -1 pointing classified as containing broadband interference flagged by SSINS, along with the LST-matched observations in Figure 15. These z-scores were between 10 and 100, indicating that the interference was moderately bright in the SSINS compared to the thermal background. The excess window power is similar to that for other shapes shown in Figure 14, but much fainter. We show this example to point out that we do see some excess power associated with broadband events. However, due to the ambiguous nature of broadband events, as well as the distinct possibility that other RFI may be present in addition to the broadband events caught by SSINS, it is difficult to know if broadband interference in general poses as significant a problem as other types of RFI. We discuss these two points in more detail in what follows.
It is nontrivial to theorize about the expected contamination of broadband interference due to the fact that it could come from several different types of sources. For instance, we have evidence that it could be the sidebands of bright ORBCOMM interference, 40-70 MHz away from the central frequency. The ORBCOMM signal would need to be understood in extreme detail since these sidebands are so far from the allocation. If it were approximately flat-spectrum over the observing band, then it would give no excess power in the window, similar to the foregrounds. Therefore, supposing these signals are indeed responsible for the excess power, there must be some nontrivial structure in the sidebands of these signals. A related possibility is that the sheer brightness of an RFI source has caused clipping or other nonlinearities in the receiver chain. Another broadband emitter is lightning, which produces emissions over a range of radiofrequency scales (Vine 1987). Some theoretical studies suggest that certain types of lightning events can produce relatively flat emissions over the bands considered in this work (Luque 2017; Shi et al. 2019), although Luque (2017) shows some nontrivial structure in this range. Observations of lightning emission at VHF radio frequencies are numerous (Hare et al. 2018, 2020; Pu et al. 2021; Sterpka et al. 2021; Scholten et al. 2022). In summary, since the particular signal structure behind the broadband events in the MWA SSINS is as of yet undetermined, we find it difficult to hypothesize about the expected shape of power spectrum contamination.
The similarity in contamination between the different shapes could also be due to other factors. First, as noted in §3.2, SSINS is more optimized for flagging than classification, and the occasional misclassification does occur. Second, it is possible that a given reflector is reflecting multiple different transmissions simultaneously or with a short gap between them. This means that any given observation classified as containing some type of interference does not exclude it from containing another type of interference. In other words, there is some amount of overlap between the subsets shown in Figure 14, and this may be responsible for the similarity in contamination shape.

Flags On/Off Test
To more directly investigate the effects of applying the SSINS flags, we examine an integration in which residual RFI was present in absolved observations by forming the power spectrum using the observations in their repentant form and then subtracting the absolved power spectrum. Note that we are using the same observations in each case, and only choosing whether or not to apply the flags. We show the result in Figure 16. While the change in the power spectrum is slight, when we plot the difference we see a clear signature in the window matching the shape of excess power that we have seen so far. The flags produce a 35% difference in data volume, meaning that the noise levels are appreciably different. However, the power difference varies smoothly and is positive for a significant number of adjacent wave modes, indicating the effect is not due strictly to a difference in noise levels. We also see that the flags removed power in the region of the wedge between the dashed and solid lines. The corresponding LST-match of this observation set is shown on the far left of the figure to allow for a comparison in the style of the previous jackknife tests.
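The argument that the difference is not purely a noise effect rests on the sign of the difference being the same over many adjacent modes. A minimal sketch of that kind of check is given here, assuming a hypothetical 1-d cut through the difference power spectrum (unflagged minus flagged) and assuming that, under a pure-noise hypothesis, the sign of each bin is independent and equally likely to be positive or negative. The function names and the example values are illustrative and are not taken from the analysis pipeline; neighbouring bins in a real power spectrum are correlated, so the quoted probability is only a rough guide.

```python
import numpy as np

def longest_positive_run(diff):
    """Length of the longest run of consecutive positive entries."""
    best = current = 0
    for value in np.asarray(diff, dtype=float):
        current = current + 1 if value > 0 else 0
        best = max(best, current)
    return best

def chance_of_all_positive(run_length):
    """Probability that `run_length` specified bins are all positive,
    assuming independent noise that is symmetric about zero."""
    return 0.5 ** run_length

# Hypothetical cut through a difference power spectrum (arbitrary units).
example_cut = [-0.3, 0.2, 0.5, 0.9, 1.1, 0.8, 0.4, -0.1, 0.2]
run = longest_positive_run(example_cut)
print(run, chance_of_all_positive(run))
```

A long run of same-sign bins that is very unlikely under this simple model suggests a smooth, signal-like difference rather than a change in noise level alone.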
Figure 16. A power spectrum test in which two separate power spectra were made for the same observations, once with flags applied and once without flags applied. In the difference plot, blue bins indicate power is higher without flags. We find that applying the SSINS flags removes power from the window in the exact shape as seen in other Absolved power spectra. Additionally, power is removed in the region of the wedge corresponding to the sidelobes, which is where we observe DTV sources in our images. The clean LST-matched observations are shown on the far left as a reference.
The dashed and solid lines in the wedge represent the extent in k ∥ to which sources at the edge of the primary beam and at the horizon throw power due to the chromatic point-spread-function. The region between these two lines corresponds to the sidelobes. In this example, we chose a handful of observations from the +1 pointing that were identified as containing DTV interference. We expect that a DTV source has a smooth component that appears in the power spectrum wedge, and a spectrally sharp component that throws power into the higher k ∥ modes. This power spectrum suggests that the RFI sources are consistently located in our sidelobes, corroborating the imaging experiments in Wilensky et al. (2019).
methods, and that this RFI will bias power spectrum measurements for a wide swath of spherical modes. The severity of this bias in a deep integration compared to its appearance in these jackknife tests depends on the integration properties of the RFI. We expect RFI will not average coherently over many hours of data, but we cannot reliably speculate about the strength of this bias since we do not have a model of unexcisable RFI. However, since we have identified subsets of the data that contain ultra-faint RFI beyond what is identified during pre-processing, we can at least compute the cleanest power spectrum upper limit possible with this data set and examine whether our efforts are likely to have made a difference.
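For orientation, the wedge lines described above can be written using the standard mapping between the angle of a source from the phase center and the maximum line-of-sight mode it contaminates. This is a sketch of the commonly used relation rather than the exact expression adopted in this analysis; the symbols θ (angle from the phase center), D_M(z) (transverse comoving distance), and E(z) = H(z)/H_0 are introduced here for illustration.

```latex
k_\parallel = \sin\theta \, \frac{H_0 \, D_M(z) \, E(z)}{c \, (1+z)} \, k_\perp
```

For a zenith-phased observation, setting θ = 90° gives the horizon line, while setting θ to the angular radius of the primary beam gives the shallower primary-beam line; sources in the sidelobes fall between the two.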

FINAL POWER SPECTRUM AND UPPER LIMITS
Before extracting an upper limit on the 21-cm EoR power spectrum signal, we perform a final reduction based on the results of the jackknife tests. These final eliminations are based on a qualitative assessment of the EoR window in the power spectrum of various integrated subsets. Since our aim is to make a power spectrum measurement, we perform these cuts conservatively so as to avoid selection bias as much as possible.

Final Cuts
First, we remove all absolved observations. This is motivated by the fact that a substantial number of power spectra involving only absolved observations expressed excess window power compared to their pure, LST-matched counterparts. Furthermore, the shape of this excess power matches the shape of the power spectrum difference in the jackknife test shown in Figure 16. We term this shape "the RFI footprint." Finally, the idea that residual faint RFI might temporally neighbour SSINS-identified samples is motivated both by physical considerations and the occupancy study in §3.2. We did not remove any pure observations, even if they were immediately adjacent to absolved ones. This leaves 591 observations, which is about 18.4 hours of data.
Figure 17. As with all other 2d power spectra shown in this work, we show the "dirty" power spectrum i.e. without model subtraction. The Wall of Shame set shows a clear RFI footprint in the spectrum made from the E-W dipoles, but appears noise-limited in the one from the N-S dipoles. On the other hand, the Limit Set power spectra appear similarly regardless of which polarization is used. Specifically, both the E-W and N-S power spectra appear systematically dominated below the first coarse band harmonic in the Limit Set, in contrast to the Wall of Shame set and the majority of shorter power spectrum integrations we inspected. A key difference between this contamination and the RFI footprint is that it does not extend to large k ⊥. The coherent average of the two performs to expectation in that adding the Wall of Shame set does not strongly affect the appearance of the power spectrum from the N-S data, and imprints the characteristic RFI footprint in the spectrum made from the E-W data.
Next, we note that some integrations involving only pure observations also display this excess window power, although this is substantially less common than in their absolved counterparts. Hoping to remove as much RFI as possible before forming an upper limit, we perform cuts on these observations based on manual inspection of window power. From the integration subsets that were already constructed for the previous stages of the jackknife test, we remove all integrations with fewer than 20 observations that have obvious excess window power. We then reintegrate the remaining observations, separating them roughly by pointing and day, such that there are between 12 and 20 observations (24-40 minutes) per integration. This integration depth is based on balancing the interference-to-noise ratio in the EoR window with the desire to be as fine-grained as possible in removing observations with this metric. In other words, we want to use the smallest integration subset possible that allows us to identify ultra-faint RFI. From these new integration subsets, we again remove any whose resulting power spectra show obvious excess window contamination. Since we are wary of selection bias, we attempt to be as conservative as possible during this step. We show the power spectra of all integrations that are removed during this stage in Appendix A. This removes a further 119 observations, which we designate as the "Wall of Shame" set, and leaves 472 observations (14.7 hours), which we call the "Limit Set." We also note that no observations from the +2 (most Westerly) pointing survive this test.
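The bookkeeping of these cuts can be checked with a few lines of arithmetic. The sketch below assumes that every observation lasts 112 seconds, the observation length quoted elsewhere in this paper; the variable and function names are illustrative only.

```python
OBS_SECONDS = 112  # assumed duration of a single observation, in seconds

def hours(n_obs, obs_seconds=OBS_SECONDS):
    """Total integration time in hours for n_obs observations."""
    return n_obs * obs_seconds / 3600.0

n_pure = 591           # pure observations left after removing absolved ones
n_wall_of_shame = 119  # removed after manual inspection of window power
n_limit = n_pure - n_wall_of_shame

for label, n_obs in [("Pure", n_pure),
                     ("Wall of Shame", n_wall_of_shame),
                     ("Limit Set", n_limit)]:
    print(f"{label}: {n_obs} observations, {hours(n_obs):.1f} h")
```

Under this assumption the Limit Set count and the quoted integration times for the 591- and 472-observation sets are mutually consistent.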

Deep Cylindrical Power Spectra
We show deep 2d power spectra from each of the three sets (Wall of Shame, Limit Set, and coherent average of the two) calculated over the entire observing band in Figure 17. The power spectrum estimated from the East-West dipole data for the Wall of Shame set displays the characteristic RFI footprint. On the other hand, the power spectrum made using only the North-South dipole data appears noise-limited in the window. Thus, we see that even when we deeply integrate observations known to have RFI contamination, the footprint appears polarized. This can be understood by considering that the brightest DTV transmitters in Western Australia are nearly due South of the array. Since the North-South dipoles are not sensitive to the Southern horizon, we generally expect the RFI flux to be stronger in the East-West dipoles than the North-South ones.
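As a rough illustration of this polarization argument, and not a description of the actual MWA tile beam (which also includes a ground screen and beamforming across the 16 dipoles of a tile), consider the power response of an ideal short dipole with unit axis vector d̂ to a source in direction ŝ. The symbols here are introduced only for this sketch.

```latex
P(\hat{s}) \propto 1 - \left(\hat{d} \cdot \hat{s}\right)^2
```

For a source due South near the horizon, ŝ is parallel to the axis of a North-South dipole, so the idealized response vanishes, while ŝ is perpendicular to the axis of an East-West dipole, where the response is maximal.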
In contrast, the Limit Set displays power spectra that look very similar between the different dipoles. This is remarkable in the first instance since previous power spectrum upper limits with the MWA are preferentially deeper from the North-South dipoles compared to the East-West ones. It is possible that this long-standing preference is related to the difference in RFI content between dipoles. We could investigate this claim more rigorously by extending these RFI mitigation techniques to previously analyzed data that express the preference. Both polarizations appear to have a systematic source of contamination in the lower-left corner of the EoR window. What distinguishes this systematic from the characteristic RFI footprint is that it does not extend to higher k ⊥ modes. Furthermore, it is relatively equal in power across the polarizations, which would be peculiar for RFI given the Wall of Shame power spectra. We suggest that this is possibly a different systematic effect that is unaccounted for in this analysis. Note that when we coherently average the Limit Set and Wall of Shame, we observe the contamination extend to higher k ⊥ in the power spectrum from E-W data, but not in the one from N-S data.
As a counterargument, we remark that there are many DTV transmission sites in Western Australia other than the aforementioned ones South of the array, distributed in varying directions. Most of these transmission sites are significantly less powerful than the one in Perth, some by multiple orders of magnitude. Since this is a complex scattering problem, it is difficult to deduce exactly how bright each transmitter should appear. However, some basic scaling arguments suggest that the RFI flux should still be dominated by the Southern transmitters. Therefore, it is possible that this could be extremely faint RFI from the dimmer transmitters, while the flux from the brighter transmitters was removed by our repeated selections. However, for this to be true, there would need to be an explanation for why the longer baselines (higher k ⊥ modes) display less power from these hypothetical RFI events. None of the transmitters are close enough for regular direct reception, and therefore the nature of the scatter would have to be different such that the long baselines do not respond as strongly to the RFI. This is physically unusual, since the scatterers are likely to be the same class of object regardless of where the transmitter is. One possibility is that the closer transmitters tend to scatter closer to the array. If the scatterer is sufficiently close, it might appear less point-like and thus the longer baselines might not respond as strongly to the interference. However, this would be an unusual coincidence and is therefore less likely to explain a longitudinal trend in a large data set.
Yet another possibility is locally generated RFI within the array, e.g. from electronics. This type of RFI, depending on its source, might be seen longitudinally in a large data set and may appear roughly equally in the different dipole arms depending on its location. Furthermore, if it is close enough, we might expect it to preferentially affect shorter baselines compared to longer ones, and we might also expect it to preferentially affect some pointings compared to others. However, locally generated RFI is often bright and therefore eliminated early in the data analysis. This would have to be a particularly pernicious form of ultra-faint locally generated RFI. We cannot rule out this hypothesis from this analysis alone. Examining the near-field RFI environment is an important topic of ongoing investigation within the MWA collaboration; however, we consider it outside the scope of this work.

Spherical Power Spectrum Upper Limits
We report the final limits using the 472-observation set. We also integrate the 119-observation set separately to examine how the residual RFI integrates in the power spectrum. Finally, we make an integration with the entire set of 591 observations to see how this final cut affects the power spectrum. We draw limits for three different redshifts: 6.5, 6.8, and 7.1. These correspond to three different bands, each of which is exactly half the 30.72 MHz bandwidth of the MWA EoR highband. They are centered 1/4, 1/2, and 3/4 of the way through the band. Since we use a Blackman-Harris taper, the effective bandwidth of each measurement is 7.68 MHz i.e. 1/4 of the band (Harris 1978). This makes the measurements roughly independent. We display the spherical power spectra along with the 2-σ upper limits in Figure 18. The lowest upper limit that we report is $\Delta^2 \leq 1.61 \times 10^4~\mathrm{mK}^2$ at $k = 0.258~h~\mathrm{Mpc}^{-1}$ using the East-West polarized dipoles at a redshift of 7.1.
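The relation between the three sub-bands, their redshifts, and the effective bandwidth of the taper can be sketched numerically. The lower band edge of 167.04 MHz and the choice of 192 channels per sub-band are assumptions made for this illustration and are not stated in this section. The effective bandwidth is taken here to be the width of a top-hat with the same noise bandwidth as the taper, which is one common definition and may differ in detail from the one used in the analysis.

```python
import numpy as np

F21_MHZ = 1420.40575      # rest frequency of the 21-cm line
BAND_MHZ = 30.72          # full EoR highband width, from the text
BAND_START_MHZ = 167.04   # ASSUMED lower band edge (not stated in the text)

def redshift(freq_mhz):
    """Redshift at which the 21-cm line is observed at freq_mhz."""
    return F21_MHZ / freq_mhz - 1.0

def blackman_harris(n):
    """Symmetric four-term Blackman-Harris window (Harris 1978 coefficients)."""
    a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
    x = 2.0 * np.pi * np.arange(n) / (n - 1)
    return a0 - a1 * np.cos(x) + a2 * np.cos(2 * x) - a3 * np.cos(3 * x)

def effective_fraction(window):
    """Width of the equivalent top-hat as a fraction of the window length:
    (sum w)^2 / (N * sum w^2)."""
    return window.sum() ** 2 / (window.size * (window ** 2).sum())

sub_band_mhz = BAND_MHZ / 2.0
centers_mhz = BAND_START_MHZ + BAND_MHZ * np.array([0.25, 0.5, 0.75])
redshifts = redshift(centers_mhz)
# 192 channels per sub-band is an illustrative choice only.
eff_bw_mhz = sub_band_mhz * effective_fraction(blackman_harris(192))

print(np.round(redshifts, 1), round(float(eff_bw_mhz), 2))
```

With these assumptions the sub-band centers land near the three quoted redshifts, and the taper reduces each 15.36 MHz sub-band to an effective width close to half of that.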

Discussion
These limits are calculated assuming a positive-definite signal-like component and thermal noise, as in Appendix A of Li et al. (2019). No distinction is made between systematic contributions to the signal-like term and the cosmological signal of interest. With information about the strength of various contaminating effects, one could produce an upper limit that takes these into account. For example, if we estimated an RFI excision algorithm to be effective down to some nominal flux level and had a statistical model of the remaining RFI sources, e.g. Offringa et al. (2013), we could incorporate our uncertainty about this systematic into the upper limit calculation. In general, this is frustrated by the fact that we must extrapolate about RFI we did not detect based on RFI we detected. This is necessarily dependent on the flagging strategy. For SSINS this may be particularly complicated, since flags are generated on time-differenced data and therefore undetected RFI may belong to an entirely different class not necessarily distinguished by total brightness.
Figure 18. Spherical power spectra for the Limit Set, along with 2σ upper limits and a fiducial theory model for the EoR signal whose implementation is described in Barry et al. (2019b). In short, the astrophysical constraints from Park et al. (2019) are used in combination with 21cmmc (Greig & Mesinger 2015) to generate samples from a probability distribution of 21-cm power spectra. The fiducial model and theoretical confidence interval are then calculated from these samples (brown solid and dashed lines, respectively). We notice that the wave modes between the coarse band harmonics are almost all noise-dominated. This could be a direct consequence of the rigorous RFI cuts we have employed. Compare to limits at the same redshifts in Li et al. (2019).
One way to build up this model is to complement an existing RFI excision strategy with the following imaging strategy. While some RFI might be stationary within an observation, most RFI will not be stable in celestial coordinates from observation to observation, particularly if those observations are significantly lagged with respect to civil time. Therefore, one could difference images of appropriate observations and look for strongly outlying pixels. This could be done at varying integration depths for relatively cheap computational cost within the current pipeline. In other words, this could be used to find full-array RFI within an observation, and then again at a deeper integration depth to find ultra-faint RFI. This would simultaneously allow for RFI identification and localization of the RFI sources, which will be important for developing models of the RFI statistics of the observatory. These models can ultimately be used for determining the depth to which RFI excision needs to be performed. In fact, a similar strategy for RFI identification was implemented in Prabu et al. (2020) and further tested with the Engineering Development Array in Tingay et al. (2020), though only at a few seconds of integration depth rather than combining multiple observations. Thus, there is some promise that such a method could work, although clearly some experimentation would be required to determine exactly how to implement it in deeper integrations and between different observations.
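The image-differencing idea can be sketched as follows. This is a hypothetical illustration on synthetic arrays, not an implementation from the MWA pipeline or from the works cited above: the function name, the robust z-score based on the median absolute deviation, and the threshold of 6 are all choices made for this example.

```python
import numpy as np

def outlying_pixels(image_a, image_b, threshold=6.0):
    """Flag strongly outlying pixels in the difference of two images.

    The difference is converted to a robust z-score using the median and
    the median absolute deviation (MAD); the factor 1.4826 makes the MAD
    an estimate of the standard deviation for Gaussian noise.
    """
    diff = np.asarray(image_a, dtype=float) - np.asarray(image_b, dtype=float)
    center = np.median(diff)
    sigma = 1.4826 * np.median(np.abs(diff - center))
    z = (diff - center) / sigma
    return np.abs(z) > threshold, z

# Synthetic demonstration only: the same "sky" seen in two observations
# with independent noise, plus one transient source in the first image.
rng = np.random.default_rng(0)
sky = rng.normal(size=(64, 64))            # emission common to both images
obs_a = sky + rng.normal(size=sky.shape)   # observation A
obs_b = sky + rng.normal(size=sky.shape)   # observation B
obs_a[10, 20] += 50.0                      # hypothetical RFI in A only

mask, z = outlying_pixels(obs_a, obs_b)
print(np.argwhere(mask))
```

Emission that is fixed in celestial coordinates cancels in the difference, so only the transient pixel stands out; repeating the same test on deeper stacks of images is how fainter sources would be sought.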
In addition to time differencing strategies, we can imagine incorporating significantly more specific prior information regarding the RFI signals than is currently used by common flaggers. The details about reflector trajectories, details about the signal properties that are known a priori by the transmitting parties, and an atmospheric model could be combined to produce an accurate physical model of a received RFI signal. For example, Prabu et al. (2022) demonstrates that RFI detection can be greatly enhanced by two different methods that incorporate more specific prior information about the RFI. The first method, called "shift-stacking," coherently averages in image space over a region expected to be occupied by a known reflector. The second method implements a post-correlation refocusing of the MWA to a distance at which we expect a reflector. In brief, at these distances, there is still non-negligible curvature in the wave fronts of the reflected signal, and the visibility phase can be adjusted to account for this. One could imagine checking for catalogs of reflectors that ought to be present in the data using methods like these, or conversely, using methods like these to help build such catalogs.
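The shift-stacking idea can be illustrated with a toy example. This is a sketch of the general technique under simplifying assumptions (integer pixel offsets, known in advance, and wrap-around at the image edges), and is not the implementation of Prabu et al. (2022); all names are hypothetical.

```python
import numpy as np

def shift_stack(images, offsets):
    """Average images after undoing a predicted per-image pixel offset.

    images  : array of shape (n_images, ny, nx)
    offsets : list of integer (dy, dx) pairs giving the predicted position
              of the reflector in each image relative to the first image
    """
    images = np.asarray(images, dtype=float)
    stacked = np.zeros(images.shape[1:])
    for image, (dy, dx) in zip(images, offsets):
        # np.roll wraps around the image edges, acceptable for a sketch.
        stacked += np.roll(image, shift=(-dy, -dx), axis=(0, 1))
    return stacked / len(images)

# Toy data: a unit-brightness reflector that moves by (1, 2) pixels per image.
n_images, start = 4, (5, 7)
offsets = [(i, 2 * i) for i in range(n_images)]
frames = np.zeros((n_images, 32, 32))
for i, (dy, dx) in enumerate(offsets):
    frames[i, start[0] + dy, start[1] + dx] = 1.0

stacked = shift_stack(frames, offsets)
print(stacked[start])
```

Because the source is realigned before averaging, its brightness is preserved in the stack while uncorrelated noise would average down, which is what makes faint reflectors with known trajectories easier to detect.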
A clear challenge with a physical modeling approach is the construction of a complete model, particularly when different propagation phenomena may produce spectral distortions in the RFI signal. In general, programming a flagger to identify a particular spectral distortion can implicitly make it less sensitive to other types of spectral distortion unless each class of distortion is included in the model. However, assuming such a model can be made, this will not only enhance flagging performance but also allow us to more accurately predict statistical properties about RFI that was left unflagged. This ultimately will allow for more rigorous uncertainty estimates on power spectrum measurements since there will be an explicit model for unmitigated systematics.
We observe that the range of wave modes between the first and second coarse band harmonics is noise-limited in most of the presented spherical power spectra. This suggests that adding more high-quality data would probably improve the limit. Interestingly, if we evaluate the upper limit using the coherent average of the Wall of Shame and Limit Set, we see that it degrades the limit systematically between the first and second coarse band harmonics (which was not the region of the EoR window used for the final cut) for redshifts 6.5 and 6.8 in the East-West polarization. The effect is slight, but significantly consistent across the stated bins. The lowest limit of any spherical mode that we observed, which we do not report, occurs in this coherent average below the first coarse band harmonic. We do not report that limit since the coherent average was known to be RFI-contaminated before we constructed it.
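To make the meaning of an upper limit under a positive-definite signal-like term concrete, the following is a minimal sketch for a single power spectrum bin. It assumes a measured power equal to a non-negative signal-like term plus Gaussian thermal noise of known standard deviation, and a flat prior on the signal-like term; it is written in the spirit of the calculation referenced in this section, but the function and its details are illustrative and should not be read as the exact procedure of Li et al. (2019).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def upper_limit(measured, sigma, confidence=0.97725):
    """Upper limit on a non-negative signal-like power in one bin.

    Assumes measured = signal + Gaussian noise with standard deviation
    sigma, and a flat prior on signal >= 0. The posterior is then a
    Gaussian truncated at zero, and the limit is its `confidence`
    quantile. The default corresponds to a one-sided 2-sigma level.
    """
    p_below_zero = _STD_NORMAL.cdf(-measured / sigma)
    target = p_below_zero + confidence * (1.0 - p_below_zero)
    return measured + sigma * _STD_NORMAL.inv_cdf(target)

# Illustrative values in arbitrary units, not measurements from this work.
for measured in (10.0, 0.0, -1.0):
    print(measured, round(upper_limit(measured, 1.0), 3))
```

When the measured power is far above the noise, the limit approaches the measurement plus two standard deviations; when the measurement is at or below zero, the truncation keeps the limit positive. Any systematic power, including residual RFI, is absorbed into the signal-like term and therefore raises the limit.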

CONCLUSION
In this work, we analyzed a season of data from the MWA Phase I with a particular focus on the presence and potential effect of RFI in 21-cm power spectrum measurements. The goal was to examine to what extent ultra-faint RFI exists within deep power spectrum integrations despite the deployment of proven mitigation methods, as well as to see whether this ultra-faint RFI contamination was likely to deteriorate constraints on the 21-cm cosmic reionization signal.
This RFI analysis, in combination with other quality metrics, ultimately resulted in the decision to cut about 85% of the data from this season to produce the best upper limit. Even in the absence of an ionospheric quality cut, we still would have cut at least 54% of the data based on RFI content alone. This level of data loss is catastrophic from a scientific standpoint. Starting from just 2.5 hours per night that we select due to contamination from the galaxy at other sidereal times, it would take dozens of seasons of data of the same quality, instrumental sensitivity, and cut decisions to reach a confident detection. While a more sensitive instrument such as the SKA might need fewer high-quality observing hours to reach the same detection confidence, we expect that the global RFI environment is worsening with the addition of extremely large satellite constellations in low Earth orbit, and so data cuts of this style may be even more severe for future experiments.
From our RFI analysis, we identified a host of RFI that evaded post-correlation detection by observing a residual RFI footprint in measured cylindrical power spectra that is enhanced when RFI flags are turned off. Residual RFI within observations already known to contain RFI is motivated by the physical properties of RFI reception, which we know to be largely due to transient scatterers. We found the timelike properties of the RFI flags to be consistent with this physical picture. We also found that the RFI footprint was most prevalent the further West the array was pointed. This was true to such an extent that after all data cuts based on independent pre-power-spectrum metrics, all remaining subintegrations in the most Westerly pointed data showed the RFI footprint. The footprint also appeared preferentially in power spectra made from only the East-West aligned dipoles as compared to those made from the North-South aligned dipoles. These two observations are consistent with the fact that the brightest DTV transmitters are nearly due South of the array, where the North-South dipoles are less sensitive, and offset slightly West.
We performed a deep integration, cutting all observations identified as contaminated by any amount as well as any subintegrations that clearly presented the RFI footprint. The resulting power spectra appeared consistent between the polarizations, and did not have an obvious RFI footprint, although some systematic contamination is visible in the lower-left corner of the EoR window. This similarity between the polarizations is a new feature of this limit compared to previous ones made with the MWA, and may be a result of the enhanced RFI mitigation. When we only use the subintegrations known to possess an RFI footprint, we obtain power spectra that are systematically limited in the East-West polarized data, but not in the North-South polarized data, and the presence of a footprint is more obvious. Finally, when we coherently average all the data together, we see systematic dominance in both polarizations, but the systematic dominance only extends to larger perpendicular wave modes in the East-West polarized data. This suggests that ultra-faint RFI is unlikely to disappear through dilution in a coherent average, and by corollary, that the systematic domination in the EoR window of the Limit Set may in fact be something other than RFI. However, since (a) the behavior of RFI in coherent averaging schemes is poorly understood, (b) the RFI budget for EoR detection is strict, and (c) we can willfully construct relatively deep integrations with noticeable RFI footprints, we ultimately suggest solving the problem of modeling ultra-faint RFI in deep power spectrum integrations.
We use the subset of data least likely to be contaminated in order to set upper limits on the cosmological 21-cm signal. In total, about 85% of the initial data selection was not used for the upper limit. Our deepest upper limit was $\Delta^2 \leq 1.61 \times 10^4~\mathrm{mK}^2$ at $k = 0.258~h~\mathrm{Mpc}^{-1}$ and $z = 7.1$. While this is not the lowest upper limit set with MWA Phase I, we remark that these limits are noise dominated in the majority of modes higher than the first coarse band harmonic, not including the harmonics themselves. Since RFI produces significant power to extremely high line-of-sight wave modes, we suggest that this prominence of noise-dominated bins at high k (in excess of previous limits at similar depths) may be a consequence of our extremely thorough RFI cuts. We note that this required cutting about 2/3 of the data spared by the ionospheric cut. Since the RFI environment is unlikely to improve in future seasons, obtaining enough high-quality, RFI-free data for a sensitive EoR measurement appears a daunting task. We therefore highlight not only a need to improve RFI mitigation, but to do so in a data-sparing way.

Figure 2. Scatter plot of observations included in the analysis in only the central five pointings (left), and after an ionospheric quality cut (right). Vertical axis shows the date of the observations (blue dots), and horizontal axis is the Local Sidereal Time.

Figure 3. Left: Raw SSINS of a 112-second observation at various stages of flagging. Right: Mean-subtracted SSINS. Top: Data with initial flagging mask. Middle: Additional flagging mask after match filtering. Bottom: Flags after frequency extension.

Figure 5. Total occupancy scatter plot after extending flags across frequency, which roughly doubles the total occupancy in the data set. Each marker represents a 2-minute observation, where its horizontal position is its starting sidereal time and its vertical position is its beginning civil time. Clusters of flags are apparent.
Figure 6. Top: Histogram of flag interarrival times over all nights for the broadcast SSINS flags. The "Poisson Mixture Comparison" (green) is a hyperexponential distribution in which each component has the same mean interarrival time as one night of data and is weighted according to the amount of data in that night. The Markov chain histogram is built from Markov chain realizations of each night, using the same transition probabilities as the night of RFI flags from which each realization is built. Interarrival times from the Markov chain realizations are roughly consistent with those from the RFI flag time series, whereas the Poisson mixture model tends to underpredict short and long interarrival times, thereby failing to predict clumps of flags. Bottom: Cumulative density for what is shown in the top plot. The data and Markov chain cumulative densities are inconsistent with a Poisson model.
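The comparison in Figure 6 can be illustrated with a toy calculation. The Python sketch below is not the analysis code used for this work. It assumes a single night of flags stored as a boolean time series and a two-state (unflagged/flagged) first-order Markov chain, and the function names `fit_two_state_chain`, `simulate_chain`, and `interarrival_times` are hypothetical. The memoryless comparison is a single Bernoulli process with the same occupancy, i.e. a discrete-time stand-in for one Poisson component rather than the full hyperexponential mixture. All numerical values are invented for illustration.

```python
import numpy as np

def fit_two_state_chain(flags):
    """Estimate P(flag | previous unflagged) and P(flag | previous flagged)."""
    flags = np.asarray(flags, dtype=bool)
    prev, curr = flags[:-1], flags[1:]
    p01 = curr[~prev].mean() if (~prev).any() else 0.0
    p11 = curr[prev].mean() if prev.any() else 0.0
    return p01, p11

def simulate_chain(p01, p11, n, rng):
    """Draw one realization of the chain, starting in the unflagged state."""
    out = np.zeros(n, dtype=bool)
    u = rng.random(n)
    for i in range(1, n):
        out[i] = u[i] < (p11 if out[i - 1] else p01)
    return out

def interarrival_times(flags):
    """Number of time steps between consecutive flagged samples."""
    return np.diff(np.flatnonzero(np.asarray(flags, dtype=bool)))

rng = np.random.default_rng(0)
# Toy "clumpy" night of flags (invented transition probabilities).
night = simulate_chain(0.02, 0.6, 20000, rng)
p01, p11 = fit_two_state_chain(night)
markov = simulate_chain(p01, p11, night.size, rng)
# Memoryless comparison with the same overall occupancy.
memoryless = rng.random(night.size) < night.mean()

for name, series in [("data", night), ("markov", markov), ("memoryless", memoryless)]:
    gaps = interarrival_times(series)
    print(name, "fraction of back-to-back flags:", np.mean(gaps == 1))
```

In this toy setup the memoryless series has far fewer back-to-back flags than the clumpy series, while the fitted chain recovers them, which mirrors the qualitative behavior described in the caption.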

Figure 7. Autocorrelation of the RFI flag time series for a particular night (blue) along with the autocorrelation function for a Markov chain realization (transparent orange). Bottom Panel: Residual of the top panel. The Markov chain realization is clearly missing the peaks at lags between 0 and 50 minutes, meaning that the Markov model does not fully capture the temporal properties of the RFI events, and so a more detailed model is required in future efforts.
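A minimal sketch of the kind of comparison shown in Figure 7 follows. It assumes the standard biased sample autocorrelation estimator normalized to unity at zero lag, and it takes the residual as data minus model; neither choice is confirmed to match the estimator or sign convention used for the figure. The function names `autocorrelation` and `acf_residual` are hypothetical.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Biased sample autocorrelation for lags 0..max_lag, equal to 1 at lag 0."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    norm = np.dot(x, x)
    if norm == 0.0:
        # A constant series has no defined correlation structure.
        return np.zeros(max_lag + 1)
    return np.array(
        [np.dot(x[: x.size - k], x[k:]) / norm for k in range(max_lag + 1)]
    )

def acf_residual(data_flags, model_flags, max_lag):
    """Autocorrelation of the data minus that of a model realization."""
    return autocorrelation(data_flags, max_lag) - autocorrelation(model_flags, max_lag)
```

Peaks in the residual at nonzero lag indicate temporal structure in the flags that the model realization does not reproduce.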

Figure 8. Histograms of SSINS amplitudes and z-scores for the entire season, separated by flagged and unflagged data. These z-scores are calculated per fine 40 kHz channel using the time-average of 112 s of data as a reference. The horizontal axis is a hybrid linear-logarithmic scale, where the boundary between scales is demarcated by the black vertical lines. The z-scores of the unflagged samples (bottom orange) are highly Gaussian, as expected from the assumptions about the thermal noise. The flagged samples have a highly non-Gaussian z-score distribution. Their brightness distribution overlaps significantly with the unflagged brightness distribution, so the two cannot be separated by an amplitude cut alone.
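The per-channel z-scores described in Figure 8 can be sketched as follows. This is an illustration under stated assumptions, not the SSINS implementation: it assumes each sample is an average of `n_averaged` independent Rayleigh-distributed amplitudes (the thermal-noise case), for which the variance divided by the squared mean is 4/π − 1, and it uses the time average of each channel as the reference. The function name `channel_zscores` and all array sizes are hypothetical.

```python
import numpy as np

# For a Rayleigh distribution, variance / mean^2 = 4/pi - 1.
RAYLEIGH_VAR_OVER_MEAN_SQ = 4.0 / np.pi - 1.0

def channel_zscores(amplitudes, n_averaged):
    """z-scores of a (time, frequency) amplitude array, per frequency channel.

    The reference for each channel is its time average; n_averaged is the
    number of independent Rayleigh amplitudes averaged into each sample.
    """
    amps = np.asarray(amplitudes, dtype=float)
    reference = amps.mean(axis=0)
    return (amps / reference - 1.0) * np.sqrt(n_averaged / RAYLEIGH_VAR_OVER_MEAN_SQ)

# Noise-only toy data: 50 times, 32 channels, 500 averaged amplitudes (invented sizes).
rng = np.random.default_rng(0)
noise = rng.rayleigh(scale=1.0, size=(50, 32, 500)).mean(axis=-1)
z = channel_zscores(noise, n_averaged=500)
print("z-score standard deviation for noise-only data:", z.std())
```

For noise-only input the z-scores are close to a standard normal distribution, which is the behavior the unflagged samples in the figure are compared against.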

Figure 9. Same as Figure 8, but with brightnesses and z-scores calculated by doing the SSINS sub-band sum that is employed in the match filter over the TV7 frequencies. Interestingly, the unflagged and flagged brightness samples appear bimodal. Examination of this plot on a per-pointing basis shows that the mode above 40 Jy belongs to the eastern-most pointing in the data, while the other mode is a combination of all other pointings.

Figure 10.Top: East-West polarized power spectra for pure observations over the five pointings present in the data.The array points more Westerly towards the right of the figure.The number of observations in each integration is annotated at the bottom of the figure.The depth of integration ranges by about a factor of 2.5.All of these power spectra appear noise-dominated in the EoR window (region above the solid black line) in that there is a roughly even speckling of positive and negative data.Bottom: Corresponding absolved power spectra, where any RFI type is included so long as the observation has RFI z-scores that lie between 10 and 100.We see a signal-like component in the EoR window.Rather than an even speckling of positive and negative data, we see consistent, smoothly varying positive power up to and occasionally past the first coarse band harmonic.This suggests residual RFI that was uncaught by SSINS.This contamination is generally worse in the later pointings that point more West.There is no clear correlation between integration depth and contamination levels.

Figure 11. Top, middle: Cumulative density functions for the absolute value of the inverse-noise weighted spherical power neglecting the foreground wedge and coarse band harmonics. The top row, which shows measurements from absolved observations, appears ordered by pointing from East to West for most of the domain. The pure observations are not as clearly ordered or distinct from one another. Neither data set looks consistent with pure noise, however. Bottom: Cumulative density function for the difference between corresponding inverse-noise weighted spherical power of the top two rows (absolved minus pure). The preference of the distributions to lie towards the right of the dotted curve, which reflects a null hypothesis, shows that absolved observations generally produce power spectra that are more positively outlying than their pure counterparts.
Figure 14. A power spectrum jackknife test for zenith-pointed observations, where spectra are separated by RFI type. These all have RFI events with z-scores between 10 and 100. We find that the region of contamination seems unrelated to the type of RFI. This may be due to multiple RFI types cohabiting these observations.

Figure 15. Power spectrum jackknife test for observations classified as containing broadband interference. The excess window power takes a similar shape to that of the RFI events in Figure 14.
4.1.5. Discussion
These jackknife tests show that there is very likely a class of ultra-faint RFI uncaught by our current excision

Figure 17. Deep 2d power spectra from three different sets of data, all of which were deemed uncontaminated by SSINS. As with all other 2d power spectra shown in this work, we show the "dirty" power spectrum, i.e. without model subtraction. The Wall of Shame set shows a clear RFI footprint in the spectrum made from the E-W dipoles, but appears noise-limited in the one from the N-S dipoles. On the other hand, the Limit Set power spectra appear similar regardless of which polarization is used. Specifically, both the E-W and N-S power spectra appear systematically dominated below the first coarse band harmonic in the Limit Set, in contrast to the Wall of Shame set and the majority of shorter power spectrum integrations we inspected. A key difference between this contamination and the RFI footprint is that it does not extend to large k ⊥. The coherent average of the two behaves as expected, in that adding the Wall of Shame set does not strongly affect the appearance of the power spectrum from the N-S data, and imprints the characteristic RFI footprint in the spectrum made from the E-W data.

Table 1. Amount of data remaining after each selection listed in the left-hand column.