
# Characterizing and Mitigating Intraday Variability: Reconstructing Source Structure in Accreting Black Holes with mm-VLBI

Avery E. Broderick et al.

Focus on First Sgr A* Results from the Event Horizon Telescope. Citation: Avery E. Broderick et al. 2022 ApJL 930 L21


2041-8205/930/2/L21

## Abstract

The extraordinary physical resolution afforded by the Event Horizon Telescope has opened a window onto the astrophysical phenomena unfolding on horizon scales in two known black holes, M87* and Sgr A*. However, with this leap in resolution has come a new set of practical complications. Sgr A* exhibits intraday variability that violates the assumptions underlying Earth aperture synthesis, limiting traditional image reconstruction methods to short timescales and data sets with very sparse (u, v) coverage. We present a new set of tools to detect and mitigate this variability. We develop a data-driven, model-agnostic procedure to detect and characterize the spatial structure of intraday variability. This method is calibrated against a large set of mock data sets, producing an empirical estimator of the spatial power spectrum of the brightness fluctuations. We present a novel Bayesian noise modeling algorithm that simultaneously reconstructs an average image and statistical measure of the fluctuations about it using a parameterized form for the excess variance in the complex visibilities not otherwise explained by the statistical errors. These methods are validated using a variety of simulated data, including general relativistic magnetohydrodynamic simulations appropriate for Sgr A* and M87*. We find that the reconstructed source structure and variability are robust to changes in the underlying image model. We apply these methods to the 2017 EHT observations of M87*, finding evidence for variability across the EHT observing campaign. The variability mitigation strategies presented are widely applicable to very long baseline interferometry observations of variable sources generally, for which they provide a data-informed averaging procedure and natural characterization of inter-epoch image consistency.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

## 1. Introduction

The Event Horizon Telescope (EHT) is a global very long baseline interferometry (VLBI) array observing at a wavelength of ∼1 mm, and it has been assembled with the primary goal of resolving the horizon-scale emission structures of both the M87* (Event Horizon Telescope Collaboration et al. 2019a, 2019b, 2019c, 2019d, 2019e, 2019f; hereafter M87* Papers I, II, III, IV, V, and VI, respectively) and Sgr A* (Event Horizon Telescope Collaboration et al. 2021c, 2022a, 2022b, 2022d, 2022e, 2022f; hereafter Papers I-VI, respectively) supermassive black holes (SMBHs). To date, all sources observed with the EHT during its 2017 campaign have been spatially resolved (Kim et al. 2020; Janssen et al. 2021), and many have exhibited structural variability across days. The most extreme of these variable sources is Sgr A*, which exhibits substantial variability across the electromagnetic spectrum (see, e.g., Boyce et al. 2019; Witzel et al. 2021; Wielgus et al. 2022, and references therein) and on horizon scales (see, e.g., Fish et al. 2011; Gravity Collaboration et al. 2018). The mass of Sgr A* is M ≈ 4 × 10⁶ M_⊙ (Do et al. 2019; Gravity Collaboration et al. 2019, 2020), corresponding to a gravitational timescale of t_g = GM/c³ ≈ 20 s, which is much shorter than the ∼10 hr duration of a typical EHT observing track.

Sources exhibiting substantial structural variability on timescales that are short compared to a VLBI observing track complicate the image reconstruction process. Because VLBI arrays have only sparse instantaneous coverage of the Fourier plane, standard practice is to use Earth-rotation aperture synthesis to improve the coverage by observing the source for several hours (Thompson et al. 2017); as the Earth rotates, so do the baselines in the array, which therefore populate new regions of the Fourier plane. However, image reconstruction using data acquired via this technique assumes that the source remains static for the duration of the observation, such that the rotated baselines continue to sample the same source structure. For rapidly variable sources like Sgr A*, this assumption is broken.
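The Earth-rotation effect described above can be made concrete with a short sketch. Using the standard baseline-projection formulas (e.g., Thompson et al. 2017), a fixed physical baseline traces an arc in the (u, v) plane as the hour angle advances; the baseline components and track length below are hypothetical illustrative values, not EHT parameters.

```python
import numpy as np

def uv_track(L_x, L_y, L_z, dec, hour_angles):
    """(u, v) samples traced by one baseline as the Earth rotates.

    L_x, L_y, L_z : equatorial baseline components in wavelengths
    dec           : source declination in radians
    hour_angles   : array of hour angles in radians
    """
    H = np.asarray(hour_angles)
    u = L_x * np.sin(H) + L_y * np.cos(H)
    v = (-L_x * np.sin(dec) * np.cos(H)
         + L_y * np.sin(dec) * np.sin(H)
         + L_z * np.cos(dec))
    return u, v

# A ~10 hr track for a hypothetical ~6 Gλ (at 1.3 mm) baseline
# observing a source at the declination of Sgr A* (≈ -29°):
H = np.linspace(-np.pi * 5 / 12, np.pi * 5 / 12, 100)
u, v = uv_track(6e9, 0.0, 0.0, np.deg2rad(-29.0), H)
```

Static imaging assumes the source looks the same at every point along this track; intraday variability breaks exactly that assumption.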

The potential presence of short-timescale variability raises two distinct challenges: characterization and mitigation. The latter necessarily depends on the former; an appropriate mitigation scheme can only be selected after the properties of the variability have been identified. Methods to detect variability intrinsic to the source, and not imposed by subsequent atmospheric or station-induced corruptions, may be constructed using closure phases (Roelofs et al. 2017; Satapathy et al. 2022). However, the interpretation of such measures is complicated by the nontrivial relationship between closure phases and source structure.

We introduce an additional purely empirical method for detecting and, more importantly, characterizing structural variability based on visibility amplitudes. While this novel method must contend with systematic calibration uncertainties, the results may be immediately interpreted as the power spectrum of image fluctuations. This power spectrum naturally provides a prior on subsequent mitigation schemes.

There are at least three conceptually distinct approaches that have been applied (or proposed to be applied) for reconstructing variable sources from VLBI data:

• 1.
The first approach—which we will refer to as the "snapshot reconstruction" approach—involves simply dividing the data into time segments that are short compared to the source variability timescale and then using standard reconstruction techniques. For sources whose variability timescales are longer than a few hours, the snapshot reconstruction approach is no different from multiepoch imaging, which has seen regular use in the VLBI community to make movies of astronomical sources (e.g., Matthews et al. 2010; Lister et al. 2018; Walker et al. 2018; Kim et al. 2020). However, for more rapidly variable sources the required data segmentation may lead to inadequate Fourier coverage that compromises image fidelity (e.g., Fomalont et al. 2001) or necessitates the use of restricted source models (e.g., Miller-Jones et al. 2019).
• 2.
The second approach involves directly reconstructing a dynamic source model rather than a static one, and we will refer to this approach as "dynamical reconstruction." We draw a distinction between dynamical reconstruction and snapshot reconstruction in that the former attempts to recover the time-continuous source structure, rather than restricting itself to static "frames" at the points in time where snapshot data are available. Example dynamical reconstruction algorithms include dynamical imaging (Johnson et al. 2017), StarWarps (Bouman et al. 2018), and resolve (Arras et al. 2019, 2022), all of which impose some manner of temporal coherence or regularization on the reconstructed source structure. Such temporal regularization is necessary both to interpolate between times when data are present and to enforce structural continuity in time. In principle, this capability for data taken at one point in time to inform the image structure at another point in time should mean that dynamical reconstruction algorithms have less stringent requirements on dense instantaneous Fourier coverage than snapshot reconstruction algorithms do. But in practice, current dynamical reconstruction algorithms struggle in similar regimes of data sparsity to snapshot reconstructions (see Section 9 of Paper III).
• 3.
The third approach is "visibility stacking," whereby the source is observed over multiple epochs and the complex visibilities at each location in the Fourier plane are averaged in time. By the linearity of the Fourier transform, the image reconstructed from time-averaged visibilities should be the same as the average of images reconstructed from single-epoch visibilities, meaning that visibility stacking provides a method for recovering the average image structure from time-variable data. Lu et al. (2016) applied a visibility stacking technique, including light-curve normalization and temporal smoothing, to simulated EHT observations of Sgr A*, demonstrating that multiepoch averaging of the complex visibilities can substantially improve the reconstructed image fidelity compared to single-epoch imaging. However, such averaging requires access to calibrated visibilities, and in practice the visibility phases of EHT data are irretrievably corrupted by atmospheric effects (M87* Paper III).

To this list we add a fourth approach: statistical reconstruction of the variability alongside static image reconstruction of the average source structure that may be deployed in Bayesian image reconstruction methods like those described in Broderick et al. (2020b) and Pesce (2021). Any variable structure can be decomposed, without loss of generality, into a static (average) component and a dynamic (variable) component. If the duration of the observation is much longer than the variability timescale, then the variable component can be usefully conceptualized as defining a distribution from which the measured visibilities at any instant are drawn. If this distribution is parameterized and an appropriate likelihood function constructed, then the data can be used to simultaneously constrain both the parameters describing the variability and the average source structure. For EHT observations of Sgr A*, in which the observing window (∼10 hr) is much longer than the expected variability timescale (∼minutes), this statistical approach to source reconstruction in the face of variability is well motivated.
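The core of this fourth approach can be sketched as a Gaussian likelihood in which the thermal variance of each complex visibility is inflated by a parameterized excess variance describing the structural variability. The function below is an illustrative stand-in under that assumption, with names of our own choosing; it is not the collaboration's implementation.

```python
import numpy as np

def log_likelihood(V_model, V_data, sigma_th, sigma_var):
    """Gaussian log-likelihood for complex visibilities in which the
    per-point thermal variance is inflated by a parameterized excess
    variance sigma_var (the statistical model of the variability)."""
    var = sigma_th**2 + sigma_var**2          # total per-point variance
    resid2 = np.abs(V_data - V_model)**2      # squared complex residual
    # Real and imaginary parts each carry variance `var`:
    return np.sum(-resid2 / (2.0 * var) - np.log(2.0 * np.pi * var))
```

Maximizing (or sampling) this likelihood over both the mean-image model and the variance parameters is what permits the average source structure and the variability to be constrained simultaneously.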

This paper is organized as follows: In Section 2 we provide a theoretical expectation for the variability behavior of EHT sources. In Section 3 we outline our proposed strategies for measuring and mitigating this variability during reconstruction. We describe a data-driven, model-agnostic procedure for characterizing and quantifying the amount of variability contained in the data in Section 4. In Section 5 we present an algorithm for simultaneously reconstructing both the average source structure and its parameterized variability. Both methods are demonstrated by application to the 2017 EHT M87* data in Section 6. Finally, we summarize and conclude in Section 7.

## 2. Expected Source Variability of EHT Targets

Structural variability in EHT observations of horizon-scale sources (i.e., Sgr A* and M87*) is anticipated to result from the turbulent physical processes associated with accretion and jet launching near the central black hole. Modeling these turbulent processes is currently performed via large-scale numerical computations, typically within the general relativistic magnetohydrodynamic (GRMHD) paradigm (see, e.g., M87* Paper V; Paper V, and references therein). These simulations may be post-processed through ray-tracing and radiative transfer computations to generate simulated images of the near-horizon emission regions (see, e.g., Gold et al. 2020, and references therein). Through this procedure, physically self-consistent and observationally credible predictions for the horizon-scale image structure and variability may be generated, from which the expected variability properties may be extracted. The resultant variability properties across the library of GRMHD simulations in Paper V are discussed in detail in Georgiev et al. (2022), and we only summarize them here.

We employ the library of simulation images constructed for EHT observations of Sgr A* to motivate the form of our variability mitigation strategy (Paper V). Each simulation generates a sequence of such images, or "movies." These movies can be decomposed into a mean image and a movie of random brightness fluctuations, both of which are of intrinsic interest for EHT sources. The spatial power spectral densities (PSDs) of the brightness fluctuations are computed and presented in Georgiev et al. (2022), which finds that they are universally well characterized by a "red–red" noise process. That is, the largest brightness fluctuations typically occur on the longest timescales and the largest spatial scales.

This procedure can be repeated for movies of different temporal lengths, generating families of PSDs that are indicative of the fluctuation spectra over the different observationally relevant timescales. In all cases, the PSDs are well described by a broken power law in spatial frequency, and therefore in baseline length, ∣u∣. The location of this break ranges from u0 ∼ 1 to 3 Gλ, corresponding to angular scales ranging from 70 to 200 μas and associated with the typical extent of the emission within the image. For ∣u∣ < u0, the PSD asymptotes to a constant. For averaging times T ≳ 10³ GM/c³ the spatial PSDs are independent of T, indicating the lack of correlated variability on longer timescales in most simulations. For Sgr A* and M87*, 10³ GM/c³ corresponds roughly to 5.5 hr and 1 yr, respectively. For averaging times shorter than this timescale, the maximum of the PSD decreases roughly ∝ T^{2–3} and the location of the break increases.
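A minimal functional form matching this description—a constant plateau below the break and a power-law fall-off beyond it—can be written down directly. The specific parameter values below (plateau amplitude, break location, slope) are hypothetical placeholders chosen from the GRMHD-motivated ranges quoted above, not fitted values.

```python
import numpy as np

def broken_power_law_psd(u, psd0=1e-2, u0=2.0, b=2.5):
    """Azimuthally symmetric broken power-law model of the spatial PSD.

    u    : baseline length |u| in Gλ
    psd0 : plateau amplitude for |u| << u0 (hypothetical value)
    u0   : break location in Gλ (GRMHD-motivated range ~1-3 Gλ)
    b    : long-baseline power-law index (GRMHD-motivated range ~2-3)
    """
    u = np.asarray(u, dtype=float)
    # Constant for |u| << u0; falls as (u/u0)**-b for |u| >> u0.
    return psd0 / (1.0 + (u / u0)**b)
```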

The variability in the brightness fluctuations can be substantially reduced by normalizing the visibility data by the total flux light curve. This leverages the "red–red" nature of the brightness fluctuations and explicitly eliminates the variability on zero baselines. Because variability on ∣u∣ < u0 is correlated with that at zero baselines, this light-curve normalization removes a significant fraction of the additional variability on nonzero baselines shorter than the break. Generically, at small ∣u∣ ≲ u0 the PSDs of the light-curve-normalized brightness fluctuations scale as ∣u∣^{2−4}, depending on the assumed calibration details. For ∣u∣ ≳ u0, the PSDs generally fall with power-law indices of 2–3.
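Operationally, light-curve normalization amounts to dividing each complex visibility by the contemporaneous total flux. A minimal sketch, assuming a light curve sampled on its own time grid and linear interpolation to the visibility timestamps:

```python
import numpy as np

def lightcurve_normalize(vis, times, lightcurve_times, lightcurve):
    """Divide complex visibilities by the contemporaneous total-flux
    light curve, removing the (strongly correlated) zero-baseline
    variability before further analysis."""
    # Total flux interpolated to each visibility timestamp:
    F = np.interp(times, lightcurve_times, lightcurve)
    return vis / F
```

By construction, a zero-baseline visibility normalized this way has unit amplitude at all times, so its variance vanishes.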

For GRMHD simulations viewed from close to the polar axis the PSDs are approximately azimuthally symmetric. However, for simulations of accretion flows that are viewed nearly edge-on, which exhibit significant departures from azimuthal symmetry in their images, the PSD at 4 Gλ can vary by as much as 50%. Nevertheless, we will proceed with the approximation that the PSDs are functions of ∣u∣ alone, subject to the validation experiments performed in the following sections.

## 3. Summary of Intraday Variability Mitigation Strategy

Intraday variability presents a substantial challenge for VLBI generally, and especially for the EHT, whose (u, v) coverage is sparse, as shown in Figure 1. Methods that attempt image reconstruction and model-based parameter inference from the 2017 EHT observations of Sgr A* using segments of the data that span time ranges shorter than that associated with the variability must contend with at least two significant challenges:

• 1.
A dramatically reduced data volume. Limiting the segmentation time to minutes reduces the number of independent VLBI (non-intrasite) measurements to 15 in the best case, in comparison to the many hundreds obtained throughout a night owing to Earth aperture synthesis. As a direct consequence, the number of independent parameters that can be effectively constrained is drastically limited.
• 2.
The inherent complexity and stochasticity of the anticipated variability, based on the turbulence observed in GRMHD simulation movies.

The strategy presented here is to statistically address the putative variability so that standard VLBI analysis tools that leverage Earth aperture synthesis can still be applied despite the presence of significant intraday variability. In this sense, the properties of the average brightness distribution are simultaneously measured with their covariance, or "noise," over the observation window. This procedure is formally equivalent to marginalizing over realizations of the noise and is statistically well justified when the timescale of the variability is short in comparison to the observing window.

The additional variability-induced noise is an addition to, not a replacement for, the usual measurement uncertainties. Thermal noise is the dominant contribution to these uncertainties and is dependent solely on the telescope operations at individual sites, such as system temperature, bandwidth, integration time, etc. (Thompson et al. 2017). To a very good approximation, the thermal errors on the real and imaginary components of the complex visibilities are Gaussian. We will assume that thermal error is synonymous with the statistical error, which may include small additional contributions from phasing efficiency, phase coherence, etc. Importantly, these errors are independent for each measurement, known a priori, and insensitive to the intrinsic source structure and its variability.

This strategy comes with its own set of challenges. Typically, there is not a single variability timescale, but rather a variability spectrum, with some components of the variability appearing well within the observing window and others extending beyond it. This multitude of timescales is evident, e.g., in the variability properties of the light curve of Sgr A* during the 2017 EHT observations (Wielgus et al. 2022) and suggested by the GRMHD simulations described in the previous section (Georgiev et al. 2022). More important is the presence of correlations between measured visibilities, which will be present even in the absence of spatial correlations in the image. These correlations complicate the manner in which the noise is introduced.

In practice, we generate a model of the diagonal elements of the covariance that incorporates the additional deviations of the complex visibilities from a mean image that result from the structural variability within the source. These variances can be estimated directly from the data (Section 4) or obtained simultaneously with model comparisons and image reconstructions (Section 5). In the remainder of this section we discuss how correlations on different scales are addressed and the procedures we present in this paper are validated.

### 3.1. Large-scale Correlations

The light curve of Sgr A* during the observing nights throughout the 2017 EHT campaign is dominated by fluctuations on the longest timescales (Paper V; Wielgus et al. 2022). This is consistent with other studies that cover much longer time spans, which suggest a potential break at a timescale between 2 and 8 hr (Dexter et al. 2014; Witzel et al. 2018, 2021).

This temporal behavior is consistent with the expectations from GRMHD simulations, described in Section 2 and reported in detail in Georgiev et al. (2022). Within the GRMHD simulations, the long-timescale variability is associated with variations in the largest-scale features of the image. In this sense, the spatiotemporal power spectrum is "red–red," i.e., the largest-amplitude fluctuations in the source brightness occur on the longest timescales and largest spatial scales. As a direct result, the largest correlations are driven by the longest temporal/largest spatial fluctuations.

We remove these a priori by light-curve-normalizing the visibility data. By construction, the normalized data have vanishing variance at zero baseline. For GRMHD simulations, the suppression of fluctuation power extends to ∣u∣ ≈ 2 Gλ (Georgiev et al. 2022). A direct consequence of this procedure is that the reconstructed variance is unable to address these omitted long temporal/large spatial scale modes.

### 3.2. Small-scale Fluctuations

GRMHD simulations exhibit turbulence in the emitting plasma, leading to short-time/small-scale stochastic fluctuations in the images (M87* Paper V; Paper V). These occur on all scales, ranging from the field of view of the EHT observations (≳200 μas) to well below the formal resolution of the EHT (≲20 μas; Georgiev et al. 2022). These structural fluctuations are the source of the additional noise that the mitigation scheme presented here is addressing.

### 3.3. Methodology, Caveats, and Validation

The adopted methods only model the diagonal elements of the covariance of the complex visibility data associated with the underlying structural variability, thereby leaving out correlations. The largest correlations are avoided by fitting to light-curve-normalized complex visibilities. Nevertheless, other correlations including the short-time/small-scale fluctuations are not directly addressed, and we have no a priori reasons to believe that they are negligible. Indeed, in Section 4 we see clear evidence for the existence of nontrivial correlations.

The implication of these residual correlations for the fidelity of image reconstructions and parameter estimates is not immediately clear. Thus, we specified the following set of validation experiments, applied where possible to the specific observation parameters (e.g., (u, v)-coverage, observation times) relevant for the analysis in question:

• 1.
Demonstrated on simulated data sets with Gaussian noise added to individual complex visibility estimates.
• 2.
Demonstrated on simulated data sets with Gaussian noise fluctuations with a specified power spectrum overlaid on the image. These are uncorrelated in the image domain but will have residual correlations in the (u, v)-domain owing to the finite source size.
• 3.
Demonstrated on simulated data sets in which structural brightness fluctuations are added via the iNoisy prescription in a manner that is spatially correlated in the image domain (Lee & Gammie 2021).
• 4.
Demonstrated on simulated data sets generated from imaging GRMHD simulations. Fluctuations are necessarily correlated in the image domain in a manner that reflects the self-consistent physical evolution of the emission region.

These experiments are progressive in that additional complications, related to the nature and strength of correlations, are incorporated in subsequent experiments. Thus, we treat them as cumulative: successful validation using GRMHD simulations implies a successful validation using the iNoisy sets, etc.

All methods presented here are successfully validated via several of the above experiments, including on simulated data sets produced from GRMHD simulations.
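The second class of validation experiments—Gaussian fluctuations with a specified power spectrum overlaid on the image—can be generated with a standard Fourier-filtering construction. The sketch below is our own illustration of that style of test data (a unit-variance, zero-mean random screen with a power-law spatial spectrum); the slope parameter is a placeholder, not a value from the paper.

```python
import numpy as np

def gaussian_random_field(n, slope, rng=None):
    """Zero-mean Gaussian fluctuation screen on an n x n grid whose
    spatial power spectrum falls as |k|**(-slope)."""
    rng = np.random.default_rng(rng)
    kx = np.fft.fftfreq(n)[:, None]
    ky = np.fft.fftfreq(n)[None, :]
    k = np.hypot(kx, ky)
    k[0, 0] = np.inf                      # no power in the mean (DC) mode
    amp = k**(-slope / 2.0)               # PSD ∝ amp**2 ∝ k**-slope
    phases = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    field = np.fft.ifft2(amp * phases).real
    return field / field.std()            # normalize to unit variance
```

Such a screen is uncorrelated between pixels only after whitening; with a red (steep) slope it mimics the large-scale-dominated fluctuations expected of EHT sources.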

## 4. Data-driven Characterization of Variability

VLBI data sets collected over many observation epochs contain within them a natural characterization of source variability (see Figure 1). We exploit this feature in the 2017 EHT observing campaign to assess the magnitude and spectrum of the intrinsic source variability.

Data sets that may be combined include those taken on multiple days, crossing tracks in the (u, v)-plane throughout the night, and multiple observation bands (e.g., low- and high-band EHT data) where the spectral dependence of the source may be neglected across them. Additionally, lower limits on the intrinsic source size, α, induce correlation in the (u, v)-plane, meaning that neighboring points in the (u, v)-plane may also be compared. Expressed in EHT-relevant units, the typical distance within the (u, v)-plane over which the complex visibilities are strongly correlated is Δu ≈ 1.0 (α/200 μas)^{−1} Gλ. Within this scale, independent of the small-scale structure within the image, we may treat the visibilities as sampling similar aspects of the intrinsic source structure. In practice, we assume that the structure of the visibility may be modeled by a linear function within patches of this size, with the remaining suprathermal variations indicative of the combination of structural variability, refractive scattering, and residual systematic uncertainty.

Therefore, the strategy for obtaining a model-agnostic, data-driven characterization of variability at any given point in the (u, v)-plane (u0, v0) is as follows:

• 1.
Select all data points across all relevant days and bands within a silo of diameter Δu = 1 Gλ about (u0, v0).
• 2.
Detrend the selected data points with a linear model in (u, v) to disentangle the gross source structure from the fluctuations about it.
• 3.
Compute debiased estimates of the mean of the trend and the variance of the residuals.
• 4.
Average these estimates in azimuth.

The final step reduces the statistical uncertainty in the mean and variance, addresses sparse (u, v)-coverage, and facilitates direct comparisons with Section 2.

Insofar as the above strategy provides a measure of the variability in excess to that anticipated by the thermal errors, including the potential temporal correlations, it is directly comparable to the additional uncertainty required to fit static models to variable data sets. An additional debiasing step that addresses temporal correlations, necessary to compare with theoretical expectations of the power spectrum of visibility amplitude fluctuations, is described in Section 4.5.
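The final (azimuthal-averaging) step of the strategy above can be sketched as a simple radial binning of the per-patch estimates. This is our own minimal illustration: the bin width and the use of a plain mean are assumptions, and the per-patch estimates are taken as given inputs.

```python
import numpy as np

def azimuthal_average(u0, v0, values, bin_width=1.0):
    """Average per-patch estimates in azimuth by binning the patch
    centers (u0, v0) in baseline length |u| (units of Gλ).

    Returns a dict mapping bin-center radius -> mean estimate.
    """
    r = np.hypot(u0, v0)                       # baseline length of each patch
    bins = np.floor(r / bin_width).astype(int)
    out = {}
    for b in np.unique(bins):
        out[(b + 0.5) * bin_width] = values[bins == b].mean()
    return out
```

Averaging in azimuth both reduces the statistical uncertainty of each estimate and matches the azimuthally symmetric PSD approximation adopted in Section 2.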

### 4.1. Data Selection

To avoid having to address the unknown station phase delays, i.e., an inherently uncertain self-calibration procedure in the absence of a known source model, we employ the visibility amplitudes. Visibility amplitudes constructed from complex visibilities with Gaussian (thermal) noise are distributed according to the Rice distribution, which is only approximately Gaussian. In the following analysis we therefore restrict our attention to scan-averaged data values and data points for which the signal-to-noise ratio is S/N ≥ 2, where the Rice distribution is approximated well by a Gaussian.
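The adequacy of the Gaussian approximation at the adopted S/N ≥ 2 cut can be checked numerically: amplitudes of a constant signal plus complex Gaussian noise are Rice distributed, and their skewness shrinks rapidly with S/N. The Monte Carlo sketch below (our own illustration, with per-component noise σ = 1) contrasts S/N = 0, where the amplitudes are Rayleigh distributed and strongly skewed, with S/N = 2.

```python
import numpy as np

def rice_sample_skewness(snr, n=200_000, seed=0):
    """Sample skewness of amplitudes |nu + n_re + i*n_im| with
    per-component thermal noise sigma = 1 and signal nu = snr."""
    rng = np.random.default_rng(seed)
    z = snr + rng.standard_normal(n) + 1j * rng.standard_normal(n)
    a = np.abs(z)                               # Rice-distributed amplitudes
    return np.mean(((a - a.mean()) / a.std())**3)

# At S/N = 0 the amplitudes are Rayleigh distributed (skewness ≈ 0.63);
# by S/N = 2 the distribution is already nearly symmetric.
```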

We combine the data sets from the high and low bands, centered on 227.1 and 229.1 GHz, each with 2 GHz bandwidth, and across all observation times, including through and across observation nights. All data are equally weighted, i.e., we do not attempt to address the different number and quality of data on different days. This combined data set is assumed to provide an ensemble of comparable observations from which we may estimate the desired statistical properties.

Station gains impose an important potential additional systematic uncertainty. In most cases, we marginalize over the credible range of station gains (see Section 4.3). However, there are three important preprocessing steps that we employ to reduce the impact of gain errors within the 2017 EHT observation campaign.

First, we employ the network calibration procedure described in M87* Paper II to set the gain amplitudes at JCMT, SMA, ALMA, and APEX to within 1%.

Second, we calibrate the LMT gain amplitudes by fixing the visibility amplitudes on the LMT-SMT baseline to a Gaussian model, with an FWHM of 60 μas, comparable to the second-moment estimates of M87* and Sgr A*. This does not eliminate uncertainty in the LMT gains, but rather reduces it to a combination of that associated with the SMT and the error in the applicability of the Gaussian model, eliminating the large potential swings (see, e.g., Figure 34 of M87* Paper IV). This calibration procedure does, however, artificially suppress the variance on baselines shorter than 2 Gλ, and thus we will focus our attention only on baselines long enough to be unaffected by the calibration procedure.

Third, to avoid significant losses on JCMT associated with poor phase coherence during scan averaging, prior to coherently averaging we phase-calibrate the JCMT to a point-source model.
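The Gaussian model used in the LMT calibration step has a closed-form visibility amplitude, which also makes the quoted 2 Gλ cutoff plausible: a 60 μas FWHM Gaussian retains only ∼30% of its zero-baseline amplitude at 2 Gλ. The sketch below uses the standard Fourier transform of a circular Gaussian; the gain-correction helper is hypothetical, for illustration only.

```python
import numpy as np

def gaussian_vis_amp(u_glambda, fwhm_uas=60.0, total_flux=1.0):
    """Visibility amplitude of a circular Gaussian of given FWHM,
    of the kind used as the a priori model on the short LMT-SMT baseline."""
    theta = fwhm_uas * np.pi / (180 * 3600 * 1e6)   # μas -> radians
    u = u_glambda * 1e9                              # Gλ -> λ
    return total_flux * np.exp(-(np.pi * theta * u)**2 / (4 * np.log(2)))

def lmt_gain(measured_amp, u_glambda):
    """Hypothetical gain correction: ratio of model to measured amplitude."""
    return gaussian_vis_amp(u_glambda) / measured_amp
```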

### 4.2. Estimating Statistical Properties of a Single Patch

We begin by estimating the mean and variance of the visibility amplitudes within a radius Δu/2 of some point (u0, v0) within the (u, v)-plane, which we refer to as a "patch." Such estimates may be constructed at any point in the (u, v)-plane where a sufficient number of visibility amplitudes exist, i.e., N ≥ 4. For Δu = 1 Gλ, given the coverage of the 2017 Sgr A* EHT campaign, there are roughly 60 independent patches within ∣u∣ = 9 Gλ containing sufficient data to generate local mean and variance estimates.

#### 4.2.1. Detrending Visibilities

The visibility amplitudes within a patch will vary as a result of both variability and intrinsic structure. Because it is the former that is of interest here, to exclude the latter, we first estimate the linear trend across the patch. As long as Δu is sufficiently small, a linear model is an excellent approximation of the average visibility amplitudes across the patch.

The linear model is defined by $V(u,v)=\bar{V}+{V}_{u}(u\,-{u}_{0})+{V}_{v}(v-{v}_{0})$. The detrending procedure is described in Appendix A. The reconstructions of the detrending parameters, $\bar{V}$ and (Vu , Vv ), are unbiased for Gaussian errors on the amplitudes. The fidelity of the trend reconstruction depends on the number and S/N of the amplitudes. Having chosen $(\bar{u},\bar{v})=({u}_{0},{v}_{0})$ makes no significant change to the trend reconstruction.

The outputs of this procedure are the mean amplitude within the patch, estimated at the patch center, $\bar{V}$, and the detrended amplitudes within the patch, ${\hat{V}}_{j}={V}_{j}-V({u}_{j},{v}_{j})$. This is supplemented with the thermal errors at those locations, σth,j .
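The detrending step can be sketched as an ordinary least-squares fit of the three-parameter linear model; the full procedure is given in Appendix A, so the following is only a minimal illustration producing the two outputs named above, the trend mean at the patch center and the detrended residuals.

```python
import numpy as np

def detrend_patch(u, v, V, u0, v0):
    """Least-squares fit of V(u,v) = Vbar + Vu*(u-u0) + Vv*(v-v0) to the
    amplitudes in a patch; returns the trend mean at the patch center
    and the detrended residuals Vhat_j = V_j - V(u_j, v_j)."""
    A = np.column_stack([np.ones(V.size), u - u0, v - v0])
    (Vbar, Vu, Vv), *_ = np.linalg.lstsq(A, V, rcond=None)
    resid = V - A @ np.array([Vbar, Vu, Vv])
    return Vbar, resid
```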

#### 4.2.2. Statistical Estimators

Inherent within the detrended ensemble, $\{{\hat{V}}_{j}\}$, is an estimate of the statistical properties of the structural fluctuations not accounted for by the linear trend. These are necessarily biased by the thermal fluctuations and structural variations, both of which must be removed.

We estimate the debiased variance by

$${\sigma }^{2}=\frac{1}{N-3}\sum _{j}\max \left({\hat{V}}_{j}^{2}-{\sigma }_{\mathrm{th},j}^{2},\,{10}^{-2}{\sigma }_{\mathrm{th},j}^{2}\right),$$

which includes a cutoff at ${10}^{-2}{\sigma }_{\mathrm{th},j}^{2}$ to avoid pathologies associated with "over-debiasing," in which statistical fluctuations below ${\sigma }_{\mathrm{th},j}^{2}$ generate negative estimates of σ². The normalization of N − 3 is a consequence of having three degrees of freedom in the linear detrending model.

The debiased estimate of the mean is given by

where the mean estimate from the linear detrending procedure is corrected for the biases arising from the thermal and structural fluctuations, characterized by the patch variance estimate.

To demonstrate these estimates, we generate a simple simulated data set composed of 10 complex visibilities, Vj , located at (u, v)-positions (uj , vj ) randomly selected about a center (u0, v0). At each (uj , vj ), Vj is generated from a fixed linear model with $\bar{V}=1$, Vu = 0.5, Vv = 0, and two complex zero-mean Gaussian random fluctuations, one representing the thermal uncertainty and the other the structural variability, both with amplitude 0.1. Repeating this procedure, holding the linear model and (u, v)-positions fixed, produces an ensemble of data sets, from which the distributions of the reconstructed μ and σ2 are shown in red in Figure 2.

The estimate of μ is unbiased, with an accuracy that depends on the degree of variability and number of visibility amplitudes. Even when only four points are used, this quantity is reasonably well behaved, appearing normal and centered on the "truth" value.

The distribution of the estimated variance is clearly not normal, matching a lognormal distribution more closely. The specific realization of the (uj , vj ) does impact the distribution, with the degree of impact increasing as the number of amplitudes included decreases. When the impact of the structural variability is much smaller than the thermal errors, i.e., $\sigma \ll {\sigma }_{\mathrm{th}}$, the variance estimate can be inaccurate.

#### 4.2.3. Error Estimates

We estimate the errors on the estimates μ and σ2 for a single patch via a Monte Carlo procedure, Monte Carlo Errors (MCEs), that may be applied to the data directly. Doing so avoids the need to propagate the individual thermal errors through the detrending procedure and permits, as is done in the following sections, the inclusion of additional sources of uncertainty (e.g., gain uncertainties, polarization leakage). This process begins with generating an ensemble of data sets produced by introducing two additional complex, zero-mean, unit-variance Gaussian random variables, xj and yj :

where σ2 is the variance estimate from the observations. This procedure presumes that the thermal noise on the complex visibilities is Gaussian (well justified) and that the structural variations are independent (perhaps not well justified). Each generated set of ${V}_{j}^{{\prime} }$ is analyzed in a fashion identical to that described above: detrended, and μ and σ2 estimated from Equations (2) and (1), respectively.
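A minimal sketch of the ensemble-generation step, assuming the perturbed amplitudes take the form $|{V}_{j}+{\sigma }_{\mathrm{th},j}{x}_{j}+\sigma {y}_{j}|$ (the exact form is given by Equation (3)):

```python
import numpy as np

def mce_ensemble(V, sigma_th, sigma2, n_real=1000, seed=0):
    """Monte Carlo error ensemble (an illustrative sketch): perturb each
    amplitude with complex zero-mean, unit-variance Gaussian draws x_j
    (scaled by the thermal error) and y_j (scaled by the estimated
    structural standard deviation).  The precise combination
    V' = |V + sigma_th*x + sigma*y| is an assumption."""
    rng = np.random.default_rng(seed)
    shape = (n_real, len(V))
    # Unit-variance complex Gaussians: each component has variance 1/2.
    x = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    y = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    return np.abs(V + sigma_th * x + np.sqrt(sigma2) * y)
```

Each row of the returned array is one realization of the patch, to be detrended and re-estimated exactly as the original data.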

These must be further debiased to remove the impact of the original fluctuations inherent in the amplitude data. Therefore, we generate an ensemble of estimates that are further debiased via

The distribution of estimates from the MC ensemble is shown in blue in Figure 2. For both μ and σ2, the MCE distributions are quantitatively similar to those generated from ensembles of simulated data; for σ2, this similarity extends to the bias toward large values. The additional debiasing, which eliminates the double counting of the induced and original fluctuations, does indeed appear to satisfactorily remove any residual offsets between the true and reconstructed distributions.

We reiterate that this MCE procedure has the considerable virtue of being applicable to the actual data, providing a credible means by which to assign uncertainties to the patch-specific means and variances.

### 4.3. Azimuthally Averaged Statistical Properties

The procedure outlined above may be used to produce estimates of the variance arising from structural variations in the source throughout the (u, v)-plane. However, in many instances, and explicitly for the 2017 EHT observations (M87* Paper III), the baseline coverage is very sparse. Combined with the expectation that the power spectrum of the structural variability is only modestly dependent on position angle, this sparsity motivates the construction of azimuthally averaged estimates of μ and σ2, which are then functions of only the baseline length.

#### 4.3.1. Averaged Statistical Estimators

To generate these, we first produce a polar grid of patch centers, in which the radial positions, ρs , are linearly spaced with separation Δu/2, so that neighboring patches overlap. At each radial position, a set of azimuthal positions ϕst is chosen so that the azimuthal spacing is ≤ Δu/2. These then provide a set of ${({u}_{0},{v}_{0})}_{st}=({{\rho }}_{s}\cos {{\phi }}_{st},{{\rho }}_{s}\sin {{\phi }}_{st})$. At each polar patch position, we generate estimates of the mean and variance of the detrended visibility amplitudes, μst and ${{\sigma }}_{st}^{2}$. The locations of these patches are illustrated in Figure 3. Each data point will contribute to multiple patches.

Azimuthally averaged quantities are generated by weighting each patch's estimate by the numbers of degrees of freedom. The resulting azimuthally averaged estimate of the mean is given by

where Θ(x) is the standard Heaviside function. Because the variance estimates appear to be lognormal, and thus vary over many orders of magnitude, we average $\mathrm{log}({\sigma }^{2})$, i.e.,

Note that this estimate of ${\sigma }_{s}^{2}$ emphasizes those patches that have many data points and therefore are likely to be more accurate.
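The degrees-of-freedom weighting and Heaviside cutoff can be sketched as follows. The weight $({N}_{st}-3)\,{\rm{\Theta }}({N}_{st}-4)$ is our assumed form, chosen so that patches with fewer than four points are excluded; variances are averaged in the log as described:

```python
import numpy as np

def azimuthal_average(mu_st, sig2_st, N_st):
    """DOF-weighted azimuthal averages at fixed baseline length (an
    illustrative sketch).  Each patch enters with weight (N_st - 3);
    patches with N_st < 4 are excluded via the Heaviside cutoff; and the
    variances are averaged in the log because their distribution is
    approximately lognormal."""
    N_st = np.asarray(N_st, dtype=float)
    w = (N_st - 3) * (N_st >= 4)   # (N-3) * Theta(N-4), an assumed form
    mu_s = np.sum(w * mu_st) / np.sum(w)
    log_sig2_s = np.sum(w * np.log(sig2_st)) / np.sum(w)
    return mu_s, np.exp(log_sig2_s)
```

The log-averaging makes the combined variance a weighted geometric mean, which naturally emphasizes well-populated patches.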

#### 4.3.2. Averaged Error Estimates

Neighboring grid points are not necessarily independent, and thus the uncertainty in the combination is expected to be significantly larger than would be inferred if they were. Therefore, the Monte Carlo estimates of the error in μs and ${\sigma }_{s}^{2}$ must be performed after the azimuthal averaging. Residual gain errors present an additional source of uncertainty on the reconstructed means and variances. Therefore, the MCEs for the azimuthally averaged data sets follow a similar but distinct procedure to that for the individual patches.

We begin by constructing estimates of μs and ${\sigma }_{s}^{2}$. With these, we generate an ensemble of data sets as described by Equation (3). Each ensemble element is subsequently modified by introducing gain amplitude fluctuations, introducing a correlated fluctuation across the data set. For application to the 2017 EHT Sgr A* campaign, we adopt gain uncertainties consistent with those reported in Paper III. For sites for which network calibration is performed (ALMA, APEX, JCMT, and SMA) we assume a gain amplitude error of 1%; all other sites are assigned a gain amplitude error of 10%. These additional gain fluctuations are assumed to be normally distributed and applied on a per-scan basis. Finally, we normalize the ensemble data by the correlated flux on the intrasite baselines (ALMA-APEX, JCMT-SMA) as an approximation to the light curve.
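The per-scan gain-perturbation step can be sketched as follows. The station list and the 1%/10% split follow the text; all function and variable names are illustrative, and the light-curve normalization step is omitted:

```python
import numpy as np

def apply_gain_fluctuations(V, station1, station2, scan_id, seed=0):
    """Inject per-scan gain-amplitude fluctuations into visibility
    amplitudes (an illustrative sketch).  Network-calibrated sites get a
    1% gain error; all others get 10%.  One Gaussian draw is made per
    (station, scan) pair and reused across all baselines sharing it,
    producing the correlated fluctuations described in the text."""
    rng = np.random.default_rng(seed)
    netcal = {"ALMA", "APEX", "JCMT", "SMA"}   # network-calibrated sites
    gains = {}
    V_out = np.array(V, dtype=float)
    for j, (s1, s2, sc) in enumerate(zip(station1, station2, scan_id)):
        for s in (s1, s2):
            key = (s, sc)
            if key not in gains:
                err = 0.01 if s in netcal else 0.10
                gains[key] = 1.0 + err * rng.standard_normal()
            V_out[j] *= gains[key]
    return V_out
```

Because the draw is keyed on (station, scan), all baselines to a given station within a scan share the same gain fluctuation.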

#### 4.3.3. Estimating Noise Model Parameters

We ultimately seek to compare the empirical noise estimates with those found in GRMHD simulations by Georgiev et al. (2022). To facilitate this comparison, we fit the azimuthally averaged ${\sigma }_{s}^{2}$ with a broken power-law model, consistent with that found to be applicable to GRMHD simulations:

Because the azimuthally averaged variance estimates are approximately lognormal, we perform this fit on the logarithm of the reconstructed ${\sigma }_{s}^{2}$.

We find that the variance estimates themselves are susceptible to correlated errors, presumably due to the correlations in the fluctuations induced by noise imposed within the image domain. Therefore, we inflate the MCE estimates on the ${\sigma }_{s}^{2}$ by a factor of two, which safely encapsulates the discrepancy in the numerical validation experiments described in the following sections. In summary, the log-likelihood explored is

where errs is the geometric mean of two-sided 1σ quantiles from the MCE ensemble for ${\sigma }_{s}^{2}$.

The noise model parameters, {a, u0, b, c}, are estimated by sampling this likelihood using the Python-based ensemble Markov Chain Monte Carlo package emcee (Foreman-Mackey et al. 2013). Typically, ${10}^{5}$ steps are taken by 128 walkers, from which ${10}^{4}$ samples are randomly selected from the latter half of the ensemble of chains. Beyond this point, the joint posteriors of all of the noise parameters are insensitive to the number of MCMC steps taken.
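A minimal sketch of a log-probability suitable for emcee, under an assumed smoothly broken power-law form for the noise model (the actual parameterization is given by Equation (7), which is not reproduced here):

```python
import numpy as np

def n2(u, a, u0, b, c):
    """Assumed smoothly broken power law for the variability noise:
    rises as (u/u0)^c below the break at u0 and falls as (u/u0)^(-b)
    above it.  This is an illustrative stand-in for Equation (7)."""
    x = np.asarray(u, dtype=float) / u0
    return a**2 * x**(-b) / (1.0 + x**(-(b + c)))

def log_prob(theta, u, log_sig2, err_log):
    """Gaussian log-probability over {a, u0, b, c}, fit in the log of the
    variance as described in the text.  Suitable for passing to
    emcee.EnsembleSampler(nwalkers, 4, log_prob, args=(u, log_sig2, err))."""
    a, u0, b, c = theta
    if a <= 0.0 or u0 <= 0.0:
        return -np.inf   # flat prior restricted to physical values
    resid = (log_sig2 - np.log(n2(u, a, u0, b, c))) / err_log
    return -0.5 * np.sum(resid**2)
```

With 128 walkers and ${10}^{5}$ steps, as in the text, posterior samples would then be drawn from the latter half of the chains.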

### 4.4. Validation

We explore a number of validation exercises that follow the validation ladder laid out in Section 3. In summary, within the region for which we expect high-fidelity characterization of the structural variation, we find that we are able to do so. This encompasses the baselines such that 2 Gλ < ∣u∣ < 6 Gλ, where baselines shorter than 2 Gλ are excluded by the LMT calibration procedure, and at baselines longer than 6 Gλ the structural variability is subdominant to the thermal noise.

In each validation exercise, the simulated data sets were processed as described in Section 4.1. That is, (1) they are network calibrated, and light-curve normalized using the intrasite fluxes (ALMA-APEX and JCMT-SMA) interpolated to intermediate scans as required; (2) the complex LMT gains are set by calibrating to a 60 μas FWHM Gaussian; and (3) JCMT is phase-calibrated to a point source.

The resulting data sets are then analyzed as described above with Δu = 1 Gλ. Note that we show results with a baseline length spacing of 0.5 Gλ, and thus successive values are formally correlated.

These are compared to the azimuthally averaged mean and variance of the visibilities computed directly from "truth" movies, without any attempt to restrict them to any subset of the (u, v)-plane, that is, without any of the constraints imposed by the actual observations.

#### 4.4.1.  iNoisy Validation Results

The images used as part of the validation scheme for the various image reconstructions in Paper III provide a natural set of simple examples in which the fluctuations are introduced in the image domain. In each, a geometric brightness map is modified by the imposition of a set of correlated fluctuations via the iNoisy prescription, in which a set of temporally and spatially correlated brightness fluctuations are constructed in a fashion that mimics the turbulent structures within accretion disks (Lee & Gammie 2021).

This iNoisy method provides substantial control over the properties of the image variability while separating it from the complexity of the source structure. The structural variability induced by iNoisy in these images is tuned to match that exhibited by Sgr A*, as measured by the light curve and variations in the closure phase as quantified via the Q-metric, a noise-weighted measure of the closure phase variability described in Roelofs et al. (2017). The construction of the simulated iNoisy data sets is described in detail in Section 5 of Paper III. However, the apparent spatial correlations are typically confined to smaller scales than those characteristic of the geometric model.

The iNoisy data sets that we employed during the validation of the statistical estimators include a crescent, ring, double source, simple disk, elliptical disk, and ring with a 30-minute-period hot spot (see Paper III for further details). Each "truth" movie was resampled on each of the 4 days with different thermal noise, complex gain, scattering, and structural fluctuation realizations. Three examples—the crescent, point source, and elliptical disk—are shown in Figure 4.

In all cases, the estimated azimuthally averaged mean and variance maps match those associated with the "truth" movies well. The LMT calibration procedure does produce artifacts at ∣u∣ ≲ 2.5 Gλ, extending marginally above that. This is especially visible in the estimated variances. However, it is also visible in the estimated means, which hew closely to the 60 μas FWHM Gaussian assumed during the LMT calibration.

For ∣u∣ ≳ 6 Gλ, the structural variability is modestly subdominant to the thermal errors, as indicated by their average values. As anticipated, there are modest departures from the truths in this region.

The consistency extends to the range of fit broken power-law noise models. Because the break in the truth variances occurs below ∣u∣ ≈ 2.5 Gλ, there is little traction on the break location and the short-baseline power-law index. However, both the overall amplitude and long-baseline power-law index are well constrained.

#### 4.4.2. GRMHD Validation Results

GRMHD simulations provide a credible proxy for source structure of astronomical sources on horizon scales (Paper V; Georgiev et al. 2022). While there is a limited range of potential image morphologies within the GRMHD libraries, they do provide a physically motivated and self-consistent set of intensity fluctuations. Importantly, these models exhibit the spatial and temporal correlations that are anticipated in accreting and jet-launching environments.

Each "truth" movie was resampled with (u, v)-coverage identical to that of Sgr A* during the 2017 EHT campaign on each of the four observation days with different realizations of thermal noise, complex gain, and scattering. The simulated data sets share many of the features of those associated with the iNoisy sets: the images are complex, contain significant image–domain correlations that translate into corresponding visibility–domain correlations, and contain an underlying noise spectrum that is not known a priori. The latter fact motivates a comparison to the data-driven nonparametric estimates from Section 4.

However, the GRMHD simulation data sets do differ in two important respects. First, they exhibit a much wider array of behaviors, extending from small-scale fluctuations like those in the iNoisy sets to large-scale arcing structures that differ from the iNoisy images. These arcing features induce long-timescale correlations that are not otherwise present in the iNoisy sets. Second, there is no attempt to restrict the simulations to those that reproduce the variability observed in Sgr A*. As a result, the variability present in the GRMHD simulations is typically larger than that exhibited by Sgr A* (Paper V).

Three typical GRMHD validation examples are shown in Figure 5. As with the iNoisy validation examples, these examples accurately reconstruct the underlying truth variances associated with the full GRMHD movies for 2.5 Gλ < ∣u∣ < 6 Gλ, limited at shorter baselines by the LMT calibration procedure and at longer baselines by the thermal noise. These are representative of 100 such GRMHD simulation comparisons.

As with the iNoisy models, the break in the truth variances occurs below ∣u∣ ≈ 2.5 Gλ, and therefore only the overall amplitude and long-baseline power-law index are well constrained.

### 4.5. Debiased Variability Measurement

Thus far, no attempt has been made to address clear artifacts in the normalized variance estimates in the examples shown in Figures 4 and 5 that are due to unaddressed temporal correlations in the simulated data sets. These correlations are real and anticipated in the actual Sgr A* data, arising from the fact that on a single night neighboring positions in the (u, v)-plane are observed at nearby times. Insofar as the variance estimate is used to quantify the excess noise required to accommodate fitting stationary models (including images, geometric models, and physical models) to data sets composed of a full observation night, the prior scheme is sufficient. However, it is possible to envision a debiasing procedure in which these correlations are corrected, which we present here.

The temporal-correlation artifacts are most prominent between 6 and 7.5 Gλ, within which, for every example case shown in Figures 4 and 5, the normalized variance estimates exhibit a deficit of variability. This range of baseline lengths is especially sparsely sampled by the 2017 Sgr A* baseline coverage, with only one baseline contributing on most days (see Figure 1). For models in which the stochastic image fluctuations exhibit red temporal power spectra, as anticipated by GRMHD models of Sgr A* and observed in the light curve of Sgr A*, this sparsity implies a strong correlation among the visibility amplitudes within (u, v)-bins within this baseline length range. As shown in Appendix B for a simplified case, these kinds of correlations produce heavily biased variance estimates in bins to which only one or two baselines contribute, typically dramatically underestimating the variance in the relevant baseline length bins, as seen in the examples shown.

In contrast, note that between 7.5 and 9 Gλ there are five separate baselines, four of which are "following baseline pairs"—LMT-SPT/SMA-SPT and SMT-SPT/PV-SPT—in which the two baselines traverse the same region in the (u, v)-plane but at times separated by hours (Paper IV). Thus, within this baseline range, there are more statistically independent samples than implied by the number of apparent baselines, and many more than between 6 and 7.5 Gλ. Correspondingly, the accuracy of the variance estimates improves dramatically beyond 7.5 Gλ.

The magnitude of the bias, β, depends on the temporal power spectrum of the fluctuation spectra and thus is entangled with the statistical property we wish to measure. In principle, we could construct an a priori debiasing function, β−1(∣u∣), for a given set of observations as a function of baseline length and temporal power spectrum of the putative fluctuations. In practice, it is simpler to utilize the libraries of simulated data to effectively "measure" the debiasing factor for specific classes of models that share similar temporal properties. We present such an empirical calibration here for the iNoisy and GRMHD models, of which the latter is expected to be more physically relevant and utilized in Papers IV and V.

The procedure to estimate the debiasing functions is straightforward:

• 1.
Generate an ensemble of simulated observations matching the 2017 EHT Sgr A* campaign.
• 2.
Estimate the variance as described in Sections 4.1–4.3.
• 3.
Construct the anticipated power spectrum, ${\hat{P}}_{s}$, from the "truth" movie as described in Section 2 on each day.
• 4.
Generate an estimated debiasing factor for each baseline length bin by minimizing the joint χ2 among all models, observation days, and the estimates of the variance and its error:
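For a quadratic χ2 of the assumed form ${\sum }_{k}{({\beta }_{s}{\sigma }_{s,k}^{2}-{\hat{P}}_{s,k})}^{2}/{\mathrm{err}}_{s,k}^{2}$, where k runs over models and observation days, the minimization in step 4 has a closed-form solution per bin:

```python
import numpy as np

def debias_factor(sig2_est, P_truth, err):
    """Per-bin debiasing factor beta_s minimizing the joint chi^2 over
    all models and days k contributing to the bin (an illustrative
    sketch; the exact chi^2 definition is an assumption based on the
    text).  Weighted linear least squares gives the minimizer directly."""
    sig2_est = np.asarray(sig2_est, dtype=float)
    P_truth = np.asarray(P_truth, dtype=float)
    w = 1.0 / np.asarray(err, dtype=float) ** 2
    # d(chi^2)/d(beta) = 0  =>  beta = sum(w P sig2) / sum(w sig2^2)
    return np.sum(w * P_truth * sig2_est) / np.sum(w * sig2_est**2)
```

When the estimates are unbiased, β ≈ 1; bins dominated by correlated samples yield β > 1, consistent with the deficits seen between 6 and 7.5 Gλ.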

We generate an estimate of the debiasing function using the 90 simulations employed in the M/D calibration in Paper IV. This set of mock observations is generated from GRMHD simulations appropriate for Sgr A*, each of which is composed of simulated visibility data for both bands and all days of the reported 2017 Sgr A* EHT campaign. This is shown in Figure 6 and follows the pattern anticipated by the sparsity of the baselines: baseline lengths with few/many baselines contributing have large/small corrections.

The debiased variance estimates for the three GRMHD examples shown in Figure 5 are presented in Figure 7. These three simulations are taken from the 10 simulations used to validate the M/D calibration and are therefore distinct from those within the ensemble used to estimate the debiasing function. They show excellent agreement with the fluctuation power spectrum obtained directly from the simulated images.

### 4.6. General Application: Limiting Assumptions and Potential Extensions

While we demonstrate and validate the data-driven variability characterization presented in this section using the 2017 EHT Sgr A* campaign, the technique is more widely applicable, albeit subject to some nominal limitations.

First, this method relies on the existence of a sufficient density and resampling of visibilities within the (u, v)-plane. In the absence of more than three points per patch, it is not possible to reconstruct the detrended excess variance. In the absence of many such patches, the resulting estimates are highly uncertain.

Second, the debiasing procedure outlined in Section 4.2 only results in accurate estimates of the excess variance and its uncertainty when the former exceeds the variation due to the intrinsic thermal uncertainty.

Third, the method does not account for correlations that may be present among different regions in the (u, v)-plane. While not a formal limitation on the method's application, it nevertheless limits the interpretation of the reconstructed variances.

This method also makes a handful of specific choices that may be relaxed. The assumption of azimuthal symmetry, which was exploited to improve the precision of the variance estimates, may be removed when the (u, v)-coverage is sufficiently complete. The debiasing function presented in Section 4.5 was constructed for a specific set of observations and anticipated mock image ensemble. An analogous debiasing function could be separately reconstructed for a different application, given an appropriate set of observations and mock images from which to derive it.

Finally, the interpretation of the reconstructed variances depends on the nature of the structural variability. Where it is well described by a set of stochastic fluctuations about a mean image, the measured variability may directly be interpreted as constraining the power spectrum of the fluctuations. Where the variability is associated with coherent motions, the interpretation is more complicated.

## 5. Bayesian Noise Reconstruction

Implicit in the noise estimates obtained in Section 4 is a model for the source structure and its variability that results in smoothly varying visibility amplitudes and noise that is stationary across the multiple observation days. No model for the station gains is assumed, and no effort to utilize the phase information is made. However, it is possible to simultaneously reconstruct the average source structure and the additional noise associated with the variability within standard maximum likelihood analysis schemes.

For Gaussian likelihoods, i.e., assuming Gaussian errors (as is appropriate when fitting complex visibilities), the log-likelihood is

where ${y}_{j}={V}_{j}-\hat{V}({u}_{j},{v}_{j})$ is the difference between the observed and model complex visibilities and C is the covariance. This differs from the usual quadratic expression by the inclusion of the logarithmic normalization term, which will be critical in what follows.

The essence of Bayesian noise reconstruction is that, in addition to the $\hat{V}(u,v)$, the data covariance C is also replaced by some parameterized model whose parameters are jointly optimized with those from the visibility model. In this strategy, the logarithmic normalization is critical, as it penalizes excess noise. That is, the optimal joint source and noise model is one that has just enough noise to properly normalize the log-likelihood, but no more.

As described in Section 3, in what follows we will assume that C is diagonal, subject to the assumptions and limitations described there. In this case the normalization term reduces to

where ${\sigma }_{j}^{2}$ is the jth diagonal element of C , corresponding to the effective uncertainty associated with the jth complex visibility.
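A minimal sketch of the diagonal Gaussian log-likelihood, including the logarithmic normalization term that penalizes excess noise. Each complex visibility contributes two real degrees of freedom, each with variance ${\sigma }_{j}^{2}$:

```python
import numpy as np

def log_likelihood(y, sigma2):
    """Diagonal Gaussian log-likelihood for complex residuals
    y_j = V_j - Vhat(u_j, v_j) (an illustrative sketch).  The real and
    imaginary parts each carry variance sigma_j^2, so each datum
    contributes -|y_j|^2 / (2 sigma_j^2) - log(2 pi sigma_j^2); without
    the log-normalization term, inflating sigma_j would always be
    favored."""
    y = np.asarray(y, dtype=complex)
    sigma2 = np.asarray(sigma2, dtype=float)
    return (-0.5 * np.sum(np.abs(y) ** 2 / sigma2)
            - np.sum(np.log(2.0 * np.pi * sigma2)))
```

Maximizing over σj² per datum balances the two terms, which is precisely the "just enough noise" behavior described above.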

### 5.1. Implementation of Uncertainty Modeling

Multiple specific realizations of the noise modeling procedure outlined above have been implemented in the Themis analysis framework (Broderick et al. 2020a). In each case, the "noise" model, or uncertainty model, is characterized by a parameterized function that is added in quadrature to the existing uncertainties on the complex visibilities. That is, for the jth complex visibility we set its effective uncertainty by

where ${\sigma }_{\mathrm{th},j}^{2}$ is the reported thermal error and σ2(uj , vj ; q ) is the additional component evaluated for some set of parameters q. The implementations differ in the particular form of σ2(uj , vj ; q ). The most general uncertainty model employed is constructed from three subcomponents.

The first is a systematic term associated with the residual nonclosing errors that is proportional to the visibility amplitudes:

where anticipated values of f are of order 1% (Paper II).

The second is a refractive scattering term that accounts for the long-baseline fluctuations resulting from the refractive scintillation toward the Galactic center (see, e.g., Johnson et al. 2018; Issaoun et al. 2021). Predictions for the amplitudes of the fluctuations due to refractive scattering are typically only weakly dependent on baseline length beyond ∣u∣ ≈ 1 Gλ (Paper III). Therefore, we incorporate a term that imposes a uniform ((u, v)-independent) floor:

where typical values of σT can be as high as 0.4%. This term only ever contributes significantly at longer baselines when the former two terms have fallen below ${\sigma }_{\mathrm{ref}}^{2}$.

Finally, the third is a contribution that accounts for the structural variability,

where n(∣u∣) is defined in Equation (7). While this specific functional form is motivated by GRMHD simulations (see Section 2), it is sufficiently flexible to encompass a wide variety of potential structural variability.

The combined uncertainty model is then

of which some or all parameters can be fixed, which is useful when modeling only a subset of the model components. This is then inserted into Equation (12) to obtain the uncertainties on the real and imaginary components of the complex visibilities.
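The combined uncertainty model can be sketched as follows. The smoothly broken power-law form used here for the variability term is an assumed parameterization standing in for Equation (7); all names are illustrative:

```python
import numpy as np

def effective_variance(V_amp, u, v, sigma_th2, f, sigma_ref2, a, u0, b, c):
    """Combined per-visibility uncertainty (an illustrative sketch):
    thermal, fractional systematic, refractive floor, and structural
    variability terms added in quadrature.  The broken power-law n^2
    below is an assumed stand-in for the form in Equation (7)."""
    umag = np.hypot(u, v)                 # baseline length |u|
    x = umag / u0
    n2 = a**2 * x**(-b) / (1.0 + x**(-(b + c)))   # assumed broken power law
    return sigma_th2 + (f * V_amp) ** 2 + sigma_ref2 + n2
```

Setting f, σref, or a to zero disables the corresponding subcomponent, mirroring the ability to fix a subset of the model parameters.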

### 5.2. Validation

We report three sets of validation experiments of the Bayesian noise modeling implemented in Themis. These correspond to the different validation levels described in Section 3.3. In each, we demonstrate that the average source structure and noise power spectrum can be faithfully recovered.

All of the analyses presented here are performed with the Themis-based analogs of the analyses described in Papers III and IV. Specifically, they make use of the scan-averaged complex visibilities, and the LMT and JCMT pre-processing calibration steps described in Section 4.1 are applied. The complex station gains are reconstructed simultaneously with the image reconstruction.

#### 5.2.1. Gaussian Noise Reconstructions

We begin with simulated data sets produced from a simple source structure and independent Gaussian fluctuations added directly to the visibilities. Note that in this set of tests there are no guarantees that the associated image is positive definite. However, the manner in which additional noise has been added to the data matches exactly that being modeled.

Simulated data sets are produced with baseline coverage and uncertainties similar to those of the 2017 April 7 EHT observations of Sgr A* for a Gaussian brightness map with a total flux of 1 Jy and an FWHM of 50 μas. Various assumptions are made about the inclusion of additional fluctuations, which are independently generated for the systematic, refractive, and variability components. We consider three cases here:

• 1.
No additional noise is added: In this case, the true values of the relevant parameters are f = 0, σT = 0, σvar = 0.
• 2.
Only systematic and refractive noise is added: In this case, the true values of the relevant parameters are f = 1%, σT = 0.7%, and σvar = 0.
• 3.
The full noise model is added: In this case, the true values of the relevant parameters are f = 1%, σT = 0.7%, a = 5%, u0 = 2 Gλ, b = 3, and c = 2.

These are then fit with a Gaussian brightness map model, with and without employing the relevant model for the additional noise.
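The visibility function of the circular Gaussian brightness map is analytic (a standard Fourier-transform pair), which makes this test convenient; a minimal sketch with the parameter values from the text:

```python
import numpy as np

def gaussian_visibility(u, v, F=1.0, fwhm_uas=50.0):
    """Visibility amplitude of a circular Gaussian of total flux F and
    FWHM theta (standard Fourier pair):
        V(u, v) = F * exp(-(pi * theta)^2 * (u^2 + v^2) / (4 ln 2)),
    with (u, v) in wavelengths and theta in radians."""
    theta = fwhm_uas * np.pi / 180.0 / 3600.0 / 1e6   # microarcsec -> rad
    rho2 = u**2 + v**2
    return F * np.exp(-(np.pi * theta) ** 2 * rho2 / (4.0 * np.log(2.0)))
```

The amplitude equals the total flux at zero baseline and declines monotonically with baseline length, so the fit parameters (F, FWHM) map directly onto the short- and long-baseline behavior.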

Figure 8 clearly indicates the consequences of failing to account for the additional fluctuations in the underlying data. When no additional noise is included, the posteriors on the total flux and the FWHM faithfully recover the true values. However, when additional sources of noise are included but unmodeled, large biases are induced, i.e., many-σ deviations from the truth. Mitigating precisely these biases is the focus of this paper and critical to Papers III and IV.

In contrast, the joint posteriors when the model being fit includes the appropriate noise model are shown in Figure 9. In this case, the parameters of the image structure are faithfully recovered, albeit with an appropriate inflation of the uncertainty. In addition, the parameters of the underlying noise model are faithfully recovered as well.

A similar experiment is performed for a more complicated source model that provides good fits to the source structure in the 2017 EHT Sgr A* data (Paper IV), specifically, the "m-ring" construction from Johnson et al. (2020), in which the truth image is a narrow ring with an azimuthal flux distribution decomposed into a truncated Fourier series and then convolved with a circular Gaussian. The model parameters are the total flux, I, ring radius, R, Fourier mode amplitudes and phases, {Am , ϕm }, and FWHM of the Gaussian smoothing kernel (for detailed model specification see Section 4 of Paper IV, though note that we do not include the additional Gaussian component employed there for this test). For this test, we truncate the Fourier series at m = 2. Data are generated in a similar fashion to that for the Gaussian test and analyzed with and without the Bayesian noise modeling.

The top right panel of Figure 10 indicates the necessity of including the Bayesian noise reconstruction: when the excess noise is not modeled (open points), the structural parameters are biased by nearly 100σ in some cases; in contrast, including the noise modeling yields quantitatively accurate parameter estimates. Joint posteriors for the data set in which all additional sources of noise are added are shown in Figure 10. As with the simpler Gaussian image structure, the Bayesian noise reconstruction permits an accurate simultaneous recovery of the structural and noise parameters. This improvement is in part associated with the inflation of the parameter posterior widths resulting from the inclusion of additional contributions to the uncertainties of the individual complex visibilities.

We conclude that Bayesian noise reconstruction satisfactorily mitigates the presence of additional variability when directly applied to the complex visibilities.

#### 5.2.2.  iNoisy Reconstructions

The iNoisy data sets used to validate the image reconstructions in Paper III and described at the beginning of Section 4.4.1 provide a natural set of validation tests. Despite the artificial nature of the simulated images, there are some salient differences from the tests in the preceding subsection. First, the images are more complex, and we apply a more complicated source model—an adaptive bicubic raster model appropriate for image reconstruction (see Broderick et al. 2020b; Paper III)—with an associated increase in model complexity. Second, because fluctuations are applied directly to the image, significant correlations are induced among the complex visibilities. Third, the underlying noise spectrum is not known a priori, and therefore the specific quantitative reconstructions of the noise parameters cannot be compared to some truth in the form shown in Figure 9. Thus, we compare to the data-driven nonparametric estimates from Section 4.

Simultaneous image and noise reconstructions are shown in Figure 11 for three examples of the iNoisy simulated data sets. The morphology of the reconstructed images faithfully matches that of the mean images in the underlying truth. A gross sense of the breadth of the posterior is indicated by the samples shown.

The normalized variance in each case is shown in the bottom panels of Figure 11. The data-driven, nonparametric estimates serve two purposes in these analyses. First, they inform the priors on the noise model, which employ the data-driven amplitude at ∣u∣ = 4 Gλ and the long-baseline power-law index b. However, these are weakly informative, predominantly setting a lower bound on the noise. The range of normalized variances associated with the permitted noise models is reflected by the blue shaded regions. To prevent overinterpretation of the simulated data, we intentionally impose strong lower limits on the noise via these priors; only very weak, noninformative upper limits are enforced.

Second, the data-driven, nonparametric estimates provide a benchmark against which to assess the posteriors on the noise-model parameters, shown by the orange shaded regions in Figure 11. These posteriors match the data-driven estimates well, both within and beyond the range over which the latter are well constrained. Note that the assumed priors admit many noise-parameter values for which this is not the case; in particular, parameters that would produce large degrees of excess noise are strongly disfavored.

#### 5.2.3. GRMHD Simulation Reconstructions

Finally, we turn to GRMHD simulations, which exhibit physically motivated stochastic and coherent substructures in the simulated images. The simulated GRMHD data sets described in Section 4.4.2 for the April 6 and 7 (u, v)-coverage of the 2017 EHT Sgr A* campaign are used.

We perform two sets of validation experiments with the GRMHD simulation data sets, distinguished by the image model employed. In both sets of experiments, the priors on the noise model are set from the data-driven, nonparametric estimates using the same procedure as in the iNoisy tests. These priors are reflected in the blue shaded regions in Figures 12 and 13.

In the first, we reconstruct the simulated data sets with a parameterized ring model appropriate for comparison with the compact ring structures seen in M87* and Sgr A*: the mG-ring model described in Paper IV, a simple extension of the m-ring model introduced in Section 5.2.1 that differs only by the addition of a central Gaussian component to probe the magnitude of the central flux depression. This is similar to the geometric ring reconstructions of the Sgr A* data in Paper IV.

Unlike the test in Section 5.2.1, we do not have an unambiguous set of mG-ring-parameter truth values to validate against, and thus we limit ourselves to general image comparisons. Figure 12 shows mG-ring reconstruction image posteriors in comparison to the average truth images for the same three GRMHD examples shown in Sections 4.4.2 and 4.5. The general size and brightness distributions of the mG-ring images are similar to those of the underlying average truth blurred to the nominal resolution of 15 μas; the agreement is limited chiefly by the restricted freedom of the mG-ring model to produce radially asymmetric brightness profiles. In all cases, the reconstructed noise posteriors match the data-driven nonparametric estimates. Specifically, the amplitudes near 4 Gλ and the power-law index between 2.5 and 6 Gλ are well reproduced.

The second validation experiment mirrors that of the iNoisy demonstration: we utilize an adaptive bicubic raster model appropriate for image reconstruction (Broderick et al. 2020b). This directly supports the Sgr A* image reconstructions presented in Paper III and is a procedure similar to that employed by the Themis reconstructions presented therein. The results for the same three GRMHD simulation data sets are shown in Figure 13. As found in the iNoisy case, the morphology of the reconstructed image faithfully matches that in the average truth after convolving both with 15 μas beams. The reconstructed noise estimate posteriors, shown by the orange bands, are again consistent with those from the data-driven nonparametric estimates. In particular, the amplitude at 4 Gλ and the long-baseline power-law index of the excess normalized variance are faithfully recovered.

The success found in both sets of experiments is notable given that the additional noise present is 1–2 orders of magnitude larger than in the previous iNoisy experiments. More importantly, the noise estimates from the two experiments are also consistent with each other, indicating robustness to variations in the underlying image-model specification.

### 5.3. General Application: Limiting Assumptions and Potential Extensions

We have made a number of specific choices regarding the form and properties of the noise models, with the goal of applying the noise modeling technique to horizon-resolving black hole imaging analyses. However, there are many other potential natural applications—e.g., the characterization of variability in microquasars and subparsec radio emission from astrophysical jets—that may be incorporated with a handful of minor extensions.

The assumed broken power-law form of the noise model in Equation (7), while well justified for GRMHD accretion simulations and reasonably generic, may not apply in general. However, there is no obstacle to implementing alternative functional forms for the noise. Such alternatives could, e.g., relax the assumption of azimuthal symmetry by including a parameterized azimuthal structure. Similarly, the assumption of vanishing off-diagonal terms in the covariance—terms that may be generated, e.g., by coherent variations in the image structure—may also be relaxed, though doing so would require a model for the particular structure of the anticipated off-diagonal terms.
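As an illustration, a smoothly broken power law with the parameters referenced in Section 6.2 (amplitude a at 4 Gλ, break location u0, short- and long-baseline indices c and b) can be sketched as follows. This specific functional form and the default parameter values are assumptions patterned on the description in the text, not Equation (7) verbatim.

```python
import numpy as np

def excess_noise_std(u, a=0.02, u0=2.0, b=2.5, c=1.0):
    """Illustrative smoothly broken power law for the excess
    (variability) noise standard deviation vs. baseline length u
    [Glambda].

    a  : amplitude, referenced at |u| = 4 Glambda
    u0 : break location [Glambda]
    c  : short-baseline (|u| << u0) rising index
    b  : long-baseline (|u| >> u0) falling index
    Default values are placeholders, not fitted parameters.
    """
    def shape(x):
        x = np.asarray(x, dtype=float)
        return (x / u0) ** c / (1.0 + (x / u0) ** (b + c))
    return a * shape(u) / shape(4.0)
```

With these defaults the spectrum peaks below 2 Gλ and falls as |u|^(−b) on long baselines, qualitatively matching the behavior described for GRMHD simulations in Section 7.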

An important limitation of the procedure as presented is the lack of any model for temporal correlations. Again, this limitation might be relaxed at the expense of specifying an appropriate model. However, in the absence of such an extension to the model, we retain a requirement on the data: a sufficient number of statistically independent samples of the complex visibility function must be present at all locations in the (u, v)-plane. For Sgr A*, this requirement precludes single-day analyses, for which subhour temporal correlations can bias the noise model reconstructions significantly in the absence of additional source realizations. Combining at least 2 days, and thereby better conditioning the data to the underlying assumption of the model, is sufficient to mitigate these biases in the applications to the 2017 EHT Sgr A* campaign.

As with the data-driven variance estimation, the interpretation of Bayesian noise modeling depends on the origin of the underlying variability. Where it is due to stochastic fluctuations about a mean, the reconstructed noise model is again a direct measurement of the spatial power spectrum. Where coherent secular motions dominate the variability, the interpretation is more complicated. This ambiguity is similar to that surrounding the meaning of the reconstructed mean image.

## 6. Application to M87*

As a first science demonstration we apply the data-driven variability characterization and Bayesian noise reconstruction to the 2017 EHT observations of M87*. The estimated gravitational timescale of M87* is 9 hr, and thus little intraday variability is expected, with the possibility of interday variability across the 7 days of the 2017 EHT M87* campaign (M87* Paper II, Satapathy et al. 2022). This expectation is supported by the image and model reconstructions of M87*, which show no significant variation from April 5 to 6 and from April 10 to 11 and modest variations between the two pairs of observing days (M87* Paper IV; M87* Paper VI).

The analyses presented here differ from those in M87* Papers I−VI in two important respects. First, we analyze the entire data set simultaneously, combining the information across multiple observation nights. Second, we explicitly remove the variability of the overall compact flux via the light-curve normalization procedure described in Section 4.1. Because M87* contains significant extended structure, which is presumably static across the entire observing campaign, we estimate the light curve associated with the compact structure via the intrasite baselines, i.e., SMA-JCMT and ALMA-APEX, using the variance-weighted mean for scans in which more than one intrasite visibility amplitude is available.
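The per-scan intrasite light-curve estimate amounts to a variance-weighted mean, which might be sketched as follows (the function name and interface are illustrative):

```python
import numpy as np

def compact_flux_per_scan(amps, sigmas):
    """Variance-weighted mean of the intrasite (e.g., SMA-JCMT,
    ALMA-APEX) visibility amplitudes within a single scan, with its
    formal uncertainty.  A sketch of the compact-flux (light-curve)
    estimate used to normalize the visibilities.
    """
    amps = np.asarray(amps, dtype=float)
    w = 1.0 / np.asarray(sigmas, dtype=float) ** 2  # inverse-variance weights
    mean = np.sum(w * amps) / np.sum(w)
    return mean, np.sqrt(1.0 / np.sum(w))
```

Each visibility in the scan is then divided by this estimate to produce the light-curve-normalized data.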

### 6.1. Data-driven Variability Estimate

We begin with the application of the data-driven, model-agnostic characterization of the variability as a function of baseline length described in Section 4. The empirically derived mean visibility amplitudes and their variance across all 4 days are shown in Figure 14. The mean amplitudes exhibit the clear double-humped structure consistent with a narrow crescent, already apparent in the amplitudes themselves (M87* Paper III). The normalized amplitude variances are generally small, consistent with the typical thermal errors (though still much larger than the systematic uncertainty). That is, as anticipated, there is little evidence for stochastic variability in M87* from this model-agnostic, data-driven approach. We now move on to Bayesian source and noise reconstructions.
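A bare-bones sketch of such a data-driven estimate—binning amplitudes by baseline length and computing the normalized variance in each bin—is given below. The full Section 4 procedure additionally patches the (u, v)-plane, detrends within patches, and debiases, none of which is reproduced here.

```python
import numpy as np

def binned_amplitude_stats(u, v, vis, edges):
    """Model-agnostic mean and normalized variance of visibility
    amplitudes in bins of baseline length |u| (same units as edges).
    Simplified sketch: no (u,v)-patching, detrending, or debiasing.
    """
    r = np.hypot(u, v)          # baseline length
    amp = np.abs(vis)           # visibility amplitudes
    centers, means, nvar = [], [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (r >= lo) & (r < hi)
        if sel.sum() > 1:       # need >1 point for a variance
            m = amp[sel].mean()
            centers.append(0.5 * (lo + hi))
            means.append(m)
            nvar.append(amp[sel].var(ddof=1) / m ** 2)
    return np.array(centers), np.array(means), np.array(nvar)
```

For a static source, the normalized variance returned here would be consistent with the thermal errors alone, as is found for M87* in Figure 14.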

### 6.2. Bayesian Noise Reconstructions

The expected absence of intraday variability precludes placing strong priors on the noise model. Thus, we do not apply priors based on the data-driven variance estimates from the preceding section; instead, we assume priors that are flat (linear) in each parameter over the ranges σT ∈ [0, 0.02], f ∈ [0, 0.1], a ∈ [0, 0.5], u0 ∈ [1 Gλ, 10 Gλ], b ∈ [0.5, 6], and c ∈ [0.5, 6]. The range of noise models permitted by these priors is indicated by the blue shaded areas in Figure 15 and covers the entirety of the plot region.

The absence of intraday variability is borne out by the Bayesian noise reconstruction on a single day. The left panels of Figure 15 show draws from the posterior of the reconstructed image of the EHT observations of M87* on 2017 April 10. This day has the fewest scans of any of the observations of M87* during the 2017 EHT campaign and is correspondingly more uncertain, a fact that is reflected in the variety of potential images in the imaging posterior (Figure 15, middle left panels).

In addition to image reconstruction, we simultaneously fit the noise model to the data and find no evidence for any excess noise. The bottom left panel of Figure 15 shows that the posterior distribution of the required additional noise on 2017 April 10, indicated by the orange band, is generally small in comparison to both the thermal errors and the data-driven, model-agnostic estimates from all four observing days presented in Figure 14.

While there is little evidence for intraday variability, interday variability is obviously present in the image reconstructions in M87* Paper IV and M87* Paper VI. This conclusion is confirmed by the multiday image reconstructions shown in the right panels of Figure 15. In stark contrast to the single-day analysis, when multiple observation days are combined, there is a clear need for additional noise, the posterior range of which is shown by the orange band in the bottom right panel, though it remains marginally subthermal. Above 4 Gλ, i.e., on characteristic scales smaller than 50 μas, the required additional noise matches that anticipated by the data-driven, model-agnostic analysis.

A quantitative comparison of the required additional variability obtained from the Bayesian noise modeling is presented in Figure 16. Consistent with the lack of significant Galactic scattering toward M87*, no substantial σT is required. The roughly 1% values obtained for the fractional systematic noise are consistent with estimates derived from the statistics of closure quantities (M87* Paper III). On a single day, no additional variability noise is required, i.e., a < 0.05% at 2σ. Across all observation days, a = 1.00% ± 0.25% ± 0.45% at 1σ and 2σ, respectively. No other parameters of the noise model are meaningfully constrained in this example application.

## 7. Summary and Conclusions

Intraday variability presents a novel challenge for high-frequency VLBI observations, like those performed by the EHT. Traditional image reconstruction algorithms are predicated on static source structures that permit the combination of VLBI observations throughout the observing night, i.e., Earth aperture synthesis, an assumption explicitly violated by variable targets. At the same time, the nature and properties of the source variability are of intrinsic interest, informing the underlying physical properties of the source (Georgiev et al. 2022).

For the sources of greatest interest to the EHT, Sgr A* and M87*, this variability is associated with stochastic fluctuations driven by MHD turbulence. Thus, we have developed methods to reconstruct the mean source structure and statistically characterize the variability around it. By doing so, we reap the benefits underlying Earth aperture synthesis—better (u, v)-coverage and larger effective data sets, both of which increase the fidelity and volume of information that may be extracted—while addressing the variable nature of the source. These methods are more broadly applicable to any time-variable source, though they are most naturally applied to the imaging and/or modeling of sources exhibiting stochastic variations.

In GRMHD simulations, the strongest structural variability occurs on the longest timescales and largest spatial scales, i.e., the spatiotemporal power spectrum is "red–red." Therefore, Georgiev et al. (2022) find that it is possible to substantially reduce the degree of variability that must be mitigated by utilizing light-curve-normalized visibility data. For GRMHD simulations of Sgr A* and M87*, the remaining power spectrum is well described by a broken power law that typically peaks below 2 Gλ.

We have presented a data-driven nonparametric method for directly estimating the power spectrum of source image fluctuations from multiday sets of observations, like those by the EHT. For the 2017 EHT Sgr A* observing campaign, we have calibrated the resulting power spectrum estimates against GRMHD simulations. In particular, we have demonstrated the ability to faithfully extract the magnitude and long-baseline power-law index of the image fluctuation power spectrum near 4 Gλ, corresponding to scales of 50 μas. Results of applying this procedure to Sgr A* are contained in Paper IV, and the implications of the measured a and b noise model parameters for GRMHD models of Sgr A* are discussed in Paper V. These estimates of the Sgr A* variability are explicitly used to inform priors on the noise modeling employed by the Themis mG-ring analyses in Paper IV and the Themis image reconstructions in Paper III; the remaining imaging methods employed in Paper III survey over ranges of noise model parameters informed by these analyses.

We further present a formalism, Bayesian noise reconstruction, by which the statistical properties of the variability are constrained simultaneously with the mean source structure. These make use of parameterized "noise models," in which an excess uncertainty budget is included within a self-consistent likelihood. In applying this procedure, we necessarily focus exclusively on the diagonal of the visibility covariance, ignoring correlations between the visibilities at different locations in the (u, v)-plane. This procedure has been implemented in the Themis parameter estimation framework (Broderick et al. 2020a) and validated with a number of mock data sets.
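The essential structure of such a likelihood can be sketched as follows. The key feature is that the excess variance enters the normalization as well as the residual term, so arbitrarily inflating the noise is penalized and the excess can be fit alongside the source model. The diagonal complex-Gaussian form below is a simplification of the Themis implementation.

```python
import numpy as np

def log_likelihood(vis_obs, vis_mod, sigma_th, sigma_extra):
    """Gaussian log-likelihood for complex visibilities with a
    self-consistently inflated, diagonal error budget,
    var = sigma_th^2 + sigma_extra^2.  The -log(2*pi*var)
    normalization term penalizes gratuitously large excess noise.
    Sketch only; the Themis implementation differs in detail.
    """
    var = sigma_th ** 2 + sigma_extra ** 2
    resid2 = np.abs(np.asarray(vis_obs) - np.asarray(vis_mod)) ** 2
    # Each complex visibility carries two real degrees of freedom,
    # each with variance var, hence the 2*pi*var normalization.
    return float(np.sum(-resid2 / (2.0 * var) - np.log(2.0 * np.pi * var)))
```

When the residuals exceed the thermal errors, the likelihood is maximized at a nonzero sigma_extra; when they do not, any added noise only lowers the likelihood, so no spurious excess is inferred.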

We conclude that the Bayesian noise reconstruction scheme is effective at mitigating the impact of even substantial variability on the problem of source reconstruction and provides self-consistent estimates of the intrinsic variability. We have tested the Bayesian noise reconstruction algorithm on simulated data tests that span a wide range of mean images and types of variability, extending to the simultaneous reconstruction of mean images and power spectra from GRMHD movies. In all cases, we are able to faithfully recover both the image structure and excess variability noise. These reconstructions are robust in the sense that any model possessing a sufficient number of degrees of freedom recovers similar images and similar "noise" power spectra. Results of a variety of imaging and modeling analyses of the EHT Sgr A* observations are presented in Papers III and IV. As with the data-driven variability characterization, the recovered noise model from the Bayesian noise reconstruction directly informs models of variability within the emission region.

We have applied both methods to M87*, for which GM/c³ = 9 hr. Image reconstructions of the EHT M87* data do not show significant variation among neighboring days, implying little intraday variability. We confirm this conclusion by analyzing the 2017 April 10 observation, finding no evidence for variability. Combining all four observing days, 2017 April 5, 6, 10, and 11, we recover the interday variability observed across the week. In particular, we find excellent agreement between the data-driven nonparametric estimates and the self-consistently obtained modeled variability noise. Moreover, the nonvariability components in the noise model, i.e., the refractive scattering and fractional systematic error components, return their anticipated values for M87*: negligible and 1%, respectively.

Both of the methods presented here have potential applications well beyond EHT data sets. As demonstrated with M87*, they provide a natural method for estimating the degree of variability for sources that vary across multiple observation epochs. Moreover, by simultaneously recovering the variability and the mean flux distribution, Bayesian noise reconstruction enables the self-consistent combination of the data from multiple observations. That is, by construction, Bayesian noise modeling permits self-consistent, data-driven averaging across multiple observations, with the potential to result in more accurate mean image estimates.

A number of the assumptions made here regarding the particular form of the noise model are informed by our ultimate application to EHT analyses of Sgr A* and may be relaxed. Chief among these are azimuthal symmetry and the assumed vanishing of the off-diagonal elements of the covariance. The development of an appropriate noise model is source dependent. Nevertheless, it is clear that in addition to enabling traditional VLBI analysis techniques, the ability to detect and characterize variability opens a new window onto the dynamical processes at work in VLBI targets.

The Event Horizon Telescope Collaboration thanks the following organizations and programs:

We thank the staff at the participating observatories, correlation centers, and institutions for their enthusiastic support. This paper makes use of the following ALMA data: ADS/JAO.ALMA#2016.1.01154.V. ALMA is a partnership of the European Southern Observatory (ESO; Europe, representing its member states), NSF, and National Institutes of Natural Sciences of Japan, together with National Research Council (Canada), Ministry of Science and Technology (MOST; Taiwan), Academia Sinica Institute of Astronomy and Astrophysics (ASIAA; Taiwan), and Korea Astronomy and Space Science Institute (KASI; Republic of Korea), in cooperation with the Republic of Chile. The Joint ALMA Observatory is operated by ESO, Associated Universities, Inc. (AUI)/NRAO, and the National Astronomical Observatory of Japan (NAOJ). The NRAO is a facility of the NSF operated under cooperative agreement by AUI. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. We also thank the Center for Computational Astrophysics, National Astronomical Observatory of Japan. The computing cluster of Shanghai VLBI correlator supported by the Special Fund for Astronomy from the Ministry of Finance in China is acknowledged.

APEX is a collaboration between the Max-Planck-Institut für Radioastronomie (Germany), ESO, and the Onsala Space Observatory (Sweden). The SMA is a joint project between the SAO and ASIAA and is funded by the Smithsonian Institution and the Academia Sinica. The JCMT is operated by the East Asian Observatory on behalf of the NAOJ, ASIAA, and KASI, as well as the Ministry of Finance of China, Chinese Academy of Sciences, and the National Key Research and Development Program (No. 2017YFA0402700) of China and Natural Science Foundation of China grant 11873028. Additional funding support for the JCMT is provided by the Science and Technology Facilities Council (UK) and participating universities in the UK and Canada. The LMT is a project operated by the Instituto Nacional de Astrofísica, Óptica y Electrónica (Mexico) and the University of Massachusetts at Amherst (USA). The IRAM 30-m telescope on Pico Veleta, Spain is operated by IRAM and supported by CNRS (Centre National de la Recherche Scientifique, France), MPG (Max-Planck-Gesellschaft, Germany) and IGN (Instituto Geográfico Nacional, Spain). The SMT is operated by the Arizona Radio Observatory, a part of the Steward Observatory of the University of Arizona, with financial support of operations from the State of Arizona and financial support for instrumentation development from the NSF. Support for SPT participation in the EHT is provided by the National Science Foundation through award OPP-1852617 to the University of Chicago. Partial support is also provided by the Kavli Institute of Cosmological Physics at the University of Chicago. The SPT hydrogen maser was provided on loan from the GLT, courtesy of ASIAA.

This work used the Extreme Science and Engineering Discovery Environment (XSEDE), supported by NSF grant ACI-1548562, and CyVerse, supported by NSF grants DBI-0735191, DBI-1265383, and DBI-1743442. XSEDE Stampede2 resource at TACC was allocated through TG-AST170024 and TG-AST080026N. XSEDE JetStream resource at PTI and TACC was allocated through AST170028. This research is part of the Frontera computing project at the Texas Advanced Computing Center through the Frontera Large-Scale Community Partnerships allocation AST20023. Frontera is made possible by National Science Foundation award OAC-1818253. This research was carried out using resources provided by the Open Science Grid, which is supported by the National Science Foundation and the U.S. Department of Energy Office of Science. Additional work used ABACUS2.0, which is part of the eScience center at Southern Denmark University. Simulations were also performed on the SuperMUC cluster at the LRZ in Garching, on the LOEWE cluster in CSC in Frankfurt, on the HazelHen cluster at the HLRS in Stuttgart, and on the Pi2.0 and Siyuan Mark-I at Shanghai Jiao Tong University. The computer resources of the Finnish IT Center for Science (CSC) and the Finnish Computing Competence Infrastructure (FCCI) project are acknowledged. This research was enabled in part by support provided by Compute Ontario (http://computeontario.ca), CalculQuebec (http://www.calculquebec.ca) and ComputeCanada (http://www.computecanada.ca).

The EHTC has received generous donations of FPGA chips from Xilinx Inc., under the Xilinx University Program. The EHTC has benefited from technology shared under open-source license by the Collaboration for Astronomy Signal Processing and Electronics Research (CASPER). The EHT project is grateful to T4Science and Microsemi for their assistance with hydrogen masers. This research has made use of NASA's Astrophysics Data System. We gratefully acknowledge the support provided by the extended staff of the ALMA, from the inception of the ALMA Phasing Project through the observational campaigns of 2017 and 2018. We would like to thank A. Deller and W. Brisken for EHT-specific support with the use of DiFX. We thank Martin Shepherd for the addition of extra features in the Difmap software that were used for the CLEAN imaging results presented in this paper. We acknowledge the significance that Maunakea, where the SMA and JCMT EHT stations are located, has for the indigenous Hawaiian people.

Facility: EHT.

Software: eht-imaging, Themis, emcee.

## Appendix A: Linear Detrending of Visibility Amplitudes

Given a set of visibility amplitudes, {Vj}, at positions in the (u, v)-plane of {(uj, vj)}, we produce a linear estimate of the trend, $V(u,v)=\bar{V}+{V}_{u}(u-\bar{u})+{V}_{v}(v-\bar{v})$, where $(\bar{u},\bar{v})$ may be arbitrary. The three constants, $\bar{V}$ and the gradient (Vu, Vv), are determined via the least-squares normal equations,

$$\left(\begin{array}{ccc}N & {S}_{u} & {S}_{v}\\ {S}_{u} & {S}_{uu} & {S}_{uv}\\ {S}_{v} & {S}_{uv} & {S}_{vv}\end{array}\right)\left(\begin{array}{c}\bar{V}\\ {V}_{u}\\ {V}_{v}\end{array}\right)=\left(\begin{array}{c}{S}_{V}\\ {S}_{uV}\\ {S}_{vV}\end{array}\right),$$

where

$${S}_{u}=\sum _{j}({u}_{j}-\bar{u}),\qquad {S}_{v}=\sum _{j}({v}_{j}-\bar{v}),\qquad {S}_{uu}=\sum _{j}{({u}_{j}-\bar{u})}^{2},\qquad {S}_{uv}=\sum _{j}({u}_{j}-\bar{u})({v}_{j}-\bar{v}),\qquad {S}_{vv}=\sum _{j}{({v}_{j}-\bar{v})}^{2},$$

and

$${S}_{V}=\sum _{j}{V}_{j},\qquad {S}_{uV}=\sum _{j}({u}_{j}-\bar{u}){V}_{j},\qquad {S}_{vV}=\sum _{j}({v}_{j}-\bar{v}){V}_{j}.$$

When $(\bar{u},\bar{v})$ is chosen such that ${\sum }_{j}({u}_{j}-\bar{u})={\sum }_{j}({v}_{j}-\bar{v})=0$, Su = Sv = 0, and thus the matrix inversion simplifies significantly.

We clip the linear estimate of V(u, v) when the difference between the average visibility, ${N}^{-1}{\sum }_{j}{V}_{j}$, and $\bar{V}$ exceeds the standard deviation of the Vj by more than a factor of 10. For Gaussian errors, the above procedure results in an unbiased estimator of $\bar{V}$ and (Vu, Vv).
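A minimal implementation of this detrending procedure, adopting the centroid choice for $(\bar{u},\bar{v})$ so that the normal equations decouple (the clipping step is omitted), might read:

```python
import numpy as np

def linear_detrend(u, v, V):
    """Least-squares linear trend V(u,v) = Vbar + Vu*(u-ubar) + Vv*(v-vbar).

    With (ubar, vbar) at the centroid, Su = Sv = 0, so Vbar decouples
    from the gradient and only a 2x2 system remains for (Vu, Vv).
    """
    u, v, V = (np.asarray(x, dtype=float) for x in (u, v, V))
    ubar, vbar = u.mean(), v.mean()
    du, dv = u - ubar, v - vbar
    Vbar = V.mean()                                   # since Su = Sv = 0
    Suu, Svv, Suv = (du**2).sum(), (dv**2).sum(), (du * dv).sum()
    SuV, SvV = (du * V).sum(), (dv * V).sum()
    det = Suu * Svv - Suv**2
    Vu = (Svv * SuV - Suv * SvV) / det
    Vv = (Suu * SvV - Suv * SuV) / det
    return Vbar, Vu, Vv, (ubar, vbar)
```

Applied to noiseless data drawn from a linear trend, this recovers the offset and gradient exactly, consistent with the unbiasedness noted above.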

The reconstruction of the offset and gradient in V is illustrated in Figure 17. Decreasing the number of data points and increasing the random error component increases the error in the reconstructed trend, unsurprisingly. Nevertheless, even for very few points the distribution of reconstructed trends covers the "truth" values faithfully (see Figure 18).

Because we ultimately estimate the impact of the intrinsic uncertainties on the azimuthally averaged means and variances of the visibility amplitudes, we make no attempt during detrending to estimate the uncertainty on the V(u, v) itself.

## Appendix B: Correlation-induced Bias

We employ a minor variation of the standard variance estimator to approximate the variance of the visibility amplitudes within each (u, v)-patch, which is then averaged to produce an estimate in each baseline-length bin. However, where strong correlations exist between visibility amplitudes within a patch, they will result in significant biases. Gaining insight into the impact of these correlations is complicated by their heterogeneous nature: strong correlations will exist within a patch on a single baseline but typically will not extend across different baselines.

Therefore, we construct a toy example in which the sequence of visibility amplitudes within a patch is composed of N sets of replicas, the jth set containing nj perfectly correlated values (i.e., copies of Vj). That is, the set of values is

$$\{\underbrace{{V}_{1},\ldots ,{V}_{1}}_{{n}_{1}},\;\underbrace{{V}_{2},\ldots ,{V}_{2}}_{{n}_{2}},\;\ldots ,\;\underbrace{{V}_{N},\ldots ,{V}_{N}}_{{n}_{N}}\}.$$

The total number of values is

$${N}_{\mathrm{tot}}=\sum _{j=1}^{N}{n}_{j}=N\bar{n},$$

of which only N are independent. We further assume that the Vj are drawn from some distribution with variance σ².

On average, the standard variance estimate from this sequence is

$$\langle {s}^{2}\rangle ={\sigma }^{2}\,\frac{N\bar{n}-\overline{{n}^{2}}/\bar{n}}{N\bar{n}-1}.$$

If nj = 1 for all j, then $\overline{{n}^{2}}={\bar{n}}^{2}=1$ and the above reduces to the unbiased estimate. However, when the nj are not generally unity, a bias is induced in the estimated variance, becoming large when N is small.
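The expected value of the standard (Bessel-corrected) variance estimator under this toy model can be written in closed form and verified by Monte Carlo; the sketch below does so for unit-variance Gaussian draws.

```python
import numpy as np

def expected_variance_ratio(n):
    """Closed-form <s^2>/sigma^2 for the replica toy model: the standard
    (ddof=1) variance estimator applied to a sequence containing n[j]
    perfect copies of each of N = len(n) independent draws.
    Reduces to 1 when every n[j] = 1 (the unbiased case).
    """
    n = np.asarray(n, dtype=float)
    M = n.sum()                                # total values, N*nbar
    return (M - (n**2).sum() / M) / (M - 1.0)

def mc_variance_ratio(n, ntrial=40000, seed=0):
    """Monte Carlo check of the closed form (unit-variance draws)."""
    rng = np.random.default_rng(seed)
    n = np.asarray(n, dtype=int)
    draws = rng.standard_normal((ntrial, len(n)))  # sigma^2 = 1
    seqs = np.repeat(draws, n, axis=1)             # n[j] copies per draw
    return float(seqs.var(axis=1, ddof=1).mean())
```

The Monte Carlo average converges to the analytic ratio, confirming that the bias depends only on the replica counts and becomes large when the number of independent values is small.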

The magnitude of the bias depends most sensitively on the number of independent values, N. Figure 19 shows the average debias factor for different values of N. When N is small, the debias factor can easily reach values well above those measured in Section 4.5.

## Footnotes

• 150

For a large compendium of VLBI movies of active galactic nuclei, see www.physics.purdue.edu/MOJAVE.

• 151

It may be possible to retrieve some phase information through phase precalibration procedures (e.g., self-calibrating to a fiducial source structure) or the use of closure phases (see, e.g., Roelofs et al. 2021). We leave a full exploration of these possibilities for future work.

• 152

For interferometric observations, such as those performed by the EHT, when measured in λ the baseline lengths are directly proportional to the spatial frequency within the image domain. Thus, individual observations directly probe the Fourier transform of the image at specific spatial frequencies set by the baseline.

• 153

In what follows (s, t) correspond to the radial and azimuthal patch indices to differentiate them from j, by which we index individual data points.

• 154

It is straightforward to show that when ${\bf{C}}={\sigma }^{2}{\mathbb{1}}$, where ${\mathbb{1}}$ is the identity matrix and the single noise parameter is σ, the optimal noise model (i.e., the optimal value of σ) is that for which the standard reduced χ² is unity.