Gravitational redshift test of EEP with RA from near Earth to the distance of the Moon

The Einstein Equivalence Principle (EEP) is a cornerstone of general relativity and predicts the existence of gravitational redshift. We report on new results of measuring this shift with RadioAstron (RA), a space VLBI spacecraft launched into an evolving high eccentricity orbit around Earth with geocentric distances reaching 353,000 km. The spacecraft and ground tracking stations at Pushchino, Russia, and Green Bank, USA, were each equipped with a hydrogen maser frequency standard allowing a possible violation of the predicted gravitational redshift, in the form of a violation parameter $\varepsilon$, to be measured. By alternating between RadioAstron's frequency referencing modes during dedicated sessions between 2015 and 2017, the recorded downlink frequencies can essentially be corrected for the non-relativistic Doppler shift. We report on an analysis using the Doppler-tracking frequency measurements made during these sessions and find $\varepsilon = (2.1 \pm 3.3)\times10^{-4}$. We also discuss prospects for measuring $\varepsilon$ with a significantly smaller uncertainty using instead the time-domain recordings of the spacecraft signals and envision how $10^{-7}$ might be possible for a future space VLBI mission.


Introduction
The symmetries that embody the Einstein Equivalence Principle (EEP) underlie metric theories of gravitation like general relativity. However, attempts at a quantum description of gravity seem to inevitably lead to violations of the EEP [1]. A consequence of such a violation might be a departure from the predicted gravitational redshift:
$$\frac{\Delta\nu}{\nu} = (1+\varepsilon)\,\frac{\Delta U}{c^2}, \tag{1}$$
where $\nu$ is the frequency of an electromagnetic signal measured at different points within a gravitational field, $\Delta U$ is the difference in gravitational potential at the points, and $\varepsilon$ the violation parameter in the case where identical atomic frequency standards are used by the emitter (e) and observer (o) [2]. Measuring $\varepsilon$ is the subject of this paper. The first high-precision laboratory experiments of this type, reaching a relative accuracy of 1%, were done in the 1960s by Pound and Rebka [3] and later improved by Pound and Snider [4]. In 1976, Gravity Probe A (GP-A) [5] was launched on a non-orbital trajectory, with an apogee altitude of 10 000 km, allowing the gravitational redshift to be measured with an accuracy of $\sigma_\varepsilon = 1.4\times10^{-4}$ [6]. More recently, teams utilizing a pair of Galileo global navigation system satellites, which are in elliptical orbits with an eccentricity of 0.16, were able to refine the measurement of the violation parameter to $(0.19 \pm 2.48)\times10^{-5}$ [7] and $(4.5 \pm 3.1)\times10^{-5}$ [8] by taking advantage of an $\sim 8\,500$ km variation in geocentric distance over the satellites' orbits. Optical lattice clocks have also become sufficiently accurate to allow a similar measurement of $(1.4 \pm 9.1)\times10^{-5}$ on Earth [9]. Proposed future experiments may allow these measurements to be further refined by several orders of magnitude [10,11].
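As a rough orientation for the size of the effect, the sketch below evaluates the redshift relation with a violation parameter for a point-mass Earth. It is only an illustration: the point-mass potential, the constants, the function name `redshift_point_mass` and the two radii (mean Earth radius and the 353 000 km apogee distance) are assumptions made here, not the potential model used in the analysis.

```python
# Illustrative constants (standard values, not taken from the paper).
GM_EARTH = 3.986004418e14   # m^3 s^-2, geocentric gravitational constant
C = 299_792_458.0           # m s^-1, speed of light


def redshift_point_mass(r_low_m, r_high_m, epsilon=0.0):
    """Relative frequency shift (1 + epsilon) * dU / c^2 between two radii.

    Point-mass approximation (an assumption of this sketch):
    U(r) = -GM / r, so dU = GM * (1 / r_low - 1 / r_high).
    """
    delta_u = GM_EARTH * (1.0 / r_low_m - 1.0 / r_high_m)
    return (1.0 + epsilon) * delta_u / C**2


# Mean Earth radius to the largest geocentric distance quoted for RA.
surface_to_apogee = redshift_point_mass(6.371e6, 3.53e8)

# The same geometry with a hypothetical violation of 1e-4.
violated = redshift_point_mass(6.371e6, 3.53e8, epsilon=1e-4)

print(f"redshift, surface to apogee: {surface_to_apogee:.3e}")
print(f"change for epsilon = 1e-4:   {violated - surface_to_apogee:.3e}")
```

In this approximation the shift is of order $7\times10^{-10}$, so a violation at the $10^{-4}$ level changes the observed relative frequency by only a few parts in $10^{14}$, which indicates the clock stability and modelling accuracy such a test requires.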
In this paper, we present the latest results from a gravity experiment using RadioAstron (RA), the spacecraft element of the Russian-led international space very long baseline interferometry (VLBI) mission, launched in 2011 into a highly eccentric elliptical orbit with geocentric distances ranging from under 7 000 km to as large as 353 000 km [12], comparable to the Earth-Moon distance. The spacecraft carried a VCH-1010 space-qualified hydrogen maser (SHM) frequency standard [13]. The mission's two ground tracking stations in Pushchino, Russia (PU) and Green Bank, WV, USA (GB) are also equipped with hydrogen masers (H-masers). First results were published in 2020 [14], based on measurements of the spacecraft's downlink signals made with the Doppler tracking equipment (Doppler frequency measurements) also used for orbit determination, but were limited to $\sigma_\varepsilon = 3\%$ by systematics most likely due to the error in compensating for the non-relativistic Doppler shift. In this follow-up paper, we report on an analysis of Doppler frequency measurements where the non-relativistic Doppler shift could be suppressed, resulting in a 100-fold reduction in $\sigma_\varepsilon$. This higher accuracy was achieved through a Doppler compensation scheme (DCS) during dedicated downlink sessions in which a combination of 1-way and 2-way links similar to GP-A could be used [2,15].
The main steps in the analysis are indicated in figure 1. In the first column are the main inputs: i. station time offsets relative to GPS time, ii. Doppler frequency measurements at the ground stations, iii. orbital state vectors for the spacecraft, iv. Earth rotation and atmospheric models from the Naval Research Laboratory's (NRL) Tracker Component Library, and v. log of the instantaneous uplink frequency transmitted to the spacecraft. The next column indicates how the inputs are initially used to compute the frequency offset of the ground station clocks and the residuals when comparing the Doppler frequency measurements to those expected from the state vectors and various models. It is at this stage that the DCS is implemented. The third column indicates that residuals are used to measure noise levels, which are in turn used to estimate uncertainties using Monte Carlo techniques. Finally, the last column indicates the use of weighted least squares to simultaneously fit the violation parameter, $\varepsilon$, and the frequency offset of the SHM over time. These steps are elaborated in the remainder of this paper, which is organized as follows: section 2 describes the DCS and how it was implemented with RA; section 3 describes the measurement procedures and the data acquired for the experiment; section 4 describes the data analysis; section 5 describes how $\varepsilon$ and its uncertainty were estimated; section 6 discusses prospects for further reducing the uncertainty in $\varepsilon$; and section 7 provides a brief summary and our conclusions.

Doppler Compensation Scheme
RA was equipped with two modes of onboard frequency referencing as shown in figure 2.
Figure 2: In the 2-way mode, the 7.2 GHz reference tone transmitted by the ground tracking station was used in a phase-synchronization loop to provide the onboard frequency reference. Adopted from [15].
In the 1-way mode, the observed frequency of the reference tone at 8.4 GHz and the carrier signal at 15 GHz ($\nu_{1w}$) experienced a relative frequency shift compared to the nominal or unshifted frequency at the ground tracking station ($\nu_0$):
$$y_{1w} = \frac{\nu_{1w} - \nu_0}{\nu_0}, \tag{2}$$
where $y$ (lowercase) here and hereafter signifies an observed relative frequency shift, $Y$ (uppercase) will be the corresponding expected value and $r$, the residual relative frequency shift or simply 'residual shift', will be the difference between them (except in figure 3 where $r$ is a position vector). For 1-way, the expected value, $Y_{1w}$, is given by equation (3), where $\dot{D}$ is the rate of change of the magnitude of the range vector $D$, also called the range rate, $v_e$ and $v_s$ are respectively the velocities of the ground station and spacecraft, and $a_s$ is the acceleration of the spacecraft. $Y_{fine,1w}$ includes relativistic Doppler terms at third order or higher in $v/c$ as well as other small effects, particularly those due to the troposphere, ionosphere and phase-center motion (PCM), that are considered but omitted here for brevity. Key times and state vectors are as defined in figure 3 at $t_3$ (observation time). Vectors at times $t_1$ and $t_2$, when needed, are related to $t_3$ using higher order corrections derived by expansion around $t_3$. The first order term, the non-relativistic Doppler shift, presents a significant challenge as near perigee it can exceed the gravitational redshift by a factor of $10^{4}$. Orbital state vectors are typically not accurate enough to estimate $\dot{D}/c$ with sufficient precision, and the resulting error thus dominates the 1-way residual shift, $r_{1w} = y_{1w} - Y_{1w}$ (equation (4)). For example, $\delta\dot{D} \sim 2$ mm s$^{-1}$ for RA [16], which limits the measurement of $y_{grav}$ to $\sim 1\%$ when using only $r_{1w}$ [14].
In the GP-A experiment, a novel technique was used to suppress the non-relativistic Doppler shift by taking advantage of a phase-synchronization loop locked to a reference tone uplinked from the ground tracking station to provide a second onboard frequency reference. RA's 2-way mode worked in a similar way. The observed frequency of the downlink signals in this mode ($\nu_{2w}$) experienced a relative frequency shift compared to the frequency of the uplink tone from the ground tracking station ($\nu_{up}$) defined as:
$$y_{2w} = \frac{\nu_{2w} - F_0\,\nu_{up}}{F_0\,\nu_{up}}, \tag{5}$$
where $F_0$ is a multiplier applied by the spacecraft's electronics when generating the downlink frequencies. To second order, the expected value of this shift, $Y_{2w}$, is given by equation (6), where $a_e$ is the acceleration of the ground station and $Y_{fine,2w}$ includes smaller effects, again omitted for brevity as in equation (3). Notice that $y_{2w}$ does not contain $\Delta U/c^2$. We can again define a residual shift as:
$$r_{2w} = y_{2w} - Y_{2w}, \tag{7}$$
which is also dominated by errors in estimating $\dot{D}/c$. However, with a DCS, both observed relative frequency shifts are combined to form a Doppler compensation scheme relative frequency shift or simply 'DCS frequency shift' defined as:
$$\Delta y = y_{1w} - \tfrac{1}{2}\,y_{2w}, \tag{8}$$
where $\dot{D}/c$ has cancelled and $\Delta U/c^2$ remains as the leading effect in the expected DCS frequency shift, $\Delta Y$ (equation (9)). The DCS also significantly diminishes the effects of the troposphere, ionosphere and PCM, with only differential effects remaining in $\Delta Y_{fine}$. In order to achieve the desired accuracy, relative frequency shifts as small as $10^{-15}$ must be considered, including terms at third order in $v/c$, which are included in $\Delta Y_{fine}$ (equation (10)), where $\Delta Y_{trop}$, $\Delta Y_{ion}$ and $\Delta Y_{pcm}$ are the differential effects of the troposphere, ionosphere and PCM while $j_e$ is the jerk of the ground station due to Earth's rotation. For RA's orbit, the terms on the last line of equation (10) are negligible as they do not exceed $6\times10^{-16}$.
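The cancellation at the heart of the DCS can be illustrated with a first-order toy model. The sketch below keeps only the non-relativistic Doppler term and the gravitational redshift, and assumes the sign conventions and the combination $y_{1w} - y_{2w}/2$; the function names and the numerical values of the range rate and redshift are hypothetical, while the 2 mm s$^{-1}$ range-rate error is the value quoted for RA.

```python
C = 299_792_458.0  # m s^-1, speed of light


def one_way_shift(range_rate, grav):
    """First-order 1-way shift: Doppler plus gravitational redshift (toy model)."""
    return -range_rate / C + grav


def two_way_shift(range_rate):
    """First-order 2-way shift: Doppler on uplink and downlink, no redshift term."""
    return -2.0 * range_rate / C


def dcs_shift(y_1w, y_2w):
    """Assumed DCS combination: half the 2-way shift removes the 1-way Doppler."""
    return y_1w - 0.5 * y_2w


true_rate = 3000.0    # m s^-1, hypothetical range rate
grav = 5.0e-10        # hypothetical gravitational redshift
rate_error = 2.0e-3   # m s^-1, range-rate error of the orbit quoted for RA

y1 = one_way_shift(true_rate, grav)
y2 = two_way_shift(true_rate)

# 1-way residual when the model uses the erroneous range rate from the orbit.
r_1w = y1 - one_way_shift(true_rate + rate_error, grav)

# DCS observable: formed from the two observed shifts only, so the
# range-rate error never enters at first order.
delta_y = dcs_shift(y1, y2)

print(f"1-way residual from orbit error: {r_1w:.2e}")
print(f"DCS shift minus redshift:        {delta_y - grav:.2e}")
```

In this toy model the orbit error leaves a 1-way residual of about $7\times10^{-12}$, roughly a percent of the redshift between the ground and apogee, whereas the DCS combination returns the redshift to within floating-point rounding.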
Relativistic Doppler terms are estimated using orbital state vectors for the spacecraft provided by the mission (see [16]) along with state vectors for the ground stations computed using the IAU Earth rotation model as implemented in the Tracker Component Library (TCL) by the Naval Research Laboratory (see [17]). The Earth's gravitational field is modeled using tide-free coefficients from EGM2008 (see [18]) with post-glacial rebound, polar motion, solid earth tides, oceanic tides and pole tides added following IERS conventions (see [19]). State vectors for the Moon and Sun are computed using JPL's DE421 ephemeris (see [20]) and used to estimate their tidal effects included in $\Delta U$. $\Delta Y_{trop}$ is the residual relative frequency shift due to the troposphere estimated using the VMF3 and GRAD models (see [21,22]). $\Delta Y_{ion}$ is the residual effect of the ionosphere, mostly due to the difference between uplink and downlink frequencies, estimated using electron densities from CDDIS (see [23]). $\Delta Y_{pcm}$ includes the PCM effect due to the offset of the spacecraft antenna's phase center from its center of mass (see [24]). Also added is the effect due to the significant displacement between the phase center and reference point of the GB ground tracking station. Examples of these relative frequency shifts are plotted over an orbit in figure 4.
Figure 4: Relative frequency shifts after implementing the DCS at GB over a particular orbit in December 2015. Dashed line is the spacecraft's geocentric distance and is plotted against the right axis. Note $\Delta Y_{dop2}$ are the second order relativistic Doppler effects from equation (9) and $\Delta Y_{dop3}$ the third order Doppler terms from equation (10).
Now we introduce our main observable, the Doppler compensation scheme residual relative frequency shift or simply 'DCS residual shift':
$$\Delta r = \Delta y - \Delta Y. \tag{11}$$
This observed relative frequency shift only contains unmodelled effects. While H-maser frequency standards may reach or even exceed a relative stability of $10^{-15}$ over thousands of seconds, they are susceptible to a number of systematic effects that cause their frequency to drift over longer times [25]. An offset between each H-maser and Geocentric Coordinate Time (TCG) must be accounted for when comparing observed frequency shifts to prediction. Thus, we define the difference between the frequency offsets of the SHM, $h_s$, and of the ground station H-maser (GHM), $h_e$ (equation (12)). This difference will appear in the observed value of $\Delta r$ along with a possible violation of $\Delta U/c^2$ as follows:
$$\Delta r = \varepsilon\,\frac{\Delta U}{c^2} + h_s - h_e. \tag{13}$$
The basic approach of this experiment is thus to measure $\Delta r$ using equation (11) and fit the resulting observations using a model function based on equation (13) to estimate $\varepsilon$ and its uncertainty.

Measurement procedures
Although RA began scientific observations in 2012, measurements for this experiment only started in 2015 and ran until the SHM failed in mid-2017. Most RA sessions involved the real-time downlinking of astronomical space VLBI observations. During these 'single mode' sessions, the frequency referencing mode was held fixed and Doppler tracking equipment was used to determine the peak frequency of the spacecraft's signals. This was done for the 8.4 GHz reference tone and the 15 GHz carrier signal by computing Fourier spectra using 80 ms of digitally sampled data with 50% overlap, resulting in a 25 Hz measurement rate. A subset of these single mode sessions was used for noise analysis, as will be described later on, and is summarized in table 1. In addition, 199 'interleaved' sessions, each about 1 hr long, were dedicated solely to gravitational redshift measurements. During these sessions, RA's frequency referencing mode was switched between 1-way and 2-way modes. Of these sessions, 128 could be used for this experiment and are summarized in table 2. The time series of measurements from an interleaved session can be divided into a series of up to several dozen 'segments' over which a particular mode was in use. The interleaving of modes stands out clearly in figure 5 where the 1-way residual shift with the gravitational redshift added back, $r_{1w} + \Delta U/c^2$, and the 2-way residual shift, $r_{2w}$, have been plotted for two sample sessions. Interleaved sessions with many short segments were recorded at a variety of distances, while sessions with fewer but longer segments mostly took place close to apogee where the gravitational redshift between the ground station H-maser (GHM) and the SHM is at its maximum value. The former provide sensitivity to a possible violation of the gravitational redshift while the latter largely provide sensitivity to the evolution of the frequency offset between the H-masers introduced in the next section.

Measuring the DCS Residual Shift
Values of the DCS residual shift, $\Delta r$, were measured by fitting polynomials to the 1-way and 2-way residual shifts, $r_{1w}$ and $r_{2w}$, to interpolate simultaneous relative frequency shifts and apply the DCS. This was done using linear least squares (LLS). In the presence of only white noise, LLS would correctly estimate the interpolation error and correspondingly the uncertainty of $\Delta r$. However, in the presence of non-white noise, such estimates can be significantly biased. To determine the nature of the noise present, the Allan deviation (ADEV as a function of averaging time $\tau$) [26] and power spectral density ($S_y$ as a function of frequency $f$) were computed using $r_{1w}$ and $r_{2w}$ and are shown for a typical session in figure 6. Colored noise is evident in the spectrum and can be characterized by spectral index, $\alpha$, where $S_y \propto f^{\alpha}$. At high frequencies, phase noises such as white phase modulation noise (WPM), with $\alpha = 2$, and flicker phase modulation noise (FPM), with $\alpha = 1$, dominate, while at low frequencies, flicker frequency modulation (FFM) noise, with $\alpha = -1$, begins to dominate above a floor of white frequency modulation (WFM) noise where $S_y$ is constant. In the presence of these noises, LLS error estimates will be biased by the phase noises while the non-stationary nature of FFM will introduce a minimum error (the 'flicker noise floor') which cannot be overcome by fitting more data. In the presence of WFM alone, an LLS fit using all the data in a session would result in a single measurement of the DCS residual shift with the smallest possible error.
Figure 6: Example of ADEV (left) and power spectra ($S_y$, right) at 8.4 GHz computed using residual shifts $r_{1w}$ (red) and $r_{2w}$ (blue) from a typical session. Error bars are 68% confidence intervals and are omitted at higher frequencies for clarity. Dotted lines correspond to the noise model, fit to the mean $S_y$ from many sessions. Circles indicate interference spikes.
However, due to the flicker noise floor, it is instead advantageous to partition the data and separately fit the frequencies in each part. The error introduced by FFM will be random from fit to fit and will thus allow the effect of the flicker noise floor to be diminished by having multiple independent values of ∆r per session. In practice, we chose to measure ∆r at a point in each 1-way segment where the error is expected to be smallest and for which a portion of neighboring 2-way segments could be fit to obtain a simultaneous 2-way mode frequency. A total of 538 measurements of ∆r could be made using these segments at 8.4 GHz (see table 2) with only 519 of these also being usable at 15 GHz.
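For readers unfamiliar with the Allan deviation used above to characterize the noise, the following is a minimal sketch of the standard non-overlapping estimator applied to a series of relative frequency values. It is not the estimator or software used for figure 6; the function name `allan_deviation` and the short test sequence are illustrative assumptions.

```python
import math


def allan_deviation(y, m):
    """Non-overlapping Allan deviation of relative frequency samples y.

    The averaging time is m samples: the data are averaged in consecutive
    bins of m points and the deviation is computed from the differences
    between adjacent bin means.
    """
    n_bins = len(y) // m
    if n_bins < 2:
        raise ValueError("need at least two averaging bins")
    means = [sum(y[i * m:(i + 1) * m]) / m for i in range(n_bins)]
    diffs_sq = [(means[i + 1] - means[i]) ** 2 for i in range(n_bins - 1)]
    return math.sqrt(0.5 * sum(diffs_sq) / len(diffs_sq))


# A sequence alternating between +1 and -1: adjacent samples differ by 2,
# but every pair of samples averages to zero.
alternating = [1.0, -1.0] * 8
adev_1 = allan_deviation(alternating, 1)
adev_2 = allan_deviation(alternating, 2)
print(adev_1, adev_2)
```

For white frequency noise the deviation falls as $\tau^{-1/2}$ with increasing averaging time, whereas for flicker frequency noise it levels off, which is the 'flicker noise floor' referred to in the text.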

Measuring ground tracking station offsets
The PU and GB ground tracking stations are equipped with GPS receivers that allow their local clocks to be steered such that they remain within a maximum time offset relative to GPS time. As GPS time is itself steered to follow Terrestrial Time (TT), a time series of offsets between the local clock and GPS time ($\Delta T$) allows the relative frequency offset of the local clock relative to TT, $h_e$ or station frequency offset, to be estimated from the time derivative of $\Delta T$ plus a height-dependent term (equation (14)), where the derivative is taken with respect to TT, $\mu_\oplus$ is the standard gravitational parameter of Earth while $H_e$ and $r_{g,e}$ are the orthometric height and geoid radius at the station's location. The second term accounts for the station not being on the geoid where TT is defined. Prior to fitting, the time series is resampled into uniform hourly measurements. A portion of the resampled time series for GB is shown in figure 7a. Note the two dominant types of noise present: (1) white noise from the GPS receiver and (2) random run phase noise due to the random walk frequency noise of the GHM. These are superimposed on the systematic drift of the maser's offset which, to first order, is linear in frequency and therefore quadratic in the time offset. However, when an H-maser is disturbed, sudden changes in drift are possible, as seen near day 20 in figure 7a. The first step in estimating the station frequency offset, $h_e$, is to divide the time series into intervals over which the H-maser was undisturbed. This was done using operator logs from GB and by manually inspecting the PU time series for discontinuities. Within each interval, a single quadratic fit of the time series would be appropriate if only white noise were present. However, due to the random run, each interval must be divided into sub-intervals which are as long as possible to minimize the effect of white noise, but over which the random run does not dominate.
The optimal length of these sub-intervals was determined by measuring the white noise level and using GHM specifications for the frequency random walk noise level. Overlapping quadratic fits of the optimal length were done and the non-overlapping regions with the lowest uncertainty from each fit were used to produce a time-series of frequency offsets, a range of which are plotted in figure 7b. Finally, the uncertainties of the frequency offsets measured using this approach were determined using Monte Carlo simulation with randomly generated noise (see [27] for a general approach to generating clock noise) according to the determined noise levels. These uncertainties, σ he , are also plotted in figure 7b.
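The core step of this procedure, a quadratic fit to a sub-interval of time offsets followed by differentiation, can be sketched as follows. This is a simplified illustration on noiseless synthetic data: the function names, the units (days and nanoseconds), the synthetic coefficients and the sign convention relating the derivative to the frequency offset are all assumptions, and the height-dependent term and the overlapping sub-interval logic are omitted.

```python
def solve_linear(a, b):
    """Solve a small system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def quadratic_fit(t, dt):
    """Least-squares fit dt ~ p0 + p1*t + p2*t^2 via the normal equations."""
    powers = [sum(ti ** k for ti in t) for k in range(5)]
    a = [[powers[i + j] for j in range(3)] for i in range(3)]
    b = [sum(di * ti ** i for ti, di in zip(t, dt)) for i in range(3)]
    return solve_linear(a, b)


def frequency_offset(coeffs, t):
    """Time derivative of the fitted time offset (sign convention assumed)."""
    return coeffs[1] + 2.0 * coeffs[2] * t


NS_PER_DAY_TO_FRACTIONAL = 1.0e-9 / 86400.0  # converts ns/day to a relative frequency

# Hourly samples over two days, centred on t = 0 (days); offsets in ns.
t_days = [(i - 24) / 24.0 for i in range(49)]
dt_ns = [20.0 + 3.0 * ti + 0.5 * ti ** 2 for ti in t_days]  # synthetic, noiseless

coeffs = quadratic_fit(t_days, dt_ns)
h_mid = frequency_offset(coeffs, 0.0) * NS_PER_DAY_TO_FRACTIONAL
print(f"relative frequency offset at mid-interval: {h_mid:.3e}")
```

A drift of a few nanoseconds per day in the time offset corresponds to a relative frequency offset of a few parts in $10^{14}$, which gives a sense of the receiver noise level that the sub-interval length has to average down.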

Measuring spacecraft frequency offset and ε
As the station frequency offset, $h_e$, can be measured independently using the method described in section 4.2, we consider it an observable along with the DCS residual shift, $\Delta r$, and so modify equation (13) to the following:
$$\Delta r + h_e = \varepsilon\,\frac{\Delta U}{c^2} + h_s. \tag{15}$$
Values of $\Delta r + h_e$ plotted versus time appear in figure 8. The nearly linear drift of the spacecraft relative frequency offset, $h_s$, is apparent. Discriminating $h_s$ from the effect of $\varepsilon$ requires long intervals over which $h_s$ evolves linearly and over which the range of gravitational redshifts is as large as possible. While the data cover an impressive range of redshifts, $\Delta y_{grav} = 1.6\times10^{-10}$, corresponding to a distance range of 320 000 km from 26 000 km to 346 000 km, a full 2/3 of points are near apogee, within 20% of the maximum value of the gravitational redshift. This has the effect of strongly correlating $\varepsilon$ with the initial value of the spacecraft frequency offset, $h_s$, and requires that both be fit simultaneously.
In figure 8, we see the slope of $h_s$ changing suddenly on two occasions. We thus divide the data into three intervals (T1, T2, T3 arranged chronologically) with boundaries at days 1464.0 and 1875.5, where days are counted starting on 1 January 2012. Sensitivity to this choice is discussed in section 5.4. Within each interval, we assume a linear drift and so define the following model function:
$$E\!\left(\frac{\Delta U}{c^2}, t, \beta\right) = \varepsilon\,\frac{\Delta U}{c^2} + \sum_{i=1}^{3} \Pi(t, i)\,\left(a_i + b_i\,t\right), \tag{16}$$
where the factor $\Pi(t, i)$ is 1 if $t$ lies within interval T$i$ and 0 otherwise. The parameter vector, $\beta$, includes $\varepsilon$ and the '$h_s$ parameters', $a_i$ and $b_i$, that account for the drift of the SHM frequency offset. Having three $a_i$ parameters instead of just one overall constant allows for discontinuities at the boundary between intervals, which reduces the sensitivity of $E(\Delta U/c^2, t, \beta)$ to the precise choice of boundary times. In total, this

Weighted fit
The uncertainty of the measurements, $\sigma$, was used as a weighting factor and, based on equations (11) and (15), includes the statistical uncertainties $\sigma_{r_{1w}}$, $\sigma_{r_{2w}}$ and $\sigma_{h_e}$ from the fits described in sections 4.1 and 4.2. The uncertainty in the expected DCS frequency shift, $\Delta Y$, was ignored as it is either too small or not statistical in nature. This will be further discussed along with estimating the systematic error in section 5.4. Combining the uncertainties in quadrature gives the total uncertainty (equation (17)). Due to the presence of non-white noise, the statistical uncertainties could not be estimated from their corresponding LLS fits. Instead, they were estimated as confidence intervals using Monte Carlo simulations of colored noise generated using models matched to observed noise power (see $S_y$ in section 4.1). These models include four power-law noise components (WPM, FPM, WFM and FFM) and were fit to the mean $S_y$ from many sessions (see figure 10). Only single mode sessions, all 1-way or 2-way mode, were used to compute mean $S_y$ as their longer stretches of data are better suited to measuring non-stationary noises. The presence of additional noise at intermediate frequencies above 0.02 Hz in PU spectra after 7 May 2015 obscures the noise floor relevant at longer averaging times. Therefore, we used the FFM noise level fit to sessions between 24 February 2015 and 7 May 2015 and adjusted the WFM noise level to match that observed in the sessions coming after. As 2-way only sessions were not performed before additional noise appears in PU spectra, the 2-way noise level for PU could not be fit. Instead, FFM noise power was assumed to be twice that in 1-way, which is approximately the same ratio seen between 2-way and 1-way power at GB. Monte Carlo simulation showed that FFM and, to a lesser extent, WFM dominate the error. Care was taken to generate FFM noise with the appropriate characteristics using an ARFIMA(1, 0.5, 0) stochastic model following the approach of [28].
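One common way to generate noise with a $1/f$-like spectrum is to filter white noise with the coefficients of a fractional integrator of order $d = 0.5$. The sketch below shows only this fractional part; it omits the autoregressive term of the ARFIMA(1, 0.5, 0) model named above, truncates the filter at the series length, and is not the simulation code used for the analysis. The function names and parameters are hypothetical.

```python
import random


def fractional_integration_weights(d, n):
    """First n coefficients of (1 - B)^(-d), with B the backshift operator.

    Recursion: psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k.
    """
    psi = [1.0]
    for k in range(1, n):
        psi.append(psi[-1] * (k - 1 + d) / k)
    return psi


def power_law_noise(n, d, sigma=1.0, seed=None):
    """Filter white Gaussian noise with the truncated fractional-integration weights.

    The low-frequency spectrum of the output scales as f^(-2d), so d = 0.5
    gives flicker-like noise and d = 0 returns the white noise unchanged.
    """
    rng = random.Random(seed)
    white = [rng.gauss(0.0, sigma) for _ in range(n)]
    psi = fractional_integration_weights(d, n)
    return [sum(psi[j] * white[i - j] for j in range(i + 1)) for i in range(n)]


weights = fractional_integration_weights(0.5, 4)
flicker = power_law_noise(256, 0.5, seed=1)
print(weights)
```

The slowly decaying weights are what give the simulated series its long memory; the direct convolution used here costs $O(n^2)$ operations, which is acceptable for short series but would normally be replaced by an FFT-based convolution.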
Using 1000 simulations per session, the uncertainties in the 1-way and 2-way residual shifts, $\sigma_{r_{1w}}$ and $\sigma_{r_{2w}}$, were found to respectively contribute 74% and 26%, on average, to the total uncertainty while the contribution of the uncertainty in the station frequency offset, $\sigma_{h_e}$, is negligible. The mean total uncertainties across all segments are $\sigma_{8.4} = 2.1\times10^{-13}$ and $\sigma_{15} = 2.5\times10^{-13}$. Using the total uncertainties as weighting factors when fitting equation (16), the following values of $\varepsilon$ were obtained: $\varepsilon_{8.4} = (2.1 \pm 3.3_{\rm stat})\times10^{-4}$ and $\varepsilon_{15} = (0.7 \pm 7.6_{\rm stat})\times10^{-4}$. These results are very similar to those from the unweighted fits. The weighted fits have chi-square per degree of freedom of $\chi^2_{\nu,8.4} = 1.1$ and $\chi^2_{\nu,15} = 4.4$. The former suggests that the weighting factors determined using Monte Carlo methods account for nearly all the scatter in $\Delta r + h_e$ at 8.4 GHz. In contrast, the larger $\chi^2_\nu$ at 15 GHz, which is evident from $\mathrm{RMS}_{15}$ being significantly larger than $\sigma_{15}$, implies that the estimated uncertainty at 15 GHz is too low. This is not surprising given that we could not directly fit noise levels in this band and instead resorted to using the lower noise levels from 8.4 GHz. The statistical uncertainties of $\varepsilon$ have been adjusted so they correspond to a $\chi^2_\nu$ of unity.
Figure 10: Mean power spectra of the residual shifts $r_{1w}$ (red) and $r_{2w}$ (blue) at 8.4 GHz computed using segments from interleaved sessions. Left is the mean of GB sessions and right is that of PU sessions between 7 May 2015 and 3 June 2016. Dotted lines are corresponding noise models. The mean spectra from 1-way only sessions are plotted in magenta. Some error bars are omitted for legibility. Note the presence of additional noise above 0.02 Hz in PU spectra, which is discussed in the text.
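The structure of such a fit, one violation parameter shared by all points plus a constant and a slope for each of the three intervals, can be sketched with a small weighted least-squares routine. This is an illustration under stated assumptions, not the fitting code of the analysis: the function names and the reference epoch `T_REF` are hypothetical, the inputs are assumed to be rescaled to values of order unity for numerical stability, and only the interval boundaries are taken from the text.

```python
BOUNDARIES = (1464.0, 1875.5)  # interval boundaries in days, as quoted in the text
T_REF = 1600.0                 # hypothetical reference epoch to keep times small


def solve_linear(a, b):
    """Solve a small system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def design_row(du_c2, t):
    """Design-matrix row for the parameters [eps, a1, b1, a2, b2, a3, b3]."""
    i = sum(t >= edge for edge in BOUNDARIES)  # index of the interval containing t
    row = [du_c2] + [0.0] * 6
    row[1 + 2 * i] = 1.0          # constant term of interval i
    row[2 + 2 * i] = t - T_REF    # linear drift term of interval i
    return row


def weighted_fit(du_c2, t, obs, sigma):
    """Weighted least squares; returns the parameters and chi-square per degree of freedom."""
    rows = [design_row(u, ti) for u, ti in zip(du_c2, t)]
    w = [1.0 / s ** 2 for s in sigma]
    n_par = 7
    ata = [[sum(wk * r[i] * r[j] for wk, r in zip(w, rows)) for j in range(n_par)]
           for i in range(n_par)]
    atb = [sum(wk * r[i] * o for wk, r, o in zip(w, rows, obs)) for i in range(n_par)]
    beta = solve_linear(ata, atb)
    chi2 = sum(wk * (o - sum(p * x for p, x in zip(beta, r))) ** 2
               for wk, r, o in zip(w, rows, obs))
    return beta, chi2 / (len(obs) - n_par)
```

When the chi-square per degree of freedom returned by such a fit differs from unity, a common convention is to multiply the formal parameter uncertainties by its square root, which is one way to read the adjustment described above.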

Check on statistical error
We can compare $\sigma^{\rm stat}_\varepsilon$ to what is expected given the mean variation in gravitational redshift ($\Delta y_{grav} \sim 2.6\times10^{-11}$), the RMS of the fit residuals, the number of points given in section 5.1 and the mean correlation between $\varepsilon$ and the $h_s$ parameters ($\rho_{8.4} = -21\%$), as given in equation (18). Using this expected value, $\sigma_\varepsilon$, we find $\sigma^{\rm stat}_\varepsilon/\sigma_\varepsilon = 1.07$ at 8.4 GHz, which is fairly close to unity. For 15 GHz, this check is not useful since noise levels could not be directly measured. Thus we conclude, at least at 8.4 GHz, that the statistical uncertainty is a reliable estimate.

Systematic error
We tested our analysis technique for a bias when measuring ε, particularly towards ε = 0. By assuming a non-zero violation in the presence of simulated noise we confirmed that our estimate of ε is unbiased. Further, as mentioned in section 4.3, these tests confirmed that h s cannot be measured independently from ε using the same data set without a possible violation being suppressed. This shows that our overall approach of fitting ε and h s simultaneously is necessary.
To determine the contributions to systematic uncertainty, we considered three effects. First, we studied the effect of the interference spikes that appear in 1-way mode spectra (see figure 6). A shift, $\Delta\varepsilon_{filt}$, resulted when passing the DCS residual shift through a 3 Hz Butterworth lowpass filter of order 8 to remove the spikes prior to fitting. As we could not ascertain which result is more likely to be correct, we conservatively include these shifts, $\Delta\varepsilon_{filt,8.4} = 3\times10^{-5}$ and $\Delta\varepsilon_{filt,15} = 4.2\times10^{-4}$, in the error.
Second, the uncertainty of the specific boundaries between intervals T1, T2, and T3 (see figure 8) was studied. Alternate boundaries between T1 and T2 as well as between T2 and T3 were tried, corresponding to where the $1\sigma$ confidence intervals of the fits on either side of the boundary meet, namely at days 1470 and 1890 respectively. The differences in the fit values, $\Delta\varepsilon_{bound,8.4} = 2\times10^{-5}$ and $\Delta\varepsilon_{bound,15} = 5\times10^{-5}$, are also added to the error.
Third, the terms in equation (9) larger than the uncertainty in the DCS frequency shift are $\Delta U/c^2$, the second order relativistic Doppler terms and the station frequency offset. As described in section 4.2, the uncertainty in the station frequency offset was estimated and is included in the weights and, therefore, the statistical error. For the other terms, the main sources of error are the uncertainty in the spacecraft state vectors ($\delta r \sim 200$ m and $\delta v \sim 2$ mm s$^{-1}$ [16]), and the position of PU's reference point ($\delta r < 10$ m). The error introduced in the expected DCS frequency shift by these uncertainties does not exceed $1\times10^{-15}$, even for the closest perigee session for which the error would be the largest.
Systematic thermal and magnetic effects on the GHMs contribute to the station frequency offset and thus are taken into account. Ground testing of the SHM showed a thermal sensitivity of $\Delta f/f = \pm5\times10^{-15}/^{\circ}$C and magnetic field sensitivity of $\Delta f/f = \pm2\times10^{-14}$/G. During observing sessions, the thermal management system on board RA maintained the SHM temperature with an accuracy of 1 $^{\circ}$C. The resulting random frequency shift is therefore expected to be much smaller than the estimated uncertainty in the residual shift, $\sigma_{8.4}$, and can be neglected. Similarly, at the distances of RA's orbit, the Earth's magnetic field is sufficiently weak ($< 0.1$ G) that effects due to its variation are also negligible.
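The size of these SHM effects relative to the measurement uncertainty follows from simple arithmetic, sketched below with the values quoted in the text. Treating 0.1 G as an upper bound on the field variation is an assumption of this sketch, as are the variable names.

```python
THERMAL_SENSITIVITY = 5e-15    # relative frequency change per deg C (ground test)
TEMPERATURE_ACCURACY = 1.0     # deg C, on-board thermal control
MAGNETIC_SENSITIVITY = 2e-14   # relative frequency change per gauss (ground test)
FIELD_BOUND = 0.1              # gauss, assumed upper bound on the field variation
SIGMA_8_4 = 2.1e-13            # mean total uncertainty per measurement at 8.4 GHz

thermal_shift = THERMAL_SENSITIVITY * TEMPERATURE_ACCURACY
magnetic_shift = MAGNETIC_SENSITIVITY * FIELD_BOUND

# Fractions of the per-measurement uncertainty.
thermal_fraction = thermal_shift / SIGMA_8_4
magnetic_fraction = magnetic_shift / SIGMA_8_4

print(f"thermal:  {thermal_shift:.1e} ({thermal_fraction:.1%} of sigma)")
print(f"magnetic: {magnetic_shift:.1e} ({magnetic_fraction:.1%} of sigma)")
```

Both contributions come out at the level of a few percent of $\sigma_{8.4}$ or less, consistent with neglecting them.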

Sensitivity study
In table 4, we summarize our results using different subsets of the data with and without weighting. Measuring $\varepsilon$ using only T2, the longest interval, results in a correlation of 72% between $\varepsilon$ and the constant parameter in the frequency offset. By combining all three frequency offset intervals in a single fit, not only is the uncertainty in $\varepsilon$ reduced, but so is the correlation, falling by 6% in the case of T2 but by about 24% in the case of the other two intervals. The majority of sessions with numerous switches, performed nearer Earth, were recorded at PU. The addition of GB data makes essentially no difference at 8.4 GHz, indicating that the GB data are consistent with PU, but also that the lower noise levels at PU drive the result. The reverse is true at 15 GHz, where the additional noise at PU is partially offset by the inclusion of GB data. All the results, including those from unweighted fits, are broadly consistent with each other within the uncertainties. Furthermore, we find in all cases $\varepsilon$ is consistent with zero within $1.1\,\sigma^{\rm stat}_\varepsilon$.

Final results
Combining our estimates for $\varepsilon$ from the weighted fit and its uncertainty, we arrive at the following results: $\varepsilon_{8.4} = (2.1 \pm 3.3_{\rm stat} \pm 0.5_{\rm sys})\times10^{-4}$ and $\varepsilon_{15} = (0.7 \pm 7.6_{\rm stat} \pm 4.3_{\rm sys})\times10^{-4}$. For a final result, we considered combining the measurements from the two frequency bands. However, the noise in the two bands appears strongly correlated during GB sessions and at least partially correlated during many of the PU sessions. This implies that the measurements at 8.4 GHz and 15 GHz cannot be considered statistically independent. The result at 8.4 GHz is favored since its $\chi^2_\nu$ being close to unity confirms that our weighting scheme derived from colored noise simulations is reliable. Using the 8.4 GHz result and combining its statistical and systematic uncertainties in quadrature, we arrive at a final estimate for the violation parameter:
$$\varepsilon = (2.1 \pm 3.3)\times10^{-4}.$$
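The quadrature combination of the 8.4 GHz uncertainties can be checked with a few lines; the helper name `combine_in_quadrature` is illustrative and the inputs are the statistical and systematic values quoted above, in units of $10^{-4}$.

```python
import math


def combine_in_quadrature(*errors):
    """Root-sum-square of independent uncertainty contributions."""
    return math.sqrt(sum(e * e for e in errors))


# Statistical and systematic uncertainties at 8.4 GHz, in units of 1e-4.
total_8_4 = combine_in_quadrature(3.3, 0.5)
print(f"total uncertainty: {total_8_4:.2f} x 1e-4")
```

Because the systematic term is much smaller than the statistical one, the total rounds to the same $3.3\times10^{-4}$ quoted in the abstract.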

Discussion
Tests of the EEP are considered an important, if not essential, probe of metric theories of gravity [29], with measuring the gravitational redshift being one of the classical tests of general relativity. Our measurements were made with the space VLBI RA mission which was not primarily designed for a gravitational redshift test. In particular, we were limited by the lack of simultaneous downlink signals in the 1-way and 2-way referencing modes and the limited observation time allocated to the experiment. The flicker noise floor of the online frequency measurements by the Doppler tracking equipment is an order of magnitude higher than that of GP-A and almost 35 times what was determined in the laboratory for the SHM prior to launch. Nevertheless, the mission allowed an accurate measurement of the gravitational redshift from near Earth to almost the distance of the Moon where it asymptotically approaches its maximum relative to Earth's surface (see bottom plots in figure 9).
In addition to the Doppler tracking measurements, time-domain recordings of the spacecraft's signal at 8.4 GHz were also made at PU and GB. These permit measuring the frequency evolution of the spacecraft's reference tone with improved offline processing techniques, such as those developed for spacecraft tracking by the Joint Institute for VLBI ERIC [30], which may allow the observed flicker noise floor to be overcome. Once frequency measurements have been made and their uncertainties estimated, the model and analysis described herein may be applied to determine $\varepsilon$ with improved accuracy. Preliminary work on applying these offline techniques is discussed in [31], wherein it is estimated that $\sigma_\varepsilon \sim 10^{-5}$ may be attainable. Recordings of RA's signal were also made at other ground radio telescopes for which a partial DCS is possible. Including these in the final solution may allow statistical uncertainties to be further reduced.
For a future space VLBI mission in a highly eccentric orbit, we envision a setup allowing simultaneous recordings of 1-way and 2-way referenced signals in parallel to all downlinks of VLBI recordings. Over a three-year period, a mission similar to RA would have $\sim 2500$ sessions, a factor of 20 increase over the number used in this experiment. Simultaneous recordings would allow a session to be divided into 40 or more segments, a $10\times$ increase over our average number of segments per session. Further, an orbit with a lower inclination, or a tracking station in the southern hemisphere, would allow sessions much closer to perigee, increasing the variation in the gravitational redshift by a factor of 10 or more. Taken together, such a mission could improve the sensitivity of measuring $\varepsilon$ to $\sim 10^{-7}$.

Summary and conclusions
In this paper we have described a test of the EEP and measurement of the gravitational redshift including: (i) details on Doppler-tracking frequency measurements at the PU and GB stations with RA at distances ranging from 25 000 km to the distance of the Moon, (ii) the implementation of a DCS, similar to GP-A, achieved by alternating RA's communication system between different frequency referencing modes, (iii) the model required to predict relative frequency shifts as small as $10^{-15}$, (iv) measurements of the frequency offset of GB and PU H-masers relative to coordinate time with an accuracy exceeding $10^{-14}$ throughout most of the experiment, (v) a method for measuring $\varepsilon$ and the SHM frequency offset relative to TCG, (vi) using Monte Carlo simulation to determine the correct weighting of the data using GB and PU noise levels, where in both cases FFM noise was found to dominate,

Data availability statement
The data cannot be made publicly available upon publication due to legal restrictions preventing unrestricted public distribution. The data that support the findings of this study are available upon reasonable request from the authors.