AESTRA: Deep Learning for Precise Radial Velocity Estimation in the Presence of Stellar Activity

Stellar activity interferes with precise radial velocity measurements and limits our ability to detect and characterize planets, in particular Earth-like planets. We introduce AESTRA (Auto-Encoding STellar Radial-velocity and Activity), a deep-learning method for precise radial velocity measurements. It combines a spectrum autoencoder, which learns to create realistic models of the star’s rest-frame spectrum, and a radial-velocity estimator, which learns to identify true Doppler shifts in the presence of spurious shifts due to line-profile variations. Being self-supervised, AESTRA does not need “ground truth” radial velocities for training, making it applicable to exoplanet host stars for which the truth is unknown. In tests involving 1000 simulated spectra, AESTRA can detect planetary signals as low as 0.1 m s−1 even in the presence of 3 m s−1 of activity-induced noise and 0.3 m s−1 of photon noise per spectrum.


INTRODUCTION
The search for Earth-like planets in the habitable zones of Sun-like stars is one of the premier challenges of modern astronomy.Despite significant advancements in instrumentation and data analysis techniques, such as the achievement of ∼0.3 m s −1 precision in recent ESPRESSO observations of Proxima Centauri (Mascareño et al. 2020), the detection of Earth-like planets will require further advances.The next generation of Extremely Precise Radial Velocity (EPRV) instruments is designed to achieve an instrumental precision of ∼0.1 m s −1 (Blackman et al. 2020;Hall et al. 2018).However, the emergent spectrum of a star is inherently variable, leading to apparent variations in the star's radial velocity that can mimic or mask planetary signals.This "radial-velocity jitter" arises from magnetic activity, granulation, and acoustic oscillations, which have different characteristic amplitudes and timescales (Haywood et al. 2014;Mascareño et al. 2020;Milbourne et al. 2019).Given that even the quietest stars exhibit RV jitter of approximately 2 m s −1 (Brems et al. 2019), it is essential to mitigate the influence of stellar activity signals on EPRV measurements if we are to have any chance of detecting Earth-like exoplanets.
Numerous data analysis techniques have been proposed to determine true RVs from stellar spectra in the presence of activity.A common approach is to ex-tract RVs using traditional techniques, such as crosscorrelation with a spectral template, and then decorrelate the apparent RV time series with various activity indices that are derived independently from the spectra (Milbourne et al. 2019;Giguere et al. 2016).Another approach, sometimes combined with the previous approach, is to model the RV time series using Gaussian Process (GP) regression to represent the effects of activity (Haywood et al. 2014;Mascareño et al. 2020).There are also data-driven techniques, such as performing shift-invariant operations on the cross-correlation function (CCF) to isolate activity-induced signals (Collier Cameron et al. 2021), and performing full spectral modeling to address contamination due to variability in telluric absorption lines (Bedell et al. 2019).Supervised machine learning techniques have been applied to recognize spurious RV signals associated with changes in shape of the CCF (de Beurs et al. 2022).
On the observational side, there has been a growing trend toward more intensive and longer-term monitoring of stars.It is no longer unusual to see hundreds of RVs gathered for a single object, and we will probably begin seeing thousands of RVs in the coming decade (Crass et al. 2021).For example, the Terra Hunting Experiment will use a high-resolution echelle spectrograph and an automated 2.5 m telescope to measure RVs of a few dozen stars on every clear night for at least 10 years (Thompson et al. 2016;Hall et al. 2018).Such intensive monitoring may open up new possibilities for stellar activity mitigation by providing larger samples of activity-related perturbations.
In this paper, we present AESTRA (Auto-Encoding STellar Radial-velocity and Activity), a deep learning method for distinguishing between true Doppler shifts and activity-induced spectral perturbations, given a large sample of stellar spectra covering a variety of the star's activity states and Doppler shifts.The architecture of AESTRA combines a neural network for radialvelocity estimation, and a spectrum auto-encoder called spender (Melchior et al. 2023;Liang et al. 2023) for activity modeling.AESTRA is designed to disentangle pure Doppler shifts from apparent Doppler shifts due to spectral perturbations, without any prior knowledge of the star's spectrum or orbital motion.
The spectrum auto-encoder architecture spender was originally built for analyzing and representing galaxy spectra in the presence of differing cosmological redshifts.It features an attentive autoencoder that scans over the wavelength range and compresses the information in each spectrum into a small number of parameters, which are ideally independent of redshift (Liang et al. 2023), and a decoder capable of transforming the parameters back into an accurate reconstruction of the spectrum in the rest frame (i.e., without any redshift).Spender proved capable of compressing ∼500,000 galaxy spectra from the Sloan Digital Sky Survey (Ahumada et al. 2020) into a low-dimensional latent space while maintaining high fidelity in the reconstructed spectra (Melchior et al. 2023;Liang et al. 2023).Cosmological redshifts are many orders of magnitude larger than the Doppler shifts associated with exoplanets; for a Earthlike planet around a Sun-like star, z ∼ 10 −10 .Nevertheless, we wondered whether the ability of spender to identify and isolate intrinsic spectral features while disregarding Doppler shifts and other artifacts would be applicable to exoplanet detection.
The remainder of the paper is structured as follows.Section 2 describes the architecture of AESTRA and the loss functions that are used for training.Section 3 presents the application of AESTRA to simulated datasets.Finally, Section 4 summarizes our findings, discusses limitations and potential extensions of AESTRA, and discusses potential applications to solar spectroscopy and exoplanet surveys.

METHODS
Figure 1 illustrates the architecture of AESTRA.The input consists of a collection of hundreds or more of spectra of a single star, obtained over a range of different activity states and different phases of the orbital motion of any planets.The input is processed in parallel by a neutral network RV estimator and a spectrum autoencoder.The RV estimator learns the effect of a Doppler shift: a pure stretching of the wavelength scale, regardless of any other details of the spectrum.Training of the RV estimator network does not require a spectral template or line list, or indeed any prior knowledge about the star.Instead, we introduce artificial Doppler shifts into the input spectra, and optimize the RV estimator network to recover these shifts.Within the RV estimator are filters (convolutional kernels) that sweep across the residual spectrum.Ideally, during training, the parameters of these filters are tuned to recognize the patterns that result from Doppler shifts, while also learning to ignore photon noise.Thus, this part of AESTRA offers an alternative approach to traditional RV extraction methods involving template matching and cross-correlation functions.
This RV estimator, trained to recognize pure wavelength stretching, can accurately determine velocity differences between stellar spectra exhibiting the same activity state.However, when measuring RV differences between spectra of different activity states, the RV estimator is subject to biases due to spectral distortions that mimic Doppler shifts.In the absence of any "ground truth" about the star and its motion, we rely on a key assumption: the star's orbital motion is uncorrelated with its intrinsic variability during the observation period.Assuming this condition holds, we can mitigate the activity-induced RV bias by decorrelating the RV estimator output against activity indicators.These indicators should capture the star's activity state while remaining unaffected by Doppler shifts.
In this paper, we introduce an auto-encoder network to generate these activity indicators through spectrum modeling.The encoder compresses the information in each stellar spectrum into a small number of "latent parameters", and the decoder transforms a set of latent parameters into an accurate reconstruction of a rest-frame stellar spectrum.The architecture is designed to assign latent parameters based on spectral perturbations that are independent of the Doppler shift of the spectrum.In this respect, the latent parameters are generalizations of traditional activity indicators such as the bisector span of the CCF or the equivalent width of certain emission lines.This de-trending technique is explored in more depth in Section 2.3

Architecture
We begin with a collection of normalized onedimensional input spectra.Each spectrum is represented by a vector y obs ∈R L , giving the intensity at each . AESTRA combines a convolutional RV estimator, whose output is denoted v encode , and an attentive auto-encoder that represents activity-induced spectral features with a latent vector s.The decoder network expands s into a rest-frame spectrum yact, which is shifted into the observed frame to produce the reconstructed spectrum y ′ obs .The architecture is trained with a fidelity loss function that ensures y obs ≈ y ′ obs and an RV-consistency loss function that seeks to recover v offset from an artificially Doppler-shifted "augment" spectrum yaug.The combined loss function is designed to disentangle activity-induced spectral features and Doppler shift-induced features.
of L wavelengths λ obs .We do not model the Earth's barycentric motion, thereby implicitly assuming that the barycentric corrections have been applied without error.To prepare the input for AESTRA, we first establish a template spectrum, b obs ∈ R L that captures the star's average spectral features.This template is subtracted from each observed spectrum to create a residual spectrum r obs = (y obs − b obs ).The purpose of this step is to isolate the spectral time variations and reduce the dynamic range of the input.In our experiments we somewhat arbitrarily chose one of the simulated spectra with the smallest applied perturbations to be the template.The exact choice of the template does not matter as long as it is representative of the typical spectrum.The only effect of small changes to the template is to apply small time-independent offsets to all of the residual spectra, which should not interfere with training.
After template subtraction, the residual spectra are processed through two pipelines: the encoder-decoder network and the RV estimator network.
The goal of the encoder-decoder network is to summarize spectral perturbations due to stellar activity using a small number of parameters.The residual spectrum is compressed by a parameterized encoder f θ (•) into a latent vector s ∈ R S .Here, θ denotes the adjustable parameters in the encoder.s = f θ (r obs ). (1) Following the approach of Melchior et al. (2023), the encoder f θ (•) consists of three convolutional layers, an attention layer, and a three-layer Multi-Layer Perceptron (MLP).A more in-depth description of these machine learning components can be found in Smith & Geach (2023).The attention layer is integrated to help the encoder focus on significant spectral features amidst varying positions and noise.In practice, the attention layer works by dividing the extracted feature matrix into two sub-matrices.The first represents the identified features across different wavelengths, while the second indicates the relative significance of these features (commonly called "attention weights").By multiplying these matrices element wise and then summing over wavelengths, we derive "attended" features that emphasize the most relevant spectral information.
Figure 2. Visualization of the effect of perturbing the latent vector based on a representative spectrum to be described in Section 3. The black curve shows the activity spectrum yact generated from a particular choice of the latent vector s.
The colored lines show the resulting activity spectrum after perturbing each component of the latent vector: perturbing s1 (purple) appears to introduce line width changes, while perturbations in s2 (blue) and s3 (orange) produce features related to line asymmetries.
Conceptually, we may regard the components of the latent vector as generalizations of traditional activity indicators, such as line widths or bisector spans.We chose a dimension S = 3 for the latent space as a balance between model complexity, which favors large S, and interpretability, which favors small S.This relatively low-dimensional latent space simplifies the interpretation of the latent vector as activity indicators.As an illustration of the meaning of the latent vector, Figure 2 shows the effect of perturbing each component, one at a time, using the model to be described in Section 3.3.In this case, s 1 appears to be related mainly to line widths, and s 2 and s 3 to line asymmetries.
The decoder function g ϕ (•) with adjustable parameters ϕ is another three-layer MLP which uses s to generate an activity spectrum y act in the stellar intrinsic rest frame, represented as a vector of length L ′ with a slightly extended wavelength coverage (L ′ > L): (2) To encourage the decoder to focus on the activityinduced perturbations, we add a trainable rest-frame template spectrum b rest ∈ R L ′ to the decoded activity spectrum to produce the complete rest-frame model: The rest-frame template, b rest , represents a hypothetical spectrum with zero activity (y act = 0).The rest-frame template is distinct from the observed-frame template described earlier.One may consider b obs to be an initial guess and b rest as an optimized model for the baseline stellar spectrum.
The RV estimation network h ψ (•), visualized in Figure 1, is a modified version of the encoder network.A key difference between the encoder and the RV estimation network is that the encoder includes an attention layer (Vaswani et al. 2017) -which is designed to help the model identify the most important patterns of variability irrespective of wavelength shifts -and the the RV estimator lacks such a layer.
The RV estimator begins with two convolution layers that apply convolutional filters to the input data that are trained to respond maximally to Doppler shifts.After each convolution layer, we apply a Parametric Rectified Linear Unit (PReLU) operation (He et al. 2015), which introduces non-linearity to the network and helps to enhance the recognized features.Next, max pooling is applied, down-sampling the features by retaining only the dominant features from each segment of the residual spectrum.This leaves us with a 64 × 20 matrix representing 64 output channels and 20 wavelength segments.A softmax operation is applied in the wavelength direction.This operation helps to quantify the relative importance to different segments.Then, the resulting matrix is flattened into a vector with 1,280 elements.These elements are passed to a MLP with (128,64,32) nodes.The MLP learns to estimate Doppler shifts based on the feature vector summarized by the convolutional layers.The estimated RV is denoted v encode : (4) Lastly, the rest frame model spectrum y rest is Dopplershifted using the value of v encode determined by the RV estimator and interpolated to match the observed wavelengths λ obs ∈ R L using cubic spline interpolation.This step accounts for the motion of the star relative to the observer and enables a direct end-to-end comparison between the observed and model spectra.

Loss Function
To ensure high-quality spectral reconstruction, we employ the end-to-end fidelity loss function of Melchior et al. (2023).This function quantifies the agreement between the input and reconstructed spectra, averaged over a batch of N spectra each of dimension L: Here, y i,obs denotes the i-th input spectrum, y ′ i,obs is the reconstructed spectrum, w i is the corresponding weight vector.The operation ⊙ represents element-wise multiplication, and the squaring of the difference is also performed element wise.
To achieve consistent RV estimations, we introduce a novel loss term, L RV .To encourage the RV estimator to focus exclusively on the signatures caused by Dopplershifting, we exploit data augmentation, a technique used in machine learning to increase the amount and diversity of training data by applying various transformations or manipulations to the original dataset.In this case, we artificially apply a Doppler shift while preserving the spectral shape.Data augmentation allows us to generate very large samples for training the RV estimator: As shown in Eq. 6, we use a weighted mean squared error (MSE) loss to compare the predicted RV offset between the original and augmented spectra (v aug,i −v obs,i ) to the true injected offset (v offset,i ).During training, v offset,i is randomly drawn from a uniform distribution v offset ∼ U(−3, 3) m s −1 , and σ v represents the Cramer-Rao theoretical limiting uncertainty of a single RV measurement based only on photon noise (see, e.g.Bouchy et al. 2001;Beatty & Gaudi 2015).Because the RV estimator is trained on augmented spectra with artificial Doppler shifts, there is no need to split the data into training, validation, and test sets.During each iteration of training (commonly referred to as an "epoch" in the machine learning literature), a new batch of training data is generated on the fly, ensuring that the model is consistently exposed to new Doppler shifts.
Lastly, we introduce a regularization term to constrain the decoder's flexibility and prevent over-fitting.As both the activity spectrum and rest-frame baseline are trainable, adjusting b rest by adding a constant can be compensated by subtracting the same constant from y act .To resolve this degeneracy between y act and b rest , we define a Ridge regularization loss as follows: The adjustable hyper-parameters k reg and σ y determine the relative weight of the regularization loss.In practice, we set σ y = 0.1 to achieve a good balance between model flexibility and trainability.During optimization, we cyclically adjust the weight of regularization loss k reg , increasing from 0 to 1 over a cycle of 1000 iterations, and then setting k reg to 0 and restarting the cycle.We adopted a two-phase strategy to train AESTRA models.In the first phase, the RV estimator was trained independently until L RV approached 1, while leaving the encoder-decoder network and rest-frame baseline unchanged.In this stage, the RV estimator can accurately measure the velocity offset between spectra of the same activity state, but the RV estimates are still subject to activity-induced bias.In the second phase, we jointly optimized all components of AESTRA-the encoder-decoder network, the rest-frame baseline, and the RV estimator-using the combined loss function L total : In summary, L total is designed to optimize three key aspects during training: high reconstruction quality, consistent RV estimation, and shift-invariant encoding.The fidelity loss, L fid , ensures that the reconstructed spectra closely match the input spectra.The RV loss, L RV , promotes consistency in RV estimations by training the estimator to focus on signatures induced by stretching, using augmented data.Lastly, the regularization loss, L reg , constrains the decoder's flexibility and resolves the degeneracy between the activity spectrum and rest-frame baseline, improving training efficiency.
As an aside, we initially thought it would be necessary to add a consistency loss term to encourage shiftinvariant encoding, following Liang et al. (2023).This term aims to assign the same latent vector to spectra with identical activity-induced perturbations but varying Doppler shifts by minimizing the sigmoid-modulated difference between the latent positions of the original and corresponding augmented spectra: However, we found that this additional term was not necessary for the numerical experiments we have performed.To demonstrate, we used AESTRA trained without a consistency loss term to encode augmented spectra with artificial Doppler shifts.As shown in Figure 3, the latent distance between the original and corresponding augmented spectra (red) exhibit a much lower dispersion than the latent distances between randomly selected data pairs (blue), indicating that the latent parameters that AESTRA assigns to Doppler-shifted spectra depend only weakly on the Doppler shift.We note this fact here because although the consistency loss term was not necessary, it does not harm model performance and might be useful when training on real data, for which spectral perturbations are undoubtedly more complex.

De-trending RVs with Latent Vectors
The star's RV estimates should ideally be independent of the spectral perturbations.However, Figure 4 shows that the encoded RVs are mildly correlated with latent positions.Causal connections between activity and orbital motion are unlikely; although hot Jupiters and other close-orbiting planets may influence a star's rotation and photospheric activity through tidal or magnetic interactions, such effects are negligible for habitablezone exoplanets, the focus of this study.Nevertheless, even if causal correlations are absent, Doppler shifts and stellar perturbations may be correlated because of similarities in timescales -if, for example, the planet's orbital period is comparable to the star's rotation period or the period of a long-term magnetic activity cycle.To disentangle these effects, data should be collected in sufficient quantities and over all relevant timescales.
But we suspect another effect is responsible for the mild trend observed in Figure 4.The RV estimator is trained to determine accurate RV offsets from pure spectral stretching, leaving the RV zero-points unspecified.The encoded RVs may be affected by stellar activity and thus contain activity-dependent RV zero-points v 0 that can bias our RV estimates.However, we have a trained activity autoencoder, capable of producing high-quality reconstructions with varying stellar activity.We thus consider the latents s as generalized activity indices and now seek to determine the spurious velocity zero-points.We make the ansatz: Assuming that we have enough spectra to observe different true RVs at approximately fixed activity s, we can isolate a constant baseline Here, w ij denotes the weight of the encoded RV v encode,j for neighbor j, and σ R denotes the characteristic radius of the latent distribution.We calculate σ R as the average distance among the 10 nearest neighbors in the distribution.This calculation is mathematically equivalent to Gaussian smoothing in latent space.In theory, each point in the distribution should have a non-zero weight and be included in the calculation, but samples farther than 3-σ R exert negligible effects.In practice, the summation is carried out over the 5% nearest neighbors in latent space, with a minimum of 10 nearest samples.
To obtain our final estimate of the true Doppler RVs, we subtract the activity-dependent RV zero-points from the encoded RVs: The resulting corrected RVs, v correct,i , should be closer to the true Doppler RVs, v true,i .We expect the efficacy of this method to increase with larger sample sizes because that we can observe the star with at different Doppler shifts in nearly the same activity state.
3. APPLYING AESTRA TO SIMULATED DATA

Generating Synthetic Spectra from SOAP CCFs
To assess the performance of AESTRA, we simulated a large number of sythetic spectra spectra affected by magnetic activity.
First, we generated a "quiet" spectrum, y quiet , without any activity perturbations or Doppler shift.We used an evenly sampled wavelength grid ranging from 5000 Å to 5050 Å with a wavelength spacing of 0.025 Å1 , resulting in an observed wavelength vector λ obs of length L = 2000.
To produce realistic absorption lines, we randomly selected 70 line locations within the observed wavelength range.For each line, we determined a random width from a Gaussian distribution r ∼ N (µ = 0.2 Å, σ = 0.02 Å) and a random amplitude from a uniform distribution a ∼ U (0.01, 0.70).We then calculated the quiet spectrum as the product of all absorption lines: The spectral deformations are based on the SOAP 2.0 code (Spot Oscillation And Planet; Dumusque et al. ( 2014)).Since SOAP 2.0 only calculates the impact of spots and plages on the CCF rather than the entire spectrum, we needed to devise a method to perturb the synthetic spectrum using the CCFs.We define CCF quiet as the quiet CCF of a Sun-like star evaluated at velocity offsets ∆v CCF , and CCF active as the CCF affected by activity.We introduced spectral perturbations to the simulated spectrum by perturbing all the absorption lines using the same CCF active .At each epoch, CCF active was calculated by randomly placing four active regions on the star's surface.The activity type (spot/plage), size, and phase of each active region were randomly drawn from the priors provided in Table 1.For each realization of CCF active , we defined an activity function f act to measure the activity-induced perturbation as a function of velocity offset: Figure 5 shows a collection of CCF active /CCF quiet due to a single active region.To perturb a specific line, we aligned the activity function f act with the line center λ k and interpreted the wavelength offset as a velocity offset, given by ∆λ/λ k = ∆v CCF /c.By incorporating the individual line perturbations, we constructed the entire intrinsic spectrum: Figure 6.Simulated spectrum of a star affected by multiple active regions and photon noise (y obs , gray), its reconstruction (y ′ obs , red), and the noise-free underlying spectrum (black).In the top panel, y obs and y ′ obs are difficult to distinguish because they overlap nearly perfectly.The bottom panel displays the corresponding residual spectra, with the gray spectrum predominantly showing photon noise.

Parameter Distribution Prior
Activity type Binary P (Spot) = P (Plage) = 0.5 Size [R⋆] Gaussian For the recovery test, we Doppler-shifted the perturbed intrinsic spectrum according to the true RV and interpolated it onto the grid of observed wavelengths using a cubic spline function.To simulate photon noise, we added independent Gaussian random numbers to the spectrum: The true planetary RV signal at time t was taken to be v true (t) = K sin (2πt/P ), where K is the velocity semiamplitude and P is the orbital period.For the following tests, we adjusted the photon-noise level of the simulated spectra to correspond to a Cramer-Rao theoretical limit of 0.3 m s −1 on the precision of a single RV measurement.The value of 0.3 m s −1 was chosen to be comparable to the anticipated capabilities of frontier EPRV instruments.

Defining RV Estimates for Model Assessment
To evaluate the performance of the AESTRA model, we track four radial velocity estimates: Apparent RV (v apparent ): The apparent RV is computed by finding the Doppler shift that brings the quiet baseline spectrum and the observed spectrum into the best possible agreement (minimum χ 2 ).Affected by stellar activity, this metric provides insight into the RV noise level due to line distortions.
De-trended RV using traditional activity indicators (v traditional ): This quantity represents the traditional technique for activity mitigation whereby the apparent RVs are corrected to control for the observed variations in activity indicators that are derived from the spectra.We use a quiet spectrum (y quiet ) as a template to extract the CCF of the observed, activityperturbed spectrum (y obs ).Three activity indicators are measured: the bisector span, the full-width at halfmaximum of the CCF (FWHM), and the depth of the CCF.Following Dumusque et al. (2014), the bisector span is defined as the difference between the top (10%−40% depth) and bottom (60%−90% depth) bi-sectors.Multilinear regression is performed between the time series of v apparent and the time series of the three activity indicators, and the results are used to subtract the estimated contributions to the RV variations due to changes in the activity indicators.
Corrected RV (v correct ): This quantity is derived from the AESTRA model by de-trending the output of the RV estimator (v encode ) against the components of the latent vector, i.e., the generalized activity indicators (see Section 2.3 for details).These values can be used to assess the model's performance in isolating true Doppler RVs from activity-induced RV offsets.
Reference RV (v ref ): Serving as a benchmark, the reference RV is calculated by fitting the noise-free underlying rest-frame spectrum (y intrinsic ) to the observed spectrum (y obs ).The scatter in the reference RVs around the true RVs is slightly higher than the Cramer-Rao bound of 0.30 m s −1 , probably due to interpolation imperfections.This serves as a reference for the best possible RV precision in the absence of stellar activity.
For the i-th input spectrum, let v i denote the RV measurement and v true,i denote the true planetary RV.We define RV scatter as the standard deviation of the residual RV To assess the model's performance, we compare v correct to both v traditional and v ref .
An effective model should yield values for v correct that approach the benchmark values of v ref , and an advantageous model should reduce the RV scatter below the scatter in v traditional .

Case I: Time-Independent Activity
Our overall goal was to see if AESTRA could recover 0.10 m s −1 Earth-like signals in the presence of RV jitter that is approximately 30 times larger in amplitude.First, we considered spectral perturbations that are randomly and independently assigned to each spectrum.We generated 1,000 synthetic spectra using SOAP CCFs, with each spectrum affected by four active regions randomly drawn from the priors in Table 1, without any temporal structure imposed.The apparent RVs, derived by fitting a quiet baseline to the simulated spectra, are predominantly influenced by random stellar activity.
We trained an AESTRA model using a two-phase strategy as described in Section 2.2.2 Table 2 summarizes the training results, and Figure 6 illustrates an example spectrum and its reconstruction.The AESTRA model achieved high reconstruction quality and RV consistency, with L fid ≈ 1 and L RV < 1.The fact that the self-reported RV precision was better than the photon noise limit suggests that the RV estimator performed well and was not the limiting factor in the RV determinations.Instead, the achievable RV precision was mainly limited by the decorrelation process.Table 3 summarizes the performance of different methods for extracting RVs from the simulated dataset of 1,000 spectra.The method involving traditional activity indicators managed to reduce the apparent RV scatter from 3.38 m s −1 to 0.98 m s −1 .The AESTRA model achieved a further reduction of the RV scatter to 0.46 m s −1 , closely approaching the RV benchmark for this dataset.Figure 8 presents Lomb-Scargle periodograms of the RVs de-trended by traditional activity indicators (v traditional ) and those corrected by AESTRA (v correct ).While the traditional analysis yields a peak at the true period, there are also spurious peaks with similar amplitudes.The periodogram of the AESTRAcorrected RVs shows a more prominent peak, allowing for a clearer detection of 0.10 m s −1 planetary signal.
To further evaluate the model's performance in recovering planetary signals, we performed Markov Chain Monte Carlo (MCMC) sampling using the emcee package (Foreman-Mackey et al. 2013) to retrieve the orbital parameters from the corrected RV v correct .Currently, we set the uncertainty for model RVs to be the same as the RV scatter (∆v correct ) std , as obtaining principled uncertainties directly from AESTRA remains a task for the future.We intend to explore alternative approaches, such as noise propagation through the network or leastsquares fit error, in the future development of AESTRA.
Assuming circular orbits for simplicity, we assigned uniform priors for the orbital period between 10 and 200 days, semi-amplitude between 0.0 and 10.0 m s −1 , and phase between −180 and 180 degrees.We initialized 32 walkers near the maximum a posteriori (MAP) solution and ran the MCMC sampler for 10,000 steps with a 2,000-step burn-in.As shown in Figure 9, the RVs measured by AESTRA constrained the orbital period to be 100.0± 0.27 days and the signal amplitude to be 0.12 ± 0.01 m s −1 .The full parameter constraints are summarized in Table 3.
To assess AESTRA's performance in recovering planetary signals with different amplitudes, we created new datasets using the same activity perturbations but varying injected signal amplitudes (K values).The model, trained on the original dataset with 0.1 m s −1 planetary signals, was applied to recover the signal amplitudes in the new datasets.Figure 10 shows that the model accurately recovered weaker signals and slightly underestimated amplitudes > 1 m s −1 .This probably occurs because planetary RVs are treated as noise during detrending, causing stronger signals to introduce more uncertainty in activity-dependent RVs.Deep learning methods generally require large datasets, making it important to understand how AESTRA performance scales with sample size.We generated a smaller dataset of 100 spectra affected by random stellar activity and injected a planetary signal of 0.30 m/s, comparable to the RV scatter caused by pure photon noise.The apparent RV scatter in the N spec = 100 dataset was 3.27 m/s.We followed the same procedure to train the AESTRA model.The training performance and resulting RV estimates are detailed in Table 2 and  Table 3.
For the 100-spectrum simulated dataset, the best-fit AESTRA model reduced the RV scatter from 3.27 m s −1 to 0.73 m s −1 , lower than the scatter of 0.96 m s −1 achieved by the method involving traditional activity indicators.While the improvement was not as good as it was in the 1,000-spectrum dataset, the AESTRA-corrected RVs still allowed for a 5-σ detection of the injected planetary signal, and gave a correct estimate of the RV semiamplitude, K = 0.36 +0.08 −0.07 m s −1 .We believe that for small sample sizes, the primary source of RV uncertainty in v correct arises from the decorrelation process, which relies on Gaussian smoothing in the latent space to estimate the activity-dependent RV zero-points.If the latent space is underpopulated, i.e., if a given type of stellar perturbation is not observed at different Doppler shifts, it becomes challenging to calibrate the RV zero-points.This highlights the importance of capturing potential stellar activity perturbations through repeated observations.Lomb-Scargle Periodograms illustrating the impact of complex spot evolution on apparent RVs (top), RVs de-trended using traditional activity indicators (middle), and the AESTRA-corrected RVs (bottom).The true planetary period (red) is detected by both methods.

Case II: Time-Dependent Activity
Next, we performed simulations in which stellar activity exhibited realistic temporal correlations.We generated a dataset of 200 synthetic spectra affected by evolving starspots using the SOAP 2.0 code, incorporating the time-structured nature of starspot growth, decay, and rotation, as detailed below.A planetary signal with amplitude 0.30 m s −1 was injected into the dataset for recovery testing.
As before, we placed four spots on the stellar surface, with new spots generated at random locations as old spots decayed.Each spot's size evolved over time, following a Gaussian function: where S(t) represents the spot's size at time t, S max is the maximum spot size, chosen randomly between 0.5% and 20% stellar radius, t peak is the time when the spot reaches its maximum size, and τ is the spot's lifetime, determined by τ ∝ S 2 max , with τ max ∼ P rot for the largest spots.The spots' longitudes and latitudes were randomly chosen based on the priors described in Table 1.The growth and decay of starspots results in distinct time structures in the apparent RVs, as shown in the top panel of Figure 11.
Training the AESTRA model on this dataset led to a reduction in the apparent RV scatter from 3.51 m s −1 to 0.44 m s −1 , as presented in Table 3.The AESTRA-based RV scatter of 0.44 m s −1 is about one-half of the scatter obtained with the traditional method, and leads to a solid detection of the 0.3 m s −1 planetary signals in the phase-folded RVs shown in Figure 11.
Both the AESTRA model and the traditional analysis identified significant peaks at the true planetary period in their Lomb-Scargle periodograms (Figure 12).However, the traditional analysis gave K = 0.52 ± 0.09 m s −1 , an overestimate of the true RV semiamplitude.The AESTRA model gave a correct estimate, K = 0.29 ± 0.03 m s −1 .
As summarized in Table 3, when the AESTRA RVs were fitted with a circular-orbit model using an MCMC method, the posteriors for the orbital parameters overlapped with the true values, despite the presence of complex and time-structured stellar activity.These results also underscore the importance of good temporal coverage for robust RV estimation.

CONCLUSIONS AND OUTLOOK
AESTRA is a deep learning method designed to separate Doppler shifts from spectral perturbations due to stellar activity.It operates on a collection of spectra of the same star rather than a single spectrum or a crosscorrelation function.The autoencoder network summarizes spectral perturbations using a small number of latent parameters, yielding a compact representation that can be utilized as generalized activity indices.The RV estimator network is jointly trained with the autoencoder to determine RV offsets.Training is performed using a combined loss function that balances the various objectives, resulting in a model that provides rest-frame spectral reconstruction and RV estimation.
In our simulations, AESTRA successfully detected planetary signals with amplitudes as low as 0.10 m s −1 despite activity-induced perturbations with amplitudes of 3 m s −1 and photon noise of 0.30 m s −1 .Additionally, AESTRA showed encouraging performance even on datasets consisting of as few as 100 spectra.The model recovered planetary signals in a series of simulated spectra dominated by time-dependent stellar activity, even without utilizing time-domain information.
While AESTRA has shown promising results on synthetic spectra, there is no guarantee that it will be effective on real data.We plan to apply AESTRA to analyze solar spectra obtained with instruments such as HARPS-N (Dumusque et al. 2015), NEID (Lin et al. 2022), and EXPRES (Petersburg et al. 2020).This validation step will assess the model's real-world applicabil-ity and will undoubtedly reveal limitations or biases that are not apparent in synthetic data.This may provide valuable insights into solar variability across different time scales and set the stage for applications to other stars, with and without known planets.
AESTRA is at an early stage of development.We intend to explore some potential extensions to improve the model's performance.Currently, AESTRA treats each spectrum independently, not considering the times of observation.In our tests, AESTRA performed well in the presence of stellar activity perturbations that were correlated in time (Case II), but we expect that it would be even better if AESTRA took time-domain information into account.The model could learn the temporal behavior of long-lived starspots, differential rotation, and acoustic oscillations, which exhibit distinct timescales.Future iterations of the AESTRA model could benefit from integrating a continuous latent space time-series model and applying physically-motivated constraints on the timescales associated with different types of activity.
More broadly, the current version of AESTRA is a general model that does not incorporate any physical knowledge about stellar atmospheres or even any built-in concept of a spectral absorption line.It seems likely that integrating known physical processes and relationships that affect stellar spectra into AESTRA will help it cope with more challenging and realistic datasets.For example, future iterations could make use of models for telluric absorption lines and for variations in the pointspread-function of the spectrograph.
If AESTRA can be successfully validated using solar data, perhaps after incorporating some of the improvements described above, the stage will be set for applications to exoplanet data.We will explore AESTRA's potential in improving exoplanet characterization, analyzing low signal-to-noise candidates, and searching for Earth-like planets.As we explore AESTRA's potential in our future work, the code is made publicly available on GitHub (https://github.com/yanliang-astro/aestra),including a frozen version tagged "paper-I" for reproducing the results of this work.

Figure 3 .
Figure 3. Latent distance distribution for randomly selected data pairs (∆s rand , blue), and for data-augment pairs (∆saug, red), in a dataset of 1,000 simulated spectra influenced by random activity.Augmented spectra are obtained by artificially Doppler-shifting the corresponding original spectra.For a Doppler-shift invariant encoder, the expected value of the augmented latent distances is zero: ⟨∆saug⟩ = 0.

Figure 4 .
Figure 4. Distribution of 1,000 simulated spectra in latent space, color-coded according to v encode .A subtle color gradient from the center to the outskirts indicates a weak correlation between encoded RVs and latent position.Given that latent position can be interpreted as activity indicators, this suggests the encoded RVs are largely activity independent.As a final correction to improve the RV extraction, we perform Gaussian smoothing on v encode in latent space, allowing for the extraction and subtraction of the trend.

Figure 5 .
Figure 5. CCF ratios (CCFactive/CCFquiet) versus velocity offset (∆vCCF) for a solar-type star with a single active region, calculated by the SOAP 2.0 code.The color of each curve indicates the brightness variations due to the darkening effect of the active region at different phases of stellar rotation.The left panels depict the effect of spots with varying sizes and latitudes, and the right panels show the impact of plages.The gray shaded area represents the approximate photon noise level corresponding to a single-spectrum RV precision of 0.3 m s −1 .

Figure 7 .
Figure 7. Top: Time series of apparent RV (black, left yaxis) and the fraction of active area on the stellar surface (orange, right y-axis), dominated by random stellar activity.Middle: The corrected model RVs (black) are compared to the 0.1 m s −1 true planetary signal (red).Bottom: The phase-folded and re-binned model RVs (black) are in good agreement with the true planetary signals (red).

Figure 8 .
Figure 8. Top: Lomb-Scargle periodogram of the apparent RVs (vapparent, without any correction) affected by four random active regions (spots and plages).Middle: De-trended RVs using traditional indicators (v traditional ) reveal a subtle peak at the true period (red dashed line).Bottom: AESTRA corrected RVs (vcorrect) show a prominent peak at the correct period.

Figure 9 .
Figure 9. Corner plot illustrating the posterior distribution of the orbital parameters for 1,000 synthetic spectra affected by random activity, with truth values indicated by blue lines.

Figure 10 .
Figure 10.Recovered versus true planetary signal amplitudes.The model accurately recovers small amplitude planetary signals but tends to underestimate amplitudes >1 m s −1 .

Figure 11 .
Figure 11.Top panel: Time-series of apparent RVs (black, left y-axis) and the percentage of the stellar surface covered by active regions (orange, right y-axis).Middle panel: Corrected model RVs (black) compared to injected Doppler RVs (red).Bottom panel: Phase-folded, re-binned corrected model RVs (black error bars), showing good agreement with the true RVs (red).
Figure 12.Lomb-Scargle Periodograms illustrating the impact of complex spot evolution on apparent RVs (top), RVs de-trended using traditional activity indicators (middle), and the AESTRA-corrected RVs (bottom).The true planetary period (red) is detected by both methods.

Table 2 .
Model performance on simulated datasets with random stellar activity (Case I) and starspot evolution (Case II).Lreg is evaluated assuming kreg = 1.