Empirically Constraining the Spectra of Stellar Surface Features Using Time-resolved Spectroscopy

Transmission spectroscopy is currently the technique best suited to study a wide range of planetary atmospheres, leveraging the filtering of a star’s light by a planet’s atmosphere rather than its own emission. However, as both a planet and its star contribute to the information encoded in a transmission spectrum, an accurate accounting of the stellar contribution is pivotal to enabling robust atmospheric studies. As current stellar models lack the required fidelity for such accounting, we investigate here the capability of time-resolved spectroscopy to yield high-fidelity, empirical constraints on the emission spectra of stellar surface heterogeneities (i.e., spots and faculae). Using TRAPPIST-1 as a test case, we simulate time-resolved JWST/NIRISS spectra and demonstrate that with a blind approach incorporating no physical priors, it is possible to constrain the photospheric spectrum to ≤0.5% and the spectra of stellar heterogeneities to within ≲10%, a precision that enables photon-limited (rather than model-limited) science. Now confident that time-resolved spectroscopy can propel the field in an era of robust high-precision transmission spectroscopy, we introduce a list of areas for future exploration to harness its full potential, including wavelength dependency of limb darkening and hybrid priors from stellar models as a means to further break the degeneracy between the position, size, and spectra of heterogeneities.


INTRODUCTION
Transmission spectroscopy was the first technique introduced to study the atmospheres of worlds beyond the solar system (Seager & Sasselov 2000;Brown et al. 2001).Today, it is still the most informative technique for atmospheric studies, as it leverages the light coming from a host star rather than the light directly emitted by the planet itself, which is orders of magnitude fainter.Yet, despite this benefit and subsequent successes achieved, a great deal of work remains in order to make full use of the rich data sets provided by current and upcoming facilities, such as JWST (Gardner et al. 2023).
Corresponding author: David Berardo berardo@mit.edu* 51 Pegasi b Fellow Currently, the dominant bottlenecks for transmission spectroscopy are associated with imperfections in our opacity models (Niraula et al. 2022) and stellar contamination (Rackham & de Wit 2023).The current limitations in opacity models result in an accuracy wall preventing constraints on most atmospheric properties beyond ∼0.5 dex for all planets but large, hot, and highlymetallic ones (Niraula et al. 2023).Future efforts supporting the standardization of databases, and the improvement of treatments of broadening and far-wings of line profiles, should mitigate this bottleneck.
Regarding stellar models, Iyer & Line (2020) showed that not accounting for stellar contamination will yield biased inferences of atmospheric properties.More recently, JWST measurements revealed structures in various transmission spectra that predominantly originate from the host stars (e.g., Lim et al. 2023;Moran et al. 2023).Disentangling between the stellar and planetary We show the deviation from median flux as a function of time, to illustrate the variations due to surface features.contribution (i.e., correcting for stellar contamination) is challenging, mostly due to model limitations (i.e., lack of fidelity), which can yield a biased contamination correction (Rackham et al. 2023).The lack of fidelity can also result in challenges in inferring the number of components present on the stellar disk (Wakeford et al. 2019;Garcia et al. 2022).Fortunately, when stellar models with sufficient fidelity are accessible, the degeneracy between the number of components and their covering fractions can be lifted, leading to an optimal correction of the stellar contamination (Rackham & de Wit 2023).
Sufficient fidelity is defined here as follows: with a precision superior or equal to the expected uncertainty associated with the out-of-transit spectra obtained for transit observations in the targeted system.This definition therefore supports returning to a regime of photonlimited studies-where instruments are used at their maximum potential.The guidance of the report from NASA's Exoplanet Exploration Program Study Analysis Group 21 (Rackham et al. 2023) highlights the need for a new generation of stellar models to better understand and model the spectral properties of stellar surface heterogeneities, which in this work we consider to be spots and faculae.Until such models can provide sufficient fidelity, the only other avenue is to empirically derive the emission spectra of a star's heterogeneities.Doing so would provide the community with a datadriven solution to the stellar-model challenge, as well as benchmarks for the next generation of stellar models.
In this letter, we present a framework that leverages time-resolved spectroscopic observations taken across a stellar rotation cycle to empirically constrain the emis-sion spectra of surface features, such as star spots and faculae, based on their time-varying contributions to the hemisphere-integrated stellar spectrum.We focus our injection-retrieval test on M-dwarf stars with properties similar to those of TRAPPIST-1 (T eff = 2566 K; Agol et al. 2021), for which stellar contamination can be more than 10× larger than atmospheric signals (e.g., Rackham et al. 2018) and the most challenging to correct (e.g., Zhang et al. 2018;Wakeford et al. 2019;Garcia et al. 2022;Lim et al. 2023).We present in Section 2 the forward model developed to generate the synthetic, multi-wavelength observations of a heterogeneous stellar surface.In Section 3, we present the retrieval framework used to assess the extent to which the properties of individual heterogeneities (size, positions, and emission spectra) can be constrained based on a synthetic rotational light-curve.In Section 4, we present a series of performance tests of the technique to guide its future usage, as well outline observing strategies amenable to empirical spectral retrieval.Finally in Section 5, we consider current caveats and highlight future steps to improve and expand upon this initial framework.

GENERATING SYNTHETIC DATA
In this section we present the forward model used to generate synthetic time-and wavelength-dependent observations of a heterogeneous stellar surface, which uses a grid-based stellar surface framework on which we add circular heterogeneities (each described by a latitude, longitude, radius, and temperature) representing circular spots and faculae (Montalto et al. 2014).This is forward model is built upon the model used in Günther et al. (2022) to simulate the interactions of a heterogeneous star with a debris disk.A graphical depiction is shown in Figure 1.

Spectral Model
We simulate observations for the NIRISS (Near-Infrared Imager and Slitless Spectrograph) instrument (Albert et al. 2023) on JWST (Gardner et al. 2023), which provides an ideal compromise between resolving power (R=700) and spectral coverage (1-2.5 µm), considering the spectral energy distribution (SED) of stars.We use the PHOENIX stellar spectral model grid1 to simulate the emission of an individual surface feature considering their respective viewing angle µ (Husser et al. 2013).These grids provide adequate wavelength, temperature, surface gravity, and metallicity coverage to describe the photospheric background of an M dwarf, as well as surface heterogeneities.For this synthetic forward model, we include a quadratic limb-darkening profile with c 1 = 0.26 and c 2 = 0.4, which are obtained using the ExoCTK Limb Darkening Calculator2 , using the parameters of TRAPPIST-1 in the NIRISS bandpass.
For the photosphere we use a spectral model with a temperature of 2500 K, a log g surface gravity of 5.0, and an [Fe/H] metallicity of 0 (similar to TRAPPIST-1, which has a surface temperature of 2566 ± 26 K, a log g of 5.2396 ± 0.006 (Agol et al. 2021) and an [Fe/H] metallicity of 0.05 ± 0.08 (Ducrot et al. 2020)).For heterogeneities, we use spectra corresponding to 2300 K and 2700 K (varying ±200 K relative to the photosphere, based on Figure 1 of Herbst et al. (2021)) as well as nonphysical emission spectra to highlight that our framework can retrieve emission spectra without any a-priori constraints (Figure 2).Based on previous NIRISS observations of the TRAPPIST-1 system, we generate Gaussian noise with a scatter of 1.8% per pixel, based on simulated NIRISS observations made using the Pandexo tool Batalha et al. (2017) and observations presented in (Lim et al. 2023), for an integration time of two minutes.

Spatial Model
The stellar surface is treated as a rectangular grid running along the longitude and latitude of the surface, having twice as many grid elements in the latitudinal direction.The number of cells (i.e., the resolution) is a free parameter in the model.With these parameters, the projected surface area differs from the true value of π by only 0.02% (i.e., 20 ppm), well below the injected uncertainty.In this work we consider circular spots and faculae (Montalto et al. 2014) as our test models.

RETRIEVAL FRAMEWORK
The goal of this work is to demonstrate the ability to characterize arbitrary heterogeneities of a stellar surface and their contribution to the overall stellar spectrum without relying on physical spectral models, which currently cannot provide a sufficient level of accuracy and thus leads to biased inferences and corrections.The contribution function of a heterogeneity is non-linear, due to both projection onto the observing plane as well limbdarkening effects, and thus when retrieving these geometric properties of a surface feature we employ standard Markov chain Monte Carlo (MCMC) methods in order to sample the full range of parameter space.
The observed flux at a given time and wavelength can be expressed as a linear combination of the stellar photosphere and the heterogeneity spectra (scaled by their relative projected surface area): where Λ phot (λ) is the (constant in time) spectral signal of the photosphere (including limb-darkening effects), Λ i (λ) is the spectrum of the i th heterogeneity, and S i (t) is the time-varying geometric projection of a heterogeneity, which is a function of its size and position on the stellar surface, as well as any limb-darkening effects.The sum runs over the number of individual heterogeneity features.A graphical depiction of this decomposition is shown in Figure 1.
With the resolving power now accessible, the inverse problem yielding the properties of stellar heterogeneities is well posed.Each stellar surface feature will have 3 spatial properties (longitude, latitude, size) and n Λ spectral bins for its emission spectra.For N distinct features, the total variables associated with this inverse problem is thus (3 + n Λ ) × N .In comparison, the data sets being considered have n T × n Λ data points, where n T is the number of exposures.NIRISS observations allow for n Λ = O(1, 000) spectral bins, thus as n T >> N in order to build up the signal-to-noise ratio and sample different phases, the problem entails at least two orders of magnitude more data points than unknown variables, i.e., the problem is well-posed.
Part of the problem is linear (for a given set of positions) and thus singular value decomposition (SVD) can be used, leveraging rapid and robust libraries available in Python to estimate the spectral signal (Λ i ) of each feature in just a few milliseconds.Here, we thus separate the problem into a non-linear MCMC retrieval (the geometric properties of the heterogeneity) and linear retrieval (the spectral signal of the photosphere and individual heterogeneities).In practice, the geometric properties of the surface features are primarily recorded in the white light curve (see "contribution function" in Figure 2, and Walkowicz & Basri 2013;Luo et al. 2019).Thus, light-curve inversion techniques can provide adequate first guesses for the positions, sizes, and numbers of heterogeneities.

PERFORMANCE TESTS
Given the forward model used to simulate observations described in Section 2, and the retrieval framework described in Section 3, we now describe a series of injection-retrieval tests we use to assess the ability of the retrieval framework to recover stellar surface heterogeneities when observing an entire rotation of a test star with a three day rotation period using two-minute sampling.

Fitting for Spectral Components
In order to test the effectiveness of the fitting framework in retrieving spectral features of a star, we first perform a series of injection-retrieval tests in an idealized scenario in which we first assume to know the number of heterogeneities, as well as their positions and sizes, attempting to retrieve only the spectral features of heterogeneities and photosphere (the linear part of the retrieval), which represents a best-case scenario and effectively acts as an upper limit on the capability of the framework.This can similarly represent a scenario where strong priors have been obtained for the spectral components, based on an analysis of a white lightcurve or a pre-fitting routine which places constraints on the possible heterogeneity configurations.
We tested the framework on a suite of stellar surfaces which are described in Table A1.The results of these tests reveal that the framework is able to recover the spectra of heterogeneities to sufficient precisions (i.e., better than the OOT (out-of-transit) spectrumsee Section 2).For example, the precision achieved on the photospheric spectrum in this idealized scenario of knowing the spot positions is ≤ 0.1%.We considered observing conditions similar to those of Lim et al. (2023) for TRAPPIST-1, and estimated OOT uncertainties of ∼0.5% using the Pandexo tool (Batalha et al. 2017)typically based on a ∼2 hr integration.The spectra of heterogeneities are on average constrained to ≲ 10%, depending on their sizes and latitudinal position, and more significant deviations are typically due to difficulties in localizing spots, which can be improved by including additional physical information, as discussed in Section 5. We show the results of an example fit in Figure 2, comparing the individual retrieved component spectra to the spectra used to generate the synthetic observations.In addition to spots with physically motivated spectra (spots 1 & 2 in Figure 2), we also included in that model a completely synthetic and non-physical spectrum (spot 3).We do this to illustrate that our model is able to recover any inherent signal present in the data, without biases from what is assumed to be physical or nonphysical.This allows flexibility with respect to inaccuracies that may be present in spectral models used to fit observations.
The spectra of heterogeneities are less constrained due to their smaller covering fractions and thus fewer photons.Their small covering fractions also mean that while the uncertainties associated with their spectra are larger, they contribute less to the total uncertainty budget for the stellar model than the photosphere.For this reason, we will assess retrieval fidelity based on the ratio of the uncertainty associated with the retrieved photospheric spectrum and the one associated with the outof-transit spectrum.

Varying Observation Baseline
In the previous sections, the retrieval was performed using simulated observations covering an entire rotation period of the host star.However, in most cases a strong argument must be made to justify the use of high-demand facilities to continuously stare at a single target.In this section, we investigate the ability of the framework to accurately measure the photospheric spectrum of a star when observing a continuous snapshot of the full rotation lightcurve for less than one full orbit.Given that a surface feature is only visible for half of the stellar rotation period, there exists a strong correlation between the duration of an observation, the phase of the stellar rotation being observed, and the retrieved uncertainty on the stellar photosphere.
We run a similar retrieval as in the previous section, using the same time sampling of two minutes, having also chosen: (1) a phase offset for the longitudinal rotation of the star relative to the observer, and (2) the length of a continuous observation or 'stare', defined as a fraction from 0-1 of the stellar rotation period.Selecting a value of one for the viewing fraction represents the analysis done in the previous section, for which the entire stellar rotation was supplied to the fitting routine.These two values define a time series, for which we generate the geometric flux signals attributed to each heterogeneity on the stellar surface and then retrieve their component spectra, the results of which are shown in Figure 3.
The various curves represent different observation durations.For a given observation duration, the residual signal varies strongly as a function of stellar rotation For this example, we purposely use a non-physical emission spectrum for spot #3 to emphasize that the framework reliably retrieves the emission spectra empirically.
phase.This is more pronounced for the shorter durations.For example, the residual for an observation covering 0.1 of the stellar rotation can vary from approximately 1% to over 100%.For this reason, we find that only a phase coverage of ≥ 90% can reliably constrain the stellar spectra to within the OOT uncertainty (0.5%).Indeed, while the targeted precision of 0.5% may be achieved for some configurations with only a 40% phase coverage, it is not achieved for all (average precision ∼1%).

Series of Snapshots for Slow Rotators
Covering 90% of a stellar rotation of TRAPPIST-1 would correspond to a ∼72-hr stare at the system based on its ∼3.3-day period (Luger et al. 2017;Vida et al. 2017), which is both feasible and reasonable for such a high-priority target.Doing so for slow rotators that may have periods up to 30 times that of TRAPPIST-1, however, would be impractical.For such hosts, we show that a series of small stares ("snapshots") could be used instead (see Figure 4).In order to reach the targeted precision, we find that snapshots need a minimum duration equal to the intended OOT integration and suffi-Figure 3: Minimum phase coverage of observations to leverage time-resolved spectroscopy.The Y-axis represents the median error on the photosphere during a continuous observation of a star with three spot features.The x-axis shows the starting phase of an observation, while different colors correspond to different durations of a continuous observation (relative to the full rotation period).I.e.The orange line illustrates the median photospheric error when observing for 20% of a stellar rotation cycle, for an observation beginning at all possible times during a rotation period.The dashed line shows the targeted precision, chosen as the out-oftransit spectrum of a previous NIRISS observation of the TRAPPIST-1 system ("OOT uncertainty").cient occurrences to sample time-varying contributions of the heterogeneities.
As the heatmaps in Figure 4 show, the duration and number of snapshots required to achieve a given SNR are related, offering multiple observational options.As in previous sections, the retrieval here is done assuming to know the true spot positions, which will not be the case in general.However, our intention is to highlight the relative effectiveness of different observing strategies.For example, we find that for a 30-day rotation period a similar precision is achieved for, e.g., 40 2-hr snapshots, 20 4-hr snapshots, 10 8-hr snapshots, or 5 16-hr snapshots.These options correspond to a 10× lower observation requirement than for a continuous stare, which provides flexibility when selecting observing strategies, taking into account practical considerations such as overheads and slew times.

Reliably Retrieving Heterogeneities' Geometric Properties
In order to fully test the ability of the retrieval framework to characterise a heterogeneous stellar surface, we also run a set of retrievals where we use MCMC to estimate the geometric properties (i.e., size, position, limb darkening) of each inhomogeneity in addition to using SVD to estimate their spectral signature.
We ran fits for the same set of models described in the previous section, varying the size, position, temperature, and number of spot features present on the stellar surface.We show an example retrieval of a one spot model in Figure A1, which highlights the strengths and weaknesses of the framework.The framework is able to reliably constrain the longitude of features to within ±5 • .This can be improved by pre-fitting a white-lightcurve, where we sum across all wavelengths.While this discards spectral information, we can then use this constrained longitude as a prior when retrieving the spectral signal of the features.The latitude, in comparison, has a much wider range on average, with an typical uncertainty of ±20 • .This amounts to differentiating only between equatorial or polar spots (Luger et al. 2021).
Additionally, as shown in Figure A1, the size of the spot feature can similarly vary by up to 50% of its true value.The result of this is a degeneracy between size and amplitude of the spectral contribution of the heterogeneity.However, as also seen in Figure A1, despite these pitfalls the framework is still able to accurately estimate the spectrum of the stellar photosphere.In Section 5, we outline how additional prior information can further constrain the size of a feature, based on global physical constraints on the overall scaling of its spectrum (leveraging the trade-off between feature size and spectral amplitude) to better constrain the physical features of the spot.
Heterogeneities were allowed to occur anywhere on the stellar surface, leading to degeneracies where two heterogeneities would overlap and contribute to the overall spectrum jointly.Additionally, we found that without additional information, the latitudinal position of a heterogeneity was difficult to constrain.These issues highlight clear areas for improvement for future work, which we discuss further in Section 5.In this unconstrained scenario, the framework often has difficulty in determining precise geometric features of the heterogeneities (position and size), often due to degeneracies between the position and number of features.However despite this, it is still able in most cases to recover the photospheric signal to within 1%.

DISCUSSION & FUTURE STEPS
This work presents a potential avenue-and its performance assessment-towards solving the challenges associated with stellar contamination of transmission spectroscopy, in particular addressing one of the main bottlenecks regarding the lack of fidelity of current stellar models.This is in parallel to efforts which aim to identify and correct for stellar activity effects through ground-based high-resolution spectroscopy (Mallonn et al. 2018;Rosich et al. 2020;Perger et al. 2023), which still requires stellar spectrum templates.Still such initiatives highlight both the communal interest and effort in tackling these issues.
Leveraging time-resolved spectroscopy to generate empirical emission spectra of stellar heterogeneities will enable a new era of high-precision transmission spectroscopy, in-depth atmospheric studies, and critical benchmarks for the next generation of stellar models.The upcoming library of empirical emission spectra for stellar surface heterogeneities will be similar in scope to the work of Kesseli et al. (2017) that compiled a library of empirical spectra for various stellar types, with the important distinction that the spectra being considered here are not for disk-integrated features, but rather for 'pure' basis components which may be combined with rotational geometry in order to produce accurate spectra for stars with arbitrarily complex surface features.Such a library will not only enable the robust correction of the transit light source (TLS) effect based on out-of-transit measurements (Rackham et al. 2018(Rackham et al. , 2019)), it will also provide important benchmarks for the next-generation of theoretical stellar models (e.g., Witzke et al. 2021;Rackham et al. 2023), and further inform key relationships between the properties of stars and those of heterogeneities, such as between heterogeneity temperatures and sizes, photospheric temperatures, and atomic linedepth ratios.
We have demonstrated that under ideal conditions of knowing the spot geometry, we are able to constrain photospheric spectra at the level of the ≲ 1% for the spectra of heterogeneities while spectra with precisions of ∼ 1% (S/N ∼ 100) are used commonly to constrain the fundamental physical parameters of exoplanet host stars (e.g., Wells et al. 2021;Delrez et al. 2022;Barkaoui et al. 2023;Ghachoui et al. 2023;Pozuelos et al. 2023;Dransfield et al. 2024).We have also taken first steps towards are a fully blind retrieval, and have similarly found situations in which the stellar photosphere can be constrained at a similar precision.In terms of absolute flux calibrations, for example, the goal for the X-SHOOTER instrument is ≤ 10% (Schönebeck et al. 2014), while the eventual goal of the JWST calibration program is 1% accuracy for each observing mode (Gordon et al. 2022).Thus, constraints on component spectra from this technique are on par with current precisions available for integrated disk spectra and will be limited ultimately by the overall precision and accuracy limitations of JWST observations themselves, providing valuable data-driven benchmarks to inform the next generation of models.
While we have demonstrated the prospects using timeresolved spectroscopy to empirically measure the spectrum of stellar surface features, we highlight that there remain challenges and opportunities to leverage its full potential.We conclude this work by on these and listing avenues for future work.

Correcting for Contamination at Different Epochs
The ultimate goal of this work is to generate a library of empirically derived spectra for heterogeneities (i.e.spots) of a given star, to be used in the robust correction of in-transit stellar contamination at any past or future epoch.The feasibility of this approach is supported by Using this simple physics-based constraint, precision in the retrieved spectra is increased by ∼ 15×, which leads to an increase in the precision on the spot size and position by ∼ 5-8×.The true spot size and latitude for this example are 0.2 and 0.7 radians respectively, illustrating that the posterior constraints provide a more accurate and precise retrieval.
the following.First, heterogeneities of a given star have been shown to have consistent properties.For example, molecular-band modeling of echelle spectra of DM UMa suggests a spot temperature of 3570 ± 100 K during an observing campaign in 1995, with filling factors ranging from 0.25 ± 0.08 to 0.30 ± 0.10 (O' Neal et al. 1998).Returning to the same star during six nights in 1998, a later analysis found a spot temperature of 3450 ± 120 K and filling factors ranging from 0.28 ± 0.06 to 0.42 ± 0.05 (O'Neal et al. 2004).Second, properties of heterogeneities appear to be correlated, making them easier to pin down.Starspot temperatures show a clear dependence on photospheric temperature, based on Doppler imaging, modeling of molecular bands, and atomic line-depth ratios (Berdyugina 2005).Therefore while heterogeneities' filling factors surely evolve over a stellar activity cycle, their temperatures and thus spectra are a static characteristic of a given star, supporting our proposition of their relevance across epochs.

Challenges & Opportunities with Limb Darkening
In the present work, we use limb darkening laws which were independant of wavelength and temperature.In practice however, limb darkening is an effect which depends on both the wavelength and the temperature of the stellar surface (Claret & Bloemen 2011), which are related to quantities we are attempting to retrieve in the first place.Thus we find ourselves in a loop where the stellar spectrum is required in order to know the appro-priate value of the limb darkening coefficients, which is required in order to fit for the stellar spectrum.
The assumption of temperature and wavelength independence is necessary for the present performance estimates, but does not lead to an overestimation of such performances.Indeed, by making said assumptions, we are in fact discarding information content-that when accounted for in future works-will further help constrain the stellar properties.Accounting for the wavelength dependency of limb darkening will notably further help break the degeneracies currently observed between the latitude and size of a heterogeneity and thus better constrain the latitudinal distribution of heterogeneities (Juvan et al. 2018;Luger et al. 2021).

On Complementary Physics-based Constraints
In the current framework, the retrieval is performed in a manner that is completely agnostic to the underlying physics used to generate the synthetic data, and imposes no constraints that the retrieved spectra adhere to any physical processes.When performing a completely blind retrieval (i.e., imposing no constraints on the geometry of spot features), the model will allow a range of solutions, including those with negative or non-physical spectra for the spots.Future works should explore how relevant priors could be added to the framework without introducing biases from limited stellar models.
An example of how physical constraints may be used to inform a retrieval without injecting bias from phys-ical models is to impose the condition that the fitted spectra must be consistent with a blackbody outside of absorption features.We show this in Figure 5, where we have taken the ensemble of posterior spectra of a surface feature, scaled them by the projected area of the feature, and discarded those which are not well fit by a black-body envelope in given wavelength ranges.We do not fit for a specific temperature; rather, we discard posteriors which have a reduced χ 2 value above 5 to allow for fits that deviate due to spectral features which may be realistic but unaccounted for by a blackbody profile, while rejecting i.e. spectra with negative values etc..The result of this is the reduction in uncertainty on the spot spectrum by a factor of ∼ 15×, as well as reducing the uncertainty on the spot size by a factor of ∼ 7× and the latitude by a factor of ∼ 4×.We highlight that since this cut is made on posterior spectra, we are not injecting bias into the retrieval of the spectra themselves, rather discarding those results which do not agree with a loose physical constraint.
Further considerations to include physical information would be a parametrization of the relative flux expected between wavelength bins associated to the feature of a same molecule.While absolute flux values may be biased, relationships between wavelengths may be robust enough to provide additional constraints.This information could be extracted using Gaussian processes in order to measure correlations between different wavelengths (Perger et al. 2023).Constraining the spectra in this way would enable tighter constraints on the size and latitude of a given feature, which is currently degenerate with the overall amplitude of its spectrum.Additionally, including the use of activity indicators provided by high-precision spectroscopy to help solve in the inverse problem of reconstructing active regions on the stellar surface (Mallonn et al. 2018;Rosich et al. 2020).
In other words, while a series of improvements to this framework can (and should) be made in the future, the present theoretical proof-of-concept suffices to move towards a practical application with JWST data as a next step.Such data would also inform in a relevant manner the aforementioned series of improvements (e.g., empirical wavelength-and temperature-dependencies of the limb-darkening).We thus look forward to an on-sky validation and further development of this framework in the near future to enable the robust atmospheric characterization of planets whose spectra would otherwise stay contaminated.

A. TEST MODEL SAMPLE PARAMETERS & RESULTS
Table A1 gives the parameters of the different stellar surfaces used to test the retrieval framework.Figure A1 illustrates the results of the test retrieval allowing the position, size, and spectral contribution of a spot feature to all vary.Table A1: Parameters for the different stellar surfaces used to test the retrieval framework where θ and ϕ are the latitude and longitude of a spot feature in radians, r is its size (relative to the stellar radius), and T is the temperature of the model (in Kelvin) used to generate the synthetic observations.The σ parameters represent the median fractional deviation of the measured spectra from input spectra.Solid horizontal lines separate families of models.The first three are a random sampling of models with 4, 3 and 2 spot features, with maximal spot sizes of r = 0.3 and temperatures of either 2300K or 2700K.The fourth and fifth group show models with a single spot of varying latitudes, for two fixed sizes.The last two groups are for a series of models with spots of varying sizes, at two fixed latitudes.

Figure 1 :
Figure 1: On the information content of time-resolved spectroscopy.(Left) Synthetic stellar surface with two spot features, shown halfway through a stellar rotation (phase = 0.5).(Middle) Individual contribution of each component's time-dependent signal (showing their contribution as the star rotates, as well as effects of limb darkening, size and 2D-projection).(Right) Their summed contributions produce a single time-varying hemisphere-integrated spectrum.We show the deviation from median flux as a function of time, to illustrate the variations due to surface features.

Figure 2 :
Figure2: Proof-of-concept for stellar spectra retrieved using time-resolved spectroscopy.Examples of retrieved spectral signals for a photosphere and three spot features on a synthetic star.Their parameters-temperature, size, latitude (θ), and longitude (ϕ) in radians-are given in the bottom left corner of each panel and are kept fixed in this first retrieval.Synthetic and retrieved spectra are in orange and blue, respectively, and residuals at the bottom.For this example, we purposely use a non-physical emission spectrum for spot #3 to emphasize that the framework reliably retrieves the emission spectra empirically.

Figure 4 :
Figure4: Long-stare vs series of snapshots, highlighting the effectiveness of different observing strategies for stars with rotation periods of 3, 10, and 30 days.Heat maps show the median residual on the photosphere spectrum when observing in snapshots (relative to the OOT uncertainty) for three different rotation periods.The percentage in each cell represents the total observation time relative to the rotation period of the star.The snapshot duration is relative to a 2-hr duration that would be typically observed OOT for a transit of planets around TRAPPIST-1.

Figure 5 :
Figure 5: On the use of loose (non-biasing) physical constraints.(Left) Example of constraining the out-of-feature portion of a spectrum (grey bands) to be consistent with a blackbody envelope.(Right) All spectra from a blind retrieval for a spot shown in blue, with the subset consistent with a blackbody envelope shown in orange.All spectra have been multiplied by the projected surface area of the spot feature to illustrate scatter due only to model uncertainty.Using this simple physics-based constraint, precision in the retrieved spectra is increased by ∼ 15×, which leads to an increase in the precision on the spot size and position by ∼ 5-8×.The true spot size and latitude for this example are 0.2 and 0.7 radians respectively, illustrating that the posterior constraints provide a more accurate and precise retrieval.

Figure A1 :
Figure A1: An example retrieval when fitting for the position, size, and spectral contribution of a spot feature.The upper plot shows the posterior distribution for the position and size of the spot feature, with the true value indicated in blue.The bottom panel shows the retrieved spectrum of both the photosphere and the spot itself, similar to Figure 2.