A Blind Search for Transit Depth Variability with TESS

The phenomenon of transit depth variability offers a pathway through which processes such as exoplanet atmospheric activity and orbital dynamics can be studied. In this work we conduct a blind search for transit depth variations among 330 known planets observed by the Transiting Exoplanet Survey Satellite within its first four years of operation. Through an automated periodogram analysis, we identify four targets (KELT-8b, HAT-P-7b, HIP 65 Ab, and TrES-3b) that appear to show significant transit depth variability. We find that KELT-8b’s transit depth variability likely comes from contaminating flux from a nearby star, while the apparent variabilities of HIP 65 Ab and TrES-3b are probable artifacts due to their grazing orbits. HAT-P-7b indicates signs of variability that possibly originate from the planet or its host star. A population-level analysis does not reveal any significant correlation between transit depth variability and the effective temperature and mass of the host star; such correlation could arise if stellar activity was the cause of depth variations via the transit light source effect. Extrapolating our ∼1% detection rate to the upcoming Roman mission, predicted to yield of order 100,000 transiting planets, we expect that ∼1000 of these targets will be found to exhibit significant transit depth variability.


INTRODUCTION
The Transiting Exoplanet Survey Satellite (TESS ) is an all-sky transit survey designed to discover transiting planets orbiting the nearest and brightest stars (Ricker et al. 2014).Though its main aim is to discover new transiting planets, the exquisite precision of TESS ' lightcurves of stars across the sky allows us to extract additional science for already known planets.This data has been used in previous studies to measure transit parameters to higher precision (see, e.g., Patel & Espinoza 2022;Ivshina & Winn 2022).
One signature that can be detected in high-precision lightcurves is transit depth variability, in which the transit depth of a planet varies over time.This phenomenon has been observed for a small handful of exoplanets (Biersteker & Schlichting 2017;McKee & Mon-Corresponding author: Gavin Wang gwang59@jhu.edutet 2022) but has received considerably less attention than other, time-varying phenomena such as transit timing and transit duration variations (see e.g., Kaye et al. 2021;Trifonov et al. 2023;Boley et al. 2020; for review see Agol & Fabrycky 2017).There are several motivations for studying depth variability.Perhaps the one which has recently gained most interest is that of atmospheric activity, in which weather and clouds on a planet's atmosphere cause the opacity and thus apparent size of the planet to change over time (Spiegel et al. 2009;Parmentier et al. 2013;Komacek & Showman 2019).For instance, atmospheric variability has been suggested for Kepler-76b (Jackson et al. 2019), WASP-12b (Bell et al. 2019), and HAT-P-7b (Armstrong et al. 2016) through phase curve analyses, although recent evidence from Lally & Vanderburg (2022) may suggest otherwise for the latter.WASP-121b has also been suggested to exhibit atmospheric variability from transit spectroscopy (Wilson et al. 2021).Given the sensitivity of TESS, transit depth variability due to such atmospheric activity could be detectable from its data, allowing us to con-struct a list of planets with possible atmospheric activity.Furthermore, since TESS focuses primarily on the nearest and brightest planets, such targets could then easily be followed up by ground-and space-based observing programs.
An additional motivation for studying depth variations is its links to stellar activity, which may present itself as periodic modulations in the lightcurve through the Transit Light Source effect (Rackham et al. 2018).Inhomogeneous features such as starspots and faculae contaminate the face of the planet's host star, and as the size and number of these features change over the activity cycle of the host star, so does the amount of contamination in the transits (see, e.g., Section 2 of Morris et al. 2018).Variability that could be due to starspots has been analyzed for planets including KIC 12557548b, which was found to have a significant transit depthrotation modulation signal (Kawahara et al. 2013;Croll et al. 2015).However, to our knowledge no blind search for transit depth variability has been conducted as of yet.
Orbital dynamics, including orbital precession and inclination-eccentricity cycling, also induce changes in an exoplanet's orbit and have been found to cause transit depth variations.A recent example is the case of the warm Jupiter TOI-216b (McKee & Montet 2022), whose orbit changed from grazing to non-grazing over the course of two years, resulting in a larger planet-tostar radius ratio and a refined inference of the planet's size.Thus, studying transit depth variability also enables us to place constraints on exoplanetary orbital dynamics.Planetary oblateness has been suggested as well to give rise to depth variations (Carter & Winn 2010;Biersteker & Schlichting 2017).
Measuring transit depth variations using TESS data gives us indications regarding the stability of TESS, too.In particular, since TESS has large 21" pixels and observes the sky by splitting it into different sectors, the varying orientation of TESS ' cameras and positions of stars over the course of different sectors may result in time-dependent blending.Such effects, if left unaccounted for, directly propagate into changes in the transit depth, and thus the level of consistency of transit depths between sectors informs us of the long-term stability of TESS across sectors.
Finally, while we do not study secondary eclipse depth variability in the present work, we note that such measurements more directly probe temperature and cloud variations in planetary atmospheres.Such variability claims have been made for planets including WASP-12 b (Hooton et al. 2019;von Essen, C. et al. 2019) and 55 Cnc e (Demory et al. 2015;Tamburo et al. 2018).We believe that a population-level analysis analogous to the one we conduct here but for secondary eclipse depths is an interesting and exciting study for future work.
In this work we present a blind search for transit depth variability among a broad sample of known planets observed by TESS.In Section 2, we describe our sample selection, data modeling processes, and statistical metrics used to measure transit depth variability.Section 3 details the results of our analyses, and in Section 4 we discuss the implications and conclusions of our work.

Selection of Targets
At the time of our analysis (October 2022), the NASA Exoplanet Archive1 showed that there were more than 5,000 discovered planets, of which 3,944 were known to be transiting.The TESS mission had completed 57 sectors at this time, and thus we constructed our sample using all data up to and including Sector 57.We downloaded the available 2-minute TESS data for each target.We note that TESS data has also been collected in additional cadences throughout the course of its mission, including 30-minute, 10-minute, and 200-second cadences.However, in order to keep our analysis consistent across sectors, we chose to use only the 2-minute cadence mode.We then estimated the signal-to-noise ratio for each planet, which we calculated by dividing the transit depth by the median flux uncertainty of the TESS normalized lightcurve, scaled by the square root of the number of in-transit points.We obtained the transit depth for each planet through the Exoplanet Characterization Toolkit (Bourque et al. 2021) software, which in turn extracts the most recent values reported on the NASA Exoplanet Archive.After eliminating targets that did not reach a signal-to-noise ratio of 5, our sample remained of 478 targets.We then further removed targets which had: • Fewer than five transits, which would be likely impacted by small-number statistics and thus prevent the measurement of transit depth variability.
• Transit timing variations (TTVs) larger than 0.1 days, as our modeling does not account for TTVs.
After these cuts were applied, our sample included 330 targets.Figure 1 shows the radius-period distribution of our sample.Note that, although this work focuses on exoplanets, which indeed comprise the majority of our sample, we did not impose mass or radius constraints to remove brown dwarfs.Therefore, although in this work we will use the terms "target" and "planet" interchangeably, it should be made clear that not all objects in our sample are, strictly speaking, planets (although they are certainly substellar).

Number of Transits
Figure 2 shows the number of transits for all targets in our sample.In general, the more transits a target has, the more precisely we can measure its transit depth variability.Although the majority of targets in our sample have fewer than 30 transits, there are also a few targets with ∼ 200 transits for which we can constrain depth variability precisly over long timescales.

TESS Data Modeling
We largely follow Section 2.2 of Patel & Espinoza (2022) for our data modeling procedure, using juliet (Espinoza et al. 2019) to carry out our analyses.We use the batman (Kreidberg 2015) model and the dynesty (Speagle 2020) nested sampler for all of our fits.Before starting our modeling, we queried the Mikulski Archive for Space Telescopes2 (MAST) to download all 2-minute cadence Pre-search Data Conditioning Simple Aperture Photometry (PDCSAP) TESS lightcurves for targets in our sample.We note that instrumental systematics including momentum dumps and other anomalies have al- ready been corrected for in these lightcurves by the PD-CSAP pipeline, and we further remove any values that have a non-zero data quality flag.
Whereas Patel & Espinoza (2022) model the data at the (multi-)sector level, here we mainly focus on fitting data at the individual transit level.In particular, for each target, we perform the following two-step analysis.We first use celerite's (Foreman-Mackey et al. 2017) Quasi-Periodic Gaussian Process (GP) kernel to empirically model the out-of-transit noise, which includes effects such as starspot modulation.For our purposes we define out-of-transit data as data that are captured more than one full transit duration away from the nearest transit center, where we obtain the transit duration, P , and t 0 from the Exoplanet Characterization Toolkit.We use wide priors for all Gaussian Process hyperparameters (see example priors and posteriors in Table 1).Secondly, we fit individual transits as informed by our Gaussian Process model, which we accomplish by setting the priors to the posteriors obtained from the outof-transit fit.Since our goal is to measure transit depth variability across single transits, the fit of one transit should be independent of the others and thus we fit the data one transit at a time.Maintaining consistency with our previous definition, here each transit is fit using data within one transit duration of its transit center.We assign the depth, impact parameter, and quadratic limb darkening coefficients uniform priors between 0 and 1, and the scaled semi-major axis a log-uniform prior from 1 to 100 (see "In-transit fit" section of Table 1).The eccentricity and argument of periastron are fixed to values obtained from the literature, and defaulted to 0 and 90 • respectively if no literature values are available.Although our method is computationally more expensive than a direct fit to the individual transits, it allows us to preserve minute differences between transits.Finally, we visually search through individual transit depths for all targets and remove outliers which were more than 3σ away from the surrounding data points, or had an errorbar that was more than 3 times larger than the median errorbar.
Though the primary focus of this work is indeed transit depth variability on the single-transit level, we determined that it would also be insightful to conduct sectorlevel fits for the planets in our sample.These were carried out in an analogous way to the individual transit fits in that we use the same definition of priors as above, with the only difference being that we do not divide the data into segments and instead fit the entire lightcurve for each sector.Such fits inform us of the instrumental stability of TESS (see Section 3.2), in addition to providing us with precise transit parameters that we later use to create noiseless simulations of transits (see Section 2.4.1).

Periodicity Search
Transit depth variability may emerge in a periodic fashion, whether that be through stellar activity, atmospheric activity, or orbital dynamics.Thus, to identify such periodic signals, we perform a uniform search of periodicity for all targets in our sample using the Lomb-Scargle periodogram (Lomb 1976;Scargle 1982) as implemented through astropy (Astropy Collaboration et al. 2013Collaboration et al. , 2018Collaboration et al. , 2022)).Variations in the transit depth with frequencies above the Nyquist frequency or below the maximum continuous time span of the data cannot be reliably detected.Thus, these form the upper and lower bounds for our frequency space, which we uniformly sample with a density corresponding to the minimum resolvable frequency spacing.
To determine the significance of any peaks detected in the periodogram, we simulate the transit depths with Gaussian white noise and calculate the corresponding 10%, 1%, and 0.1% false-alarm probabilities.Specifically, we assume a fixed transit depth of the system, which we take as the median of the individual transit depths, and set the standard deviation of the Gaussian noise distribution to the median transit depth uncertainty.If a peak for the true data's power spectral density has a false-alarm probability of less than 0.1%, we consider the peak to be significant.Since there are known complications with claiming significance levels in Lomb-Scargle periodograms (Zechmeister & Kürster 2009), for these targets we conduct a second round of vetting by phasing the transit depths to the highest peak and performing a sinusoidal fit to the phased data.This allows us to determine whether the amplitude is consistent with 0 for each target.If the corresponding amplitude for a target is more than 3σ away from 0, we consider it to be a target of interest and worthy of closer scrutiny.Our results from these searches are discussed in Section 3.3.

Population Analysis of Transit Depth Variability
The second objective of this work is to search for correlations between stellar type and transit depth variability on a population level.Achieving this requires us to develop a new metric.Each target has varying baseline noises and thus we cannot directly compare the standard deviations of their transit depths (as calculated in Section 3.1).In order to objectively compare depth variations across different planets, the metric that we use must be independent of: • Number of transits, because this is not an intrinsic astrophysical property of the system and is instead a consequence of orbital period and TESS ' observing baseline.
• Transit detection signal-to-noise ratio, as this depends on factors including apparent magnitude of the host star and transit depth, which we assume to not be related to true transit depth variability but rather only our ability to observe and measure it.
We next describe our process of simulating and coadding transits that we use to calculate a metric that satisfies our criteria.

Simulating & Co-Adding Transits
In broad terms, our approach is to compare variations in the true transit depths against simulated transits.We use the batman model when simulating our transits for consistency with our fits (see Section 2.2).For each target, we first extract the posterior parameters from our sector-level fits.These are then input into batman, which returns noiseless, simulated transits with the same 2-minute cadence as the TESS lightcurves.It is important to note that all transits within the same sector are modeled using the same set of parameters and thus identical.A consequence of this, however, is that injected transits are slightly different for each sector and this method is thus insensitive to changes occurring over timescales of > 1 sector.Since we want to explore the effects of stellar type on transit depths, we inject these transits into the out-of-transit baseline of the lightcurve.Specifically, this is done by multiplying point-by-point the original lightcurve flux values by the simulated data.The injection site is chosen to be a time offset from the Length-scale for GP GPP rot J (1.0, 100.0) 4.26 +0.16   −0.17

Quasi-periodic GP period
In-transit fit −0.17 original transits, which reproduces a full set of transits, each of which are offset by the same time length from the original transits.Finally, we repeat this process for a total of 30 sets per target, with each set corresponding to incremental offsets such that the entire lightcurve baseline is sampled.This process is shown in Figure 4.
In reality, the process of injections is complicated by the fact that there are data gaps in the TESS data.Due to these gaps, it is often the case that a set of injections is unable to contain as many transits as the true lightcurve.Our analyses reveal that the F-ratio, which is the standard deviation of the transit depths divided by the median uncertainty of the transit depths, can exhibit large changes even with small (∼ 1%) changes in the number of transits used to calculate the ratio, as shown in Figure 3. Thus, we enforce the constraint that the number of transits in a set must be exactly equal to that in the true lightcurve to avoid introducing additional sources of error.As a general rule of thumb, we accept targets which have ≥ 9 successful injections out of 30.For targets whose success rates are low (< 30%), we perform a second round of injections, increasing the number of injections to 100 and maintaining inclusion of targets with 9 or more transits.In this way, we are able to incorporate a larger percentage of our original sample into our population-wide analysis.

Nested Ratio from Simulated Transits
For each target, we have between 9 and 30 sets of simulated transits, and we calculate the F-ratio for each set.We also repeat this calculation for the true transit depths to obtain F t .Finally, we normalize F t by dividing it by the distribution of simulated F-ratios.Hereafter in this text we will call this ratio of the F-ratios the "nested ratio," which is the metric that we will use to search for population-level trends: where F i is the median of the injected F-ratios.It should be noted that F-ratios for neither the true nor simulated transits have associated uncertainties.However, we can make use of the fact that we have a distribution of simulated F-ratios to measure the uncertainty of N .For a target which has n injections and thus n simulated ratios, we randomly choose two-thirds of those ratios, whose median we will call F 2/3 .We can calculate one value of N by taking N = F t /F 2/3 .For each planet, we repeat this process 100 times, resulting in 100 random draws of N .Our nested ratio is then the median of this distribution, with associated uncertainty σ N as the standard deviation of the distribution.randomly-chosen transits until all 218 transits for the target are included.Even omitting just a small number of transits can result in a large change in the ratio, demonstrating that the F-ratio is not immune to changes in the number of transits.

Absolute Transit Depth Variability
Using the steps outlined in Section 2.2, we obtain a list of transit depths δ and uncertainties σ δ for each target.In order to empirically measure the variability in the transit depth for each target, we calculate the standard deviation σ in the transit depth for each target in units of parts per million (ppm).We call this value the absolute transit depth variability.Note that this is different from the per-point uncertainty σ δ in the transit depth.However, if there is no inherent, excess variability in the transit depth, these two values should converge.
The results from our calculations are listed in Table A1, with absolute variability typically lying on the order of 1 part per thousand, comparable to the uncertainties in the transit depths.Figure 5 plots the absolute variability against TESS magnitude for all targets in our sample.We note that the data exhibit a general trend of higher variability for fainter targets that aligns with expectations, albeit with a moderate spread.No planets deviate significantly from this trend.
It is important to note that, although this is an absolute (i.e., measured in parts per million) standard deviation which places absolute constraints on the amount by which the transit depth varies, it is only specific to the data collected by TESS.Different instruments and different observing schemes could result in different absolute constraints, but TESS ' high precision nevertheless makes these values good baseline estimates for future studies.

Targets with Sector Discrepancies
Blending caused by TESS ' large pixels is accounted for in the PDCSAP lightcurves through a dilution factor (Stumpe et al. 2012), calculated based on positions of nearby stars from the TESS Input Catalog.However, the proper motion of stars and the orientation of the instrument's four cameras change the pixel positioning of stars over the course of different sectors, which may result in changes in the dilution that are not perfectly accounted for.To measure the instrumental stability of TESS across sectors, we conduct an additional search for variability on the sector-level.Specifically, we jointly fit all transits for each sector, and setting the out-of-transit Gaussian Process posteriors as the priors for these sector-level fits.We then calculate the sigmadiscrepancy for all pairs of sectors and identify those targets which have a minimum of 3σ transit depth discrepancy between sectors.
We find 3 such targets (TrES-1b, WASP-100b, WASP-183b) out of our sample of 330.We note that finding ∼ 1% of targets with discrepancies above 3σ is quite reasonable within a large sample, suggesting that there are no excess biases with TESS.Nevertheless, we examine each of these cases in further detail.
• TrES-1b was observed by TESS in Sectors 14, 40, 41, 53 and 54.For Sectors 14 and 53, our fits obtained R p /R * values of 0.1395 ± 0.0014 and 0.1308 +0.0023 −0.0020 , respectively.These two values are 3.17σ apart, while all other pairs of sectors were consistent to within 3σ of each other.We believe this is in part due to an anomalous transit in Sector 53 which falls in a data gap due to a momentum dump and which has a depth that is roughly only half of the other transits, thus biasing the fit towards a lower R p /R * .Although we had manually removed all outlier individual transit depths   (see Section 2.2), we did not remove those discrepant transit depths from the sector-level fits.To confirm our hypothesis, we repeated the sector fit for Sector 53 with the anomalous transit removed, which returns R p /R * = 0.1322 +0.0021 −0.0019 .This is now consistent to within 2.89 sigma with the depth from Sector 14.Although nominally consistent, we believe that this target is interesting and warrants further scrutiny.
• WASP-100b falls in the TESS Continuous Viewing Zone and was observed in 20 sectors.The planet-to-star radius ratios for Sectors 2 and 12 are 0.0799 +0.0014 −0.0010 and 0.0856 +0.0009 −0.0010 , discrepant at 3.30σ.Sector 2 is also discrepant with Sector 10 (R p /R * = 0.0847 ± 0.0007) by 3.16σ, and there were no other discrepant pairs of sectors.We note that the PDCSAP dilution factor for Sector 12 was 0.88, significantly smaller than the 0.92 − 0.93 values for other sectors both before and after it.The dilution factor is absent in the FITS headers for Sector 10, and we hypothesize that variations in the dilution factor caused the observed discrepancy.An analysis of the typical variation in the TESS dilution factor, however, reveals that 20 of 107 arbitrarily chosen planets in our sample exhibit changes in the dilution factor that are larger than 0.04, and yet do not display significant sectorlevel variations.Thus, we believe that WASP-100b is an interesting target to further monitor as well.
• WASP-183b was observed in Sectors 35, 45, and 46.The best-fit depths for Sectors 45 and 46 are respectively 0.1229 +0.0061 −0.0042 and 0.1500 +0.0053 −0.0057 and discrepant by 3.25σ.We attribute this to the high impact parameter of the target, resulting in a degeneracy in the transit depth.This is supported by the V-shaped transits, in addition to the impact parameter posterior values of b = 0.30 +0.18 −0.14 (Sector 45) and b = 0.75 +0.05 −0.06 (Sector 46) for these two sectors, which differ by 2.4σ.We additionally note that our transit-by-transit fits do not show such large changes in b, and upon constraining the limb darkening coefficients around theoretical values, the discrepancy between the two sectors disappears.However, we still cannot rule out variability such as orbital dynamics or precession on timescales of ∼ 1 sector (27 days) resulting in a variable inclination and thus impact parameter.
Collectively, these results suggest that one must take caution when analyzing targets with transits that fall near a data gap and those with high impact parameters.Additionally, this analysis demonstrates that TESS is largely able to accurately account for and correct variations in dilution for a broad sample of targets.

Periodogram Search
We repeat the procedure outlined in Section 2.3 for all targets in our sample.In most cases, we do not find any statistically significant peaks above what is expected from Gaussian white noise.However, for 4 targets (HAT-P-7b, KELT-8b, TrES-3b, HIP 65Ab) we discover peaks that fall above 99.9% of the points obtained through bootstrap resampling, corresponding to falsealarm probabilities of less than 0.1%.
Continuing the vetting, for each of these 4 targets we calculated the significance of the amplitudes, finding that all 4 have amplitudes of more than 3σ from zero (see Table 2).However, the transits for TrES-3b and HIP 65Ab initially had unusually large uncertainties, which we determined to be due to their high impact parameters and our fitting choices (see Sections 3.3.3and 3.3.4for a detailed investigation).Moving forward we will only present the results from the re-analysis which fixes the impact parameter to literature values.For each target, the transit depths are phased to the highest frequency peak found in Figure 6 and plotted in Figure 7.We discuss each of these targets below.

KELT-8b
KELT-8b, also known as TOI-2132.01and TIC 358516596.01, is a hot Jupiter with a 3.2-day period (Fulton et al. 2015).This target was observed by TESS in Sectors 26, 40, 53, and 54. Fulton et al. (2015) measured a best-fit transit depth of 13100 ± 600 ppm, whereas our fits find clear variability in the depth ranging from values as extreme as 8000 to 14000 ppm with  amplitude 2320 ± 370 ppm, or 6.2σ (see Figure 7).Furthermore, as shown in Figure 8, we find strong modulations in the lightcurve with period 0.197 days and amplitude 3145 ± 9 ppm.Gaia epoch photometry and TESS centroid offsets reveal, however, that the modulations in the lightcurve are due to a blended variable star (TIC 358516610, Gaia DR3 4534144923690486144, T = 13.967)25" from the target star (T = 10.203).After accounting for blending and the brightness ratio of the two stars, the variations match what is observed in KELT-8's lightcurve in both period and amplitude.Thus, we conclude that this is a false positive scenario, and that the true transit depth for KELT-8b likely does not vary in the absence of the nearby contaminating source.
Nevertheless, this candidate serves as a reminder for the need to be cautious of blending scenarios when claiming precise transit signatures with TESS data.Furthermore, the fact that KELT-8b experiences transit depth variability with a period of 7.2 days, unrelated to both its orbital period of 3.2 days and the contaminating source's period of 0.2 days demonstrates the complex effects that photometric activity can have on transit depths.
The strongest photometric variability detected in the out-of-transit portion of the lightcurve has variation amplitude 93 ± 5 ppm and period 6.53 days.There are a few close sources nearby to HAT-P-7.Thus, it is possible that, as with the case of KELT-8, the detected photometric variability is actually due to activity from a nearby star.However, we are unable to make definitive conclusions without further data, as starspots and faculae on the host star HAT-P-7 could also be capable of reproducing the same observed variability.
We note that HAT-P-7b has actually had an interesting history of atmospheric variability claims.Using Kepler data, Armstrong et al. (2016) detected variations in its phase curve, with the position of the peak shifting longitudinally between orbits.These variations were initially attributed to atmospheric variability.More recently, however, Lally & Vanderburg (2022) suggested that the observed variations could instead be caused by stellar activity.Nonetheless, no previous work known to the authors has reported on transit depth variations for this target, and thus our work may provide a new perspective from which follow-up studies can continue to examine the nature of HAT-P-7b and its atmosphere (with, e.g., transmission spectroscopy).

TrES-3b
TrES-3b (O'Donovan et al. 2007), also known as TOI-2126.01and TIC 116264089.01, is a hot Jupiter with a short orbital period of 1.3 days.This target was observed by TESS in Sectors 25, 26, 40, 52, and 53.The discovery paper found its transit depth to be 27600±800 ppm.Although this is consistent with the median transit depth of 29800±8300 ppm from our initial fits, which also appear to show depth variability with amplitude 4350 ± 1140 ppm (3.8σ), we realized that the transit depths had unusually large uncertainties.We attribute this to the large impact parameter; for values of b close to unity, minute changes in b can yield large swings in the transit depth.The most recent value of b in the literature comes from Patel & Espinoza (2022), which found b = 0.846 +0.030 −0.025 .Our TESS sector-level fits yield similar results.Thus, to improve the precision on the depths, we conducted a second round of fitting, this time fixing the impact parameter to 0.846.This yielded noticeably better fits, with the median transit depth being 29200 ± 1900 ppm.
Consequently, with this newer fit, the previously detected transit depth variability is no longer observed, and the periodogram of the new depths no longer has statistically significant peaks.When phased to the peak of P = 4.2534 days found from the first round of fitting (see Figure 6), the amplitude in the depths is reduced to 778 ± 470 ppm, only 1.7σ from 0. The transit depths are plotted in Figure 7 and do not seem to show obvious variability.Thus, we conclude that the observed peak in the periodogram likely results from the shape of the grazing transit casting uncertainty into the transit depths, creating the illusion of transit depth variability.We find photometric variability for this target to be strongest at P = 9.3 days with amplitude 833 ± 17 ppm (see Figure 8), but we do not believe this to be related to the initially observed transit depth variability.

HIP 65Ab
HIP 65Ab (Nielsen, L. D. et al. 2020), also known as TOI-129.01 and TIC 201248411.01,was observed by TESS in Sectors 1, 2, 28, and 29.This is a 3.2 M Jup planet on a 0.98-day grazing orbit.The original discovery paper finds b = 1.169 +0.095 −0.077 , along with a highly uncertain depth of 82000 +52000 −40000 ppm.Our first round of fitting obtains a median depth of 10700±4600 ppm, and phasing at the P = 26.8947day peak found in the periodogram, we appear to detect transit depth variability with amplitude 1600 ± 370 ppm (4.3σ).However, we note that the depth is discrepant from the literature value by nearly a factor of 8. We again attribute this to the large impact parameter, which is even more extreme than TrES-3b and nominally above 1.Thus we again conduct a second round of fitting, in which we fix b = 1.169 in our priors.This yields a revised median transit depth of 83100 ± 2800 ppm, with much better agreement with Nielsen, L. D. et al. (2020).Figure 7 shows the results from this fit.
We observe that this case is similar to that of TrES-3b: upon fixing the impact parameter, the previously observed transit depth variability no longer exists.The transit depths are all consistent to well within 1σ, and the amplitude of the best-fit sinusoid is now greatly reduced to 620 ± 1550 ppm (0.4σ).This is much smaller than the original detection.Although we search for and find photometric variability with period 6.0 days and amplitude 1215 ± 6 ppm, this does not give us further insight on the transit depth variations for this system.We again conclude that the initial transit depth variability is likely an artifact resulting from the planet's grazing orbit.

WASP-121b
WASP-121b is a target of particular interest which has had previous claims of variability in the planet's transmission spectrum (Wilson et al. 2021).More specifically, spectra collected in January 2017 and October 2016 seem to show differences of ∼ 2σ in the transit depth at wavelengths shorter than 650 nm.Here we present a brief analysis of this target.We measure WASP-121b's median transit depth in the TESS band to be 14870±290 ppm, consistent to 2σ with 15510 ± 120 ppm from Delrez et al. (2016).The transit depths seem to vary with amplitude 154±76 ppm (2.0σ) at P = 7.6507 days, as shown in Figure 10, but this does not reach our 3σ threshold).The periodogram also does not display peaks with false-alarm probabilities of lower than 0.1%.
We detect photometric variability most strongly at a period of P = 13.67 days, with amplitude 141 ± 7 ppm (see Figure 11).This is comparable to the variations in the transit depths.Thus, even if the transit depth variability were significant at > 3σ, we are still unable to infer planet atmospheric variability, as we cannot rule out stellar activity as an alternate explanation.Fur-thermore, there are a few nearby sources within 1 − 2 TESS pixels of the target star, which could also result in false-positive scenarios.
We note, however, that TESS has a wide, red bandpass covering 600 to 1000 nm, while Wilson et al. (2021) found that time variability for WASP-121b is most apparent for wavelengths shorter than 650 nm.This comprises only a small portion of the TESS bandpass (the range from 600 to 650 nm).In addition, TESS 's wide bandpass could possibly dilute any true signal.Thus, we believe that while we do not detect significant transit depth variability for this target, our results are nonetheless in agreement with those of Wilson et al. (2021).
Despite this non-detection, the case of WASP-121b provides an interesting example.Because the opacity of the spectra of species such as TiO is highest in longer wavelengths around 600−1000 nm, which coincides with the wavelength range of TESS, targets which are ob-served by TESS to exhibit transit depth variations may be more likely to experience atmospheric variability as compared to stellar activity.

Nested Ratio
Our final analysis is a search for population-level trends for transit depth variability.We hypothesize that transit depth variability may be dependent upon the strength of the Transit Light Source effect (Rackham et al. 2018), which is in turn related to the properties of the host star.For instance, Rackham et al. (2018) finds that this effect is very pronounced for M dwarfs, while the present state of knowledge regarding this effect for other stellar types is uncertain.To shed light on the extent to which the Transit Light Source effect may depend on stellar type, we search for trends amongst transit depth variability against stellar effective temperature and host star mass.We calculate the nested ratio, our method of normalizing transit depth variations consistently across all planets, using the steps described in Section 2.4 and compare them against these two stellar properties.
For each target, all of the transits that are injected have a constant transit depth (see Section 2.4.1).Thus, the F-ratio for a simulated set of transits is an attempt to measure the portion of the F-ratio that is solely due to the non-Gaussian noise in the lightcurve, with transit depth variability being taken out of the question.Therefore, a nested ratio greater than 1 suggests variability that persists even when accounting for noise; additionally, a larger nested ratio suggests greater true variation.We note here that such true variability may have two origins: the planet, such as its atmosphere, or the Transit Light Source effect.Thus, to focus on the latter, we exclude the four planets identified in Section 3.3 to possibly exhibit planetary-induced variability from this population analysis.
Due to our simulation and selection processes (see Section 2.4.1),targets which did not meet the minimum threshold of 9 successful injections were excluded from this analysis.Furthermore, the nature of our criteria biases against targets observed in multiple sectors, on average excluding multi-sector targets at a higher rate.However, this is not expected to bias our results.In total, 208 of our 330 original targets were included in this subset of the population-level analysis.The nested ratios are also listed in Table A1; note that some entries are missing due to the calculation only being performed on a subset of our sample.

Stellar Effective Temperature
Our first measure of stellar type is stellar effective temperature, which we obtain from various works in the lit-erature through the exoctk package.Figure 12 shows the nested ratio plotted against T eff .Referring back to our speculation based on the results of Morales, J. C. et al. (2008), one might expect larger transit depth variability for planets orbiting cooler hosts.However, there does not seem to be a trend of higher nested ratios for cooler stars; in fact, if we ignore the errorbars, there seems to be an absence of high nested ratios (N > 1) for planets around stars below ∼ 5000 K, thus suggesting that nested ratios are lower for planets around stars below ∼ 5000 K.We note that ≤ 5000 K corresponds to K-dwarfs, which have been found to exhibit less stellar variability and are good host stars in the search for extraterrestrial life (Cuntz & Guinan 2016;Lillo-Box et al. 2022).
However, this feature is not found to be statistically significant.We observe that, for stars with T eff < 5000 K, there are 10 which have planets with N > 1.0 and 22 with N < 1.0 (69% below 1.0), whereas for stars hotter than 5000 K we have 73 and 103 counts respectively (59% below 1.0).A Student's t-test of the null hypothesis that these two populations have the same proportion of targets with N < 1.0 results in a P-value of 0.13, which does not reach the standard ≤ 0.05 significance threshold.Furthermore, a weighted average calculation of the nested ratios for each population reveals N (T eff < 5000K) = 0.618 ± 0.046 and N (T eff > 5000K) = 0.497 ± 0.011, values which are consistent at 2.5σ.Finally, we conduct an MCMC fit with bootstrapping to the data by randomly choosing twothirds of the planets, resampling each data point using its errorbars, and performing a linear least-squares fit.We repeat this process 10 5 times, and obtain slope (3.7±2.1)×10−5 and vertical axis intercept (projected to 3000 K) 0.839 ± 0.060, which are consistent with 0 and 1 to within 1.8 and 2.7σ respectively (see Figure 12).This suggests that the data is in agreement with no excess variation at the encompassed stellar temperatures.From the above tests, we conclude that our current data and analysis are not indicative of a statistically significant correlation.

Host Star Mass
Our second measure of stellar type is the mass of the host star.We also obtain these through the exoctk package, and its relation with the nested ratio is shown in Figure 13.We do not notice any obvious correlation in the data, and thus do not perform a Student's t-test and weighted average calculation.Using the same MCMC and bootstrapping linear fit as in Section 3. to identify a significant dependence between the nested ratio and host mass.

DISCUSSION AND CONCLUSION
In this work we perform a blind search of transit depth variability on a set of 330 known planets observed by TESS.We fit 2-minute PDCSAP TESS data for all targets using juliet, and use out-of-transit data to inform our in-transit fits to minimize the uncertainty in our fits while keeping all transit parameters (especially transit depths) independent.For each target we calculate the standard deviation of its depths, forming an atlas of transit depth variability which can serve as reference for future depth variability studies.
Next we identify individual targets with significant signs of transit depth variability.We conduct an automated periodogram search for all targets, with further vetting conducted with a significance calculation.In general, we find that the majority of targets do not display significant variability above what is expected from noise.However, for the 4 targets KELT-8b, HAT-P-7b, TrES-3b, and HIP 65Ab -which form roughly 1% of our entire sample -we find signals that meet our selection criteria.
For KELT-8b, we attribute the observed transit depth variability to a nearby variable star.The transit depth variability analyses for TrES-3b and HIP 65Ab are interesting as both have relatively high (> 0.8) impact parameters, resulting in highly unstable depths when fitting for single transits.We demonstrate that keeping the impact parameter fixed in our fits stabilizes the transit depths and allows us to accurately characterize any variability.In both cases, we find that upon fixing the impact parameter, depth variability is no longer observed.This leads us to believe that the initial detection is most likely an artifact of the planets' grazing orbits.
The only target we deem worthy of additional followup and analysis is HAT-P-7b.This target provides a special case in that it has also been observed by Kepler, and its first claim of atmospheric variability had made use of Kepler data (Armstrong et al. 2016).It may be informative to conduct a similar measurement of transit depth variations using Kepler data in addition to TESS, as the combined data could provide more robust constraints on the system's observed depth variability than TESS alone.Transit depth variability that is found to be consistent across instruments may also further demonstrate its authenticity and warrant additional follow-up such as transit spectroscopy.
Finally, motivated by the fact that transit depth variability can be caused by stellar activity through the Transit Light Source effect (Rackham et al. 2018), we search for population-level trends of transit depth variability against stellar type, which we characterize using stellar effective temperature and host star mass.We carry out this analysis on a subset of 208 targets, for which we compare true transits to simulated sets of transits.This forms a nested ratio statistic that normalizes depth variability across targets.Through a variety of statistical tests, we are unable to identify any significant correlations between the nested ratio and host star properties.For individual targets with follow-up by transit spectroscopy, our atlas of transit depth variations can also serve as a guide as for how much transit depth variability could be attributed to stellar activity in the TESS bandpass of 600 − 1000 nm.
Our work highlights the richness of datasets such as TESS ' to perform detailed analyses of known exoplanet systems, and how it can allow for the detection of relatively rare phenomena such as transit depth variations, which was the focus of study of the present work.With this data-driven view of transiting exoplanets in mind, it is also interesting to note that the Nancy Grace Roman Space Telescope, due to launch by 2027, is expected to yield ∼ 100, 000 transiting planets through the photometric monitoring of 100 million stars in its microlensing survey (Montet et al. 2017;Wilson et al. 2023).With ∼ 1% of targets having detectable transit depth variations in our sample, by extrapolation we could expect of order ∼ 1, 000 transiting planets discovered by Roman to display transit depth variability at a significant level.Such a large sample could, in turn, allow us to explore in detail the cause of those variations through population level studies -an endeavor for which our current sample is much too small.

ACKNOWLEDGMENTS
We thank the anonymous referee for their comments and suggestions, which significantly improved our results and their presentation.This work includes data collected by the TESS mission, obtained from the MAST data archive at the Space Telescope Science Institute (STScI).Funding for the TESS mission is provided by the NASA Explorer Program.STScI is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5-26555.The TESS data described in this work may be obtained from the MAST archive at doi.org/10.17909/t9-nmc8-f686(MAST Team 2021).This research has made use of the NASA Exoplanet Archive, which is operated by the California Institute of Technology, under contract with

Figure 1 .
Figure 1.Radius-period distribution for planets in our sample, color-coded by number of transits.The majority of planets in our sample are hot Jupiters with period less than 10 days and radius ∼ 10 − 20 R⊕.A small number of roughly Earth-size planets are also present.

FrequencyFigure 2 .
Figure 2. Number of transits for targets in our sample.

Table 1 .
Sample priors and posteriors for fits of CoRoT- Scaled semi-major axis * Corresponding posterior for out-of-transit fit a N (a, b) represents a Normal prior with median a and standard deviation b b J (a, b) represents a Jeffreys prior between a and b c U(a, b) represents a uniform prior between a and b

Figure 3 .
Figure3.Ratios for randomly-selected samples of different sizes for Qatar-10b.Ratios are calculated starting from 2 randomly-chosen transits until all 218 transits for the target are included.Even omitting just a small number of transits can result in a large change in the ratio, demonstrating that the F-ratio is not immune to changes in the number of transits.

Figure 4 .
Figure 4. Schematic showing our procedure for calculating the nested ratio.

Figure 5 .
Figure 5. Scatter plot of absolute variability vs TESS magnitude.

Figure 6 .Figure 7 .
Figure6.Periodograms for and HIP 65  Ab from top to bottom.In each panel, horizontal dashed lines indicate 10%, 1%, and 0.1% false-alarm probabilities (from bottom to top).All targets have peaks with false-alarm probabilities of less than 0.1%.

Table 2 .
Amplitude sigmas for each of the 4 targets.