A Study of Two Periodogram Algorithms for Improving the Detection of Small Transiting Planets

The sensitivities of two periodograms are compared for weak-signal planet detection in transit surveys: the widely used Box Least Squares (BLS) algorithm following light curve detrending, and the Transit Comb Filter (TCF) algorithm following autoregressive ARIMA modeling. Small-depth transits are injected into light curves with different simulated noise characteristics. Two measures of spectral peak significance are examined: the periodogram signal-to-noise ratio (S/N) and a false alarm probability (FAP) based on the generalized extreme value distribution. The relative performance of the BLS and TCF algorithms for small planet detection is examined for a range of light curve characteristics, including orbital period, transit duration, depth, number of transits, and type of noise. We find that the TCF periodogram applied to ARIMA fit residuals with the S/N detection metric is preferred when short-memory autocorrelation is present in the detrended light curve, and even when the light curve noise is white Gaussian noise. BLS is more sensitive to small planets only under limited circumstances with the FAP metric. BLS periodogram characteristics are inferior when autocorrelated noise is present, owing to heteroscedastic periodogram noise and false period detections. Application of these methods to TESS light curves with known small exoplanets confirms our simulation results. The study ends with a decision tree that advises transit survey scientists on procedures to detect small planets most efficiently. The use of ARIMA detrending and TCF periodograms can significantly improve the sensitivity of any transit survey with regularly spaced cadence.


INTRODUCTION

Difficulties with detecting small transiting planets
The transits of giant Jovian planets, producing periodic ∼1% dips in brightness, can be easily seen in photometric light curves produced by space-based observatories such as CoRoT (Baglin et al. 2008), Kepler (Borucki et al. 2010), K2 (Howell et al. 2014), and TESS (Ricker et al. 2015), and will likely be seen in the forthcoming PLATO (Rauer et al. 2014) and Roman Space Telescope (Spergel et al. 2015) missions. However, achieving planned goals to discover suspected large populations of smaller planets has proved challenging. Predictions that several thousand transiting planets will emerge from analysis of TESS data (Barclay et al. 2018; Kunimoto et al. 2022) are, at present, overly optimistic.¹ Problems arise because transits from smaller rocky planets, producing 0.01%–0.1% periodic photometric dips, are often masked by other sources of photometric variability (Gilliland et al. 2011): rotational modulation of starspots (McQuillan et al. 2014); microvariability from stochastic stellar magnetic activity (Aigrain et al. 2004); contamination by eclipsing binaries blended in the large pixels of wide-field telescopes (Torres et al. 2011); instrumental effects involving satellite operations (Vanderburg & Johnson 2014); red noise (Pont et al. 2006); and unavoidable detector photon noise.

Corresponding author: Eric D. Feigelson
The challenge of small planet detection requires solving complicated problems in time series analysis. Several stages of analysis are needed:

1. The light curve is detrended to remove aperiodic or quasi-periodic variations unrelated to strictly periodic planetary orbits. Detrending is typically pursued using nonparametric or semi-parametric methods such as spline fitting, wavelet transforms, or Gaussian Process regression (e.g., Jenkins et al. 2002; Gibson 2014; Lightkurve Collaboration et al. 2018a; Hippke et al. 2019; Feinstein et al. 2019; Montalto et al. 2020; Foreman-Mackey et al. 2021; Guerrero et al. 2021).
2. The detrended light curve is typically searched for transit-shaped periodic dips using the parametric Box Least Squares (BLS) algorithm developed by Kovács et al. (2002). For each trial period, a box-shaped signal is fit to the folded light curve for a range of transit durations and phases. A BLS periodogram is constructed using the strongest signal found at each period, and spectral peaks are investigated as possible transit signals. Kovács et al. show that this procedure is more sensitive to faint box-shaped dips than Fourier periodograms that search for sinusoidal signals (or, for irregular observation cadences, Lomb-Scargle periodograms; Scargle 1982), and more sensitive than nonparametric periodograms that search for arbitrarily shaped signals, such as phase dispersion minimization (Stellingwerf 1978).
3. Procedures are applied to cull False Alarms and astronomical False Positives (particularly contaminating eclipsing binaries in telescopes with low-resolution images) from spectral peaks above some threshold. This typically involves a combination of machine learning classification (such as Random Forest or neural networks) and human or automated vetting. Vetting procedures for transit planet detection are discussed by Thompson et al. (2015), Twicken et al. (2018), Hedges (2021), Guerrero et al. (2021), Melton et al. (2023a), and others.
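The box-fitting search in stage 2 can be illustrated with a brute-force sketch. This is a minimal NumPy illustration of the BLS signal-residue statistic under the assumptions of equal point weights and an evenly sampled light curve; it is not the Fortran routine of Kovács et al. (2002), it omits the binning that makes real implementations fast, and the function name `bls_power` and its parameters are hypothetical.

```python
import numpy as np


def bls_power(t, flux, periods, durations, n_phase=200):
    """Brute-force box-least-squares periodogram (a sketch of the BLS idea).

    For each trial period the light curve is folded and a box of each trial
    duration is slid over n_phase starting phases. With equal weights 1/n,
    r is the fraction of points inside the box and s is the sum of their
    mean-subtracted fluxes divided by n; the signal residue
    SR = s**2 / (r * (1 - r)) is maximized over durations and phases.
    """
    x = np.asarray(flux, dtype=float)
    x = x - x.mean()                      # BLS works on a zero-mean light curve
    n = x.size
    power = np.zeros(len(periods))
    for i, period in enumerate(periods):
        phase = (np.asarray(t) % period) / period            # fold on trial period
        best = 0.0
        for q in np.asarray(durations, dtype=float) / period:  # duty cycle
            for phi0 in np.linspace(0.0, 1.0, n_phase, endpoint=False):
                in_box = ((phase - phi0) % 1.0) < q          # points inside the box
                r = in_box.sum() / n
                if r == 0.0 or r == 1.0:
                    continue
                s = x[in_box].sum() / n
                best = max(best, s * s / (r * (1.0 - r)))
        power[i] = best
    return power


# Usage: a 2-day, 2-hour, 500 ppm box transit in 100 ppm white noise.
rng = np.random.default_rng(1)
t = np.arange(0.0, 27.0, 0.5 / 24)            # ~27 days at 0.5 hr cadence
flux = 1.0 + 1e-4 * rng.standard_normal(t.size)
flux[(t % 2.0) < 2.0 / 24] -= 5e-4
periods = np.linspace(1.5, 2.5, 101)
power = bls_power(t, flux, periods, durations=[2.0 / 24])
best_period = periods[np.argmax(power)]       # expected to be close to 2.0
```

The triple loop over periods, durations, and phases makes the cost of the search explicit; production codes bin the folded light curve or use faster algorithms.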
For the Kepler and TESS space missions, these procedures are used by NASA science teams to generate Kepler and TESS Objects of Interest (KOIs and TOIs) that are then passed to ground-based telescopes for further study (Twicken et al. 2016; Guerrero et al. 2021). However, these standard procedures for planet detection may have technical deficiencies that other statistical approaches might improve. At least two issues might be considered. First, detrenders based on a function with a kernel or a mother wavelet with constant bandwidth can miss short-memory variations. The main concern is autoregressive variations characteristic of stellar magnetic activity. Autoregressive behavior occurs when future values of a time series depend, at least in part, on current and past values. Its presence can be easily checked by plotting the autocorrelation function of the detrended light curve. Formal statistical time series diagnostics, such as the Shapiro-Wilk test for normality and the Ljung-Box test for autocorrelation, can determine whether the light curves deviate from Gaussian white noise. This is a significant problem: 36% of light curves from TESS Full Frame Images (FFIs) show statistically significant short-memory stochastic variability after spline detrending (Melton et al. 2023b, their Figure 5).
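A minimal sketch of the autocorrelation check and the Ljung-Box statistic mentioned above, assuming NumPy and an evenly sampled, already detrended series; the helper names are hypothetical, and the Shapiro-Wilk normality test is not shown. In R, the same statistic is available from `Box.test(x, lag = 10, type = "Ljung-Box")`.

```python
import numpy as np


def acf(x, max_lag):
    """Sample autocorrelation function at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])


def ljung_box_q(x, max_lag):
    """Ljung-Box statistic Q = n (n + 2) sum_{k=1}^{m} rho_k**2 / (n - k).

    For white noise, Q is approximately chi-squared with m = max_lag degrees
    of freedom (about 18.3 is the 95th percentile for m = 10).
    """
    n = len(x)
    rho = acf(x, max_lag)
    k = np.arange(1, max_lag + 1)
    return n * (n + 2) * np.sum(rho ** 2 / (n - k))


rng = np.random.default_rng(0)
white = rng.standard_normal(2000)
ar1 = np.zeros(2000)
for i in range(1, 2000):               # AR(1) series with coefficient 0.5
    ar1[i] = 0.5 * ar1[i - 1] + rng.standard_normal()
q_white = ljung_box_q(white, 10)       # typically below ~18.3
q_ar1 = ljung_box_q(ar1, 10)           # far above 18.3
```

A light curve whose residual Q greatly exceeds the chi-squared threshold still contains the short-memory structure that a smooth detrender has missed.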
Second, periodograms have complicated and poorly understood statistical properties that hinder the straightforward identification of significant peaks representing actual periodic behaviors. Even in classical Fourier analysis, the theorems underlying the statistical distribution of periodogram peaks apply only to the unrealistic situation of an infinitely long, uninterrupted, evenly spaced data stream of Gaussian white noise with a single sinusoidal signal (Percival & Walden 2009). The Lomb-Scargle periodogram (Lomb 1976; Scargle 1982) was proposed as an extension of the classical periodogram to alleviate issues arising from irregular data spacing while providing other statistical benefits.
However, several issues in periodogram analyses persist. For realistic time series from space-based missions, spurious spectral peaks can arise from periodic instrumental effects (such as data gaps associated with the satellite orbital period or data downloads). Complicated aliases of true signals often appear (VanderPlas 2018). Periodograms can exhibit undesirable behaviors even when no interesting signal is present: Ofir (2014) shows that BLS periodograms of noise often have trends in both level and scatter (heteroscedasticity) as a function of trial period.
The first issue, autoregressive variations that may escape removal by detrenders, has a clear treatment using low-dimensional parametric autoregressive moving average (ARMA) models. These are commonly combined with a simple nonparametric detrender that 'differences' the time series (the discrete analog of differentiation). The resulting ARIMA modeling (also known as Box-Jenkins analysis) and its many extensions have dominated analyses of stochastic time series for the past 50 years in engineering signal processing, econometrics, and other fields. Autoregressive modeling dominates most time series textbooks; the foundational text by Box et al. (2015) has over 50 thousand citations, and a Nobel Prize in Economics was awarded for the non-linear ARCH model (later generalized as GARCH), which introduced time-varying volatility to the simpler linear ARIMA framework.
ARIMA detrending for transiting planet detection was introduced by Caceres et al. (2019a) and found to be effective in reducing unwanted light curve variations in most Kepler and TESS light curves (Caceres et al. 2019b; Melton et al. 2023b). The general utility of ARMA-type modeling for astronomical time domain studies is discussed by Feigelson et al. (2018).
However, the traditional BLS algorithm cannot be applied to ARIMA residuals because the differencing operation changes a sequence of box-shaped dips into a sequence of double spikes. Caceres et al. (2019a) developed the Transit Comb Filter² (TCF) as an alternative algorithm to produce periodograms of ARIMA residuals. As the double spike reflects only the ingress and egress, it would seem to carry less information and therefore be less sensitive to weak transits than BLS fitting. However, in searching for small planets in the 4-year Kepler data, Caceres et al. (2019b) find that the TCF periodogram has low noise and is remarkably sensitive to faint transits corresponding to Earth- and Mars-sized planets. Similarly, Melton et al. (2023c, their Figure 16) find that the ARIMA-TCF procedure tends to find smaller planets than other pipelines used to generate TOIs.
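The effect of differencing on a transit can be seen in a toy example. The sketch below, assuming a noiseless box-shaped dip and first differencing ($d = 1$), shows that only an ingress spike and an egress spike survive; the depth estimate at the end illustrates why the double spike still carries transit information and is not the TCF statistic of Caceres et al. (2019a).

```python
import numpy as np

# Noiseless light curve with one box-shaped transit of depth 1e-3
# occupying samples 10..14 (inclusive).
flux = np.zeros(30)
flux[10:15] = -1e-3

# First differencing, y_t = x_t - x_{t-1} (np.diff returns x[i+1] - x[i]).
diff = np.diff(flux)

# Everything cancels except a negative spike at ingress and a positive
# spike at egress: the "double spike" pattern that TCF searches for.
spikes = np.flatnonzero(diff)          # indices 9 and 14 of diff
ingress, egress = diff[spikes]

# Illustrative only: the two spikes jointly encode the transit depth.
depth_estimate = (egress - ingress) / 2.0
```

All samples inside the flat transit bottom become zero after differencing, which is why a box-fitting search such as BLS finds nothing to fit in ARIMA residuals.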
In addition to choosing the periodogram algorithm, the transit scientist must decide on the best measure for the strength of a spectral peak. The most straightforward approach is to locate the period with the highest periodogram power. However, strong peaks can be produced by autocorrelated noise in the light curve or by aliasing associated with an irregular cadence. Noting heteroscedasticity and trends in the periodogram's response to noise, Ofir (2014) recommends using local signal-to-noise ratios of the detrended BLS periodogram.
Evaluating False Alarm Probabilities (FAPs) associated with the chosen measure is tricky even for traditional Fourier analysis of Gaussian white noise data (Percival & Walden 1993) and even more difficult when irregular cadences or autocorrelated noise are present. Issues concerning FAP estimation have been extensively discussed in the context of Lomb-Scargle periodograms (LSPs). Early analytic FAPs for LSPs (Scargle 1982; Horne & Baliunas 1986) were shown to be unreliable (e.g., Koen 1990; Schwarzenberg-Czerny 1998), and numerous improvements were suggested. Two broad approaches to estimating statistically significant periodic signals from periodograms have emerged: one based on the 'sigma' noise level of the periodogram (e.g., Scargle 1982) and another based on periodogram extreme values. One outcome of these astrostatistical analyses is that procedures ignoring most periodogram noise values, and focusing instead on the extremes, might better identify true periodicities. The statistical field of extreme value theory (EVT), based on the Fisher-Tippett-Gnedenko theorem, provides a mathematical foundation for evaluating peak significance in astronomical LSPs (Baluev 2008); we review the underlying mathematics in Appendix A.
EVT has been widely applied to problems in geology, finance, and engineering; for example, EVT helps evaluate whether a storm exceeds a '100-year hurricane' or whether a sudden stock market change is a fluctuation or a 'crash'. The mathematics and many applications of EVT are presented in texts such as Coles et al. (2001) and Castillo et al. (2004). The application of EVT to Lomb-Scargle periodogram FAPs has been discussed in astrostatistical studies (Baluev 2008; Süveges 2014; Süveges et al. 2015; Sulis et al. 2017; Vio et al. 2019; Delisle et al. 2020; Koen 2021; Giertych et al. 2022) and in the review by VanderPlas (2018). EVT is also gaining increased attention for other astronomical applications, including solar, stellar, galaxy, and cosmological studies (e.g., Asensio Ramos 2007; Pratt et al. 2017; Waizmann et al. 2012; Davis et al. 2011). We extend EVT-based detection to periodograms specialized for planetary transit detection, guided by the approach of Süveges (2014) as described in Appendix A.

Scope of This Study
The present study investigates statistical issues related to the behavior of BLS and TCF periodograms and their sensitivity to small planetary transit signals under different noise conditions. Much of our effort is based on the analysis of simulated light curves described in §2.1; §2.2 and §2.3 present two metrics to evaluate the significance of periodogram peaks. This analysis reveals previously unrecognized properties of the two periodograms that extend the work of Ofir (2014) and shows their dependence on light curve properties (§3–4). We then illustrate the two periodograms on real TESS light curves with known small planets (§5). After a discussion of the findings (§6), the study ends with advice to transit survey scientists on the best approaches for small planet detection (§7).
Our analysis is not intended to be a comprehensive study of periodograms for transit detection. The effort here is limited to comparing the performance of two periodograms, BLS and TCF, applied to light curves with continuous, evenly spaced cadences. Both are low-dimensional parametric procedures with fixed functional forms at a chosen trial period: a rectangular box for BLS and a double-spike pattern for TCF. Our simulations described in §2.1 make specific assumptions about the transit shape and exclude some astrophysical effects. We do not treat heteroscedastic light curves, where different data points have different weights, nor cases where the noise is non-Gaussian.
We do not consider other periodograms based on sinusoidal variations, such as the Schuster periodogram of classical Fourier analysis or its Lomb-Scargle extension to irregular observing cadences (Scargle 1982), nor do we consider nonparametric periodograms such as phase dispersion minimization (Stellingwerf 1978) and minimum string length (Dworetsky 1983). The Transit Least Squares (TLS) algorithm (Hippke & Heller 2019; Heller et al. 2022), an important variant of BLS with ingress and egress transit shapes arising from astrophysical modeling, is not analyzed. Period search procedures with different statistical approaches, including those of Waldmann (2012) and Zucker (2015), are not treated. We also ignore evaluations of periodicity significance in the time domain, such as the classical Wald test (Pont et al. 2006). We assume evenly spaced time series (although missing data may exist); thus, we do not consider the highly irregular light curves typically emerging from ground-based telescopic surveys. Our calculations use the original BLS algorithm of Kovács et al. (2002), and we only briefly mention faster algorithms in Appendix B. Comparisons of period-finding algorithms, albeit with different focuses, have been performed in previous studies (e.g., Graham et al. 2013 and references therein).
One method not examined here may have high sensitivity to small planets. Gregory & Loredo (1992) formulate a likelihood for Bayesian period search assuming an arbitrarily shaped transit. A sensitive likelihood-based periodogram might emerge if one inserts a strong prior for a periodic box or double-spike shape with a small duty cycle.

Construction and Analysis of Simulated Light Curves
Simulations provide an excellent way to assess and compare periodogram peak significance because they give full control over noise conditions. Our simulations are designed to roughly resemble single-sector observations from the TESS satellite prime mission survey. The light curves have a uniform 0.5 hr observing cadence. No gaps in observations are included. The noise behavior is stationary: no changes during the entire time span of the light curve are modeled. By analogy with observed light curves, this implies that nonstationarity has previously been removed with a detrending procedure such as a moving average filter, spline or Gaussian Process regression, wavelet transform, or similar operation.
Two noise models are constructed. The first assumes Gaussian white noise with mean 0 and standard deviation $\sigma_0 = 1 \times 10^{-4}$, or 100 parts per million (ppm). This is characteristic of TESS photon noise levels for bright stars with $T \simeq 7$–$8$, where super-Earths might be detected around solar-mass stars, and is similar to the simulation noise level assumed by Hippke & Heller (2019).
The second noise model assumes an autoregressive moving average (ARMA) process of order ARMA(3,3), with coefficients sufficiently high to give statistically significant autocorrelation up to $\simeq 5$ hours. Specifically, we assume the flux value $X_t$ at time $t$ is

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \phi_3 X_{t-3} + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \theta_3 \epsilon_{t-3}, \qquad (1)$$

where $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$ is a white noise process with $\sigma = 1 \times 10^{-4}$, and the ARMA coefficients are set to $\phi = (0.2, 0.3, 0.2)$ and $\theta = (0.2, 0.2, 0.3)$.³ While these coefficients may be larger than those realistically present in many detrended TESS light curves, the simulations are designed to reveal clear differences between periodogram performance for white and correlated noise. For reference, ARIMA models have been applied to large samples of Kepler and TESS light curves by Caceres et al. (2019b) and Melton et al. (2023b), respectively, based on the methodology described by Caceres et al. (2019a) and Melton et al. (2023a).
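The ARMA(3,3) noise model above can be simulated directly by recursion, $X_t = \sum_i \phi_i X_{t-i} + \epsilon_t + \sum_j \theta_j \epsilon_{t-j}$. This NumPy sketch assumes the plus-sign convention for the moving-average terms (the convention used by R's `arima`) and discards a burn-in segment so that the zero start-up values do not matter; the function name and burn-in length are illustrative assumptions, not the authors' simulation code.

```python
import numpy as np


def simulate_arma33(n, phi=(0.2, 0.3, 0.2), theta=(0.2, 0.2, 0.3),
                    sigma=1e-4, burn_in=500, seed=None):
    """Simulate X_t = sum_i phi_i X_{t-i} + eps_t + sum_j theta_j eps_{t-j}.

    eps_t ~ N(0, sigma**2). The first burn_in samples are discarded so the
    zero initial values do not affect the returned series.
    """
    rng = np.random.default_rng(seed)
    m = n + burn_in
    eps = rng.normal(0.0, sigma, m)
    x = np.zeros(m)
    for t in range(m):
        ar = sum(p * x[t - 1 - i] for i, p in enumerate(phi) if t - 1 - i >= 0)
        ma = sum(q * eps[t - 1 - j] for j, q in enumerate(theta) if t - 1 - j >= 0)
        x[t] = ar + eps[t] + ma
    return x[burn_in:]


# One TESS-like sector: ~27 days at 0.5 hr cadence.
noise = simulate_arma33(1296, seed=42)
lag1 = np.corrcoef(noise[:-1], noise[1:])[0, 1]   # strongly positive
```

Because the AR coefficients sum to 0.7, the process is stationary, and its standard deviation exceeds the innovation σ, so the correlated-noise simulations are noisier per point than the white-noise ones.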
Simulations with a hypothetical transiting planet contain periodic transits with varying characteristics such as orbital period, transit depth, duration, noise type, and total number of transits (§4). The transits are modeled as trapezoids with 30 min ingress and egress; limb darkening, impact parameter, and other possible effects are not included. We simulate only one planet; systems with multiple transiting planets producing multiple periodic signals are not examined.
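A trapezoidal transit of the kind described above can be generated as follows. The sketch assumes times in days, a fixed mid-transit reference time `t0`, and a total duration measured from first to fourth contact; the function and parameter names are hypothetical.

```python
import numpy as np


def trapezoid_transit(t, period, t0, duration, depth, ingress):
    """Fractional flux change of a periodic trapezoidal transit.

    duration: total (first- to fourth-contact) length; ingress: length of each
    linear ramp (ingress and egress); all times in days. No limb darkening or
    impact-parameter effects. Requires ingress <= duration / 2.
    """
    # time from the nearest mid-transit, wrapped into [-period/2, period/2)
    dt = (np.asarray(t, dtype=float) - t0 + 0.5 * period) % period - 0.5 * period
    half = 0.5 * duration
    # fraction of full depth: 1 on the flat bottom, linear on the ramps, 0 outside
    frac = np.clip((half - np.abs(dt)) / ingress, 0.0, 1.0)
    return -depth * frac


hour = 1.0 / 24.0
t = np.arange(0.0, 27.0, 0.5 * hour)          # 0.5 hr cadence
model = trapezoid_transit(t, period=2.0, t0=1.0, duration=2 * hour,
                          depth=68e-6, ingress=0.5 * hour)
# A simulated light curve is then 1.0 + noise + model for either noise model.
```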
Before calculating the BLS periodogram when autocorrelated noise is simulated (but not when pure Gaussian white noise is simulated), the simulated light curves are subject to Gaussian Process regression, which removes trends but may leave short-memory autocorrelation. We use the software implementation gausspr in CRAN package kernlab (Karatzoglou et al. 2023) within the R statistical software environment (R Core Team 2022), based on the methodology described by Karatzoglou et al. (2004) and Williams & Barber (1998). We use the squared exponential kernel (the radial basis function kernel) as the covariance function. The kernel width hyperparameter, σ, is set by a heuristic that chooses a reasonable value based on the data (controlled by the keyword kpar = 'automatic').⁴
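For readers without R, the Gaussian Process detrending step can be approximated with a few lines of NumPy. This is a sketch, not kernlab's `gausspr`: it uses a squared-exponential kernel with a hand-chosen length scale and noise variance instead of the automatic heuristic, and it subtracts only the posterior mean. kernlab writes the same kernel as exp(−σ‖x − x′‖²), which corresponds to σ = 1/(2ℓ²) for length scale ℓ. The function name and all parameter values are illustrative assumptions.

```python
import numpy as np


def gp_detrend(t, flux, length_scale, noise_var):
    """Subtract a Gaussian Process posterior-mean trend from a light curve.

    Squared-exponential kernel k(t, t') = exp(-(t - t')**2 / (2 * length_scale**2))
    with unit amplitude, applied to the standardized flux; noise_var is the
    white-noise variance in those standardized units. Hyperparameters are
    fixed by hand here rather than chosen automatically.
    """
    t = np.asarray(t, dtype=float)
    mu, sd = flux.mean(), flux.std()
    y = (flux - mu) / sd
    d = t[:, None] - t[None, :]
    K = np.exp(-0.5 * (d / length_scale) ** 2)
    # posterior mean at the observed times: K (K + noise_var I)^(-1) y
    alpha = np.linalg.solve(K + noise_var * np.eye(t.size), y)
    trend = mu + sd * (K @ alpha)
    return flux - trend


rng = np.random.default_rng(3)
t = np.arange(0.0, 10.0, 0.5 / 24)                    # 480 samples
flux = 1.0 + 1e-3 * np.sin(2 * np.pi * t / 5.0) + 1e-4 * rng.standard_normal(t.size)
resid = gp_detrend(t, flux, length_scale=1.0, noise_var=0.02)
```

The dense solve scales as O(n³), which is adequate for a single TESS sector at 0.5 hr cadence (about 1300 points). A length scale that is short compared with the transit duration would let the GP absorb part of the transit.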
Before calculating the TCF periodogram, the simulated light curves are subject to ARIMA modeling, which effectively removes both trends and short-memory autocorrelation, leaving residuals close to white noise. We use the software implementation auto.arima in CRAN package forecast, which automatically calculates maximum likelihood fits for a range of (p,0,q) orders and selects the best model based on the Akaike Information Criterion, which penalizes model complexity (Hyndman & Khandakar 2008; Hyndman et al. 2023). We restrict model complexity to $p, q \le 5$.
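The order-selection idea behind auto.arima can be sketched with pure autoregressive models. This is a deliberate simplification: the sketch fits only AR(p) models by conditional least squares (no MA terms and no maximum-likelihood ARMA fitting, unlike auto.arima) and ranks them by a Gaussian AIC, with $p \le 5$ as in the text. The function name is hypothetical.

```python
import numpy as np


def fit_ar_aic(x, max_p=5):
    """Select an AR(p) model (0 <= p <= max_p) by AIC using least squares.

    Returns (p, coefficients, residuals). All candidate orders are scored on
    the same samples x[max_p:], so their AIC values are comparable.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = x[max_p:]
    n = y.size
    best = None
    for p in range(max_p + 1):
        if p == 0:
            coef, resid = np.zeros(0), y.copy()
        else:
            # design matrix of lagged values x_{t-1}, ..., x_{t-p}
            X = np.column_stack([x[max_p - k:len(x) - k] for k in range(1, p + 1)])
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ coef
        # Gaussian AIC up to a constant: n log(RSS / n) + 2 (p + 1)
        aic = n * np.log(np.sum(resid ** 2) / n) + 2 * (p + 1)
        if best is None or aic < best[0]:
            best = (aic, p, coef, resid)
    return best[1], best[2], best[3]


rng = np.random.default_rng(7)
x = np.zeros(5000)
e = rng.standard_normal(5000)
for i in range(2, 5000):                    # AR(2) test series
    x[i] = 0.5 * x[i - 1] + 0.3 * x[i - 2] + e[i]
p, coef, resid = fit_ar_aic(x, max_p=5)     # resid should be close to white noise
```

In the TCF workflow, it is these residuals, not the original light curve, that are searched for periodic double spikes.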
Periodograms are then calculated from the detrended light curves to reveal transit-shaped periodic behaviors. As outlined in §1, we examine the sensitivity of two periodograms, BLS and TCF, and two measures of significance of periodogram peaks: an SNR that takes into account trends and heteroscedasticity in periodogram noise, and an EVT-based false alarm probability of a standardized periodogram that ignores periodogram noise and considers only extreme values. We use both metrics because each has advantages and disadvantages: FAPs are affected by aliasing, a common feature of astronomical periodograms (Baluev 2008; Baluev 2013; Süveges 2014), while the SNR is affected by complex alias structures and non-Gaussianity (Caceres et al. 2019a). These significance measures are described in §2.2 and §2.3, respectively.

Periodogram Peak Metric Based on Signal-to-Noise Ratios
A periodicity from a small planet may well produce a periodogram peak that cannot be unambiguously interpreted as a transiting planet because of the noise characteristics of the periodogram at surrounding frequencies. If the periodogram power values and noise variance are heteroscedastic, this problem cannot be effectively captured by bootstrap procedures such as those used in the EVT-based analysis (§2.3). However, this situation is treated by a local SNR measure, provided the noise is estimated using frequencies close to the peak of interest. We emphasize that the noise considered here is in the frequency-domain periodogram, not the noise in the original time-domain light curve as considered in many other studies (e.g., Kovács et al. 2002; Pont et al. 2006; Fressin et al. 2013; Dressing & Charbonneau 2015).
To account for periodogram trends (typically a rise in the mean level of the periodogram with increasing period, as noted by Ofir 2014), the periodogram is detrended using a smoother designed to be robust against non-Gaussianity and outliers. We use the median (50% quantile) curve of a quadratic smoothing B-spline with roughness penalty parameter λ = 1, using the method developed by Ng (1996) and Ng & Maechler (2007). Twenty equally spaced spline knots are used. The code implementation is provided by CRAN package cobs (Ng & Maechler 2022).
We then define the SNR of the periodogram peak as

$$\mathrm{SNR} = \frac{\mathrm{Power}_{\rm peak}}{\mathrm{MAD}_{\rm peak}}, \qquad (2)$$

where $\mathrm{Power}_{\rm peak}$ denotes the peak periodogram power and $\mathrm{MAD}_{\rm peak}$ is the median absolute deviation of nearby periodogram powers, measured in a window of frequencies around the peak under study after the periodogram has been detrended as described above.
The MAD is a robust measure of local scatter that, unlike the usual root-mean-square σ value, is insensitive to strong non-Gaussianity of periodogram noise values.The MAD is used in this context, for example, by Vanderburg et al. (2016) in a search for K2 transiting planets.We note that there are no theorems associated with our definition of SNR, and its statistical distribution is unknown.
There is little guidance on defining a 'nearby' region of the periodogram for measuring the noise; we select a window of 3000 trial periods symmetrically centered on the peak. This choice is generally large enough to give a good estimate of the MAD but narrow enough to avoid heteroscedasticity (changes of noise amplitude with period) in the periodogram.
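The SNR metric, the detrended peak power divided by the local MAD, can be sketched as below. Two assumptions should be flagged: a running median stands in for the cobs quantile B-spline used in the text, and the MAD is unscaled (R's `mad()` multiplies by 1.4826 by default, which would lower the SNR values by that factor). The window of 3000 grid points follows the text; the other parameter values and the function names are illustrative.

```python
import numpy as np


def running_median(y, half_width):
    """Running median, a simple stand-in for the cobs quantile-spline trend."""
    n = len(y)
    return np.array([np.median(y[max(0, i - half_width):i + half_width + 1])
                     for i in range(n)])


def peak_snr(power, trend_half_width=500, window=3000):
    """SNR of the highest peak: detrended peak power / local MAD.

    The MAD is computed from the detrended powers in a window of `window`
    trial periods centred on the peak (unscaled median absolute deviation).
    """
    detrended = power - running_median(power, trend_half_width)
    i_peak = int(np.argmax(detrended))
    lo = max(0, i_peak - window // 2)
    hi = min(len(power), i_peak + window // 2 + 1)
    local = detrended[lo:hi]
    mad = np.median(np.abs(local - np.median(local)))
    return detrended[i_peak] / mad, i_peak


rng = np.random.default_rng(5)
idx = np.arange(10000)
power = 1.0 + 1e-4 * idx + rng.exponential(1.0, idx.size)   # trend + noise
power[6000] += 50.0                                         # injected peak
snr, i_peak = peak_snr(power)
```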

Periodogram Peak Metric Based on EVT False Alarm Probabilities
The mathematical background of extreme value theory (EVT) and its application to periodogram peak significance is given in Appendix A. Our application of EVT to periodogram peak evaluation closely follows Süveges (2014); see also Süveges (2012) and Süveges et al. (2015). Here a nonparametric bootstrap procedure is combined with EVT to estimate FAPs for the peak in a periodogram. The first step is generating R bootstrap samples from the original time series. The next step is calculating the periodogram for each of the R bootstrapped series and selecting the maximum of each periodogram, where each periodogram is computed on K × L frequencies selected from the entire frequency grid. These K × L frequencies are chosen by randomly selecting L non-overlapping frequency intervals, each with K consecutive frequencies. This selection accounts for long-range dependencies through L and for short-range dependencies (spectral leakage) through K. The procedure yields a sample of R maxima.
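The frequency-block selection and bootstrap loop can be sketched as follows. Assumptions: the bootstrap resamples individual flux values with replacement (appropriate only for nearly uncorrelated detrended light curves), blocks are drawn from a fixed partition of the frequency grid, the standardization of partial periodograms is omitted, and a plain FFT power spectrum stands in for BLS or TCF. All names are hypothetical.

```python
import numpy as np


def partial_frequency_indices(n_freq, K, L, rng):
    """Indices of L non-overlapping blocks of K consecutive frequencies.

    Simplifying assumption: blocks are drawn from a fixed partition of the
    grid into consecutive length-K blocks, which guarantees no overlap.
    """
    n_blocks = n_freq // K
    if L > n_blocks:
        raise ValueError("grid too small for L blocks of K frequencies")
    starts = rng.choice(n_blocks, size=L, replace=False) * K
    return np.sort((starts[:, None] + np.arange(K)[None, :]).ravel())


def bootstrap_maxima(flux, periodogram, n_freq, R=300, K=2, L=300, seed=0):
    """Maxima of R partial periodograms of i.i.d. bootstrap resamples.

    `periodogram(y, idx)` is a user-supplied (hypothetical) callable that
    returns the powers of series y at frequency-grid indices idx.
    """
    rng = np.random.default_rng(seed)
    maxima = np.empty(R)
    for r in range(R):
        # resampling with replacement destroys any periodic signal
        y = rng.choice(flux, size=len(flux), replace=True)
        idx = partial_frequency_indices(n_freq, K, L, rng)
        maxima[r] = np.max(periodogram(y, idx))
    return maxima


def fft_power(y, idx):
    """Plain FFT power spectrum, standing in for a BLS or TCF periodogram."""
    return np.abs(np.fft.rfft(y - y.mean()))[idx] ** 2


flux = np.random.default_rng(1).standard_normal(1296)
maxima = bootstrap_maxima(flux, fft_power, n_freq=1296 // 2 + 1)
```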
We have conducted tests to ensure that the sensitivity to faint planets evaluated with our EVT procedure is largely independent of reasonable choices of R (100–500), K (1–5), and L (100–500), similar to the stability experiments performed by Süveges (2014). These tests were made for simulated light curves with both Gaussian white noise and autocorrelated noise. One might also expect that excessive oversampling could introduce spurious small-scale structures in the periodogram. This effect might be present for K ≃ 10–20 but is not seen in our simulations for K ≤ 5.
A GEV distribution is then fit by maximum likelihood to the sample of R maxima. The quality of the GEV fit is checked with the Anderson-Darling test (Stephens 1974), requiring p-value > 0.01. Following a "shortcut" similar to that described by Koen (2015), the three estimated GEV parameters are treated as known when calculating the p-value for this test. This goodness-of-fit test is needed because the GEV is only a limiting distribution. If the model is deemed valid, it can be used to estimate the FAP of the observed peak in the periodogram:

$$\mathrm{FAP} = 1 - G(x),$$

where G is the fitted GEV distribution function and x is the peak power of the periodogram calculated on the original time series. Despite the common practice of oversampling periodograms, which introduces dependence between neighboring frequencies, approaches based on EVT have shown plausible results, provided the GEV fit quality has been verified.
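The FAP computation can be sketched once the bootstrap maxima are available. Note the substitution: the text fits the GEV by maximum likelihood and checks it with an Anderson-Darling test, whereas this self-contained sketch estimates the parameters by L-moments (Hosking, Wallis & Wood 1985) and omits the goodness-of-fit test. Hosking's shape k has the same sign convention as the `c` parameter of `scipy.stats.genextreme`, which could replace these helpers if SciPy is available. The function names are hypothetical.

```python
import math

import numpy as np


def fit_gev_lmoments(sample):
    """GEV location xi, scale alpha, shape k via L-moments (Hosking et al. 1985).

    Hosking's k has the opposite sign to the usual GEV shape parameter:
    k > 0 means a bounded upper tail, k < 0 a heavy upper tail.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    j = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    c = 2.0 / (3.0 + l3 / l2) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c ** 2          # Hosking's approximation for k
    if abs(k) < 1e-6:                          # Gumbel limit
        alpha = l2 / math.log(2)
        return l1 - 0.5772156649 * alpha, alpha, 0.0
    g = math.gamma(1 + k)
    alpha = l2 * k / ((1 - 2.0 ** (-k)) * g)
    return l1 + alpha * (g - 1) / k, alpha, k


def gev_cdf(x, xi, alpha, k):
    """GEV distribution function G(x) in Hosking's parameterization."""
    if k == 0.0:
        return math.exp(-math.exp(-(x - xi) / alpha))
    arg = 1 - k * (x - xi) / alpha
    if arg <= 0:                               # outside the support
        return 1.0 if k > 0 else 0.0
    return math.exp(-arg ** (1 / k))


def false_alarm_probability(peak_power, maxima):
    """FAP = 1 - G(peak_power), with G fitted to the bootstrap maxima."""
    return 1.0 - gev_cdf(peak_power, *fit_gev_lmoments(maxima))


# Maxima of 1000 exponential variates are close to Gumbel(ln 1000, 1).
rng = np.random.default_rng(11)
maxima = rng.exponential(1.0, size=(300, 1000)).max(axis=1)
xi, alpha, k = fit_gev_lmoments(maxima)
fap_strong_peak = false_alarm_probability(math.log(1000) + 10, maxima)
```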
To facilitate the comparison of periodograms using FAPs, we apply the EVT procedure described above to "standardized" periodograms, i.e., periodograms with their trend removed and normalized by the local scatter, which places the powers on similar scales. The detrending is performed with the same procedure used for the SNR calculation described in §2.2. The local scatter of the detrended periodogram is then estimated as a running MAD using ten windows across the entire frequency grid. Periodogram powers at the edges of the frequency grid are handled by taking their MAD. The local scatter implementation is the runmad function from CRAN package caTools, version 1.17.1. We advocate a "local" scatter measure rather than the "global" scatter measure used previously by, e.g., Hippke & Heller (2019), to account for the heteroscedastic noise structure observed in BLS and TCF periodograms (Ofir 2014; Caceres et al. 2019a). Standardization is also performed on the partial periodograms before extracting their maxima for GEV fitting.
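The standardization step can be sketched as follows. The interpretation of "ten windows" as a running-MAD window spanning one tenth of the grid, and the truncated windows at the grid edges, are assumptions, as is the synthetic heteroscedastic example; caTools' `runmad` may treat the edges differently. The function name is hypothetical.

```python
import numpy as np


def standardize_periodogram(power, trend, n_windows=10):
    """Standardized periodogram: (power - trend) / running MAD of the residuals.

    Assumption: the running-MAD window spans about len(power) / n_windows
    grid points and is simply truncated at the edges of the grid.
    """
    resid = np.asarray(power, dtype=float) - np.asarray(trend, dtype=float)
    n = resid.size
    half = max(1, n // (2 * n_windows))
    scale = np.empty(n)
    for i in range(n):
        w = resid[max(0, i - half):i + half + 1]
        scale[i] = np.median(np.abs(w - np.median(w)))
    return resid / scale


rng = np.random.default_rng(2)
n = 5000
trend = 1.0 + 1e-4 * np.arange(n)
sigma = np.linspace(1.0, 5.0, n)                 # heteroscedastic noise level
power = trend + sigma * rng.standard_normal(n)
z = standardize_periodogram(power, trend)        # roughly uniform scatter
```

After standardization the scatter at short and long periods is comparable, so a single GEV fit to the maxima is not dominated by the noisiest part of the grid.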
As discussed in §A, this approach is not directly applicable to correlated time series. However, here the light curves have been detrended prior to computing periodograms when correlated noise is simulated, and thus are close to uncorrelated. We remind the reader that detrending is performed twice, once on the light curves and once on the periodograms, using different procedures in each case.

DETAILED EXAMINATION OF A SINGLE TRIAL OF TRANSITING PLANETS

Sensitivity comparison using FAP and SNR metrics
We show the results of the BLS and TCF periodograms with SNR and FAP peak evaluation metrics for simulated light curves with the Gaussian and autocorrelated noise characteristics described above. For illustration, we have injected transit signals for planets of two different sizes, assuming an orbital period of two days and a two-hour transit duration. The results are shown in Figures 1–4.
In the simulation with Gaussian white noise, the transits can be seen visually in the light curve for the larger planet (simulated depth = 200 ppm, Figure 1), but they are lost in the noise for the smaller planet (simulated depth = 68 ppm, Figure 2). Both the BLS and TCF periodograms recover the signal for the larger planet but only marginally recover it for the smaller planet. However, the TCF gives a substantially stronger signal for Gaussian white noise: SNR = 40.1 (TCF) vs. 20.8 (BLS) for the larger planet, and 11.1 vs. 7.5 for the smaller planet. Upward trends in noise values are seen in the TCF periodogram at longer periods (red cobs curves) and are removed by standardization (fourth row). The BLS periodogram shows heteroscedasticity in noise as the period changes, as noted previously by Ofir (2014) and Caceres et al. (2019a). Similar to Ofir (2014), we observe that the BLS scatter increases at longer periods (i.e., lower frequencies), as revealed by an increasing trend in the scatter estimate described in §2.3. Thus, the increasing trend in both power and power scatter toward longer periods suggests that the distributions of powers at shorter and longer periods could differ.
Although the TCF is more sensitive than BLS under the SNR metric, BLS is considerably more sensitive under the FAP metric. This is shown in the annotations in the fourth rows of Figures 1–2, where FAP $\sim 5 \times 10^{-5}$ for BLS and $\sim 4 \times 10^{-2}$ for TCF with the smaller simulated planet. For the larger planet, secondary peaks are mostly aliases of the injected 2.0-day period, while random noise peaks are the principal source of secondary peaks for the smaller planet.
Figures 3–4 show the periodogram analysis using autocorrelated noise. It is difficult to see the 2-hour transits unambiguously in the light curve, even for a planet with depth = 400 ppm, twice the depth needed for a similar SNR for TCF when only Gaussian white noise was present. With autocorrelated noise, ARIMA fitting followed by the TCF periodogram gives SNR = 44.1, while the BLS periodogram is severely degraded with SNR = 8.2. Here a strong trend in BLS periodogram power levels is present at periods without a true signal. Standardization of the BLS periodogram removes this strong trend (fourth row of Figure 3). However, its FAP value of $4 \times 10^{-4}$ is now worse than that of the TCF periodogram. The TCF periodogram possesses only mild trends, unlike BLS.
The situation for the smaller planet in the presence of autocorrelated noise is similar to that of the larger planet (Figure 4). The TCF periodogram captures the injected periodic transit without difficulty (SNR = 21.6), while it is hardly detected in the BLS periodogram (SNR = 4.3). The FAP metric gives a significant detection for TCF (FAP = $7 \times 10^{-7}$) but an insignificant detection for BLS (FAP = $4 \times 10^{-2}$). BLS shows strong trends and heteroscedasticity in the periodogram, while these problems are milder for TCF. Altogether, ARIMA fitting and the TCF periodogram are much better behaved than the BLS periodogram in the presence of autocorrelated noise. Thus, FAP and SNR lead to different conclusions regarding peak significance for Gaussian white noise but similar conclusions for autocorrelated noise.
We thus see a big difference in periodogram noise characteristics in response to light curves with Gaussian white noise vs. autocorrelated noise. The BLS and TCF periodograms share similar properties (mild trends and heteroscedasticity) for Gaussian white noise. The TCF periodograms have a similar structure even for autocorrelated noise, as the autoregression is effectively removed by the ARIMA modeling that precedes TCF. However, Gaussian Process (or similar nonparametric local) regression applied before BLS leaves significant autocorrelation if it was present in the original light curve. The BLS periodograms in Figures 3–4 thereby exhibit strong undesirable behaviors not present in the BLS periodograms of Figures 1–2 with Gaussian white noise. Consequently, BLS is less able to detect small planets if autocorrelation persists in the light curve after detrending.
We thus find, for this single simulation, that TCF detects small planets more effectively both for simulated light curves with Gaussian white noise (Figures 1-2) and with autocorrelated noise (Figures 3-4).
The FAP and SNR values shown in the plots correspond to a single realization of noise used to create the light curves.In practice, we have observed that the FAPs change when different noise realizations are used (see §4.1 for more details).

Supplemental periodogram analyses
We can also inquire into the accuracy of the transiting planet depth obtained from the BLS and TCF periodograms. For Gaussian noise, the depth estimates from both periodograms overestimate the simulated depths. The situation differs for autocorrelated noise, where the TCF underestimates the true depth. Inaccuracies in TCF depth estimation may have several causes: (a) incorporation of some transit signal into the ARMA fit can reduce the estimated depth; (b) "over-differencing" by ARIMA can produce an anti-correlation at lag 1 and increase the estimated depth; and (c) inaccurate registration of the cadence with respect to the transit ingress and egress can reduce the double-spike signal and the estimated depth. See Figure 7 in Melton et al. (2023a), and also Melton et al. (2023b), for more discussion.
These problems with TCF depth estimation were noted by Caceres et al. (2019b) and Melton et al. (2023b) in their Kepler and TESS applications, where the depths had to be refined during the vetting phase of the analysis. Difficulties with transit depth estimation also arise in BLS transit fitting, as discussed by Kovács et al. (2002) and Ofir (2014). Altogether, planet parameters derived from periodograms alone may be inaccurate in complicated ways.
As expected from the mathematical discussion in §A, rows 4-5 of Figures 1-4 show that the EVT-based FAP calculation depends on the rightmost tail of the periodogram power distribution. The histograms also provide information about the overall noise characteristics of the periodogram. Ideally, for a single true periodic signal, the rightmost bin in the histograms would be a single iso-

An effect worth pointing out in the above figures is that, after standardization, the local periodogram noise at shorter periods is enhanced; this arises from the heteroscedastic noise pattern of the periodogram. A modified windowed approach could mitigate this issue to some extent; we do not pursue it here because the FAP calculation does not consider the periodogram noise.
A few other remarks are as follows:
1. We have also verified the $P^{-1/3}$ and $P^{-1/2}$ dependence of the BLS and TCF periodogram peaks, respectively, where P is the planet period, as described in Caceres et al. (2019a). To do so, we simulated three planets with the same transit depth and duration but different periods and numbers of transits, so that the total length of the light curve remains approximately constant (footnote 6). The BLS and TCF periodogram peak powers scale as 1:0.48:0.21 and 1:0.41:0.14, approximately following $P^{-1/3}$ and $P^{-1/2}$, respectively.
2. Another notable observation in the autoregressive case (Figures 3-4) is that the BLS periodogram peaks are wider than the TCF peaks.

We generate simulated light curves for a range of planet and light curve properties following the procedures described in §2.1. All light curves in this section have a 0.5 hr cadence. The noise is either white Gaussian noise or autoregressive noise following equation (1). Injected planets have trapezoidal transits with 0.5 hr ingress and egress. The BLS periodogram is computed with a modified version of the original Fortran 77 BLS routine that accounts for edge effects and uses binning for computational efficiency (Kovács et al. 2016). The TCF implementation is the Fortran code of Caceres & Feigelson (2022) with minor modifications (footnote 7). The SNR metric is calculated using equation (2), and the FAP metric is calculated as described in §2.2 with K = 2, R = 300, and L = 300. BLS and TCF periodograms are calculated using uniform frequency sampling (footnote 8).
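The simulation setup above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the R code we developed: `trapezoid_deficit`, `arma_noise`, and `simulate_light_curve` are hypothetical helpers, times are in hours, and because equation (1) is not reproduced in this section, the ARMA(3,3) coefficients in the usage lines are placeholders chosen only to be stationary.

```python
import random

def trapezoid_deficit(t, period, t0, duration, ingress, depth):
    """Fractional flux drop of a trapezoidal transit at time t.

    duration: first-to-fourth contact length; ingress: length of each sloped
    side. All times share one unit (hours below).
    """
    # Time offset from the nearest transit centre.
    dt = abs((t - t0 + 0.5 * period) % period - 0.5 * period)
    half = 0.5 * duration
    if dt >= half:
        return 0.0                           # out of transit
    if dt <= half - ingress:
        return depth                         # flat bottom
    return depth * (half - dt) / ingress     # linear ingress/egress ramp

def arma_noise(n, ar, ma, sigma, rng, burn_in=500):
    """Zero-mean ARMA(p, q) noise:
    x[t] = sum_i ar[i] x[t-1-i] + e[t] + sum_j ma[j] e[t-1-j], e ~ N(0, sigma^2).
    Empty ar and ma lists give white Gaussian noise.
    """
    x = [0.0] * len(ar)   # zero-padded start-up history
    e = [0.0] * len(ma)
    out = []
    for t in range(n + burn_in):
        eps = rng.gauss(0.0, sigma)
        val = (eps
               + sum(c * x[-1 - i] for i, c in enumerate(ar))
               + sum(c * e[-1 - j] for j, c in enumerate(ma)))
        x.append(val)
        e.append(eps)
        if t >= burn_in:                     # drop the start-up transient
            out.append(val)
    return out

def simulate_light_curve(n_points, cadence, noise, **transit):
    """Normalized light curve (baseline 1.0) = injected transit train + noise."""
    times = [i * cadence for i in range(n_points)]
    flux = [1.0 - trapezoid_deficit(t, **transit) + z
            for t, z in zip(times, noise)]
    return times, flux

# Usage: ~27 days at 0.5 hr cadence, sigma = 100 ppm.
rng = random.Random(1)
n = 27 * 48
white = arma_noise(n, [], [], 100e-6, rng)
# Hypothetical ARMA(3,3) coefficients, NOT those of equation (1).
red = arma_noise(n, [0.5, -0.2, 0.1], [0.3, 0.2, 0.1], 100e-6, rng)
times, flux = simulate_light_curve(n, 0.5, red, period=48.0, t0=10.0,
                                   duration=2.0, ingress=0.5, depth=300e-6)
```

Passing `white` instead of `red` gives the Gaussian white noise case.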
To compare the performance of periodograms under different conditions, we define a threshold called the "minimum detectable depth" (MDD) of a small planet based on the SNR and FAP metrics. No clear consensus has emerged in the research community on the thresholds that best balance sensitivity to small planets against False Alarm reports. SNR thresholds used for (often standardized) BLS periodograms include SNR > 6 (Kovács et al. 2002), SNR > 15 (Ofir 2014), SNR > 9 (Vanderburg et al. 2016), and SNR > 5 (Shallue & Vanderburg 2018). FAP thresholds are widely used for planet detection with Lomb-Scargle or BLS periodograms, with values typically in the range 0.001 < FAP < 0.01 (e.g., Maxted et al. 2011; Lund et al. 2014). The threshold FAP = 0.003, corresponding to the Gaussian 3σ criterion, lies in this range.
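The idea of a locally measured, robust periodogram SNR can be sketched as follows. Because equation (2) is not reproduced in this section, the code is only an assumed, generic form: the noise around a candidate peak is estimated from the median and the median absolute deviation (MAD) of neighboring trial periods, excluding a few bins around the peak itself. The function name `robust_peak_snr`, the window half-width, and the exclusion zone are hypothetical choices, not the values used in this study.

```python
import statistics

def robust_peak_snr(power, peak_idx, half_window=500, exclude=5):
    """Local, outlier-resistant S/N of one periodogram peak.

    Noise is estimated from trial periods within half_window bins of the peak,
    skipping +/- exclude bins around it so the peak does not inflate its own
    noise estimate. The MAD is scaled by 1.4826 so that it matches a Gaussian
    standard deviation.
    """
    lo = max(0, peak_idx - half_window)
    hi = min(len(power), peak_idx + half_window + 1)
    local = [power[i] for i in range(lo, hi) if abs(i - peak_idx) > exclude]
    center = statistics.median(local)
    mad = statistics.median(abs(p - center) for p in local)
    if mad == 0.0:
        return float("inf")     # degenerate, perfectly flat neighborhood
    return (power[peak_idx] - center) / (1.4826 * mad)

# Usage: a toy periodogram with flat noise and one strong peak.
power = [float(i % 3) for i in range(1001)]
power[500] = 10.0
print(round(robust_peak_snr(power, 500), 2))   # 6.07, just above SNR > 6
```

Using the median and MAD rather than the mean and standard deviation keeps other strong peaks (aliases, harmonics) in the window from inflating the noise estimate.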
Since our scientific goal here is to maximize sensitivity to small planets rather than to minimize False Alarms, we use relatively low thresholds (footnote 9). We define the MDD of a small simulated planet as the smallest injected transit depth that is still significant, i.e., for which the periodogram peak has SNR > 6 or FAP < 0.01, depending on which metric is used. We calculate MDD values for chosen light curve or planet properties using trial planet injections with different depths. Our internal tests have shown that, in some cases, the FAPs of periodogram peaks change considerably across different noise realizations of the light curve, while in many other cases they are stable. This instability poses no problem for very high or very low FAPs, only for FAPs near the threshold of 0.01. We therefore average the MDD values across ten distinct noise realizations in all cases to obtain more reliable sensitivities. Planet properties cover the range of Kepler-discovered planets that might be detected in a single sector of TESS observations.
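The MDD search amounts to scanning trial depths upward, recording the first one that passes the chosen threshold, and averaging over noise realizations. A minimal Python sketch, assuming a hypothetical callable `detect(depth, seed)` that stands in for the whole simulate, detrend, periodogram, and metric pipeline for one noise realization, and assuming detectability increases with depth:

```python
import statistics

def minimum_detectable_depth(depths, detect, n_realizations=10):
    """Smallest detected trial depth, averaged over noise realizations.

    depths : increasing list of trial transit depths
    detect : hypothetical callable detect(depth, seed) -> bool, e.g. returning
             "peak SNR > 6" or "peak FAP < 0.01" for one simulated light curve
    Realizations in which no trial depth is detected are skipped; if none is
    ever detected, NaN is returned.
    """
    mdds = []
    for seed in range(n_realizations):
        for depth in depths:            # scan upward; the first hit is the MDD
            if detect(depth, seed):
                mdds.append(depth)
                break
    return statistics.mean(mdds) if mdds else float("nan")

# Toy stand-in: a planet counts as detected once depth >= 40 + seed (ppm).
mdd = minimum_detectable_depth(list(range(10, 200, 10)), lambda d, s: d >= 40 + s)
print(mdd)   # 49
```

In a real run, each call to `detect` injects a planet of the given depth into a fresh noise realization selected by `seed`, so the ten realizations share their noise across all trial depths.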
We note that some of our simulations have the number of transits as a tunable parameter, as this most clearly reveals differences between periodogram performance.This differs from observational surveys, where the availability of time for observing a host star, rather than the number of transits, is known beforehand.

Results
Figures 5-6 compare BLS and TCF as a function of the properties of the light curve (number of transits, white Gaussian vs. autocorrelated noise), the properties of the injected planets (orbital period, transit duration) and the statistical metric for planet detection (SNR vs. FAP).
The four panels of Figure 5 give insight into a critical effect discussed in §3: the relative sensitivity of the BLS and TCF periodograms reverses depending on the nature of the light curve noise and the chosen detection metric. For Gaussian white noise and the FAP metric (upper left panel), BLS is more sensitive than TCF (the blue curve lies below the orange curve). The periodogram sensitivities are similar for autocorrelated noise using the FAP metric, while TCF is considerably more sensitive when the SNR metric is used. Differences are largest when few transits are present; the choice of periodogram and detection metric becomes unimportant for short orbital periods in long-duration light curves, since the number of transits is then sufficiently large.
As expected, sensitivity to small planets improves as light curves are extended to include more transits: more points are available for building up the box-like transit signal for BLS and the double-spike signal for TCF. However, the improvement ceases once enough transits are observed, as both periodograms reach a fixed MDD. In these simulations with Gaussian white noise of σ = 100 ppm, we set this limit to MDD ≃ 50 ppm (horizontal dashed lines). As seen in Figure 5, TCF's periodogram peak exceeds the SNR threshold of 6 even with two transits; a larger SNR threshold could increase TCF's MDD for the two-transit case. Overall, the FAP metric is less effective than the SNR metric for small planet detection when the number of transits is small.
Figure 6 compares the sensitivity of the BLS and TCF periodograms as a function of orbital period and transit duration. For most combinations of light curve noise characteristics and detection metrics, the MDD does not depend strongly on these orbital properties. BLS's sensitivity benefits somewhat from longer-duration transits; this effect is expected, as BLS has more points with which to fit the box in its least squares algorithm. The TCF algorithm considers only the double spikes from ingress and egress and is thus insensitive to transit duration (provided the period and number of transits are fixed).
The primary trend seen in Figure 6 is the deterioration in MDD for the BLS periodogram in the presence of autocorrelated noise (blue curve in the right panels, first and second rows). We suspect that this arises from the timescale of the autoregressive component we added to the light curve noise relative to the orbital period. For P = 0.5 days, the variability structure covers a wide range of phases in the folded light curve. However, for P = 7 days, the structure is confined to a narrow range of phases and can mimic planetary transits. This points to the importance of effectively removing correlation in the light curve on timescales comparable to a transit duration. On the other hand, the ARIMA + TCF procedure removes the short-memory autocorrelation and is unaffected by the relationship between period and transit duration. It thus achieves near-optimal MDD performance over the full range of periods for the SNR metric (orange curve in the right panel, second row). With the FAP metric, TCF benefits from longer-period transits, whereas BLS benefits only when the noise is white Gaussian and instead deteriorates for autocorrelated noise (orange and blue curves in the two panels of the top row for TCF and BLS, respectively). Overall, the number of transits, rather than the period and duration (when the number of transits is fixed), is the dominant factor affecting the sensitivity of the BLS and TCF periodograms.

APPLICATION TO TESS LIGHT CURVES
To complement our comparison of the BLS and TCF periodograms in simulated light curves (§§2-4), we apply the procedures to four TESS FFI light curves drawn from the DTARPS-S survey (Melton et al. 2023a,b,c) that contain known small exoplanets. Trends in these light curves have already been removed using splines; however, we still preprocess the light curves using Gaussian Processes regression to keep the analysis procedure similar to our simulations above (footnote 10). We conduct the analysis without using the known periods.
Unlike the simulations, this test involves realistic rather than simplified light curves, no control over the noise level or characteristics, and gaps in the observations due to satellite operations.
Since ARIMA requires uniformly spaced time series, the observations can be treated as evenly spaced with missing data points (Feigelson et al. 2018). The data gaps create spurious structures in periodograms, which may particularly affect GEV fits and the associated FAPs (Süveges 2014).
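Treating the observations as evenly spaced with missing points can be sketched as below: each timestamp is snapped to the nearest cell of a uniform cadence grid, and empty cells are marked as missing (`None`) so that the series can be passed to ARIMA software that accepts missing values. The helper `to_regular_grid` and its tolerance parameter are hypothetical illustrations, not part of the DTARPS-S pipeline.

```python
def to_regular_grid(times, values, cadence, tol=0.1):
    """Place observations on a uniform cadence grid, marking gaps as None.

    times must be sorted. Each observation goes to the nearest grid index if it
    lies within tol * cadence of that grid point; otherwise it is dropped.
    If two observations map to the same cell, the later one is kept.
    """
    t_start = times[0]
    n_cells = int(round((times[-1] - t_start) / cadence)) + 1
    grid = [None] * n_cells
    for t, v in zip(times, values):
        pos = (t - t_start) / cadence
        idx = int(round(pos))
        if abs(pos - idx) <= tol:
            grid[idx] = v
    return grid

# Usage: a 0.5 hr cadence series with three missing cadences.
grid = to_regular_grid([0.0, 0.5, 1.0, 3.0, 3.5], [1.0, 2.0, 3.0, 4.0, 5.0], 0.5)
print(grid)   # [1.0, 2.0, 3.0, None, None, None, 4.0, 5.0]
```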
Figures 7-10 and Table 1 show the results of the periodogram analysis of the four TESS light curves. Both the BLS and TCF periodograms have spectral peaks at the true orbital periods in all four cases. Other effects in Table 1 closely resemble those found in the simulations. The BLS and TCF transit depth estimates tend to underestimate the true depths. All peak SNR values are much higher for TCF than for BLS, while peak FAP values are significant for both BLS and TCF in all four cases.
The autocorrelation functions in Figures 7-10 show mild anticorrelations at lags up to 10 hours. Consequently, the periodograms show only mild trends and heteroscedasticity with period. The expected aliases of the true period are seen in both periodograms. The TCF periodogram generally has lower noise, giving it a higher peak SNR than BLS. The mild autocorrelation in these four cases also suggests that the comparison should resemble the Gaussian white noise simulations more closely than the autoregressive noise simulations, which is what we observe.
We thus find that these real TESS FFI light curves confirm the simulation results.

2. The light curves are detrended with a Gaussian Processes regression model before applying the BLS periodogram and with an ARIMA regression model before applying the TCF periodogram (§2.1). Both remove long-timescale trends, but the latter also removes short-memory stochastic autocorrelation.
3. Two statistical metrics are applied to decide whether a periodogram peak represents a true planetary signal: a Signal-to-Noise Ratio (SNR) measured locally in the periodogram using a robust noise measure, and a False Alarm Probability (FAP) based on extreme value theory (§§2.2-2.3). For the FAP, a Generalized Extreme Value (GEV, §A) statistical model is fitted to the peaks of bootstrapped periodograms calculated over only a portion of the frequency range to reduce computational cost, as proposed by Süveges (2014).
4. The sensitivity of a given periodogram is quantified with a Minimum Detectable Depth (MDD), defined as the smallest statistically significant transit depth (§4).
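The ARIMA detrending in item 2 can be illustrated in its simplest form: one differencing step followed by an AR(1) fit, i.e., an ARIMA(1,1,0) model. This is a deliberate simplification under stated assumptions: the recommended R function auto.arima (forecast package) selects the (p, d, q) orders by an information criterion and fits by maximum likelihood, whereas the hypothetical helper `fit_ar1` below fixes the orders and uses conditional least squares on a zero-mean series.

```python
import random

def difference(x):
    """First difference: the d = 1 ('I') step of ARIMA."""
    return [b - a for a, b in zip(x[:-1], x[1:])]

def fit_ar1(y):
    """Conditional least-squares AR(1) fit to a zero-mean series y.

    Returns (phi, residuals), where residual[t] = y[t] - phi * y[t-1].
    """
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    phi = num / den
    residuals = [y[t] - phi * y[t - 1] for t in range(1, len(y))]
    return phi, residuals

# Usage: simulate an ARIMA(1,1,0) series (a random walk whose increments
# follow AR(1) with phi = 0.6), then difference and fit.
rng = random.Random(7)
increment, level, x = 0.0, 0.0, []
for _ in range(20000):
    increment = 0.6 * increment + rng.gauss(0.0, 1.0)
    level += increment
    x.append(level)

phi_hat, residuals = fit_ar1(difference(x))
print(f"estimated phi = {phi_hat:.3f}")   # close to 0.6
```

The residuals, not the differenced light curve itself, are what a comb filter would then search.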
Our most noteworthy finding for both simulated and observed light curves is that TCF's periodogram peak has a larger SNR than BLS's in all cases tested when the light curve contains fewer than ∼20 (∼100) transits for Gaussian (autoregressive) noise (§4). This demonstrates that, using the SNR criterion, the TCF periodogram following ARIMA modeling is more sensitive to small planets in shorter light curves than BLS following local regression procedures such as Gaussian Processes or spline fitting. TCF shows no degradation in sensitivity to small planets as the number of transits in the light curve drops, even to only 2-3 transits.
It may seem surprising that TCF outperforms BLS even for the pure Gaussian white noise simulations, since least squares gives the best linear unbiased estimator according to the Gauss-Markov theorem (and the maximum likelihood estimator for Gaussian errors). However, the theorem applies only to homoscedastic, uncorrelated noise, while periodograms have heteroscedastic (and very non-Gaussian) power distributions, as seen in the histograms of periodogram power values. We discuss this issue below (§6.2).
We find that the FAP and SNR metrics lead to opposing conclusions for light curves with pure Gaussian white noise but to similar conclusions for autocorrelated noise. For Gaussian white noise, BLS is slightly more sensitive than TCF using the FAP criterion, whereas TCF is more sensitive using the SNR criterion. For light curves with autocorrelated noise and the FAP metric (upper right panel of Figure 5), the sensitivities of both periodograms are degraded compared to the pure Gaussian white noise case, particularly when few transits are present. However, this is not a critical problem, as the SNR metric for the TCF periodogram is remarkably unaffected by the number of transits.
The four TESS light curves analyzed in §5 have milder autocorrelation than our simulations, but it is still significant enough to distinguish them from white Gaussian noise. All four planets readily passed our significance criteria (periodogram peak FAP < 0.01 or SNR > 6) for both BLS and TCF at the correct orbital periods.
Altogether, the most sensitive approach to small planet discovery is the Transit Comb Filter periodogram preceded by ARIMA modeling of the light curve, combined with a robust signal-to-noise ratio metric. These findings explain why TCF showed high sensitivity to small planets in previous studies: Caceres et al. (2019b) applied the ARIMA + TCF procedure to ∼150,000 Kepler light curves and reported 97 Earth- and Mars-sized planetary candidates from the 4-year Kepler data, and Melton et al. (2023c, Figure 16) applied the procedure to ∼1 million TESS light curves and reported hundreds of candidate planets substantially smaller than Confirmed Planets in the Year 1 TESS survey. The procedure is also being used by Pellegrino et al. (in preparation) for Year 2 TESS data. As our experiments show, an advantage of SNR over FAP is that the latter tends to fluctuate non-trivially across simulated noise realizations (§4.1), whereas the former is relatively stable.

(Figure caption fragment: "...1 with the normalized light curve, autocorrelation function, BLS and TCF periodograms with median trend fit and their histograms, standardized periodograms, and histograms. Scalar results are provided in Table 1.")
The problems of heteroscedasticity and trends in BLS periodograms may be partly ameliorated by different choices of detrenders. For example, Hippke et al. (2019) recommend a robust time-windowed filter. However, as Figure 5 shows, the TCF periodogram provides more sensitivity to small planets than BLS (using the preferred SNR metric) even for Gaussian white noise. This suggests that TCF is preferred regardless of the detrending procedure used before BLS.
Another contribution of this paper is a general approach for comparing periodograms. Previous studies, such as Graham et al. (2013), compared periodograms using relevant metrics but did not develop a concrete methodology. While we have applied our comparison approach to only two periodograms for transiting planet detection, it is easily extended to any set of periodograms with similar aims (e.g., the Fourier Schuster, Lomb-Scargle, Phase Dispersion Minimization, BLS, and TCF periodograms), since it is agnostic to the type of periodogram. For periodograms where significance is not the primary metric, the MDD criterion can be replaced by any other criterion relevant to the task, and the comparison can still be performed by changing the x-axis in Figure 5 from the number of transits to another light curve parameter. The R code we developed is publicly available (footnote 11) and can be used to extend the comparison to other applications.

Why does BLS perform so poorly?
Ideally, both simulated noise models (white Gaussian and autoregressive) should have resulted in similar conclusions since BLS and TCF are preceded by detrending procedures that should have removed any correlation structure from the light curve.However, our findings show that BLS (preceded by Gaussian Processes regression detrending) is less sensitive to small planets than TCF (preceded by ARIMA regression detrending).We investigate the causes of this difference in sensitivity with a detailed examination of the inner workings of both algorithms.
Figure 11 shows an example of a simulated planet with period = 2.00 days, transit duration = 2.00 hr, depth = 0.04%, ten transits in the light curve, and autocorrelated ARMA(3,3) noise from equation (1) (footnote 12). The depth is chosen large enough that the BLS and TCF peaks are significant under both the FAP and SNR criteria.
The top-left panel shows the original light curve with the Gaussian Processes regression fit overlaid; the top-right panel shows the differenced light curve with the ARMA fit. The Gaussian Processes fit clearly misses most of the short-memory structure, although different kernel hyperparameters might do better. The second row shows the corresponding BLS and TCF periodograms, computed over the same set of trial periods; here, we omit the Gaussian Processes detrending to better highlight BLS's behavior in the presence of short-memory autocorrelation. The BLS periodogram exhibits higher and spikier noise and a stronger rising trend with period than the TCF periodogram, as seen earlier in Figures 1-4 and 7-10.
The fourth and fifth rows of Figure 11 examine two false peaks, labeled "A" and "B", at shorter and longer periods in the periodograms, respectively. The folded light curves at the correct period clearly show a box-like transit for BLS and a double spike for TCF (third row). Since A and B are not the true period, their folded light curves should show neither a box-like shape for BLS nor a double spike for TCF. However, the fourth and fifth rows of Figure 11 illustrate that the model fitted by BLS tends to capture chance alignments of outliers or autocorrelated ripples even when the folded light curve contains no transits. While TCF can also match a double-spike pattern at non-transit periods, it is applied only after the autocorrelation has been removed by ARIMA modeling. TCF considers only extreme points in the differenced light curve, which proves to be less susceptible to chance alignments.
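To make concrete what "fitting a box" means, the following is a simplified, brute-force sketch of the BLS statistic for one trial period, following the signal-residue form of Kovács et al. (2002) with equal weights. It is not the binned Fortran routine of Kovács et al. (2016) used in this study; `bls_power`, the number of phase bins, and the maximum box width are hypothetical choices, and only dips are considered. Because the statistic is maximized over every box position and width, any chance cluster of low points in the folded light curve also raises the power at a wrong period.

```python
import math
import random

def bls_power(times, flux, period, n_bins=200, max_width_bins=10):
    """Simplified Box Least Squares statistic for one trial period.

    SR = max over boxes of sqrt(s^2 / (r (1 - r))), where r is the fraction of
    points inside a box of contiguous phase bins and s is the mean-subtracted
    flux summed inside the box divided by the total number of points.
    Only dips (s < 0) are considered. Returns (SR, depth_estimate).
    """
    n = len(flux)
    mean = sum(flux) / n
    sums = [0.0] * n_bins      # summed mean-subtracted flux per phase bin
    counts = [0] * n_bins      # number of points per phase bin
    for t, f in zip(times, flux):
        b = int((t % period) / period * n_bins) % n_bins
        sums[b] += f - mean
        counts[b] += 1
    best_sr, best_depth = 0.0, 0.0
    for start in range(n_bins):
        s, c = 0.0, 0
        for w in range(max_width_bins):
            b = (start + w) % n_bins        # boxes may wrap around phase zero
            s += sums[b]
            c += counts[b]
            if c == 0 or c == n or s >= 0.0:
                continue
            r = c / n
            sr = math.sqrt((s / n) ** 2 / (r * (1.0 - r)))
            if sr > best_sr:
                # Box depth = out-of-box level minus in-box level.
                best_sr, best_depth = sr, -s / (n * r * (1.0 - r))
    return best_sr, best_depth

# Usage: 0.1% deep, 2 hr box transits every 48 hr; 27 days at 0.5 hr cadence.
rng = random.Random(0)
ts = [0.5 * i for i in range(1296)]                   # hours
fl = [1.0 - (1e-3 if (t - 9.0) % 48.0 < 2.0 else 0.0) + rng.gauss(0.0, 1e-4)
      for t in ts]
sr_true, depth_true = bls_power(ts, fl, 48.0)         # correct period
sr_wrong, _ = bls_power(ts, fl, 37.3)                 # an arbitrary wrong period
```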
We infer from Figure 11 that BLS periodograms are often noisier than TCF periodograms, particularly for light curves with short-memory autocorrelation remaining after inadequate detrending, because chance alignments of this structure can easily mimic box-like shapes. In contrast, the double-spike structure matched by the TCF algorithm is difficult to reproduce by autocorrelation alone, so the TCF periodogram noise is better behaved. The trend of BLS power increasing with period in the absence of a true periodic signal can be attributed to the weaker dilution of autocorrelated patterns in light curves folded at longer periods. This can be seen by comparing the greater depth of the false box in the fifth row with that in the fourth row.
It is reasonable that the BLS algorithm (with a standard detrender such as Gaussian Processes regression) produces more noise than the TCF algorithm (with an ARIMA detrender designed to remove stochastic short-memory autocorrelation) when the detrended light curve has significant autocorrelation. However, we are surprised that TCF can outperform BLS with the signal-to-noise ratio metric when the light curve is mostly white Gaussian noise (Figures 1-2 and the lower-left panel of Figure 5). We believe the TCF algorithm is more stable because it seeks a distinctive double-spike pattern of a few brightness outliers, which are sparse and unlikely to be aligned in light curves folded at random periods.

Advice for transit searches
Figures 5-6 have shown that a combination of TCF with the SNR metric achieves excellent sensitivity to small transiting planets for a wide range of transit periods and durations, whereas the other combinations of detection methods (TCF-FAP, BLS-FAP, and BLS-SNR) were relatively less sensitive.
Based on this result, we recommend the following procedure for small planet detection in a transit survey:
Detrend the light curve: Compute the nonparametric autocorrelation function (ACF) of the light curve. For irregularly spaced light curves, the traditional ACF estimator used in this study may not be defined, so some modifications are needed (see, e.g., Scargle 1989; Andronov & Chinarova 2005). If the ACF deviates significantly from white noise (based on the Durbin-Watson and Ljung-Box hypothesis tests), fit the best ARIMA model, with complexity determined by the Akaike Information Criterion. A single differencing step should be used so that box-shaped transits are converted to double-spike patterns (footnote 13). Recompute the autocorrelation function of the residuals to check whether they approach uncorrelated white noise.
Compute the TCF periodogram: This search for periodic double-spike patterns is calculated on the ARIMA residuals for a chosen range of periods with an oversampling of trial periods so that true spectral peaks are not missed.The periodogram is standardized in a robust fashion (footnote 4).
Identify the best trial transit period: The robust SNR metric in equation (2) is calculated for each trial period of the standardized TCF periodogram. The highest SNR peak represents the best candidate for a transiting planet detection.
Reduce statistical False Alarms: This is a multifaceted step that can include: an examination of the periodogram for noise peaks comparable to the best peak, examination of folded light curves for patterns inconsistent with true periodic photometric dips, and construction of simulated light curves of the suspected transit.The periodograms of the simulations should be examined for alias structure and MDD sensitivity for comparison with the observed periodogram.
Reduce astronomical False Positives: These steps, discussed by Guerrero et al. (2021), Melton et al. (2023a), and others mentioned in §1, lie beyond the scope of our discussion here.
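The white-noise check in the detrending step can be sketched with the two tests named there. This is a stdlib illustration with hypothetical helper names; the critical value 18.307 is the standard 95th percentile of a chi-square distribution with 10 degrees of freedom and applies to a raw series tested with h = 10 lags. For residuals of a fitted ARMA(p, q) model, the degrees of freedom drop to h − p − q (in R, the `fitdf` argument of `Box.test(..., type = "Ljung-Box")` handles this).

```python
import random

def acf(x, lag):
    """Sample autocorrelation at the given lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return sum(dev[t] * dev[t + lag] for t in range(n - lag)) / sum(d * d for d in dev)

def ljung_box(x, max_lag=10):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1}^{h} r_k^2 / (n - k)."""
    n = len(x)
    return n * (n + 2) * sum(acf(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1))

def durbin_watson(resid):
    """Durbin-Watson statistic; near 2 for uncorrelated residuals, roughly 2 (1 - r_1) in general."""
    num = sum((b - a) ** 2 for a, b in zip(resid[:-1], resid[1:]))
    return num / sum(e * e for e in resid)

CHI2_95_DOF10 = 18.307   # 95th percentile of chi-square with 10 degrees of freedom

rng = random.Random(11)
white = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ar1, prev = [], 0.0
for _ in range(2000):
    prev = 0.5 * prev + rng.gauss(0.0, 1.0)   # AR(1) with phi = 0.5
    ar1.append(prev)

for name, series in (("white noise", white), ("AR(1)", ar1)):
    q, dw = ljung_box(series), durbin_watson(series)
    verdict = "autocorrelated" if q > CHI2_95_DOF10 else "consistent with white noise"
    print(f"{name}: Q = {q:.1f}, DW = {dw:.2f} -> {verdict}")
```

The same two functions can be rerun on the ARIMA residuals for the final check in that step.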
These suggestions are summarized in Figure 12. However, the figure should not be considered a definitive guide. In particular, we recommend making new versions of MDD plots similar to Figure 5 based on the characteristics of the transit survey under study (noise level, non-Gaussian behavior, observation duration, etc.).

Footnote 13: We recommend using the auto.arima function in the forecast CRAN package within the R statistical software environment, described in the volume by Hyndman & Athanasopoulos (2021). The forecast package, downloaded ∼8000 times per day for many purposes, is highly capable and reliable.
Two additional analysis procedures can help adjudicate the reality of a small planet transit signal. First, Hippke & Heller (2019) show that incorporating astrophysical knowledge can improve sensitivity to small planets. Real transits are not the square boxes assumed by BLS and TCF but are curved by stellar limb darkening. The limb darkening profile depends on the star's effective temperature and surface gravity, which can be inferred from Gaia photometry and astrometry. This curved transit shape could be incorporated into a BLS algorithm or the TLS algorithm of Hippke & Heller (2019) (see also batman; Kreidberg 2015). Second, after a tentative periodicity has been identified in the TCF periodogram following ARIMA detrending, one might fit an ARIMAX model to the light curve that combines a deterministic box-shaped periodic transit component with the autoregressive component. This gives a new estimate of the transit depth, with an uncertainty based on the model's Fisher information matrix. The SNR of the resulting transit depth can be useful for subsequent analyses. See Caceres et al. (2019a, §3.3) for more details.
ARIMA + TCF and TLS are complementary approaches: TLS improves on the sensitivity of BLS by adding astrophysical insight, while TCF is more sensitive than BLS because its effective treatment of stellar autocorrelated noise reduces periodogram noise. The standard ARIMA + TCF procedure could be followed by TLS, a more refined periodogram incorporating limb darkening and a self-consistent pattern of transit ingress and duration for the stellar and planetary parameters inferred from the standard procedure.

CONCLUSION
This paper cautions the reader about weaknesses in the sensitivity of the commonly used BLS periodogram for detecting small planets. The problems are exacerbated when autocorrelation persists in the light curve even after detrending. BLS has the unfortunate characteristic of fitting boxes to noise, for both autocorrelated and Gaussian white noise. This results in spurious peaks and poor statistical properties (heteroscedasticity and trends), as discussed by Ofir (2014). These factors inhibit the detection of small planets using BLS, as previously noted by studies such as Hippke & Heller (2019).
The main achievement of this paper is explaining why the ARIMA + TCF procedure from the AutoRegressive Planet Search project (Caceres et al. 2019a) performs better than standard detrenders followed by the BLS periodogram. ARIMA, a widely used method for modeling stochastic autocorrelated time series since the 1970s (Box et al. 2015), removes (in most cases) both longer-term trends and stationary short-memory autoregressive behavior in the light curve. It is followed by the Transit Comb Filter, which matches the sharp ingress and egress spikes in the ARIMA residuals for a trial period, provided the cadence is well matched to the ingress timescale.
The ARIMA + TCF pipeline proves to be remarkably effective, and we show that much of the advantage emerges from the TCF periodogram.It has a lower noise with weaker spurious peaks than the BLS periodogram, even for time series with white Gaussian noise.TCF also scales slightly better in terms of computation time compared to BLS.
We further find that the commonly used signal-to-noise ratio is the preferred metric for optimizing small exoplanet detection, compared to a False Alarm Probability based on extreme value theory. The latter is now often used with the Lomb-Scargle periodogram (Baluev 2008; Süveges 2014). Section 5 of Caceres et al. (2019b) and Figure 16 of Melton et al. (2023c) show that transiting planet candidates derived from the ARIMA + TCF combination are often smaller than confirmed planets derived from BLS-based procedures, consistent with the findings of this paper. Our study explains some of the problems with the BLS periodogram described by Ofir (2014). Section 6.2 elucidates why the BLS periodogram often has unnecessarily high noise and spurious peaks where no periodicity is present.
Our study also shows that analysis of simulations can be effective for evaluating periodogram peak significance in observational data, particularly when complicated conditions (such as missing or autocorrelated data) are present. Simulations can be used to compare any combination of periodograms. Simulations were also recommended by VanderPlas (2018) in his discussion of the significance of Lomb-Scargle periodogram peaks.
The results emphasize that effective detrending algorithms are vital to improving periodogram sensitivity. We use the classical sequence of differencing and ARMA modeling. Combining spline or Gaussian Processes regression for longer-term trends with ARMA for short-memory stochastic autocorrelation is an intriguing possibility that may benefit BLS. However, our preliminary experiments (not reported here) suggest that BLS performance remains inferior.
Any procedure seeking efficient small planet detection must avoid fitting the light curve so well that the planetary signal is absorbed into the detrending model. The ARIMA method of differencing and fitting an ARMA model leaves most of the planetary signal untouched: once the light curve is differenced, most of the in-transit points are removed, and the ARMA model does not fit the remaining double spike. This allows the use of the TCF algorithm, which gives a well-behaved periodogram with more homoscedastic noise and fewer spurious peaks. Novel methods for removing stellar trends have been proposed (such as Alapini & Aigrain 2009 and Smith et al. 2012) and could be studied in combination with periodogram analyses for detecting small transiting planets in upcoming exoplanet missions. Astrophysically motivated transit shapes can also be combined with ARIMA + TCF to further improve transit survey sensitivity.
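The effect of differencing on a transit can be seen in a noise-free toy example: a box-shaped dip becomes exactly two isolated spikes, one negative at ingress and one positive at egress, while all in-transit differences are zero. The depth and array lengths below are illustrative choices only.

```python
def difference(x):
    """First difference y[t] = x[t+1] - x[t]."""
    return [b - a for a, b in zip(x[:-1], x[1:])]

depth = 4e-4                                       # 400 ppm, illustrative
flux = [1.0] * 5 + [1.0 - depth] * 4 + [1.0] * 5   # one box-shaped transit
spikes = difference(flux)

# Keep only the nonzero differences: the ingress and egress spikes.
nonzero = [(i, round(v, 10)) for i, v in enumerate(spikes) if abs(v) > 1e-12]
print(nonzero)   # [(4, -0.0004), (8, 0.0004)]
```

Because only these two points carry the transit, a low-order ARMA model describing the stochastic noise has little freedom to absorb them.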
In conclusion, using the ARIMA + TCF procedure instead of local regression detrenders and BLS can significantly improve the sensitivity of space-based photometric surveys to small planets. The degree of improvement depends on several factors, such as the number of transits and the noise characteristics of the light curve. Prospective applications of our study include existing COROT, Kepler, K2, and TESS datasets and forthcoming data products of ESA's PLATO mission and NASA's Roman Space Telescope.
The Fisher-Tippett-Gnedenko theorem (also called the Extreme Value Theorem) lies at the foundation of EVT. It states that the maximum of a large sequence of i.i.d. univariate random variables, after suitable standardization (see below), asymptotically follows one of a few well-defined distributions (Fisher & Tippett 1928; Gnedenko 1943). The theorem's assumptions are very general, similar to those of the Central Limit Theorem for the asymptotic behavior of mean values. In both cases, the distribution F of the variable does not need to be known; the theorem is valid for data drawn from almost all continuous distribution functions.
Consider again the maximum value $M_n$ of a sequence of n i.i.d. random variables, and its standardized form $M_n^* = (M_n - b_n)/a_n$ for suitable sequences of constants $a_n > 0$ and $b_n$. The Extreme Value Theorem states that, under broad conditions, the limiting distribution of $M_n^*$, $P(M_n^* \le x)$, converges to a distribution G belonging to the Gumbel, Fréchet, or Weibull families. These families are conveniently combined into the Generalized Extreme Value (GEV) distribution (Jenkinson 1955), a three-parameter distribution with parameters µ, σ, and ξ denoting the location, scale, and shape:

$$G(x) = \exp\left\{-\left[1 + \xi\,\frac{x-\mu}{\sigma}\right]^{-1/\xi}\right\} \qquad (3)$$

where $1 + \xi (x-\mu)/\sigma > 0$, $\sigma > 0$, $\mu \in \mathbb{R}$, and $\xi \in \mathbb{R}$. If the shape parameter ξ = 0, then 1/ξ is undefined and the expression reduces by continuity to

$$G(x) = \exp\left\{-\exp\left[-\frac{x-\mu}{\sigma}\right]\right\} \qquad (4)$$

which is called the Gumbel distribution. The Fréchet and Weibull families correspond to ξ > 0 and ξ < 0, respectively. If ξ > 0 and $\xi (x-\mu)/\sigma \le -1$, then G(x) = 0, and if ξ < 0 and $\xi (x-\mu)/\sigma \le -1$, then G(x) = 1. If one is interested in minimal rather than maximal values, the corresponding expressions for minima are obtained by replacing x with −x. Probabilities for the GEV distributions are calculated with the CRAN package extRemes (Gilleland & Katz 2016).

Two widely used approaches to evaluating the significance of extrema in EVT are the block-maxima and the peaks-over-threshold methods (Coles et al. 2001). In the block-maxima approach, the data are partitioned into blocks of a certain size, and the maximum of each block is used to build a sample of extreme values to which the GEV model is fitted. The peaks-over-threshold method selects extreme value samples above (or below) a chosen threshold, followed by declustering to try to achieve independence. The block-maxima method is a natural choice for periodograms because it accounts for short-range dependence in the periodogram and can be easier to use in practice (Ferreira & de Haan 2015).
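The block-maxima route to a FAP can be sketched in a few lines. Süveges (2014) and this study fit the full GEV by maximum likelihood (via extRemes in R); to stay within the Python standard library, the sketch below instead fits the Gumbel (ξ = 0) special case by the method of moments, using mean = µ + γσ and variance = π²σ²/6 with γ the Euler-Mascheroni constant. The helper names, block size, and the use of Gaussian values as a stand-in for a standardized periodogram are assumptions, and the stand-in ignores the dependence problems discussed next.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329

def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function with location mu, scale sigma > 0, shape xi."""
    z = (x - mu) / sigma
    if xi == 0.0:                            # Gumbel limit
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:                             # outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def block_maxima(values, block_size):
    """Maximum of each complete block of consecutive values."""
    return [max(values[i:i + block_size])
            for i in range(0, len(values) - block_size + 1, block_size)]

def gumbel_fit_moments(maxima):
    """Method-of-moments Gumbel fit: sigma = s*sqrt(6)/pi, mu = mean - gamma*sigma."""
    sigma = statistics.stdev(maxima) * math.sqrt(6.0) / math.pi
    mu = statistics.mean(maxima) - EULER_GAMMA * sigma
    return mu, sigma

def false_alarm_probability(peak, mu, sigma, xi=0.0):
    """Probability that a noise-only block maximum exceeds the observed peak."""
    return 1.0 - gev_cdf(peak, mu, sigma, xi)

# Usage: 100 blocks of 300 noise-only values as a stand-in standardized periodogram.
rng = random.Random(2)
noise_power = [rng.gauss(0.0, 1.0) for _ in range(100 * 300)]
maxima = block_maxima(noise_power, 300)
mu, sigma = gumbel_fit_moments(maxima)
fap = false_alarm_probability(6.0, mu, sigma)
print(f"FAP of a peak at 6.0: {fap:.2e}")
```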
An advantage of the EVT approach is that n, the number of independent values in the dataset, does not appear in the GEV distribution, as seen in equations (1)-(2). However, other difficulties seen with earlier FAP estimates remain. Harmonics of true periodicities with spectral peak strengths comparable to that of the true period tend to overpopulate the extreme values and distort GEV probabilities. Heteroscedasticity and trends in the periodogram noise as a function of period violate the i.i.d. assumption. This can be partially compensated by detrending and standardizing the periodogram, that is, by using local signal-to-noise ratios rather than periodogram power directly, as advocated by Ofir (2014) and in our treatment below (§2.2).
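A minimal sketch of such standardization follows. It is not the §2.2 procedure, which fits a cobs median trend; as a swapped-in assumption, a running median of width `window` serves as the local trend, and a running median absolute deviation (scaled by 1.4826, the Gaussian consistency factor) serves as the local noise level. The function name and default window are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def standardize_periodogram(power, window=101):
    """Local S/N: (power - running median) / running MAD-based scale."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    power = np.asarray(power, dtype=float)
    half = window // 2
    # Mirror the ends so every trial period gets a full-width window
    padded = np.pad(power, half, mode="reflect")
    windows = sliding_window_view(padded, window)        # shape (len(power), window)
    med = np.median(windows, axis=1)                     # local trend
    mad = np.median(np.abs(windows - med[:, None]), axis=1)
    scale = 1.4826 * mad                                 # MAD -> sigma for Gaussian noise
    scale[scale == 0.0] = np.finfo(float).eps            # guard against flat stretches
    return (power - med) / scale
```

A running median is used here because it is robust to the very peaks being sought; a wider window smooths the trend more but follows period-dependent heteroscedasticity less closely.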
Most importantly, the assumption that the dataset $x_i$ is i.i.d. (i.e., stationary white noise) does not hold for many astronomical periodograms. This assumption of the Extreme Value Theorem can be relaxed under some circumstances; for example, the theorem remains valid for dependent variables provided the dependence decays for increasingly separated variables as $n \to \infty$ (Leadbetter & Rootzen 1988). However, this condition is not met for oversampled periodograms.
Therefore, treating periodograms under EVT requires a computational approach in place of the asymptotic analytic formulae (3)-(4). The most widely used approach, nonparametric bootstrap resampling, is inadequate because it assumes i.i.d. data and because it requires computing a periodogram over many trial periods for each bootstrap resample. Süveges (2014) proposes a more efficient procedure that bootstraps partial periodograms over randomly chosen ranges of periods. This hybrid approach, combining extreme value statistics with bootstrap resampling, is favorably reviewed by VanderPlas (2018) and Koen (2021) for astronomical periodograms (see also Cuypers 2012 for a more general treatment of extreme value distributions for peak significance). We adopt this approach here (§2.3).
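The block-maxima idea can be sketched in a deliberately simplified form. This is not Süveges' bootstrap procedure and not the paper's extRemes fit: as stated assumptions, it takes non-overlapping block maxima of an already standardized periodogram, fits a Gumbel law (ξ = 0) by the method of moments instead of a full three-parameter GEV fit, and treats the K blocks as approximately independent, so that the maximum over the whole periodogram has distribution G(x)^K. The function name and default block size are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def block_maxima_fap(std_power, peak, block_size=100):
    """Simplified block-maxima FAP of `peak` for a standardized periodogram.

    Assumptions: non-overlapping blocks, Gumbel (xi = 0) fitted by the method
    of moments, and approximately independent blocks.
    """
    std_power = np.asarray(std_power, dtype=float)
    n_blocks = std_power.size // block_size
    if n_blocks < 10:
        raise ValueError("need at least 10 blocks for a stable fit")
    blocks = std_power[: n_blocks * block_size].reshape(n_blocks, block_size)
    maxima = blocks.max(axis=1)
    # Gumbel moments: mean = mu + gamma * sigma, variance = (pi * sigma)**2 / 6
    sigma = maxima.std(ddof=1) * np.sqrt(6.0) / np.pi
    mu = maxima.mean() - EULER_GAMMA * sigma
    g_block = np.exp(-np.exp(-(peak - mu) / sigma))   # G(peak) for one block
    return 1.0 - g_block ** n_blocks                   # 1 - P(all K block maxima <= peak)
```

In practice, the peak being tested should be excluded from the fitting sample, and a full GEV fit with an estimated ξ would replace the Gumbel moments.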

B. EXECUTION TIME COMPARISON OF BLS AND TCF PERIODOGRAMS
We compare the computational execution times of the BLS and TCF calculations and establish how they scale with data set size. The experiment was performed on a machine with an Intel Xeon CPU at 2.20 GHz using a single CPU core. The same optimal frequency sampling is used for BLS and TCF.
We simulate light curves with Gaussian white noise and ten transits of a planet producing transits of depth 100 ppm and 2 hours duration. To increase the number of data points in the light curve, the period is progressively increased while all other parameters are held fixed; the periods used are 0.5, 1, 3, 5, 10, 20, 40, and 80 days. The number of points in the light curve ranges from a few hundred to ∼10^4.
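The simulation design above can be sketched as follows. The helper name, the 30-minute cadence (the TESS Year 1 FFI cadence), the 1000 ppm white-noise level, and the box transit shape are all assumptions for illustration; the paper does not specify them here. Because the baseline is ten orbital periods, lengthening the period at fixed cadence increases the number of points while the number of transits stays at ten.

```python
import numpy as np


def simulate_box_transit_lc(period_days, n_transits=10, depth_ppm=100.0,
                            duration_hr=2.0, cadence_min=30.0,
                            noise_ppm=1000.0, seed=0):
    """Hypothetical helper: Gaussian white noise plus box-shaped transits."""
    rng = np.random.default_rng(seed)
    dt = cadence_min / (60.0 * 24.0)                      # cadence in days
    n_points = int(round(n_transits * period_days / dt))  # baseline = n_transits orbits
    t = np.arange(n_points) * dt
    flux = 1.0 + rng.normal(0.0, noise_ppm * 1e-6, n_points)
    # Transit centres at (k + 0.5) * period_days, k = 0 .. n_transits - 1
    phase = (t - 0.5 * period_days) % period_days
    half_width = 0.5 * duration_hr / 24.0
    in_transit = (phase < half_width) | (phase > period_days - half_width)
    flux[in_transit] -= depth_ppm * 1e-6
    return t, flux, in_transit


for period in (0.5, 1, 3, 5, 10, 20, 40, 80):
    t, flux, _ = simulate_box_transit_lc(period)
    print(f"P = {period:>4} d: {t.size} points")
```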
We use the microbenchmark CRAN package (Mersmann 2021) to measure execution times. The execution time does not include preprocessing operations, such as ARIMA modeling for TCF or Gaussian Processes regression before BLS; these take much less time than the periodogram computation itself. For the shorter periods, we report the median execution time over five runs because we observed non-trivial differences between runs. For the longer periods, we report the execution time from a single run.
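For readers working outside R, the median-of-five timing rule can be mirrored with the Python standard library. This is an analogous sketch, not the microbenchmark package; the helper name `median_runtime` is hypothetical, and wall-clock times from `time.perf_counter` will include any system noise on the machine.

```python
import statistics
import time


def median_runtime(func, *args, repeats=5, **kwargs):
    """Median wall-clock time in seconds of func(*args, **kwargs) over `repeats` calls."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Short runs: median over five calls; long runs: a single call
print(median_runtime(sorted, list(range(100_000, 0, -1))))
print(median_runtime(sorted, list(range(100_000, 0, -1)), repeats=1))
```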
Figure 13 shows the execution time comparison of BLS and TCF. TCF is somewhat faster than BLS and scales as O(N^3), compared with O(N^{3.7}) for BLS. We have not examined alternative, faster versions of the BLS algorithm such as SparseBLS (Panahi & Zucker 2021) and fBLS (Shahaf et al. 2022).
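A scaling exponent such as 3 or 3.7 is conventionally estimated as the slope of a straight-line fit in log-log space, log t = α log N + const. The sketch below assumes that convention (the paper does not state its fitting method), and the timings in the usage line are hypothetical values generated from an exact N^3 law, not the measurements behind Figure 13.

```python
import numpy as np


def scaling_exponent(n_points, seconds):
    """Least-squares slope alpha in log(seconds) = alpha * log(N) + const."""
    log_n = np.log(np.asarray(n_points, dtype=float))
    log_t = np.log(np.asarray(seconds, dtype=float))
    alpha, _const = np.polyfit(log_n, log_t, 1)
    return alpha


# Hypothetical timings that follow an exact N**3 law (not measured values)
n = np.array([240, 480, 1440, 2400, 4800, 9600, 19200, 38400])
print(scaling_exponent(n, 2e-10 * n**3.0))
```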

Figure 1. Periodogram analysis of a simulated light curve with Gaussian white noise and an injected planetary transit of depth 200 ppm. The light curve cadence and duration are similar to those of a single-sector TESS FFI observation. The injected planet has an orbital period of two days and a transit duration of two hours. The top row shows the simulated light curve (left) and its autocorrelation function (right). The second row shows the BLS (left) and TCF (right) periodograms. The red curve gives the cobs median fit to the periodogram powers, and a rug plot (small vertical marks on the x-axis) shows the locations of the knots used for the median fit. Annotations give the SNR, period, and depth obtained from the periodogram. The third row gives histograms of the BLS (left) and TCF (right) power values. The fourth and fifth rows show the periodograms and the corresponding histograms after standardization. The annotation in the fourth row gives the FAP value of the peak in the standardized periodogram.

Figure 2. Periodogram analysis for Gaussian white noise and a smaller planet with injected transit depth 68 ppm. See Figure 1 for a description of the panels.

The presence of histogram values crowded near the maximum indicates that spurious peaks, from either aliases or noise, are confusing the true signal. The shapes of the histograms for the bulk of the noise values differ considerably between the BLS and TCF periodograms; both have strongly skewed, non-Gaussian distributions. However, this distribution pattern does not affect the FAP significance analyses. An effect worth pointing out in the above figures is that, after standardization, the local periodogram noise at shorter periods is enhanced; this arises from the heteroscedastic noise pattern of the periodogram. A modified windowed approach could mitigate this issue to some extent; however, we do not deal with

Figure 3. Periodogram analysis with autocorrelated noise. Here the injected transit depth is 400 ppm. See Figure 1 for a description of the panels.
3. It is important to note that the comparison of BLS and TCF presented in this paper is, in fact, a comparison of (a) Gaussian Processes regression + BLS and (b) ARIMA + TCF rather than of BLS and TCF themselves. One might argue that a more faithful comparison would consider (a) ARMA + BLS and (b) ARIMA + TCF, so that the intrinsic behaviors of the periodograms are apparent. However, we have found ARMA + BLS ineffective in detecting small planets, likely because ARMA fits typically absorb the transits along with the non-transit stellar autocorrelation. ARIMA + TCF does not suffer from this issue because, after differencing, only a negligible number of points carry the transit signal, so the ARMA model fitted to the differenced light curve does not absorb it.
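A noise-free toy example illustrates why differencing leaves the ARMA model little transit signal to absorb: an eight-cadence box transit becomes just two non-zero points, a negative spike at ingress and a positive spike at egress, which is the double-spike pattern that TCF searches for. The light curve values and the helper name below are hypothetical, and real transits with limb darkening and noise are less clean.

```python
import numpy as np


def differenced_transit_points(flux, tol=1e-12):
    """Indices where the first difference of a noise-free light curve is non-zero."""
    diff = np.diff(flux)  # d = 1 differencing, the "I" in ARIMA(p, 1, q)
    return np.flatnonzero(np.abs(diff) > tol), diff


# Hypothetical noise-free light curve: 100 cadences, 100 ppm box transit on cadences 40-47
flux = np.ones(100)
flux[40:48] -= 1e-4
idx, diff = differenced_transit_points(flux)
print(idx, diff[idx])
```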

Figure 4. Periodogram analysis with autocorrelated noise and a smaller injected planet with transit depth 234 ppm. See Figure 1 for a description of the panels.

Figure 5. Minimum detectable depth (MDD, in percent of the stellar brightness) as a function of the number of transits. The injected planet has a one-day period and a two-hour transit duration. Two metrics are shown: FAP based on extreme value theory on standardized periodograms (top row) and signal-to-noise ratio on detrended periodograms (bottom row). Light curves have Gaussian white noise (left column) and autocorrelated noise (right column). Each panel shows the MDDs for the BLS (blue curve) and TCF (orange curve) periodograms. The horizontal dashed line corresponds to MDD = 0.005% to guide the eye.

Figure 6. Minimum detectable depth (MDD) as a function of transit period (top rows) and duration (bottom rows). Simulations assume ten transits, with transit duration = 2 hr for the period comparison and period = 1 day for the duration comparison. See Figure 5 for panel details.
Summary of the study
This paper provides a detailed comparison of the Box Least Squares (BLS) and Transit Comb Filter (TCF) periodograms to optimize the detection of small exoplanets from light curves with regular cadences as obtained from space-based surveys. The analysis has several steps:

1. Tests are first conducted on simulated light curves with Gaussian white noise and autocorrelated noise by varying the number of transits, planet period, and transit duration (§2). The simulated light curves resemble single-sector TESS observations. The analysis is repeated for four observed TESS light curves with known small planetary transit signals (§5).

Figure 7. Example 1 of a TESS Year 1 light curve for a star with small confirmed transiting planets. The panels are similar to those in Figure 1: the normalized light curve, autocorrelation function, BLS and TCF periodograms with median trend fit and their histograms, and the standardized periodograms and their histograms. Scalar results are provided in Table 1.

Figure 8. Example 2 of a TESS Year 1 light curve and the corresponding periodograms and histograms. See Figure 7 for a description of the panels.

Figure 9. Example 3 of a TESS Year 1 light curve and the corresponding periodograms and histograms. See Figure 7 for a description of the panels.

Figure 10. Example 4 of a TESS Year 1 light curve and the corresponding periodograms and histograms. See Figure 7 for a description of the panels.

Figure 11. Illustration of BLS (left panels) and TCF (right panels) analysis of a simulated light curve with a P = 2.00 day transiting planet superposed on autocorrelated noise. First row: zoom of several days of the light curve with the Gaussian Processes regression fit overlaid (left), and the same light curve after differencing with the ARMA fit overlaid (right). Second row: periodograms of the entire simulated light curve, with two false BLS peaks marked 'A' and 'B'. Third row: folded light curves at the true transit period. Fourth and fifth rows: folded light curves of the original (left) and differenced (right) data at false spectral peaks A and B.

Figure 12. Decision tree outlining general suggestions for selecting a periodogram algorithm based on certain conditions.

Figure 13. Execution time comparison of BLS and TCF as a function of the number of data points in the light curve.

Table 1. Periodogram Performance for Four TESS Planet Candidates
… Peak Signal-to-Noise Ratio for the detrended periodogram. Col. 7: Extreme Value Theory False Alarm Probability for the standardized periodogram.