DIAmante TESS AutoRegressive Planet Search (DTARPS): I. Analysis of 0.9 Million Light Curves

Nearly one million light curves from the TESS Year 1 southern hemisphere extracted from Full Frame Images with the DIAmante pipeline are processed through the AutoRegressive Planet Search statistical procedure. ARIMA models remove trends and lingering autocorrelated noise, the Transit Comb Filter identifies the strongest periodic signal in the light curve, and a Random Forest machine learning classifier is trained and applied to identify the best potential candidates. Classifier training sets include injections of both planetary transit signals and contaminating eclipsing binaries. The optimized classifier has a True Positive Rate of 92.8% and a False Positive Rate of 0.37% from the labeled training set. The result of this DIAmante TESS autoregressive planet search (DTARPS) analysis is a list of 7,377 potential exoplanet candidates. The classifier has a False Positive Rate of 0.3%, a 64% recall rate for previously confirmed exoplanets, and a 78% negative recall rate for known False Positives. The completeness map of the injected planetary signals shows high recall rates for planets with 8 - 30 R(Earth) radii and periods 0.6-13 days and poor completeness for planets with radii<2 R(Earth) or periods<1 day. The list has many False Alarms and False Positives that need to be culled with multifaceted vetting operations (Paper II).


INTRODUCTION

Challenges in TESS Planet Discovery
With the 2018 launch of the Transiting Exoplanet Survey Satellite (TESS), scientists acquired a tool for in-depth analysis of rare phenomena such as transiting exoplanets, stellar superflares, and tidal disruption events (Ricker et al. 2015). TESS surveys the entire celestial sphere in month-long observations with four wide-field cameras with a pixel scale of 21 arcsec. During the first year, over 200,000 bright stars were pre-selected for 2 minute observing cadence as prime transit targets, but millions of relatively bright stars are accessible from Full Frame Images (FFIs) with 30 minute cadence.
The principal goal of the TESS mission is the identification of sub-Neptune (R < 4 R⊕) transiting planets around stars sufficiently bright for follow-up characterization of the planets' physical characteristics, including atmospheric composition. Quantitative calculations prior to the mission by Barclay et al. (2018) predicted that ∼3100 transiting planets would be found from light curves of approximately 6 million FFI stars acquired during the prime mission (TESS Years 1 and 2); of these, ∼1100 would be sub-Neptunes. They predicted that ∼12,000 larger planets (R > 4 R⊕) would be discovered in the FFI database. In a revised calculation, Kunimoto et al. (2022) predict that ∼4000 planets would be detected in the prime mission using FFI images.
The predictions of Barclay et al. (2018) and Kunimoto et al. (2022) have been overly optimistic. The TESS Objects of Interest from the prime mission include 2241 Planet Candidates based on automated detection of a transit-like signal followed by review by the TOI Vetting Team (Guerrero et al. 2021). These include 1035 unique FFI stars obtained with the TESS Science Processing Operations Center (SPOC) and Quick Look Processing (QLP) pipelines. Nearly half of these have been subject to some follow-up observations (the community-based ExoFOP-TESS enterprise), of which most (88% in the published 2021 catalog) have been redesignated False Positives. Thus, only a few hundred FFI stars, rather than thousands, have emerged to date as reliable hosts of transiting planets from the TESS prime mission.
A variety of difficulties contribute to this discrepancy between predicted and actual performance in TESS planet discovery. These include: non-Gaussian autocorrelation (including 'red noise') in the light curves from stellar activity; instrumental problems such as slow pointing settling after data gaps; contamination by blended eclipsing binaries (BEBs) in the extracted TESS pixels; sparsity of photometric observations during brief transits at longer periods; and mathematical difficulties in reliably evaluating False Alarm probabilities in a periodogram. It is challenging to design a detection procedure that effectively removes a wide variety of non-planetary light curve variations while maintaining the planetary transit signal. Conservative classification and vetting procedures seeking to reduce False Positive contamination will also reduce small planet discovery. As Barclay et al. (2018) did not consider all these issues, it is not surprising that they overestimated the number of planets that can realistically and reliably be found in TESS FFI data.
This situation motivates the search for TESS FFI planets using methodologies different from those used to generate official TESS Objects of Interest QLP described by Guerrero et al. (2021), Kunimoto et al. (2022) and Tey et al. (2023). Past efforts include: the DIAmante pipeline by Montalto et al. (2020)

AutoRegressive Planet Search
In the present study and its accompanying papers (Melton et al. 2022a,b), we develop a pipeline for TESS FFI light curves with two foundations: light curve extraction and pre-processing by the DIAmante project of Montalto et al. (2020, henceforth M20) and transit search by the AutoRegressive Planet Search project of Caceres et al. (2019a) that was applied to the 4-year Kepler light curves (Caceres et al. 2019b). We call this DTARPS: the DIAmante TESS Autoregressive Planet Search project. The DTARPS pipeline is outlined in Figure 1.
The DTARPS procedure differs from most alternative pipelines in several respects. First, nonstationarity (trends) in the light curve is removed with a simple nonparametric algorithm called 'differencing', rather than with more complicated semi-parametric detrending procedures like spline or Gaussian Processes regression. Differencing treats both stellar and instrumental variations in a single step, but leaves behind sudden changes such as transit ingress and egress. Second, we fit parametric autoregressive moving average ARMA(p,q) models to short-memory structure in the detrended light curve. This crucial step is missing in other transit searching procedures, which leave short-memory autocorrelated behaviors in place. Together, these procedures are known as ARIMA(p,d,q) modeling (or Box-Jenkins analysis), which has dominated the analysis of stochastic time series since the 1970s in fields such as econometrics and engineering signal processing. The textbook by Box et al. (2015) has over 56,000 citations over five editions.
The differencing operation changes a sharp-edged box-shaped transit into a double-spike representing the planet ingress and egress. The third innovation is a new sensitive periodogram, the Transit Comb Filter (TCF) developed by Caceres et al. (2019a) to measure the amplitude of periodic double-spike patterns in the ARIMA residuals. The TCF periodogram was found to be surprisingly sensitive to small planets in 4-year Kepler light curves (Caceres et al. 2019b); this is explained in the simulation study of Gondhalekar et al. (in preparation).
DTARPS continues with tuning a machine learning Random Forest (RF) classifier based on dozens of features to select possible exoplanet transiting candidates from the vast number of non-transiting systems. Following Montalto et al. (2020), the classifier is trained towards a positive training set of light curves with injected simulated planetary transit signals, and trained away from a negative training set that includes simulated signals from eclipsing binaries larger than 2.5 R_J. The final stage of the DTARPS procedure involves multi-faceted visual vetting to reduce False Alarms and False Positives that passed the Random Forest threshold. Altogether, DTARPS is a comprehensive planetary transit analysis system, starting with cleaned extracted light curves and ending with a list of new candidate planetary transiting systems.
When the ARPS methodology was applied to ∼150,000 Kepler 4-yr light curves by Caceres et al. (2019b), though without the final vetting step, it recovered 97% of the Kepler Golden sample provided the Kepler model had a signal-to-noise ratio SNR > 20. It also identified 97 new Kepler exoplanet transit signals, mostly (sub)Earths with P < 20 day periods orbiting faint stars that could not be readily confirmed with follow-up radial velocity spectroscopy. One case, a Mars-size planet orbiting an M star, is discussed by Cañas et al. (2022). However, the application to TESS data discussed here produces a collection of candidates that is much more accessible to follow-up study than the earlier Kepler ARPS study.
Here we apply an improved procedure that combines elements of the DIAmante and ARPS pipelines (M20; Caceres et al. 2019a). M20 extracted 0.9 million light curves from TESS Year 1 FFIs covering the southern ecliptic hemisphere and, with the BLS periodogram and a RF classifier, identified 396 exoplanet candidates. DTARPS is based on the same 0.9 million DIAmante extracted and preprocessed light curves, but then diverts the analysis to the ARPS procedure for planetary transit identification.
The present study (Paper I) describes DTARPS through the application of the RF classifier to the data (Figure 1), producing a DTARPS Analysis List of 7,377 stars that exhibit transit-like behavior. This stage is roughly comparable to the threshold-crossing events (TCEs) of the QLP TOI analysis (Guerrero et al. 2021). Paper II (Melton et al. 2022b) describes the rigorous multi-faceted vetting procedure applied to the RF classifier results. It presents a DTARPS Candidates catalog of 463 high-confidence transiting candidates and a list of possible transiting systems near the Galactic Plane. Paper III (Melton et al. 2022c) provides a scientific analysis of the DTARPS Candidates. Our presentation is purposefully more detailed than most presentations of transiting exoplanet discoveries for three reasons. First, we seek reader understanding and reproducibility of our procedures. Second, we present in early papers results needed in later analysis, such as the first TESS-based planet occurrence rate in Paper III. Third, intermediate results will be used in the future to improve steps in the DTARPS pipeline.
The paper is structured as follows. The DTARPS analysis procedure, including DIAmante input light curves, is presented in §2 and 4. Sections 5 through 7 describe the RF classifier training set (including synthetic injections of transits and eclipsing binaries), the process of RF optimization and the final RF classifier. The principal product is the DTARPS Analysis List of 7,377 TESS light curves presented in §8. Sections A-9 discuss two aspects of the classification with respect to injections. Section 10 compares the results of the RF classifier with other exoplanet surveys. The findings are summarized in §11 with motivation for the vetting stage described in Paper II. The Appendix describes external data sets used for comparison and validation of this effort.

Past Efforts for Detrending and Transit Identification
Any transiting exoplanet search must try to remove a wide range of stellar variability behaviors and variations due to instrumental effects. Detrending procedures used in transit detection outlined in the Appendix include: the NASA MIT Quick Look pipeline (Huang et al. 2020), which uses a high pass filter and fitted splines to detrend the light curve; the eleanor pipeline (Feinstein et al. 2019), which cotrends the extracted light curves using Principal Component Analysis (PCA); photometry extraction with difference image analysis (Oelkers & Stassun 2018); the DIAmante pipeline with difference image analysis, PCA cotrending, and spline fits (M20); and NEMESIS (Feliz et al. 2021), which combines pixel-level decorrelation with an iterative smoother. A common choice for detrending is Gaussian Processes regression (e.g., Luger et al. 2016; Foreman-Mackey et al. 2017; Angus et al. 2018), but other methods were tried before the TESS mission, such as Independent Component Analysis (Waldmann 2012), correntropy (Huijse et al. 2012), empirical mode decomposition (Roberts et al. 2013), and Singular Spectrum Analysis (Greco et al. 2016).
The SPOC pipeline used in TESS TOI selection involves a complex series of operations including autoregressive filling of gaps, removal of some instrumental systematic effects, whitening with power spectral density analysis, removal of multiscale temporal structures with wavelet analysis, and identification of transiting exoplanet signals with adaptive wavelet-based matched filters (Jenkins et al. 2017b; Guerrero et al. 2021). A statistical bootstrap test is applied to the light curve and transit detection to determine the probability of the event being a false alarm (Jenkins et al. 2017a).
Following detrending in most analyses, repeated transiting signals are sought with BLS periodograms (Kovács et al. 2002). The BLS method models a transit signal as a periodic box-shaped dip superposed on a constant flux with three parameters: duty cycle (duration/period), transit depth (fraction of the constant flux), and epoch of the transit. For all parameter values, BLS fits a box to the folded light curve and returns the signal strength in a periodogram. Variants of BLS include the transit least squares (TLS) algorithm that considers the effect of stellar limb darkening during planetary ingress and egress (Hippke & Heller 2019) and the fast BLS computational algorithm (Shahaf et al. 2021). However, the BLS periodogram is vulnerable to strong trends, high noise levels, and complex alias structures that impede detection of smaller planets (Ofir 2014; Gondhalekar et al. in preparation).

ARIMA Modeling
Stellar activity is often classified as an autoregressive behavior wherein future photometric values depend on current and past values (Caceres et al. 2019b). In the Sun, the waiting times between X-ray flares have strong temporal autocorrelation on timescales of hours (Wheatland 2000; Aschwanden & McTiernan 2010) and autoregressive statistical models give good fits to solar X-ray variability (Burnecki et al. 2008; Stanislavsky et al. 2019). Standard nonparametric detrending procedures, such as running medians, spline fitting or Gaussian Processes regression, will not remove short-memory stochastic autocorrelated behaviors. But these behaviors can be removed with low-dimensional parametric regressions such as ARMA models that are specifically designed to fit stochastic autocorrelated behaviors. ARMA-type modeling is well-established in many fields of time series analysis with extensive methodology described in textbooks such as Box et al. (2015), Chatfield & Xing (2019), and Hyndman & Athanasopoulos (2021). Feigelson et al. (2018) argue that these models can be effective for many time domain problems in astronomy.
In the ARPS pipeline for transiting planet identification, stationary low-dimensional linear autoregressive models are combined with a simple differencing operator to remove non-stationary trends arising from stellar and instrumental variations. Caceres et al. (2019b) show that the shape of a transit is transformed from a box to a double-spike by ARIMA modeling, necessitating a new periodogram, called the Transit Comb Filter (TCF), to identify periodic sequences of double-spike patterns in the cleaned, whitened ARIMA residual light curves. A brief overview of the ARPS methodology is given here; the interested reader will find more details on ARIMA and the TCF periodogram in Caceres et al. (2019a) and Caceres et al. (2019b).
The autoregressive moving average (ARMA) model family can treat an enormous variety of both stationary and nonstationary time series (Box et al. 2015). ARIMA and its extensions like ARFIMA, GARCH and HAR are able to treat short-memory processes, long-memory processes, volatility, burstiness, and nonstationarity in the light curve without the need for choosing parameters like a smoothing kernel function and bandwidth in Gaussian process regression or a basis function and denoising threshold in wavelet analysis. ARMA-type models are fit by maximum likelihood estimation without any free parameters. The best-fit order (i.e. integer values of p and q) is chosen by a likelihood-based balance between goodness-of-fit and model complexity.
The linear ARIMA model has three components: autoregressive (AR), integrated (I), and moving average (MA). We start our analysis with the 'I' component that treats nonstationarity; typically nonstationarity arises from variations in the mean fluxes of the light curve due to stellar or instrumental effects. A single differencing operation is a pixelated differential operation and is equivalent to removing the narrowest possible median filter of the time series:
$$\nabla x_t = x_t - x_{t-1} \qquad (1)$$
The ARPS procedure applies the differencing step to the light curve and passes the differenced values to the ARMA function. By separating the difference step from the application of the ARMA model, the transits (if present) are guaranteed to be changed into the double-spike pattern detectable by TCF. Caceres et al. (2019a) found that the differencing step greatly reduced the interquartile range (IQR) of light curves with intrinsic stellar variations (such as rotationally modulated starspots) but slightly increased the IQR for light curves without noticeable variation present. They also investigated more complex differencing operations, such as fractional differencing equivalent to long-memory $1/f^{\alpha}$-type red noise models, but found this did not improve the fits for most Kepler light curves.
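The effect of the differencing step on a transit can be illustrated with a toy example. The sketch below is not taken from the ARPS code; the flux values, transit depth, and function name are hypothetical and chosen only to show how a noiseless box-shaped dip becomes a negative spike at ingress and a positive spike at egress.

```python
def difference(flux):
    """First-order differencing: d[t] = flux[t] - flux[t-1]."""
    return [flux[t] - flux[t - 1] for t in range(1, len(flux))]


# Hypothetical toy light curve: constant flux with one box-shaped transit
# of depth 0.01 covering cadences 8 through 11.
flux = [1.0] * 20
for t in range(8, 12):
    flux[t] = 0.99

diff = difference(flux)

# Collect the cadences where the differenced series is non-zero.
nonzero = [(t, round(d, 6)) for t, d in enumerate(diff, start=1) if abs(d) > 1e-12]
print(nonzero)  # [(8, -0.01), (12, 0.01)]
```

Everything between the two spikes is zero, which is why a box-fitting periodogram is no longer the natural detection statistic after differencing.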
The autoregressive AR(p) portion of the model represents how the stellar flux responds to recent previous values of the flux according to
$$x_t = \sum_{i=1}^{p} \phi_i x_{t-i} + \epsilon_t \qquad (2)$$
where $x_t$ is the value of the light curve at time t, p is the order of the AR component, and $\phi$ is a vector of unknown real-valued coefficients with length p. As with most regressions, the error term is assumed to be homoscedastic Gaussian, $\epsilon_t \sim N(0, \sigma^2)$, where the variance $\sigma^2$ is another parameter of the model. The moving average MA(q) portion of the model represents the effects of random shocks to the light curve by modeling the current flux value as
$$x_t = \epsilon_t + \sum_{j=1}^{q} \theta_j \epsilon_{t-j} \qquad (3)$$
where q is the order of the MA component, $\epsilon_t$ is the same error as in equation (2), and $\theta$ is a vector of unknown coefficients with length q.
For every possible combination of p and q, the φ and θ coefficients are computed using a maximum likelihood estimator. In order to reduce computation time for this step, we restricted p and q to have a sum less than or equal to 10. In practice, this restriction has little effect on the solution. The best ARIMA(p,1,q) model is chosen using the Akaike Information Criterion (Sakamoto et al. 1986), a penalized likelihood measure that balances, in a self-consistent manner, the accuracy of the fit to the differenced light curve against the overall model complexity. The calculation is performed using function auto.arima in CRAN package forecast within the R statistical software environment (Hyndman & Athanasopoulos 2021).
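For reference, the standard definition of the Akaike Information Criterion and the size of the order grid implied by the restriction p + q ≤ 10 can be written out as follows. The parameter count k shown here assumes that only the AR and MA coefficients and the noise variance are estimated (no constant term); that convention is an assumption of this sketch, not a statement about the fitted models.

```latex
\mathrm{AIC}(p,q) = 2k - 2\ln\hat{L}(p,q), \qquad k = p + q + 1,

(\hat{p},\hat{q}) = \arg\min_{\substack{p,\,q \ge 0 \\ p+q \le 10}} \mathrm{AIC}(p,q),
\qquad
\#\{(p,q) : p+q \le 10\} = \sum_{s=0}^{10} (s+1) = 66 .
```

Here $\hat{L}(p,q)$ is the maximized likelihood of the ARMA(p,q) model fit to the differenced light curve. An exhaustive search would therefore evaluate at most 66 candidate orders per light curve.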
The temporal structure is examined using the nonparametric autocorrelation function for both the original light curve and its ARIMA(p,d,q) residuals. We find that most TESS ARIMA model residuals have little or no autocorrelation and are consistent with Gaussian white noise.
A critical question is whether the ARIMA model absorbs the planetary signal in addition to stellar and instrumental variations. Since an exoplanetary transit signal, if present, occurs during only a very small fraction of the observations and the number of time steps between transits is larger than our maximal p and q values, it is mostly ignored by the maximum likelihood estimator. A bias does occur in the depth of the deepest transits (e.g. inflated hot Jupiters) as the ARIMA model incorporates some of the transit signal (Caceres et al. 2019b). This bias is corrected in a later stage of analysis (Paper II).

Transit Comb Filter Periodogram
The difference step of the ARIMA processing in equation (1) changes the shape of a planetary transit from a periodic box pattern to a double-spike pattern. Caceres et al. (2019a) developed the TCF, a matched filter algorithm that searches over a grid of durations and phases to find the strongest periodic double-spike pattern at a chosen trial period. For a continuous time series of Gaussian white noise with regular cadence, the algorithm is equivalent to the maximum likelihood estimator. A periodogram is constructed from the strength of the matched filter fit to the ARIMA residuals for each period passed to the TCF. The TCF computation involves the same triple loop as the traditional Box Least Squares algorithm (Kovács et al. 2002).
As with the BLS periodogram, the TCF periodogram can have systematic changes in mean as one passes from short to long periods (Ofir 2014). We remove this trend with a smoothed, locally fitted least squares regression polynomial function (the LOESS algorithm; Cleveland & Devlin 1988). The power of the TCF for a specified period is then measured from the TCF power above the LOESS curve. The peak with the highest signal-to-noise ratio (SNR) in a window around the peak, with respect to the LOESS curve, is chosen as the most likely fit for an exoplanet transit in the light curve (Caceres et al. 2019a). The statistical behavior of the TCF for different light curves, and comparison with BLS for the ability to detect small transiting planets, is investigated by Gondhalekar et al. (in preparation).
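The triple loop over period, duration, and phase can be sketched in a few lines. The code below is a deliberately simplified illustration and not the TCF implementation of Caceres et al. (2019a): it assumes integer trial periods measured in cadences, a noiseless toy light curve, a hypothetical normalization by the square root of the number of spikes, and it omits the LOESS detrending of the periodogram. All names are hypothetical.

```python
import math


def comb_power(resid, period, max_duration):
    """Best double-spike score at one integer trial period (in cadences).

    Returns (score, phase, duration) maximized over phase and duration.
    """
    n = len(resid)
    best = (0.0, None, None)
    for duration in range(1, max_duration + 1):
        for phase in range(period):
            ingress = range(phase, n - duration, period)
            if len(ingress) < 2:
                continue
            # A transit leaves a negative spike at ingress and a positive
            # spike at egress, so egress minus ingress is positive.
            total = sum(resid[i + duration] - resid[i] for i in ingress)
            score = total / math.sqrt(2 * len(ingress))
            if score > best[0]:
                best = (score, phase, duration)
    return best


# Toy light curve: 4-cadence transits of depth 0.01 every 50 cadences.
flux = [1.0] * 300
for start in range(10, 300, 50):
    for t in range(start, start + 4):
        flux[t] = 0.99

# Differenced series standing in for ARIMA residuals.
resid = [flux[t] - flux[t - 1] for t in range(1, len(flux))]

# Outer loop over trial periods builds the periodogram.
periodogram = {p: comb_power(resid, p, max_duration=10)[0] for p in range(20, 81)}
best_period = max(periodogram, key=periodogram.get)
print(best_period, comb_power(resid, best_period, max_duration=10)[1:])
```

In this toy case the highest score falls at the injected 50-cadence period, with the half-period alias at 25 cadences scoring lower because its comb includes slots with no spikes.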

ARIMAX Model
After the best TCF periodogram peak is found for a light curve, the transit parameters are used to fit a new ARIMA model to the light curve in order to jointly fit the transit and the autocorrelated noise in the light curve. In the parlance of ARMA modeling, this is an ARIMAX model where 'X' refers to 'exogenous' variables (Hyndman & Athanasopoulos 2021). A simple box transit mask is built using the best transit period, phase, and duration. The depth of the box transit mask is left as a free parameter of the exogenous variable. Therefore, when the ARIMAX model uses maximum likelihood estimation to jointly model the transit depth and the autocorrelated noise of the light curve, it also yields a transit depth estimate with a confidence interval (error value). Further details about the ARIMAX modeling are provided by Caceres et al. (2019a) and Caceres et al. (2019b).
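A box transit mask of the kind described above can be constructed as an indicator series. The sketch below is a minimal illustration with hypothetical names and time units; it assumes the transit phase is expressed as the time of first ingress (`epoch`). In an ARIMAX fit, a series like this would be supplied as the exogenous regressor (for example through the `xreg` argument of R's `arima` or `auto.arima`), and its fitted coefficient would play the role of the transit depth.

```python
def box_transit_mask(times, period, epoch, duration):
    """Return 1.0 for cadences inside a transit window and 0.0 elsewhere.

    `epoch` is assumed to be the time of the first ingress.
    """
    mask = []
    for t in times:
        phase = (t - epoch) % period  # time elapsed since the latest ingress
        mask.append(1.0 if phase < duration else 0.0)
    return mask


# Hypothetical example: 24 cadences spaced by 0.5 time units.
times = [0.5 * i for i in range(24)]
mask = box_transit_mask(times, period=4.0, epoch=1.0, duration=1.0)

in_transit = [t for t, m in zip(times, mask) if m == 1.0]
print(in_transit)  # [1.0, 1.5, 5.0, 5.5, 9.0, 9.5]
```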
We found that the ARIMAX depth tended to underestimate the depth of the transit, necessitating astrophysical transit models to be fit to the candidates. Thus, like the bias produced by ARIMA modeling (§4), this has to be corrected in a later stage of the DTARPS analysis so that reliable estimates of the planet radius can be obtained (Paper II).

Random Forest Classification and Vetting
While a prominent peak in the periodogram is a necessary indicator that a transit-like periodicity is present in a light curve, this alone is not a sufficient criterion. Caceres et al. (2019a)  The ARPS method utilizes a Random Forest (RF) classifier to identify the most promising exoplanet candidates from the full data set with a high recall rate, and human vetting to further refine the candidate sample. We carefully test different training sets, features, and classifier settings to optimize its performance.
RF machine learning classifiers were developed by Breiman (2001) as an extension of his earlier Classification and Regression Tree (CART) procedure that grows a decision tree based on the problem's training set in a well-defined iterative procedure. Each node produces two daughter nodes based on a break in a single data feature that best separates the classes according to some cost function. To reduce overfitting, the tree is pruned to a predetermined level. The main drawbacks of CART are a tendency to overfit the training data and to use only a few of the possible features in classification. RF overcomes the disadvantages of CART classification with a 'bagging' strategy that uses multiple CART trees with randomized data subsets and feature subsets at each branching node. Whereas a single decision tree produces a 'hard' prediction for each object in the test set, a RF gives 'soft' probabilistic predictions arising from votes of many trees. The RF prediction value is an uncalibrated pseudo-probability; higher predictive scores point to stronger possible exoplanet transiting candidates.
RFs are extremely versatile classifiers that can combine data of different types (integers, categories, floating point numbers), units, and scales. RFs have been shown to be robust to imbalanced training set problems, performing well on training sets whose positive class comprises only 2% of the entire training set (Chen et al. 2004), and can handle small fractions of mislabeled data in the training set (Mellor et al. 2015). The contribution of each feature to the classifier allows for RFs to be partially interpretable.
In transit searches, vetting the results of a machine learning classification is necessary to remove lingering False Alarms and False Positives in samples that pass the RF classifier. The vetting procedure employed in DTARPS, described in Paper II, is a mixture of multifaceted automated vetting tests and subjective vetting by humans. Due to the rarity of transiting planets in a random sample of stars, the RF classifier and the subsequent vetting procedures must strive to reduce the number of False Positives in the final catalog. For a RF classifier with a False Positive Rate of 1% as measured with a validation set, the number of expected False Positives in a sample of a million light curves would overwhelm the planet candidate sample, even if the classifier were to identify every true planet.
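The arithmetic behind this statement is short. Treating essentially all of the N light curves as true negatives (an approximation that holds because transiting planets are rare), the expected number of False Positives for the illustrative 1% rate is:

```latex
N_{\mathrm{FP}} \approx \mathrm{FPR} \times N = 0.01 \times 10^{6} = 10^{4}
```

Ten thousand spurious detections exceeds the few thousand planets (∼3100 to ∼4000) that the pre-launch calculations cited in the Introduction predicted for the full prime mission FFI sample of roughly 6 million stars.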

LIGHT CURVE INPUT FROM THE DIAMANTE PROJECT
DIAmante is a pipeline for extracting and analyzing light curves from the TESS FFIs, applied to TESS Year 1 and Year 2 data by M20 and Montalto (2023), respectively. M20 used the DIAmante extracted light curves to search for exoplanet transits and identified 396 exoplanet candidates. We applied the DTARPS method to the M20 light curves extracted and preprocessed with the DIAmante pipeline.
M20 defines a sample of 976,814 dwarfs and subgiants with spectral types F5 to M falling in the footprint of TESS sectors 1−13 surveyed during Year 1 with identifications in the TESS Input Catalog (version 8, Stassun et al. 2019). FGK stars were restricted to V < 13 magnitude while M stars were restricted to V ≤ 16 magnitude and distance D < 600 pc. The sample is further limited to dwarf and sub-giant stars with log g ≥ 3.
The DIAmante extraction was applied to the calibrated FFIs provided by the Science Processing Operations Center (Jenkins et al. 2016; Tenenbaum & Jenkins 2018). These calibrated FFIs are reduced CCD images that have already been processed with TESS instrument-specific corrections. The DIAmante extraction pipeline is based on Difference Image Analysis (Alard & Lupton 1998), which reduces the impact of contaminants on the target photometry through the subtraction of a reference image convolved with a kernel to separate the target and the background flux. Because TESS FFIs are known to have erratic background variations that depend on the boresight angle between the camera, Sun, and Moon, a flux-conserving delta basis kernel was utilized to create a differential background model using a 20 pixel box smoothing region. After calibration to 250 standard stars in the reference images for each CCD, photometry was extracted from a circular aperture with a radius of 1 pixel for stars with V > 11 and 2 pixels for stars with V ≤ 11.
The DIAmante light curves from each CCD for each camera and sector were processed with cotrending to remove systematic variations from the instrument. Principal Component Analysis was applied to the most highly correlated light curves to extract top eigenvectors to cotrend the light curves. After cotrending, individual stellar variations are further detrended with an 8 hour median filter and a B-spline interpolation. The final DIAmante light curves are the averaged values of the B-splines evaluated at each observation time.
Figure 2 shows the raw TESS extracted light curve for three stars in the DIAmante data set prior to any preprocessing. These examples were extracted using the Python Lightkurve package (Lightkurve Collaboration et al. 2018). DTARPS identified a new planetary candidate around each of these three examples that had not been identified to date (Paper II). Two of the light curves shown (top and middle panels) are the length of one TESS sector, as are most of the light curves from the DIAmante data set. The bottom light curve panel is a source observed in two sectors because it lay in the overlap area between sectors.
In Paper II, these three stars are found to have candidate planetary transits. TIC 284160132 (DTARPS 313) is an early-G star with V = 12.2 mag at a distance of 384 pc. DTARPS identified periodic dips consistent with a gas giant with an orbital period of 2.96217 days. TIC 316852947 (DTARPS 356) is a mid-F star with V = 11.6 mag at a distance of 341 pc. DTARPS identified a transit consistent with a Neptune-sized object with a period of 1.88337 days. TIC 11232328 (in the DTARPS Galactic Plane list) is a fainter, late-F star with V = 12.9 mag at a distance of 714 pc. DTARPS identified a transit consistent with a Saturn-size planet with an orbital period of 3.17371 days.
The right-hand panels in Figure 2 show strong autocorrelation present in the light curves between the 30 minute time steps for the TESS FFIs. The ARIMA fitting in the ARPS procedure is designed specifically to remove autocorrelation in light curves; the presence of autocorrelation is measured with the Ljung-Box test (Ljung & Box 1978). The p-value from the Ljung-Box test is included in subsequent figures to indicate the presence or absence of autocorrelation in the light curves. A small p-value indicates that there is significant autocorrelation present in the light curve, and a p-value > 0.01 means that the light curve is consistent with white noise without autocorrelation.
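As an illustration of what the Ljung-Box test measures, the sketch below computes the standard test statistic, Q = n(n+2) Σ r_k² / (n−k) summed over lags k = 1..h, for two synthetic series. It is a minimal stand-alone example, not the code used in this work; the number of lags, the random seed, and the test series are arbitrary assumptions. Under the white-noise null hypothesis Q approximately follows a chi-squared distribution with h degrees of freedom (reduced by the number of fitted ARMA coefficients when applied to model residuals), and the p-value is the upper tail probability of that distribution.

```python
import math
import random


def ljung_box_q(x, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n-k)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation at lag k
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


random.seed(1)  # arbitrary seed, for repeatability only
white = [random.gauss(0.0, 1.0) for _ in range(500)]   # uncorrelated noise
smooth = [math.sin(0.05 * t) for t in range(500)]      # strongly autocorrelated

q_white = ljung_box_q(white, 10)
q_smooth = ljung_box_q(smooth, 10)
print(q_white, q_smooth)
```

The smooth series yields a statistic far larger than the white-noise series, which corresponds to a very small p-value and hence a clear detection of autocorrelation.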

Additional Ramping and Outlier Removal
There is a well-known issue in TESS of erroneous flux variations lasting a few hours introduced to the FFI light curves near the beginning of a sector, the end of a sector, and the beginning and end of the mid-sector gap in the light curve for the data download (Ricker et al. 2015). Due to spacecraft jitter before the pointing settles, the target star can land on CCD areas with different quantum efficiency, or a neighboring star can enter or leave the field, changing dilution levels. The DIAmante pipeline removes most, but not all, of these trends and jitter effects. There is a weak ramp-up effect seen in the top and middle panels of Figure 2 after the mid-sector gap in the light curves, as well as in the second sector of the bottom panel of Figure 2 after the second gap. The bottom panel light curve in Figure 2 has two strong ramps during the second sector.
We therefore add a preprocessing step to remove remaining ramps around data gaps, flares, and other outliers that may be in the data. The clipping routine is characterized by the outlier threshold and the gap threshold. The outlier threshold defines the maximal distance between the median value of the light curve and a data point; we set the threshold at 5 times the standard deviation of the light curve. The gap threshold defines how large a gap of missing data can be, in time steps, before the clipping routine will examine the points on either side of the gap for evidence of ramping. We set the gap threshold to be 50 time steps, or 25 hours. With this gap threshold, the clipping routine removes erroneous ramping points from the beginning and end of the light curve, and points leading up to and away from a large gap in the light curve. After removal of a data point, the clipping routine is recursively applied to the modified light curve until no more points are removed as outliers.
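The two thresholds can be illustrated with a short sketch. This is not the DTARPS clipping routine: the text does not specify how ramp points near a gap are judged, so the code below only shows the repeated 5σ clipping about the median and the location of gaps exceeding 50 time steps (50 × 30 minutes = 25 hours). Function names and the toy data are hypothetical, and treating the gap threshold as a strict inequality is an assumption.

```python
import statistics


def iterative_clip(flux, n_sigma=5.0):
    """Repeatedly drop points farther than n_sigma standard deviations
    from the median until no further points are removed.
    Returns the indices of the surviving points."""
    kept = list(range(len(flux)))
    while True:
        values = [flux[i] for i in kept]
        med = statistics.median(values)
        sd = statistics.pstdev(values)
        survivors = [i for i in kept if abs(flux[i] - med) <= n_sigma * sd]
        if len(survivors) == len(kept):
            return kept
        kept = survivors


def gap_edges(cadence, gap_steps=50):
    """Return (index before gap, index after gap) for each gap longer than
    gap_steps cadences; these are where ramps would be examined."""
    return [(i, i + 1) for i in range(len(cadence) - 1)
            if cadence[i + 1] - cadence[i] > gap_steps]


# Toy light curve: small alternating scatter plus one flare-like outlier.
flux = [1.0 + 0.001 * (-1) ** i for i in range(100)]
flux[40] = 1.5
kept = iterative_clip(flux)

# Toy cadence numbers with a 96-step gap between the 5th and 6th points.
cadence = list(range(0, 5)) + list(range(100, 105))
print(len(kept), gap_edges(cadence))  # 99 [(4, 5)]
```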
Figure 3 shows the light curves of Figure 2 after DIAmante detrending and our ramp and outlier removal procedure. The procedure removed most of the ramping in TIC 11232328 left by the DIAmante extraction pipeline. A weak residual ramping effect around Day 1531 remains (Figure 3). However, such brief and weak effects will have little effect on our transit search algorithm (§2.3). None of these three transiting objects has been identified previously in the literature, despite each of them having strong transit signals in the DIAmante extracted light curves.

ARIMA MODELING AND PERIODOGRAM ANALYSIS
The right panels of Figure 3 illustrate that short-memory autocorrelation is often still present in the preprocessed light curves, even though the noise level is often reduced below 0.1%, where large planet transits can be seen with the unaided eye. This autocorrelated behavior can increase noise in BLS periodograms and thus reduce sensitivity to weaker planetary transits (Gondhalekar et al., in preparation). Some of the autocorrelated variation may be due to planetary transits, but typically it is intrinsic to the star or arises from uncorrected instrumental effects.
When the ARIMA model is applied to the light curve, we apply the differencing operation (equation 1) first and then obtain fits from the ARMA model (equations 2-3). This guarantees that any real transits will be changed into the double-spike pattern and thereby be detectable with TCF. Applying the ARMA model to the differenced light curve removes the autocorrelation introduced by differencing (Figure 4). In two of three cases, the Ljung-Box test shows the ARIMA residuals are consistent with white noise. Caceres et al. (2019a) found that the differencing step greatly reduced the IQR of light curves with intrinsic stellar variations, but slightly increased the IQR for other light curves. In our case, the DIAmante pipeline removes most of the light curve trends before we receive them. Therefore, it is not surprising that the IQR of the light curve does not change much over the course of the ARIMA processing.
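The conversion of a box-shaped transit into a double spike can be seen with a few lines of Python. This is only an illustration of first-order differencing on a noise-free toy light curve; the flux values and transit location are invented for the example.

```python
def difference(series):
    """First-order differencing: x[t+1] - x[t]."""
    return [b - a for a, b in zip(series, series[1:])]

# Noise-free toy light curve with a 1% deep box transit at cadences 8..11.
flux = [1.0] * 20
for i in range(8, 12):
    flux[i] = 0.99

diff = difference(flux)
spikes = [i for i, x in enumerate(diff) if x != 0.0]
# The only nonzero values are a downward spike at ingress and an
# upward spike at egress; the flat transit floor differences to zero.
```

The transit floor and the out-of-transit baseline both difference to zero, which is why TCF searches for a periodic pair of opposite-signed spikes rather than a box.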
We implement the ARIMA model using the auto.arima function from the forecast package (Hyndman & Athanasopoulos 2021) in the statistical computing language R (R Core Team 2020). Figure 4 shows the residuals after the best fit ARMA model has been subtracted from the differenced light curve for the three example light curves in Figure 2. The p and q values for the best fit ARIMA models are given in each of the three panels, labeled "ARIMA(p, 1, q)". The autocorrelation function of the residuals is often consistent with Gaussian white noise, with associated Ljung-Box test p-value > 0.01. The improvement in Ljung-Box probabilities for the full DIAmante sample is shown in Figure 5: 46% of the light curves from the DIAmante pipeline have significant autocorrelation, while only 4% have autocorrelation after ARIMA modeling.
The residual light curves in Figure 4 now exhibit periodic double-spike patterns characteristic of a transiting planet but without stellar or instrumental autocorrelated behavior. In most cases, the DIAmante preprocessing, outlier removal, and ARIMA modeling together successfully remove structure present in TESS light curves except for transits or other brief non-stochastic behaviors such as stellar flares.
The TCF algorithm is described by Caceres et al. (2019a). It is coded in Fortran for computational efficiency; the code is available at the Astrophysics Source Code Library (Caceres & Feigelson 2022). The TCF algorithm finds the optimal phase, duration, and depth for each period passed to the algorithm. For DTARPS, trial periods in its matched filter algorithm were restricted to periods between 0.2 and 30 days. The lower limit of 0.2 days was chosen to facilitate the search for extreme ultra short period exoplanets; the shortest reported period for a confirmed planet in the NASA Exoplanet Archive (as of March 15, 2022) is that of K2-137 b, with a period of 0.179 days (Smith et al. 2018). The upper limit of 30 days accommodates the common 27-day single-sector TESS light curve. The 354,982 periods passed to the TCF search algorithm were chosen to be evenly distributed in log-space. The durations looped over for each period were limited to a range from a minimum of 15% of the period to a maximum of 25 hours or 50 time steps (Caceres et al. 2019a).
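A log-uniform grid with the limits and size quoted above can be generated as follows. This is a sketch of what "evenly distributed in log-space" means, not the grid-building code of the Fortran TCF implementation; the function name is hypothetical.

```python
import math

def log_period_grid(p_min=0.2, p_max=30.0, n=354_982):
    """Trial periods (days) with constant spacing in log(period)."""
    log_min, log_max = math.log(p_min), math.log(p_max)
    step = (log_max - log_min) / (n - 1)
    return [math.exp(log_min + i * step) for i in range(n)]

periods = log_period_grid()
# Adjacent trial periods differ by a constant ratio rather than a
# constant increment, so short periods are sampled more densely in days.
ratio = periods[1] / periods[0]
```

A constant ratio between neighboring periods keeps the fractional period resolution the same at 0.2 days as at 30 days.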
The top panels of Figure 6 show the resulting TCF periodograms for the three stars in Figure 2. The LOESS curve used to remove any trend in the periodogram noise is plotted in red. The plus symbol denotes the periodogram peak with highest signal-to-noise ratio (SNR), using a window of 10,000 periodogram values on either side of the peak to estimate the noise level. Note this is the SNR of the periodogram peak, not the SNR of the transit depth. The highest SNR of a TCF peak seen in stars without planetary transit signals is typically between 9 and 13, much lower than the peak SNRs between 43 and 75 seen in Figure 6. The potential DTARPS Candidates identified by the RF classifier (§8) have peak SNRs typically between 17 and 55, and the peak SNRs for our final DTARPS Candidates (Paper II) are between 32 and 71.
The bottom panels in Figure 6 show the original DIAmante light curve and the ARIMA residual fluxes folded modulo the TCF peak period. The phase is adjusted so the transit is centered at phase 0.5; ordinate scales may differ. The phase-folded ARIMA residuals show the double-spike shape into which the transit was transformed by the differencing step of the ARPS processing. The TCF algorithm is applied to the ARIMA residual light curve, but the transit is more intuitively identified by human vetters as a box shape in the bottom left panel showing the folded DIAmante light curve.
The ARIMAX model is run using the auto.arima function from the R CRAN package forecast (Hyndman & Athanasopoulos 2021). A transit again appears as a box shape in the ARIMAX residuals, now with autocorrelated structure in the original light curve removed. We often found that the ARIMAX model underestimated depths for the TESS DIAmante light curves, probably because of the sparsity of photometric data points in the transit and/or incorporation of some transit signal into the ARIMAX model. Despite this deficiency, the ARIMAX transit depth SNR is a prominent feature in the RF classifier (§7).

Kepler-Based Planet Injections
The injected transit signals were drawn from the Kepler 4-year mission exoplanets, which can be considered an unbiased sample of the true planetary occurrence rate for the shorter-period, larger-radius exoplanets that TESS might identify during its prime mission. The Kepler exoplanet sample was acquired from the NASA Exoplanet Archive (accessed March 14, 2021) with periods constrained to P < 13.5 days to allow at least two transits during a 27-day TESS sector exposure. Following the finding of Caceres et al. (2019a) that roughly half of the 'confirmed' Kepler Objects of Interest (KOIs) with low Kepler Model.SNR were not recovered with ARPS analysis, we removed KOIs with Kepler Model.SNR < 20. Of the 2,356 Kepler confirmed planets, 949 are suitable for injections.
The left panels of Figure 7 show the distribution of transit parameters (period, duration, and depth) for these 949 confirmed Kepler planets. Most of the transit signal depths from the Kepler sample are below 1,000 parts per million (ppm), corresponding to a planetary radius of 3.5 R⊕ for a Sun-like star. Only 62 gas giant planets with planetary radii > 8 R⊕ are among the 949 Kepler planet sample.
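The correspondence between a 1,000 ppm depth and a radius of about 3.5 R⊕ follows from the usual approximation that the fractional depth equals the squared planet-to-star radius ratio. A short check, assuming nominal solar and equatorial Earth radii and ignoring limb darkening and dilution:

```python
import math

R_SUN_KM = 695_700.0     # nominal solar radius
R_EARTH_KM = 6_378.1     # nominal equatorial Earth radius

def planet_radius_earth(depth_ppm, stellar_radius_sun=1.0):
    """Planet radius in Earth radii from transit depth: depth = (Rp/Rs)^2."""
    ratio = math.sqrt(depth_ppm * 1e-6)
    return ratio * stellar_radius_sun * R_SUN_KM / R_EARTH_KM

def depth_ppm(planet_radius_earth_units, stellar_radius_sun=1.0):
    """Inverse relation: transit depth in ppm for a given planet radius."""
    ratio = planet_radius_earth_units * R_EARTH_KM / (stellar_radius_sun * R_SUN_KM)
    return ratio ** 2 * 1e6

r_1000ppm = planet_radius_earth(1000.0)   # close to 3.45 Earth radii
d_giant = depth_ppm(8.0)                  # depth of an 8 Earth-radius planet
```

By the same relation, the 8 R⊕ gas giant boundary corresponds to a depth of a little over 5,000 ppm for a Sun-like star.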
This sample is too small for viable training of a high-dimensional RF classifier for an imbalanced classification problem. We therefore augment the sample of 949 Kepler planets with synthetic exoplanets sharing the same distribution of transit parameters. Synthetic exoplanets are injected into thousands of DIAmante light curves to achieve a diversity of light curve durations, cadences, and noise characteristics. The synthetic planet characteristics are derived using the Adaptive Neighbor Synthetic Minority Oversampling Technique, ANS-SMOTE, described in Siriseriwan & Sinapiromsaran (2017). This is an improved modification of the widely used SMOTE algorithm (Chawla et al. 2002). SMOTE selects a random instance in the minority class and finds its k nearest minority class neighbors in feature space, where k = 5 is commonly chosen; a synthetic instance is then placed at a random point along the line segment joining the instance to one of these neighbors. ANS-SMOTE removes the need to select k, finding optimal values based on local densities in the feature space. We used ANS-SMOTE to create a sample of 10,850 synthetic exoplanets with periods, durations, and transit depths shown in the right panels of Figure 7. Like the Kepler planet sample, ∼6% of the synthetic exoplanet transits had depths consistent with gas giant planets. Code implementation is in CRAN package smotefamily within the R statistical software environment (Siriseriwan 2019a).
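The interpolation step at the heart of SMOTE can be sketched in a few lines. This is plain SMOTE with a fixed k, not the adaptive-neighbor variant used for DTARPS (which the authors ran through the R package smotefamily); the feature vectors below are invented (log period, log duration, log depth) triples used only for illustration.

```python
import math
import random

def smote_sample(minority, k=5, rng=random):
    """Draw one synthetic point between a minority instance and a near neighbor."""
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
    mate = rng.choice(neighbors)
    u = rng.random()                      # position along the joining segment
    return tuple(b + u * (m - b) for b, m in zip(base, mate))

# Hypothetical minority-class feature vectors (values are illustrative only).
rng = random.Random(42)
planets = [(rng.uniform(-0.2, 1.1), rng.uniform(-1.5, -0.5), rng.uniform(-4.0, -2.0))
           for _ in range(30)]
synthetic = [smote_sample(planets, k=5, rng=rng) for _ in range(100)]
```

Because each synthetic point is a convex combination of two real points, the augmented sample cannot extend beyond the range of the original sample in any feature, which is consistent with the synthetic planets sharing the distribution of the Kepler planets.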
A well-sampled exoplanet transit signal will also have an ingress and egress duration as a transit parameter. We assume the ingress/egress time for the exoplanets ranges from ∼5 minutes for smaller exoplanets (Earths to sub-Neptunes) up to ∼30 minutes for hot Jupiter exoplanets. Our injection model is a trapezoidal shape with straight-line ingress, duration, and egress. Given that the TESS FFI cadence is 30 minutes, the ingress and egress are effectively instantaneous in most cases.
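A trapezoidal transit profile of the kind described above might be written as follows. The parameterization is an assumption made for this sketch: `duration` is taken as the full first-to-last-contact time and `ingress` as the (equal) ingress and egress time, all in the same time unit; the text does not specify these conventions.

```python
def trapezoid_transit(t, t_mid, depth, duration, ingress):
    """Fractional flux deficit at time t for a symmetric trapezoidal transit.

    Assumes ingress <= duration / 2; ingress = 0 gives a pure box.
    """
    dt = abs(t - t_mid)
    half = duration / 2.0
    if dt >= half:                          # outside the transit
        return 0.0
    if ingress <= 0 or dt <= half - ingress:
        return depth                        # flat bottom
    return depth * (half - dt) / ingress    # linear ingress or egress

# Example: 2 hour transit with 0.5 hour ingress, 1000 ppm deep (times in hours).
profile = [trapezoid_transit(0.25 * i, 5.0, 1e-3, 2.0, 0.5) for i in range(41)]
```

With a 5 to 30 minute ingress and a 30 minute cadence, at most one or two samples fall on the sloped segments, which is why the injected shape is nearly a box in practice.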
We injected the 10,850 planetary signals into 6,506 random light curves after DTARPS preprocessing but before DTARPS analysis. This procedure differs from some other studies that inject planetary transit signals into the pixel data (e.g., Christiansen et al. 2020). The light curves that received planetary injections were selected randomly from the sample of DIAmante stars with the following filter to avoid subgiants: R < 1.45 R☉ for $T_{eff}$ < 5900 K; R < 1.55 R☉ for 5900 ≤ $T_{eff}$ < 6200 K; R < 1.65 R☉ for 6200 ≤ $T_{eff}$ < 6500 K; and R < 1.7 R☉ for $T_{eff}$ ≥ 6500 K.
Figure 8 illustrates the injection process for a planetary injection. The top panel shows the DIAmante extracted TESS FFI light curve for TIC 398441407, a V=12.1 G2V star with radius 1.1 R☉. The middle panel shows the injected transit characterized by the transit period, depth, and duration. The phase of each transit was chosen randomly. Sometimes the depth of the injected signals may appear to vary due to the transit jittering with respect to the 30 minute cadence.
After the injected light curves had been modeled with a best-fit ARIMA model and analyzed by TCF, the resulting TCF periodograms were examined by human vetters to identify injected transit signals that had been successfully recovered by TCF. The injected transit signal was only considered recovered if the peak TCF period matched the injected orbital period (or an integer ratio of the injected orbital period) within 1% and a transit dip in the light curve was visible. Smaller injected planets were more likely to be rejected by human vetting due to the lack of a visibly discernible transit. Of the 10,850 injections, 1,327 synthetic injections were recovered for use as the positive training set for the RF classifier.
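The period-matching half of the recovery criterion can be expressed as a small function. The text does not say which integer ratios were accepted, so this sketch assumes integer multiples and submultiples of the injected period up to a hypothetical limit `max_k`; the visual check for a transit dip was done by human vetters and is not modeled.

```python
def period_recovered(p_tcf, p_injected, tol=0.01, max_k=4):
    """True if p_tcf matches k*P or P/k (k = 1..max_k) within fractional tol."""
    for k in range(1, max_k + 1):
        for target in (p_injected * k, p_injected / k):
            if abs(p_tcf - target) <= tol * target:
                return True
    return False

# Examples for a planet injected at a 3.0 day period.
match_exact = period_recovered(3.02, 3.0)    # within 1% of P
match_double = period_recovered(6.03, 3.0)   # within 1% of 2P
match_half = period_recovered(1.49, 3.0)     # within 1% of P/2
no_match = period_recovered(3.5, 3.0)        # not near any accepted ratio
```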

Negative Training Set
Identifying transiting exoplanets is a highly imbalanced classification problem, as most planets have unsuitably inclined orbits or are too small for transit detection. However, the RF technique is well-adapted to this situation; Chen et al. (2004) showed that RF classifiers can perform well with a positive training set that is as small as 2% of the entire training set. We chose a 20:1 ratio of negative to positive training set sizes, with 26,953 light curves without injected planetary signals compared to 1,327 light curves with planets.
The negative training set of the RF classifier should be made up of light curves with no transiting exoplanet signals. However, it is infeasible to manually vet and remove transits from this large negative training set. From the Kepler survey, Howard et al. (2012) found that the expected planet occurrence rate of exoplanets with radii 2-32 R⊕ and periods less than 10 days for GK dwarfs is 0.034 ± 0.003. Since most of these have inclined orbits without transits, an unvetted random sample of TESS light curves thus suffers negligible contamination from transiting exoplanets. In any case, RF classifiers have been shown to perform well when a small fraction of their training set has been mislabeled (Mellor et al. 2015).
In order to push the classifier away from labeling EB transit signals as exoplanet transit signals, we supplemented the negative training set with the injected FP light curves used in M20. The injected FP light curves are made up of injected EB transit signals corresponding to secondaries with radii larger than 2.5 $R_J$ in circular orbits. The injected FP signals were not vetted to see if the FP signal was recovered after the ARIMA processing and the TCF analysis because the classification and characterization of FPs is not the goal of the classifier. We used 11,342 injected FP light curves and 15,611 random light curves in our negative training set.

Training and Validation Sets
The full set of labeled objects used for training is split into a training set for the RF classifier and a validation set to measure the performance of the RF classifier on a set of labeled data. We reserved 20% for the validation set, randomly chosen. The output of a RF applied to a new data point is a prediction value between 0 and 1; this is not a calibrated probability value, but can be considered a pseudo-probability that the input to the RF belongs to the positive class. After the classifier is applied to a validation set, we must set a classifier threshold to convert this 'soft' classification pseudo-probability to a 'hard' classification to produce a confusion matrix.
There is no rule governing the choice of the threshold for a classifier other than post facto performance metrics like the Matthews Correlation Coefficient (MCC), Youden's J Index, and adjusted F-score (Powers 2011). These are defined in terms of the confusion matrix counts; MCC and Youden's J take the standard forms
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}, \qquad J = \frac{TP}{TP+FN} - \frac{FP}{FP+TN},$$
where TP is the number of true positives (exoplanet injections above the RF threshold), TN is the number of true negatives (negative validation set objects below the RF threshold), FP is the number of False Positives (negative validation set objects above the threshold), and FN is the number of false negatives (exoplanet injections below the RF threshold). MCC is the correlation coefficient between the true labels and the predicted labels for the validation set. It can have values between -1 and 1, with 1 corresponding to a perfect classifier, 0 indicating a random classifier, and -1 corresponding to the worst possible classifier. The adjusted F-score, ranging from 0 to 1, is an improvement to the normal F-score that balances classifier recall and precision for imbalanced classes. It gives a higher weight to the correctly classified positive instances in the test data set and a stronger weight against FPs than the traditional F-score. In Youden's J, also ranging from 0 to 1, the first term is the classifier recall rate or True Positive Rate (TPR) and the second term is the False Positive Rate (FPR). When evaluating a trial classifier, the threshold corresponding to the maximum Youden's J index was used, although all three metrics were used to select the final RF classifier threshold.
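As an illustration of how a threshold turns soft RF predictions into a confusion matrix and then into summary metrics, the sketch below computes MCC and Youden's J using their standard textbook definitions. The counts in the example are invented, and the adjusted F-score is omitted because its formula is not recoverable from the text.

```python
import math

def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN for soft scores thresholded into hard labels."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score > threshold
        if label == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    return tp, fp, tn, fn

def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient (standard definition)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def youden_j(tp, fp, tn, fn):
    """Youden's J = TPR - FPR."""
    return tp / (tp + fn) - fp / (fp + tn)

# Invented example counts for an imbalanced validation set.
example_mcc = mcc(tp=90, fp=5, tn=895, fn=10)
example_j = youden_j(tp=90, fp=5, tn=895, fn=10)
counts = confusion_counts([0.9, 0.4, 0.2, 0.05], [1, 0, 1, 0], threshold=0.300)
```

Scanning `threshold` over a grid and recording `youden_j` at each value gives the maximum-J threshold referred to in the text.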
The Receiver Operating Characteristic (ROC) plots the TPR as a function of the FPR for every possible threshold value of the classifier. We used CRAN package ROCR (Sing et al. 2005) implemented with the R software (R Core Team 2020) to calculate the ROCs. The Area Under Curve (AUC) for the ROC is a measure of classifier performance that does not depend on a single threshold choice. An AUC of the ROC of 1 indicates a perfect classifier, 0.5 indicates a classifier that performs no better than assigning random labels, and 0 indicates the worst possible classifier. Closely related to the ROC, the Precision-Recall curve is used for imbalanced classification problems because both precision and recall focus on the correct classification of the positive class. The AUC for the Precision-Recall curve is another measure of classifier performance that does not depend on a threshold choice.
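The threshold-free character of the ROC AUC is easiest to see through its rank interpretation: it equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The brute-force sketch below uses that identity; it is an illustration, not the ROCR computation used by the authors, and the scores are invented.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise positive-vs-negative comparisons."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

auc_mixed = roc_auc([0.9, 0.8, 0.7, 0.2, 0.1], [1, 1, 0, 1, 0])    # 5 of 6 pairs ordered correctly
auc_perfect = roc_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
auc_worst = roc_auc([0.1, 0.2, 0.8, 0.9], [1, 1, 0, 0])
```

The pairwise loop scales as the product of the class sizes, so for a labeled set of the size used here a sort-based implementation would be preferable.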

Training RF Classifiers
In order to optimize the final RF classifier, we trained thousands of trial RF classifiers with different combinations of feature selection, feature weights, and algorithmic options to maximize the performance of the RF classifier on the validation set. The number of features to try at each node was left at the default value of 7, but instead of testing a set number of splits in the data for the node features, the optimal split for each node was found. Since the training set is highly imbalanced (§5.3), the balanced Random Forest option is used (Chen et al. 2004; Ishwaran & Kogalur 2022). Balanced RF compensates for an imbalanced training set by undersampling the majority class for each tree in the RF classifier so that each tree is grown using a balanced subsample of the full training set. The number of trees in the forest was varied from 500 to 1000. The RF analysis was performed using CRAN package randomForestSRC (Ishwaran & Kogalur 2022) implemented in the public domain R statistical software environment (R Core Team 2020). For statistical background, see O'Brien & Ishwaran (2019) and the randomForestSRC vignettes.
The feature selection and weights were the focus of our tuning parameters to build the best possible performing RF classifier. Over 100 features were gathered from every stage of the ARIMA and TCF analysis as in Caceres et al. (2019a). Features describing the light curve were extracted from the light curve, the differenced light curve, and the residuals of the light curve after the best fit ARIMA model had been subtracted. Features from the TCF analysis included the features from the top 100 peaks of the TCF periodogram as well as features from the peak with the greatest peak signal-to-noise ratio (SNR). Features were also created for the light curve folded according to the parameters from the best TCF periodogram peak. Stellar metadata from the TIC v8 (Stassun et al. 2019) and the Gaia DR2 (Gaia Collaboration et al. 2016, 2018) were gathered for each light curve. Finally, two features that were of high RF feature importance in M20 were calculated for all of the light curves.
Each trial RF classifier is evaluated with ROC and Precision-Recall curves. Trial classifiers with AUC > 0.9 for the ROC and > 0.85 for the Precision-Recall curve were kept for further consideration. The features from the top performing RF classifiers were then combined with different feature weights and a new batch of RF classifiers was grown. The optimization process was repeated to narrow down the feature and feature weight choices. Altogether approximately 20,000 classifiers were considered. In the final rounds of optimization, the number of trees for each RF classifier was raised to 1,000. All classifier trials are calculated from light curves after the removal of outliers and ramping problems (§3.1).
In the last round of optimization, we added 133 random candidates from M20 to the validation set and used the TPR of the M20 candidates to help make the final classifier decision. The final RF classifier had the highest AUC of the ROC and Precision-Recall curve along with the highest recall rate of M20 candidates at the threshold for the maximal Youden's J index.

Feature Selection
TCF periodogram properties: Caceres et al. (2019b) found that the collective top periodogram peak properties, not just the results from the strongest TCF periodogram peak, improved the performance of the classifier for Kepler light curves. We find the same behavior for the TESS classifier. The mean SNR of the top 100 TCF peaks is used to identify periodograms with many noisy peaks, suggesting that no strong periodicity was identified. The mean power of the top 100 TCF peaks calculated from the raw TCF output and from the LOESS regression line are used to pick out periodograms with just a single or a few strong peaks. A low mean power of the top 100 peaks but a strong power for the best peak indicates that the best exoplanet transit peak is probably significant and does not arise from periodogram noise.
Best TCF transit properties: The eleven properties based on the best TCF period have high feature weights. These include: three properties of the highest TCF periodogram peak; six measures of the time series folded modulo the best period; and the significance of the depth derived from the parametric ARIMAX model. The three properties of the strongest TCF peak are the period, TCF power, and SNR after subtracting the LOESS fit. Caceres et al. (2019b) refrained from including the transit period and corresponding planetary radius in their RF classifier to avoid biasing their candidate results. We include the transit period because it reduces the significance of spurious peaks with periods of 13-15 days due to the TESS satellite orbit (Figure 12).
The shape parameter of the folded DIAmante light curve compares the mean value of the transit with the median absolute deviation (MAD) value for the other mean values of the non-transit sections of the phase-folded light curve. This measures the transit's flux difference compared with the rest of the light curve, and provides a subtle distinction between planets and EBs.
The Anderson-Darling test is applied to the distribution of phases for the observations in the phase-folded light curve to test if the phases are consistent with an underlying uniform distribution. If not, then the TCF may have found a spurious periodic signal by aligning gaps in the light curve rather than identifying a true exoplanet transit. This is a crucial feature for identifying and reducing spurious periodogram peaks arising due to periodicities in cadence gaps.
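The Anderson-Darling statistic against a fully specified Uniform(0, 1) distribution has a closed form that is simple to evaluate on folded phases. The sketch below computes only the statistic A²; how DTARPS converts it into a feature (statistic or p-value) is not stated in the text, and the two phase samples are artificial.

```python
import math

def anderson_darling_uniform(phases, eps=1e-12):
    """A^2 statistic for phases in [0, 1) against the Uniform(0, 1) distribution."""
    u = sorted(min(max(p, eps), 1.0 - eps) for p in phases)  # guard log(0)
    n = len(u)
    total = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
                for i in range(1, n + 1))
    return -n - total / n

n = 200
# Phases covering the whole cycle evenly, as for a well-sampled fold.
even_phases = [(i + 0.5) / n for i in range(n)]
# Phases bunched into 20% of the cycle, as when a fold lines up data gaps.
bunched_phases = [0.4 + 0.2 * (i + 0.5) / n for i in range(n)]

a2_even = anderson_darling_uniform(even_phases)
a2_bunched = anderson_darling_uniform(bunched_phases)
```

Large values of A² indicate that the folded observations leave parts of the phase axis empty, the signature of a period that aligns cadence gaps.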
The t-test, designed to quantify the difference in means of two Gaussian distributions, is applied to the even and odd transit light curve flux values and to the in-transit and out-of-transit light curve flux values. A larger p-value is desired for the even-odd t-test to distinguish planet transits from EBs, while a smaller p-value is desired when comparing the in-transit and out-of-transit fluxes to show that the transit represents a statistically significant dip in flux. These tests rely on a transit mask to label points in the light curve that are in-transit and out-of-transit, as well as to label even and odd transits.
The SNR of the primary transit feature describes how well the transit signal with the period and phase from TCF adds up over the phase-folded light curve. It is described in both M20 and Kovács et al. (2002). The fractional transit duration is the ratio of the transit duration to the transit period (M20).
Inferred planet radius: The radius of an exoplanet is calculated from the depth of the transit from the best TCF peak and the stellar radius from the TIC. We include the planetary radius because the injected FP signals included in our negative training set (see §5.2) allowed us to train the RF classifier away from likely astrophysical FPs with very deep transit depths. Note however that this may reduce the DTARPS classifier sensitivity to very large, inflated gaseous exoplanets.
Figure 9 shows the feature importance plot associated with the final RF classifier, where input features are ordered by their importance to the classification. Feature importance is calculated by comparing the training set label accuracy from a perturbed out-of-bag (OOB) forest ensemble with the unperturbed OOB forest ensemble (Ishwaran & Kogalur 2022). For each feature, the label from the perturbed OOB forest is found by classifying each data case normally on the OOB trees in the forest for that data case, but whenever a node is encountered that is split using the feature for which the importance is being calculated, the opposite daughter node is used for classification. Therefore, the feature importance shows the improvement of the accuracy of the entire RF classifier when the correct classification path in the trees is used for a feature rather than the opposite classification path for that feature. The feature importance is calculated from the predictive success of the feature and often cannot be interpreted physically (Genuer et al. 2010).
The signal-to-noise ratio of the transit in the folded light curve is the most important feature, followed by the error on the ARIMAX fitted transit depth, the period, and the planet radius. Some features serve as positive discriminators of planet transits (such as snr.transit and arbox_deperr) while others serve to push away spurious effects (TCF period helps remove 13-15 day periodogram peaks, and planet_rad_tcf helps remove deep EB eclipses). Of the top five most important features, three are also among the most important features in the DIAmante classifier derived by M20 (snr.transit, planet_rad_tcf, and frac_dur).

THE DTARPS FINAL CLASSIFIER
Figure 10 shows the ROC and Precision-Recall curves for the final RF classifier. Note that the abscissa is logarithmically transformed to highlight small values of FPR needed for reliable transit discovery. The solid lines give the TPR and the FPR (or the TPR and the precision) for every possible threshold value between 0 and 1. The dashed lines give the recall rate for a random sample of 133 planet candidates from the M20 DIAmante study.
Our threshold choice of Random Forest probability $P_{RF}$ = 0.300 is shown with the larger green points. We chose this threshold to minimize the FPR as much as possible while maintaining high DIAmante survey recall and TPR. Three other threshold choices are shown for comparison; M20 chose a threshold that gave a FPR of 1%, which lies very close to the maximum Youden's J threshold. The final TPR and FPR values for our chosen threshold, and comparison thresholds, are listed in Table 2 along with classification metrics described in §6. The DTARPS final classifier maximizes the MCC metric but represents a compromise with respect to other metrics.
Further detail is given in Figure 11, showing the confusion matrix for the final DTARPS classifier based on our chosen threshold $P_{RF}$ = 0.300. The confusion matrix shows how well the predicted labels from the RF classifier line up with the actual labels of the data in the training and validation sets. For the training set, we used the out-of-bag (OOB) RF prediction value to determine the predicted label. Each tree in the RF classifier uses a bootstrapped sample of the training set for construction, called bagging. OOB prediction values are calculated using only decision trees in the RF ensemble that were not grown using that training data case (Breiman 2001).
For the data whose label is non-exoplanet, we separately counted the negative data sets of randomly selected light curves and injected FPs. Perhaps the most important result here is the extraordinary effectiveness of the final RF classifier with respect to injected False Positive EBs: only 2 of the 11,342 (0.02%) injected FPs used in the labeled data sets are falsely labeled as an exoplanet transit signal. Overall, the confusion matrix shows that the optimized Random Forest classifier attains a 92.8% True Positive recovery rate with 0.37% False Positive contamination with respect to injected exoplanet transits and simulated variable stars. The performance of the final optimized classifier is shown visually in Figure 12, where the classification results are plotted as a function of period from the best TCF peak for the entire labeled data set (§5.3) and 133 randomly selected candidates identified in M20. The RF prediction values for labeled data set objects in the training set are the OOB prediction values.
The plot of $P_{RF}$ against TCF best period in Figure 12 provides valuable insights into the classifier performance that are not revealed in the confusion matrix. The vast majority of injected planets are recovered with periods ∼0.7 to 11 days. The small number of False Positives (with respect to the threshold) do not have preferred periods.
A strong spike of negative label points lying below the $P_{RF}$ = 0.300 threshold is present at periods of 13-15 days. This arises from the TESS satellite 13.7 day lunar-synchronous orbital period with a large gap in the middle of the FFI light curve in each sector. This leads the Transit Comb Filter algorithm, in the absence of a strong transit signal, to fold the data in half to line up the gaps in the data. Other period search procedures applied to TESS light curves are similarly affected (e.g., M20; Chakraborty et al. 2020). Many of the trial RF classifiers were less successful than the final classifier in pushing down the $P_{RF}$ values for these spurious periodicities. However, in the final classifier, this spurious spike in Figure 12 has the indirect effect of causing a sharp drop in the RF prediction value of all objects with periods longer than ∼11 days. As a result, our classifier is insensitive to true exoplanet transits at longer periods. This might have been alleviated if our injected exoplanet training set extended to longer periods of ∼15-25 days.
In contrast, although the injected exoplanet periods do not go shorter than 0.625 days (because the injections were based on Kepler planets, which come from a transit search truncated below 0.5 days), the optimized RF classifier does not appear strongly biased against short period transit signals. This is seen in the recovery of several DIAmante candidates in the 0.2-0.6 day regime.

THE DTARPS ANALYSIS LIST
The final RF classifier was applied to a test set of 823,099 DIAmante light curves. This is the full DIAmante collection of TESS Year 1 light curves minus those with missing features (Table 1). Random Forest classifiers require full characterization of each object, and we did not attempt imputation of missing data. The classifier threshold of $P_{RF}$ = 0.300 was then applied. The result is that 7,377 DTARPS processed DIAmante light curves had a RF prediction value above the threshold. We call this the DTARPS Analysis List of TESS Year 1 stars. This DTARPS Analysis List represents 0.9% of input light curves, selected by uniform statistical procedures to have periodic transit-like features. It is roughly similar to the Threshold-Crossing Event (TCE) list produced by the TESS official pipeline as a step toward their TESS Object of Interest (TOI) list (Guerrero et al. 2021), although their processing steps are quite different from the DTARPS analysis.
A small portion of the DTARPS Analysis List is shown in Table 3 with the full list available in machine readable format from the electronic edition of this paper.
We emphasize that, while this DTARPS Analysis List of 7,377 TESS stars has captured many transiting planets, it is still dominated by False Alarm and False Positive objects. The False Positive Rate of 0.0037 estimated from the combined labeled training and validation sets (Table 2) predicts that at least ∼3,000 of the 7,377 objects are not valid transiting planets. A rigorous vetting process to remove as many falsely labeled objects as possible is therefore needed to give a smaller catalog with much higher reliability (§11.3). In the parlance of machine learning classification, the 7,377 stars represent the maximum recall of the DTARPS analysis but with low sensitivity. The DTARPS Candidates catalog in Paper II, produced after application of vetting procedures, represents the subset with high sensitivity.
Col 20: $P_{RF}$: Pseudo-probability of planet classification from Random Forest classifier.
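The estimate of roughly 3,000 spurious entries in the Analysis List is simple arithmetic on numbers quoted in the text: the labeled-set False Positive Rate multiplied by the number of light curves classified. The calculation assumes, as the text implicitly does, that nearly all classified light curves are true negatives and that the labeled-set FPR carries over to the full sample.

```python
n_classified = 823_099      # light curves passed to the final classifier
n_above_threshold = 7_377   # DTARPS Analysis List size
fpr = 0.0037                # False Positive Rate from the labeled sets

expected_false = fpr * n_classified                 # about 3,045 objects
list_fraction = n_above_threshold / n_classified    # about 0.9% of the input
false_share = expected_false / n_above_threshold    # about 41% of the list
```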

COMPLETENESS OF THE DTARPS CLASSIFIER
The ability of a classifier to recover positive cases in a training set is called the recall rate in statistical parlance, and is called the completeness in astronomical parlance. We use these terms interchangeably. An understanding of the completeness is needed to evaluate the merits of the DTARPS Analysis List derived in §8 and the smaller DTARPS Planet Candidate catalog produced in Paper II that is less complete but with higher quality with respect to False Positive rejection.

Dependence on Period and Radius
The completeness of the RF classifier for different bins in planetary period-radius space is measured using the full set of synthetic planetary injections based on the Kepler planet sample (§5.1). This analysis is based on the 7,751 of the 10,850 synthetic planetary injections that were processed by the RF classifier; the remaining objects were omitted due to missing features. Figure 13 shows the completeness for each period-radius bin, defined as the number of injected planets in the bin with an RF prediction value greater than the P_RF = 0.300 threshold divided by the total number of processed injected planets in the bin. The bins are distributed evenly in log-space for the injected planetary radius and the injected orbital period.
The heat map shows very poor completeness (<10%) for radii less than 2 R⊕ across all periods, and very poor completeness for periods less than 1 day and radii <5 R⊕. The classifier has low completeness (10%-25%) for planets with radii between 2 and 4 R⊕, and high completeness (70%-100%) for periods between 0.6 and 13 days and planetary radii between 8 and 30 R⊕. In the latter region, the classifier essentially captures the full exoplanet population. At a given planet radius, the DTARPS classifier achieves somewhat higher recall rates for periods around 2-4 days than around 7-13 days, producing a tilt in the heat map. The outlying bins of the distribution in Figure 13 are sparsely populated, so their recall rates are uncertain. For example, the pale yellow bins with periods longer than 10 days do not represent an inability of DTARPS to recover planets, as only a single injected object is present in those bins.
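The per-bin completeness described above can be sketched as follows. This is a minimal illustration, not the code used for Figure 13: the bin edges, the number of bins, and the names `log_bin_index` and `completeness_map` are assumptions made for the example, and each injection is represented by a hypothetical (period, radius, P_RF) tuple.

```python
import math
from collections import defaultdict


def log_bin_index(value, lo, hi, n_bins):
    """Index of the log-spaced bin containing value, or None if out of range."""
    if not (lo <= value <= hi):
        return None
    frac = (math.log10(value) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))
    # Clamp so that value == hi falls in the last bin.
    return min(int(frac * n_bins), n_bins - 1)


def completeness_map(injections, threshold=0.300,
                     period_range=(0.5, 13.5), radius_range=(0.5, 30.0),
                     n_period=4, n_radius=4):
    """Recovered fraction per (period bin, radius bin).

    injections: iterable of (period_days, radius_rearth, p_rf) tuples.
    The ranges and bin counts are illustrative assumptions.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [recovered, total]
    for period, radius, p_rf in injections:
        i = log_bin_index(period, *period_range, n_period)
        j = log_bin_index(radius, *radius_range, n_radius)
        if i is None or j is None:
            continue
        counts[(i, j)][1] += 1
        if p_rf > threshold:
            counts[(i, j)][0] += 1
    return {key: rec / tot for key, (rec, tot) in counts.items()}


# Toy injections: two large planets near 1 day, one small planet at 1 day.
toy = [(1.0, 10.0, 0.9), (1.1, 10.5, 0.1), (1.0, 1.0, 0.05)]
cmap = completeness_map(toy)
```

Bins with very few injections, such as those containing a single object, yield completeness values of exactly 0 or 1, which is why sparsely populated bins should be read with caution.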

Dependence on Transit Depth
The sensitivity of a transit survey can be examined in terms of the transit depth in the best-period folded light curve. A common measure of the ability to recover a synthetically injected transit signal is the 'effective signal-to-noise ratio' (SNR_eff) of a transit signal: the depth of the transit divided by the standard deviation of measurements in the transit (Kovács et al. 2002; Howard et al. 2012; Christiansen et al. 2013, 2016). Using features already calculated for use in the RF classifier, we calculate an SNR_eff for the injected transit signal and for the periodic signal associated with the best peak in the TCF periodogram, based on equation 1 in Howard et al. (2012). Here, the effective SNR of the transit is

$$\mathrm{SNR}_{\rm eff} = \frac{\delta}{\mathrm{IQR}} \sqrt{n_{\rm pts}\,\frac{T_{\rm dur}}{P}}$$

where δ is the fractional depth of transit, IQR is the InterQuartile Range of the light curve, n_pts is the number of points in the light curve, and T_dur/P is the fractional duty cycle of the transit. We substitute the IQR of the original light curve (for the injected signal) and the IQR of the ARIMA residuals (for the best TCF transit) instead of standard deviations. Both the IQR and the standard deviation of a distribution are measures of the spread of the distribution, but the IQR is more robust against non-Gaussianity. Figure 14 compares the SNR_eff for the injected planet signal and the SNR_eff for the strongest periodic signal in the ARIMA residuals from the TCF periodogram. This is shown for two subsamples: the injections that are successfully recovered by DTARPS processing (purple) and the injections that were rejected (gray). The orange dashed lines in Figure 14 show the approximate lower boundaries of SNR_eff for the recovered injected planet signals used in the positive training set. The boundaries were set at SNR_eff = 6 for both the injections and the ARIMA residuals.
Ninety-three percent of the rejected injected planetary signals lie to the left of or below the boundaries in Figure 14. The recovery rate of the injected planetary signals with effective SNRs above the SNR_eff = 6 boundaries is 71%, much larger than the overall recovery rate of the injected planetary signals (§5.1). We infer that the paucity of TESS observations (only ∼1,000 points in single-sector observations) hinders detection of small planets with short periods that have SNR_eff ≤ 6. Similar difficulties probably affect other TESS transit analysis systems: only 6% of TOI candidates have periods <1 day and only 5% have radii <2 R⊕.
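As a worked illustration of the effective SNR, the sketch below assumes the form SNR_eff = (δ / IQR) · sqrt(n_pts · T_dur / P) implied by the quantities defined in this section. The function names and the numerical inputs are hypothetical and chosen only to show the scale of the effect for a single-sector light curve of about 1,000 points.

```python
import math


def effective_snr(depth, iqr, n_pts, duty_cycle):
    """Effective SNR, assuming SNR_eff = (depth / IQR) * sqrt(n_pts * T_dur / P).

    depth      : fractional transit depth (delta)
    iqr        : interquartile range of the light curve or ARIMA residuals
    n_pts      : number of points in the light curve
    duty_cycle : fractional transit duty cycle, T_dur / P
    """
    return (depth / iqr) * math.sqrt(n_pts * duty_cycle)


def above_boundary(snr, boundary=6.0):
    """True if the signal lies above the SNR_eff = 6 boundary of Figure 14."""
    return snr > boundary


# Hypothetical single-sector light curve: 1000 points, 4% duty cycle.
deep = effective_snr(depth=0.005, iqr=0.002, n_pts=1000, duty_cycle=0.04)
shallow = effective_snr(depth=0.0005, iqr=0.002, n_pts=1000, duty_cycle=0.04)
```

With these illustrative numbers, the deeper transit lies well above the boundary while the shallower one falls below it, mirroring the difficulty of recovering small planets from roughly 1,000 points.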

Correct Identification of False Positives
In statistical parlance, the rate of correct identification of False Positives in the training set is called the specificity of a classifier. Probable False Positives, pulsating stars, rotating stars, and planetary candidates with no discernible corresponding radial velocity signal were all considered False Positives for this analysis. If an object was labeled as a planetary candidate by one study and a False Positive by another, then it was considered to be a False Positive. In this sample, the DTARPS RF classifier correctly identified 400 of the 513 (78%) False Positive signals as non-exoplanet candidates. The TCF obtained correct periods for 390 (76%) of the False Positives. Figure 15 shows the specificity of the RF classifier for different bins in transit signal radius-period space. DTARPS has excellent specificity for most periods and radii, even at the extreme values: 90% for periods <1 day, 97% for periods >10 days, and 93% for strong signals with associated radii >15 R⊕.
DTARPS thus effectively removes most known False Positives over a wide range of parameters. But in situations where False Positives overwhelm True Positives by a large factor, this level of specificity will give significant contamination of False Positives in lists of True Positives. In Paper II, we estimate that the DTARPS Candidates catalog has around 50% purity for transiting planets, with the remaining objects being incorrectly classified False Positives.
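The effect of class imbalance on purity can be illustrated with Bayes' rule. The sketch below uses the True Positive Rate (0.928) and False Positive Rate (0.0037) quoted for the labeled training set; the prevalence values are purely hypothetical choices for illustration, and the resulting purities are not estimates for the DTARPS lists.

```python
def precision_from_rates(tpr, fpr, prevalence):
    """Expected purity of a selected list from classifier rates.

    tpr        : True Positive Rate (recall) of the classifier
    fpr        : False Positive Rate (1 - specificity) of the classifier
    prevalence : fraction of input objects that are true positives
                 (a hypothetical input here, not a measured value)
    """
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Rates quoted in the text for the labeled training set.
TPR, FPR = 0.928, 0.0037

# Illustrative prevalences only: balanced classes versus 1 planet per 1000 stars.
balanced = precision_from_rates(TPR, FPR, prevalence=0.5)
imbalanced = precision_from_rates(TPR, FPR, prevalence=0.001)
```

Even with a False Positive Rate well below 1%, the purity falls steeply once true positives become rare, which is the situation described above.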

RELATIONSHIP TO OTHER SURVEYS
The main difficulty in making a scientifically useful comparison of the DTARPS classifier to previous studies is the likely incompleteness and erroneous (False Alarm or False Positive) exoplanet identifications in all surveys. The 'Confirmed Planets' list from the NASA Exoplanet Archive (NEA, accessed March 15, 2022) is likely to have the fewest errors, though its listings are culled from a heterogeneous collection of studies so it does not have a well-defined completeness. The TESS Objects of Interest (TOI) list, the community TOI (cTOI) list (both accessed March 15, 2022), and the M20 DIAmante analysis candidates are specifically derived from TESS data, but their reliability is uncertain. The NEA and TOI efforts also list False Positives that are useful for comparison with DTARPS results (§9.3).
We matched the DIAmante data set and the potential candidate transits in the DTARPS Analysis List with lists from 15 previous studies of exoplanet surveys or of False Positives such as low mass eclipsing binaries, flare stars, and stellar rotation. The exoplanet survey studies utilized here are Mayo et al. (2018), Dressing et al. (2019), Feinstein et al. (2019), Kostov et al. (2019), Kruse et al. (2019), Yu et al. (2019), M20, Dong et al. (2021), Eisner et al. (2021), and Olmschenk et al. (2021). The False Positive studies are Affer et al. (2012), Collins et al. (2018), Schanche et al. (2019), von Boetticher et al. (2019), and Tu et al. (2020). Appendix B gives a brief description of each of the external surveys and their corresponding entries in the DTARPS Analysis List. Where TIC numbers were not available for matching DIAmante light curves with reported objects in these external studies, we used the best match between the right ascension and declination coordinates of the objects with a search radius of 5″. The periods reported in the external surveys (when available) are compared with the period from the best TCF peak. We consider the period matched when the TCF peak period is within a 1% fractional difference of the reported period (or a harmonic of the reported period).
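The period-matching criterion can be expressed compactly. The following is a minimal sketch under stated assumptions: the 1% tolerance is taken from the text, but the particular set of harmonics checked (1/3, 1/2, 1, 2, and 3 times the reported period) and the function name `periods_match` are our illustrative choices, since the text does not enumerate the harmonics.

```python
def periods_match(tcf_period, reported_period, tolerance=0.01,
                  harmonics=(1 / 3, 1 / 2, 1, 2, 3)):
    """True if the TCF period is within a fractional tolerance of the
    reported period or of one of its harmonics.

    The harmonic set is an assumption made for this example.
    """
    for h in harmonics:
        target = reported_period * h
        if abs(tcf_period - target) / target <= tolerance:
            return True
    return False


# Hypothetical comparisons (periods in days).
same = periods_match(3.02, 3.0)          # 0.67% difference from the reported period
half = periods_match(1.5, 3.0)           # 1/2 harmonic
third = periods_match(10.0, 30.0)        # 1/3 harmonic of a long-period planet
mismatch = periods_match(3.1, 3.0)       # 3.3% difference from any listed harmonic
```

Allowing harmonics is what permits a long-period planet to be counted as matched when the TCF peak falls at one half or one third of the reported period.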
As most reported candidates are not yet confirmed by spectroscopic observations, these recovery rates do not reflect true planetary populations. Rather, the value of the DTARPS Analysis List is to add confidence to the reality of candidates it confirms, and to cast some doubt on those it does not confirm in regimes where it has strong recall rates.

NASA Exoplanet Archive Confirmed Planets
Of 3,616 'Confirmed Planet' or 'Known Planet' systems in the NASA Exoplanet Archive or TOI lists (accessed March 15, 2022), 202 were in the DIAmante data set classified by our RF classifier. TCF correctly matched the period for 166 (82%) Confirmed Planets. Of the 36 Confirmed Planets that failed the DTARPS selection criteria, 9 have periods >13.5 days. But 7 other Confirmed Planets with periods of 13.5-30 days were recovered, and the 1/3 harmonic of one Confirmed Planet with period >30 days was found. Even though the injected planetary signals have periods restricted to <13.5 days, DTARPS is still capable of matching the periods of long period planets.
The P_RF = 0.300 threshold of the DTARPS RF classifier identified 130 of the 202 Confirmed Planets, giving a 64% recall for known exoplanets. All have correctly matched periods. Figure 16 shows that DTARPS has a strong recall rate (>50%) for periods between 1 and 10 days, up to radii of 10 R⊕. DTARPS has poor recall rates for Confirmed Planets in the lower-left bins with periods <1 day (Ultra-Short Period planets, USPs) and radii <5 R⊕, likely due to the overpopulation of short TCF periods (§A.1). Of the 22 Confirmed Planets in these bins, only 3 have TCF periods that match the reported period. The DTARPS recall rate is good for larger USP planets, exceeding 60% for periods of 0.5-1.0 day and radii of 5-15 R⊕, but often the recovered periods are incorrect. DTARPS thus has poor recovery overall of Confirmed Planets with periods less than 1 day.

M20 DIAmante Candidates
Of the 394 planet candidates identified by the M20 DIAmante analysis (some of which were later confirmed as planets or found to be False Positives), 364 were processed through the entire DTARPS procedure. The best DTARPS period matched the M20 reported period for 333 (91%) of the M20 candidates. The TCF periodogram and BLS periodogram thus give identical results for nearly all DIAmante candidate planets. Of the 31 M20 candidates where TCF failed to recover the correct period, 17 had M20 periods greater than 13.5 days and were thus outside the range of our injected planet training set. TCF often identified the 1/2 or 1/3 harmonic period of long-period M20 candidates; it is not clear which period is correct in these cases.
The DTARPS Analysis List captures 213 of the 364 M20 candidates, giving a 59% recall rate for the M20 study. The main difference between DTARPS and DIAmante results is thus attributable to the classification stage of analysis. All of these recovered M20 candidates had a TCF period matching the reported M20 period.
Figure 17 (top panel) shows the recall rate of the RF classifier for different bins in radius-period space from the best TCF peak. The RF classifier has the strongest recall rates for the candidates whose planetary radius from TCF is between 2 and 10 R⊕ and whose TCF orbital period is between 1 and 10 days. The RF classifier has only a 20% recall rate for M20 candidates with TCF periods greater than 10 days and a 47% recall rate for TCF periods less than 1 day. The recall rate for the M20 candidates also falls off at larger TCF planet radius, likely due to the many injected False Positive signals in the negative training set with planetary-consistent radii. In the bottom panel of Figure 17, the colored points show the M20 candidates recovered by the RF classifier and the black points show the unrecovered M20 candidates. The distribution of M20 candidates closely follows the recall distribution of synthetic planet injections.
The candidates reported in M20 mostly have radii >7 R⊕, with moderate coverage around 3-7 R⊕. Neither the DTARPS nor the M20 samples cover well the region with small radii; only 16 candidates have radii <3 R⊕. Despite this, DTARPS is successful at recovering the M20 candidates with radii less than 5 R⊕ (77%). But for periods <0.5 day, DTARPS has only moderate recovery of DIAmante candidates. This is likely due to the concentration of injected False Positives with TCF periods in this region and TCF radii consistent with planetary objects (Figure 21), which biases the classifier against short period planet transit signals. DTARPS also has poor recovery of the M20 candidates with periods >10 days and radii >10 R⊕ for reasons explained in §A.1.

TESS Objects of Interest
The recall coverage of the TESS TOI confirmed planets and planet candidates is included in §10.1 and §10.4, so it is not presented here again. The DTARPS sample contains 846 TESS TOIs, of which 140 have reported periods >13.5 days. DTARPS matches the periods for 566 of the remaining 706 TOI planets and candidates (80%). The recall rate of the TOI confirmed planets and planet candidates is 51%, with 433 of 846 TOIs recovered.

Candidate Planets in Other Surveys
Figure 18 shows recall dependencies for the planet candidates in the surveys and lists described in §10 above. The DIAmante data set contains light curves corresponding to 1,042 stars in these candidate planet samples, and DTARPS has an overall recall rate of 41% for previously identified planet candidates. The results are similar to those for the Confirmed Planets and DIAmante samples. DTARPS has strong recall for TCF periods of 1-10 days and for TCF radii of 1-10 R⊕. The recall rate drops to 13% for TCF periods <1 day. At large planet radii, the recall rate drops to 46% (10-15 R⊕) and 4% (>15 R⊕). Note that a handful of planets are recovered with 15-30 day periods despite the absence of injected planet training for these long periods.

This study is based on the premise that existing searches for transiting exoplanets, even those conducted by the TESS Science Office producing the TOI lists, have not identified the full detectable population of planetary systems in TESS FFI light curves (§1.1). The sensitivity and reliability of a transit search depend critically on the development and refinement of statistical methodology focused on the complexities of this scientific problem. The problems are challenging: a wide variety of contaminating stellar and instrumental signatures in the light curves; a highly imbalanced classification problem with imperfect training sets; and limited telescope time to validate the resulting planetary candidates.

We adopt and refine the AutoRegressive Planet Search (ARPS) procedure developed by Caceres et al. (2019a) in an effort called DIAmante TESS AutoRegressive Planet Search, or DTARPS. It combines the time series extraction and preprocessing from the DIAmante project (Montalto et al. 2020, M20), ARIMA modeling (Box-Jenkins analysis) for light curve detrending, our Transit Comb Filter (TCF) periodogram for transit discovery, and machine learning with Random Forest for maximizing True Positive and minimizing False Positive classifications (§2). We apply the procedure to ∼1 million TESS Year 1 Full Frame Image (FFI) light curves for brighter stars in the southern ecliptic hemisphere (§3). Best fit ARIMA models, fitted by maximum likelihood estimation with optimized model complexity, are subtracted from the DIAmante light curves (§4). The TCF periodogram is then calculated to identify and characterize periodic transit-like behavior in the light curves. The most likely transit signal period is chosen as the TCF periodogram peak with the greatest signal-to-noise ratio after detrending the periodogram.
Considerable effort was expended to tune a Random Forest machine learning classifier that identifies exoplanet transit candidates while reducing various sources of contamination (§5-6). The positive training set was constructed from synthetic planetary injections augmented from confirmed planets in the 4-year Kepler survey, and the negative training set of random light curves was supplemented with synthetic eclipsing binary injections from M20. Several dozen features from every stage of the analysis were combined with stellar metadata to construct an RF classifier; the final classifier has 37 features with different weights (§7).
After choosing a threshold for the RF prediction value, we produce a list of 7,377 objects called the DTARPS Analysis List (§8). It is optimized to have high recall of known planets, but has low precision with many False Alarms and False Positives. The classifier performance is summarized in Figures 10, 11, and 12. The list has a True Positive Rate of 92.8% and a False Positive Rate of 0.37% with respect to injections of simulated planets, simulated astrophysical False Positives (mostly EBs), and random light curves (§10.1).
The DTARPS completeness heat map of the injected planetary signals for the RF classifier (Figure 13) shows that the DTARPS method has poor recall for radii <2 R⊕ or periods <1 day, low completeness for planets with radii between 2 and 4 R⊕, and high completeness for planets with radii between 8 and 30 R⊕ and periods between 0.6 and 13 days. The distribution of the recall for the confirmed planets, M20 candidates, and previously identified candidates generally follows the results of the injected planet completeness map (§9). We compare the DTARPS Analysis List to other southern hemisphere samples: NASA Exoplanet Archive Confirmed Planets, TESS Objects of Interest, and other transit surveys (§10, Appendix B).
Our classifier has imperfections. Smaller injected planets are not recovered by the TCF fitting algorithm, and the TCF transit depth (which scales with planet radius) is somewhat underestimated. This effect arises from overfitting by the ARIMA modeling and underfitting by the TCF matched filter procedure (Appendix A). In addition, large-radius stellar companions are fitted with planetary-radius signals, thereby biasing the RF classifier against longer period (>8 days) Jovian planets and short period (<1 day) planets. Some of these problems can be ameliorated in future studies (§11.2) while others are intrinsic limitations of the TESS survey and DTARPS methodology.
The principal product of this paper, the DTARPS Analysis List (Table 3), optimizes recall (completeness) at the expense of precision (acceptance of False Positives). It serves three purposes:
Potential transiting planets for spectroscopic followup: This would proceed with the understanding that more than half are likely to be False Alarms (no real periodicity) and False Positives (non-planetary periodicity). It may be particularly useful for subsets such as very bright host stars.
Intermediate list ready for vetting: Vetting will increase precision, greatly reducing False Alarms and False Positives, but with reduced recall (completeness). This is accomplished in Paper II; see §11.3 for discussion.
Support for other surveys: If a star in the DTARPS Analysis List was independently found to be an unconfirmed planetary candidate in the TOI list or by another transit search procedure, then confidence in its planetary nature is increased.

Future Improvements to DTARPS methodology
The statistical issues arising in reliable planetary transit identification are complex and differ with each survey. Our DTARPS effort is based on the ARPS procedures developed in Caceres et al. (2019a) and applied to the Kepler dataset by Caceres et al. (2019b). We institute a variety of improvements to their methods in our application to TESS Year 1 light curves: injection-based training sets for both planetary transits and eclipsing binaries with sophisticated data augmentation procedures (§5.1); an optimized Random Forest algorithm for imbalanced training sets (§6); extensive engineering of feature selection including Gaia information (§6); multiple metrics for classification performance (§6.1); classification training and validation using both injections and Confirmed Planet samples (§5, §9); and completeness heat maps (§9). Based on the results presented here for application to TESS, a number of further improvements can be envisioned:
Stellar variation removal: The linear ARIMA model is effective in removing short-memory autocorrelated trends for 90% of the DIAmante preprocessed TESS light curves considered here (Figure 5). However, ARIMA was less effective over the 4-year Kepler data, where only 47% of the ARIMA residuals were consistent with white noise (Caceres et al. 2019b). More elaborate nonlinear autoregressive models might be better for TESS light curves near the ecliptic poles or for multi-year light curves. Options include ARFIMA (with 1/f^α-type long-memory red noise), GARCH (with stochastic volatility), and HAR (with bursts).
Transit depth estimation: The ARIMAX modeling often gave biased estimates based on the ARIMA residuals because some of the planetary signal is incorporated into the ARIMA model (§2.4). Trials of other multicomponent autoregressive-plus-periodic models can be made using a state space formalism (Durbin & Koopman 2012). In addition, the trapezoidal-shaped model used as the exogenous variable is adequate for most transits observed with 30 minute cadences, but will be inadequate for shorter cadence surveys. A more accurate exoplanet transit model with curved ingress and egress is likely to be more effective, as in the Transit Least Squares procedure for identifying exoplanet transits (Hippke & Heller 2019).
TCF sensitivity: For Kepler 4-year light curves, the Transit Comb Filter periodogram appears to be more sensitive to smaller planets than other periodicity search methods such as the Box-Least Squares periodogram (Figures 9-10 in Caceres et al. 2019a; Figure 10 in Caceres et al. 2019b). This can be attributed to BLS periodogram noise enhancements when applied to non-Gaussian noise (Gondhalekar et al., in preparation). However, in our TESS FFI application, the TCF periodogram had a low recall rate for injected planets with radii ≲4 R⊕. The reason for this difference needs to be elucidated. Further investigation of both TCF and BLS periodograms is needed and improvements sought, such as evaluation of peak power using the Generalized Extreme Value distribution that is independent of periodogram noise characteristics. This can lead to the discovery of smaller planets in a given data set, which is a driving goal of the TESS and upcoming PLATO missions (Rauer et al. 2014).
Multiple planet systems: Currently, the ARPS procedure treats only the TCF periodogram peak with the strongest SNR and neither searches for nor considers multiple transiting planets. Multi-planet systems could be searched for by an iterative 'pre-whitening' procedure: the strongest planetary signal can be subtracted from the light curve, and ARIMA and TCF can be reapplied. The procedure would be repeated until the TCF peak effective signal-to-noise falls below the threshold indicated in Figure 14.

Improved classifier training sets: A larger sample of ANS-SMOTE simulations can be generated to reduce imbalance in classifier training sets. The injected planet properties can be refined with fewer small planets and more planets at both shorter and longer periods.
Classifier features: Specialized features can be added to the Random Forest classifier to better identify true exoplanets or astrophysical False Positives. A feature that quantifies the difference between the TCF periodogram peak and the spurious spike of non-exoplanet transits with a period between 13.5 and 15 days due to TESS's orbit (Figure 12) might mitigate the bias against longer period exoplanets seen in the current DTARPS classifier. Features that quantify curvature in the folded light curve may help reduce EB contamination and reduce the human vetting effort.

Classifier training set: The training set planet properties were derived from injections based on confirmed planets in the Kepler sample. But because Kepler light curves have a 4 year duration, rather than the 1 month typical of TESS FFIs, most of the injected planets are too small. The large number of undetectable planets in the positive training set may have distorted the classifier. A better match to TESS sensitivity might improve classifier performance. In addition, a larger number of planet injections may be helpful in regions of the period-radius diagram where the recall rate is transitioning between low and high (2-5 R⊕), where true hot Jupiters compete with False Positive EBs, and near the edges of the heat map (Figure 13). Finally, the distribution of injected EB signals might be adjusted to approximate the expected dilution in blended systems.
Classifier type: The Random Forest classifier developed in §6 is highly effective, but the 0.37% False Positive Rate is high given the huge class imbalance. This necessitates a complex vetting process (Paper II). Improved performance might be achieved with different machine learning classifiers such as XGBoost, LightGBM, or Explainable Boosting Machines. XGBoost (Chen & Guestrin 2016) is similar to Random Forest but, rather than building decision trees independently, new trees are built iteratively to minimize classification error. LightGBM (Ke et al. 2017) is also similar but grows trees leaf-wise rather than level-wise. Explainable Boosting Machines (Lou et al. 2012) use small forests of decision trees for each feature in a linear regression ensemble. Preliminary investigation suggests that XGBoost may substantially reduce the False Positive Rate compared to our Random Forest classifier.
Specialized classifiers: Classifiers might be trained for particular sub-populations of TESS FFI stars such as Sun-like stars, lower mass K and M stars, subgiants, or stars in the continuous observing zone near the ecliptic poles. This would require new training sets of injections on light curves of just these stellar host sub-populations. Classifiers could also be trained for particular exoplanet sub-populations.

... ('ephemeris matching' problem, Coughlin et al. 2014). Other effects to be removed include centroid wobbling in the FFI image, possible photometric binaries in Gaia photometry, deep secondary eclipses inconsistent with planetary radii, and variable stars blended in the TESS image extraction region. Thus, despite our careful statistical efforts, the DTARPS Analysis List is dominated by non-planetary signals. The full list thus cannot be accepted for reliable calculation of exoplanet populations, such as converting the completeness heat maps into planetary occurrence rates, and it is an inefficient list for follow-up observations and spectroscopy with valuable telescope resources.
Fortunately, a suite of 'vetting' operations can be conducted to cull many of the False Positives and False Alarms. These are applied in Paper II to give a more reliable, though less complete, DTARPS Candidates catalog for the Year 1 TESS FFI DIAmante light curves.

This study is also a product of the Center for Astrostatistics supported by the Eberly College of Science. We benefited from comments on the manuscript by members of these Centers: Ian Czekela, Rebekah Dawson, Hyungsuk Tak, Jason Wright, as well as Joel Hartman (Princeton). We also appreciate useful discussions with Yash J. Gondhalekar (BITS) on methodology.
This paper includes data collected by the TESS mission. Funding for the TESS mission is provided by NASA's Science Mission directorate. We acknowledge the use of public TOI Release data from pipelines at the TESS Science Office and at the TESS Science Processing Operations Center. This research also uses the NASA Exoplanet Archive operated by the California Institute of Technology, under contract with the National Aeronautics and Space Administration under the Exoplanet Exploration Program. This work also uses data from the European Space Agency mission Gaia, processed by the Gaia Data Processing and Analysis Consortium (DPAC). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement.

As the injected signals are the basis of the DTARPS identification of candidate planetary transit signals, it is important to understand how the ARIMA modeling and TCF periodograms respond to the injected signals. The DTARPS procedure described here is designed for sensitive and reliable detection of planetary transits and may not give accurate characterization of planetary properties such as orbital period and planet radius. In this section, we examine the limitations and biases present in the recovery of injected objects and their properties.

A.1. Recovered Planet Properties
The results of the RF classifier depend heavily on the orbital parameters obtained from the TCF algorithm for the best period. The top eight most important features of the RF classifier (Figure 9) are either extracted from the best TCF peak and TCF periodogram or are computed on the light curve phase-folded at the period identified by the best TCF transit model.
Figures 20a and 20b compare the injected orbital parameters with the orbital parameters from the TCF analysis for the full set of synthetic planetary injections. The synthetic planetary signals whose best TCF peak orbital period matched the injected orbital period (or an integer ratio) are shown as purple triangles. Spurious periodicities with periods of 13-15 days arise from the 13.7 day orbital period of the TESS satellite. It is not surprising that TCF would align the two halves of the light curve and find spurious double-spikes associated with ramping problems that escape our data cleaning procedure (§3.1). The pile-ups of identified TCF periods between 13 and 15 days and near the extreme limit of the TCF period search of 27 days are both expected and easily removed by the RF classifier and by vetting.
The tendency of TCF to identify much shorter periods than injected periods around 3 < P < 13 days was not expected. This is seen as the cloud of gray points in the lower-right of Figure 20a. Just over half of the 10,850 synthetic injected planetary transit signals were assigned periods <1 day by the TCF algorithm, while only 3% of the injected FP signals had a spurious period found by TCF shorter than one day.
This issue is further elucidated in Figure 20b, which compares the injected radius and the radius from the best TCF peak. The injected planets included in the positive training set, shown in purple, are concentrated along a locus that falls just below the desired 1:1 line. When TCF identifies the correct period for a planetary signal, it gives a slightly smaller radius than the radius of the underlying signal; the effect is more pronounced for radii ≳10 R⊕. This bias has multiple causes. First, the ARIMA model incorporates some of the transit signal with the stellar variability (Caceres et al. 2019a). This effect can also occur with other detrending statistical procedures such as wavelet analysis and Gaussian Processes regression. Second, for longer periods, the ARIMA residuals have only a few points in the ingress and egress spikes and the TCF matched filter has difficulty correctly fitting the extreme values of the spike shape. This partially accounts for the paucity of recovered large 10-20 R⊕ planets at long periods in Figure 20c. Third, the ingress and egress will often be split between two TESS cadence slots so that neither captures the full height of the spike in the ARIMA residuals. Stellar limb darkening may further slow ingress and egress, weakening the spike. We mitigate this radius bias in Paper II, both by fitting likelihood-based astrophysical models to the stronger transits and by visual correction of transit depths for weaker transits.
The gray points in Figure 20b reveal another bias: when TCF fails to identify the correct orbital period for a planet, it also tends to overestimate the planet radius. This overestimation of the TCF radii is strongest for smaller planets. For injected planets rejected from the positive training set, the TCF radius is more than twice the true radius for 43% of injections with R <4 R⊕ compared to 14% of injections with R >4 R⊕.
The distribution of the injected planets in Figure 20c shows that the ability of TCF to recover the injected orbital parameters does not depend strongly on the injected period, but is moderately conditional on the injected planet radius. Both distributions cover the same range of injected periods from 0.5 to 13.5 days. The distribution of recovered injected planets is centered at a higher injected radius than the distribution of rejected planetary injections, indicating that ARPS has an easier time identifying planetary signals with R ≳5 R⊕. This effect will be quantified in Paper II with completeness curves.

The tendency of DTARPS to underestimate the depth of the transit signal for signals with large injected radii (Figure 21b) causes DTARPS to greatly reduce the radii of injected FPs; compare the orange points in the two panels of Figure 21. This is undoubtedly due to incorporation of deep EB transits into the ARIMA model. The center of the distribution of injected FP radii is reduced from ∼100 R⊕ to ∼20 R⊕. The FP and planet injection distributions overlap in the TCF parameters while they are fully separated in the injection parameters.

A.2. Injected False Positives
The predilection for TCF to report ultra short periods when it fails to find the correct period, combined with the much smaller TCF radius value, created a sample of ∼400 injected FP signals with TCF periods < 2 days and TCF radii < 10 R⊕. Only injected FP signals with spuriously identified TCF periods < 2 days have TCF radii < R ⊕. There are only ∼100 injected planetary signals in the positive training set in that region. The erroneously characterized FPs completely dominate the shortest TCF periods of 0.2 − 1 day. This means that, even though TCF tends to identify spurious short periods when it cannot correctly identify a transit signal (Figure 20), the RF classifier will be unlikely to identify them as potential DTARPS Candidates because that region of TCF period-radius space is dominated by injections from the negative training set. The classifier is also less likely to recover true planetary signals with periods less than 2 days. The RF classifier is more complicated than drawing boxes in regions of TCF period-radius space, as it includes the influence of over 30 other variables. But since the TCF period and radius are the 3rd and 4th most important features in the RF classifier (Figure 9), this mischaracterization of injected FP signals will affect the final classification.
In addition to the population of injected FPs with short TCF periods and TCF radii consistent with planetary objects, there are a large number of injected FPs with TCF radii consistent with a Jovian planet (∼10 − 20 R⊕), particularly at longer periods ≳ 10 days. The presence of this population of FPs in the negative training sample may cause DTARPS to be less sensitive to Jovian planets and long period planets. The large number of injected FPs with TCF radii consistent with Jovian planets, irrespective of orbital period, may make it more difficult to find Jovian planets despite TCF's ability to better recover the correct orbital period for larger planetary signals.

A.3. Conclusions from Injected Populations
Only 12% of the injected planets are reliably recovered by the DTARPS procedure. This low fraction is not a surprise, as the distribution of injected radii is drawn from the more sensitive Kepler mission, dominated by planets too small for TESS detection (Figure 7, top-right panel). The rate of capture of planetary injections will be examined in our completeness analysis below (§9).
For the correctly identified injected planets, the TCF periods are mostly accurately recovered from 0.5 − 13 days. For a small fraction, the 1/2-period harmonic is preferred by the TCF. TCF-derived radii, on the other hand, are underestimated for correctly identified injected planets, particularly for large-radius injections. This bias is understood as a combination of ARIMA and TCF behaviors. We correct this bias for astronomically interesting candidates by manual intervention or astrophysical modeling in Paper II.
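Whether a TCF peak recovers the injected period, or one of its simple harmonics, can be checked by comparing the two periods against a small set of integer ratios. The sketch below illustrates the idea; the 1% relative tolerance and the set of ratios tested are assumptions chosen for illustration, not the matching criterion used in DTARPS.

```python
def period_match(p_true, p_tcf, rel_tol=0.01, ratios=(1.0, 0.5, 2.0)):
    """Return the ratio p_tcf / p_true that matches within rel_tol, or None.

    ratios lists the harmonics tested: 1.0 is the true period,
    0.5 the half-period harmonic, 2.0 the double-period alias.
    """
    for ratio in ratios:
        expected = ratio * p_true
        if abs(p_tcf - expected) <= rel_tol * expected:
            return ratio
    return None

print(period_match(4.0, 4.01))   # matches the true period
print(period_match(4.0, 2.005))  # matches the half-period harmonic
print(period_match(4.0, 0.31))   # spurious short period: no match
```

A tighter or looser tolerance changes how many injections count as recovered, so the value should be stated whenever recovery rates are quoted.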
The DTARPS analysis also recovers a small fraction of injected false positive signals as potential planetary candidates. The response to astronomical False Positives is complicated and is discussed in §9.3. False Positive (and False Alarm) contamination motivates the strictness of our vetting procedures in Paper II. In that study, we remove nearly 90% of the objects in the DTARPS Analysis List when creating the DTARPS Planet Candidate catalog. This reduces the completeness of the DTARPS Analysis List obtained here but greatly improves the reliability ('sensitivity') of the DTARPS Candidates catalog.
The Confirmed Planets used here include planets with published refereed planet confirmation papers. The NEA Confirmed Planets in the DIAmante data set were identified with the TIC ID number. In the case of multi-planet systems, we used the planet whose reported period best matched the TCF peak period. For planets with multiple entries, an average of the reported orbital parameters was used. In total, 184 Confirmed Planet hosts on the NASA Exoplanet Archive were matched with objects in the DIAmante data set.
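The two matching rules for Confirmed Planets (average multiple entries for one planet, then pick the planet in a multi-planet system whose period is closest to the TCF peak) can be sketched as follows. The data layout, the function names, and the use of absolute period difference as the measure of "best match" are assumptions made for illustration.

```python
from statistics import mean

def average_period(entries):
    """Average the reported period over multiple catalog entries for one planet."""
    return mean(entry["period"] for entry in entries)

def best_matching_planet(planets, tcf_period):
    """Return the planet name whose averaged period is closest to the TCF period.

    planets maps a planet name to its list of catalog entries.
    """
    return min(planets, key=lambda name: abs(average_period(planets[name]) - tcf_period))

# Hypothetical two-planet system with two published entries for planet b.
system = {
    "b": [{"period": 3.10}, {"period": 3.12}],
    "c": [{"period": 7.90}],
}
best = best_matching_planet(system, tcf_period=3.2)
print(best, round(average_period(system[best]), 3))
```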

B.2. TOI List
The TOI list reports dispositions of known planet (KP), confirmed planet (CP), planetary candidate (PC), ambiguous planetary candidate (APC), false alarm (FA), or false positive (FP). We combined CPs and KPs to be confirmed planets. Objects with recent confirmation in unpublished papers on arXiv are considered to be Confirmed Planets. We combined the APCs and PCs when considering planet candidates, and combined FAs and FPs when considering False Positives.
Of the 1,036 objects in the TOI catalog that overlap with the DIAmante data set, 185 were labeled confirmed planets, 670 were labeled planetary candidates, and 181 were labeled False Positives.
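The grouping of TOI dispositions described above is a many-to-one relabeling. A minimal sketch follows; the dictionary and function names are illustrative, and the toy disposition list is invented for the example.

```python
from collections import Counter

# Map each TOI disposition code to the combined class used in this analysis.
GROUPS = {
    "KP": "confirmed planet",
    "CP": "confirmed planet",
    "PC": "planet candidate",
    "APC": "planet candidate",
    "FA": "false positive",
    "FP": "false positive",
}

def group_dispositions(dispositions):
    """Count objects per combined class."""
    return Counter(GROUPS[code] for code in dispositions)

toy = ["CP", "KP", "PC", "APC", "PC", "FP", "FA"]  # hypothetical dispositions
counts = group_dispositions(toy)
print(counts)

# Consistency check on the totals quoted in the text.
quoted_total = 185 + 670 + 181
print(quoted_total)
```

The three class counts quoted in the text sum to the 1,036 overlapping TOIs, as the final line confirms.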

B.3. cTOI List
The cTOI list has a wide range in the quality of planet candidates: follow-up ExoFOP examination shows some are False Positives while others are promoted to planetary candidates on the TOI list. The DIAmante sample has 566 cTOIs; 364 are planetary candidates from M20 discussed below.
Schanche et al. (2019) presented a catalog of 1,041 False Positives from the SuperWASP survey of the northern hemisphere that had been identified as potential planetary candidates previously and rejected after follow-up observations. The False Positives were classified as eclipsing binaries, blended eclipsing binaries, and low mass eclipsing binaries. The DIAmante sample matches 47 objects, 12 of which lie in the DTARPS Analysis List with the following classifications: TIC 16490297 as an eclipsing binary system; TIC 61069470, TIC 117549305, TIC 13675776, TIC 271269442 and TIC 271374913 as blended eclipsing binaries; and TIC 9433212, TIC 12529950, TIC 264537668, TIC 277712294, TIC 443618156, and TIC 449050248 as low mass eclipsing binary systems. Most, but not all, have TCF periods matching the SuperWASP periods.

B.17. Eisner et al. 2021
Eisner et al. (2021) presented results from the Planet Hunters TESS citizen science project for the first two years of the TESS survey. They identified 90 new planetary candidates, of which 18 lie in the DIAmante sample. However, none of the overlapping objects have a TCF peak period that matches the reported period from Eisner et al. This is partly due to their single transit events, where the period of the planet was estimated from the transit duration. Two of their objects are in the DTARPS Analysis List. TIC 142087638 and TIC 404518509 were identified as single transit events by Eisner et al. (2021); the TCF periodogram gives accurate periods within the error bars of their single transit event estimate.

B.18. Olmschenk et al. 2021
Olmschenk et al. (2021) applied a convolutional neural network to TESS FFI light curves to identify planetary candidates, followed by visual vetting. Of their 185 planet candidates, 25 overlap with the DIAmante sample, all of which have TCF peak periods that match their reported periods. The DTARPS Analysis List recovers 15 of their planet candidates.

Figure 1. The AutoRegressive Planet Search process. Blue boxes represent steps in the ARPS analysis covered in this paper. Orange boxes represent steps in ARPS covered in Paper II.
compared the results of classifying based on periodogram strength and a machine learning classifier and found that the machine learning classifier performed better. The principal reason is not lack of sensitivity to planetary signals but the capture of non-planetary signals, particularly BEBs. Two main families of classifiers used in exoplanet transit identification are deep learning classifiers and decision-tree-based classifiers (Jara-Maldonado et al. 2020), though other types are available. Deep learning classifiers learn features automatically from training sets of light curves. While training the neural network, the parameters of the linear combination inputs into each hidden layer feature are tuned using a cost function to minimize the classification prediction error. Transit detection using decision trees based on extracted features, rather than the light curves themselves, has been developed by McCauliff et al. (2015), Coughlin et al. (2016), and Armstrong et al. (2018), as well as Caceres et al. (2019a).

Figure 2. Left: The raw light curves extracted from TESS FFIs for three example stars in the DIAmante data that were later found to have DTARPS candidate transiting planets. Right: Plot of autocorrelation present in the light curve as a function of lag in units of the 30 minute FFI cadence. The p-value from the Ljung-Box test is 0 for all three light curves, indicating that the flux values of the light curves are autocorrelated.

Figure 3. The DIAmante light curves for three example stars in Figure 2 after the removal of trends, ramping effects and outliers (§3.1). The autocorrelation functions on the right show that autoregressive structure is still present in all three light curves.

Figure 4. The residuals after the best fit ARIMA model had been subtracted from the differenced light curves for three example stars in Figure 2. The three plots on the right show the amount of autocorrelation present in the light curve as a function of time step between points. The Ljung-Box test shows that two of three ARIMA residuals are consistent with Gaussian white noise.

Figure 5. Distribution of p-values from the Ljung-Box test for autocorrelation in light curves extracted from the DIAmante data set (on the left) and for the ARIMA residuals (on the right). There are 318,227 light curves in the first bin in the histogram of Ljung-Box test p-values in the DIAmante data (left).
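For readers unfamiliar with the Ljung-Box test used in Figures 2 through 5, the sketch below computes the standard statistic Q = n(n+2) Σ ρ_k² / (n−k) over the first h lags and its chi-square p-value. It is a pure-Python illustration, not the DTARPS code: the choice of 20 lags and the two simulated series are assumptions, and the closed-form chi-square tail used here requires an even number of lags. When the test is applied to residuals of a fitted model, the degrees of freedom are usually reduced by the number of fitted parameters, which this sketch omits.

```python
import math
import random

def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return cov / var

def ljung_box(x, lags=20):
    """Return (Q, p-value) for the Ljung-Box test with an even number of lags."""
    if lags < 2 or lags % 2:
        raise ValueError("this sketch needs an even number of lags")
    n = len(x)
    q = n * (n + 2) * sum(autocorrelation(x, k) ** 2 / (n - k)
                          for k in range(1, lags + 1))
    # Chi-square survival function for even dof = lags:
    # exp(-q/2) * sum_{k=0}^{lags/2 - 1} (q/2)^k / k!
    half = q / 2.0
    term = total = 1.0
    for k in range(1, lags // 2):
        term *= half / k
        total += term
    return q, math.exp(-half) * total

rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(500)]  # uncorrelated noise
red = [0.0]                                        # AR(1) series, coefficient 0.9
for _ in range(499):
    red.append(0.9 * red[-1] + rng.gauss(0.0, 1.0))

q_white, p_white = ljung_box(white)
q_red, p_red = ljung_box(red)
print(f"white noise: Q={q_white:.1f}, p={p_white:.3g}")
print(f"AR(1):       Q={q_red:.1f}, p={p_red:.3g}")
```

The autocorrelated series yields a very large Q and a p-value near zero, the behavior seen for the raw light curves in Figure 2.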

Figure 6. The TCF periodograms and best period phase-folded light curves for the stars in Figure 2. The red curve is the LOESS fit to trends in the median of the periodogram. The blue cross indicates the peak with the highest SNR over a window around that peak. The lower left panel shows the original light curve phase-folded on the best TCF period. The lower right panel shows the residuals after the best fit ARIMA model has been subtracted from the light curve, phase-folded on the parameters from the best TCF periodogram peak.

5. RANDOM FOREST: TRAINING SET
Astronomical applications often have considerable flexibility in defining the training sets, and these choices can be a dominant contributor to classifier performance. For transiting exoplanet identification, a RF classifier requires positive training examples (light curves with exoplanet transit signals, §5.1) and negative training examples (light curves without exoplanet transit signals, §5.2). Following M20, we introduce simulated 'injected' False Positive signals into the negative training sets. Section 5.3 describes how the two samples were combined into a training set and a validation set.

Figure 7. Scatter plots of the transit parameters of the 949 confirmed Kepler planets are shown in the left column in olive green. The derived distribution of exoplanets created using Adaptive Neighbor SMOTE (ANS-SMOTE) is shown in the right column in teal.

Figure 8. The steps for injecting a synthetic transit signal into a random DIAmante light curve. The top panel shows the original DIAmante light curve. The middle panel shows the transit mask created for a planetary injection. The bottom panel shows the final injected light curve.
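The three panels of Figure 8 correspond to a simple sequence: take a light curve, build a transit mask on the same time grid, and multiply the two. The sketch below illustrates that sequence with a box-shaped mask, which is a deliberate simplification and not necessarily the transit model used for the DTARPS injections; the function names and all parameter values are hypothetical.

```python
def box_transit_mask(times, period, t0, duration, depth):
    """Multiplicative mask: 1 - depth inside transit, 1 elsewhere (box-shaped)."""
    mask = []
    for t in times:
        # Time offset from the nearest mid-transit, in [-period/2, period/2).
        offset = (t - t0 + 0.5 * period) % period - 0.5 * period
        in_transit = abs(offset) <= 0.5 * duration
        mask.append(1.0 - depth if in_transit else 1.0)
    return mask

def inject(flux, mask):
    """Apply the transit mask to a normalized light curve."""
    return [f * m for f, m in zip(flux, mask)]

# Hypothetical flat light curve: 10 days at 30 minute cadence (48 points per day).
times = [i / 48.0 for i in range(480)]
flux = [1.0] * len(times)

mask = box_transit_mask(times, period=3.0, t0=1.0, duration=0.125, depth=0.01)
injected = inject(flux, mask)
print(min(injected), max(injected), sum(1 for m in mask if m < 1.0))
```

In practice the input would be an observed light curve with its own noise and trends, so the injected signal passes through the same detrending, ARIMA, and TCF steps as a real transit.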
Stellar properties: Stellar metadata from the TIC v8 (Stassun et al. 2019) and from the Gaia DR2 catalog (Gaia Collaboration et al. 2018). Stellar properties tested include the effective temperature, mass, TESS T magnitude, Gaia parallax, Gaia G magnitude, G magnitude SNR, and others. The classifier optimization found that stellar radius, surface gravity, luminosity, and the Gaia G_BP − G_RP color index played significant roles in classification.
DIAmante light curve properties: The reduced χ² measures the goodness-of-fit of the light curve to a constant median brightness. The tail range compares the range of the middle 96 percent of the light curve flux values with the range of the middle 50 percent of the light curve flux values. The Positive Outlier Measure (POM) measures the most extreme positive outlier in the light curve with respect to the median. As discussed by Caceres et al. (2019b), the POM helps identify stars with strong flares that may cause spurious peaks in the TCF Periodogram. Skewness, the third standardized moment of the distribution, is a measure of the asymmetry of the distribution of light curve flux values around the mean. Kurtosis, the fourth standardized moment of a distribution, is helpful to measure the strength of outliers with respect to a Gaussian distribution. The Ljung-Box test for autocorrelation applied to the light curve has a null hypothesis that the flux values are independently distributed and tests the alternative hypothesis that the flux values show correlation. A p-value ≳ 0.01 indicates that the light curve is consistent with white noise.
Differenced light curve properties: Statistics of the distribution of flux values for the differenced light curve, including the 1st and 90th quantiles of the distribution. The POM, again, measures the most extreme positive outlier in the differenced light curve with respect to the median of the differenced light curve, which would identify a sharp transit brightening.
ARIMA residual properties: These features include four statistics describing the residuals after the best fit ARIMA model had been subtracted from the differenced light curve (10% and 90% quantiles, POM, IQR); three statistical tests applied to the residual light curve (χ², Anderson-Darling and Ljung-Box); and two measures of the importance of the ARIMA fitting (IQR and χ² improvements). The Ljung-Box test applied to the ARIMA residuals indicates how well the ARIMA model did at removing short-memory autocorrelation from the time series. The Anderson-Darling test is used here to determine if the ARIMA residual fluxes follow a Gaussian distribution. A p-value ≳ 0.01 indicates that the ARIMA residuals are consistent with Gaussian noise. The two measures of the importance of the ARIMA fitting, with high feature weights, raised classifier performance more than most features; this demonstrates the importance of ARIMA modeling for planet detection. The ratio of the IQR of the ARIMA residuals to the IQR of the DIAmante light curve measures the improvement in the noise of the time series, with a smaller ratio indicating a greater effect of the ARIMA fit. The ratio of the reduced χ² values for a constant flux model of the ARIMA residuals to the DIAmante light curve measures how well the ARIMA model removed variation and trend from the light curve.
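Several of the scalar features described above are simple order statistics of the flux values. The sketch below shows one plausible way to compute the tail range, the POM, and the IQR improvement ratio. The text does not give exact formulas, so the ratio form of the tail range, the IQR normalization of the POM, and the linear-interpolation percentile are all assumptions, and the function names are illustrative.

```python
from statistics import median

def percentile(values, q):
    """Percentile by linear interpolation between order statistics, q in [0, 100]."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def iqr(values):
    """Interquartile range."""
    return percentile(values, 75) - percentile(values, 25)

def tail_range(flux):
    """Range of the middle 96% of values over the range of the middle 50% (assumed ratio)."""
    return (percentile(flux, 98) - percentile(flux, 2)) / iqr(flux)

def positive_outlier_measure(flux):
    """Largest positive deviation from the median, in IQR units (assumed normalization)."""
    return (max(flux) - median(flux)) / iqr(flux)

def iqr_ratio(residuals, flux):
    """IQR of ARIMA residuals over IQR of the light curve; smaller means a stronger fit."""
    return iqr(residuals) / iqr(flux)

# Hypothetical evenly spread flux values and residuals with half the scatter.
flux = [float(i) for i in range(101)]
residuals = [0.5 * f for f in flux]

print(tail_range(flux), positive_outlier_measure(flux), iqr_ratio(residuals, flux))
```

A light curve with a strong flare would raise both the tail range and the POM well above the values obtained for this evenly spread toy series.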

Figure 10. Performance of the final Random Forest classifier on the validation set for every possible threshold choice, shown with the Receiver Operator Characteristic (left) and Precision-Recall (right) curves. The solid lines derive from application to the validation set, and the dashed lines are the recall from 133 planet candidates of M20. Our choice of threshold PRF = 0.300 (shown in green) is compared with other possible threshold choices, including the 1% FPR choice used by M20 (purple).
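Each point on the ROC and Precision-Recall curves of Figure 10 comes from applying one threshold to the classifier scores and counting the outcomes. The sketch below shows that computation for a single threshold and for a sweep; the scores and labels are invented, and treating a score equal to the threshold as a positive prediction is an assumption.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (TPR, FPR, precision) when score >= threshold is predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return tpr, fpr, precision

# Hypothetical classifier scores; label 1 = injected planet, 0 = negative example.
scores = [0.95, 0.80, 0.35, 0.20, 0.60, 0.25, 0.10, 0.05, 0.02, 0.01]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

tpr, fpr, precision = rates_at_threshold(scores, labels, 0.300)
print(f"threshold 0.300: TPR={tpr:.2f} FPR={fpr:.3f} precision={precision:.2f}")

# Sweeping the threshold traces out the ROC curve as (FPR, TPR) pairs.
roc = [rates_at_threshold(scores, labels, t / 10.0)[:2] for t in range(0, 11)]
```

Raising the threshold moves along the curve toward lower FPR at the cost of lower TPR, which is the trade-off behind the threshold choice marked in the figure.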

Figure 11. Confusion matrix for the final RF classifier with threshold PRF = 0.300. Values are based on the validation set and OOB predictions for the training set.

Figure 12. Distribution of the Random Forest pseudo-probability PRF for the optimized classifier on the training and validation sets. The purple triangles represent light curves with injected transiting exoplanet signals that passed human vetting and were utilized in the positive training set (§5.1). The green diamonds represent the 133 candidates identified in M20 that assisted in optimizing the classifier. The black points indicate random light curves that were given negative class labels; most have PRF values near zero. The orange points represent light curves with injected False Positive signals (§5.2). Our chosen threshold for the final classifier is shown as the blue dashed line at PRF = 0.300. The strong performance of the classifier is directly seen: purple and green points lie mostly above the threshold while the black and orange points lie mostly below the threshold.

Figure 13. Heat map of recall rates of the Random Forest classifier for the synthetic planetary injections as a function of injected period and radius.

Figure 14. Effective SNR of the injected planetary signals. (Left) Comparison of the effective SNR (equation 7) of the injected planetary transit to the effective SNR of the best transit from TCF. The injected planetary signals that were used in the training set are plotted as purple triangles, while injected signals whose period was not recovered by the TCF are gray triangles. Light curves whose TCF peak periods were unaffected by the injected planetary signal are plotted in turquoise. The dashed orange lines show approximate lower bounds of injected planets used in the training set. (Right) Fraction of injected planetary signals that were recovered given the spread of effective SNR of the best transit from TCF.

Figure 15. Specificity of the DTARPS RF classifier for 513 previously identified False Positives. A red colored bin represents a high percentage of False Positives with correct classification as a non-planetary signal. The number of False Positives in each bin is labeled.

Figure 16. Recall rates by the Random Forest classifier for 202 Confirmed Planets in the NASA Exoplanet Archive. (Top) Heat map as a function of planet radius and orbital period found with TCF. (Bottom) Confirmed planets from the NASA Exoplanet Archive superposed on the DTARPS heat map for injected planets from Figure 13. Red circles are recovered objects and black points are not recovered. The underlying heat map is the same as in Figure 13.

Figure 19. Two sources of contamination that affect the DTARPS Analysis List of potential planetary transits. (Left) TCF Periodogram and folded ARIMAX residual light curve of TIC 254214344, which shows an unconvincing peak in the TCF periodogram with no clear transit in the folded light curve. (Right) Random Forest prediction score for the full set of DIAmante extracted light curves, showing clusters of stars subject to ephemeris matching and periodic satellite operations as red points.

Figure 20. Comparison of the orbital parameters found from the best TCF peak with the injected orbital parameters for the synthetic injected planets. Recovered injections are shown as purple triangles. (a) Injected and recovered orbital periods. Dotted lines show integer ratios between the injected and TCF periods. (b) Injected and recovered planet radii. (c) Distribution of the injected planets in period-radius space.

Figure 21 shows the injected and TCF recovered radius-period distribution for both injected planets and injected False Positive (FP) signals. The injected FPs had periods from 0.2 to 357 days, the length of the longest extracted

Figure 21. Comparison of the synthetic orbital parameters injected into random light curves for the positive (purple triangles) and negative (orange points) training sets with the orbital parameters from the best TCF transit model. The left panel shows the distribution of injected orbital parameters and the right panel shows the distribution of recovered orbital parameters from TCF analysis.

B.4. Affer et al. 2012
Affer et al. (2012) measured rotation and binarity of field stars from the COnvection ROtation and planetary Transits (CoRoT) satellite for stars in the solar neighborhood. Forty objects in Table 2 of Affer et al. (2012) were matched with objects in the DIAmante data set, one of which is in our DTARPS Analysis List. Affer et al. report a rotation period of 72 days for TIC 234091431 and we report a TCF period of 2.76592 days. Of the other 39 objects, only three have rotational or pulsational periods that matched the TCF peak period. We label these as False Positives in the DIAmante data set.

B.5. Collins et al. 2018
Collins et al. (2018) identified and classified False Positives in Kilodegree Extremely Little Telescope (KELT) light curves. They classified over one thousand transit-like signals in KELT as False Positives through photometric and spectroscopic observations in several classes: single-line spectroscopic binaries, multi-line spectroscopic binaries, spectroscopic giant stars, eclipsing binaries, blended eclipsing binaries, variable stars, nearby eclipsing binaries (blended in the KELT aperture), and stars with no significant radial velocity detected. The DIAmante sample has 156 objects matched in Collins et al. (2018), 19 of which are in the DTARPS Analysis List. We consider these to be previously identified False Positives.

B.6. Mayo et al. 2018
Mayo et al. (2018) identified 275 planet candidates in NASA's K2 mission, Campaigns 0-10, and estimated False Positive probability with the vespa package. The DIAmante sample has 21 objects examined by Mayo et al. (2018), one of which, TIC 21184505, is in the DTARPS Analysis List. Another object, TIC 68694240, was a probable eclipsing binary that we label as a False Positive.

B.7. Dressing et al. 2019
Dressing et al. (2019) performed spectroscopic and photometric characterization for 172 K2 target stars identified as candidate hosts of transiting planets. They identified giants, likely eclipsing binaries, and cool dwarf stars. The DIAmante sample matches 8 of these stars with one, TIC 438338723, in the DTARPS Analysis List. It is a probable eclipsing binary that we label as a False Positive.

B.8. Feinstein et al. 2019
Feinstein et al. (2019) developed eleanor, an open-source tool for extracting light curves from TESS FFIs. They applied the method to TESS Sector 1 Year 1 data and vetted the results by visual examination. The DIAmante sample matches 16 of their objects, three of which are in the DTARPS Analysis List: TIC 159835004 and TIC 299780329 were previously identified as planetary candidates, and TIC 38813184 was identified as an eclipsing binary. The reported EB period for TIC 38813184 matches the TCF peak period. We include TIC 38813184 in the previously identified False Positive list.

B.9. Kostov et al. 2019
Kostov et al. (2019) created an open source automatic vetting pipeline for K2 data called Discovery and Vetting of Exoplanets (DAVE). They applied DAVE to 772 planet candidates from K2 and vetted the candidates either as planet candidates or False Positives. Of the 30 objects that match the DIAmante stars, TIC 21184505, TIC 294301883 and TIC 366443576 in the DTARPS Analysis List were labeled as planetary candidates by DAVE. All three objects had a TCF peak period that matched the reported DAVE period.

B.10. Kruse et al. 2019
Kruse et al. (2019) identified 818 planetary candidates and 1060 eclipsing binary systems in Campaigns 0-8 of the K2 mission using the EVEREST pipeline. The DIAmante sample matches 44 objects, two of which are in the DTARPS Analysis List: planetary candidate TIC 294301883 and eclipsing binary TIC 438338723. The reported periods from Kruse et al. match the TCF peak period.

B.11. Schanche et al. 2019
Schanche et al. (

B.12. von Boetticher et al. 2019
von Boetticher et al. (2019) characterized 10 low mass stars in low mass eclipsing binary systems as part of the EBLM project. The DIAmante sample has six of these systems with TCF period matching the reported period. Four lie in the DTARPS Analysis List: TIC 101395259, TIC 277712294, TIC 350480660 and TIC 734505581.

B.13. Yu et al. 2019
Yu et al. (2019) modified a neural network classifier, developed by Shallue & Vanderburg (2018) for identifying Kepler planet candidates, for TESS data. Applying the classifier to Year 1 Sector 6 TESS data, accompanied by visual vetting, they identified 288 new planetary candidates. The DIAmante sample matches 140 objects, of which 65 are in the DTARPS Analysis List. In all cases, the TCF peak period agreed with the period reported in Yu et al. We label these as previously identified planetary candidates.

B.14. Montalto et al. 2020
Of the 394 candidates identified by M20 in the DIAmante study, 364 were in the set of light curves classified by the DTARPS Random Forest. These are identified by a flag in the DTARPS Analysis List; see §10.2 for details. These include 221 in the NEA Confirmed Planet list, the TOI list on the NEA, or in other external surveys. Altogether, 82 are identified as Confirmed Planets. The M20 objects were placed on the cTOI list: 82 are labeled as planet candidates, 18 as ambiguous planet candidates, and 26 as False Positives. These include 13 DIAmante candidates independently listed as planetary candidates by Yu et al. (2019), and 2 identified as low mass EBs by von Boetticher et al. (2019).

B.15. Tu et al. 2020
Tu et al. (2020) studied superflares and other properties of 400 solar-type stars in TESS Year 1 data. Of the 277 stars in the DIAmante data set that had flares identified by Tu et al., only 57 had TCF peak periods that matched their stellar rotational periods. Six flare stars are in the DTARPS Analysis List. In the two cases where the stellar rotational period matched the TCF peak period, TIC 121048789 and TIC 373844472, DTARPS is likely identifying the rotational period rather than a transiting planetary period.

B.16. Dong et al. 2021
Dong et al. (2021) identified and characterized 55 Warm Jupiters in TESS Year 1 FFIs. The DIAmante sample has 40 of these systems, of which 21 lie in the DTARPS Analysis List. Of these, 20 had TCF periods matching those reported by Dong et al.; the exception is TIC 73038411.
This leaves a training set of 1,048 injected exoplanet signals, 9,095 injected FP signals and 12,475 random light curves. The validation set holds 279 injected exoplanet signals, 2,247 injected FP signals and 3,136 random light curves. To evaluate classifier performance, Akosa (2017) describes several classification metrics appropriate for machine learning problems with an imbalanced training set. Criteria for selecting the best classifier are based on scalar classification metrics and the Area Under the Curve (AUC) for the Receiver Operator Characteristic (ROC) curve and the Precision-Recall curve.
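The degree of class imbalance follows directly from the counts above, and it explains why precision-based metrics are examined alongside the ROC curve. The sketch below computes the positive-class fraction in each set and, purely as an illustration, the precision that would result if the True Positive Rate and False Positive Rate quoted in the abstract (92.8% and 0.37%) applied to the validation counts. That last number is an arithmetic example under this assumption, not a result reported in the paper, and the function names are illustrative.

```python
# Counts quoted in the text.
train = {"planet": 1048, "fp": 9095, "random": 12475}
valid = {"planet": 279, "fp": 2247, "random": 3136}

def positive_fraction(counts):
    """Fraction of the set that carries the positive (injected planet) label."""
    return counts["planet"] / sum(counts.values())

def precision_from_rates(tpr, fpr, n_pos, n_neg):
    """Precision implied by a TPR and FPR for given class sizes."""
    true_pos = tpr * n_pos
    false_pos = fpr * n_neg
    return true_pos / (true_pos + false_pos)

train_frac = positive_fraction(train)
valid_frac = positive_fraction(valid)

# Illustrative assumption: abstract rates applied to the validation set sizes.
n_neg_valid = valid["fp"] + valid["random"]
example_precision = precision_from_rates(0.928, 0.0037, valid["planet"], n_neg_valid)

print(f"positive fraction: train {train_frac:.3f}, validation {valid_frac:.3f}")
print(f"illustrative precision: {example_precision:.3f}")
```

Because fewer than 5% of the examples are positive, even a sub-percent False Positive Rate produces a noticeable number of false candidates relative to true ones.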
Table 1 lists the 37 features and feature weights in the final optimized RF classifier. The weights indicate the probability of the feature being among the randomly chosen features for each node calculation. The table organizes the features by the stage of DTARPS analysis. The feature groups considered and chosen in the final RF classifier are:

Table 1. Scalar features used in the optimized Random Forest classifier

Table 2. Classification metrics for the validation set

Figure 9. Feature importance for the final Random Forest classifier ordered by importance. Descriptions of the 37 features appear in Table 1.
The AutoRegressive Planet Search project is supported at Penn State by NASA grant 80NSSC17K0122 and NSF grant AST-1614690. E.J.M. and E.D.F. benefit from the vibrant community of Penn State's Center for Exoplanets and Habitable Worlds that is supported by the Pennsylvania State University and the Eberly College of Science.