Short-Period Variables in TESS Full-Frame Image Light Curves Identified via Convolutional Neural Networks

The Transiting Exoplanet Survey Satellite (TESS) mission measured light from stars in ~85% of the sky throughout its two-year primary mission, resulting in millions of TESS 30-minute cadence light curves to analyze in the search for transiting exoplanets. To search this vast dataset, we aim to provide an approach that is computationally efficient, produces highly performant predictions, and minimizes the required human search effort. We present a convolutional neural network that we train to identify short-period variables. To make a prediction for a given light curve, our network requires no prior target parameters identified using other methods. Our network performs inference on a TESS 30-minute cadence light curve in ~5 ms on a single GPU, enabling large-scale archival searches. We present a collection of 14156 short-period variables identified by our network. The majority of our identified variables fall into two prominent populations, one of short-period main sequence binaries and another of Delta Scuti stars. Our neural network model and related code are additionally provided as open source for public use and extension.


INTRODUCTION
The volume of astronomical photometric datasets is expanding rapidly. Due to their size, these datasets often contain data that no human eye has ever seen or will ever see. Because of this, automated systems that can filter irrelevant information and identify interesting phenomena are imperative. In this work, we describe our development and application of a neural network (NN) to automatically identify short-period variables in the Transiting Exoplanet Survey Satellite (TESS, Ricker et al. 2014) full-frame image (FFI) data.
Modern wide-field surveys such as TESS, the Optical Gravitational Lensing Experiment (Soszyński et al. 2015), and the Zwicky Transient Facility (Graham et al. 2019) offer the possibility of developing and testing population synthesis models that provide insights into the distribution of variables in the Galaxy and can help describe stellar evolutionary processes.
The primary goal of the TESS mission, the observatory whose data were used for this work, is to detect the signature of planets as they transit in front of their host stars. Launched in April 2018, TESS is performing a near all-sky photometric survey intended to identify planets with bright enough host stars to enable mass estimation from ground-based radial velocity measurements. Importantly, due to TESS's high observational cadence, these data are also well-suited to search for various other forms of short-duration, time-domain astrophysical phenomena.
The TESS observatory is positioned in a high-Earth elliptical orbit of 13.7 days. TESS completed its 2-year primary mission in July 2020. While a principal data product of TESS's primary mission is the 2-minute cadence photometry of more than 200,000 stars, of more relevance for this work is the FFI data. During the primary mission, at a 30-minute cadence, TESS took flux measurements of its entire field of view (24° × 96°), resulting in the FFI dataset. This FFI data covers approximately 85% of the sky and includes billions of sources (Ricker et al. 2014). Searching this large-scale dataset requires a robust and computationally fast method, and NNs provide one such approach.
Largely due to their potential to approximate any given function (Cybenko 1989; Leshno et al. 1993; Zhou 2020), in recent years, deep neural networks (DNNs, e.g., LeCun et al. 2015) have come to dominate the field of machine learning (ML). In the case of photometric data, a common function these methods may be tasked to learn is to predict physical classifications of the sources from observed flux measurements. Both ML and non-ML algorithms can only approximate the unknown ideal function that can perform this conversion perfectly.
When a NN is trained to learn this function, it attempts to learn optimal data transformations (Rumelhart et al. 1986), which can potentially lead to more accurate classifications than manually designed methods, often because manually designed methods omit data during processing. For instance, outlier data points are often filtered prior to period estimation. However, such filtering can remove potentially useful information. In contrast, NNs do not directly remove sources of noise. Instead, they learn to identify such noise, while retaining signal within it, and incorporate this information into their predictions about the likelihood of a light curve containing the desired signal, in this case, a short-period signal. This approach makes them effective at processing noisy data (Dong et al. 2014; Hinton et al. 2012; Xu et al. 2014). For instance, if outliers occur due to a temporary systematic offset within the light curve, as is common in TESS data, but still carry the periodic signal, the relevant information for detecting the periodic signal can be retained by the NN where a simple omission of outliers would remove it.
As a generalized function approximator, a sufficiently large NN has the potential to learn any hand-crafted function arbitrarily well, as is shown by the Universal Approximation Theorem (Cybenko 1989; Leshno et al. 1993; Zhou 2020). This includes functions such as those used to produce a periodogram for a light curve. Furthermore, if improvements can be made to the hand-crafted transformation that yield improved results, a NN has the potential to learn that improved function instead. This inherent property of NNs gives them the potential to outperform their hand-crafted counterparts.
The computational speed of the method is another advantage of using NNs in detecting short-period signals, as we will show from the computational performance of our method.

Short-period main sequence binaries background
The first of the two primary populations identified by our NN is short-period main sequence (MS) binaries, which, at the shortest periods, consist of red dwarf stars. Due to their low mass, red dwarf stars burn their nuclear fuel at a slow rate, and every red dwarf star that has ever formed is still on the MS. Although red dwarf stars are the most common type of star in the galaxy (Bochanski et al. 2010), how they evolve in close binary systems, including in contact binaries, is not well understood. There is a dramatic cut-off of such binaries at 0.22 days (Rucinski 1992, 2007). Notably, the reason for this cut-off is still under investigation. The cut-off was originally often attributed to the stars reaching their fully convective limit (Rucinski 1992). More recently, magnetic braking (Stepien 2007) and unstable mass transfer (Jiang et al. 2012) have been implicated as playing a prominent role. At any rate, a larger sample size may help to more clearly define the evolutionary and formation processes of these and other short-period MS binaries.

Delta Scuti background
Delta Scuti (δ Sct) stars are a type of pulsating variable star with intermediate mass, 1.5-2.5 solar masses (Bedding et al. 2020), located at the intersection of the classical instability strip (Dupret et al. 2004) and the MS stars on the Hertzsprung-Russell (HR) diagram (Handler 2009; Breger 2000). Typically of spectral types A to F, the δ Sct star class is composed of both Pop I and Pop II stars (Baade 1944). These stars, characterized by a period range spanning from 0.02 days to 0.25 days, demonstrate multiperiodic luminosity variability and can present both radial and non-radial pulsations (Breger 2000).
Stellar pulsations serve as a valuable tool in investigating the internal structures of stars. The excitation mechanism underlying these pulsations involves the cyclic transfer of kinetic energy from the internal energy of the mixture of gas and radiation in the ionization zones, which are particularly rich in elements like helium and hydrogen. By identifying the natural frequencies of their pulsation modes, it is possible to compare these observations with theoretical models, improving our understanding of stellar properties and their evolutionary processes (Breger 2000; Bedding et al. 2020). Notably, δ Sct stars were among the first pulsators whose asteroseismic potential was recognized (Handler 2009).
As stars evolve, they may cross the instability strip and may pulsate. They could be stars moving from the MS to the giant branch or pre-MS stars evolving to the zero-age main sequence (ZAMS) (Breger 1972, 2000). δ Sct stars have also been detected in diverse scenarios, ranging from eclipsing binaries (Kahraman Aliçavuş et al. 2023) to stars populating other galaxies (Mateo et al. 1998).
This broad range of detection highlights that δ Sct stars are not uncommon.
Beyond their use to provide insight into the internal structures of stars, δ Sct stars are often used as standard candles, due to their adherence to a period-luminosity relation in certain passbands.

DATA
In this work, we use light curve data, i.e., measures of flux over time. Figure 1 shows an example TESS light curve, flux vs. time, for TIC ID 149989733 in sector 10. The light curves were produced by the eleanor pipeline (Feinstein et al. 2019) from raw flux measurements provided by TESS.
The light curve shown in Figure 1 is a typical example of a short-period variable identified by our NN. There is a gap in the middle of the data caused by the spacecraft pausing its observing to downlink data to Earth (Tenenbaum & Jenkins 2018). Notably, since we only searched for variables with periods ranging from 1 hour to 5 hours, the FFI cadence of 30 minutes results in very few data points per period. The light curve shown in Figure 1 has a period of 1.326 hours, resulting in only 2 or 3 data points per period. This makes it difficult for a human to identify the periodicity in the unfolded light curves. Figure 1 shows an example from our δ Sct partitions (see Section 4.3 for partitioning explanation). Figure 2 shows another example light curve, this time taken from our MS binary partition.
Ideally, a light curve would contain only the flux from a single TESS target (typically a star system). However, in reality each TESS pixel spans 21 arcseconds on a side, and TESS's point spread function results in blending between pixel measurements. For these reasons, a light curve will contain flux from multiple targets. This often makes it challenging to determine which source the signal (or noise) originates from.
TESS takes measurements of a large portion of the sky at regular intervals. During the primary mission, this interval was 2 minutes. However, due to limitations of the spacecraft's storage and downlinking capabilities, only a small portion of this 2-minute cadence data is stored and downlinked (Ricker et al. 2014). At a 30-minute cadence, by contrast, all pixels' accumulated values are downlinked to Earth. These FFIs cover a much larger number of targets at a lower time resolution (Tenenbaum & Jenkins 2018). We used 67 million 30-minute cadence light curves (with TESS magnitudes <15) in this work, and this is the primary data set investigated by our NN.

Full-frame image light curve production
For details of the FFI light curve production, see Powell et al. (2022). Briefly, Powell et al. (2022) used the 129,000-core Discover supercomputer at the NASA Center for Climate Simulation to build FFI light curves for all sources observed by TESS down to 15th magnitude. All original and calibrated FFIs were produced by the TESS Science Processing Operations Center (Jenkins et al. 2016). Target lists were created through a parallelized implementation of tess-point (Burke et al. 2020) on the TESS input catalog (TIC, Stassun et al. 2018) provided by the Mikulski Archive for Space Telescopes (2024). The light curves for each sector were constructed in 1-4 days of wall clock time (for a total of over 100 CPU-years), depending on the density of targets in the sector, through a parallelized implementation of the eleanor Python module (Feinstein et al. 2019). 67 million light curves were used for this work. As of this writing, the light curves of the first 10 sectors of the TESS primary mission data have been made publicly available by Powell et al. (2022). We have used the full 26 sectors of the primary mission in this work. The public data release of the remaining primary mission sectors by Powell et al. (2022) is still ongoing. These single-sector light curves are the input to the pre-processing and, subsequently, our NN. From this dataset, we use the flux with the processing labeled as "corrected" by Powell et al. (2022).

NEURAL NETWORK PIPELINE
The NN architecture and data preprocessing used in this work are the same as in Olmschenk et al. (2021). In that work, we developed a NN to identify transiting planets in TESS FFI data. The NN architecture and data preprocessing from Olmschenk et al. (2021) were designed with the intent to be generalizable, and many of the explanations of design choices given there directly translate to the use case of short-period variables as well. Where we feel the connection is not immediately intuitive, we include additional explanation below. Also provided in Olmschenk et al. (2021) is a short primer on NNs, which we believe will be helpful for readers of this work who are not familiar with NNs.

Training data
Figure 1. The FFI light curve and folded light curve for TIC ID 149989733 sector 10. This light curve was chosen as a typical example of the light curves identified by our NN. A notable aspect is that only 2 or 3 data points exist for each period (1.326 hours), resulting in the periodicity not being clear to a human observer in the unfolded light curve. Despite no specific periodicity-detecting mechanisms being included in the NN, it learns to identify such periods in the unfolded data. The time is given in TESS Barycentric Julian Day (BTJD) time (Tenenbaum & Jenkins 2018). With the Julian Day in the Barycentric Dynamical Time standard (BJD), BTJD = BJD − 2457000.0. BJD is used for its accurate time standard, which accounts for many different timing corrections, including leap seconds (e.g., Eastman et al. 2010). The flux given is median normalized flux for the light curve. Color is based on unfolded time. This is an example from our δ Sct clusters (see Section 4.3 for clusters explanation).
When designing and training the network, we use 80% (~54M light curves) of the available TIC IDs as the training data targets and use another 10% (~7M light curves) as validation data targets. These validation data are set aside, and the network is not trained on them. Instead, these data are used to evaluate the predictive performance of the network during the training process. This entails measuring the correctness of the network predictions on the validation data, which the network has not been trained with but where we know the correct answer. The remaining 10% is reserved to be used as a test dataset for a future evaluation. The test data is intended to evaluate the trained network after all design decisions are finalized. Several of the specific network and training configuration decisions were guided by preliminary performance results on the validation data. However, this validation data evaluation and the test data evaluation are beyond the scope of this work. The synthetic data used for training was not designed to accurately reflect the true distribution of expected periodic signals. Instead, it was designed to be a dataset which would produce a trained NN that can identify short-period targets. Additionally, we believe such a test evaluation without further context would be misleading. For example, if we made the synthetic dataset less challenging (e.g., by increasing the lower bounds of the amplitudes), the network would perform better on the test dataset but worse on real data. Instead, we provide evaluations on real data below.
Our training data consists of four collections of light curves. The first is the regular TESS light curves. These are treated as negative samples during training. Although there will be rare false negative samples in this collection, due to their rarity and the NN's statistical nature, they have minimal impact on the training process. The second collection is the same real TESS light curves, but injected with synthetic short-period signals. The synthetic periodic signals are mixed sine and sawtooth signals. Due to the relatively small number of data points per period and the many variations of signals that can be produced with such a mixed signal, we found these synthetic signals to be sufficient to train our NN. These are the positive examples used during training. The third collection is the same real TESS light curves injected with synthetic longer-period signals. We use this collection as a hard negative training case (a negative sample we expect to be challenging for the network), as it prevents the NN from fitting some artifact of the data (e.g., looking for a perfect sawtooth wave existing in the signal) and instead requires the network to focus on the feature we are interested in (e.g., short periodicity). The fourth collection is the regular light curves injected with uniform noise with an amplitude profile that matches those in the synthetic periodic signals. This is to prevent the network from trying to learn that an increased variance in values from an injected signal should be used as an indicator of a positive sample. Samples from the four collections are used in equal ratios for each batch of data during training. During the training process, for each training sample with a synthetic signal injected, a
The following are the specifics of the synthetic periodic signals we generated. A period is selected from U(0.25, 5) hours for the short-period signals and from U(9, 20) hours for the longer-period signals. Each signal had a relative amplitude selected from U(0.001, 1) independently for both the sine and sawtooth component. A random phase from U(0, 2π) was selected to offset the sine and sawtooth components. The fraction of the sawtooth cycle which consisted of the rising ramp was selected from U(0, 1) (with the remainder being the falling ramp). The signal was injected at a random phase into the real TESS light curve. A random sample of these synthetic light curves is shown in Figure 3.
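The signal recipe above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this work: the function names, the use of NumPy, the asymmetric-triangle form of the sawtooth, the stand-in light curve, and the way the relative signal is scaled by the median flux are all assumptions. Only the sampling ranges are taken from the text.

```python
import numpy as np


def synthetic_signal(times_hours, rng, period_range_hours=(0.25, 5.0)):
    """Relative mixed sine and sawtooth signal (hypothetical helper)."""
    period = rng.uniform(*period_range_hours)
    sine_amplitude = rng.uniform(0.001, 1.0)
    sawtooth_amplitude = rng.uniform(0.001, 1.0)
    component_offset = rng.uniform(0.0, 2.0 * np.pi)  # sine-to-sawtooth phase offset
    rise_fraction = rng.uniform(0.0, 1.0)  # share of each cycle spent on the rising ramp
    injection_phase = rng.uniform(0.0, 2.0 * np.pi)  # random phase within the light curve

    phase = 2.0 * np.pi * times_hours / period + injection_phase
    sine = sine_amplitude * np.sin(phase)
    # Sawtooth as an asymmetric triangle: rise from -1 to 1, then fall back to -1.
    cycle = ((phase + component_offset) / (2.0 * np.pi)) % 1.0
    ramp = np.interp(cycle, [0.0, rise_fraction, 1.0], [-1.0, 1.0, -1.0])
    return sine + sawtooth_amplitude * ramp


def inject(flux, relative_signal):
    # Assumption: the relative signal is scaled by the median flux level.
    return flux + np.median(flux) * relative_signal


rng = np.random.default_rng(0)
times_hours = np.arange(0.0, 27.0 * 24.0, 0.5)  # 27 days at 30-minute cadence
flux = rng.normal(1000.0, 5.0, times_hours.size)  # stand-in for a real light curve
positive = inject(flux, synthetic_signal(times_hours, rng))
hard_negative = inject(flux, synthetic_signal(times_hours, rng, (9.0, 20.0)))
```

Drawing a fresh set of parameters on every call means that no two injected signals share the same shape, which is the property the mixed sine and sawtooth form is meant to provide.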

Network architecture
In this work, we use a 1D convolutional neural network (CNN, Krizhevsky et al. 2012) that was originally designed and developed in Olmschenk et al. (2021). This NN architecture is shown in Figure 4. Refer to Olmschenk et al. (2021) for the details of the CNN design.
Our NN framework code is available online 1. This framework is also installable as a PyPI package 2. Documentation for the NN framework is also available online 3. This NN framework is a generalized photometric NN framework. Our code that applies this generalized framework to the specific task of short-period variable identification can be found online 4.
1 https://github.com/golmschenk/ramjet (see https://github.com/golmschenk/ramjet/releases/tag/short period variable neura l network paper for the code version used in this work)
2 https://pypi.org/project/astroramjet/
3 https://astroramjet.readthedocs.io/en/latest/
Similar to the situation in Olmschenk et al. (2021), the 1D CNN is an apt choice for searching for short-period variables due to how it constructs high-level global features from low-level local features. First, we expect the NN to find features such as individual peaks and troughs as low-level features. In early layers, the NN will likely ignore the positions of these features in the light curve, and only determine their presence based on the local light curve shape. As such, the early layers of our network are convolutional layers, which treat each segment of the light curve identically (Krizhevsky et al. 2012), e.g., they search each portion of the light curve for a peak or trough occurring in that location. Only after local-level features, e.g., individual wave cycles, are discovered do we expect the network to combine these features into global-level features, in this case repeating periodic wave cycles. The other advantages of a CNN, such as its prevention of overfitting, are the same as explained in Olmschenk et al. (2021).
No prior short-period variable parameter information is required by our NN (e.g., no phase folding or other prior information extraction is performed). The only inputs to the NN are the fluxes of the light curves. Inference on a light curve is performed by the NN in ~5 ms on a single GPU. This allows for inference of the entire 67M FFI light curve dataset to be completed in a few days. The network training took 5 days on a single GPU, with this training time able to be decreased approximately linearly with the number of GPUs used in parallel.
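The "few days" figure follows directly from the per-light-curve inference time. The short calculation below only restates the numbers quoted above; the variable names are ours, and the result ignores data loading and other overheads.

```python
n_light_curves = 67_000_000  # FFI light curves in the dataset
seconds_per_light_curve = 5e-3  # approximate inference time on a single GPU

total_seconds = n_light_curves * seconds_per_light_curve
total_days = total_seconds / 86_400  # seconds per day
print(f"{total_days:.1f} days on a single GPU")  # about 3.9 days
```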
The specific number of layers and size of each layer (as seen in Figure 4) was decided through limited experimentation, loosely guided by prior experience with network structures in computer vision. While a systematic search of network structures is beyond the scope of this work, we note that a minor change to our NN (e.g., adding/removing a layer), while adjusting the remainder of the NN to produce the same output size, does not yield a trained NN with substantially different results.

Pre-processing
We use the same pre-processing of the light curves as in Olmschenk et al. (2021). The primary purpose of this pre-processing is to prevent the network from overfitting and to encourage generalization of learned features. Briefly, during the training phase, a random data point removal is applied and a random rolling of the data is applied. During both the training phase and the inference phase, a uniform lengthening is applied and a modified z-score normalization is applied. See Olmschenk et al. (2021) for details on these pre-processing methods and why they are applied.
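For readers who want a concrete picture of these four steps, a rough sketch follows. It is not the implementation from Olmschenk et al. (2021): the function names, the maximum removal fraction, the target length, the repeat-based reading of "uniform lengthening", and the 0.6745 scaling of the modified z-score are all assumptions made for illustration.

```python
import numpy as np


def remove_random_points(flux, rng, max_fraction=0.1):
    # Training only. Assumption: drop up to 10% of the points at random.
    n_remove = rng.integers(0, int(len(flux) * max_fraction) + 1)
    keep = np.sort(rng.choice(len(flux), size=len(flux) - n_remove, replace=False))
    return flux[keep]


def random_roll(flux, rng):
    # Training only. Shift the sequence cyclically by a random offset.
    return np.roll(flux, rng.integers(0, len(flux)))


def lengthen_uniformly(flux, target_length):
    # Training and inference. Assumption: repeat points at evenly spaced
    # positions so every light curve reaches the same fixed length.
    index = np.floor(np.linspace(0, len(flux), target_length, endpoint=False)).astype(int)
    return flux[index]


def modified_z_score(flux):
    # Training and inference. Median and median absolute deviation replace the
    # mean and standard deviation (undefined if the deviation is zero).
    median = np.median(flux)
    deviation = np.median(np.abs(flux - median))
    return 0.6745 * (flux - median) / deviation


rng = np.random.default_rng(0)
flux = rng.normal(1000.0, 5.0, 1000)  # stand-in for a real light curve
training_input = modified_z_score(
    lengthen_uniformly(random_roll(remove_random_points(flux, rng), rng), 1200)
)
inference_input = modified_z_score(lengthen_uniformly(flux, 1200))
```

The two random steps are applied only during training because they act as data augmentation; the lengthening and normalization are needed in both phases so that the network always receives inputs of the same size and scale.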

Post-neural network processing
After the NN has identified likely short-period variable candidates, we run a process on the candidates to determine the periodicity and remove false positives. Where the NN only used the flux values of the light curve, here we use additional external information to remove false positives from the NN-filtered candidate list. The complete code used to perform this processing can be found at https://github.com/golmschenk/generalized-photometric-neural-network-experiments. Below we describe the most salient points.
First, we selected the 50,000 light curves given the highest confidence scores by the NN. The number 50,000 is arbitrary. A larger sample with lower confidences could be selected, but we opted for a relatively high confidence cutoff.
Then, we estimated candidate periodicity using a Lomb-Scargle periodogram. The Lomb-Scargle periodogram search was limited to periods from 1 hour (twice the sampling interval of TESS FFIs) up to 10 days. From the resulting periodogram, of the frequencies with a power within 10% of the maximum power, we take the highest frequency, then find the local power maximum starting from that frequency. We take this frequency/period as the short-period variable's frequency/period. Any targets with periods greater than 5 hours are discarded. These steps are chosen to reduce the chances of choosing a shorter period alias. Here, we also fold the light curve and, binning into 25 bins in phase space, determine the minimum and maximum bins of the light curve based on the median value of each bin.
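The period selection rule can be sketched as follows. This is an illustrative reading of the description above, not the released pipeline code: we assume SciPy's `scipy.signal.lombscargle`, a linear frequency grid of 20,000 points, and that "within 10% of the maximum power" means at least 90% of the maximum. The function name and the synthetic test light curve are hypothetical.

```python
import numpy as np
from scipy.signal import lombscargle


def estimate_period_hours(times_days, flux, n_frequencies=20_000, tolerance=0.10):
    # Frequency grid in cycles per day, covering periods from 10 days to 1 hour.
    frequencies = np.linspace(1.0 / 10.0, 24.0, n_frequencies)
    # scipy's lombscargle expects angular frequencies.
    power = lombscargle(times_days, flux - np.mean(flux), 2.0 * np.pi * frequencies)

    # Highest frequency whose power is within the tolerance of the maximum.
    near_maximum = np.flatnonzero(power >= (1.0 - tolerance) * power.max())
    i = near_maximum[-1]

    # Climb from that frequency to the nearest local power maximum.
    while True:
        left = power[i - 1] if i > 0 else -np.inf
        right = power[i + 1] if i < power.size - 1 else -np.inf
        if right > power[i] and right >= left:
            i += 1
        elif left > power[i]:
            i -= 1
        else:
            break
    return 24.0 / frequencies[i]


rng = np.random.default_rng(0)
# 25 days at 30-minute cadence, with slight timing jitter so sampling is not perfectly regular.
times_days = np.arange(1200) / 48.0 + rng.uniform(-1e-3, 1e-3, 1200)
true_period_days = 3.0 / 24.0
flux = 1.0 + 0.01 * np.sin(2.0 * np.pi * times_days / true_period_days)
flux = flux + rng.normal(0.0, 0.002, flux.size)

period_hours = estimate_period_hours(times_days, flux)
keep_candidate = period_hours <= 5.0  # periods above 5 hours are discarded
```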
Next, we remove false positives which have photometric centers of variability that do not align with the target. This is done to prevent cases where a nearby variable is the real source of the variability. To accomplish this, we first use TESScut (Brasseur et al. 2019) to obtain the time-series raw image data of the pixels surrounding the target. We use an image of 10 × 10 pixels centered on the target. The time-series image data is folded on the period determined previously, and again placed into the 25 bins across phase space determined previously. The images which fall into the minimum and maximum bins of the previous step are used for the following variability photometric centroid estimation. We find median values of the binned images for the minimum and maximum bins, then take the difference of these resulting values. The centroid of these values is compared to the estimated position of the target from the TIC. If the estimated centroid is separated from the target position by more than 21 arcseconds (the angular size of a side of a TESS pixel), the candidate is discarded.
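A simplified version of this centroid test is sketched below. The function names, the flux-weighted centroid, the clipping of negative difference values, and the synthetic image cube are assumptions for illustration. Retrieval of the cutout through TESScut and conversion of the TIC position to pixel coordinates are omitted.

```python
import numpy as np

PIXEL_SCALE_ARCSEC = 21.0  # angular size of a side of a TESS pixel


def variability_centroid(images, phases, min_bin, max_bin, n_bins=25):
    """Centroid (row, column) of the max-bin minus min-bin difference image."""
    bin_index = np.minimum((phases * n_bins).astype(int), n_bins - 1)
    max_image = np.median(images[bin_index == max_bin], axis=0)
    min_image = np.median(images[bin_index == min_bin], axis=0)
    # Assumption: negative differences are clipped before weighting.
    difference = np.clip(max_image - min_image, 0.0, None)
    rows, columns = np.indices(difference.shape)
    total = difference.sum()
    return (rows * difference).sum() / total, (columns * difference).sum() / total


def passes_centroid_check(centroid, target_pixel, limit_arcsec=21.0):
    offset_pixels = np.hypot(centroid[0] - target_pixel[0], centroid[1] - target_pixel[1])
    return bool(offset_pixels * PIXEL_SCALE_ARCSEC <= limit_arcsec)


# Synthetic 10 x 10 cutouts: a variable target at the center and a constant neighbor.
rng = np.random.default_rng(0)
rows, columns = np.indices((10, 10))
target = np.exp(-((rows - 4.5) ** 2 + (columns - 4.5) ** 2) / 2.0)
neighbor = 5.0 * np.exp(-((rows - 1.0) ** 2 + (columns - 8.0) ** 2) / 2.0)
phases = rng.uniform(0.0, 1.0, 500)
brightness = 1.0 + 0.5 * np.sin(2.0 * np.pi * phases)
images = brightness[:, None, None] * target + neighbor

# For this sinusoid the brightest phase bin is 6 and the faintest is 18.
centroid = variability_centroid(images, phases, min_bin=18, max_bin=6)
on_target = passes_centroid_check(centroid, target_pixel=(4.5, 4.5))
```

Because the constant neighbor cancels in the difference image, the centroid lands on the variable source even though the neighbor is the brighter of the two.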
The amplitude is estimated as half of the difference between the maximum and minimum of the above binning. The relative amplitude is this amplitude divided by the median flux. The version of the relative amplitude with contamination uses the median flux of the original light curve. The version of the relative amplitude without contamination scales this value to take into account the estimated contamination ratio provided via the TIC where available.
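These amplitude definitions can be written out as a short sketch. The function name is hypothetical, and the contamination correction shown is an assumption: we take the TIC contamination ratio to be contaminating flux divided by target flux, so that removing the dilution multiplies the relative amplitude by (1 + ratio). The exact scaling used in the pipeline is not specified in the text.

```python
import numpy as np


def estimate_amplitudes(phases, flux, contamination_ratio=None, n_bins=25):
    bin_index = np.minimum((phases * n_bins).astype(int), n_bins - 1)
    # Median flux in each phase bin (an empty bin would give NaN and is ignored).
    bin_medians = np.array([np.median(flux[bin_index == b]) for b in range(n_bins)])
    amplitude = 0.5 * (np.nanmax(bin_medians) - np.nanmin(bin_medians))
    relative_with_contamination = amplitude / np.median(flux)
    relative_without_contamination = None
    if contamination_ratio is not None:
        # Assumption: ratio = contaminating flux / target flux.
        relative_without_contamination = relative_with_contamination * (1.0 + contamination_ratio)
    return amplitude, relative_with_contamination, relative_without_contamination


phases = np.arange(2500) / 2500.0
flux = 100.0 + 10.0 * np.sin(2.0 * np.pi * phases)  # 10% relative amplitude
amplitude, with_contamination, without_contamination = estimate_amplitudes(
    phases, flux, contamination_ratio=1.0
)
```

Under this assumption, a target that contributes only half of the flux in the aperture (ratio of 1) has an intrinsic relative amplitude twice the measured one.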
Additional target stellar properties are taken from the TIC and the Gaia Mission (Gaia, Brown et al. 2018) where available.

RESULTS
Figure 5. The period distribution of the short-period variables as estimated by our pipeline. Note that for binary systems, this estimated period may be half of the system's full orbital period.
The table of our identified short-period variable candidates is available as a supplemental file to this work. Table 1 shows a random5 sample of 100 rows from the complete table.
Figure 5 shows the distribution of the periods of the candidates. These periods are compared against the effective temperatures of the short-period variables in Figure 6. The longer period population corresponds to MS binaries while the shorter period population corresponds to δ Sct. These were identified by comparing the population properties against known populations (Ziaali et al. 2019; Barac et al. 2022; Fetherolf et al. 2023). Throughout the rest of the results and analysis, we have assumed these two populations to primarily contain MS binaries and δ Sct respectively. However, we emphasize that we have not performed any form of detailed modeling on these targets, and these two populations almost certainly contain many targets which are not directly from these two classes of objects. The δ Sct population itself can be seen to be separated into two populations when comparing the period to the absolute magnitude (Figure 7).

Human vetting
To help confirm our NN accomplished its goal of identifying short-period variable targets, we inspected 500 random light curves output by our pipeline. We visualized these light curves folded on the periods provided by our pipeline. Of the 500, 492 had the periodic signal immediately obvious in the folded version of the light curves. Of the remaining 8, 6 had a low signal-to-noise ratio. In all 6 cases, a median binned version of the phase folded light curve presented apparent trough and peak portions of the signal. While it is reasonably likely these are true signals, our post-processing also uses such a binning process to remove false positives. So it is possible that, given enough random binned data, these apparent signals may simply be the only kind of random noise which will pass through the post-processing. The other 2 of the 8 mentioned above had a period very near 1 hour (within 0.25 seconds) which, given the 30-minute cadence of the data, resulted in all the data points appearing near only two phases in the folded light curve with no data points in between. In both of these cases, the two phase segments with data points were consistent with a periodic signal (with one phase segment sloping upward and the other sloping downward); however, we feel there is too little phase coverage to be confident of a periodic signal in these cases. With this, we are confident that 492 of the 500 (98.4%) randomly selected light curves demonstrate a short-period signal. Of the remaining 1.6%, all have some indication of a periodic signal, but they are less certain.
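The poor phase coverage for periods very near 1 hour can be reproduced with a short calculation. The sketch below assumes an idealized, gap-free 27-day light curve with exactly regular 30-minute sampling; the variable names are ours.

```python
import numpy as np

times_hours = np.arange(27 * 48) * 0.5  # 27 days of 30-minute samples

# Folding on exactly 1 hour puts every point at phase 0 or phase 0.5.
exact_phases = (times_hours % 1.0) / 1.0
distinct_phases = np.unique(exact_phases)

# A period 0.25 seconds longer only lets the points drift slowly in phase.
period_hours = 1.0 + 0.25 / 3600.0
phases = (times_hours % period_hours) / period_hours
occupied_bins = np.unique((phases * 25).astype(int)).size  # out of 25 phase bins
```

Over 27 days the accumulated drift is about 648 cycles × 0.25 s, or roughly 4.5% of the period, so the folded points stay confined to two narrow phase segments, as described above.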

Evaluation against an existing catalog
We perform an evaluation using variable targets identified by Fetherolf et al. (2023). Fetherolf et al. (2023) present a catalog of variable stars identified in TESS 2-minute cadence light curves. Here, along with evaluating the entire pipeline, we evaluate the predictive performance of the NN alone to give a sense of the performance of the NN by itself. We note that the post-NN-processing is designed to remove false positives that the NN mislabels, but it is still valuable to have an understanding of the predictive performance of the NN alone.
We note that this evaluation is imperfect, as the targets for which 2-minute cadence data exists are typically much brighter than our average target in the FFI dataset. However, it provides at least an understandable metric to gauge our results by. Fetherolf et al. (2023) identified 5449 targets with periods between 1 and 5 hours. With repeats of these targets across sectors, we have 13,647 FFI light curves for these targets in our dataset. For this evaluation, we consider this to be the positive dataset. To make the comparison interpretable, we selected an equal number of negative light curves for evaluation. We selected these negative light curves randomly from the remaining targets with 2-minute cadence data available in the TESS primary mission data. To be clear, we do not use 2-minute cadence data in our pipeline in any fashion. We only selected light curves for targets for which 2-minute cadence data exists to provide a collection of light curves which have similar properties to those in the positive dataset (e.g., magnitude, contamination). We then perform inference on the FFI light curves of these targets with our NN.
As noted elsewhere, the goal of our pipeline is not to find all short-period variables, but instead to provide a large collection of high-confidence candidates. As such, in our main results, we only took the highest confidence candidates from our NN with an arbitrary cut-off of 50,000 light curves. This corresponds to a confidence threshold of 0.99796987 from the NN. It is important to note that this is an uncalibrated confidence and has no explicit bearing on the expected probability of a light curve containing a short-period signal. In our use case, only the confidences relative to one another are relevant and the absolute confidence values have little meaning beyond being a cutoff threshold. We use this same confidence threshold during this evaluation. Other thresholds of confidence could be chosen for other use cases (e.g., attempting a complete survey).
Again, since the goal is not survey completion, but rather the identification of a large number of candidates, the metrics of interest in this evaluation are the true positive rate (TPR) and the false positive rate (FPR). With our high confidence threshold value, 1,363 of the 13,647 positive samples are above the threshold. Assuming all the positive samples from Fetherolf et al. (2023) are correct, this results in a 9.987% TPR. For the negative case, 16 of the 13,647 are above the threshold. However, upon inspecting these 16 light curves, we find 11 of them have clear short-period signals from 1 to 5 hours in period, and they are actually positive samples. This means that only 5 of the negatives are mislabeled by the NN, resulting in a 0.03663% FPR. All 5 of these negative samples are discarded by the post-NN-processing of our pipeline, meaning that, at least for this evaluation sample, our pipeline admitted no false positives.
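The two rates follow from the counts given above. The snippet below simply reproduces that arithmetic with our own variable names, keeping 13,647 as the denominator for the negative set as in the text; the results agree with the quoted percentages to within rounding of the last digit.

```python
positives_total = 13_647
positives_above_threshold = 1_363

negatives_total = 13_647
negatives_above_threshold = 16
negatives_found_to_be_positive = 11  # clear 1 to 5 hour signals on inspection

true_positive_rate = positives_above_threshold / positives_total
false_positives = negatives_above_threshold - negatives_found_to_be_positive
false_positive_rate = false_positives / negatives_total

print(f"TPR = {true_positive_rate:.3%}, FPR = {false_positive_rate:.4%}")
```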
While our pipeline produced a perfect result on this evaluation dataset (producing only true positives and no false positives), it is important to note that this dataset is not perfectly representative of the data the pipeline was applied to in the real inference case. First, in the real-world case, the dataset is expected to be highly imbalanced, with the vast majority of targets not exhibiting a 1 to 5 hour short-period signal. Second, the targets our NN was trained on were typically dimmer on average than the collection of targets in this evaluation. As this distribution of samples is not the same as the one the network was trained with, the network may be providing either higher or lower confidences for negative samples than it would on a distribution that matches the full inference dataset (which could result in either increased or decreased predictive performance). So, while this evaluation provides some sense of the NN's performance, it is likely the above human vetting process is more representative of the quality of the short-period variables identified in this work.

Partitioning the data
In the absolute Johnson V magnitude vs log period distribution, we have defined linear partitions to divide the three primary populations of data, splitting the data into a MS binary partition, a primary ridge δ Sct partition, and a second ridge δ Sct partition (Figure 7). We note that our NN is not performing a classification. These populations of short-period variables were found during post-pipeline analysis of the properties of the targets that were identified by the NN. We have partitioned the results based on literature values. Notably, the partition line between the primary ridge and second ridge δ Sct populations is set to the relation provided by Barac et al. (2022). We additionally attempted several simple fitting metrics to determine the partitions. However, due to outliers, unknown detection efficiencies, and indistinct boundaries (notably between the primary and second ridge δ Sct), these simple metrics typically produced spurious partitions. Given that we have performed no detailed modeling of these different classes of objects and that these populations certainly contain other types of objects, we have opted to use the partitions based on literature values rather than produce more convoluted fitting metrics. However, we emphasize that these partitions should only be used for qualitative insights into the data, and caution should be used when considering these partitions for any form of quantitative analysis. We also note that several smaller populations of interest likely exist in the data beyond these three, such as the small number of very high effective temperature variables discussed in Section 4.6.
Figure 7 shows absolute Johnson V magnitude vs period of the short-period variables along with the selected partition lines. The MS binary partition targets are shown in red, the primary ridge δ Sct partition targets in blue, and the second ridge δ Sct partition targets in yellow. These colors are consistent across all other figures using this partition coloring, with the addition of gray for targets with unknown absolute Johnson V magnitude (which were therefore not included in the partitioning). Figures throughout this work note when they use this coloring scheme. The primary and second ridges of the δ Sct stars are discussed in more detail in Section 4.5. This separation is clearer in the radius vs period relation (Figure 8); however, as the radius values are derived from the absolute magnitude values along with some assumptions about the target, we use the absolute magnitude vs period relation to distinguish these clusters. Additionally, existing works typically use this absolute Johnson V magnitude vs period relation to define the ridges of the δ Sct targets.
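To illustrate how linear partitions in absolute magnitude vs log period can be applied, a minimal Python sketch follows. The line coefficients are HYPOTHETICAL placeholders chosen only so the example runs; they are not the literature values used in this work. The orientation of each line (binaries fainter than the first line, second ridge brighter than the second line at a given period) and the label -1 for targets with unknown magnitude are likewise assumptions of this sketch. The labels 0, 1, and 2 follow the convention of Table 1.

```python
import math
from typing import Optional, Tuple

# A partition line M_V = slope * log10(period_days) + intercept.
Line = Tuple[float, float]

# HYPOTHETICAL placeholder coefficients, not the values used in this work.
BINARY_LINE: Line = (-3.0, 0.5)
RIDGE_LINE: Line = (-3.0, -2.0)


def line_magnitude(line: Line, log_period: float) -> float:
    """Absolute magnitude of the partition line at the given log period."""
    slope, intercept = line
    return slope * log_period + intercept


def assign_partition(period_days: float,
                     abs_v_mag: Optional[float],
                     binary_line: Line = BINARY_LINE,
                     ridge_line: Line = RIDGE_LINE) -> int:
    """Return 0 (MS binary), 1 (primary ridge), 2 (second ridge), or -1."""
    if abs_v_mag is None or math.isnan(abs_v_mag):
        return -1  # Unknown magnitude: left out of the partitioning.
    log_period = math.log10(period_days)
    # Larger magnitude means fainter.
    if abs_v_mag > line_magnitude(binary_line, log_period):
        return 0
    if abs_v_mag > line_magnitude(ridge_line, log_period):
        return 1
    return 2


for magnitude in (5.0, 2.0, 0.5, None):
    print(magnitude, assign_partition(0.1, magnitude))
```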
Some caveats need to be considered when reviewing these results. First, the estimated period is the predominant period found during our processing explained above. Notably, for binaries, this estimated period will typically be half the orbital period. Second, except for the estimated periods, the properties of the targets listed here are taken from the TIC. Some of the TIC values may be taken from other existing catalogs or directly estimated through TESS observations. However, when these values are not either directly observed or known from existing catalogs, as is often the case, they may be derived values. Notably, luminosity, absolute Johnson V magnitude, effective temperature, radii, and mass are derived from magnitude, color, and parallax. The details of how these values are derived can be found in Stassun et al. (2018). These derived values may cause some issues in the visualizations shown here. Most notably, these derived values are estimated for a single star, whereas in the case of binary systems this will result in a scale factor misrepresentation for various properties (e.g., radius).
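As a rough illustration of the size of such a scale factor, consider an idealized case that is an assumption of this example and not the TIC derivation itself: both components share the same effective temperature, and the combined light is attributed to one star via the Stefan-Boltzmann relation L = 4πR²σT⁴. The radius is then overestimated by the square root of the total-to-primary luminosity ratio.

```python
import math


def single_star_radius_factor(luminosity_ratio: float) -> float:
    """Radius overestimate when a binary is treated as a single star.

    luminosity_ratio is L2 / L1. Assumes both stars share one effective
    temperature, so that R scales as sqrt(L) at fixed temperature.
    """
    return math.sqrt(1.0 + luminosity_ratio)


# An equal-luminosity pair gives a factor of sqrt(2).
print(single_star_radius_factor(1.0))
# A faint companion changes the derived radius only slightly.
print(single_star_radius_factor(0.1))
```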
As more detailed modeling of each of these targets is beyond the scope of this work, and we cannot know with certainty which of the targets do not actually belong in their assigned partitions, we present the original derived values. In any case where one of the properties of the target being plotted in a figure is unknown, that target is excluded from the figure.
After partitioning the data as described above, we then visualize the distributions of the raw observed values for each partitioned cluster of data. The observed distributions of the photometric color of each cluster are shown in Figure 9. Similarly, Figure 10 shows the distributions of the TESS magnitudes of the clusters and Figure 11 the distributions of the parallax.

Binaries analysis
The short-period MS binary partition contains 6563 targets, of which the overwhelming majority come from the population of binaries seen as the large grouping with longer periods in Figure 7. At the shorter period end of this population, there is a sharp cutoff at 0.12 days. For binaries, the full orbital period is twice this value, or 0.24 days. This is consistent with a known sharp cut-off of the period distribution of red dwarf binaries at 0.22 days (Rucinski 1992; Norton et al. 2011). This cut-off is also seen in Figure 8, which shows radius vs period. As expected, this lower period is only found with cooler, smaller binaries, as the hotter binaries are larger and cannot easily orbit as closely.
This can easily be understood from basic stellar structure scaling relationships. For F, G, K stars with $0.5 \lesssim M/M_\odot \lesssim 2$, the radius scales roughly like $R = R_\odot (M/M_\odot)^{3/4}$. For an equal-mass binary, $M_1 = M_2 = M$, Roche lobe overflow will occur when the binary separation is roughly $a \approx 2.6R$, corresponding, by Kepler's third law, to a minimum orbital period of

$$T_{\rm orb,min} = 2\pi\sqrt{\frac{a^3}{G(M_1 + M_2)}} \approx 0.34\,{\rm d}\,\left(\frac{M}{M_\odot}\right)^{5/8}.$$

For a red dwarf binary with $M_1 = M_2 = 0.5M_\odot$, this corresponds to a binary period of 0.22 d, just as we find empirically. Furthermore, if we use the mass-luminosity scaling for MS stars, $L \approx L_\odot (M/M_\odot)^3$, we are able to recover the relationship $L \sim T_{\rm orb}^{4.8}$, consistent with what we see in the lower track of Figure 7.
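The scaling argument above can be checked numerically. The following Python sketch combines Kepler's third law with the radius scaling and the separation a ≈ 2.6R for an equal-mass binary. The physical constants are standard values, and the function name is illustrative.

```python
import math

G = 6.674e-11            # Gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30         # Solar mass, kg
R_SUN = 6.957e8          # Solar radius, m
SECONDS_PER_DAY = 86_400.0


def minimum_orbital_period_days(mass_solar: float,
                                separation_in_radii: float = 2.6) -> float:
    """Minimum period of an equal-mass binary at Roche lobe overflow."""
    radius = R_SUN * mass_solar ** 0.75          # R = R_sun (M / M_sun)^(3/4)
    separation = separation_in_radii * radius    # a ~ 2.6 R
    total_mass = 2.0 * mass_solar * M_SUN        # M_1 + M_2 with M_1 = M_2
    period_seconds = 2.0 * math.pi * math.sqrt(separation ** 3 / (G * total_mass))
    return period_seconds / SECONDS_PER_DAY


print(f"{minimum_orbital_period_days(0.5):.3f} d for M = 0.5 M_sun")
print(f"{minimum_orbital_period_days(1.0):.3f} d for M = 1.0 M_sun")

# The period scales as M^(5/8) and L as M^3, so L scales as period^(3 / (5/8)).
period_exponent = (3.0 * 0.75 - 1.0) / 2.0
luminosity_period_exponent = 3.0 / period_exponent
print(luminosity_period_exponent)
```

For M = 0.5 M⊙ this gives a period close to 0.22 d, and the derived exponent matches the 4.8 quoted above.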
We folded the light curves on a period $p_{\rm fold} = 2p_{\rm estimated}$, where $p_{\rm estimated}$ is the period estimated by our post-processing described in Section 3.4. Then, on these folded light curves, we applied a fast Fourier transform, resulting in a fundamental mode with period $p_{\rm fold}$ and the first overtone mode with period $p_{\rm estimated}$. Figure 12 shows the powers of the fundamental mode and the first overtone mode. Notably, we expect an increased fundamental for the binaries, as the differences between the two stars will primarily affect this mode.
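One way to carry out this kind of measurement is sketched below in Python with NumPy. This is a minimal illustration under stated assumptions, not the exact procedure used in this work: the phase binning, the number of bins, the handling of empty bins, and the function name are all choices made for this example.

```python
import numpy as np


def folded_mode_powers(times, fluxes, estimated_period, n_bins=64):
    """Return (fundamental_power, first_overtone_power) of a folded light curve.

    The light curve is folded on p_fold = 2 * estimated_period, averaged into
    phase bins, and Fourier transformed. n_bins is an illustrative choice.
    """
    times = np.asarray(times, dtype=float)
    fluxes = np.asarray(fluxes, dtype=float)
    fold_period = 2.0 * estimated_period
    phases = (times % fold_period) / fold_period
    bin_indexes = np.minimum((phases * n_bins).astype(int), n_bins - 1)

    # Mean flux in each phase bin; empty bins fall back to the overall mean.
    flux_sums = np.bincount(bin_indexes, weights=fluxes, minlength=n_bins)
    counts = np.bincount(bin_indexes, minlength=n_bins)
    binned = np.full(n_bins, fluxes.mean())
    filled = counts > 0
    binned[filled] = flux_sums[filled] / counts[filled]

    # Index 1 of the transform has period p_fold (fundamental);
    # index 2 has period p_fold / 2 = p_estimated (first overtone).
    power = np.abs(np.fft.rfft(binned - binned.mean())) ** 2
    return power[1], power[2]


# Toy binary-like signal: alternating minima of slightly different depth.
rng = np.random.default_rng(0)
times = np.sort(rng.uniform(0.0, 27.0, 1296))
estimated_period = 0.15
fluxes = (1.0
          + 0.10 * np.cos(2 * np.pi * times / estimated_period)
          + 0.03 * np.cos(2 * np.pi * times / (2 * estimated_period)))
fundamental_power, overtone_power = folded_mode_powers(
    times, fluxes, estimated_period)
print(fundamental_power, overtone_power)
```

In this toy case, removing the 0.03-amplitude term (so that every cycle is identical) drives the fundamental power toward zero, which is the behavior expected for a pulsator folded on twice its period.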
In Figure 13, we see the relative amplitude of the periodic signal of the short-period variables, again split into two primary populations. The top right population is the binary population, with amplitudes that mostly range from 0.02 to 0.3. These estimates only apply to targets with a known contamination ratio from the TIC. As is also evident in Figure 12, the binary cluster targets have, on average, a much larger relative amplitude than the δ Sct cluster targets.

Delta Scuti analysis
Similar to Figure 8, Figure 14 shows the targets in radius vs period space. Figure 8 is useful for seeing the gap between the two δ Sct populations, while Figure 14 shows the partition coloring after the partitioning has been made.
Figure 15 shows the effective temperature vs luminosity of the short-period variables. Notably, the δ Sct targets are higher along the MS branch than the short-period MS binaries, which is expected, and the partitioning chosen divides δ Sct from the MS binaries fairly neatly.
Other works, including Ziaali et al. (2019) and Barac et al. (2022), have shown a gap between the two δ Sct populations in log period space, with a notably sparse region between the two populations, where the second ridge of δ Sct stars is known to have a period 0.5 times that of the primary ridge. The reason for the existence of the second ridge is not known. The gap is usually shown via the distribution of the distances in log period space to the partition line between the two populations, or an equivalent metric. The gap between the two ridges is not as clearly defined in our results as in Ziaali et al. (2019) and Barac et al. (2022), with our distribution showing a short plateau rather than a sparse region. However, we note that we have not performed a fitting to maximize this gap. Because non-δ Sct targets are certainly present within the presented clusterings and the detection efficiency of the neural network on each of the clusters is uncertain, we do not expect such a fitting to produce an accurate representation of the two δ Sct populations. We provide these partitions only as a simple qualitative tool for comprehension. Based on this partitioning, our δ Sct partitions contain 7089 identified targets, with 5637 in the primary ridge partition and 1452 in the second ridge partition.
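The distance metric described above can be sketched as follows in Python with NumPy. The slope and intercept in the usage example are HYPOTHETICAL placeholders, not the Barac et al. (2022) relation, and the function name is illustrative. The offset is measured horizontally, in log period at fixed absolute magnitude, so two targets of equal magnitude whose periods differ by a factor of 0.5 are separated by log10(0.5) ≈ -0.30 dex.

```python
import numpy as np


def log_period_offsets(periods_days, abs_v_mags, slope, intercept):
    """Log-period distance of each target from a line in the M_V vs log P plane.

    The line is M_V = slope * log10(P) + intercept. Negative offsets mean the
    target has a shorter period than the line at the same magnitude.
    """
    periods_days = np.asarray(periods_days, dtype=float)
    abs_v_mags = np.asarray(abs_v_mags, dtype=float)
    log_period_on_line = (abs_v_mags - intercept) / slope
    return np.log10(periods_days) - log_period_on_line


# HYPOTHETICAL line coefficients, used only to make the example runnable.
slope, intercept = -3.0, -1.5
offsets = log_period_offsets([0.1, 0.05], [1.5, 1.5], slope, intercept)
print(offsets)

# A histogram of the offsets is what reveals a gap (or plateau) between ridges.
counts, bin_edges = np.histogram(offsets, bins=10, range=(-0.5, 0.5))
print(counts)
```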

Hot targets analysis
Our NN detected nine hot periodic objects with effective temperatures ranging from 28,000 to 38,000 K. These appear to be subdwarf B (sdB) stars, which are extreme horizontal branch stars with very high surface gravity and temperature (Baran et al. 2021, 2023; Schaffenroth et al. 2022). There are three formation channels for sdB stars involving close binary stellar pairs: common envelope, Roche lobe overflow (RLOF), and white dwarf (WD) mergers (Han et al. 2002). The common envelope channel produces short-period WD+sdB or sdB+MS binaries with periods ranging from 0.1 to 10 days. TIC IDs 333419799, 270491267, 396004353, 193092806, and 458785169 fall into this broad range. While the RLOF channel produces systems with periods exceeding 400 days, which are thus not addressed by this work, the WD merger channel may produce single sdB stars that exhibit asteroseismic vibrational modes with periods of less than 0.1 days. Four hot periodic objects detected by our ML pipeline fall into this last category: TIC IDs 173295499, 409644971, 99641129, and 367779738, with all but the final one having periods less than but very near 0.1 days. To the best of our knowledge, TIC ID 367779738 has not previously been identified as a subdwarf in a close binary, but the remainder of these targets have previously been identified as subdwarfs in close binaries (Baran et al. 2021, 2023; Schaffenroth et al. 2022). All of these targets are identified spectroscopically as hot subdwarf stars in the Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST) data release 8 (Lei et al. 2023).

Miscellaneous analysis
Figure 16 shows the mass vs period of the short-period variables. There is a sizable population of targets below the major δ Sct population. From our three partitions, most of this population comes from the δ Sct primary ridge population. However, this population may be a separate category of objects, or may simply be a group of targets whose mass is poorly estimated by the approach used by the TIC.

CONCLUSION
In this work, we presented a NN pipeline to identify short-period variable targets in TESS FFI data. This NN pipeline performs inference on a TESS 30-minute cadence light curve in ~5 ms on a single GPU. We also presented a collection of 14156 short-period variables and provided a cursory analysis of the various populations of these targets. Within these populations, we examined two prominent populations consisting primarily of short-period MS binaries and δ Sct stars. 6563 targets were identified in the MS binary partition and 7089 were identified in the δ Sct partitions. We expect the overwhelming majority, though not all, of the targets in each partition to be of the class used to name that partition. We provide this collection of targets in a supplemental file for more detailed analysis by specialists in each of the populations. For future work, our open-source NN framework allows this work to be extended relatively easily. Notably, this work uses the TESS primary mission FFI data, which had a 30-minute cadence. Later TESS FFI data has a higher cadence, which would enable searches for shorter period targets and more accurate post-processing. A simple change to the NN (e.g., adjusting the stride lengths) would prepare it for training on this shorter cadence dataset. This, combined with the relatively fast inference speed of our network (~5 ms per light curve on a single GPU), allows our method to be quickly deployed on new datasets.
Table 1. A random sample of the short-period variables table. Due to page space constraints, some columns that are included in the supplemental file are excluded here. For the partitions, 0 corresponds to the binary partition, 1 corresponds to the δ Sct primary partition, and 2 corresponds to the δ Sct secondary partition.

Figure 2. The FFI light curve and folded light curve for TIC ID 159971257 sector 23. This is an example from our MS binary cluster with clearly distinct binary signal peaks (see Section 4.3 for an explanation of the clusters). Presented in the same manner as Figure 1; see Figure 1 for details.

Figure 3. A random set of examples of synthetic periodic signals to be injected into TESS light curves as training data. Two periods of each synthetic signal are shown.
(a) The full network. (b) The outline of a convolution block structure. (c) The outline of a dense block structure.

Figure 4. An overview of the architecture of the convolutional neural network used in this work. See Olmschenk et al. (2021) for details. All convolution/dense layers within a block use a number of filters/units equivalent to the size of the last dimension of their output tensor. A kernel size of 3 is used in all convolutional layers. For clarity of the diagram, three deviations from the diagram blocks are not shown. First, the first convolution block and the last dense block do not apply dropout or batch normalization. Second, the final convolution block uses a standard dropout instead of spatial dropout, as the following layer is a dense layer. Third, pooling is only used by the first 6 convolution blocks. The remaining convolution blocks do not use pooling.

Figure 6. A kernel density estimation of effective temperature vs period of the short-period variables. The effective temperature of the targets is taken from the TIC.

Figure 7. The absolute V magnitude vs the period of the short-period variables. Note that these are the absolute V magnitudes provided by the TIC, which does not assume the targets are binaries. Partition colors and partition lines are chosen based on a partitioning in log luminosity vs log period relations described in Section 4.3.

Figure 8. The radius vs the period of the short-period variables, colored by effective temperature. Due to high effective temperature outliers, the upper limit of the effective temperature scale is restricted to 11000 K, and any targets above this effective temperature are shown as red squares. Note the caveats given in Section 4.3 about how these values are estimated and how the estimated value may differ from the true value, especially in regard to binaries.

Figure 9. A kernel density estimation of the colors of the short-period variables. Color values come from the TIC. Distributions are colored by the partitions described in Section 4.3.

Figure 10. A kernel density estimation of the TESS magnitudes of the short-period variables. Magnitude values come from the TIC. Distributions are colored by the partitions described in Section 4.3.

Figure 11. A kernel density estimation of the parallaxes of the short-period variables. Parallax values come from the TIC. Distributions are colored by the partitions described in Section 4.3.

Figure 12. The powers of the fundamental and first overtone. Data points are colored by the partitions described in Section 4.3.

Figure 13. The relative amplitude vs the period of the short-period variables, colored by effective temperature. Due to high effective temperature outliers, the upper limit of the effective temperature scale is restricted to 11000 K, and any targets above this effective temperature are shown as red squares.

Figure 14. The radius vs the period of the short-period variables. Data points are colored by the partitions described in Section 4.3.

Figure 15. The effective temperature vs luminosity of the short-period variables. Distributions are colored by the partitions described in Section 4.3.

Figure 16. The mass vs the period of the short-period variables. Due to high effective temperature outliers, the upper limit of the effective temperature scale is restricted to 11000 K, and any targets above this effective temperature are shown as red squares.