tdescore: An Accurate Photometric Classifier for Tidal Disruption Events

Optical surveys have become increasingly adept at identifying candidate tidal disruption events (TDEs) in large numbers, but classifying these generally requires extensive spectroscopic resources. Here we present tdescore, a simple binary photometric classifier that is trained using a systematic census of ∼3000 nuclear transients from the Zwicky Transient Facility (ZTF). The sample is highly imbalanced, with TDEs representing ∼2% of the total. tdescore is nonetheless able to reject non-TDEs with 99.6% accuracy, yielding a sample of probable TDEs with recall of 77.5% for a precision of 80.2%. tdescore is thus substantially better than any available TDE photometric classifier scheme in the literature, with performance not far from spectroscopy as a method for classifying ZTF nuclear transients, despite relying solely on ZTF data and multiwavelength catalog cross-matching. In a novel extension, we use “Shapley additive explanations” to provide a human-readable justification for each individual tdescore classification, enabling users to understand and form opinions about the underlying classifier reasoning. tdescore can serve as a model for photometric identification of TDEs with time-domain surveys, such as the upcoming Vera C. Rubin Observatory.


INTRODUCTION
Tidal disruption events occur when stars pass too close to supermassive black holes (SMBHs). The tidal force exerted by the SMBH exceeds the self-gravity holding the star together, and the star disintegrates (Rees 1988). Much of the resulting stellar debris remains gravitationally bound to the SMBH, and is ultimately accreted onto the black hole. These TDEs can generate luminous emission across the entire electromagnetic spectrum, from radio to soft gamma-rays, and in recent years all-sky surveys have become increasingly adept at finding this previously elusive class of transients (see Gezari 2021, for a recent review). TDEs offer a unique probe of otherwise-quiescent SMBHs residing in galaxies, and can be used to study a variety of areas such as astrophysical jet launching, SMBH demographics and accretion disk formation.
There are now ≳100 TDEs in the literature, the vast majority of which were identified by optical surveys. In particular, the Zwicky Transient Facility at Palomar Observatory (ZTF; Bellm et al. 2019; Dekany et al. 2020) conducts an all-sky survey which has detected ∼90 TDEs since 2018 (see e.g. van Velzen et al. 2021; Hammerstein et al. 2023; Yao et al. 2023). With this large sample, we now know that at least some TDEs emit quasi-thermal optical flares with high apparent temperature that rise on a timescale of weeks, and fade more slowly over a timescale of months with little apparent temperature evolution (Gezari 2021). These optical TDEs appear to have a marked preference for 'green-valley' galaxies (see e.g. Arcavi et al. 2014; French et al. 2016; Graur et al. 2018; Hammerstein et al. 2021a).
Despite a nominal survey depth of 20.5 mag (Graham et al. 2019), the ZTF TDE program remains incomplete for sources fainter than ≈19.1 mag due to limited spectroscopic resources (Yao et al. 2023). This spectroscopic bottleneck will become even more severe with upcoming instruments and observatories such as the Vera C. Rubin Observatory (Ivezić et al. 2019) and ULTRASAT (Shvartzvald et al. 2023), which are expected to detect thousands of TDEs each year (see e.g. Bricman & Gomboc 2020; Shvartzvald et al. 2023).
There is thus an increasing need for TDE selection methods which do not rely on expensive spectroscopic follow-up. However, photometric classification of nuclear transients remains in its infancy. Although some effort has been devoted to finding TDEs as part of generic multi-modal transient classifiers (see e.g. Muthukrishna et al. 2019; Graham et al. 2023), the only effort in the literature specifically tailored to TDEs is that of Gomez et al. (2023).
In this letter, we introduce a novel binary machine-learning photometric classifier, tdescore, trained on the sample of ZTF nuclear transients to identify TDEs. The code itself is already available on GitHub¹ and Zenodo (Stein 2024), while the corresponding training data will be released in a dedicated future publication (Reusch et al. in prep.). In Section 2 we introduce this ZTF Nuclear Sample, and in Section 3 we describe the process of generating high-level 'features' from the available data. We then outline the tdescore classifier itself (Section 4), and explore the reasoning behind the corresponding classifications (Section 5). Finally, in Section 6, we highlight the relevance of tdescore to both existing and future surveys.

THE ZTF NUCLEAR TRANSIENT SAMPLE
The first photometric optical search for TDEs was conducted by van Velzen et al. (2011) using archival Sloan Digital Sky Survey data (York et al. 2000), finding that TDEs can be differentiated from supernovae using light curve evolution. Photometric identification of TDEs at Palomar began with the predecessor survey to ZTF, the intermediate Palomar Transient Factory (iPTF) survey (Kulkarni 2013). A systematic census of nuclear transients in 4800 deg² of iPTF data was used to develop simple algorithmic cuts yielding candidate TDEs with a precision of 20%, which was sufficiently high to serve as a model for spectroscopic surveys (Hung et al. 2018). For the ZTF survey, looser cuts were paired with light curve analysis for the nuclear transient filter (van Velzen et al. 2019), which has been used to identify dozens of TDEs over the course of the survey (van Velzen et al. 2021; Hammerstein et al. 2023; Yao et al. 2023). The filter was implemented in AMPEL, a real-time data analysis framework and ZTF alert broker (Nordin et al. 2019). The nuclear transient filter itself is an open-source python script², which broadly selects candidates based on:
• estimated 'nuclearity' of the flux-weighted ZTF transient position, using proximity to sources detected by the deeper Pan-STARRS1 (PS1) survey (Chambers et al. 2016)
• probability of a detection being 'real', based on the machine-learning RealBogus/DeepRealBogus classification of images (Mahabal et al. 2019; Duev et al. 2019), and algorithmic cuts on image detection parameters
• rejection of stellar sources via the machine-learning sgscore classification (Tachibana & Miller 2018)
• rejection of Galactic sources by requiring a Galactic latitude |b| > 5°
• rejection of moving objects by requiring multiple time-separated detections of a source.
These cuts are designed to be loose and inclusive, prioritising recall over precision. As part of the ongoing ZTF TDE program, additional light curve analysis and ranking is performed to highlight potential TDE candidates (van Velzen et al. 2021), which are then vetted by humans and assigned additional follow-up observations for classification. In many cases, a spectrum is required to resolve ambiguity. With tdescore, we aim to develop a machine-learning alternative to this resource-intensive process.
The nuclear transient filter has been iteratively modified over the course of the survey to reduce the false-positive or false-negative rate. To develop tdescore, we start with the latest version of the filter, which was applied to all archival ZTF alert data, yielding a uniform sample of 11699 nuclear transients discovered in ZTF-I (2018 April 1 to 2020 September 30) and ZTF-II (2020 October 1 to 2022 April 30) (Reusch et al. in prep.).
We extract any available classifications for these transients from the ZTF Fritz Marshal³ (van der Walt et al. 2019; Coughlin et al. 2023) and the predecessor ZTF GROWTH Marshal (Kasliwal et al. 2019). In general these are accumulated human-assigned classifications, which can be based on spectra (including public ones taken from e.g. the Transient Name Server, and host spectroscopy from the Sloan Digital Sky Survey (York et al. 2000)), light curve evaluation or other contextual information. We verify each of these human classifications (see Appendix A for details), and recover 5264 classified sources, of which 86 are classified as TDEs. This includes 30 sources from ZTF-I presented in van Velzen et al. (2021) and Hammerstein et al. (2023), 17 additional bright ZTF-II TDEs from Yao et al. (2023), and 39 additional faint or recent ZTF-II TDEs which have not yet been published.

Light Curve Analysis
To develop a flexible framework which could be easily generalised to other surveys, we use a Gaussian process to convert the extensive photometry from ZTF into more survey-independent high-level physical features such as peak magnitude and fade rate. We specifically design a multi-step fitting procedure tailored to the known characteristics of TDEs, namely that they are blue, long-lived transients with little apparent colour evolution. Beyond this, the fitting procedure is agnostic about any underlying physical model for TDE emission, and can therefore capture the full diversity of TDE optical emission, including observed outlier TDE behaviour such as multiple peaks or long plateaus.
We use the alert photometry provided directly by ZTF as the basis of the analysis.No K-correction is applied to the data, but we do correct for galactic extinction using results from Schlafly & Finkbeiner (2011) and the extinction law from Fitzpatrick (1999).We perform a series of cuts similar to those in the nuclear filter, to remove detections which are not well subtracted, returning a subset of 'clean' photometry for each source.We specifically require a FWHM < 5", no bad pixels, a Real/Bogus score > 0.3, a pixel distance to host < 1", and a difference image depth of at least 19.0 mag to reject images taken under poor conditions.Though ZTF provides some (sporadic) i-band coverage as part of the partnership surveys, we only consider the g-band and r-band data, which are primarily provided in a uniform 2-day cadence by the ZTF MSIP public survey.
Our dataset is three-dimensional (detections have a flux, wavelength, and time), so we cannot directly apply a simple univariate Gaussian process. While multivariate Gaussian processes have been applied to astronomical datasets, they require a customised covariance matrix to balance variation between bands against variation in time (see e.g. Aigrain & Foreman-Mackey 2023 for a recent review). Moreover, multivariate Gaussian processes carry more uncertainty in cases such as this one, where the bands are not sampled uniformly.
Instead, we simplify the problem, and fit flux in one band as a function of time with a univariate Gaussian process. Given that TDEs are generally blue, we first fit the g-band data with a univariate Gaussian process model implemented in scikit-learn (Pedregosa et al. 2011), using a 'Radial Basis Function' (RBF) kernel restricted to timescales of 50-500 d, and an additional white noise component of at least 0.1 mag to account for systematic uncertainty and prevent overfitting. After obtaining a model for the g-band data, we then perform a least-squares fit of the r-band data to this g-band light curve model, under the assumption that the data follow a linear colour evolution of the form:

$$m_g(t) - m_r(t) = C_0 + C_1\,t \qquad (1)$$

where $C_0$ and $C_1$ are fit parameters derived for each source and $t$ is the observer time in days. After obtaining these coefficients, we estimate the g-band magnitude of the source at each r-band detection.
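Once the g-band model is fixed, the linear colour model g − r = C₀ + C₁t is an ordinary linear least-squares problem in C₀ and C₁. The NumPy sketch below illustrates this step under our own assumptions: the function names and the callable `g_model` interface are hypothetical and not taken from the tdescore code.

```python
import numpy as np

def fit_linear_colour(t_r, r_mag, g_model):
    """Least-squares fit of g - r = C0 + C1 * t at the r-band epochs.

    t_r     : observer times (days) of the r-band detections
    r_mag   : r-band magnitudes at those times
    g_model : callable giving the g-band model magnitude at given times
              (hypothetical interface, e.g. a wrapped Gaussian process prediction)
    """
    t_r = np.asarray(t_r, dtype=float)
    colour = g_model(t_r) - np.asarray(r_mag, dtype=float)   # g - r at each r epoch
    design = np.column_stack([np.ones_like(t_r), t_r])       # columns: [1, t]
    (c0, c1), *_ = np.linalg.lstsq(design, colour, rcond=None)
    return c0, c1

def r_to_g(t_r, r_mag, c0, c1):
    """Convert each r-band detection to an estimated g-band magnitude."""
    return np.asarray(r_mag, dtype=float) + c0 + c1 * np.asarray(t_r, dtype=float)
```

Because the model is linear in its coefficients, no iterative optimiser is needed for this step.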
We then fit the combined (g-band and converted r-band) light curve with the same univariate Gaussian process procedure. This provides our final model for each source light curve. An example of this fitting is shown in Figure 1, for a real TDE with sparse early data, where the joint fit is required to constrain the g-band rise and fade.
Figure 1. An example of the light curve fitting procedure on a real TDE, ZTF20achpcvt/AT2020vwl (Hodgkin et al. 2020; Hammerstein et al. 2021b), for which limited data were available at peak. Using the two-step fit, the approximate g-band peak time and the colour at peak can be inferred for use in classification. AT2020vwl is relatively red, with (g − r) ≈ 0, whereas bluer TDEs with (g − r) < 0 are detected more frequently in g-band. Nonetheless, the fitting procedure still works well for this TDE.
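The univariate Gaussian process fit applied to the g-band and combined light curves can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the tdescore implementation: we fit magnitudes after subtracting a median offset (so the zero-mean prior is sensible), bound the RBF length scale to 50-500 d, bound the white-noise variance below by (0.1 mag)², and assume the GP 'score' feature corresponds to the R² returned by `GaussianProcessRegressor.score`. The helper names are hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

def fit_light_curve_gp(t_days, mag):
    """Fit magnitude vs. observer time with amplitude * RBF + white noise."""
    t = np.asarray(t_days, dtype=float).reshape(-1, 1)
    y = np.asarray(mag, dtype=float)
    offset = float(np.median(y))      # the zero-mean GP prior is applied to y - offset
    kernel = (
        ConstantKernel(1.0, constant_value_bounds=(1e-3, 1e2))
        * RBF(length_scale=100.0, length_scale_bounds=(50.0, 500.0))        # 50-500 d
        + WhiteKernel(noise_level=0.1**2, noise_level_bounds=(0.1**2, 1.0))  # sigma >= 0.1 mag
    )
    gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=2, random_state=0)
    gp.fit(t, y - offset)
    fitted = gp.kernel_               # Sum(Product(ConstantKernel, RBF), WhiteKernel)
    features = {
        "rbf_amplitude": float(fitted.k1.k1.constant_value),
        "rbf_length_scale": float(fitted.k1.k2.length_scale),
        "noise_level": float(fitted.k2.noise_level),
        "score": float(gp.score(t, y - offset)),   # assumed: R^2 of the fit to the data
    }
    return gp, offset, features

def predict_mag(gp, offset, t_grid):
    """Evaluate the fitted model (in magnitudes) on an arbitrary time grid."""
    return gp.predict(np.asarray(t_grid, dtype=float).reshape(-1, 1)) + offset
```

The same function can be called twice: once on the g-band detections alone, and again on the g-band plus converted r-band detections.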
With these light curve fits, we can extract high-level parameters for each source. We specifically extract:
• the peak magnitude in g-band.
• the time of peak in g-band (MJD).
• the colour at g-band peak.
• the fade time (defined as the time for the g-band light curve to fade from peak to 0.5 mag below peak).
• the RBF length scale from the Gaussian process fit.
• the RBF amplitude from the Gaussian process fit.
• the Gaussian process 'score', which quantifies how well the model describes the data.
• the number of inflection points in the light curve fit, counted separately before and after peak, to capture the multiple peaks exhibited by many AGN and some transients.
• the mean detection cadence (total number of detections divided by time in days between first and last detection).
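Given a smooth model curve evaluated on a regular time grid, most of the features listed above reduce to a few array operations. The NumPy sketch below is an illustrative reconstruction under our own assumptions (for instance, inflection points are taken as sign changes of the discrete second difference), not the tdescore code; the function name is hypothetical. The colour at peak would follow from the colour model evaluated at the returned peak time.

```python
import numpy as np

def light_curve_features(t_grid, g_model, t_det):
    """Peak, fade, inflection and cadence features from a smooth model curve.

    t_grid  : evenly spaced observer times (days) where the model was evaluated
    g_model : model g-band magnitudes on t_grid (smaller = brighter)
    t_det   : times of all detections, used for the cadence feature
    """
    t_grid = np.asarray(t_grid, dtype=float)
    g_model = np.asarray(g_model, dtype=float)
    i_peak = int(np.argmin(g_model))                 # brightest = smallest magnitude
    peak_mag, peak_time = g_model[i_peak], t_grid[i_peak]

    # Fade time: first post-peak grid time at least 0.5 mag fainter than peak.
    faded = np.nonzero(g_model[i_peak:] >= peak_mag + 0.5)[0]
    fade_time = t_grid[i_peak + faded[0]] - peak_time if faded.size else np.nan

    # Inflection points: sign changes of the discrete second difference,
    # skipping exact zeros so flat stretches do not create spurious flips.
    d2 = np.diff(g_model, n=2)                       # d2[k] is centred on t_grid[k + 1]
    keep = d2 != 0
    sign, t_mid = np.sign(d2[keep]), t_grid[1:-1][keep]
    change = np.nonzero(sign[1:] != sign[:-1])[0]
    t_flip = 0.5 * (t_mid[change] + t_mid[change + 1])

    t_det = np.asarray(t_det, dtype=float)
    return {
        "peak_mag": float(peak_mag),
        "peak_time": float(peak_time),
        "fade_time": float(fade_time),
        "n_inflect_pre": int(np.sum(t_flip < peak_time)),
        "n_inflect_post": int(np.sum(t_flip > peak_time)),
        # number of detections divided by the span between first and last detection
        "cadence": t_det.size / (t_det.max() - t_det.min()),
    }
```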
Entirely independently of the above procedure, we also attempt to fit the light curves with SALT2 Type Ia supernova models (Guy et al. 2007) using sncosmo (Barbary et al. 2016), and retrieve the underlying c and x₁ parameters, as well as the χ², to serve as a proxy for the 'Ia'-ness of the light curve.
When run on a standard MacBook Pro without any parallelisation, the Gaussian process analysis requires ∼3 s per transient on average. The time varies somewhat between individual transients, with more light curve detections leading to longer processing times. sncosmo is faster, requiring ∼1 s on average per source. The light curve analysis procedure is thus fast enough to scale to deeper surveys such as Rubin. For surveys with more than two bands, the model in Equation 1 could be generalised to a thermal model with a temperature and linear temperature evolution.
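As one possible form of the thermal generalisation mentioned above, the sketch below replaces the linear colour model with the colour of a blackbody whose temperature evolves linearly, T(t) = T₀ + T₁t. It evaluates the Planck function at single effective wavelengths, which ignores the filter bandpass shapes; the wavelengths (roughly 0.48 μm and 0.64 μm for ZTF g and r) and the function names are assumptions for illustration only.

```python
import numpy as np

H = 6.62607015e-34    # Planck constant [J s]
C = 2.99792458e8      # speed of light [m / s]
K_B = 1.380649e-23    # Boltzmann constant [J / K]

# Approximate effective wavelengths for ZTF g and r (assumed values).
LAMBDA_G, LAMBDA_R = 4.8e-7, 6.4e-7

def blackbody_colour(temperature_k, lam_1=LAMBDA_G, lam_2=LAMBDA_R):
    """AB colour m_1 - m_2 of a blackbody, using monochromatic f_nu at each wavelength."""
    t = np.asarray(temperature_k, dtype=float)
    nu_1, nu_2 = C / lam_1, C / lam_2

    def b_nu(nu):
        # Planck B_nu up to a constant factor, which cancels in the ratio below
        return nu**3 / np.expm1(H * nu / (K_B * t))

    return -2.5 * np.log10(b_nu(nu_1) / b_nu(nu_2))

def thermal_colour_model(t_days, temp_0, temp_1):
    """Hypothetical thermal analogue of the linear colour model: T(t) = T0 + T1 * t."""
    return blackbody_colour(temp_0 + temp_1 * np.asarray(t_days, dtype=float))
```

A cooling source (negative T₁) becomes redder with time, while a constant temperature reproduces the constant-colour limit.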

Additional features
In addition to parameters directly extracted from the ZTF photometry, additional contextual information is extracted for each source. The ZTF alerts themselves (Masci et al. 2019; Patterson et al. 2019) provide the catalogued 'sgscore' value for the source host (a binary machine-learning classification score based on morphology, used to distinguish stars from galaxies; Tachibana & Miller 2018). Each individual detection also contains:
• distpsnr1: distance of the detection to the PS1 host in arcseconds, from which we calculate a median.
• distnr: pixel distance to the nearest source in the reference image, from which we calculate a median.
• sumrat: the ratio of summed pixel values in a detection to the sum of absolute pixel values, serving as a proxy for yin-yang subtraction artefacts. We calculate a median sumrat for each source.
• isdiffpos: boolean value for whether the detection is positive or negative, from which we calculate an overall fraction of positive detections.
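Collapsing the per-detection alert fields listed above into per-source features is a simple aggregation. In this sketch the inputs are assumed to be plain arrays already extracted from the alert packets, with isdiffpos converted to booleans beforehand (in the alert packets it is stored as a string flag); the function name is hypothetical.

```python
import numpy as np

def aggregate_alert_features(distpsnr1, distnr, sumrat, isdiffpos):
    """Per-source summary of per-detection ZTF alert fields (illustrative sketch)."""
    return {
        "median_distpsnr1": float(np.median(distpsnr1)),  # arcsec to the PS1 source
        "median_distnr": float(np.median(distnr)),        # pixels to nearest reference source
        "median_sumrat": float(np.median(sumrat)),        # proxy for subtraction artefacts
        # fraction of detections from a positive (science minus reference) subtraction
        "positive_fraction": float(np.mean(np.asarray(isdiffpos, dtype=bool))),
    }
```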
We also cross-match the sources to their underlying PS1 hosts (Chambers et al. 2016), yielding g − r, r − i, i − z and z − y host colours. By construction, all sources lie close to a source with at least one PS1 detection. We also cross-match to mid-infrared host colours (W1−W2, W3−W4) from WISE (Wright et al. 2010), and to underlying W1 variability using WISE+NEOWISE (Mainzer et al. 2014), similar to Yao et al. (2023). Finally, we cross-match to known radio/X-ray-selected AGN in the Milliquas catalogue (Flesch 2023), yielding a boolean has_milliquas flag.
From the nuclear sample, we have 5264 sources with classifications which could in principle be used for analysis. The sample is dominated by the 4218 AGN (80.1%), but also includes 213 core-collapse supernovae (4.0%), 708 Type Ia supernovae (13.4%), 39 variable stars (0.7%), and 86 TDEs (1.6%). Additional quality cuts are then applied to select a sample of nuclear transients with uniformly-derived properties. In particular, we restrict ourselves to sources which passed the light curve fitting described in Section 3, and had a significantly-measured fade time (i.e. were detected at least 0.5 mag below peak). In practice, this requires sources to be detected multiple times in both g and r band, and to have a detection at least 0.5 mag below g-band peak. Of the initial 5264 classified nuclear sources, only 3040 pass this additional 'fade and colour change' cut. All sources passing this step also have the other relevant light curve parameters, such as score and colour at peak.
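The 'fade and colour change' requirement described above can be written as a simple predicate. The minimum of two detections per band, and the choice to test the fade against g-band detections only, are our assumptions about the wording 'detected multiple times' and 'a detection at least 0.5 mag below g-band peak'; the function name is hypothetical.

```python
import numpy as np

def passes_fade_and_colour_cut(g_mag, r_mag, g_peak_mag, min_detections=2, fade_mag=0.5):
    """Illustrative version of the 'fade and colour change' quality cut.

    Requires repeated detections in both g and r (threshold assumed), and at
    least one g-band detection that is >= fade_mag fainter than the g-band peak.
    """
    g_mag = np.asarray(g_mag, dtype=float)
    r_mag = np.asarray(r_mag, dtype=float)
    enough_points = g_mag.size >= min_detections and r_mag.size >= min_detections
    faded = bool(np.any(g_mag >= g_peak_mag + fade_mag))   # larger magnitude = fainter
    return bool(enough_points and faded)
```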
From these 3040 sources with high-quality light curves, we additionally select those for which all WISE host colours and PS1 host colours were available, and for which sncosmo ran successfully. Overall, half of the AGN (2153) and CCSNe (106) pass all cuts, along with 60% of SNe Ia (427) and 64% of TDEs (55). However, only ∼8% of variable stars (3) pass, due primarily to their erratic light curves. This ultimately leaves 2744 sources in our final 'nuclear ML sample', of which 55 are TDEs and the remaining 2689 are non-TDEs. The share of TDEs thus increases slightly, from 1.6% of classified sources to 2.0% of the 'nuclear ML sample'. These steps are illustrated in Figure 2.

Training and Testing Sets
Given the small number of TDEs (55) in the dataset, it would not be possible to measure classifier performance with reasonable accuracy using a simple division into separate training and testing sets. Even if 20% of the sources were reserved for testing, this would correspond to just ∼11 TDEs, with consequently high uncertainty for metrics such as recall. Furthermore, given the small number of TDEs, the performance of a classifier on the test sample would be strongly influenced by the randomly varying composition of the sample. If 'atypical TDEs' were randomly allocated to the training set, classifier performance would appear much better than if they were allocated to the test set.
Instead, to maximise the number of TDEs available for training, and to minimise stochasticity, we employ stratified k-fold cross-validation in a 'leave one out' configuration, in which each fold holds out exactly one TDE (see e.g. Hastie et al. 2009). We randomly divide our sample into 55 groups of near-equal size, each containing one TDE. The non-TDEs are randomly sorted, and then allocated evenly across these groups. As 2689 is not exactly divisible by 55, some groups have 48 non-TDEs, while others have 49 non-TDEs. We select one group to be our test dataset, and use the remaining 54 groups as a training set. After training, we can derive performance metrics on the test dataset.
We can then repeat the process on a second group, again using the other 54 groups as a training set. This process is repeated for every group in the dataset, meaning that 55 different classifiers are trained, with each source being tested once and used for training 54 times. To further reduce the variance in metrics, we repeat the process 10 times, each with a different random sorting of the data. By averaging classifier performance across groups and iterations, we obtain more robust performance estimates, and ensure that any outlier sources are fairly represented.
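The grouping scheme described above, with exactly one TDE per fold and the non-TDEs dealt out evenly, can be sketched in NumPy as follows. This is an illustrative reconstruction with hypothetical names; scikit-learn's `RepeatedStratifiedKFold` with `n_splits` equal to the number of TDEs would produce a similar partition (one TDE per test fold), although its internal allocation differs in detail.

```python
import numpy as np

def one_tde_per_fold(is_tde, rng):
    """Return a fold index per source such that every fold holds exactly one TDE.

    Non-TDEs are shuffled and dealt round-robin, so fold sizes differ by at most one.
    """
    is_tde = np.asarray(is_tde, dtype=bool)
    folds = np.empty(is_tde.size, dtype=int)
    tde_idx = rng.permutation(np.flatnonzero(is_tde))
    n_folds = tde_idx.size
    folds[tde_idx] = np.arange(n_folds)                      # one TDE per fold
    other_idx = rng.permutation(np.flatnonzero(~is_tde))
    folds[other_idx] = np.arange(other_idx.size) % n_folds   # deal the rest evenly
    return folds

def repeated_splits(is_tde, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for n_repeats independent reshufflings."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        folds = one_tde_per_fold(is_tde, rng)
        for k in range(folds.max() + 1):
            yield np.flatnonzero(folds != k), np.flatnonzero(folds == k)
```

With 55 TDEs and 2689 non-TDEs, this yields 49 folds with 49 non-TDEs and 6 folds with 48, and 550 train/test splits over 10 repeats.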

Dataset Augmentation
Given the severe class imbalance in nuclear transients, where TDEs represent a tiny minority (∼2% of the total), any classifier which simply rejected all candidate TDEs would already have an accuracy of ∼98%. To mitigate this effect, we employ the Synthetic Minority Oversampling TEchnique (SMOTE) to generate a balanced training set (Chawla et al. 2011). With SMOTE, for each of the k-fold training sets, we randomly select pairs of TDEs, and generate new pseudo-TDEs with properties lying a random distance between the two real TDEs. This process is repeated until the training set contains as many TDEs as non-TDEs (and is thus composed of 50% non-TDEs, ∼1% real TDEs and ∼49% pseudo-TDEs). Once trained on a fold, a classifier can then be evaluated on the test data, which contains only non-TDEs and real TDEs, to assess its performance. The process of generating pseudo-TDEs via SMOTE is repeated from scratch for each k-fold permutation on the training set, excluding the sources in the test set, so there is no contamination from test data in the training sample.
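The oversampling step, as described above, can be sketched in NumPy. Two caveats: canonical SMOTE (for example as implemented in the imbalanced-learn library) draws each partner from a sample's k nearest minority-class neighbours rather than uniformly at random, and the function name here is hypothetical. The sketch follows the random-pair description in the text, and must only be applied to the training portion of a fold.

```python
import numpy as np

def oversample_pairs(X_train, y_train, rng):
    """Balance one training fold with pseudo-TDEs interpolated between TDE pairs."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=bool)
    minority = X_train[y_train]
    m = len(minority)
    if m < 2:
        raise ValueError("need at least two TDEs to interpolate between")
    n_new = int((~y_train).sum() - m)            # pseudo-TDEs needed for a 50/50 split
    if n_new <= 0:
        return X_train, y_train
    i = rng.integers(0, m, n_new)
    j = (i + rng.integers(1, m, n_new)) % m      # partner index, guaranteed != i
    gap = rng.random((n_new, 1))                 # position along each pair, in [0, 1)
    synthetic = minority[i] + gap * (minority[j] - minority[i])
    X_bal = np.vstack([X_train, synthetic])
    y_bal = np.concatenate([y_train, np.ones(n_new, dtype=bool)])
    return X_bal, y_bal
```

Each pseudo-TDE is a convex combination of two real TDEs, so it always lies inside the region spanned by the real TDE feature values.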

Classifier Architecture and Performance
With the balanced training sets built in Sections 4.1-4.3, we can train the tdescore classifier. tdescore is built with the XGBoost algorithm (Chen & Guestrin 2016), which employs a gradient-boosted decision tree architecture. For tdescore, we use the python implementation with 27 features. Given the risk of overfitting on our relatively small dataset, and the lack of an independent validation set to measure performance, we generally do not modify the default settings in XGBoost⁴. We use 100 estimators and, to mitigate overtraining, we further adopt a subsampling rate of 70% for each iteration of the boosting procedure. During training, we use the area under the precision-recall curve as the optimisation metric. This metric penalises both false positives (through precision) and false negatives (through recall), and unlike accuracy it is not dominated by the majority class. The augmentation, training and testing is rapid, requiring approximately 5 s for a single iteration on a typical MacBook Pro.
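For reference, the area under the precision-recall curve can be computed as the step-wise 'average precision': the precision at each score cut, weighted by the gain in recall at that cut. This NumPy sketch assumes distinct scores (ties would need to be grouped into a single cut), and XGBoost's internal 'aucpr' evaluation metric may differ from it in detail.

```python
import numpy as np

def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve: sum_n (R_n - R_{n-1}) * P_n."""
    order = np.argsort(-np.asarray(scores, dtype=float))   # highest score first
    hits = np.asarray(labels, dtype=bool)[order]
    tp = np.cumsum(hits)                                   # true positives at each cut
    precision = tp / np.arange(1, hits.size + 1)
    recall = tp / hits.sum()
    prev_recall = np.concatenate([[0.0], recall[:-1]])
    return float(np.sum((recall - prev_recall) * precision))
```

A perfect ranking (all TDEs scored above all non-TDEs) gives exactly 1.0.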
Having trained our classifier and applied it to the entire nuclear ML sample, we then require a threshold score to determine which class each source is assigned. The precision and recall as a function of possible threshold are illustrated in Figure 3. As our base case, we adopt a threshold at which > 80% precision⁵ is achieved, with the corresponding confusion matrices shown in Figure 4. With this cut, 77.5% of TDEs are successfully recovered (∼43 TDEs). The classifier efficiently rejects non-TDEs, with 99.6% being correctly classified, while just 0.4% are misclassified as TDEs (∼11 non-TDEs). Given the imbalanced sample, this results in 80.2% of tdescore-selected candidates being real TDEs, with 19.8% being non-TDEs.
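The quantities quoted above follow directly from the confusion-matrix counts. As a consistency check, this sketch plugs in the rounded average counts for the balanced threshold (∼43 of 55 TDEs and ∼11 of 2689 non-TDEs selected). The results differ slightly from the quoted 77.5% and 80.2% because those are averages over iterations rather than ratios of rounded counts. The function name is hypothetical.

```python
def summary_metrics(tp, fn, fp, tn):
    """Recall, precision and non-TDE rejection rate from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),       # fraction of TDEs recovered
        "precision": tp / (tp + fp),    # fraction of selected sources that are TDEs
        "rejection": tn / (tn + fp),    # fraction of non-TDEs correctly rejected
    }

# Rounded average counts for the balanced threshold:
# recall ~ 0.782, precision ~ 0.796, rejection ~ 0.996
m = summary_metrics(tp=43, fn=12, fp=11, tn=2678)
```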
The appropriate threshold for classifiers such as tdescore ultimately depends on the intended scientific application. A high-precision sample with lower recall⁶ may be preferable for rate studies or other population analyses, whereas high recall might be desired to generate a complete spectroscopically-classified TDE sample where some contamination is acceptable. We consider an alternative, stricter threshold, chosen such that at least 95% of tdescore-selected TDEs would be genuine. Applying this higher threshold produces a very clean sample of probable TDEs, which nonetheless retains a recall of 73.3% (∼40 TDEs and ∼2 non-TDEs). This confirms that nearly three quarters of genuine TDEs are confidently identified, receiving very high classifier scores. We also consider a loose threshold that is nearly complete, chosen such that a recall of at least 95% is achieved. With this loose cut, only ∼5% of TDEs are lost (∼3 TDEs), but the background is rejected with such efficiency (97.4%) that the share of TDEs in the sample reaches 45.7% (∼52 TDEs and ∼62 non-TDEs), versus just 2.0% in the parent training sample. tdescore is thus able to reject most of the background at very little cost to completeness. Further tests of tdescore using subsets of the parameters are detailed in Appendix B, and confirm that much of the background can be rejected even before light curve information is available.
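All three thresholds can be chosen from a table of precision and recall as a function of score. In this NumPy sketch (hypothetical names), the lowest threshold meeting a precision floor maximises recall, because recall can only fall as the threshold rises; conversely, the highest threshold meeting a recall floor rejects the most non-TDEs. The dense comparison matrix is adequate for a few thousand sources, but a sorted cumulative approach would scale better.

```python
import numpy as np

def pr_table(scores, labels):
    """Precision and recall when selecting sources with score >= each distinct score."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    thresholds = np.unique(scores)                       # ascending
    selected = scores[None, :] >= thresholds[:, None]    # one row per threshold
    tp = (selected & labels).sum(axis=1)
    precision = tp / selected.sum(axis=1)                # never empty: max score passes
    recall = tp / labels.sum()
    return thresholds, precision, recall

def lowest_threshold_with_precision(scores, labels, min_precision):
    """Balanced or clean cut: highest recall subject to a precision floor."""
    t, p, _ = pr_table(scores, labels)
    ok = np.flatnonzero(p >= min_precision)
    return t[ok[0]] if ok.size else None

def highest_threshold_with_recall(scores, labels, min_recall):
    """Inclusive cut: fewest selected non-TDEs subject to a recall floor."""
    t, _, r = pr_table(scores, labels)
    ok = np.flatnonzero(r >= min_recall)
    return t[ok[-1]] if ok.size else None
```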
As a crosscheck, we repeat the tdescore training without using the SMOTE augmentation described in Section 4.3.For the balanced threshold (defined as >80% precision), recall slightly increases from 77.5% to 79.5%, but for the clean threshold, recall falls from 73.3% to 72.0%.For the inclusive threshold (defined as >95% recall), precision falls substantially from 45.7% to 29.5%.Overall, the area under the precision/recall curve decreases from 0.893 to 0.882.The data augmentation step thus provides clear performance improvements for cases prioritising either high recall or precision.

UNDERSTANDING CLASSIFIER REASONING
To have confidence in the results of tdescore, it is important to understand whether classifications are based on sound reasoning. The global importance of each feature is listed in Table 1. In agreement with Gomez et al. (2023), we find that colour at peak is an important discriminator, confirming the well-known property that TDEs are atypically blue relative to most other transients. However, given the overwhelming dominance of AGN as contaminant nuclear sources, we find that WISE W1−W2 colour is by far the most important feature in identifying TDEs. This is not unexpected, given the ubiquity of WISE colour cuts as a method of selecting AGN (Stern et al. 2012). We also find that the sncosmo analysis can be a useful tool, with the resultant χ² values being useful proxies for both SNe Ia (with good fits) and AGN (with poor fits).
Figure 5. 'Waterfall plots' produced by SHAP for a TDE (top) and a Type Ia supernova (bottom, ZTF19aanyuyh), demonstrating the reasoning behind the tdescore classifications. In both plots, red/right is more TDE-like, while blue/left is less TDE-like. The four most salient features for each source are shown, with the actual value for each parameter given in the leftmost column. Top: The TDE (ZTF19aapreis) has WISE colours inconsistent with an AGN host (W1−W2 = 0.0), a blue colour at peak (g−r = −0.3), is very nuclear (0.1" offset to PS1), and shows very little cooling (0.003 mag per day). All these variables lead to a TDE classification. Bottom: The supernova (ZTF19aanyuyh) also has WISE colours inconsistent with an AGN host (W1−W2 = 0.0), and a high detection rate (one datapoint per 1.3 days), supporting a possible TDE classification. However, the source also fades very rapidly (9.3 days), and is somewhat offset from its PS1 host (0.4"). In combination, these other variables lead to a non-TDE classification. In both cases, tdescore's use of features closely approximates the reasoning that would be employed by an astronomer.
tdescore also attempts to overcome the 'black-box problem' by incorporating explainable AI. We analyse the tdescore classifier using the SHapley Additive exPlanations (SHAP) python package (Lundberg & Lee 2017). SHAP explains the output of ML classifiers for individual objects, by estimating the local importance of each feature for a given source. This means that every individual tdescore classification can readily be understood and sanity-checked by humans. An illustration of tdescore reasoning for classifying a TDE and a Type Ia supernova is shown in Figure 5. In these cases, tdescore follows a decision-making process very similar to that employed by human scanners in ZTF.
tdescore is a novel photometric classifier developed with the explicit aim of approximating the human scanning employed in ZTF. Our ZTF sample provides by far the largest homogeneous sample of nuclear transients (Yao et al. 2023), and thus presently serves as the best template for developing techniques to detect TDEs. tdescore combines well-tested algorithmic cuts to robustly identify nuclear transients, an agnostic light curve analysis technique using Gaussian processes, and a simple binary tree-based classifier using physically motivated features.
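To make the idea behind the per-source SHAP attributions concrete, the sketch below computes exact Shapley values for one source by enumerating every subset of features. It is a pedagogical stand-in rather than the SHAP package's algorithm: 'absent' features are simply replaced by a baseline value, whereas SHAP's tree explainer treats them differently and avoids this exponential enumeration, which is only practical for a handful of features. The function name is hypothetical. The attributions sum to f(x) minus f(baseline), mirroring the additivity displayed in waterfall plots such as Figure 5.

```python
import itertools
import math
import numpy as np

def exact_shapley(model, x, baseline):
    """Exact Shapley values for one input, with absent features set to baseline values."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = x.size
    phi = np.zeros(n)

    def value(subset):
        z = baseline.copy()
        z[list(subset)] = x[list(subset)]   # switch on the features in the subset
        return model(z)

    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for subsets S not containing i
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi
```

For a linear model the result reduces to each weight times the feature's offset from the baseline, while interacting features share the credit for their joint effect.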
The sole other dedicated TDE classifier in the literature, fleet (Gomez et al. 2023), is based on an adapted supernova classifier. Gomez et al. (2023) began with a sample of spectroscopically-classified transients from the Transient Name Server, rather than a dedicated sample of nuclear transients as presented here. In other respects Gomez et al. (2023) followed a similar procedure to the one presented here, with an imbalanced sample of transients which are first analysed for light curve and host galaxy properties, augmentation via SMOTE and then performance assessment via k-fold cross-validation. fleet achieved just ≈40% recall with ≈50% precision for a loose selection, or alternatively ≈30% recall with ≈80% precision for a stricter selection, in contrast to the ∼80% recall and ∼80% precision in the tdescore balanced case. However, the performance is not directly comparable, because fleet was applied to only 40 days of photometry, rather than the full light curve history employed here. For a TDE such as that in Figure 1, 40 days would be insufficient to adequately measure fade or colour evolution. As detailed in Appendix B, the performance of tdescore is closer to that of fleet if late-time data are ignored.
Looking further ahead, tdescore can serve as a template for obtaining a photometrically-selected sample of TDEs from surveys such as the Legacy Survey of Space and Time (LSST) with the Vera C. Rubin Observatory (Ivezić et al. 2019).In combination with photometric redshifts, an ML-based approach like tdescore could enable us to perform large-sample TDE demographic studies for the first time without use of any spectroscopic observations.In particular, Bricman & Gomboc (2020) estimated that LSST should detect >3000 TDEs per year, under the assumption of a conservative detection requirement of 2 magnitudes above the median 5-sigma limit.Pushing one magnitude deeper, to match the cuts employed by this work, would increase this number even further.The performance of tdescore suggests such a depth would be plausible using photometric selection, with the slow evolution of TDEs being well-suited to the expected LSST cadence.
The performance of tdescore with real-time ZTF data will be the subject of a future publication. There are many other possible uses of photometrically-selected TDEs, for example to build a much larger sample of probable TDEs to test possible multi-messenger correlations between neutrinos and TDEs (see e.g. Stein 2019), for which there is growing evidence (Stein et al. 2021; Reusch et al. 2022; van Velzen et al. 2024; Jiang et al. 2023). Another use is to quickly identify candidate TDEs amongst transients detected by surveys at other wavelengths, through cross-matching to probable ZTF TDEs found by tdescore. We will use this method to aid searches for dust-obscured TDEs with the Wide-Field Infrared Transient Explorer (WINTER; Lourie et al. 2020), a newly-commissioned near-infrared survey telescope at Palomar Observatory.
Building broader TDE samples is important because, by construction, tdescore will not find TDEs that differ substantially from the existing ZTF TDE sample. In particular, given the importance of the W1−W2 colour, tdescore is likely to be heavily biased against finding TDEs in AGN. This is a direct consequence of the parent sample of ZTF TDEs, none of which occur in AGN-like hosts with W1−W2 > 0.7 (Stern et al. 2012). To find such 'AGN-TDEs' (or other outliers such as red TDEs or fast TDEs), we would first require a handful of spectroscopically-confirmed ZTF examples. As our understanding of TDE diversity improves, tdescore can be retrained to find a broader selection of TDEs.
Applying tdescore directly to future optical surveys should be relatively straightforward, because the classifier is trained almost exclusively on generic light curve features that do not encode any specific ZTF survey information. However, there is also substantial scope for improvement in performance. While all ZTF light curves were analysed here in observer-frame units, with no correction for redshift, ongoing large-scale spectroscopic surveys such as DESI (DESI Collaboration et al. 2016) mean that spectroscopic redshifts will be systematically available for much of the local universe.
Even in the LSST/Rubin era, widespread adoption of photometric redshifts would enable intrinsic rest-frame properties such as peak luminosity to be employed for classification. Additionally, TDEs are generally characterised by luminous UV emission, and u-band colour is an excellent discriminator for finding TDEs (see e.g. van Velzen et al. 2011). While no UV observations were used for tdescore, due to a lack of systematic coverage, Rubin will have u-band coverage of all transients on a ∼weekly cadence. At higher redshifts, much of the TDE rest-frame UV emission will also be detectable with optical LSST filters. There are thus many reasons to be optimistic that future iterations of tdescore will be able to outperform the classifier presented here.
Table 2. Performance of tdescore for four parameter sets: information only about the host, information available shortly after discovery, information available by the time of peak, and the full parameter set. The performance of tdescore improves substantially with more data, but high performance is only achieved for the full dataset.

Figure 2. Top: Breakdown of the various cuts applied to the ZTF nuclear sample.Of 11699 ZTF sources, 5264 have a secure classification, 3040 also have a well-measured fade, and 2744 sources pass all cuts.Bottom: Of these 2744 sources used to train tdescore, 55 (2.0%) are TDEs.

Figure 3. Precision and recall as a function of tdescore threshold.The balanced threshold (chosen such that precision is at least 80%) is illustrated by the central vertical line.The inclusive (> 95% recall) and clean (> 95% precision) thresholds are illustrated by the left and right vertical lines, respectively.The corresponding confusion matrices for these three scenarios are shown in Figure 4.

Figure 4. Prediction-normalised confusion matrices (left), and truth-normalised confusion matrices (right), showing the performance of tdescore on the real data for different thresholds. The dataset is highly imbalanced, as seen in Figure 2. The source shuffling is performed 10 times, yielding averaged performance across the iterations, with the average expected number of sources for each category shown in brackets. Top: An inclusive threshold, optimised for recall. At the cost of a 5% loss of TDEs, a sample is produced with the TDE fraction increased from ∼2% to ∼46%. Center: An intermediate threshold, chosen to achieve >80% precision. It achieves relatively high recall (77.5%). Bottom: A strict threshold, optimised for precision. >70% of TDEs pass this requirement, yielding a clean sample with a <5% contamination rate.

Figure 6. Breakdown of the validation method for classifications, as described in Appendix A. Each source requires both a human-assigned classification and a second piece of confirmatory evidence to be considered reliably classified.

Figure 7. Precision (top, solid) and recall (bottom, dashed) curves for the four parameter sets listed in Table 2, as a function of threshold. Both precision and recall increase substantially as more data is added.

Table 1. Importance of all 27 features in tdescore, calculated by XGBoost (Chen & Guestrin 2016) using the standard averaging of importance across all decision trees in the final model (see e.g. Hastie et al. 2009).