Pterodactyls: A Tool to Uniformly Search and Vet for Young Transiting Planets In TESS Primary Mission Photometry

Kepler's short-period exoplanet population has revealed evolutionary features such as the Radius Valley and the Hot Neptune desert that are likely sculpted by atmospheric loss over time. These findings suggest that the primordial planet population is different from the Gyr-old Kepler population, and motivates exoplanet searches around young stars. Here, we present pterodactyls , a data reduction pipeline specifically built to address the challenges in discovering exoplanets around young stars and to work with TESS Primary Mission 30-min cadence photometry, since most young stars were not pre-selected TESS 2-min cadence targets. pterodactyls builds on publicly available and tested tools in order to extract, detrend, search, and vet transiting young planet candidates. We search five clusters with known transiting planets: Tucana-Horologium Association, IC 2602, Upper Centaurus Lupus, Ursa Major and Pisces Eridani. We show that pterodactyls recovers seven out of the eight confirmed planets and one out of the two planet candidates, most of which were initially detected in 2-min cadence data. For these clusters, we conduct injection-recovery tests to characterize our detection efficiency, and compute an intrinsic planet occurrence rate of 49+-20% for sub-Neptunes and Neptunes (1.8-6 Re) within 12.5 days, which is higher than Kepler's Gyr-old occurrence rates of 6.8+-0.3%. This potentially implies that these planets have shrunk with time due to atmospheric mass loss. However, a proper assessment of the occurrence of transiting young planets will require a larger sample unbiased to planets already detected. As such, pterodactyls will be used in future work to search and vet for planet candidates in nearby clusters and moving groups.


INTRODUCTION
The Kepler mission was instrumental in the discovery of thousands of exoplanets (Borucki et al. 2010;Borucki Corresponding author: Rachel B. Fernandes rachelfernandes@email.arizona.edu 2017), most of which were found to be closer to their star than Mercury is to our Sun. A closer look at this population of short-period exoplanets revealed two prominent features: the Hot Neptune Desert and the Radius Valley. The Hot Neptune Desert (3−10 R ⊕ ; <4 days) is a region in the Gyr-old, period-radius plane with a significant dearth of Neptune-sized planets (Beaugé & Nesvorný 2013). As Neptune-sized planets should be easier to find in short-period orbits, and many sub-Neptunes have been discovered with longer orbital periods from surveys such as CoRoT (Baglin et al. 2006) and Kepler, the Hot Neptune Desert appears to not be an observational bias. The Radius Valley, a region with a much lower frequency of planets with radii ∼1.8 R ⊕ than at ∼1.3 R ⊕ (Super-Earths) or at ∼2.4 R ⊕ (sub-Neptunes), was discovered by Fulton et al. (2017) using accurate stellar, and hence planetary, radii from the California-Kepler survey (Petigura et al. 2017;Johnson et al. 2017). More recently, the Radius Valley was also detected in K2 data (Hardegree-Ullman et al. 2020).
Both the Hot Neptune Desert and the Radius valley are most likely caused by atmospheric mass loss due to XUV photoevaporation (see, e.g. Owen & Wu 2013, 2017 and/or core-powered mass loss (see e.g., Ginzburg et al. 2016Ginzburg et al. , 2018Gupta & Schlichting 2019;Gupta & Schlichting 2020). Given that these features are most likely to be evolutionary, it is likely that the primordial population of short-period planets was very different. Uncovering that population requires discovering exoplanets around young stars which can be further used to test the theories of atmopheric mass loss.
The Kepler mission, however, was not able to detect any planets younger than 1 Gyr since there were only four open clusters (0.7-9 Gyr) in the Kepler field (Meibom et al. 2011(Meibom et al. , 2013. On the other hand, the K2 mission (Howell et al. 2014) discovered several young transiting planets while surveying the Taurus, Hyades, Praesepe and Upper Scorpius (e.g., Mann et al. 2016aMann et al. ,b, 2017Ciardi et al. 2018;Rizzuto et al. 2017Rizzuto et al. , 2018Gaidos et al. 2017;Vanderburg et al. 2018). Detailed studies of these planets have already begun to shed light on how these planets differ from their Gyr-old counterparts. For example, the planets in the 23 Myr-old V1298 Tau system, that currently occupy gaps in the Kepler distribution, is thought to be undergoing photoevaporation and is predicted to "shrink" or lose their atmospheres with time (David et al. 2019).
With the Transiting Exoplanet Sky Satellite (TESS) mission (Ricker et al. 2014), we now have the unique opportunity to detect planets around stars in more young clusters and associations than those observed with K2 . During its primary mission, TESS monitored each strip of the sky (also known as sectors) for approximately 27 days, which means that it can detect planets with an orbital period of ∼14 days (assuming two transits per candidate). Per sector, there are about 200,000 bright nearby stars that were pre-selected for the two year primary mission to be observed in a short 2-minute cadence mode known as the Candidate Target List (CTL; Stassun et al. 2018). Moreover, TESS obtained images of each sector at a 30-minute cadence which are also known as Full-Frame Images (FFIs). Over a billion stars that were observed in this mode are listed as part of the TESS Input Catalog (TIC), which contains a relatively untapped reservoir of young planetary systems. In fact, while a handful of young planets have already been detected using 2-min cadence data within 200 pc (e.g., Newton et al. 2019Newton et al. , 2021Rizzuto et al. 2020;Mann et al. 2020), only two have been detected thus far using the FFIs (Nardiello et al. 2020;Bouma et al. 2020).
Automating the detection of young transiting planets poses several challenges, the biggest one primarily being the removal or modeling of instrumental systematics and stellar variability − the latter of which is more important for young stars due to their high variability. Young stars also tend to have stellar rotation rates that are comparable to the orbital periods of short-period planets that can be detected by TESS (e.g., Rebull et al. 2016), thereby making it hard to find transiting planets in their light curves. A study led by Nardiello et al. (2020), PATHOS, used a PSF approach to extract light curves from TESS Primary Mission FFIs and searched for planets in them. However, most of their clusters were far away (>500 pc) and while they were able to find 35 planets, only two of these were within 200 pc thereby making precise radial velocity of these targets challenging. Additionally, TESS QuickLook Pipeline 1 (Huang et al. 2020a,b;Kunimoto et al. 2021) only considers stars brighter than T mag = 13.5 and is not optimized for highly variable stars such as the ones found in these clusters. To this effect, we have developed a dedicated pipeline to understand the survey completeness and thus properly calculate the intrinsic planet occurrence rates.
Here, we present pterodactyls 2 (Python Tool for Exoplanets: Really Outstanding Detection and Assessment of Close-in Transits around Young Local Stars), a pipeline that builds on publicly available and tested tools in order to extract, detrend, search, and vet young transiting planet candidates detected using TESS Primary Mission 30-minute cadence photometry. In Section 2, we discuss how we select the clusters that have been included in this analysis and, Section 3 describes the inner workings of pterodactyls. Finally, in Sections 5 & 6, we summarize our main results and some preliminary occurrence rates calculations followed by a discussion on our next steps (Section 7  (Mann et al. 2020), andTOI 451 b, c andd (Newton et al. 2021) were discovered using TESS 2-min cadence data and also had Spitzer data. All of these are confirmed planets except for HIP 67522 c. In contrast, the planets in IC 2602 were discovered using 30-min cadence data: TOI 837 b which is a confirmed planet with mass measurements from radial velocity (Bouma et al. 2020), andTIC 460950389 b (Nardiello et al. 2020) which is a Community TESS Object of Interest (CTOI) 3 .
In summary, a total of ten planets have been discovered so far in the selected clusters: of them eight were discovered in TESS 2-min cadence data and two in 30-min cadence data. Followup observations have confirmed eight of these while HIP 67522 c and TIC 460950389 b are only candidate exoplanets.
For the selected clusters and associations, we relied on the memberships provided in the BANYAN Σ (Gagné et al. 2018) and the Gaia DR2 open cluster member lists (Gaia Collaboration, Babusiaux et al. 2018) which gives us a total of 1940 stars out of which 1591 were observed during the TESS Primary Mission. Based on parameters available in the TIC, these stars span a large range in Gaia G magnitude (1.73-20.45) and stellar mass (0.11-3.2 M ). Table 1 provides a breakdown of the main properties of the moving groups and clusters that were included in this analysis while Figure 1 shows the relative positions of the clusters and their respective members that were analyzed in this work.

TRANSIT DETECTION PIPELINE
pterodactyls builds on publicly available and tested tools in order to extract the light curves (Section 3.1), detrend them (Section 3.2), search for transit-like signals (Section 3.3), and vet planet candidates (Section 3.4) in these clusters. We further optimized the efficiency of the detrending, search and vetting components of pterodactyls via injection-recovery tests. For the latter, we injected ∼1000 transit signals with planets ranging in size from 0.5-10 R ⊕ and with orbital periods between 0.5-10 days into a sample of our light curves using the Python-based batman 4 package (Kreidberg 2015). We then processed the light curves with pterodactyls and examined the distribution of planets that were recovered. We used this planet distribution as feedback in order to improve pterodactyls. The specifics of the extensive testing using planet injection-recovery tests are summarized in the respective sub-sections while the main steps of the pipeline are summarized in Figure 2.

Light Curve Extraction
Given that less than 10% of our targets were part of the pre-selected TESS 2-min cadence targets, we use the primary mission's 30-min cadence photometry extracted from TESS FFIs. Here, we employed eleanor 5 , an open-source tool developed by Feinstein et al. (2019). eleanor carries out background subtraction, and removal of spacecraft systematics such as jitter and pointing drift. eleanor provides four different types of light curves: raw light curves, corrected light curves, light curves with systematics removed using Principle Component Analysis (PCA), and Point Spread Function (PSF) modeled light curves.
In this work, we used PSF-modeled light curves since they better enable the recovery of small signals and minimize the effects of scattered light from the Earth and Moon (Feinstein et al. 2019). eleanor creates these PSF light curves by modeling the shape of the PSF as a multi-variate Gaussian (major axis, minor axis, and a rotation) for each star at each cadence. Then, eleanor integrates that Gaussian function to calculate the flux of the target star at each cadence. The quality flags assigned to the FFI data by the TESS team are implemented during the extraction of the light curves by eleanor (see Table 32 in the TESS Science Data Products Description Document 6 ). By default, we only keep the data with the bit value 0 i.e., the highest quality data. Once we download the light curves for each sector for a particular target, we 'stitch' them together and run the search on all available Primary Mission data for the target.

Accounting for Systematics
Any instrumental systematics that affect the shape of the PSF, such as the temperature of the instrument, are well-captured by the PSF model and the flux is conserved across it. However, there were several instrumental effects in these light curves that were not captured during the PSF modeling and remain in the light curve such as mid-sector flux drops caused by data downlink, and offsets in the light curves caused by errors in the uploaded Guidestar tables (see TESS' Data Release Notes 7 for a more detailed description of these issues). Since these instrumental effects would generate spikes in the light curve that affect pterodactyls' ability to detrend and search for planets, we further optimized these light curves by masking the mid-sector flux drops and rescaling any offsets. In some cases, eleanor could not properly fit the PSF flux of the target star possibly due to the target being faint, nearby bright stars, etc. We removed 148 such light curves from our sample.
We also masked a number of cadences in sectors 3, 10, 11, and 12 (see Figure 3) because we noticed that they triggered an anomalous number of transit-like detections in the search, indicating an issue with the light curves in those regions. We used a test similar to the "Skye" metric used by Thompson et al. (2018) that was also adopted for K2 data by Zink et al. (2020). The test measures the number of transit-like signals at each cadence and if the number of signals is >3σ, we masked those cadences before re-running the search.

Light Curve Detrending
In order to detrend the light curves of these young stars, we explored several options presented in Wōtan, an open-source Python package on GitHub 8 developed by Hippke et al. (2019). In their study, Hippke et al. (2019) used injection recovery tests and found that, in the presence of significant stellar variation, spline-based methods (where adjacent functional pieces join each other at fixed knots) maximized the fraction of injected planets that were recovered. Following up on this result, we tested the following three different spline techniques (see Appendix A for an example): i) a robust spline with iterative sigma clipping, ii) a Huber estimator spline

Planet Candidates Astrophysical False Positives
Threshold Crossing Events (TCEs) Figure 2. Schematic of pterodactyls. * Before passing to triceratops, we conducted a centroid test whenever the Target Pixel File (TPF) was available on MAST.
(Huber 1981) and, iii) a penalized spline with iterative sigma clipping (Eilers & Marx 1996). We also tested the commonly used Savitzky-Golay filter (Savitzky & Golay 1964) and found that it often over-fits for the transit thereby causing an inaccurate transit depth measurement, as pointed out in Hippke et al. (2019). We evaluated the detrending based on the low noise or the root mean square (hereafter, RMS) of the light curve, and recovery fraction from the injection-recovery tests. We found that the penalized spline with iterative sigma clipping works best for our sample of young stars as it gives the lowest RMS measured on a 7-hour window and had the highest recovery fraction (∼92%). We found that, in agreement with Nardiello et al. (2020), the brighter stars contribute more flux and have low RMS. We also see a lot of scatter in the faint star region where there is more flux contamination. Furthermore, the ratio of the recovered planetary radii to the injected radii was close to unity, thereby suggesting that we did not over-detrend our light curves.
In carrying out these tests, we also noticed that the edges of light curves are often challenging to detrend due to the lack of data on one side and would often trigger a search. As such, we decided not to consider 0.5 days worth of data on either edge. We also noticed that the penalized spline uses a cut of 2σ i.e., it uses sigma clipping in estimating the trend and does not consider any data that is more than 2σ away. While this works well for almost the entire sample, it does not do a good job for light curves with high amplitude (>2-3%) rotational signals where it clips out peaks which leaves periodic residuals that are picked up by the search. We manually identified 20 such light curves and applied a 3σ cut to them in order to achieve a better detrending.

Optimization Based on Stellar Rotation Rates
A penalized spline is essentially a piecewise polynomial fit in which the routine tries all numbers of splines up to a set maximum, and uses a penalty function to evaluate if a better fit outweighs the added degrees of freedom. The penalized spline provides additional flexibility: unlike other spline methods where the exact number of knots must be chosen manually, the penalized spline identifies the best number of knots by comparing it against the smaller residuals from an improved fit starting from a given maximum number. This feature turns out to be useful for our sample as some young stars are rapid rotators (<1 day) with periodic signals on timescales similar to the orbital period. Using injectionrecovery tests, we found that the highest planet detection efficiency could be achieved by limiting the maximum number of splines based on the stellar rotation rate.
To this effect, we used a Lomb-Scargle periodogram (Lomb 1976; Scargle 1982; implemented in astropy) to first calculate the stellar rotation rate and then implement the following step function to determine the maximum number of splines that should be used in order to best detrend any given light curve: • if R rot 2 days: max splines = 200 • if R rot > 2 days and 10 days: max splines = 200/R rot • if R rot ≥ 10 days: max splines = 25 If the periodogram was not able to find a period, it defaults to the length of the segment (half a sector since there is a break in the middle). Since that is ∼10 days, only a small number of splines (25) is used which is ideal for a light curve of low variability. However, this is a maximum number of splines which means that the detrending can still pick the right number of splines even if the rotation period is estimated to be shorter than it really is.
At this stage, we visually inspected all of the light curves and found 80 with variablility trends that were difficult to detrend e.g., extremely fast rotators with a rotational rate <0.5 days 9 . Since we were not able to properly detrend these light curves, we also would not have been able to find planets in them. As such, we removed these light curves from our sample and are left with ∼1359 light curves that we used to search for planets.

Planet Candidate Search
We searched the detrended light curves for periodic, transit-like signals with transitleastsquares (hereafter, TLS; Hippke & Heller 2019). TLS is optimized for signal detection of small planets by taking into account the effects of limb darkening to better model the ingress and egress shape (Manduca et al. 1977;Mandel & Agol 2002). TLS searches for these signals by phase-folding the light curve over a range of epochs, transit durations and orbital periods calculated using a function based on the mass and radius of the host star as detailed in Ofir (2014). The period search grid is set by TLS by dividing the length of the light curve by the minimum number of required transits which is set to two.
For the purposes of our work, we defined a Threshhold Crossing Event (TCE) as having a signal-to-noise ratio (SNR) of 7 and a signal detection efficiency (SDE) of 7 10 . Most transit survey studies choose an SDE value between 6 and 10 (see e.g., Dressing & Charbonneau 2015;Livingston et al. 2018;Siverd et al. 2012;Pope et al. 2016;Aigrain et al. 2016;Feliz et al. 2021). While a lower SDE would mean higher completeness, a higher SDE would reduce the number of false positives. Our chosen SNR and SDE cuts reduced the number of false positives, which arose from detrending artifacts caused by the significant (∼20%) number of rapid rotators with periodic signals comparable to stellar rotation rates. Additionally, an SNR and SDE cut of 7 recovers eight out of the ten the previously detected planets (see Section 5.1).

Candidate Vetting
Transiting surveys (like Kepler , K2 and TESS) that target a large number of stars face a major challenge while distinguishing false alarms (transit-like events caused by instrumental artifacts) and astrophysical false positives (transit-like events that are astrophysical in origin but are not caused by planets). The process of sorting out false alarms and false positives from real transit signals is known as vetting. In order to filter out any instrumental false alarms that could be due to detrending artifacts, we used criteria inspired by RoboVetter (Kepler ; Thompson et al. 2018) and EDI-Vetter (K2 ; Zink 2019; Zink et al. 2020) to create four checks in order to address the issues unique to our sample. Only the TCEs that pass the following tests are considered objects of interest: • At least two transits with data: We required two transit signals with in-transit data points in a given light curve which allowed us to find planets with orbital periods of up to 13.5 days around most of the stars (assuming each star only had single sector data).
If we required at least three unique transits like Kepler did, we would be limited to transit signals with orbital periods less than 9 days.
• Orbital period dissimilar to stellar rotation rate: While analysing the results of the search, we of-ten found residuals at orbital periods similar (∼20%) to the stellar rotation rate (and some of these at half the rotation rates), despite detrending. To combat this, we flag any such TCEs.
• Consistency in individual transit depths: We found that, quite often, false alarms have transit depths with larger variance than real planets. Similar to Kepler 's Robovetter Single Event Statistic (SES), we flagged signals with large variations in transit depth as likely false positives. We weed out these signals based on the ratio of the standard deviations of the individual transit depths to the average transit depth, which we set to be greater than 1.5 for a false positive. This cutoff was determined by preserving known planets while keeping the number of false positives low.
• Observed transit duration should be consistent with expected transit duration: On visual inspection, we found that some TCEs had transit durations that were inconsistent with those of planetary transits. For example, stellar variability can lead to dips with a duration longer than a transit, and single occurrence outliers can appear to have a shorter transit duration. In such cases, it is beneficial to compare the observed transit duration to the expected transit duration that is determined using the planetary parameters calculated by TLS. If the observed transit duration is within 75% of the expected transit duration, the TCE passes this test. This cut was also set to preserve known planets and reduce the number of false positives.
In addition, we also used two checks from EDI-Vetter Unplugged (Zink et al. 2020) that were initially designed to vet transits in K2 data but has now been adapted to vet TESS light curves (for e.g., see Feliz et al. 2021).
• Individual Transit Test: Derived from the Marshall test designed by Mullally et al. (2016) for Kepler , this test used transit models and those of common non-transiting false positives and fit these to the individual transits. It then used the Bayesian Information Criterion (BIC) to gauge the goodness of these fits, thereby flagging the non-transiting signals that are often assumed to belong to transiting planets.
• Secondary Eclipse Test: This test looked for transit events between the 40-60% phase range of the folded light curve, allowing for some mild eccentricity variations. If a meaningful secondary eclipse was identified i.e., the secondary eclipse has a transit depth >10% of the primary transit, this flag was triggered and the transit was labeled a false positive. Secondary eclipses with less than 10% depth can be associated with reflected light from hot Jupiters.
Once we removed the clear false alarms and only have transit-like signals, we were left with 21 pterodactyls Objects of Interest (hereafter POIs; see Tables 2 and 3) on which we conducted a centroid test (Hedges 2021) 11 in order to determine whether the transiting signal was indeed coming from the target star (see Appendix C). This test compares the distributions of centroids for the data inside and outside the transit. An offset would indicate that the transiting signal was coming from a contaminating source such as a nearby or background star. We chose a probability p of 5% for the Student t-test below which the centroids positions inside and outside transits are not considered to come from the same parent distribution.
All POIs were then passed on to triceratops 12 (Giacalone et al. 2021) − a vetting tool designed to distinguish bona fide planets from astrophysical false positives in TESS data. triceratops calculated the marginal likelihood of each transit-producing scenario using observational constraints such as the transit depth and transit duration from TLS for each potential host, positions and magnitude of nearby stars from Gaia dr2 (Gaia Collaboration et al. 2018), priors on planet occurrence and stellar multiplicity rates, and models of planetary transits and eclipsing binaries. These scenarios include transiting planets and eclipsing binaries originating from the target star, known nearby stars, and unknown unresolved stars located near the target on the sky. Lastly, triceratops calculates the Bayes factor between all scenarios involving a transiting planet around the target star and all astrophysical false positive scenarios. Using this Bayes factor, triceratops can place a statistically robust constraint on the reliability of any given planet candidate detected by TESS. Here, we made a transiting planet probability cut of 50% in order to preserve known planets.

FITTING FOR PLANET PROPERTIES
Once the POIs have passed all of the vetting tests described in Section 3.4, we fit the phase-folded light curve 13 in order to derive a more accurate planet radius. Since TLS only provided an initial fit to the phasefolded light curve and did not provide uncertainties on

RESULTS
We processed ∼1359 young stellar light curves extracted from TESS FFIs i.e., 30-min cadence data through pterodactyls (see Appendix D for the TIC IDs processed in this work). As an illustration, Figure 4 shows the different steps that were used in the recovery of the planet TOI 451 d. We detected 21 POIs that were then passed to triceratops where their phase-folded light curves were vetted to separate transiting planet signals from astrophysical false positives.

Recovery of Known Exoplanets
In the five clusters that were searched in this work, there are eight confirmed planets and two planet candidates. More specifically, three were single planet systems (DS Tuc A b, TIC 460950389 b, and TOI 837 b; the latter two discovered in 30min-cadence data) and seven were part of three multi-planet systems (HIP 67522 b and c, TOI 1726 b and c, and TOI 451 b, c and d), all discovered in 2-min cadence data (see Section 2). As pterodactyls, which is automated, is built to search for just one planet per star, the discovery of additional planets in a system is carried out after finding a first planet and masking it out.
Among the 21 POIs detected by pterodactyls in the first automated run, we find six of the confirmed exoplanets that were previously detected by various teams using TESS (summarized in Table 2; for light curves, see Appendix B). pterodactyls was able to recover the values for the orbital periods and radii for three of these planets that were consistent (within uncertainties) with the literature. This includes two single planets (TIC 460950389 b, and TOI 837 b) that were initially     discovered in 30-min cadence data, and one multi-planet system (TOI 451 b, c and d) that was initially discovered in 2-min cadence data. We first recovered TOI 451 d (4.1 R ⊕ ) at an orbital period of 16.36 days. After masking this transit, we were also able to recover TOI 451 c (3.1 R ⊕ ) at 9.19 days (see Figures 4 and 16). However, pterodactyls was not able to recover TOI 451 b (1.9 R ⊕ ) since it has a low SNR (∼5). We find that all of the confirmed planet and planet candidates that pterodactyls recovers pass the centroid test. Two out of the other three recovered transiting systems (out of a total of six as stated above) suffered from sparse cadence (few in-transit data points): DS Tuc A b and HIP 67522 b. For these, pterodactyls was able to accurately recover the orbital periods but not the same radii as in the literature. In the case of DS Tuc A b, at first we considerably underestimated the planet's radius value. On further inspection, we noticed that this was caused by very few in-transit data points that was due to the removal of "bad" data that was flagged using the TESS team's quality flags. In this specific case, we found that inclusion of certain quality flags lead to a better estimation of the planetary radius i.e 5.35 R ⊕ (see Figure 11) which is within errors of the value of 5.70 R ⊕ that was measured using TESS' 2-min cadence and Spitzer data (see Table 2). In the case of HIP 67522 b, the overestimation of the planet radius was severe: we recovered a radius of 13.4 R ⊕ while the planet's observed radius is 9.72 R ⊕ . This could be due to several factors. First, as can be seen in Figure 5, the middle transit was not recovered (but predicted by TLS) due to the default edge cutoff for all stars being 0.5 days. How-ever, the severe overestimation of the radius could also be due to the planet being discovered in the more abundant 2-min cadence data. We tested this by processing the publicly available 2-min cadence light curve through pterodactyls (see Figure 6) and found that with the increased number of in-transit data points (along with higher RMS compared to 30-min cadence data), we were able to recover a planet radius of 9.8 R ⊕ which is close to the literature value of 9.72 R ⊕ . We were not able to recover HIP 67522 c due to it being a single transit event and hence, beyond the scope of pterodactyls since it requires a minimum of two transits.
Finally, for the TOI 1726 system in UMa, we recovered a transit signal from a 2.48R ⊕ planet with an orbital period of 11.32 days. However, the TOI-1726 system has two small planets (b and c) of radii 2.15 R ⊕ and 2.67 R ⊕ , with orbital periods of 7.11 and 20.55 days respectively (Mann et al. 2020). Given that the planet sizes (and hence, transit depths) are similar, the signals were initially detected by pterodactyls as coming from one planet. Upon further inspection, we realized that TLS set the maximum orbital period in the grid based on the minimum number of transits. So, for single sector data (∼27 days) with the minimum number of transits set to 2, the maximum orbital period would be ∼13.5 days which is shorter than the orbital period of TOI 1726 c. With this in mind, for this case alone, we set the minimum number of transits to one and were able to recover TOI 1726 c (see Figure 14). We then masked this transit and re-processed it through pterodactyls and recovered TOI 1726 b as well (see Figure 15).
After implementing the aforementioned fixes, we find that all of the recovered orbital periods are the same as those reported in the discovery papers while for the radii, four are the same within 1σ and three within 2σ (Figure 7). The recovery of these planets demonstrates the usefulness of pterodactyls in discovering young planets in 30-min cadence data.

Astrophysical False Positives Vetting
Apart from the six confirmed planets, pterodactyls also detected 15 other POIs (see Table 3), including two eclipsing binary signals around two stars in UCL i.e., TIC 129116176 and TIC 148158540. We weren't able to conduct centroid tests on all of these targets since the Target Pixel File on the MAST server (which is used for the centroid test) were only available for three of the candidates (all of the ones in UCL) in Table 3 but not for the 12 other candidates (IC2602). As such, we conducted the centroid test whenever the data was available. For the targets, where we could not conduct the centroid test, we visually inspected the eleanor pixel by pixel light curves. While determining the probability of a transiting signal being due an astrophysical false positive or a transiting planet, triceratops (by default) takes into account planet occurrence rates priors that are based on the Gyr-old exoplanet population (e.g., Howard et al. 2012;Dong & Zhu 2013;Petigura et al. 2013;Dressing & Charbonneau 2015;Mulders et al. 2015;Mulders et al. 2018). However, if the planet population around young stars is different from that around Gyr-old stars, for example because young planets are larger in size, the probability that triceratops would calculate is likely to be biased. In order to test the effect of priors on the probabilities of the different scenarios investigated by triceratops, we also calculated the astrophysical false positive probability using non-informative (flat) priors that were recently implemented in triceratops.
Out of our remaining 15 POIs, we found that 11 were not affected by the choice of priors and had the same probability of being an astrophysical false positive in both cases. In two out of the remaining four, we detected a transiting signal in each each of them (TIC 466579089 and TIC 466035035; see Table 3) that had passed all our vetting tests. Using Gyr-old priors, triceratops calculated the most likely scenario to be a nearby eclipsing binary (NEB). However, when using flat priors, the most likely probability was calculated to be a transiting planet. Given that IC 2602 is an extremely crowded cluster (∼5 stars per TESS pixel), we used eleanor's pixel level light curves to investigate whether these transits were indeed around our target stars. In doing this, we found that the signal was coming from giant stars (TIC 466579123 and TIC 466035000, respectively) a couple of pixels away and actually contributed more flux to the pixel than our target star. With this information in hand, we ran pterodactyls on those stars and were able to detect the same signal in the light curve but, this time, triceratops found that the most likely scenario was that the signal was due to an eclipsing binary around TIC 466579123 and TIC 466035000.
When the false positive probability was calculated using flat priors, the remaining two POIs that were initially calculated to be nearby eclipsing binaries with Gyr-old priors, showed a >25% chance that the signal was either a nearby eclipsing binary or a nearby transiting planet. These POIs have large transits depths that could either be attributed to small stars, brown dwarfs or giant planets since the two populations have comparable radius distributions (Chen & Kipping 2017). This, along with the fact that identifying shallow secondary transits can be arduous, makes it harder for statistical validation tools such as triceratops to properly distinguish between these scenarios (Shporer et al. 2017    In cases such as these, masses obtained via radial velocity follow-up are necessary to differentiate between these scenarios. This illustrates the need to properly take into account flux contamination in young clusters while vetting planet candidates.

Detection Efficiency
An advantage of having an automated planet detection pipeline is that it enables the measurement of the survey detection efficiency which is necessary to compute intrinsic occurrence rates. The detection efficiency of our survey was calculated using injection-recovery tests on a 3×4 grid in Rp R from 0.01 to 0.3 and in or-bital period from 0.5 to 27 days , equally spaced in both Rp R and orbital period. The injections are carried out in the Rp R space because most stars in our clusters currently lack stellar properties. We injected a transit signal in each bin in each light curve in our sample while taking into account the flux contribution of each star as calculated by triceratops. We then passed these injections through pterodactyls and calculated the fraction that were recovered as planets to produce a detection efficiency map per cluster as well as an average map for the entire sample. The average detection efficiency map is presented in Figure 8 and shows that the overall detection efficiency does not exceed 50% for any bin below 0.1 Rp R (i.e., a Jupiter-sized planet assuming it orbits a Sun-like star). Interestingly, all the previously known . Left: Comparison of planetary radii recovered using pterodactyls (fitted with EXOTIC) vs. known planet radii for the eight recovered known planets. As can be seen, despite being recovered from 30-min cadence data, the radii are often consistent with those published. In two cases, TOI 1726 c and TOI 451 c, we slightly underestimate the planet's radius. We also overestimate the radius of the planet for HIP 6752 b. Right: Comparison of orbital periods recovered using pterodactyls vs. known orbital periods for the eight recovered planets to depict consistency between the results of our work and those from the discovery papers. Note: Orbital period errors are extremely small and hence cannot be seen in the plot.  recovered planets (denoted using blue stars in Figure 8) are in bins with fairly low values (<20%). This might pterodactyls 13 be due to the intrinsic occurrence of smaller vs. larger planets being higher (Mulders et al. 2015;Petigura et al. 2018;Mulders et al. 2018). We also investigated how the detection efficiencies of each cluster contributed to the overall detection efficiency. Figure 9 illustrates that the detection efficiency of a cluster can be quite different from the average. For example, Pisces-Eridani (top panel) has a significantly larger detection efficiency. IC 2602 (middle panel), on the other hand, has an extremely low detection efficiency. These differences might be due to age (IC 2602 is younger, hence its stars might be more variable) and/or to flux contamination due to stellar crowding, which is significantly more pronounced in IC 2602.

TIC ID
We further investigated the effects of flux contamination on IC 2602's detection efficiency by re-running the injection recovery tests without accounting for flux contamination. In other words, signals are injected into the light curve assuming that all of the flux comes from the target star and hence, there is no transit depth dilution. The detection efficiency map is given in the bottom panel of Figure 9 and it is similar to that of Pisces Eridani within 10 days and above 0.06 Rp R . This test demonstrates that flux contamination significantly reduces the detectability of all planets but stellar variability in the youngest associations still plays a role for the smaller planets.

Preliminary Calculation of Occurrence Rates
The intrinsic occurrence rate of planets (η) can be calculated from the fraction of stars with detected planets in a survey and the survey completeness. Here, we used the inverse detection efficiency method as follows: where comp bin is the survey completeness evaluated in the defined bin, n p is the number of detected planets in the bin and n is the number of surveyed stars. The survey completeness is computed by combining the detection efficiency and the geometric transit probability, which is given by Here, R is the stellar radius, and a is the average semimajor axis which is calculated using Kepler 's third law. The uncertainty on the occurrence rate was calculated from the square root of the number of detected planets in the bin, assuming Poissonian statistics. In our analysis, we only considered sub-Neptunes and Neptunes (0.017-0.055 Rp R or 1.8-6 R ⊕ , assuming a solar radius) within 12.5 days (about half a TESS sec- Figure 8. Overall detection efficiency of pterodactyls while taking into account flux contamination. The confirmed planets are depicted using blue stars, except for TOI 1726 b whose recovery required changing pterodactyls. Darker bins represent regions of higher detection efficiency. n is the total number of stars in our sample while n is the total number of injections done. Black box denotes the bin over which the intrinsic occurrence rates were calculated i.e., sub-Neptunes and Neptunes (0.017-0.055 Rp R or 1.8-6 R⊕, assuming a solar radius) within 12.5 days (about half a TESS sector).
tor) 14 because we wanted to understand the primordial population of sub-Neptunes and Neptunes before being stripped off of their atmospheres. With an average detection efficiency of 9% and a f geo of 0.1 (at a geometric mean orbital period of 3.5 days and assuming a Solartype star), we compute an occurrence rate of 49±20% for our sample of young clusters.
We then use epos (Mulders et al. 2018) to calculate the Kepler Gyr-old FGK (Sun-like) occurrence in the same bin. epos is a well documented and tested Python code developed by our team 15 , it is available on GitHub 16 , and has already been used in several publications to compute occurrence rates as well as to compare planet formation models to the Kepler exoplanetary systems (e.g., Kopparapu et al. 2018;Pascucci et al. 2018;Fernandes et al. 2019;Mulders et al. 2019;Pascucci et al. 2019;Mulders et al. 2020). Here, we computed an occurrence rate value of 6.8±0.3%, which is much lower than our computed value for young clusters. This could be attributed to several reasons, the major one being 14 Here, we use the Rp R fit by TLS and not the values from EXOTIC since EXOTIC is not part of pterodactyls and is only used to fit for the planetary parameters once the POI is vetted by triceratops. 15 http://eos-nexus.org/epos/ that our sample is heavily biased towards clusters with known planets. For example, if we increased the sample of young stars to search by a factor of six, which is about all nearby young stars within ∼200 pc and assumed the same detection efficiency with no new planets, the occurrence would drop to ∼8% which is the same as the Kepler value. Furthermore, since most of the stars in the sample do not have stellar radii, masses or effective temperatures, it is very likely that we are considering non-FGK stars, most likely lower mass stars (Henry et al. 2006;Winters et al. 2014), while calculating the occurrence rate which would make the comparison to Kepler inaccurate, where the sample is mostly Sun-like stars. Therefore, in order to better estimate the occurrence rates of planets in young clusters, we will need to expand our search to all nearby clusters, measure the stellar parameters for those stars, and only compute the occurrence for FGK stars in our sample. However, if we continue to see this increased occurrence after we expand our sample to include all nearby clusters and moving groups, it could imply that there is an excess of larger planets at young ages possibly because their atmospheres have not yet been stripped due to photoevaporation and/or core-powered mass loss.

SUMMARY AND FUTURE WORK
In this paper, we present pterodactyls − a Pythonbased pipeline that is built using publicly available codes, and is specifically designed to extract, detrend, search for, and vet transiting exoplanets in young clusters using TESS' Primary Mission 30-min cadence FFIs. Here, we only consider 5 specific young clusters and moving groups: Tucana-Horologium Association, IC 2602, Upper Centaurus Lupus, Ursa Major, and Pisces Eridani. These clusters were chosen solely due to the presence of eight confirmed planets and two candidates which were used to validate pterodactyls. Using pterodactyls, • We were able to recover eight out of ten known transiting planets/candidates. For the two that we were not able to recover (both initially discovered using 2min cadence data), one planet was too small to detect (PiEri; TOI 451 b) and the other is a long period, single transit planet (UCL; HIP 67522 c), which is beyond the capabilities of the pipeline.
• We also detected 15 transiting signals, all of which were found to be either eclipsing binaries or nearby eclipsing binaries when their phase-folded light curves were vetted by triceratops.
• While taking into account the stellar flux contribution, we conducted injection-recovery tests in order to measure the detection efficiency of the pterodactyls.
We found that the overall detection efficiency does not exceed 50% for any bin below 0.1 Rp R (i.e., a Jupitersized planet orbiting a Sun-like star) and that all the recovered planets are in bins with fairly low values (<20%), hinting that even in young clusters smaller planets are more frequent than giant planets.
• For our biased sample, we computed a planet occurrence rate of 49±20% which is significantly higher than Kepler 's Gyr-old occurrence rates of 6.8±0.3% for sub-Neptunes and Neptunes (0.017-0.055 Rp R or 1.8-6 R ⊕ , assuming a Solar radius) within 12.5 days (about half a TESS sector).
Since most of these planets were initially detected in 2min cadence data, our work shows that 30-min cadence data can be used to detect young transiting planets and thus has the potential for more planet discoveries. The planet occurrence rate we calculated for our sample is higher than that for Kepler 's Gyr old stars. However, we realize that this value is heavily biased since we only consider clusters with confirmed/candidate planets and can be further improved upon by using pterodactyls to search for young (< 1 Gyr) transiting planets in a larger sample of nearby clusters and moving groups (<200 pc) using TESS Primary and Extended Mission FFIs. We also plan on uniformly characterizing all the stars in our sample, so as to only consider FGK stars in future studies. With this young population of transiting, shortperiod planets, we hope to establish how the radius distribution of transiting exoplanets evolved over time, and therefore provide observational constraints on the mass loss mechanisms of planetary atmospheres.  In order to find the best detrending routine for our sample, we followed Hippke et al. (2019) which used injectionrecovery tests and found that for highly variable stars, such as those found in young cluster, spline-based methods maximized the fraction of recovered injected planets. We specifically tested the following three splines available via Wōtan: robust spline with iterative sigma clipping, Huber estimator spline Huber (1981), and a penalized spline with iterative sigma clipping Eilers & Marx (1996). Here, we demonstrate how each of these splines detrend the light curve of HIP 67522 which is in UCL and also has two planets around it. As can be seen below, the penalized spline does the best job of detrending the light curve and has the lowest RMS.          The following plots show the results of the centroid tests for the recovered, known, young transiting planets using pterodactyls. We find that, in all cases, no significant offset was detected i.e., the transiting signal was indeed coming from the target star.

C.2. POIs/Potential Binaries
Hedges (2021) uses the TESS Target Pixel File (TPF; and the aperture information therein) in order to conduct the centroid test. However, TPFs were only available for three of the potential binaries (all of the ones in UCL; shown below) in Table 3 but not for the 12 other candidates (IC2602). As such, we were only able to use this test when the TPF was available. As can be seen below in two of the cases, we see no significant offsets. However, in the case of TIC 177631209, an offset is detected which is consistent with triceratops' Flat prior prediction i.e., the transiting signal is actually coming from a nearby/background star.