Gaussian Processes and Nested Sampling Applied to Kepler's Small Long-period Exoplanet Candidates

There are more than 5000 confirmed and validated planets beyond the solar system to date, more than half of which were discovered by NASA’s Kepler mission. The catalog of Kepler’s exoplanet candidates has only been extensively analyzed under the assumption of white noise (i.i.d. Gaussian), which breaks down on timescales longer than a day due to correlated noise (point-to-point correlation) from stellar variability and instrumental effects. Statistical validation of candidate transit events becomes increasingly difficult when they are contaminated by this form of correlated noise, especially in the low-signal-to-noise (S/N) regimes occupied by Earth–Sun and Venus–Sun analogs. To diagnose small long-period, low-S/N putative transit signatures with few (roughly 3–9) observed transit-like events (e.g., Earth–Sun analogs), we model Kepler's photometric data as noise, treated as a Gaussian process, with and without the inclusion of a transit model. Nested sampling algorithms from the Python UltraNest package recover model evidences and maximum a posteriori parameter sets, allowing us to disposition transit signatures as either planet candidates or false alarms within a Bayesian framework.


INTRODUCTION
The NASA Kepler space telescope (Borucki et al. 2010;Koch et al. 2010;Borucki 2016) launched in 2009 and observed ∼ 200, 000 stars within its primary field of view over the course of roughly 4 yr.With instru-mental error budgets capable of detecting Earth-sized planets in year-long orbits around Sun-like stars, Kepler aimed to directly measure the occurrence rate of such objects, otherwise known as eta-Earth (η ⊕ ; Borucki et al. 2010).Although the fulfillment of this objective was impeded by greater noise contamination from both stellar and instrumental effects than initially anticipated (Gilliland et al. 2011(Gilliland et al. , 2015)), significant progress has still been made.To aid in this effort, our work debuts a novel Bayesian framework employing nested sampling (Skilling 2004(Skilling , 2006) ) alongside simultaneous correlated noise modelling with Gaussian processes (GPs; Stein 1999;Rasmussen & Williams 2006) to more accurately conduct Planet-Candidate (PC)-False-Alarm (FA) dispositioning and characterization.As an aside, this study distinguishes FAs-being instrumental or astrophysical variability which mimic transit events-from astrophysical False-Positives (FPs)-being transit-like events produced by eclipsing binary stars (EBs) and blends.
Currently, no potential Earth-Sun or Venus-Sun analog system from the Kepler sample has been shown to be reliable.Moreover, the occurrence rates for planets with 0.5 < R p < 1.75 R ⊕ and 64 < P < 500 days as shown in Figure 2 of Hsu et al. (2019) are either upper bounds or detections with statistical significance less than 2 standard deviations, so extrapolation to regions of parameter space with fewer candidates would incur large statistical uncertainties.Thus, the estimate of η ⊕ (and η ♀ ) can be improved via more robust reliability estimates not only in the Earth-Sun and Venus-Sun analog bins but also in adjacent bins containing few verified planets.FAs, not astrophysical FPs such as EBs, become the primary issue for Kepler Object of Interest (KOI; Thompson et al. 2018a) discrimination in these regions (see Figure 6 of Thompson et al. 2018a).Note that published KOI catalogs do not distinguish between FAs and FPs, dispositioning both classes of objects as FPs, because their purpose is to distinguish planet candidates from non-candidates.
Having undergone thorough preconditioning via the Presearch Data Conditioning (PDC; Twicken et al. 2010;Stumpe et al. 2012;Smith et al. 2012) module of the Kepler Science Operations Center (SOC) Science Processing Pipeline (Jenkins et al. 2010a) in an attempt to mitigate instrumental trends common among all stars on the detector, the data products of KOIs should ideally only contain intrinsic stellar variability (granulation, spots, flares, oscillations, etc.) and transiting exoplanet/eclipsing stellar binary signatures; however, instrumental systematics (sudden pixel sensitivity dropout, rolling band, bad pixels, cosmic rays, etc.)which impact light curves non-uniformly-can also persist (Caldwell et al. 2010;Gilliland et al. 2011;Clarke et al. 2014;Gilliland et al. 2015;Van Cleve & Caldwell 2016;Kawahara & Masuda 2019).
As previous studies, such as Data Release 25 (DR25; Twicken et al. 2016;Mathur et al. 2017;Thompson et al. 2018a), do not model the transit event and correlated noise simultaneously or compute the individual reliability for any single target, their results are left susceptible to FA misidentification (Foreman-Mackey et al. 2015); instead, interpolation is performed across orbital period and Multiple Event Statistic (MES) using populationlevel injection results (Christiansen 2017;Bryson et al. 2020).Another example, Caceres et al. (2019), statistically classifies Earth-sized Kepler PCs in the presence of correlated noise.However, their approach is frequentist, does not reveal any long-period PCs, and does not robustly estimate transit parameters.By analyzing individual light curves on a per-target basis, our work better safeguards against FAs while also improving the accuracy and robustness of PC characterization in comparison to previous population-level approaches.
Accordingly, the data of any given KOI can be interpreted as having originated from a transiting PC with some noise contamination or as a purely noise FA.To assess the probability that small long-period lowsignal-to-noise ratio (S/N) patterns of photometric dips with few (roughly 3 -9) observed transit-like events (i.e., the regime that includes Earth-Sun and Venus-Sun analogs) are of astrophysical origin (i.e., represent true PCs or background/hierarchical EBs which induce transit-like dips), we model Kepler 's photometric data as noise, treated as a GP, with and without the inclusion of a transit model.These are hereby denoted as the transit plus Gaussian process (TGP) and GP models, representing PC and FA interpretations, respectively; model parameters are described in Table 1.Here, two qualitatively different models are being compared: one with a pattern of transit-shaped dips (TGP) and the other without (GP).The former wields more degrees of freedom and accordingly will fit the data more closely, but we must ask whether these additional parameters are justified.To provide a principled answer, we employ Bayesian model comparison.
Rooted in Bayes's theorem, nested sampling algorithms from the Python (Van Rossum 1995a,b,c,d;Dubois et al. 1996;Oliphant 2007) UltraNest (Buchner 2016(Buchner , 2019(Buchner , 2021) ) package recover maximum a posteriori (MAP) parameter sets and evidences of each model, allowing for transit signatures to be dispositioned in terms of PC and FA probabilities within a Bayesian framework.It is important to clarify that this work does not attempt to qualify KOIs beyond PC or FA status; this is in sharp contrast to Kepler planet catalogs, which disposition FAs together with FPs.
The simultaneous modeling of correlated noise additionally provides more robust constraints on transit model parameters.Thus, the analysis that we present herein also improves the characterization of PCs, most significantly in terms of their radii.
We describe our proposed methodology herein and apply it to select Kepler targets, including potential   2023) results (small black dots with colored outlines) in order to facilitate a visual comparison of the physical parameters recovered for each KOI analyzed by both studies.Green lines outline the range within which a KOI may be deemed sufficiently "Earth-Sun-like"; these are defined according to nominal Earth values for R p , and either P or S 0 as x ∈ 1 ± √ 2 − 1 x ⊕ .Note that all KOIs were also uniformly filtered by R ⋆ with respect to solar values according to these same bounds.We are complete in both boxes drawn by these lines.For ease of reference, Earth (⊕) and Venus (♀) are also plotted.
Earth-Sun and Venus-Sun analogs (see Figure 1).In § 2 we lay the statistical foundation for Bayesian model comparison, GPs, and nested sampling ( § 2.1) before proceeding with the construction of our TGP and GP models ( § 2.2), an overview of the software architecture ( § 2.3), and ending with a summary of how we obtain derived parameters from fitted solutions ( § 2.4).We identify a sample population of small long-period low-S/N KOIs-including Kepler 's most Earth-Sun-like exoplanet systems: Kepler-62f (KOI-701.04;Borucki et al. 2013), Kepler-442b (KOI-4742.01;Torres et al. 2015), and Kepler-452b (KOI-7016.01;Jenkins et al. 2015)whose preceding Markov Chain Monte-Carlo (MCMC; Metropolis et al. 1953;Hastings 1970)  This section is closed off by a comparison of the Bayesian evidence against the standard metrics of MES and S/N in § 4.3.We conclude with a summary of this paper's leading results in § 5 and outline future work in § 6.A list of terminology and acronyms can be found in § 6.

METHODOLOGY
In this section, we introduce the reader to fundamental methodology upon which we base our analysis, beginning with a summary of Bayesian statistics and evidence-based model comparison in § 2.1.Our com-bined treatment of white and correlated noise by use of a Gaussian distribution and Matérn 3/2 kernel GP is established next.Following this, § 2.2 provides a breakdown of each model (TGP and GP) in terms of their parameters.A step-by-step outline of our UltraNest software architecture and model fitting process for any given KOI can be read in § 2.3.Derived parameter calculations are detailed in § 2.4.

Model Comparison
Bayes's theorem (Bayes & Price 1763; Laplace 1774)which forms the basis of Bayesian statistics and probability theory-describes the process by which our knowledge of an event (posterior) is probabilistically updated according to existing information (prior) and new observations (likelihood).In other words, it allows us to adjust our understanding of the world in order to make better informed decisions/predictions.From a statistical perspective, we can model observed data, D, via the inference of model parameters, θ, using Bayes's theorem: where the posteriors, P D (θ), are represented in terms of the likelihood, L D (θ), priors, π (θ), and Bayesian evidence (marginal likelihood integral), Z D .Here, Z D gives the probability associated with observing this realization of D and is defined as: The Z D encodes both L D (θ) and π (θ) information, so it is often employed as a metric of model suitability.Should one know the most suitable model for a given problem, the computationally expensive Z D can be readily discarded in favor of obtaining only P D (θ) of modelled θ (e.g., likelihood-driven techniques such as MCMC).However, it is uncommon in real-world problems to possess the most suitable model with which D is described in totality.As such, Z D , and by extension the Bayes's factor of any two models, A and B, Accordingly, increasingly positive values of B promote the existence of the transiting PC while their negative counterparts suggest a FA signal originating purely from noise.Values of B near zero indicate no statistically significant improvement given by the addition of transit parameters to the fit with respect to the null (noise) hypothesis; that is not to say that these are definitively PCs or FAs or that either fit is necessarily less robust, but that no statistically significant difference exists between hypotheses.
Given an informed choice of kernel (covariance function; see Rasmussen & Williams 2006), a GP may target specific behavior or systematics within a given data set-this is particularly useful when attempting to fit correlated noise present in photometric time series observations.The Squared-Exponential (Radial Basis Function; Cheney 1966;Davis 1975;Powell 1981) and Matérn (Matèrn 1960) kernel families have become popular for the treatment of systematics in astronomy (Gibson et al. 2012;Roberts et al. 2012;Barclay et al. 2015;Aigrain et al. 2015;Evans et al. 2015;Czekala et al. 2015;Aigrain et al. 2016;Littlefair et al. 2017;Foreman-Mackey et al. 2017;Angus et al. 2018;Livingston et al. 2019;Brahm et al. 2023;Aigrain & Foreman-Mackey 2023, etc.); while both are generally well-suited to smooth signal applications (e.g., stellar variability), the latter is also capable of handling rougher interference (e.g., sudden pixel sensitivity dropout as illustrated by Figure 19 of Thompson et al. 2018a).Given the known characteristics of stellar and instrumental FA sources which contaminate Kepler photometry (Gilliland et al. 2011(Gilliland et al. , 2015;;Van Cleve & Caldwell 2016;Van Cleve et al. 2016;Thompson et al. 2018a), we adopt the Matérn 3/2 kernel: Here, σ c and l c describe the amplitude and length scales of the correlated noise with which every pair of data points, t and t ′ , is conditioned.White noise is incorporated as a scaling factor to the error bars belonging to each photometric observation and is obtained by fitting the standard deviation, σ w , of a zero-mean Gaussian.
Nested sampling is a popular class of algorithm which approximates Equation 2and provides posterior inference(s) as byproducts given D, L D (θ), and π (θ).Our current infrastructure makes use of UltraNest, which requires user-defined L D (θ), and prior transforms or quantile functions mapping between physical parameter and unit hypercube sampling spaces.Uniform priors are used for all TGP/GP parameters excluding limbdarkening parameters, q 1 and q 2 , which instead use Gaussian priors (Kipping 2013).• σc: Amplitude scale of Equation 6[unitless].

Summary of Models
• lc: Length scale of Equation 6[unitless].
Photometric exoplanet transits were modeled using transitfit5 (Rowe 2016).The lightcurve model uses the analytic limb-darkening transit from Mandel & Agol (2002) and assumes non-interacting Keplerian orbits.The model is parameterized with ρ ⋆ , q 1 , q 2 , T 0 , P , b, R p /R ⋆ , F 0 , √ e cos (ω), √ e sin (ω), and a photometric dilution factor (see Table 1).The model can additionally include the effects of geometric albedo, ellipsoidal variations, and secondary eclipses.The calculation of Keplerian orbits derives the scaled semi-major axis, a/R ⋆ , based on ρ ⋆ ; this calculation assumes that the planetary mass, M p , is much less than the mass of the host star, M ⋆ .For all presented models in this paper, we assume: (1) circular orbits (i.e., zero eccentricity, e) such that √ e cos (ω) = √ e sin (ω) = 0, (2) no dilution1 (unresolved binaries), (3) no star-planet interactions, and ( 4) that the planet is completely dark (no reflection or emission).Limb-darkening parameters are from the tables of Claret & Bloemen (2011) for the Kepler bandpass.The shape information of low-S/N putative transit signatures within our regime of interest: ( 1) leaves e and ω very weakly constrained and ( 2) makes uninformative limbdarkening priors superfluous.These assumptions and inputs to the modelling approach are similar to transit model results presented in DR25.

Software Architecture
The general step-by-step outline for the fitting of an individual KOI is detailed in this section.
1. Since the preceding MCMC architecture of Lissauer et al. (2023) modelled the transit events of prewhitened data rather than simultaneously fitting correlated noise and transit events as performed in this study, we treat their transit solutions as initial guesses to define focused prior widths for our transit model in UltraNest; this mitigates computationally wasteful exploration of uninformative/unlikely parameter space.
2. For noise model hyperparameters, we define physically-motivated prior widths accordingly: (a) While Kepler photometry typically falls within 10 -20% of the predicted white noise budget (Gilliland et al. 2011(Gilliland et al. , 2015 To solve for the Matérn 3/2 GP's hyperparameters prior to each iteration's likelihood evaluation, matrix inversion must be performed.Since the transit events are effectively isolated in time, their correlated noise components can be approximated to share negligible covariance; we represent this by means of a block-diagonal approximation to the kernel, drastically decreasing the computational burden of matrix inversions.Naturally, benefits in performance scale with the number of transit events in a given KOI's data set.For the task of inversion, we use Cholesky decomposition-a method roughly twice as efficient as lower-upper (LU) decomposition (Cholesky 1924;Banachiewicz 1942;Press et al. 1986;Schwarzenberg-Czerny 1995).Nonetheless, each iteration is still expensive.

Derived Parameters
Formulas for derived parameters can be found within this section.We compute transit duration according to Equation 16of Seager & Mallén-Ornelas (2003), rewritten here using Kepler's Third Law (Kepler 1619) as: The insolation flux, S 0 ≡ L/4πa 2 , can be combined with Kepler's Third Law to yield: Isochrone-derived stellar luminosity and mass from Berger et al. (2020) are used alongside our fitted orbital period to compute insolation flux via Equation 8.
Although we do not model eccentricity directly (e.g., Van Eylen et al. 2019), a minimum eccentricity may be estimated following Kipping (2014) and Torres et al. (2015) via the comparison of stellar density recovered by our model against independent estimates: To obtain distributions of independent parameters, we use the reported value and lower/upper uncertainties to sample from a two-piece normal distribution (Wallis 2014).For example, the R p distribution is derived via a convolution between the fitted R p /R ⋆ posterior distribution and a two-piece Gaussian distribution of R ⋆ constructed from Berger et al. (2020).

CANDIDATES MODELLED HEREIN
Targets were selected from a comprehensive catalog of Kepler candidates with revised light curve analyses (Lissauer et al. 2023) The nature of this regime places candidates at significant risk of being FAs; MES ≲ 8 and S/N ≲ 10 are predominantly exhibited within our sample.Furthermore, KOIs-5044.01,5971.01,7621.01, and 7923.01 yield suspect derived transit durations, t T , which diverge significantly from those expected for equatorial transits of planets on circular orbits, t T,c ; regardless, similar performance here between TGP and GP models (or PC and FA hypotheses) has resulted in none of these KOIs possessing high B.
In addition to those candidates listed above, we also included other small long-period PCs; among these are three of Kepler 's validated exoplanets whose characteristics most closely approach the Earth-Sun analog regime-excellent targets against which we may baseline our framework-: Kepler-62f (KOI-701.04),, and ).The comprehensive target list is given in Table 2 alongside fitted and derived parameters, as well as Berger et al. (2020) stellar property inputs for the free parameter choice of σ d = 8 and σ p = 5.  2023), and MES is given by DR25 where available; those KOIs not found by DR25 were also missed in Data Release 24, so their MES are instead taken from Q1 -16 (Mullally et al. 2015) and identified using " * ".All fitted/derived parameters are in bold.These values are reported for the free parameter choice of σ d = 8 and σ p = 5.

NUMERICAL RESULTS
There are three leading results to be discussed in this section, beginning with an overview of recovered TGP and GP solutions for cases of strong and weak PC evidence in § 4.1.Here, the former is demonstrated by the baseline target, Kepler-62f, and the latter by a member of our sample KOI population, KOI-5227.01.This is followed by § 4.2, which investigates the influence that varying the free model parameters, σ d and σ p , has on the B; these control phased photometric data window and prior widths, respectively.We conclude with a heuristic evaluation of the fitted B against reported DR25 (or Q1 -16) MES and derived S/N scores for our target population in § 4.3.A summary of results for the complete KOI sample can be found in Figure 1 and Table 2.

Strong and Weak Cases
For the demonstration of strong and weak PC evidence, we compare the recovered UltraNest TGP and GP solutions given free parameter choices of σ d = 8 and σ p = 5 for Kepler-62f and KOI-5227.01. Figure 2 shows the photometric data-preprocessed according to §2.3overlaid by TGP and GP solutions in unfolded original and folded GP-corrected states.Although we know Kepler-62f to be a bona fide exoplanet, both the phase (Figure 2) and corner (Figure 7) plots demonstrate competitive performance between PC and FA hypotheses; similar model performance is shown for KOI-5227.01(see Figure 8 and Figure 9).While it is nontrivial to individually discriminate or relatively rank Kepler-62f and KOI-5227.01by eye, our statistical analysis places them among the strongest and weakest PCs of the σ d = 8 and σ p = 5 subset, with recovered B values of 35.8 +1.2 −1.1 and −0.1 +1.1 −1.2 , respectively.This translates to strong favor for the former's PC status whereas the latter can be said to fall within the range of values which we deem indistinguishable in that sufficiently strong evidence supporting either hypothesis is lacking.Our example also serves to highlight the importance of both joint transitnoise modelling and robust Bayesian model comparison techniques-especially when working within lower signal-to-noise regimes such as these-, without which we would be more susceptible to target misclassification.
Generally, we have seen (Figure 4 and Figure 5) two sub-populations reflective of the above comparison emerge from our analysis (strong PCs and inconclusive/weak PCs and/or FAs).Included in the strong PC group are reassuringly three known exoplanets, , in addition to two more promising but less potentially Earth-Sun-like PCs, KOI-2719.02and KOI-6971.01.The remainder of Table 2's candidates fall within the currently indistinguishable range of ∼ ±10 B; further observations and/or deeper probabilistic analyses are likely required before more definitive conclusions may be made.
It should be noted that we do not yet observe candidates whose B strongly favors the FA hypothesis.We suspect this to be the result of: (1) survivor bias potentially introduced by our lack of targets with MES or S/N below ∼ 7 (i.e., we have yet to analyze sufficiently poor photometry) and/or (2) a miscalibrated B scale (i.e., the magnitude of the Z D penalty incurred by the TGP model's additional parameters and by extension, the B floor, are currently unknown).To obtain statistically robust conclusions to these hypotheses, future work will implement large-scale injection-recovery testing.

Varying Free Parameters
There are two free parameters required to initialize our modelling pipeline, these being σ d and σ p .Of concern to us is their influence on the recovered posteriors and B. We begin with σ d , which acts as a multiplicative factor to the width of the phased photometric data window, defined as 2σ d t T , where t T is the transit duration as found in the preceding MCMC solution of Lissauer et al. (2023).In order to better model the correlated noise present within each transit event, we may leverage out-of-transit observations, which locally share noise characteristics with those in-transit.Naturally, the question then arises as to how much out-of-transit data should be included when defining the phased photometric data window of any given fit?While there exists an abundance of available out-of-transit data, two constraining factors must be considered: (1) computational cost and (2) information gain.While only so much can be done in terms of computing power and software optimization, we can more deeply consider the notion that correlation between in-transit and out-of-transit noise decreases with increasing distance from the transit midpoint.To simplify things, the upper-bound on the GP's l c prior can be set as the width of the phased photometric data window.It follows that a suitable choice of phased photometric data window width then preserves the GP's ability to accurately model timescales relevant to the transit event(s) and subsequent statistical meaning of TGP-GP model comparison; in other words, the possibility of solutions preferring longer-timescale fluctuations with little in-transit information is mitigated.
Core to the Bayesian approach of statistical model comparison is the consideration of a priori knowledge, as seen in Equation 2. While intended to be used with informative (non-uniform) priors, it is not uncommon to lack the independent parameter constraints and distributions necessary for this.Such is the case in this study for all transit and noise model parameters, excluding q 1 and q 2 , whose prior distributions are precalculated with well-defined support according to Kipping (2013).Varying searchable parameter space does not pose much of an issue apart from sampling inefficiencies when employing informative priors, as exploration beyond their regions of significant probability density with naturally well-defined support returns little to no information.The same cannot be said for uniform priors, whose normalization biases measurements of the B via Equation 2(for a detailed study regarding priors and caveats such as this within the context of Bayesian inference and model selection, see Llorente et al. 2022).
To explore this implication, we propose a two-step method of quantile standardization to assess the potential bias induced by choices of σ p with fixed σ d = 2. First, overly-wide priors are cast using σ p = 25 in an attempt to fully capture the desired empirical posteriors.Once recovered, these can be used to provide accurate quantiles as: with which the prior widths of subsequent runs can be defined; this notation is not to be confused with the limb-darkening parameters, q 1 and q 2 .In our code, these quantiles are set by the corner quantile(x, q, weights) function, with x and weights arguments given by σ d = 2, σ p = 25 UltraNest empirical posteriors and q by 1 − q (σ p ) or q (σ p ) for lower and upper bounds, respectively.A detailed example of this process with accompanying visuals can be found in Figure 3.To assess correlation between the B and σ d , the sample population of KOIs was fit for σ d ∈ [2,4,8,16] and fixed σ p = 5.The resulting emergence of two KOI subpopulations was immediately apparent (see Figure 4), these being strong PCs with promising follow-up potential (red) and inconclusive/weak PCs and/or FAs (blue).Generally, the PC group demonstrates positive trajectory with respect to σ d across all metrics whereas the latter group evolves in a relatively flat fashion.Here, the B exhibits a clearly defined sub-population boundary at a value of approximately 10. Overall, this behavior suggests that the dispositioning of PCs from FAs is largely independent of the chosen σ d , meaning that smaller phased photometric data windows may be favored to reduce the computational burden while retaining sufficient information; this additionally supports a transit-like timescale upper-bound on the GP's l c .
As expected, the B possesses a general inverse proportionality to σ p resulting from Equation 2(see Figure 5).Conveniently however, this trend is roughly similar across the entire KOI sample population, indicating that sample populations with prior widths set by a consistent/shared choice of quantile will experience a population-wide shift in B plus some small variance of ∼ ±5 such that relative ranking between targets are still valid.Interestingly, it seems that this variance is reduced to stochastic order with greater probability of PC status, meaning that we can be relatively more confident in our findings for the red sub-population.While these tests must be performed on a much larger KOI population to draw any statistically robust conclusions, if the general inverse trend is to hold, one could theoretically fit and recover an empirical correction relating the B and σ p .For now, there does not seem to be a preferred σ p in terms of bias reduction, but we can recommend quantiles corresponding to at least σ p = 3 in order to promote sufficient prior-posterior information gain and properly recovered posteriors.3: A single-parameter mock example of the process used to obtain quantile-based prior widths for standardization tests conducted in Figure 5.In the first nested sampling run, a wide net is cast to ensure the parameter's empirical posterior distribution (black) is captured in its entirety (top).In subsequent runs, quantiles (solid red) of this complete posterior are used to define subsequent prior widths (middle), with which associated fractions of the complete posterior are recovered (bottom).While demonstrated with uniform priors, this is not a constraining factor; we apply this methodology to both uniform and non-uniform priors.In this example, skewed, non-Gaussian posterior behavior causes standard deviations (dashed blue) from the complete posterior's median (dashed-dotted green) to be inefficient in capturing parameter space information compared to quantiles.Since recovered posteriors are not guaranteed to be Gaussian in nature, we ensure consistent information gain by focusing only on relevant regions of parameter space via quantiles.The middle panel also visualizes how the normalized amplitude of a uniform prior is dictated by its width; this is a known origin of bias in Equation 2.  Finally, Figure 4 and Figure 5 both suggest the presence of five strong PCs: three known exoplanets, , and two new additions residing within the habitable zones of their host  If verified with current fitted parameters (see Table 2), KOI-7621.01would rank alongside  in terms of Earth-Sun analog candidacy.That being said, some points of contention must be addressed regarding this candidate: (1) we derive a nearly parabolic minimum eccentricity of e min = 0.89 +0.02 −0.04 and ( 2) the photometric data contains long-timescale fluctuations of considerable amplitude, likely caused by spot modulation.Since eccentricity is degenerate with the mean stellar density and impact parameter-found to be ρ ⋆ = 76 +26 −31 g cm −3 and b = 0.33 +0.28  −0.23 , whereas Berger et al. (2020) obtained ρ ⋆ = 1.00 ± 0.11 g cm −3 -, we cannot readily conclude whether this is a highly-eccentric orbit or the consequence of grazing transits.
In terms of obtaining best-fitting model parameters, fluctuations caused by spot modulation affect a greater number of data points than those caused by transittimescale events such that our GP's l c is motivated toward longer-timescales, thereby foregoing the ability to represent transit events in favor of producing an overall "better" fit.It follows in subsequent TGP-GP model comparison that the TGP model will always outperform the GP-only solution.We can then expect the B to become artificially inflated with increasing σ d in the presence of long-timescale correlated noise.Of the strong PC sub-population, KOI-2719.02 is the only target to experience this spot modulation inflation effect; its strong PC status is not invalidated however, as even with a sub-unity GP l c from the σ d = 2 and σ p = 5 solution, its B remains significant in magnitude.
Our future work will adopt more aggressive data preconditioning techniques (e.g., application of lowfrequency bandpass filters) in an effort to mitigate B inflation by focusing the GP on transit-like timescales.All things considered, both KOI-5704.01and KOI-7621.01certainly warrant further investigation in future studies.

Comparisons to Multiple Event Statistic and
Signal-to-Noise Ratio For our final result, we empirically compare the Bayesian evidence approach against Q1 -16 (or DR25 where available) MES and our derived S/N according to Equation 5of Rowe et al. (2015); the latter two being standard metrics for candidate discrimination in TCE/KOI searches and catalogs.Figure 6 shows that our novel methodology presents the ability to clearly distinguish between strong PCs (red) and periodic transit events possessing inconclusive/weak evidence-based model preference with respect to the PC and FA hypotheses (blue) in regimes contaminated by high-levels of white and correlated noise, as demonstrated by the ample separation of these two sub-populations.This is in contrast to both the MES and S/N, which completely mix populations, save for Kepler-62f and Kepler-442b as the only KOIs that we analyzed with MES > 10.Although we observe no strong PCs with derived S/N < 10, the appearance of inconclusive/weak PCs or FAs beyond this threshold illustrates a potential deficiency of S/N when used as a PC-FA discrimination metric in comparison to the B. In the context of choosing MES or S/N cutoffs for searches, this means that strong PCs are likely to be lost and/or inconclusive/weak PCs and/or FAs included.Should this clear separation between populations holds across larger KOI samples, the B could substantially reduce this blind spot.

CONCLUSIONS
Our analysis of targets via the simultaneous modeling of transits alongside a combined white and correlated noise GP yields fundamental transit parameters (e.g., scaled planetary radius, R p /R ⋆ ) and Bayesian evidencedriven PC-FA model comparison in the most robust approach to date.It is then important to note that there is a discrepancy between our results and those of DR25, as illustrated by the case of Kepler-452b.While DR25 yields an estimated reliability value of ∼ 40%  (see Figure 11 of Thompson et al. 2018a), we recover a strongly favored PC status of B = 24.9+1.1 −1.2 .Note that the Bayesian evidence does not directly translate to a reliability percentage, so here we simply compared the interpretations of their results; however, we aim to create a mapping between the two in future works.To do so, we next plan to conduct extensive injection-recovery testing, which will also facilitate an understanding of the B floor from strong FAs.
Having performed Bayesian model comparison between PC and FA hypotheses on the periodic transit-like photometric events of each KOI listed in Table 2, we report strong PC dispositions of Kepler-62f, Kepler-442b, and Kepler-452b-agreeing with preexisting studiesplus the two new additions of KOI-2719.02and KOI-6971.01,as well as two moderately strong PCs, KOI-5704.01and KOI-7621.01.
Preliminary testing indicates a demand for the choice of free model parameters, σ d and σ p , to be shared across any given sample population of KOIs in order to promote statistically sound comparisons between targets.Furthermore, smaller phased photometric data windows (lower σ d ) and consistent quantile-based prior widths likely mitigate potential biases.
The recovered posteriors of fitted/derived parameters were used to obtain a statistical description of the S/N with uncertainties on a per-target basis, rather than its point-estimate counterpart commonly reported in previous studies.The TGP approach also yields similar to significantly improved values of S/N with respect to those reported by Lissauer et al. ( 2023) (e.g., see KOI-2719.02and KOI-7621.01 in Table 2).
That being said, both MES and S/N exhibited vulnerability to candidate misidentification, whereas the B was able to clearly distinguish strong PCs from inconclusive/weak PCs and/or FAs.Regardless of whether the B is adopted as a standard metric in PC-FA dispositioning, the MES and S/N should undergo additional investigation and be used thoughtfully.

NEXT STEPS
When allocated one node (32 CPU threads) on high-performance computing clusters, our current nested sampling infrastructure sees typical per-target timescales on the order of a week.As such, our future work will instead rely on the development of a Simulation-Based Inference (SBI; Cranmer et al. 2020) machine learning infrastructure; these have seen great success in recent years (see Alsing et al. 2018Alsing et al. , 2019;;Tejero-Cantero et al. 2020;Miller et al. 2020Miller et al. , 2022;;Legin et al. 2023b).The amortized nature of SBI will allow for computationally efficient deployment across parameter space in catalog-wide applications to current and future missions (Kepler,K2,TESS,PLATO,etc.).Cutting-edge supporting frameworks/methodologies (see McEwen et al. 2021;Jeffrey & Wandelt 2023;Legin et al. 2023a) will facilitate the core Bayesian evidence-based approach debuted here.

Logged Bayes's Factor, B:
The logged Bayes's factor representing the difference in logged Bayesian evidences, Z D , between any two models (e.g., TGP and GP) applied to the same data set, D (see Equation 3, Equation 4and Equation 5).
21. Maximum a Posteriori, MAP: The most probable set of modelled parameters as given by Bayes's theorem (see Equation 1).
22. Mean Stellar Density, ρ ⋆ : The mean stellar density in units of g cm −3 (fitted transit model parameter).

Multiple Event Statistic, MES:
A measure describing the combined significance of all observed transits in the detrended and whitened light curve with the assumption of a linear ephemeris, T 0 (Jenkins 2002;Thompson et al. 2018a).
24. Orbital Period, P : The KOI's time series orbital period in units of days (fitted transit model parameter).32.Scaled Semi-Major Axis, a/R ⋆ : Unitless ratio of the companion exoplanet's semi-major axis scaled with respect to the host's stellar radius.
33. Signal-to-Noise, S/N: The quantification of a desired signal's quality with respect to the level of unwanted noise contamination (see Equation 5of Rowe et al. 2015).
34. Stellar Radius, R ⋆ : The radius of the host star, given in this study by Berger et al. (2020).
35.Threshold Crossing Event, TCE: A periodic signal identified by the Transiting Planet Search (TPS; Jenkins et al. 2010b;Twicken et al. 2016;Jenkins 2020) module of the Kepler Science Operations Center (SOC) Science Processing Pipeline (Jenkins et al. 2010a).

Transit plus Gaussian process Model, TGP:
The PC hypothesis model (see Table 1).
37. Transit Duration, t T : The total time taken for the (exoplanet) companion to occult its host (star) from ingress to egress (i.e., beginning to end; see Equation 7).
38. White Noise Amplitude Scale, σ w : The unitless scaling factor to DR25 reported photometric errors (fitted noise model parameter).

APPENDIX B STRONG AND WEAK CASES SUPPLEMENTARY FIGURES
Appendix B contains Figure 7 -Figure 9.

Figure 1 :
Figure 1: Our sample population of KOIs (big colored circles with black outlines) and the remaining KOI background population (small black dots), distributed according to planetary radius, R p , orbital period, P (left), and insolation flux, S 0 (right).Note that our sample uses the newly fitted/derived results of this work whereas background KOIs draw from the preceding MCMC solutions of Lissauer et al. (2023).Our KOIs are colored by their logged Bayes's factor, B-as recovered by the modelling of each individual KOI under PC (transit plus Gaussian process, TGP) and FA (Gaussian process only, GP) hypotheses-such that greater positive values indicate strong planet-candidacy and vice-versa for FAs, while those near-zero can be interpreted as possessing inconclusive/weak significance either way (see § 2.1).The B values corresponding to our solutions are also used to outline associated Lissauer et al. (2023) results (small black dots with colored outlines) in order to facilitate a visual comparison of the physical parameters recovered for each KOI analyzed by both studies.Green lines outline the range within which a KOI may be deemed sufficiently "Earth-Sun-like"; these are defined according to nominal Earth values for R p , and either P or S 0 as x ∈ 1 ± √ 2 − 1 x ⊕ .Note that all KOIs were also uniformly filtered by R ⋆ with respect to solar values according to these same bounds.We are complete in both boxes drawn by these lines.For ease of reference, Earth (⊕) and Venus (♀) are also plotted.

Figure 2 :
Figure 2: Kepler 's photometric PDC data (black) for all observed transit events of Kepler-62f with σ d = 8 and σ p = 5, overlaid by MAP (red) TGP (PC hypothesis; top) and GP (FA hypothesis; bottom) model solutions alongside corresponding O-C (data-MAP) residuals.GP-corrected phase-folded results are shown in the rightmost column and are accompanied by O-C residuals.O-C residual histograms are overlaid by Gaussian distributions with zero mean and TGP or GP MAP-scaled median photometric error standard deviations (dashed red) in order to help identify signs of overfitting (i.e., non-Gaussian O-C residuals).

Figure
Figure3: A single-parameter mock example of the process used to obtain quantile-based prior widths for standardization tests conducted in Figure5.In the first nested sampling run, a wide net is cast to ensure the parameter's empirical posterior distribution (black) is captured in its entirety (top).In subsequent runs, quantiles (solid red) of this complete posterior are used to define subsequent prior widths (middle), with which associated fractions of the complete posterior are recovered (bottom).While demonstrated with uniform priors, this is not a constraining factor; we apply this methodology to both uniform and non-uniform priors.In this example, skewed, non-Gaussian posterior behavior causes standard deviations (dashed blue) from the complete posterior's median (dashed-dotted green) to be inefficient in capturing parameter space information compared to quantiles.Since recovered posteriors are not guaranteed to be Gaussian in nature, we ensure consistent information gain by focusing only on relevant regions of parameter space via quantiles.The middle panel also visualizes how the normalized amplitude of a uniform prior is dictated by its width; this is a known origin of bias in Equation2.

Figure 4 :
Figure 4: The logged Bayes' factor, B, of each KOI listed in Table2as they vary with the phased photometric data window multiplier, σ d .Red and blue subpopulations correspond to strong PCs and periodic transit events possessing inconclusive/weak evidence-based model preference with respect to the PC and FA hypotheses, respectively; solid lines indicate previously validated planets.

Figure 5 :
Figure 5: Same as Figure 4, but with respect to σ p , which sets quantile-based prior widths.

Figure 6 :
Figure 6: DR25 MES (left) and our S/N (middle) metrics compared against our logged Bayes's factor, B, and their histogram (right) for the KOI sample given in Table2.Red and blue sub-populations correspond to strong PCs and periodic transit events possessing inconclusive/weak evidence-based model preference with respect to the PC and/or FA hypotheses, respectively; solid lines indicate previously validated planets.Note the ample separation between strong PCs and inconclusive/weak PCs (or FAs) revealed by the B, which MES and S/N are otherwise blind to.

25.
Photometric Zero-Point, F 0 : The relative (unitless) photometric zero-point offset (fitted transit model parameter).26.Planet-Candidate, PC: A KOI which has passed FA vetting procedures but has yet to undergo/pass FP vetting and/or be confirmed by alternative observation techniques.27.Planetary Radius, R p : The radius of the companion exoplanet.28.Posterior Distribution, P D (θ) : The updated probability of modelled parameters, θ, given new data, D, and informed by combining the likelihood, L D (θ), priors, π (θ), and Bayesian evidence, Z D (see Equation 1).29. Prior Distribution, π (θ) : The initial probability or belief about given model parameters, θ, before any new data, D, is taken into account (see Equation 1).30.Prior Width Free Parameter, σ p : The factor used to define fitted parameter prior widths for UltraNest in terms of MCMC-recovered standard deviations with respect to their maximumlikelihood-estimator values as given by Lissauer et al. (2023) before quantile-defined widths are obtained using Equation 10 as discussed in § 4.2.31.Scaled Planetary Radius, R p /R ⋆ : The unitless ratio of companion planetary and host stellar radii (fitted transit model parameter).

Table 2 :
Key Parameters Summary Berger et al. (2020)ry for our KOI sample.From left to right are KOI numbers (KOI-701.04,KOI-4742.01,andKOI-7016.01correspondtorespectively;none of the other KOIs are validated Kepler planets), orbital period, P , transit duration, t T , central transit duration, t T,c , insolation flux, S 0 , planetary and stellar radii, R p and R ⋆ , stellar effective temperature, T eff , Multiple Event Statistic, MES, signal-tonoise, S/N, and logged Bayes's factor, B. Of these, the P and B are fitted (inputs), t T , t T,c , S 0 , R p , and S/N are derived (outputs), R ⋆ and T eff are given byBerger et al. (2020), R ′ p and S/N ′ are given byLissauer et al. (

Table 2 .
Red and blue sub-populations correspond to strong PCs and periodic transit events possessing inconclusive/weak evidence-based model preference with respect to the PC and/or FA hypotheses, respectively; solid lines indicate previously validated planets.Note the ample separation between strong PCs and inconclusive/weak PCs (or FAs) revealed by the B, which MES and S/N are otherwise blind to.