Nauyaca: a New Tool to Determine Planetary Masses and Orbital Elements through Transit Timing Analysis

Eliab F. Canul; Héctor Velázquez; Yilen Gómez Maqueo Chew

doi:10.3847/1538-3881/ac2744

1. Introduction

Transit timing variations (TTVs; Agol et al. 2005; Holman & Murray 2005) are to date the most successful means of measuring precise masses of Earth-sized transiting planets harbored in multiplanet systems that could not be identified by other means, for instance, radial velocities (Steffen 2016; Mills & Mazeh 2017). TTVs contain valuable information that allows us to determine orbital properties that are useful to predict transit ephemerides, orbital stability, and dynamical evolution (for a review of TTVs, see Agol & Fabrycky 2018 and references therein).

Deriving planet parameters (masses and orbital elements) from observed transit times is known as the TTVs inversion problem. Since the first planetary system was characterized by TTVs (Kepler-9; Holman et al. 2010), many authors have used TTVs to determine planetary masses for systems where all known planets transit (e.g., Masuda 2014; Agol et al. 2021; Saad-Olivera et al. 2020) and also to characterize nontransiting planets (e.g., Nesvorný et al. 2012, 2013; Becker et al. 2015; Masuda 2017; Saad-Olivera et al. 2017; Carpintero & Melita 2018).

Two main approaches have been developed to invert TTVs, based on analytical and numerical approximations, respectively. Analytical approaches have a low computational cost but are limited to specific scenarios such as planets near mean motion resonances, low-eccentricity orbits, or specific two-planet systems (Nesvorný & Morbidelli 2008; Nesvorný 2009; Nesvorný & Beaugé 2010; Lithwick et al. 2012; Agol & Deck 2016; Linial et al. 2018). Numerical models including N-body integrations seem to be an unavoidable route to study more diverse scenarios than those considered by the analytical techniques, and they also complement and double-check results from these analytical methods.

The inversion problem requires methods to fully explore the parameter space of the planets independent of the type of models mentioned above. Many works have used combinations of techniques and models to deal with the TTVs inversion problem, for example: minimization routines and analytical models (Nesvorný & Beaugé 2010); genetic algorithm and numerical N-body models (Carpintero & Melita 2018); genetic and optimization algorithms plus N-body simulations (Borsato et al. 2014); a coupled simulated annealing algorithm plus N-body (Meschiari & Laughlin 2010); Markov chain Monte Carlo (MCMC) methods and analytic models (Tuchow et al. 2019); MCMC and N-body (Becker et al. 2015; Jontof-Hutter et al. 2016; Agol et al. 2021); minimization plus MCMC using N-body simulations (Masuda 2014); a multimodal nested sampling algorithm combined with either N-body (e.g., Nesvorný et al. 2013; Saad-Olivera et al. 2017, 2020) or with both N-body and analytic models (e.g., Masuda 2017; Yoffe et al. 2021); a combination of MCMC plus analytical and numerical models (Lithwick et al. 2012; Hadden & Lithwick 2016, 2017); and MCMC and minimization plus analytical tools (Gajdoš & Parimucha 2019). Despite the numerous works that employ computational tools, there is a scarcity of available tools in a ready-to-use package that allows one to deal with the TTVs inversion problem in an intuitive, easy, and confident way.

In this work, we introduce Nauyaca,¹ an easy-to-use Python tool dedicated to the TTVs inversion problem using the N-body approach. Although computationally expensive compared to analytical approximations, the numerical approach can address more general situations: an arbitrary number of planets, prograde or retrograde orbits, planets out of resonance, and eccentric or non-coplanar orbits. The only required data are the transit ephemeris of each planet and the stellar mass and radius. Additionally, prior knowledge about any planetary parameter can be supplied in order to better constrain the parameter space.

This paper is structured as follows. In Section 2, we describe the main features of the tool and the functionality of its modules. Section 3 describes the creation of a mock catalog of synthetic planetary systems and midtransit times to test the tool. In Section 4 we discuss the choice of the parameter space and of the parameters for the minimization and MCMC algorithms. In Section 5, we apply Nauyaca to the whole simulated catalog and discuss the consistency between the input planetary parameters and those found by the tool. Section 6 briefly highlights caveats of our tool, and finally in Section 7 we summarize our findings and make suggestions about the procedure and parameters to deal with real data.

2. Methods

We implemented a Python package, named Nauyaca, focused on the determination of planet masses and orbital elements through midtransit times fitting for planets around single parent stars. Our tool is equipped with minimization routines and an MCMC method exclusively adapted to fit transit times series from an N-body approach. Nauyaca manages the exploration of the parameter space with the main goal of finding solutions of planetary parameters that produce midtransit times consistent with observations for each planet. The tool can work in a parallelized scheme, so multicore machines are preferred for the best performance.

We incorporated TTVFast² (Deck et al. 2014), an optimized fast code (5–20 times faster than standard methods), to make transit timing models. In short, TTVFast receives a set of initial conditions for the planets (masses and orbits) and the stellar mass, and performs an N-body simulation. At the same time, an incorporated Keplerian interpolator calculates midtransit times by interpolating the orbits when a planet is detected crossing the star. Even though N-body simulations can be time consuming and computationally expensive, we decided not to implement analytical or semi-analytical model approximations because many of these models are valid only for planets with low eccentricities, planets near first-order orbital resonances, or a fixed number of planets. We opted for a general purpose approach that could work with more diverse planetary configurations and, thus, we decided to use TTVFast.

We define Θ_j ≡ {mass, P, ecc, inc, ω, M, Ω}_j as a set of planet parameters for the jth planet, where the parameters are, respectively: the mass, period, eccentricity, inclination, argument of periastron, mean anomaly, and longitude of ascending node. The orbital elements in Θ_j correspond to the instantaneous elements at a specific reference time t₀, which is specified by the user or automatically selected by Nauyaca. Orbital elements are defined in a fixed astrocentric coordinate system with the X–Y plane spanning the plane of the sky and the observer located orthogonally at +Z.

Given a constant stellar mass M_* and radius R_*, the algorithms incorporated in Nauyaca propose an initial condition (Θ) for all of the planets with which to run TTVFast, which results in a model of transit times per planet. Then, to perform the TTVs fitting, we assume that transit errors are independent and follow a normal distribution, and thus we set the log-likelihood function in the form

$\begin{eqnarray}&&\mathrm{log}{{ \mathcal L }}_{j}({{\rm{\Theta }}}_{j}| {T}_{j})=-\displaystyle \frac{1}{2}{\chi }_{j}^{2}-\ \sum _{i}^{{N}_{\mathrm{tran}}}\ \displaystyle \frac{1}{2}\ \mathrm{log}\left[2\pi {\sigma }_{j}^{2}(i)\right]\end{eqnarray} \tag{ 1 }$

where

$\begin{eqnarray}&&{\chi }_{j}^{2}=\sum _{i}^{{N}_{\mathrm{tran}}}\ {\left[\displaystyle \frac{{t}_{j}(i)\ -\ {t}_{j}^{\mathrm{sim}}(i)}{{\sigma }_{j}(i)}\right]}^{2},\end{eqnarray} \tag{ 2 }$

with j denoting numbered planets and i denoting their respective transit epochs over the total number of transits N_tran. Here, t corresponds to the observed transit time for a specified epoch, ${t}^{{\rm{sim}}}$ is the simulated transit time calculated by the model, and σ corresponds to the uncertainty in the central time. We take errors σ_j(i) as the mean of the upper and lower errors of the ith transit time. The total log-likelihood is calculated by adding the individual log-likelihoods $\mathrm{log}{{ \mathcal L }}_{j}$ of the planets.
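As an illustration of Equations (1) and (2), the following minimal Python sketch evaluates the χ² and the Gaussian log-likelihood for one planet and sums over planets. The function names are hypothetical and are not part of the Nauyaca interface; the simulated times would come from the N-body model, which is not reproduced here.

```python
import math


def symmetric_error(lower, upper):
    """Mean of the lower and upper timing errors, as adopted in the text."""
    return 0.5 * (lower + upper)


def chi_square(observed, simulated, sigma):
    """Equation (2): sum of squared, error-normalized timing residuals."""
    return sum(((t - t_sim) / s) ** 2
               for t, t_sim, s in zip(observed, simulated, sigma))


def log_likelihood(observed, simulated, sigma):
    """Equation (1): Gaussian log-likelihood for a single planet."""
    norm = sum(0.5 * math.log(2.0 * math.pi * s ** 2) for s in sigma)
    return -0.5 * chi_square(observed, simulated, sigma) - norm


def total_log_likelihood(planets):
    """Add the per-planet log-likelihoods.

    `planets` is a list of (observed, simulated, sigma) tuples, one per planet.
    """
    return sum(log_likelihood(obs, sim, sig) for obs, sim, sig in planets)
```

Because the normalization term depends only on the uncertainties, maximizing the likelihood for fixed σ is equivalent to minimizing χ².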

The fitting procedure is performed over the available transit times, but it is possible to include planets in the system without transit times that interact with the transiting planets. The explored space also includes those of the nontransiting (or undetected) planets, and thus we can get information about their planetary parameters. However, considering planets without transit data is left for a future work.

2.1. Modules

Nauyaca incorporates several modules with techniques adapted specifically for the TTVs inversion problem. We describe the functionalities of these modules below.

2.1.1. Setup

Here, we describe the preparation of the data before running the simulations. In the context of object-oriented programming, we define a Planetary System object by specifying the stellar mass and radius and the harbored planets. Transit time fitting operates on Planetary System objects. Then, we define the planets by establishing the information about their transit ephemeris and the allowed parameter space. The transit ephemeris per planet should include the epoch number, time of transit, and lower and upper timing uncertainties. In the simulations, the transit epochs (integer transit numbers) of all of the planets are counted starting from 0 after t₀. Thus, the user must ensure that the epoch numbers are properly labeled and referenced to the same reference time t₀.

The time span of the simulations is automatically chosen to encompass the full time of observations in the ephemeris data. If no time step is defined for the simulations, it is automatically set to 30 steps per orbit of the internal planet (∼3.33% of the internal planet period). With this time step, Deck et al. (2014) demonstrated that transit times can be determined with an accuracy within 10 s for a planet with a period of ∼22.3 days. A time step <5% of the internal planet period is recommended to reach that accuracy.
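The default time step rule can be written in a couple of lines. This is only a sketch of the arithmetic described above: the function name is hypothetical, and we assume that the internal planet is the one with the shortest period.

```python
def default_time_step(periods_days, steps_per_orbit=30):
    """Time step (days) giving `steps_per_orbit` steps per innermost orbit.

    Assumes the innermost planet is the one with the shortest period.
    """
    return min(periods_days) / steps_per_orbit
```

For an innermost period of 22.3 days this gives a step of roughly 0.74 days, that is, 1/30 of the period, which is below the 5% threshold quoted above.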

Regarding the parameter space, the tool requires specific boundaries for each parameter in order to perform an effective sampling, avoiding nonphysical regions or redundant information. Each planet in the Planetary System must define its own parameter space. If no constraint is given for some or all of the parameters in Θ_j, a set of default boundaries is established. By default, masses are restricted to the range from 1 lunar mass to 80 Jupiter masses. However, when information about the stellar mass is supplied, the upper planetary mass limit is recalculated to be at most 1% of the stellar mass. This is done to keep valid the N-body Hamiltonian internally solved by TTVFast. In Figure 1, we show the currently measured planet masses M_p or ${M}_{p}\sin i$ from the Exoplanet Archive,³ together with the planet mass limit corresponding to 1% of the stellar mass. We find that ∼95% of the currently measured planet masses have mass ratios with the parent star lower than 1%, and thus the selected cutoff of 1% is statistically valid for most of the currently known planets. The default period space is limited to between 0.1 and 1000 days. The eccentricity is limited to between 10⁻⁶ and 0.9, where the lower limit differs from exactly 0 to avoid undefined orbital angles. Inclination is defined between 0° and 180°. For the argument of periastron, ω, and the mean anomaly, M, which are periodic, fixing boundaries in the range 0°–360° could lead to improper sampling when the solution is near these borders. This problem is solved internally by parameterizing the angles by means of ω + M and ω − M. The boundaries in the parameterized space encompass 720°. At the end of the sampling process, the results in the parameterized space are mapped back to between 0° and 360° for the individual ω and M. This is an internal process, so the user only defines these angles between 0° and 360°. The ascending node also must be defined between 0° and 360°. These default ranges can be modified by the user to better constrain any parameter or to keep it fixed. Parameters assumed to be constant are not part of the sampling process.
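The angle parameterization described above can be illustrated with a short sketch. The exact internal implementation in Nauyaca may differ; the function names below are hypothetical, and we assume that the mapping back to individual angles is done by halving the sum and difference and wrapping the result into [0°, 360°).

```python
def to_sum_diff(omega_deg, mean_anom_deg):
    """Map (omega, M) to the sampled pair (omega + M, omega - M)."""
    return omega_deg + mean_anom_deg, omega_deg - mean_anom_deg


def from_sum_diff(angle_sum_deg, angle_diff_deg):
    """Recover (omega, M), each wrapped into [0, 360) degrees."""
    omega = (0.5 * (angle_sum_deg + angle_diff_deg)) % 360.0
    mean_anom = (0.5 * (angle_sum_deg - angle_diff_deg)) % 360.0
    return omega, mean_anom
```

With ω and M each in 0°–360°, the sum spans 0°–720° and the difference spans −360° to 360°, so both sampled quantities cover 720°. A solution such as ω = 350°, M = 20° then lies in the interior of the sampled box rather than at its edge.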

**Figure 1.** Currently measured planet masses (or minimum mass) and those of the parent stars. Solid line depicts the planet-to-star mass ratio equal to 1%. Planets at left of the solid line have mass ratios less than 1% and correspond to ∼95% of the total planet sample currently available in the Exoplanet Archive. Dashed and dotted lines correspond to the mass limits to be considered stars and planets, 0.08 M_⊙ and 80 M_jup, respectively.

After setting the parameter space, Nauyaca normalizes the boundaries of all of the planets to dimensionless boundaries between 0 and 1. The whole sampling process (both for optimization and the MCMC) is performed with the normalized boundaries and is returned to physical values at the end of the runs. This normalization is done to remove the differences in orders of magnitude between parameters, which enhances the sampling performance.
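A minimal sketch of this normalization, assuming a simple linear rescaling between the lower and upper boundary of each parameter (the function names are hypothetical):

```python
def normalize(value, low, high):
    """Map a physical value in [low, high] to the unit interval [0, 1]."""
    return (value - low) / (high - low)


def denormalize(unit_value, low, high):
    """Map a value in [0, 1] back to physical units."""
    return low + unit_value * (high - low)


def normalize_vector(values, bounds):
    """Normalize every parameter with its own (low, high) boundary pair."""
    return [normalize(v, lo, hi) for v, (lo, hi) in zip(values, bounds)]
```

After rescaling, a period of hundreds of days and an eccentricity of order 0.01 both become numbers between 0 and 1, so a single step size is meaningful in every dimension.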

2.1.2. Optimization Module

This module combines minimization algorithms ordered sequentially to reach solutions Θ_j that best explain the transit times. These results can be used as initial guesses for the MCMC. We tested many algorithms and their calling order and found that the sequence Differential Evolution (DE; Storn & Price 1997), Powell (PW; Powell 1964), and Nelder−Mead (NM; Gao & Han 2012) progressively minimizes the total χ² (Equation (2)). This sequence is not arbitrary but reflects the nature of exploration of each algorithm. First, DE is capable of exploring a large parameter space without requiring an initial guess. The algorithm is fully stochastic and no information about the smoothness of the space is required. The only mandatory information is the parameter space to explore, which has been set in the Setup module. A drawback of this algorithm is its slow convergence rate, and thus the outputs of this method could correspond to unconverged solutions. Even so, these solutions are better than any random proposal in the parameter space. Therefore, they are used as an initial guess to run PW, which is suitable for searching for a minimum, assuming that the space around the starting point is continuous although complex. Finally, NM takes the solution previously found by PW and performs a downhill simplex method assuming that locally the parameter space is smooth and unimodal.

We show in Figure 2 an example of the progressive χ² minimization for the transit times fitting of a two-planet system with 190 transit times in total. For this example we performed 320 realizations. Background lines connect solutions of the same run, where a gradual decrease of the χ² throughout the sequence is observed. The box plots show the intervals of the χ² achieved with the different algorithms, where typically the χ² is reduced by around 4–5 orders of magnitude between the first and last methods.

Performing multiple realizations of this process enhances the chance of finding a global minimum but also provides us with a global view of the most probable regions in the parameter space (this will be addressed in detail in Section 4.3). It is a fast way of finding a starting point region with valuable information that can help to delimit the searching radius that the MCMC routine will explore.
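The DE, PW, NM sequence described in this section can be sketched with the optimizers available in SciPy. This is not the Nauyaca implementation: the objective below is a toy multimodal function standing in for the total χ² (a real run would call the N-body model), the settings such as `maxiter` are arbitrary, and passing `bounds` to the Powell and Nelder-Mead methods assumes a reasonably recent SciPy release.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize


def toy_chi2(x):
    """Toy stand-in for the total chi^2 over a normalized parameter space."""
    x = np.asarray(x, dtype=float)
    bowl = np.sum((x - 0.3) ** 2)
    ripples = 0.1 * np.sum(1.0 - np.cos(8.0 * np.pi * (x - 0.3)))
    return float(bowl + ripples)


def optimize_sequence(objective, n_dim, seed=0):
    """Run DE, then Powell, then Nelder-Mead, each seeded by the previous one.

    Returns the chi^2 after each stage and the final parameter vector.
    """
    bounds = [(0.0, 1.0)] * n_dim
    # Stage 1: global, stochastic exploration; no initial guess is needed.
    de = differential_evolution(objective, bounds, maxiter=50,
                                seed=seed, polish=False)
    # Stage 2: Powell refinement starting from the DE solution.
    pw = minimize(objective, de.x, method="Powell", bounds=bounds)
    # Stage 3: downhill simplex starting from the Powell solution.
    nm = minimize(objective, pw.x, method="Nelder-Mead", bounds=bounds)
    history = [objective(de.x), objective(pw.x), objective(nm.x)]
    return history, nm.x


def run_realizations(objective, n_dim, n_runs):
    """Independent realizations, each with its own random seed."""
    return [optimize_sequence(objective, n_dim, seed=s) for s in range(n_runs)]
```

Each realization is independent, so the calls in `run_realizations` could be distributed over several cores.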

2.1.3. MCMC Module

We adapted an MCMC method to explore the parameter space. We chose the Parallel-Tempering sampler, ptemcee (Swendsen & Wang 1986; Earl & Deem 2005; Foreman-Mackey et al. 2013; Vousden et al. 2015), which is a well-suited sampler for exploring a multimodal and a highly dimensional parameter space, such as in the case of planetary systems, whose parameter space increases as ∼7 times the number of planets (one for mass and six orbital elements). The main idea behind the technique is using armies of walkers belonging to a "temperature ladder" that explore the parameter space in different details. The posterior distribution π is modified according to the temperature T, given by ${\pi }_{T}\propto { \mathcal L }{({\rm{\Theta }})}^{1/T}\,p({\rm{\Theta }})$, with ${ \mathcal L }$ and p the likelihood and the prior distributions, respectively. Walkers belonging to hotter temperatures sample more efficiently from the prior and those in colder temperatures better sample regions with high probability. Walkers in different temperatures have a probability of swapping positions in the parameter space that depends on their current positions and temperatures, such that those in colder temperatures can also explore from the prior and vice versa. This technique decreases the chance that walkers get stuck in local solutions and explores more efficiently the whole parameter space where other standard samplers could fail (see Vousden et al. 2015 for more details about this technique).
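The two ingredients of parallel tempering mentioned above, the tempered posterior and the swap between temperatures, can be sketched as follows. This is a generic illustration and not the ptemcee code; the function names are hypothetical, and the swap rule is the standard Metropolis acceptance obtained from the ratio of the joint tempered posteriors before and after exchanging two positions.

```python
import math
import random


def tempered_log_posterior(log_like, log_prior, temperature):
    """log of pi_T, up to a constant: log L / T + log p."""
    return log_like / temperature + log_prior


def swap_log_prob(log_like_i, log_like_j, temp_i, temp_j):
    """log acceptance probability for swapping the positions of chains i and j.

    The priors cancel in the ratio, so only the likelihoods and the inverse
    temperatures enter.
    """
    beta_i, beta_j = 1.0 / temp_i, 1.0 / temp_j
    return min(0.0, (beta_i - beta_j) * (log_like_j - log_like_i))


def accept_swap(log_like_i, log_like_j, temp_i, temp_j, rng=random):
    """Draw a uniform number and decide whether the swap is accepted."""
    log_u = math.log(1.0 - rng.random())  # 1 - u lies in (0, 1], so log is safe
    return log_u < swap_log_prob(log_like_i, log_like_j, temp_i, temp_j)
```

When the hotter chain sits at a higher likelihood than the colder one, the swap has unit acceptance probability (and is accepted except in the measure-zero case where the uniform draw is exactly zero), which is how good solutions found during the broad exploration are handed down to the cold chains.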

3. Mock Catalog

We created a catalog of synthetic planetary systems, defining stellar masses and radii, and planetary masses and orbital elements. We calculated transit times for these planets and applied Nauyaca to this synthetic catalog in order to test the efficiency in recovering the planetary parameters (catalog entries) that gave rise to the synthetic transit times per planet. Throughout the text, we will refer to the parameter values reported in our mock catalog as trues.

3.1. Catalog Creation

In order to set up planetary systems as realistically as possible, we selected masses and radii for the stellar hosts and masses, radii, and orbital periods for planets from the available data of confirmed planetary systems from the Exoplanet Archive.⁴ This was done to create synthetic planetary systems with stellar and planetary parameters based on observations. The catalog includes different planetary multiplicities (number of planets in the system), ranging from two up to five planets. Below we describe the procedure followed to create the catalog.

First, we selected systems with planets discovered by the transit method. Second, we selected systems with reported stellar masses and radii. For those with unavailable data but with reported effective temperature, surface gravity, and metallicity, we derived the stellar mass or radius from the work of Torres et al. (2010). Third, we selected those systems for which all of their planets have reported masses, radii, and periods (systems whose planets have these parameters unreported were discarded). This procedure reduces significantly the number of planets, dropping to 2.5% (105 planets) of the original observed planets. In order to increase the number of synthetic planetary systems we applied an over-sampling technique (SMOTE; Chawla et al. 2011), which makes new samples by interpolation of the k-nearest neighbors. Thus, we tripled the number of planetary systems and their parameters, namely, stellar mass and radius, and planetary masses, radii, and periods. The remaining five orbital elements to complete a unique planetary configuration were made from random uniform distributions in the intervals: ecc [0.05, 0.2], inc [89.5°, 90.5°], ω [0°, 360°], M [0°, 360°], and Ω [88°, 92°]. We restricted the intervals of the inclination inc and ascending node Ω to get near coplanar and prograde orbits. These restrictions in the construction of the catalog are independent of the tool test. In practice, we can expand the boundaries of any parameter to be explored, as long as they have a physical meaning (see Section 4.1).

Once the complete set of planetary parameters was established, an N-body integration was performed over 10⁶ orbits of the internal planet using REBOUND (Rein & Liu 2012; Rein & Tamayo 2015). We used the chaos indicator MEGNO (Cincotta et al. 2003; Maffione et al. 2011; Rein & Tamayo 2015) to test the dynamical stability of the proposed planetary system. A MEGNO value of ∼2 indicates a quasi-periodic motion (regular stable orbits). If the set of parameters of the proposed planetary system turned out to be stable (without a planetary ejection or with 1.7 < MEGNO < 2.3), then we appended the final state of the N-body simulation as an entry in the catalog. It should be pointed out that the orbital elements of these entries correspond to the osculating elements at the end of the stable N-body runs and they are not the same as the random values taken from the intervals of parameters indicated above. Finally, we used these entries as initial conditions to run TTVs simulations encompassing 130 transits of the internal planet using TTVFast. We selected that number to mimic the number of transits for planets with periods <10 days, during ∼3 yr of observation. We used a fixed number of transits rather than a fixed time span since it is more relevant for the inversion problem (Nesvorný & Morbidelli 2008). As a result, we obtained 130 transit times for each of the internal planets, and somewhat fewer for the remaining planets according to their periods and orbital configurations.

We assigned synthetic errors to these transit times according to the analytical approximation of the timing precision (Agol & Fabrycky 2018),

$\begin{eqnarray}&&{\sigma }_{t}=\displaystyle \frac{1}{\sqrt{2}}\ {\tau }^{1/2}{\dot{N}}^{-1/2}{\delta }^{-1}\end{eqnarray} \tag{ 3 }$

where τ is the approximate duration of the transit ingress $\tau \approx 2.2\mathrm{minutes}\left({R}_{p}/{R}_{\oplus }\right){\left({M}_{* }/{M}_{\odot }\right)}^{-1/3}{\left(P/10\,{day}\right)}^{1/3}$ , which is a function of planetary radius R_p, orbital period P, and stellar mass M_*, $\delta ={({R}_{p}/{R}_{* })}^{2}$ is the depth of the transit, and $\dot{N}$ is related to the Poisson noise due to the count rate of the star. We assumed $\dot{N}$ to be constant and equal to 1 × 10⁷ e⁻ minutes⁻¹ to mimic the value in the Kepler CCDs for a star of magnitude ∼12 in the Kepler band (Gilliland et al. 2011). Uncertainties from Equation (3) take planetary parameters derived from the Kepler photometry, and, therefore, our study is centered on the characterization of Kepler-like transits. Finally, we added white noise to the transit time models assuming a normal distribution with the mean equal to the true transit times and with standard deviation equal to the typical uncertainties given by Equation (3).
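Equation (3) and the noise model can be illustrated with the following sketch. The function names are hypothetical, the solar radius in Earth radii is an approximate constant that we introduce for the unit conversion, and we assume that the resulting uncertainty (in minutes) is converted to days before being added to transit times expressed in days.

```python
import math
import random

R_SUN_IN_R_EARTH = 109.2  # approximate solar radius in Earth radii (assumed)
MINUTES_PER_DAY = 1440.0


def ingress_duration_min(r_planet_earth, m_star_sun, period_days):
    """Approximate ingress duration tau in minutes."""
    return (2.2 * r_planet_earth * m_star_sun ** (-1.0 / 3.0)
            * (period_days / 10.0) ** (1.0 / 3.0))


def timing_uncertainty_min(r_planet_earth, r_star_sun, m_star_sun,
                           period_days, count_rate=1.0e7):
    """Equation (3): timing precision in minutes; count_rate in e-/minute."""
    tau = ingress_duration_min(r_planet_earth, m_star_sun, period_days)
    depth = (r_planet_earth / (r_star_sun * R_SUN_IN_R_EARTH)) ** 2
    return math.sqrt(tau) / (math.sqrt(2.0) * math.sqrt(count_rate) * depth)


def add_white_noise(times_days, sigma_min, seed=None):
    """Perturb true transit times (days) with Gaussian noise of width sigma."""
    rng = random.Random(seed)
    sigma_days = sigma_min / MINUTES_PER_DAY
    return [rng.gauss(t, sigma_days) for t in times_days]
```

Under these assumptions an Earth-sized planet with a 10 day period around a Sun-like star has a timing uncertainty of a few minutes, and the uncertainty decreases quickly with planetary radius because the transit depth grows as the radius squared.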

The full mock catalog is composed of twelve two-planet systems, eight three-planet systems, eight four-planet systems, and four five-planet systems, giving a total of 32 systems harboring 100 planets in total. We remark that all of the planets in the catalog transit, and thus our current study does not include the characterization of nontransiting planets. Tables in the Appendix include the parameters of the simulated mock catalog. Figure 3 shows the distributions of the parameters in the catalog according to the planetary multiplicity.

**Figure 3.** Histograms of the stellar and planetary parameters that compose the full mock catalog. Different colored lines represent the properties grouped by planet multiplicity for systems with two (solid blue), three (dashed orange), four (dotted cyan), and five (dashed–dotted red) planets. Remaining orbital elements (not shown here) are taken from uniform distributions. The Appendix contains the data tables of these histograms.

The mock catalog of planetary parameters produces a variety of TTVs with different properties, namely, amplitudes, periodicity, and time span. TTVs signals in our catalog have amplitudes that range from almost zero minutes to 180 minutes, with an overall mean of ∼18 minutes. Mean TTVs amplitudes grouped by multiplicity reach 21, 21, 10, and 23 minutes for two-, three-, four-, and five-planet systems, respectively. This translates to a wide range of signal-to-noise ratios (S/N), from ∼1 up to ∼400 with a mean of ∼23 (defining S/N as the ratio of the TTVs amplitude to the timing uncertainty; Nesvorný 2019). There is also a variety of observation time spans, which range from ∼230 to ∼5800 days, with a mean of 1200 days.

4. Settings

In this section we outline the choice of the parameters to run the tool, including the restrictions of the parameter space, the choice of fine-tuning parameters for the MCMC, and the usage of the optimizer results to determine a reasonable starting point for the sampling process.

4.1. Planetary Boundaries

As described in Section 2.1.1, Nauyaca requires specific boundaries for each planetary parameter in Θ. These boundaries delimit the parameter space that will be explored. Here, we discuss how to delimit the parameter space (including the establishment of constant parameters) before carrying out the recovery test over the mock catalog.

From fitting real transit observations it is possible to determine the planetary radius ratio, the orbital period from consecutive transits, and the impact parameter. Regarding the last two, we find from current data of observed exoplanets (from the Exoplanet Archive) that orbital periods are determined, on average, with uncertainties of ∼10⁻⁴ days, while orbital inclinations, which follow from the impact parameter, are on average determined at a level better than 1° (also according to the Exoplanet Archive). Considering nearly coplanar orbits, the lines of nodes of the planets are nearly aligned and mutual inclinations are close to 0°, which can help to reduce the dimensionality of the problem. Observed transiting exoplanets have been shown to have small mutual inclinations, typically below 3° (Fang & Margot 2012; Fabrycky et al. 2014), which have been demonstrated to have a negligible effect on TTVs (Nesvorný 2009; Nesvorný & Vokrouhlický 2014; Hadden & Lithwick 2016). Furthermore, because the orbital inclination is a well-constrained observational parameter (with a mean error of 0.78° and a dispersion of 2.2°, according to data from the Exoplanet Archive), it can be kept as a fixed parameter. Additionally, in test runs we let the inclination vary between ∼80° and 100° (a reasonable range of inclinations that allows transits) and confirmed that this angle has limited effects on the results for the transit time models. Nearly aligned ascending nodes represent prograde orbits and anti-aligned ones represent retrograde orbits. Thus, in order to model nearly coplanar orbits we kept the orbital inclinations fixed to their true values, inc_j,True. We also kept fixed the ascending node of the internal planets, Ω_1,True (≈90°), which by construction of the coordinate system can be fixed.

For our recovery test, we delimited the planetary parameter space to these ranges, bracketing the lower and upper limits: mass [0.0123, 10 mass_j,True] M_⊕, P [P_True − δ t, P_True + δ t] days, ecc [10⁻⁵, 0.3], inc (fixed) [inc_True] deg, ω [0, 360] deg, M [0, 360] deg, and Ω [70, 110] deg. Here, the lower and upper boundaries for masses correspond to approximately a Moon mass and 10 times the true planetary mass, respectively. The boundaries in period P are around the true period of the planet with a width δ t corresponding to a typical observed period error given the orbital period itself. We estimated δ t by doing a linear regression between observed period errors as a function of the orbital period (data from the Exoplanet Archive), such that δ t = mP + b [days], where we determined m = 2.11 × 10⁻⁵ and b = 4.4019 × 10⁻⁵. The lower limit in eccentricity (ecc) was set to be small but different from 0 in order to avoid an undefined argument of periastron, while the upper limit was allowed to have values up to 0.3. This chosen range of eccentricities is compatible with 80% of the currently observed planet eccentricities. Argument of periastron (ω) and mean anomaly (M) take the usual definition between 0° and 360°. The boundaries of the ascending nodes (Ω) for all of the planets, except the internals, were restricted to a search radius around ≈ Ω_1,True ± 20 deg, and thus we consider only prograde orbits for simplicity.
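As a worked example of the period and mass boundaries used in the recovery test, the sketch below evaluates δ t = mP + b with the fitted coefficients quoted above. The function names are hypothetical; only the period and mass ranges are shown.

```python
SLOPE_M = 2.11e-5          # fitted slope of the period-error relation
INTERCEPT_B = 4.4019e-5    # fitted intercept, in days
MOON_MASS_EARTH = 0.0123   # lower mass boundary, in Earth masses


def period_half_width(period_days):
    """Typical period error delta_t = m * P + b, in days."""
    return SLOPE_M * period_days + INTERCEPT_B


def period_bounds(period_true_days):
    """Period search range [P_True - delta_t, P_True + delta_t]."""
    dt = period_half_width(period_true_days)
    return period_true_days - dt, period_true_days + dt


def mass_bounds(mass_true_earth):
    """Mass search range from about a Moon mass to 10 times the true mass."""
    return MOON_MASS_EARTH, 10.0 * mass_true_earth
```

For a 10 day period the half-width is about 2.6 × 10⁻⁴ days (roughly 22 s), so the period is tightly constrained compared with the mass, which is allowed to vary over orders of magnitude.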

4.2. Parameters for the MCMC

We inspected how the results depend on the parameters that govern the MCMC. The chosen parallel-tempering MCMC method (Vousden et al. 2015) is fine-tuned by two main parameters, namely, the number of temperatures (N_temps) and the maximum temperature ( ${T}_{\max }$ ) to build the temperature ladder. The sampling performance also depends on the number of walkers per temperature (N_w) and finally on the number of iterations (N_iter) per chain. We carried out many tests with combinations of these parameters and found that N_temps ≲ 15 with N_w ≲ 150 in a run over N_iter ∼ 5 × 10⁵ are enough in most cases to recover the true parameters Θ_j,True from the mock catalog within 1σ. We also note that ${T}_{\max }\sim {10}^{2}\mbox{--}{10}^{3}$ is a good choice for the maximum temperature. ${T}_{\max }=\infty$ (as suggested in Vousden et al. 2015) is not adequate for our purposes since walkers belonging to this temperature would propose steps outside our predefined boundaries. We confirmed this behavior on several occasions during our tests. Other parameters to control the dynamics of the temperature adjustment are internally defined within Nauyaca following the suggestions by Vousden et al. (2015).⁵

For the recovery test presented in this work, we imposed uniform log-priors for simplicity with the functional form

$\begin{eqnarray}\mathrm{log}({{ \mathcal P }}_{k})=\left\{\begin{array}{ll}0 & {b}_{\mathrm{low}}\leqslant k\leqslant {b}_{\mathrm{upp}}\\ -\infty & \mathrm{otherwise}\ \end{array}\right.,\end{eqnarray} \tag{ 4 }$

where b_low and b_upp correspond to the lower and upper boundaries for the kth planetary parameter. Table 1 shows a list of the model parameters for each planet as well as the adopted range of uniform priors. The choice of these validity ranges has been described previously in Section 4.1. Of these parameters, inc was set as a constant to the true inclination, inc_True, taken from the mock catalog. For each system, Ω takes uniform priors except for the innermost planet (Planet 1 in the catalog entries), for which it is set to the true value Ω_1,True. Thereby, the log-posterior probability (up to a constant) is given by the sum of the log-likelihood (Equation (1)) and the log-prior (Equation (4)) functions. In practice, any other prior functions can be supplied to Nauyaca to calculate the posterior probability.
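A minimal sketch of the uniform log-prior of Equation (4) and of the resulting log-posterior is given below. The names are hypothetical, and the log-likelihood is passed in as a callable so that the example stays independent of the N-body model.

```python
import math


def log_uniform_prior(params, bounds):
    """Equation (4): 0 inside all boundaries, -infinity otherwise."""
    for value, (low, high) in zip(params, bounds):
        if not (low <= value <= high):
            return -math.inf
    return 0.0


def log_posterior(params, bounds, log_likelihood):
    """Sum of log-prior and log-likelihood, up to an additive constant."""
    log_prior = log_uniform_prior(params, bounds)
    if log_prior == -math.inf:
        # Skip the expensive model evaluation for out-of-bounds proposals.
        return -math.inf
    return log_prior + log_likelihood(params)
```

Checking the prior first avoids running an N-body integration for proposals that fall outside the allowed parameter space.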

Table 1. Model Parameters and the Selected Priors for the Transit Timing Fit

Parameter	Prior	Units
mass	${ \mathcal U }$ (0.0123, 10 × mass_True )	M_⊕
P	${ \mathcal U }$ (P_True − δ t, P_True + δ t )	days
ecc	${ \mathcal U }$ (1e-06, 0.3)
inc	inc_True	deg
ω	${ \mathcal U }$ (0, 360)	deg
M	${ \mathcal U }$ (0, 360)	deg
Ω	${ \mathcal U }$ (70, 110) or Ω_1,True	deg

Note. ${ \mathcal U }$ [b_low, b_upp] denotes the uniform ranges between lower and upper boundaries. Single values are the fixed parameters in the simulations. Parameters with label True take the data values from the mock catalog (see the Appendix). See text for details.


4.3. MCMC Initialization from Optimizer Results

Solving the inversion problem of a number of interacting planets, N_pla, implies the exploration of a parameter space of dimensions ∼7N_pla. The nature of the problem is computationally demanding since an N-body integration should be done at each iteration per walker. The wall-clock time also increases with the observational time span. Thereby, choosing a strategic initial guess for walkers adapted for the parallel-tempering MCMC is crucial to minimize these side effects.

Starting at random points from the prior function would be a reasonable choice assuming that we have prior knowledge about the parameters. In general, this is not the case for planets characterized for the first time. This motivated us to use optimizer results to make an educated initial guess about the planetary parameters. Optimizers offer both a low computing time in comparison with a full MCMC run (between 1% and 3% of the total time) and the ability to find many modes in the parameter space, since the realizations are independent of each other. Although these solutions could be just rough approximations of more detailed solutions, they could help to identify high probability regions suitable for initialization. Initializing walkers near a sensible optimum is better than using any random point in the parameter space (Hogg & Foreman-Mackey 2018). Moreover, initialization using multistart local optimization results is found to enhance the exploration quality and to be more suitable for multichain methods (e.g., Hug et al. 2013; Ballnus et al. 2017).

We present three strategies implemented in Nauyaca to initialize walkers using the information from the optimizers, namely, Gaussian, picked, and ladder. In order to set initial values from these strategies, we first made a filter over the total number of optimizer solutions (N_opt) by sorting them according to their χ². Then, we took a fraction (f_best; defined between 0 and 1) of that ordered list that includes the uppermost solution. That subset of solutions Θ_opt was then used to initialize walkers.
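The filtering step described above can be sketched in a few lines. This is a minimal illustration and not the Nauyaca implementation: the function name `select_best`, the list-based data layout, and the rounding rule used to turn f_best into a number of solutions are all assumptions.

```python
def select_best(solutions, chi2, f_best):
    """Return the fraction f_best of the solutions with the lowest chi2, best first.

    solutions : list of parameter vectors, one per optimizer run (hypothetical layout)
    chi2      : list with the chi-square of each solution
    f_best    : fraction in (0, 1] of solutions to keep
    """
    if not 0.0 < f_best <= 1.0:
        raise ValueError("f_best must be in (0, 1]")
    if len(solutions) != len(chi2):
        raise ValueError("solutions and chi2 must have the same length")
    # Sort indices by increasing chi2 so that the best solution comes first.
    order = sorted(range(len(chi2)), key=lambda i: chi2[i])
    # Assumed rounding rule: nearest integer, keeping at least one solution.
    n_keep = max(1, round(f_best * len(order)))
    return [solutions[i] for i in order[:n_keep]]
```

With this rounding rule, N_opt = 320 and f_best = 0.065 keep 21 solutions, which agrees with the size of Θ_opt quoted for two-planet systems in Section 5.1.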

In Figure 4, we show examples of initialization using these strategies for many values of f_best. For the current example, we focus on the mass space of two planets identified with ID = pl2_id52 in Table 4. We performed 320 realizations of the optimizers to draw an initial walker population for a ladder of N_temps = 10 temperatures, considering different values of the parameter f_best. The individual solutions from Θ_opt are shown with squares in the first column, and the cyan star shows the position of the true mass. Colored dots are the initial walkers belonging to different temperatures. As f_best is reduced, solutions with high χ² are discarded. We detail these initialization strategies adapted to the parallel-tempering MCMC:

1. Gaussian. Walkers are drawn from a Gaussian centered at the mean of Θ_opt with a 1σ value corresponding to the data dispersion, for each dimension. This is the simplest initialization method and possibly the most frequently used. The main difference is that here the mean and standard deviation are based on the independent random realizations and not on previous knowledge of the planetary parameters (for example, masses from radial velocity measurements). Thus, if the optimizers find a well-restricted solution for any dimension, the MCMC will be able to find the global high probability region faster than with a uniform random initialization. From Figure 4, it is seen that using this strategy while reducing f_best could help to identify the zone in the parameter space where the global minimum could exist. With this strategy most of the local modes are covered, but no special attention is given to the individual modes or to the possible correlations between parameters.
2. Picked. Solutions from Θ_opt are randomly picked and the initial walkers are drawn from the vicinity of those solutions. If the optimizers find many modes, this strategy ensures that walkers will be drawn from around all of the modes, which could correspond to local minima or the global minimum. From Figure 4, it is seen that this strategy confines the initial population of walkers as f_best is reduced while keeping the dependency between parameters. Here, walkers are equally distributed around any mode independent of their temperature, and therefore all of the modes are initially sampled with the same frequency. This could be suitable for problems where there is apparently no unique region where the optimizer results agglomerate.
3. Ladder. Solutions from Θ_opt are divided into an integer number of chunks equal to the number of temperatures. Walkers belonging to temperature 1 (the main temperature; blue dots) are drawn from the first chunk, which includes the uppermost solution (i.e., the one with the lowest χ²). Walkers belonging to the second temperature are drawn from the first and second chunks. The same rule is followed for the rest of the temperatures until, finally, walkers for the hottest temperature (yellow dots) are drawn from around all of the solutions in Θ_opt. From Figure 4 it is seen that, using this strategy, the modes with the lowest χ² are highlighted as f_best is reduced. Unlike the picked strategy, ladder assigns the outstanding modes to colder walkers while hotter walkers sample the more dispersed ones. This allows the exploration of other modes while avoiding getting stuck in those local minima.
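The three strategies can be illustrated with the following sketch, which is not the Nauyaca code. It assumes that Θ_opt is a list of parameter vectors already sorted by increasing χ², that "the vicinity" of a solution means Gaussian noise with user-supplied per-dimension widths (`scales`, a hypothetical parameter), that the Gaussian strategy uses the population standard deviation, and that chunk boundaries are set by integer division. The physical boundaries of the parameters are not enforced here.

```python
import random
import statistics


def jitter(point, scales, rng):
    """Perturb a solution with Gaussian noise of the given per-dimension width."""
    return [x + rng.gauss(0.0, s) for x, s in zip(point, scales)]


def init_gaussian(theta_opt, n_temps, n_walkers, rng):
    """Gaussian: draw from a normal fitted to theta_opt, dimension by dimension."""
    columns = list(zip(*theta_opt))
    mean = [statistics.fmean(c) for c in columns]
    std = [statistics.pstdev(c) for c in columns]
    return [[[rng.gauss(m, s) for m, s in zip(mean, std)]
             for _ in range(n_walkers)] for _ in range(n_temps)]


def init_picked(theta_opt, n_temps, n_walkers, rng, scales):
    """Picked: every temperature draws around randomly chosen solutions."""
    return [[jitter(rng.choice(theta_opt), scales, rng)
             for _ in range(n_walkers)] for _ in range(n_temps)]


def init_ladder(theta_opt, n_temps, n_walkers, rng, scales):
    """Ladder: temperature t (0 = coldest) draws from the first t + 1 chunks."""
    n = len(theta_opt)
    walkers = []
    for t in range(n_temps):
        # Index closing the cumulative chunk; the coldest keeps at least one solution.
        upper = max(1, n * (t + 1) // n_temps)
        pool = theta_opt[:upper]
        walkers.append([jitter(rng.choice(pool), scales, rng)
                        for _ in range(n_walkers)])
    return walkers
```

Each function returns a nested list indexed as [temperature][walker][dimension]. Only the ladder strategy makes the initial population depend on the temperature, which is the property that distinguishes it from the picked strategy in the description above.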

The choice of the best strategy and parameter f_best can vary according to the problem. Ideally, a visual inspection of the optimizer solutions (as in Figure 4) could help to identify modes in the parameter space and thus to decide how to initialize walkers. In practice, a statistical indicator (e.g., standard deviation) or a clustering method could be helpful to choose the initialization: Gaussian for highly dispersed data, picked for highly multimodal parameter spaces, and ladder for a multimodal space with a main mode (as in the example of Figure 4).

Note however that the usage of the optimization module and the proposed initialization strategies is not a mandatory step in Nauyaca prior to the implementation of the MCMC. Any initial walker population can be provided by the user as long as these proposals are inside the physical boundaries described in Section 2.1.1. Nonetheless, we empirically find that following this heuristic procedure notably enhances the MCMC performance at a low computational cost.

5. Results and Discussion

We applied Nauyaca and the same fitting procedure to the mock catalog with the aim of inverting the process, going from the synthetic transit times to the planetary parameters and then comparing with the original values in the mock catalog, which we will refer to as the true values. For each run, we kept fixed the inclinations and the ascending node of the first planet, as described in Section 4.1. All of the solutions are determined at the synthetic reference epoch t₀ = 0 days, to match with the reference time of the catalog construction.

The procedure consists of the following steps: (1) Providing the true stellar mass and radius and transit ephemeris per planet (midtransit times), and initializing the parameter space; (2) running the optimizers and choosing the best ∼5%–10% of the solutions to initialize walkers using the ladder strategy; and (3) taking the data from step 2 as an initial walkers population for running the MCMC over a fixed number of steps and using a Gelman−Rubin statistic (<1.01; Gelman & Rubin 1992) and a Geweke test (Z-score < 1; Geweke 1992) to assess the convergence and stationarity of the chains.
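The two convergence diagnostics of step 3 can be sketched as follows for a single parameter. This is an illustration under stated assumptions and not the code used in this work: the Gelman−Rubin statistic follows the usual between-chain and within-chain variance formula, whereas the Geweke score below uses plain sample variances (which assumes nearly uncorrelated samples, e.g., after thinning) in place of the spectral density estimates of the original test. The 10% and 50% segment fractions are conventional defaults, not values taken from the text.

```python
import math
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor for m chains of n samples each."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: mean of the within-chain variances; B/n: variance of the chain means.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between_over_n = statistics.variance(chain_means)
    var_hat = (n - 1) / n * within + between_over_n
    return math.sqrt(var_hat / within)


def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke Z-score comparing the start and the end of a chain."""
    a = chain[: int(first * len(chain))]
    b = chain[int((1.0 - last) * len(chain)):]
    # Naive standard error: assumes the samples are approximately independent.
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2)
```

A run would pass the thresholds quoted above if `gelman_rubin(chains) < 1.01` and `abs(geweke_z(chain)) < 1` for every parameter.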

We performed the same procedure with the parameters summarized in Table 2 according to the number of planets in the system (N_pla). Since the dimensionality of the parameter space scales with the number of planets, we increased the number of optimizer realizations (N_opt) and MCMC steps, accordingly. Along the MCMC runs we did a thinning by saving the current state of the chains at a predefined number of steps (shown in Table 2), which also allowed us to diminish the memory requirements. At the end of the runs we measured the mean autocorrelation time of the averaged chains and we determined typical values between 30 and 90, with a mean of 70 steps. With these values we determined the effective sample size, getting typical values between 2400 and 6800 with a mean of 3800 independent samples. For a pair of systems with 3 (ID = pl3_id4) and 4 (ID = pl4_id3) planets, we repeated the MCMC process with the same parameters in Table 2 but changing the initialization strategy from ladder to Gaussian. By doing this, we initialized the MCMC with less informative points and we found consistent results within 1σ of the initial results.

Table 2. Used Parameters for the Recovery Test, According to Planet Multiplicity

N_pla	N_opt	N_Temps	f_best (%)	Walkers	Steps × 10⁵	Thinning
2	320	10	6.5	80	2.5	100
3	416	10	6.0	80	3.5	100
4	512	10	5.0	100	4.5	200
5	640	12	7.5	120	6.5	200

Note. Walkers refer to number of walkers drawn per temperature.


Using 16 cores per job (i.e., per planetary system), Nauyaca was able to fit two-planet systems in ∼15 hr, on average. The time increased with increasing complexity of the planetary system, reaching up to ∼5.6 days for five-planet systems. Most of the time is spent on running the MCMC, since in comparison the optimizers are quite fast, running N_opt solutions (specified in Table 2) between 10 minutes and 2.5 hr depending on number of optimizer runs. Note however that the wall-clock time depends on many factors, for example the number of planets, the time span of observations, and the number of transits to fit, which in our case exceeded ∼150 transits for two-planet systems (see the number of transits per planet in the Appendix). Thus, lower computational requirements would be enough for simpler systems than those considered in this work. All of the simulations were performed with the supercomputer Miztli at the Universidad Nacional Autónoma de México.

5.1. Optimizer Results

We ran the optimizers to explore the parameter space with the aim of minimizing the differences between the observed and modeled transit times (Equation (2)). Since these runs are independent of each other, increasing the number of realizations enhances the chance of finding more minima that could correspond to local minima or the global minimum. We took a fraction (namely f_best) of the whole set of solutions with the best χ² to build a subset of solutions Θ_opt, which were used to initialize walkers.

Setting an optimum composition of Θ_opt is an interplay among N_opt, f_best, and the number of temperatures. Thus, there is not a unique way of defining these parameters. However, we noticed that in most cases f_best < 10% is an appropriate fraction using N_opt ∼ 300–500. We used the f_best values shown in Table 2, according to planet multiplicity. That selection translates to 21, 25, 26, and 48 solutions that comprise Θ_opt for systems with two, three, four, and five planets, respectively. Taking higher values of f_best means taking more solutions from optimizers with increasing χ². Usually solutions with high χ² are located far from the meaningful modes, resembling random proposals (as can be noticed from Figure 4). Thus, selecting almost all of the optimizer solutions (f_best ∼ 1) could reduce the contribution of the optimizers to locating optimal initial regions to draw walkers.

In Figure 5, we show the results of the optimizers by measuring the differences between the solutions found and the true values. The solutions considered in this figure are part of the subset denoted previously as Θ_opt. Therefore, most of the solutions from the optimizers with higher χ² values were discarded. The color map indicates the percentage of solutions falling inside each bin. The darkest bins near to zero depict planets for which ≳50% of the solutions better approximate the true solutions. Subpanels with almost uniform yellow/white colors are those parameters less constrained by the optimizers.

For the case of the planet masses, 90% of the solutions in Θ_opt are typically determined within a factor ∼2–6 with respect to the true masses. Even though the dispersion tends to increase with increasing mass and number of planets, the solutions approach the correct mass with a relatively low dispersion. In the case of the periods, the solutions tend to span over all of the allowed space. We found this behavior is due to the narrowness between our selected lower and upper boundaries, which scales according to the period (see Section 4.1). We noticed that when the boundaries are enlarged (for example, with a width of ±0.01 days) the solutions agglomerate around the true periods. Eccentricities are in general well restricted only for two-planet systems, where the dispersion reaches up to ∼0.06 around the true values. For higher multiplicities the solutions scatter out, with a typical dispersion of ∼0.08. We found that optimizers tend to overestimate eccentricities as the number of planets increases. For the argument of periastron (ω) and mean anomaly (M), a similar behavior is found. They are better restricted only for two-planet systems, since for higher multiplicities the dispersion increases, tending to be evenly distributed over the parameter space. Ascending nodes are the least restricted angles given our established boundaries for this parameter. Note however that for the recovery test, we just considered limited prograde solutions (as described in Section 4.1).

Although the initial search region explored by the MCMC would be narrowed down as a result of the optimization process, the chains were initialized in these high density regions and explored all of the allowed parameter space since the adopted uniform priors were not modified.

5.2. MCMC Results

In the work carried out by Nesvorný & Beaugé (2010) the authors test their tool TTVIM (which combines a minimization algorithm with an analytic approximation for inverting TTVs) using synthetic transit observations with the aim of recovering planet masses and orbital elements of a nontransiting planet. Nesvorný & Beaugé (2010) used upper limits in relative and absolute errors in planet parameters to decide whether the solutions were correctly recovered or not. Unlike Nesvorný & Beaugé (2010), we considered as recovered those solutions from the posterior distributions consistent with the true values within the 68% (1σ) and 95% (2σ) credible intervals of the total posteriors, assuming that they follow a normal distribution. These solutions are marked in Figure 6 as circles and squares. The comparison is made for planet parameters: mass, period (P), eccentricity (ecc), periastron longitude (ϖ; defined as the sum of ascending node and argument of periastron, ϖ = Ω + ω), and the true longitude (ℓ; defined as ℓ = ν + ϖ, where ν is the true anomaly, which is obtained from a Fourier expansion with the terms of the mean anomaly and eccentricity). Some of our posteriors exhibit more than one peak, manifesting the nature of the selected sampler. Median values and errors are usually centered on the prominent peaks so the data in Figure 6 are representative of our results. In practice, a dynamical stability test could help to distinguish the physically possible solutions. This study is underway, but it is out of the scope of the present work.
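The derived angles compared in the recovery test can be computed as in the sketch below. The definitions ϖ = Ω + ω and ℓ = ν + ϖ are those given above. For the true anomaly, the sketch uses the standard equation-of-the-center series truncated at third order in eccentricity; the truncation order is an assumption, since the text does not state how many terms were kept, and the function names are hypothetical.

```python
import math


def true_anomaly_series(mean_anomaly_deg, ecc):
    """True anomaly (deg) from the equation of the center, up to order ecc**3."""
    m = math.radians(mean_anomaly_deg)
    nu = (m
          + (2.0 * ecc - ecc ** 3 / 4.0) * math.sin(m)
          + 1.25 * ecc ** 2 * math.sin(2.0 * m)
          + (13.0 / 12.0) * ecc ** 3 * math.sin(3.0 * m))
    return math.degrees(nu)


def periastron_longitude(ascending_node_deg, arg_periastron_deg):
    """Periastron longitude (deg), wrapped to [0, 360)."""
    return (ascending_node_deg + arg_periastron_deg) % 360.0


def true_longitude(ascending_node_deg, arg_periastron_deg, mean_anomaly_deg, ecc):
    """True longitude (deg): true anomaly plus periastron longitude."""
    nu = true_anomaly_series(mean_anomaly_deg, ecc)
    return (nu + ascending_node_deg + arg_periastron_deg) % 360.0
```

The series is accurate for the low eccentricities that dominate the catalog; for larger eccentricities, solving Kepler's equation numerically would be the safer choice.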

From Figure 6, we take the discrete values of the differences between recovered and true data (excluding the unrecovered data marked as gray crosses), assuming that discrete medians are representative of the MCMC results. This was done for each planet multiplicity and planet parameter. From the resulting distributions, we measured the standard deviation that represents, on average, the typical precision achieved in the recovery test for different parameters and planet multiplicities. For masses, we found the minimum dispersion of 0.7 M_⊕ for the two-planet systems, and a maximum of 14 M_⊕ for five-planet systems. For P, a minimum dispersion of 9 s for two-planet systems, and a maximum of 110 s for five-planet systems. For ecc, a minimum dispersion of 0.007 for two-planet systems, and a maximum of 0.03 for three-planet systems. For ϖ, a minimum of 35° for five-planet systems, and a maximum of 50° for three-planet systems. For ℓ, a minimum dispersion of 3° for two-planet systems, and a maximum of 7.5° for four-planet systems. Intermediate dispersions were found for multiplicities not mentioned in these ranges.

We also calculated the Kolmogorov–Smirnov (KS) statistic over the whole distribution of true values and the posterior medians for all of the parameters for planets in the full catalog. We found, for masses, a KS statistic = 0.07 (p-value = 0.96); for P, KS statistic = 0.01 (p-value = 1.0); for ecc, KS statistic = 0.25 (p-value = 3×10⁻³); for ϖ, KS statistic = 0.17 (p-value = 0.11); and for ℓ, KS statistic = 0.03 (p-value = 0.99). Thus, we cannot reject the hypothesis that the recovered masses, periods (P), and true longitudes (ℓ) are drawn from the same distributions as the input values.
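For reference, the two-sample KS statistic is the largest vertical distance between the empirical cumulative distributions of the two samples, here the true values and the posterior medians. The sketch below computes only the statistic with the standard library; it is an illustration and not necessarily the routine used in this work. In practice, `scipy.stats.ks_2samp` returns both the statistic and a p-value.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    distance = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past x, so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        distance = max(distance, abs(i / len(a) - j / len(b)))
    return distance
```

A statistic near zero, as found for masses, periods, and true longitudes, means that the two cumulative distributions are almost indistinguishable.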

Figure 7 shows the percentage of planets consistent with the true parameters within 1σ and 2σ, when grouping planets with the same multiplicity. In Table 3 we summarize these results also considering the global percentages for the full catalog (independent of multiplicity). For masses, periods, and true longitudes, the global recovery percentages are consistent with the expected statistical values, i.e., around ∼68% of the planet parameters are recovered within 1σ and ∼95% within 2σ. These parameters also exhibit consistency with the input catalog according to the KS test. However, we note that ecc and ϖ have similar recovery percentages but are far below these statistically expected values. For these two parameters we found, respectively, 52% and 55% within 1σ and 80% and 76% within 2σ (see Table 3).

Table 3. Percentage of Planets Consistent With the True Parameters Within 68% (1σ) and 95% (2σ) Credible Interval

Planet multiplicity	Mass		Period		Eccentricity		Periastron longitude (ϖ)		True longitude (ℓ)
	1σ	2σ	1σ	2σ	1σ	2σ	1σ	2σ	1σ	2σ
	(%)	(%)	(%)	(%)	(%)	(%)	(%)	(%)	(%)	(%)
2 (24 planets)	66	87	58	95	66	100	54	87	66	91
3 (24 planets)	58	91	62	80	54	91	45	66	70	95
4 (32 planets)	75	96	78	100	47	78	62	78	75	96
5 (20 planets)	85	95	65	85	40	45	55	70	65	75

Full catalog (100 planets)	71	93	67	91	52	80	55	76	70	91

Note. Percentages are given for specific planet multiplicities and for the full catalog.
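The recovery percentages can be computed along the lines of the following sketch. It is an illustration with assumptions: the credible intervals are taken as equal-tailed percentile intervals of the posterior samples (the text instead assumes normally distributed posteriors), the function names are hypothetical, and angular parameters such as ϖ and ℓ would additionally need their differences wrapped, which is not handled here.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile of pre-sorted values, with q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def is_recovered(samples, true_value, level=0.68):
    """True if true_value lies inside the equal-tailed credible interval."""
    s = sorted(samples)
    tail = (1.0 - level) / 2.0
    return percentile(s, tail) <= true_value <= percentile(s, 1.0 - tail)


def recovery_rate(posteriors, true_values, level=0.68):
    """Percentage of planets whose true value falls inside the interval."""
    hits = sum(is_recovered(p, t, level) for p, t in zip(posteriors, true_values))
    return 100.0 * hits / len(true_values)
```

Calling `recovery_rate` with `level=0.68` and `level=0.95` on one parameter at a time gives numbers comparable to one pair of columns in Table 3.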


We investigate the possible causes of both the similarity and low recovery rates by inspecting the dependence between both parameters. We found that ∼45% of planets with ecc ≲ 0.05 recover both ecc and ϖ at the same time. The remaining planets do not recover at least one of these parameters. By contrast, planets with ecc ≳ 0.05 recover both parameters ∼90% of the time, showing that the low recovery rates occur mainly for low eccentricity orbits. In our catalog, most of the planets belonging to systems with three or more planets have eccentricities below ∼0.05, as shown in Figure 3. Moreover, from Figure 7, it is seen that eccentricity is the only parameter whose recovery rate diminishes as the number of planets increases (teal lines with stars).

We find that the cause of the similarity and the global low recovery rate for ecc and ϖ is a result of the difficulty of finding a well-defined argument of periastron as the orbits tend to be more circular, which in our case occurs mainly for planets belonging to high multiplicity systems. We will show in Section 5.4 that the trend of the eccentricity shown in Figure 7 and the low recovery rate of ecc (and consequently ϖ) can be partially explained by the diminishing of the number of transits available for external planets belonging to high multiplicity systems.

5.3. TTVs

In Figure 8 we illustrate the TTVs of two systems with simulated data of different qualities and their orbits. The top figure corresponds to the system with ID = pl2_id52 in Table 4 for which we estimate an S/N of 8.4 and 5.0 for planets 1 and 2, respectively. The bottom figure corresponds to the system with ID = pl3_id1 in Table 5 for which we estimate an S/N of 1.6, 2.08, and 2.06 for planets 1, 2, and 3, respectively. Left panels show 100 orbital configurations (thin colored orbits) reconstructed from the planetary parameters taken randomly from our MCMC posteriors. For comparison, the true orbits are plotted as solid black lines. Transits are detected each time the planets cross the +Z-axis, which corresponds to the line of sight. Right panels show the 100 TTV signals (colored solid lines) using the same planet parameters of the orbits. Synthetic observations are shown by empty circles with error bars. For these two systems, the TTV models are consistent with the synthetic data, but with a better planetary determination for the top system, which has a better S/N. Hence, the quality of the transit times directly affects the determination of the orbital configurations. For example, we identify that in the case of the three-planet system, the argument of periastron shows a wide range of possible solutions with respect to the true position (dashed black lines).

We statistically measured the fitted residuals from the whole catalog by taking 100 random samples per planet from the MCMC posteriors and then calculating their corresponding TTVs. We took the differences between data and the fitted transit times, grouping these residuals according to their multiplicity. The resulting distributions are shown in Figure 9. We found a typical standard deviation for these residuals of 2.5 minutes for two-planet, 2.3 minutes for three-planet, 3 minutes for four-planet, and 5.3 minutes for five-planet systems.

**Figure 9.** Typical residuals from the TTVs fitting grouped by planet multiplicity. Individual histograms were built by summing residuals of 100 random solutions per planet from the whole catalog.

5.4. The Impact of the Data Quality and Quantity

The TTVs inversion problem challenges not only the methods or algorithms, but also the observational data requirements. In this subsection we address this issue by considering the quality (measured by the S/N) and quantity (N_tran) of the data required in order to correctly determine planetary parameters. We focus on the determination of planetary mass and eccentricity.

Previous works have proposed different answers regarding the required number of transits and data quality to invert TTVs. Saad-Olivera et al. (2019) suggested that a large number of TTVs combined with a high S/N can be enough to robustly estimate planetary parameters without the necessity of radial velocity measurements. A detailed analysis regarding the characterization of nontransiting planets was conducted by Veras et al. (2011), finding that at least 50 transits are needed to invert the problem. Nesvorný & Morbidelli (2008) developed a method based on perturbation theory to characterize two-planet systems, and found that an S/N ∼ 15–30 and about N_tran ≳20 are typically required to uniquely characterize these systems.

We carried out an analysis of the parameters involved in the fitting procedure, looking for possible correlations between the data quality and quantity, the intrinsic planetary parameters (including the number of planets, N_pla), and the results we found using Nauyaca. A first inspection suggested a trend with the number of transits and thus it is convenient to define the total signal-to-noise ratio (S/N)_T as the product of ${\rm{S}}/{\rm{N}}\times \sqrt{{N}_{\mathrm{tran}}}$ . Figure 10 shows the fractional error in the determination of the planet mass and eccentricity, and the dependency with the (S/N)_T, N_tran, and N_pla. Here, 2σ_mass and 2σ_ecc correspond to our derived uncertainties taken from the posteriors encompassing the 95% credible interval. These uncertainties are divided by the true mass and eccentricity, and thus, Figure 10 can work as a guide to put some constraints on the determination of these parameters given the quality and quantity of the data.
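The two quantities plotted in Figure 10 follow directly from the definitions above. The short sketch below only restates them as code; the function names are hypothetical.

```python
import math


def total_snr(snr, n_tran):
    """Total signal-to-noise ratio: (S/N)_T = S/N * sqrt(N_tran)."""
    return snr * math.sqrt(n_tran)


def fractional_error(two_sigma, true_value):
    """Fractional error: 95% credible-interval uncertainty over the true value."""
    return two_sigma / true_value
```

For example, a planet with S/N = 8.4 observed over a hypothetical 100 transits would have (S/N)_T = 84. Because of the square root, quadrupling the number of transits has the same effect on (S/N)_T as doubling the S/N per transit.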

**Figure 10.** Influence of the data quality and quantity over the proper determination of the planet mass (left panels) and eccentricity (right panels). Horizontal axes denote the total signal-to-noise ratio ${({\rm{S}}/{\rm{N}})}_{T}={\rm{S}}/{\rm{N}}\times \sqrt{{N}_{\mathrm{tran}}}$ and vertical axes are the fractional errors derived from our results. The 2σ corresponds to the 95% credible interval from the posteriors and Mass_True and ecc_True are the *true* values in the catalog. The recovered planets are marked with circles and the unrecovered planets are marked with crosses. In the top panels, colors correspond to the numbers of transits per planet (N_tran) binned by 20 transits. In the bottom panels, planets are colored according to the number of planets in their systems.

In general, it is seen that the fractional error (related to the adequate determination of the parameters) correlates primarily with the quality of the data and secondarily with the quantity. Moreover, the deficiency of one of these quantities is in some cases compensated for by the other one. We observed that planets with individual high (S/N)_T can constrain the planetary mass and eccentricity even with a relatively low N_tran. Conversely, a large N_tran can compensate for a low (S/N)_T. Planets with a low (S/N)_T exhibit a larger dispersion of their fractional errors for their masses and eccentricities, although those with a large N_tran usually have a better determination (i.e., a lower fractional error). Comparing the top and bottom panels, it can be discerned that the majority of the unrecovered planets are those with a number of transits ≲60–80 (white/yellow crosses), corresponding to systems with three, four, and five planets. However, planets of these multiplicities with low (S/N)_T can still find the correct solution but with a poor constraint (white/yellow circles in the upper left part of the top diagrams). Also notice from the bottom panels that the trend between the fractional errors and the total (S/N)_T seems to hold when grouped by planet multiplicities.

From these results it is seen that (S/N)_T and N_tran have a significant impact on the determination of these parameters. The low recovery rate for ecc discussed in Section 5.2 is then explained by the low (S/N)_T of our synthetic data and the low number of transits per planet (≲20–40) for systems with many planets. Low (S/N)_T can be due to noisy measurements of the transit timing or due to low amplitudes of the TTVs signals. In the first case, the limitation is in part caused by the instruments and observational strategy. In the second case, low amplitudes correspond to planets in less perturbed (almost circular) orbits. In this situation, inverting TTV signals results in less constrained orbits.

6. Considerations and Caveats

Here, we highlight many of the considerations made in this work that could be relevant for the users of Nauyaca. We also point out many of the limitations and caveats of using this tool.

1. In this work we assume the continuous monitoring of the planet transits, and thus there are no data gaps in our ephemeris. In practice this could be unrealistic, especially for ground-based observations where telescope time is limited. However, as discussed by Nesvorný & Morbidelli (2008), the number of transits, rather than their time distribution, is the determining factor in the planetary characterization.
2. We remind the reader that there is no unique way of choosing the fine-tuning parameters for running the algorithms in Nauyaca, since it will depend on the problem to be solved. Nevertheless, the parameters chosen here for synthetic systems can work as a guide for real planets.
3. Regarding the usage of Nauyaca with real systems, the user must ensure that the stellar mass and radius are constrained well enough to be treated as constants. In cases where one or both of these parameters exhibit large uncertainties, we suggest running many simulations with a grid of stellar parameters including the values with the lower and upper uncertainties.
4. We remind the reader that TTVFast, and therefore Nauyaca, is adapted to deal with planets around single parent stars, and hence planets in circumbinary systems or other configurations are not allowed.
5. We also must take into account that the fitting is performed over the midtransit times instead of the TTVs. Since TTVs are the transit time signals in an O − C diagram, the calculated data depend on the number of epochs and the method used to make the linear regression over the transit times. Fitting to TTVs instead of transit times alone can result in a period error of about 40 minutes for a planet with a period of ∼70 days, as shown by Carpintero & Melita (2018). For that reason, we performed the fitting methods over the raw midtransit times.
6. The simulations are performed over a pre-fixed time span with a pre-selected time step that by default is set to P₁/30, where P₁ is the period of the internal planet.
7. When computing the midtransit times, the planet radius is not taken into account to assess whether a transit occurs. Only the coordinates of the planet center are considered to determine if the planet transits. Thus, grazing transits (as for example in WASP-67b; Mancini et al. 2014) are currently not considered.
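The dependence of the TTVs on the linear regression mentioned in item 5 can be made concrete with a short sketch. It computes an O − C signal by an ordinary least-squares fit of a linear ephemeris to the midtransit times; this is an illustrative assumption about how such a diagram is built, not a routine from Nauyaca, and the function names are hypothetical.

```python
def linear_ephemeris(epochs, times):
    """Least-squares fit of times = t0 + period * epoch; returns (t0, period)."""
    n = len(epochs)
    e_mean = sum(epochs) / n
    t_mean = sum(times) / n
    sxx = sum((e - e_mean) ** 2 for e in epochs)
    sxy = sum((e - e_mean) * (t - t_mean) for e, t in zip(epochs, times))
    period = sxy / sxx
    return t_mean - period * e_mean, period


def observed_minus_calculated(epochs, times):
    """TTVs as residuals of the midtransit times about the fitted ephemeris."""
    t0, period = linear_ephemeris(epochs, times)
    return [t - (t0 + period * e) for e, t in zip(epochs, times)]
```

Because the fitted period and reference time change whenever epochs are added or removed, so does the O − C signal, whereas the raw midtransit times do not depend on any such fit.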

7. Summary and Conclusions

In this work we present Nauyaca,⁶ a Python package that encompasses minimization routines (Optimization module) and an MCMC method (MCMC module) exclusively adapted to find planet parameters (masses and orbital elements) that best reproduce the transit times based on numerical simulations. Even though the numerical method is more computationally expensive (compared to analytical approximations), it is more suitable to address more general situations, such as considering many planets with varied orbital configurations. Nauyaca requires transit ephemeris per planet and the stellar mass and radius. Additionally, any previous knowledge about validity ranges can be supplied in order to better constrain the parameter space.

Previous studies of transit timing analysis have used synthetic data to test new techniques or to quantify the relation between properties of TTVs and planetary parameters (e.g., Nesvorný & Morbidelli 2008; Meschiari & Laughlin 2010; Veras et al. 2011). However, in most cases these studies have been limited to the study of two-planet systems where one of the planets transits and the other acts as the perturber. Here, we analyze a large sample of synthetic transiting planets with planetary parameters based on the current planet data from the Exoplanet Archive (see Section 3.1). For these planets we calculate the midtransit times to use them as input to Nauyaca (Section 5). This allows us to characterize the performance of Nauyaca by measuring the consistency rate between the catalog entries and the parameters determined by the tool.

For all of the systems, we run optimization algorithms to test the performance for many planet multiplicities (Section 5.1). Optimizers take advantage of the low computation time in contrast to that of a full MCMC run and provide an overall outlook of the regions in the parameter space with a higher probability (the planet boundaries were defined in Section 4.1). We find that the best performance is achieved for the two-planet systems for any dimension, and for higher multiplicities the results are varied. The optimizers define high density regions for the masses of the planets within a factor ∼2–6, allowing us to initialize the chains around 20% of the real masses. Eccentricities are in general well restricted only for two-planet systems; for higher multiplicities, the optimizers are not well suited to define high density regions where the MCMC chains would be initialized. Orbital angles remain in general loosely constrained, except for two-planet systems.

We draw the initial walker population for the parallel-tempering MCMC using the best <10% of the solutions from the optimizer runs. We find a good agreement between the input parameters in the catalog and those found in the recovery test with the MCMC, at 2σ (Section 5.2). The global recovery percentages for masses, periods, and true longitudes are in concordance with the statistically expected values of ∼68% within 1σ and ∼95% within 2σ. These parameters also exhibit consistency with the input catalog according to a Kolmogorov–Smirnov test. By contrast, eccentricities and periastron longitudes have similar low recovery rates. Even more, the recovery rate for eccentricity diminishes as the number of planets increases. We find that the cause of the similarity and the global low recovery rate between ecc and ϖ is a combination of the difficulty to determine the argument of periastron as the orbits tend to be circular and the reduction of the number of transits for external planets, mainly for high multiplicity systems. In these scenarios, the S/N of the data plays a determinant role to correctly determine these parameters.

Depending on the planet multiplicity, the typical mass precision achieved in the recovery test ranges from ∼1 to 14 M_⊕. Periods achieve a precision of between 10 s and 2 minutes. Eccentricities reach a precision of between ∼0.01 and 0.03. Periastron longitudes and true longitudes have typical precisions of ∼40° and 6°, respectively.

We investigate the effect of data quality and quantity (Section 5.4) on the proper determination of the planetary mass and eccentricity. We find that, in general, quality is more important than quantity, although in many cases one parameter can compensate for the deficiency of the other one. However, we warn that the results in this part of our study should be taken as a guide, since the simulated planet sample in our catalog is limited.

Finally, we make suggestions about the fine-tuning parameters involved in the procedure. Depending on the computing facilities, the parameters in Table 2 could be a reasonable starting point for the optimization and MCMC runs. Note that in our mock catalog, the "observed" time span and the number of transits in the systems are varied. Therefore, the fine-tuning parameters can be scaled to adapt them to specific problems. We suggest performing at least between 100 and 150 optimizer runs (N_opt) times the number of planets in the system, and taking <10%–30% of the best solutions to initialize walkers. The choice of the initialization strategy presented in this work (Section 4.3) depends on the parameter space itself, which is unknown by nature. We suggest using the ladder or picked strategy if nothing is known about the parameter space (as in the majority of situations) and the Gaussian strategy if the parameter space is somehow constrained, for example when considering fixed angles, fixed periods, circular orbits, etc. Of course, the methodology suggested in this work is not mandatory when using Nauyaca, since, for example, the optimization routines and the proposed initialization strategies are optional. The user has the freedom to select the tools that are best suited to the TTVs inversion problem. However, we note an improvement in the MCMC performance and the results when following the method proposed in this work.
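The scaling suggested above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical and are not Nauyaca's API, and each optimizer solution is assumed to be a `(chi2, parameters)` pair in which a lower χ² means a better fit.

```python
def suggested_optimizer_runs(n_planets, per_planet=(100, 150)):
    """Return the (low, high) range of optimizer runs, N_opt, for a system."""
    if n_planets < 1:
        raise ValueError("n_planets must be at least 1")
    low, high = per_planet
    return low * n_planets, high * n_planets


def select_best(solutions, fraction=0.1):
    """Keep the best `fraction` of solutions, ranked by ascending chi2.

    `solutions` is assumed to be a list of (chi2, parameters) pairs.
    At least one solution is always kept.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(solutions, key=lambda solution: solution[0])
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]


# Example: a three-planet system and 20 made-up optimizer solutions.
run_range = suggested_optimizer_runs(3)
mock_solutions = [(float(i), {"label": i}) for i in range(20, 0, -1)]
best = select_best(mock_solutions, fraction=0.1)
```

The retained solutions would then seed the walkers according to whichever initialization strategy is chosen.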

In a forthcoming work, beyond the scope of this paper, we will show the application of the tool to more specific situations, such as systems with nontransiting planets, those with high mutual inclinations, and those with missed transit data. We will also show the application to real systems in order to revisit the parameters of previously characterized planets and to characterize new planets.

We provide the data used throughout this work in electronic format at 10.5281/zenodo.5218498. The first release of Nauyaca can be found at 10.5281/zenodo.5230451.

The authors gratefully acknowledge the computing time granted by DGTIC-UNAM for access to the supercomputer Miztli in the HTC group, under the project with code LANCAD-UNAM-DGTIC-361. We thank Gabriel Perren and Luis M. Pavón for useful discussions. We also thank the anonymous referees for their useful comments and suggestions made to improve the quality of the manuscript. E.F.C. also acknowledges the PhD grant awarded by the CONACYT Graduate Fellowship. This work was supported by UNAM-PAPIIT IN-107518 and BG-101321. H.V. was supported by the project UNAM-PAPIIT IN-101918. This research has made use of the NASA Exoplanet Archive, which is operated by the California Institute of Technology, under contract with the National Aeronautics and Space Administration under the Exoplanet Exploration Program.

Facilities: Supercomputer Miztli, Exoplanet Archive.

Software: Numpy (Oliphant 2015), Scipy (Virtanen et al. 2020), H5py (Collette 2013), Matplotlib (Hunter 2007), Seaborn (Waskom et al. 2017).

Appendix: Catalog Tables

Here, we collect the data tables of the mock catalog that show the planet properties of the synthetic planetary systems. The tables are grouped by planet multiplicity: Table 4 for two-planet, Table 5 for three-planet, Table 6 for four-planet, and Table 7 for five-planet systems. System IDs follow the syntax pl{N_pla}_id{ID}, where N_pla is the number of planets in the system and ID is an internal identifier number. Orbital elements correspond to the osculating elements at the simulated reference time of t₀ = 0 days. These tables can be found in electronic format at 10.5281/zenodo.5218498.
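As a small illustration of the system ID syntax, the following Python sketch splits an identifier into its two integer fields. The function name is hypothetical and is not part of Nauyaca.

```python
import re

# Pattern for the catalog syntax pl{N_pla}_id{ID}.
SYSTEM_ID_PATTERN = re.compile(r"^pl(\d+)_id(\d+)$")


def parse_system_id(system_id):
    """Return (number of planets, internal identifier) for a system ID."""
    match = SYSTEM_ID_PATTERN.match(system_id)
    if match is None:
        raise ValueError(f"not a valid system ID: {system_id!r}")
    return int(match.group(1)), int(match.group(2))


n_pla, internal_id = parse_system_id("pl2_id23")
```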

Table 4. Input Catalog of Two-planet Systems

System ID	M_*	R_*	Planet	mass	P	ecc	inc	ω	M	Ω	N_tran
	(M_⊙)	(R_⊙)		(M_⊕)	(day)		(deg)	(deg)	(deg)	(deg)
pl2_id0	0.91	0.86	1	5.721	5.1118	0.0921	91.264	154.81	147.77	89.75	130
			2	27.969	11.762	0.1019	89.759	57.27	313.49	89.94	57
pl2_id1	1.09	1.3	1	12.599	9.55216	0.12	89.816	188.41	193.31	89.72	130
			2	15.199	21.05799	0.0939	90.213	41.55	201.25	90.83	59
pl2_id7	1.09	1.61	1	0.769	15.09222	0.1012	90.244	280.58	160.98	90.04	131
			2	0.899	22.79987	0.0895	89.978	338.53	186.65	91.98	86
pl2_id9	1.28	1.98	1	37.186	22.95311	0.0747	88.69	210.99	286.99	89.03	130
			2	79.457	42.8698	0.1313	90.332	223.66	303.88	88.9	69
pl2_id23	1.0704	1.2077	1	11.592	10.68521	0.1346	89.983	150.27	358.03	90.51	130
			2	12.233	20.88751	0.1098	90.039	41.13	358.67	89.81	67
pl2_id28	0.9127	0.8627	1	5.674	5.31963	0.2359	88.165	110.47	136.63	88.59	130
			2	27.034	11.81586	0.0713	90.249	184.33	289.6	88.64	58
pl2_id35	1.0843	1.3948	1	10.257	10.78381	0.0945	90.567	67.59	349.14	90.66	130
			2	13.115	19.67467	0.21	89.874	44.17	128.06	89.76	71
pl2_id38	1.0408	1.0681	1	10.07	12.40501	0.1372	90.949	41.03	91.03	88.58	130
			2	7.749	20.61714	0.0941	89.34	36.34	185.82	91.93	78
pl2_id39	0.6513	0.6236	1	5.744	4.01342	0.1685	91.171	101.62	57.74	91.32	130
			2	18.837	8.29871	0.1206	89.692	68.22	15.26	88.9	63
pl2_id45	0.9331	0.9199	1	24.2	16.70586	0.0898	89.331	40.92	221.72	88.17	130
			2	119.359	30.74872	0.0931	90.339	108.98	130.39	90.93	70
pl2_id52	0.9133	0.8633	1	5.665	5.35468	0.1237	90.717	296.22	16.39	88.36	130
			2	26.856	11.83319	0.0727	90.052	187.26	86.62	88.95	59
pl2_id62	1.0801	1.3877	1	9.775	17.95549	0.0884	91.163	174.38	74.34	89.82	130
			2	8.062	32.45119	0.0399	89.168	124.35	28.63	89.02	72

Note. Column names are: system ID, stellar mass, stellar radius, planet number, planetary mass, period, eccentricity, inclination, argument of periastron, mean anomaly, ascending node, and number of simulated transits.
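The catalog tables share a layout in which the system ID, stellar mass, and stellar radius appear only on the first row of each system. A minimal Python sketch for reading tab-separated rows in that layout is given below. It assumes the 12-column, tab-separated arrangement shown in this appendix; the function name and dictionary keys are hypothetical, and the format of the electronic files may differ.

```python
# Keys for columns 5 to 11: mass, P, ecc, inc, omega, M, Omega.
PLANET_KEYS = ("mass", "period", "ecc", "inc", "omega", "mean_anom", "node")


def parse_catalog_rows(lines):
    """Group tab-separated catalog rows by system ID.

    Rows with an empty system ID are planets of the preceding system.
    Header and unit rows are skipped (their planet column is not an integer).
    """
    systems = {}
    current = None
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 12 or not fields[3].strip().isdigit():
            continue
        if fields[0].strip():
            current = fields[0].strip()
            systems[current] = {
                "m_star": float(fields[1]),
                "r_star": float(fields[2]),
                "planets": [],
            }
        if current is None:
            continue  # continuation row with no system defined yet
        planet = dict(zip(PLANET_KEYS, (float(x) for x in fields[4:11])))
        planet["n_tran"] = int(fields[11])
        systems[current]["planets"].append(planet)
    return systems


# The first system of Table 4, preceded by the header row.
rows = [
    "System ID\tM_*\tR_*\tPlanet\tmass\tP\tecc\tinc\tω\tM\tΩ\tN_tran",
    "pl2_id0\t0.91\t0.86\t1\t5.721\t5.1118\t0.0921\t91.264\t154.81\t147.77\t89.75\t130",
    "\t\t\t2\t27.969\t11.762\t0.1019\t89.759\t57.27\t313.49\t89.94\t57",
]
catalog = parse_catalog_rows(rows)
```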


Table 5. Input Catalog of Three-planet Systems

System ID	M_*	R_*	Planet	mass	P	ecc	inc	ω	M	Ω	N_tran
	(M_⊙)	(R_⊙)		(M_⊕)	(day)		(deg)	(deg)	(deg)	(deg)
pl3_id1	1.08	1.49	1	7.31	6.88701	0.0218	89.978	309.12	121.87	90.51	130
			2	7.049	12.81716	0.0297	89.835	10.79	150.06	91.57	70
			3	3.0	35.32724	0.0096	90.831	68.07	213.07	90.92	25
pl3_id3	0.97	1.11	1	6.992	3.50448	0.0269	89.909	267.55	132.89	89.7	130
			2	17.163	7.6433	0.0278	90.043	318.49	168.74	89.57	59
			3	16.527	14.85607	0.0084	89.971	70.05	279.07	89.76	31
pl3_id4	1.08	1.0	1	7.31	34.54958	0.0384	90.17	133.3	186.18	88.79	130
			2	4.132	66.07813	0.0077	90.58	111.04	96.02	90.46	68
			3	133.488	125.84333	0.0373	90.045	159.98	192.7	89.06	36
pl3_id7	1.04	0.94	1	2.225	45.15447	0.0155	89.55	87.22	251.23	90.94	130
			2	4.132	85.30295	0.0322	90.459	49.2	12.57	91.31	69
			3	7.628	130.21746	0.028	89.744	23.47	72.62	90.52	45
pl3_id9	0.695	0.7204	1	5.402	1.75109	0.0465	91.436	38.82	345.74	89.37	130
			2	4.694	4.59028	0.0236	88.986	161.25	348.33	90.29	49
			3	6.988	8.2469	0.0252	89.615	8.84	310.5	89.16	28
pl3_id10	0.5224	0.4423	1	0.077	10.4707	0.0406	89.809	292.62	246.14	88.3	130
			2	1.98	14.10509	0.0244	89.909	239.76	191.31	88.46	97
			3	0.67	23.57467	0.0306	90.336	82.64	149.24	88.51	58
pl3_id11	1.0658	1.4631	1	7.238	6.71494	0.0359	90.821	4.54	160.56	90.27	130
			2	6.862	12.53776	0.0074	90.446	204.83	162.09	90.43	70
			3	3.032	34.44641	0.0062	88.187	291.73	88.84	91.03	26
pl3_id16	0.7275	0.7804	1	4.713	2.82797	0.0348	90.742	65.05	64.67	90.01	130
			2	1.662	5.08958	0.0397	89.0	17.85	26.57	88.51	73
			3	4.073	7.75808	0.0164	89.538	350.87	3.59	88.92	48

Note. Column names are: system ID, stellar mass, stellar radius, planet number, planetary mass, period, eccentricity, inclination, argument of periastron, mean anomaly, ascending node, and number of simulated transits.


Table 6. Input Catalog of Four-planet Systems

System ID	M_*	R_*	Planet	mass	P	ecc	inc	ω	M	Ω	N_tran
	(M_⊙)	(R_⊙)		(M_⊕)	(day)		(deg)	(deg)	(deg)	(deg)
pl4_id0	0.69	0.7	1	11.267	0.65853	0.042	90.167	194.71	163.89	90.11	130
			2	0.289	7.81455	0.051	90.613	159.71	101.81	88.59	11
			3	8.899	14.70854	0.0327	90.045	181.81	76.05	89.8	6
			4	14.299	19.46914	0.0082	89.852	117.87	82.94	90.06	4
pl4_id2	1.28	1.52	1	10.488	3.74307	0.0287	90.513	314.46	190.51	88.26	130
			2	15.574	10.42773	0.0403	89.628	281.5	301.3	88.81	47
			3	106.155	22.34794	0.0147	90.044	310.83	231.24	88.63	22
			4	34.961	54.40659	0.0229	90.077	235.71	31.83	88.83	9
pl4_id3	1.0	1.04	1	5.301	6.16484	0.0177	90.556	47.02	108.04	90.16	130
			2	10.488	13.56747	0.0189	89.597	194.95	171.7	90.32	59
			3	8.101	23.97665	0.0165	90.391	207.63	258.73	90.89	33
			4	11.124	43.86076	0.009	89.728	73.95	273.67	90.43	18
pl4_id7	1.028	1.0878	1	5.092	5.81643	0.0334	88.339	49.89	70.58	87.76	130
			2	10.36	12.55872	0.0189	89.747	175.26	215.81	89.24	61
			3	7.6	22.10718	0.0228	90.328	67.44	114.15	90.16	34
			4	10.83	40.44915	0.0403	90.276	216.75	198.03	90.8	19
pl4_id8	1.0624	1.3935	1	6.392	6.79961	0.0573	89.9	337.25	301.98	89.45	130
			2	7.688	11.63317	0.0116	88.195	158.65	100.6	90.52	76
			3	8.049	19.20383	0.0272	91.121	26.99	166.45	91.86	46
			4	7.836	31.30115	0.0293	89.897	125.59	297.78	90.03	29
pl4_id9	1.1894	1.3388	1	10.829	11.76765	0.0486	90.31	210.63	299.0	91.07	130
			2	7.719	24.42051	0.0079	90.433	120.73	69.75	89.35	62
			3	23.683	46.86163	0.0169	89.907	300.94	231.49	89.62	32
			4	9.565	76.28288	0.0666	89.866	162.63	222.65	90.03	20
pl4_id11	1.0606	1.3832	1	6.36	6.78081	0.0352	90.006	198.33	243.94	88.32	130
			2	7.77	11.68647	0.0237	89.284	163.69	31.72	87.64	75
			3	8.05	19.35073	0.0635	89.77	249.47	31.64	89.32	46
			4	7.932	31.66494	0.0158	90.597	250.34	179.08	90.15	28
pl4_id13	1.1416	1.2819	1	4.244	4.40357	0.0435	88.887	76.63	318.04	90.25	130
			2	9.84	8.45655	0.0132	88.65	63.97	97.3	88.76	67
			3	5.565	14.52725	0.0411	90.086	172.97	94.37	90.04	39
			4	9.635	26.67856	0.0051	91.074	299.01	232.92	89.57	21

Note. Column names are: system ID, stellar mass, stellar radius, planet number, planetary mass, period, eccentricity, inclination, argument of periastron, mean anomaly, ascending node, and number of simulated transits.


Table 7. Input Catalog of Five-planet Systems

System ID	M_*	R_*	Planet	mass	P	ecc	inc	ω	M	Ω	N_tran
	(M_⊙)	(R_⊙)		(M_⊕)	(day)		(deg)	(deg)	(deg)	(deg)
pl5_id0	0.81	0.76	1	4.3	5.28442	0.0286	87.353	247.95	146.13	89.42	130
			2	3.0	7.0787	0.0118	89.263	284.41	138.07	89.43	98
			3	3.814	10.31011	0.0504	91.07	50.12	206.02	90.19	67
			4	8.899	16.1445	0.0194	91.287	113.77	39.55	89.8	42
			5	5.2	27.4549	0.0373	89.23	338.07	141.81	90.33	25
pl5_id1	0.69	0.64	1	9.535	5.71468	0.0502	90.721	290.05	321.9	89.96	130
			2	4.132	12.44489	0.0275	89.508	253.55	49.09	89.22	60
			3	13.984	18.15237	0.0293	89.873	283.18	204.01	89.39	41
			4	35.915	122.39306	0.0341	89.783	18.1	213.39	90.14	6
			5	34.961	267.23151	0.0313	89.853	155.82	85.67	90.09	3
pl5_id2	0.8053	0.7553	1	4.504	5.31476	0.0202	88.981	342.17	193.54	89.05	130
			2	3.044	7.24149	0.0787	90.105	74.54	256.88	90.4	95
			3	4.21	10.63179	0.0339	90.111	290.26	238.28	89.61	65
			4	9.951	20.27765	0.0031	89.968	175.85	193.59	89.92	34
			5	6.358	36.78033	0.0299	90.235	264.02	264.33	88.03	18
pl5_id3	0.6966	0.6466	1	9.245	5.69128	0.0371	90.184	288.33	278.12	89.56	130
			2	4.069	12.14691	0.0362	90.147	301.5	328.87	89.41	61
			3	13.421	17.72588	0.0189	90.025	355.59	331.8	89.31	42
			4	34.419	116.48534	0.0301	89.696	161.07	110.69	90.03	6
			5	33.314	253.77612	0.0382	90.213	221.67	152.91	90.3	3

Note. Column names are: system ID, stellar mass, stellar radius, planet number, planetary mass, period, eccentricity, inclination, argument of periastron, mean anomaly, ascending node, and number of simulated transits.

