The following article is Open access

Nauyaca: a New Tool to Determine Planetary Masses and Orbital Elements through Transit Timing Analysis

, , and

Published 2021 November 24 © 2021. The Author(s). Published by the American Astronomical Society.
, , Citation Eliab F. Canul et al 2021 AJ 162 262 DOI 10.3847/1538-3881/ac2744

Download Article PDF
DownloadArticle ePub

You need an eReader or compatible software to experience the benefits of the ePub3 file format.

1538-3881/162/6/262

Abstract

The transit timing variations method is currently the most successful method to determine dynamical masses and orbital elements for Earth-sized transiting planets. Precise mass determination is fundamental to restrict planetary densities and thus infer planetary compositions. In this work, we present Nauyaca, a Python package dedicated to finding planetary masses and orbital elements through the fitting of observed midtransit times from an N-body approach. The fitting strategy consists of performing a sequence of minimization algorithms (optimizers) that are used to identify high probability regions in the parameter space. These results from optimizers are used for initialization of a Markov chain Monte Carlo method, using an adaptive Parallel-Tempering algorithm. A set of runs are performed in order to obtain posterior distributions of planetary masses and orbital elements. In order to test the tool, we created a mock catalog of synthetic planetary systems with different numbers of planets where all of them transit. We calculate their midtransit times to give them as an input to Nauyaca, testing statistically its efficiency in recovering the planetary parameters from the catalog. For the recovered planets, we find typical dispersions around the real values of ∼1–14 M for masses, between 10–110 s for periods, and between ∼0.01–0.03 for eccentricities. We also investigate the effects of the signal-to-noise ratio and number of transits on the correct determination of the planetary parameters. Finally, we suggest choices of the parameters that govern the tool for the usage with real planets, according to the complexity of the problem and computational facilities.

Export citation and abstract BibTeX RIS

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Transit timing variations (TTVs; Agol et al. 2005; Holman & Murray 2005) is to date the most successful method to measure precise masses of Earth-sized transiting planets harbored in multiplanet systems that could not be identified by other means, for instance, radial velocities (Steffen 2016; Mills & Mazeh 2017). TTVs contain valuable information that allows us to determine orbital properties that are useful to predict transit ephemeris, orbital stability, and dynamical evolution (for a review of TTVs, see Agol & Fabrycky 2018 and references therein).

Deriving planet parameters (masses and orbital elements) from observed transit times is known as the TTVs inversion problem. Since the first planetary system was characterized by TTVs (Kepler-9; Holman et al. 2010), many authors have used TTVs to determine planetary masses for systems where all known planets transit (e.g., Masuda 2014; Agol et al. 2021; Saad-Olivera et al. 2020) and also to characterize nontransiting planets (e.g., Nesvorný et al. 2012, 2013; Becker et al. 2015; Masuda 2017; Saad-Olivera et al. 2017; Carpintero & Melita 2018).

Two main approaches have been largely developed to invert TTVs based on analytical and numerical approximations. Analytical approaches take advantage of the low computational cost at the expense of the limitation to specific scenarios such as planets near mean motion resonances, low eccentric orbits, or specific two-planet systems (Nesvorný & Morbidelli 2008; Nesvorný 2009; Nesvorný & Beaugé 2010; Lithwick et al. 2012; Agol & Deck 2016; Linial et al. 2018). Numerical models including N-body integrations seem to be an unavoidable route to study more diverse scenarios than those considered by the analytical techniques, but they also complement and double-check results from these analytical methods.

The inversion problem requires methods to fully explore the parameter space of the planets independent of the type of models mentioned above. Many works have used combinations of techniques and models to deal with the TTVs inversion problem, for example: minimization routines and analytical models (Nesvorný & Beaugé 2010); genetic algorithm and numerical N-body models (Carpintero & Melita 2018); genetic and optimization algorithms plus N-body simulations (Borsato et al. 2014); a coupled simulated annealing algorithm plus N-body (Meschiari & Laughlin 2010); Markov chain Monte Carlo (MCMC) methods and analytic models (Tuchow et al. 2019); MCMC and N-body (Becker et al. 2015; Jontof-Hutter et al. 2016; Agol et al. 2021); minimization plus MCMC using N-body simulations (Masuda 2014); a multimodal nested sampling algorithm combined with either N-body (e.g., Nesvorný et al. 2013; Saad-Olivera et al. 2017, 2020) or with both N-body and analytic models (e.g., Masuda 2017; Yoffe et al. 2021); a combination of MCMC plus analytical and numerical models (Lithwick et al. 2012; Hadden & Lithwick 2016, 2017); and MCMC and minimization plus analytical tools (Gajdoš & Parimucha 2019). Despite the numerous works that employ computational tools, there is a scarcity of available tools in a ready-to-use package that allows one to deal with the TTVs inversion problem in an intuitive, easy, and confident way.

In this work, we introduce Nauyaca, 1 an easy-to-use Python tool dedicated to deal with the TTVs inversion problem using the N-body approach. The numerical tool, even though it is computationally expensive compared to analytical approximations, is more able to address general situations such as, for example, the number of planets, prograde or retrograde orbits, planets out of resonances, or those with eccentric and non-coplanar orbits. The only required data are transit ephemeris per planet and the stellar mass and radius. Additionally, any prior knowledge about any planetary parameter can be supplied in order to better constrain the parameter space.

This paper is structured as follows. In Section 2, we describe the main features of the tool and the modules functionality. Section 3 describes the creation of a mock catalog of synthetic planetary systems and midtransit times to test the tool. In Section 4 we discuss the election of the parameter space and the parameters for the minimization and MCMC algorithms. In Section 5, we apply Nauyaca to the whole simulated catalog and discuss the consistency between the input planetary parameters and those found by the tool. Section 6 is dedicated to briefly highlight caveats of our tool, and finally in Section 7 we summarize our findings and we make suggestions about the procedure and parameters to deal with real data.

2. Methods

We implemented a Python package, named Nauyaca, focused on the determination of planet masses and orbital elements through midtransit times fitting for planets around single parent stars. Our tool is equipped with minimization routines and an MCMC method exclusively adapted to fit transit times series from an N-body approach. Nauyaca manages the exploration of the parameter space with the main goal of finding solutions of planetary parameters that produce midtransit times consistent with observations for each planet. The tool can work in a parallelized scheme, so multicore machines are preferred for the best performance.

We incorporated TTVFast 2 (Deck et al. 2014), an optimized fast code (5–20 times faster than standard methods) to make transit timing models. In short, TTVFast receives a set of initial conditions for the planets (mass and orbits) and the stellar mass, and perform an N-body simulation. At the same time, an incorporated Keplerian interpolator calculates midtransit times by interpolating the orbits when a planet is detected crossing the star. Even though N-body simulations could be time consuming and computationally expensive, we decided to not implement analytical or semi-analytical model approximations because many of these models are just valid for planets in low-order eccentricities, planets near first-order orbital resonances, or a fixed number of planets. We opted for a general purpose approach that could work with more diverse planetary configurations and, thus, we decided to use TTVFast.

We define Θj ≡ {mass, P, ecc, inc, ω, M, Ω}j as a set of planet parameters for the jth planet, where the parameters are, respectively: the mass, period, eccentricity, inclination, argument of periastron, mean anomaly, and longitude of ascending node. The orbital elements in Θj correspond to the instantaneous elements at a specific reference time t0, which is specified by the user or automatically selected by Nauyaca. Orbital elements are defined in a fixed astrocentric coordinate system with the XY plane spanning the plane of the sky and the observer located orthogonally at +Z.

Given a constant stellar mass M* and radius R*, the algorithms incorporated in Nauyaca make proposals to conform an initial condition (Θ) for all of the planets in order to run TTVFast, which results in a model of transit times per planet. Then, to perform the TTVs fitting, we assume that transit errors are independent, following a normal distribution, and thus we set the log-likelihood function in the form

Equation (1)

where

Equation (2)

with j denoting numbered planets and i denoting their respective transit epochs over the total number of transits Ntran. Here, t corresponds to the observed transit time for a specified epoch, ${t}^{{\rm{sim}}}$ is the simulated transit time calculated by the model, and σ corresponds to the uncertainty in the central time. We take errors σj (i) as the mean of the upper and lower errors of the ith transit time. The total log-likelihood is calculated by adding the individual log-likelihoods $\mathrm{log}{{ \mathcal L }}_{j}$ of the planets.

The fitting procedure is performed over the available transit times, but it is possible to include planets in the system without transit times that interact with the transiting planets. The explored space also includes those of the nontransiting (or undetected) planets, and thus we can get information about their planetary parameters. However, considering planets without transit data is left for a future work.

2.1. Modules

Nauyaca incorporates several modules with techniques adapted specifically for the TTVs inversion problem. We describe the functionalities of these modules below.

2.1.1. Setup

Here, we describe the preparation of the data before running the simulations. In the context of object-oriented programming, we define a Planetary System object by specifying the stellar mass and radius and the harbored planets. Transit times fitting will be operated over Planetary Systems. Then, we define the planets by establishing the information about their transit times ephemeris and the allowed parameter space. The transit ephemeris per planet should include the epoch number, time of transit, and lower and upper timing uncertainties. In the simulations, the transit epochs (integer transit numbers) of all of the planets are counted starting from 0 after t0. Thus, the user must be aware that the epoch numbers are properly labeled and referenced to the same reference time t0.

The time span of the simulations is automatically chosen to encompass the full time of observations in the ephemeris data. If no time step is defined for the simulations, it is automatically set to be 30 steps per orbit of the internal planet (∼3.33% of the internal planet period). With this time step Deck et al. (2014) demonstrated the effectiveness of determining transit times with an accuracy within 10 s for a ∼22.3 days period planet. A time step <5% of the internal planet period is recommendable to reach that accuracy.

Regarding the parameter space, the tool requires specific boundaries for each parameter in order to perform an effective sampling, avoiding nonphysical regions or redundant information. Each planet in the Planetary System must define its own parameter space. If there is no constraint for any individual or none of the parameters in Θj , a set of default boundaries is established. By default, masses are restricted to be in the range from 1 lunar mass to 80 Jupiter masses. However, when information about the stellar mass is supplied, the upper planetary mass limit is recalculated to be at most 1% of the stellar mass. This is done to keep valid the N-body Hamiltonian internally solved by TTVFast. In Figure 1, we show the currently measured planet masses Mp or ${M}_{p}\sin i$ from Exoplanet Archive 3 . It shows the planet mass limit corresponding to 1% of the stellar mass. We find that ∼95% of the currently known measured planet masses have mass ratios with the parent star lower than 1%, and thus the selected cutoff of 1% is statistically valid for most of the currently known planets. The default period space is limited to be between 0.1 and 1000 days. The eccentricity is limited between 1e-6 and 0.9, where the lower limit is different from exactly 0 to avoid undefined orbital angles. Inclination is defined between 0° and 180°. In the case of periodic boundaries for the argument of periastron, ω, and mean anomaly, M, fixing boundaries in the range 0°–360° could lead to an improper sampling process when the solution is near to these borders. This problem is solved internally by parameterizing the angles by means of ω + M and ω M. The boundaries in the parameterized space encompass 720°. At the end of the sampling process, the results in the parameterized space are mapped to be between 0° and 360° for the individual ω and M. It is an internal process, so the user only defines these angles between 0° and 360°. The ascending node also must be defined between 0° and 360°. These default ranges can be modified by the user to better constrain any parameter or to keep it fixed. In the case of assuming constant parameters, these are not part of the sampling process.

Figure 1.

Figure 1. Currently measured planet masses (or minimum mass) and those of the parent stars. Solid line depicts the planet-to-star mass ratio equal to 1%. Planets at left of the solid line have mass ratios less than 1% and correspond to ∼95% of the total planet sample currently available in the Exoplanet Archive. Dashed and dotted lines correspond to the mass limits to be considered stars and planets, 0.08 M and 80 Mjup, respectively.

Standard image High-resolution image

After setting the parameter space, Nauyaca normalizes the boundaries of all of the planets to dimensionless boundaries between 0 and 1. The whole sampling process (both for optimization and the MCMC) is performed with the normalized boundaries and is returned to physical values at the end of the runs. This normalization is done to remove the differences in orders of magnitude between parameters, which enhances the sampling performance.

2.1.2. Optimization Module

It combines minimization algorithms ordered sequentially to reach solutions Θj that best explain the transit times. These results can be used as initial guesses for the MCMC. We tested many algorithms and their calling order and found that the sequence Differential Evolution (DE; Storn & Price 1997), Powell (PW; Powell 1964), and Nelder−Mead (NM; Gao & Han 2012) progressively minimizes the total χ2 (Equation (2)). This sequence is not arbitrary but reflects the nature of exploration of each algorithm. First, DE is capable of exploring a large parameter space without the necessity of requiring an initial guess. The algorithm is fully stochastic and no information about the smoothness of the space is required. The only mandatory information needed is the parameter space to explore, which has been set in the Setup module. A drawback of this algorithm is the slow convergence rate, and thus the outputs of this method could correspond to unconverged solutions. Even so, these solutions are better than any random proposal in the parameter space. Therefore, these are used as an initial guess to run PW, which is suitable to perform a minimum searching, assuming that the space around the starting point is continuous although complex. Finally, NM takes the solution previously found by PW and performs a downhill simplex method assuming that locally the parameter space is smooth and unimodal.

We show in Figure 2 an example of the progressive χ2 minimization for the transit times fitting of a two-planet system with 190 transit times in total. For this example we performed 320 realizations. Background lines connect solutions of the same run, where a gradual descending of the χ2 throughout the sequence is observed. The box plots show the intervals of the χ2 achieved with the different algorithms, where typically the χ2 is reduced by around 4–5 orders of magnitude between the first and last methods.

Figure 2.

Figure 2. Example of the gradual χ2 minimization when running the sequence of algorithms Differential Evolution, Powell, and Nelder−Mead. Background lines connect the χ2 values of the solutions in the same run. Box edges show from bottom to top, the first and third quartiles of the data. Lines in the middle of the boxes denote the medians. Whiskers extend from the minimum to maximum values.

Standard image High-resolution image

Performing multiple realizations of this process enhances the chance of finding a global minimum but also provides us with a global view of the most probable regions in the parameter space (this will be addressed in detail in Section 4.3). It is a fast way of finding a starting point region with valuable information that can help to delimit the searching radius that the MCMC routine will explore.

2.1.3. MCMC Module

We adapted an MCMC method to explore the parameter space. We chose the Parallel-Tempering sampler, ptemcee (Swendsen & Wang 1986; Earl & Deem 2005; Foreman-Mackey et al. 2013; Vousden et al. 2015), which is a well-suited sampler for exploring a multimodal and a highly dimensional parameter space, such as in the case of planetary systems, whose parameter space increases as ∼7 times the number of planets (one for mass and six orbital elements). The main idea behind the technique is using armies of walkers belonging to a "temperature ladder" that explore the parameter space in different details. The posterior distribution π is modified according to the temperature T, given by πT L(Θ)1/T p(Θ), with L and p the likelihood and the prior distributions, respectively. Walkers belonging to hotter temperatures sample more efficiently from the prior and those in colder temperatures better sample regions with high probability. Walkers in different temperatures have a probability of swapping positions in the parameter space that depends on their current positions and temperatures, such that those in colder temperatures can also explore from the prior and vice versa. This technique decreases the chance that walkers get stuck in local solutions and explores more efficiently the whole parameter space where other standard samplers could fail (see Vousden et al. 2015 for more details about this technique).

3. Mock Catalog

We created a catalog of synthetic planetary systems, defining stellar masses and radii, and planetary masses and orbital elements. We calculated transit times for these planets and applied Nauyaca to this synthetic catalog in order to test the efficiency in recovering the planetary parameters (catalog entries) that gave rise to the synthetic transit times per planet. Throughout the text, we will refer to the parameter values reported in our mock catalog as trues.

3.1. Catalog Creation

In order to set up planetary systems as realistically as possible, we selected masses and radii for the stellar hosts and masses, radii, and orbital periods for planets from the available data of confirmed planetary systems from the Exoplanet Archive. 4 This was done to create synthetic planetary systems with stellar and planetary parameters based on observations. The catalog includes different planetary multiplicities (number of planets in the system), ranging from two up to five planets. Below we describe the procedure followed to create the catalog.

First, we selected systems with planets discovered by the transit method. Second, we selected systems with reported stellar masses and radii. For those with unavailable data but with reported effective temperature, surface gravity, and metallicity, we derived the stellar mass or radius from the work of Torres et al. (2010). Third, we selected those systems for which all of their planets have reported masses, radii, and periods (systems whose planets have these parameters unreported were discarded). This procedure reduces significantly the number of planets, dropping to 2.5% (105 planets) of the original observed planets. In order to increase the number of synthetic planetary systems we applied an over-sampling technique (SMOTE; Chawla et al. 2011), which makes new samples by interpolation of the k-nearest neighbors. Thus, we tripled the number of planetary systems and their parameters, namely, stellar mass and radius, and planetary masses, radii, and periods. The remaining five orbital elements to complete a unique planetary configuration were made from random uniform distributions in the intervals: ecc [0.05, 0.2], inc [89fdg5, 90fdg5], ω [0°, 360°], M [0°, 360°], and Ω [88°, 92°]. We restricted the intervals of the inclination inc and ascending node Ω to get near coplanar and prograde orbits. These restrictions in the construction of the catalog are independent of the tool test. In practice, we can expand the boundaries of any parameter to be explored, as long as they have a physical meaning (see Section 4.1).

Once the complete set of planetary parameters is established, an N-body integration was performed over 106 orbits of the internal planet using REBOUND (Rein & Liu 2012; Rein & Tamayo 2015). We used the chaos indicator MEGNO (Cincotta et al. 2003; Maffione et al. 2011; Rein & Tamayo 2015) to test the dynamical stability of the proposed planetary system. A MEGNO value of around ∼2 indicates a quasi-periodic motion (regular stable orbits). If the set of parameters of the proposed planetary system turned out to be stable (without a planetary ejection or with 1.7 < MEGNO < 2.3), then we appended the final state of the N-body simulation as an entry in the catalog. It should be pointed out that the orbital elements of these entries correspond to the osculating elements at the end of the stable N-body runs and they are not the same as the random values taken from the intervals of parameters indicated above. Finally, we used these entries as initial conditions to run TTVs simulations encompassing 130 transits of the internal planet using TTVFast. We selected that number to mimic the number of transits for planets with periods <10 days, during ∼3 yr of observation. We used a fixed number of transits rather than a fixed time span since it is more relevant for the inversion problem (Nesvorný & Morbidelli 2008). As a result, we got 130 transit times ephemeris for all of the internal planets, and a few less for the remaining planets according to their periods and orbital configurations.

We assigned synthetic errors to these transit times according to the analytical approximation of the timing precision (Agol & Fabrycky 2018),

Equation (3)

where τ is the approximate duration of the transit ingress $\tau \approx 2.2\mathrm{minutes}\left({R}_{p}/{R}_{\oplus }\right){\left({M}_{* }/{M}_{\odot }\right)}^{-1/3}{\left(P/10\,{day}\right)}^{1/3}$, which is a function of planetary radius Rp , orbital period P, and stellar mass M*, $\delta ={({R}_{p}/{R}_{* })}^{2}$ is the depth of the transit, and $\dot{N}$ is related to the Poisson noise due to the count rate of the star. We assumed $\dot{N}$ to be constant and equal to 1 × 107 e minutes−1 to mimic the value in the Kepler CCDs for a star of magnitude ∼12 in the Kepler band (Gilliland et al. 2011). Uncertainties from Equation (3) take planetary parameters derived from the Kepler photometry, and, therefore, our study is centered on the characterization of Kepler-like transits. Finally, we added white noise to the transit time models assuming a normal distribution with the mean equal to the true transit times and with standard deviation equal to the typical uncertainties given by Equation (3).

The full mock catalog is composed of twelve two-planet systems, eight three-planet systems, eight four-planet systems, and four five-planet systems, giving a total of 32 systems harboring 100 planets in total. We remark that all of the planets in the catalog transit, and thus our current study does not include the characterization of nontransiting planets. Tables in the Appendix include the parameters of the simulated mock catalog. Figure 3 shows the distributions of the parameters in the catalog according to the planetary multiplicity.

Figure 3.

Figure 3. Histograms of the stellar and planetary parameters that compose the full mock catalog. Different colored lines represent the properties grouped by planet multiplicity for systems with two (solid blue), three (dashed orange), four (dotted cyan), and five (dashed–dotted red) planets. Remaining orbital elements (not shown here) are taken from uniform distributions. The Appendix contains the data tables of these histograms.

Standard image High-resolution image

The mock catalog of planetary parameters produces a variety of TTVs with different properties, namely, amplitudes, periodicity, and time span. TTVs signals in our catalog have amplitudes that range from almost zero minutes to 180 minutes, with an overall mean of ∼18 minutes. Mean TTVs amplitudes grouped by multiplicity reach 21, 21, 10, and 23 minutes for two-, three-, four-, and five-planet systems, respectively. It translates to a wide range of signal-to-noise ratios (S/N), ranging from ∼1 up to ∼400 with a mean of ∼23 (using the definition of the ratio of the TTVs amplitude and the timing uncertainty; Nesvorný 2019). There is also a variety of observation time spans, which range from ∼230 to ∼5800 days, with a mean of 1200 days.

4. Settings

In this section we outline the election of the parameters to run the tool, including the restrictions of the parameter space, the election of fine-tuning parameters for the MCMC, and the usage of the optimizer results to determine a reasonable starting point for the sampling process.

4.1. Planetary Boundaries

As described in Section 2.1.1, Nauyaca requires specific boundaries for each planetary parameter in Θ. These boundaries delimit the parameter space that will be explored. Here, we discuss how to delimit the parameter space (including the establishment of constant parameters) before carrying out the recovery test over the mock catalog.

From fitting real transit observations it is possible to determine the planetary radius ratio, the orbital period from consecutive transits, and the impact parameter. In the case of the last two, we find from current data of observed exoplanets (from the Exoplanet Archive) that orbital periods are determined, on average, with uncertainties of ∼10−4 days. However, orbital inclinations are on average determined at a level better than 1° (also according to the Exoplanet Archive). Considering nearly coplanar orbits, the line of nodes of each planet are nearly aligned and mutual inclinations are close to 0°, which can help to reduce the dimensionality of the problem. Observed transiting exoplanets have been shown to have small mutual inclinations, typically below 3° (Fang & Margot 2012; Fabrycky et al. 2014), which has been demonstrated to have a negligible effects on TTVs (Nesvorný 2009; Nesvorný & Vokrouhlický 2014; Hadden & Lithwick 2016). Furthermore, because the orbital inclination is a well-restricted observational parameter (with a mean error of 0fdg78 with a dispersion of 2fdg2, according to data from the Exoplanet Archive), it can be kept as a fixed parameter. Additionally, in test runs we let the inclination vary between ∼80°–100° (which is a comprehensible range of inclinations that allows the transits) and confirmed that this angle has limited effects on the results for the transit time models. Nearly aligned ascending nodes represent prograde orbits and anti-aligned ones represent retrograde orbits. Thus, in order to model nearly coplanar orbits we kept fixed the orbital inclinations to their true values, incj,True. We also kept fixed the ascending node of the internal planets Ω1,True (≈90°), which by construction of the coordinate system can be fixed.

For our recovery test, we delimited the planetary parameter space to these ranges, bracketing the lower and upper limits: mass [0.0123, 10 massj,True] M, P [PTrueδ t, PTrue + δ t] days, ecc [10−5, 0.3], inc (fixed) [incTrue] deg, ω [0, 360] deg, M [0, 360] deg, and Ω [70, 110] deg. Here, the lower and upper boundaries for masses correspond to approximately a Moon mass and 10 times the true planetary mass, respectively. The boundaries in period P are around the true period of the planet with a width δ t corresponding to a typical observed period error given the orbital period itself. We estimated δ t by doing a linear regression between observed period errors as a function of the orbital period (data from the Exoplanet Archive), such that δ t = mP + b [days], where we determined m = 2.11 × 10−5 and b = 4.4019 × 10−5. The lower limit in eccentricity (ecc) was set to be small but different from 0 in order to avoid an undefined argument of periastron, while the upper limit was allowed to have values up to 0.3. This chosen range of eccentricities is compatible with 80% of the currently observed planet eccentricities. Argument of periastron (ω) and mean anomaly (M) take the usual definition between 0° and 360°. The boundaries of the ascending nodes (Ω) for all of the planets, except the internals, were restricted to a search radius around ≈ Ω1,True ± 20 deg, and thus we consider only prograde orbits for simplicity.

4.2. Parameters for the MCMC

We inspected the dependence of the parameters that govern the MCMC. The chosen parallel-tempering MCMC method (Vousden et al. 2015) is fine-tuned by two main parameters, namely, the number of temperatures (Ntemps) and the maximum temperature (${T}_{\max }$) to build the temperature ladder. The sampling performance also depends on the number of walkers per temperature (Nw) and finally on the number of iterations (Niter) per chain. We carried out many tests with combinations of these parameters and found that Ntemps ≲ 15 with Nw ≲ 150 in a run over Niter ∼ 5 × 105 are enough in most cases to recover the true parameters Θj,True from the mock catalog within 1σ. We also note that ${T}_{\max }\sim {10}^{2}\mbox{--}{10}^{3}$ is a good choice for the maximum temperature. ${T}_{\max }=\infty $ (as suggested in Vousden et al. 2015) is not adequate for our purposes since walkers belonging to this temperature would propose steps outside our predefined boundaries. We confirmed this behavior on several occasions during our tests. Other parameters to control the dynamics of the temperature adjustment are internally defined within Nauyaca following the suggestions by Vousden et al. (2015). 5

For the recovery test presented in this work, we imposed uniform log-priors for simplicity with the functional form

Equation (4)

where blow and bupp correspond to the lower and upper boundaries for the kth planetary parameter. Table 1 shows a list of the model parameters for each planet as well as the adopted range of uniform priors. The choice of these validity ranges has been described previously in Section 4.1. From these parameters, inc was set as a constant to the true inclination, incTrue, taken from the mock catalog. For each system, Ω considers uniform priors except for the innermost planet (Planet 1 in the catalog entries), which is set to the true value Ω1,True. Thereby, the posterior probability (up to a constant) is given by the sum of the log-likelihood (Equation (1)) and the log-prior (4) functions. In practice, any other prior functions can be supplied to Nauyaca to calculate the posterior probability.

Table 1. Model Parameters and the Selected Priors for the Transit Timing Fit

ParameterPriorUnits
mass ${ \mathcal U }$(0.0123, 10 × massTrue ) M
P ${ \mathcal U }$(PTrueδ t, PTrue + δ t )days
ecc ${ \mathcal U }$(1e-06, 0.3)
incincTrue deg
ω ${ \mathcal U }$(0, 360)deg
M ${ \mathcal U }$(0, 360)deg
Ω ${ \mathcal U }$(70, 110) or Ω1,True deg

Note. ${ \mathcal U }$[blow, bupp] denotes the uniform ranges between lower and upper boundaries. Single values are the fixed parameters in the simulations. Parameters with label True take the data values from the mock catalog (see the Appendix). See text for details.

Download table as:  ASCIITypeset image

4.3. MCMC Initialization from Optimizer Results

Solving the inversion problem of a number of interacting planets, Npla, implies the exploration of a parameter space of dimensions ∼7Npla. The nature of the problem is computationally demanding since an N-body integration should be done at each iteration per walker. The wall-clock time also increases with the observational time span. Thereby, choosing a strategic initial guess for walkers adapted for the parallel-tempering MCMC is crucial to minimize these side effects.

Starting at random points from the prior function would be a reasonable choice assuming that we have informed previous knowledge about the parameters. In general, it would not be the case for planets characterized for the first time. This motivated us to use optimizer results to make an educated initial guess about the planetary parameters. Optimizers take advantage of having both a low computing time consumption in comparison with a full MCMC run (between 1% and 3% of the total time) and an ability to find many modes in the parameter space since realizations are independent among themselves. Although these solutions could be just rough approximations of more detailed solutions, they could help to identify high probability regions suitable for initialization. Initializing walkers near an optimum sensible place is better than using any random point in the parameter space (Hogg & Foreman-Mackey 2018). Even more, initialization using multistart local optimization results is found to enhance the exploration quality and be more suitable for multichain methods (e.g., Hug et al. 2013; Ballnus et al. 2017).

We present three strategies implemented in Nauyaca to initialize walkers using the information from the optimizers, namely, Gaussian, picked, and ladder. In order to set initial values from these strategies, we first made a filter over the total number of optimizer solutions (Nopt) by sorting them according to their χ2. Then, we took a fraction (fbest; defined between 0 and 1) of that ordered list that includes the uppermost solution. That subset of solutions Θopt was then used to initialize walkers.

In Figure 4, we show examples of initialization using these strategies for many values of fbest. For the current example, we focus on the mass space of two planets identified with ID = pl2_id52 in Table 4. We performed 320 realizations of the optimizers to draw an initial walker population for a ladder of Ntemps = 10 temperatures, considering different values of the parameter fbest. The individual solutions from Θopt are shown with squares in the first column, and the cyan star shows the position of the true mass. Colored dots are the initial walkers belonging to different temperatures. As fbest is reduced, solutions with high χ2 are discarded. We detail these initialization strategies adapted to the parallel-tempering MCMC:

  • 1.  
    Gaussian. Walkers are drawn from a Gaussian centered at the mean of Θopt with a 1σ value corresponding to the data dispersion, for each dimension. This is the simplest initialization method and possibly the most frequently used. The main difference is that here the mean and standard deviation are based on the independent random realizations and not on previous knowledge of the planetary parameters (for example, masses from radial velocity measurements). Thus, if the optimizers find a well-restricted solution for any dimension, the MCMC will be able to find the global high probability region faster in contrast to a uniform random initialization. From Figure 4, it is seen that using this strategy while reducing fbest could help to identify the zone in the parameter space where the global minimum could exist. Thus, by using this strategy most of the local modes are covered but there is not special attention given to the individual modes or to the possible correlations between parameters.
  • 2.  
    Picked. Solutions from Θopt are randomly picked and the initial walkers are drawn from the vicinity of those solutions. If the optimizers find many modes, this strategy ensures that walkers will be drawn from around all of the modes which could correspond to local minima or the global minimum. From Figure 4, it is seen that this strategy confines the initial population of walkers as fbest is reduced while keeping the dependency between parameters. Here, walkers are equally distributed around any mode independent of their temperature and therefore all of the modes are initially sampled with the same frequency. This could be suitable for solving problems where apparently there is not a unique region where optimizer results agglomerate.
  • 3.  
    Ladder. Solutions from Θopt are divided into an integer number of chunks equal to the number of temperatures. Walkers belonging to temperature 1 (the main temperature; blue dots) are drawn from the first chunk, which includes the uppermost solution (i.e., with the lower χ2). Walkers belonging to the second temperature are drawn from the first and second chunks. The same rule is followed for the rest of the temperatures until, finally, walkers for the hottest temperature (yellow dots) are drawn from around all of the solutions in Θopt. From Figure 4 it is seen that using this strategy, the modes with the lower χ2 are highlighted as fbest is reduced. Unlike the picked strategy, ladder assigns the outstanding modes to colder walkers while hotter walkers sample the more disperse ones. It allows the exploration of other modes but avoids getting stuck in these local minima.

Figure 4.

Figure 4. Comparison between three strategies to initialize walkers for the MCMC with the information from the optimizers. This example considers the walkers initialization over the two-dimensional space of the masses for a system with two planets (ID = pl2_id52; Table 4). The first column of panels shows the solutions found by the optimizers. The position of the true masses of these planets is shown by the cyan star. Color bar shows the logarithm of the χ2 for these solutions. Remaining columns show the walkers initialization using different strategies (see the text for a full description). Colored dots are the initial walkers belonging to the coldest (1; blue), intermediate (6; orange), and the hottest (10; yellow) temperatures. For clarity, walkers belonging to intermediate temperatures are not shown. Rows from top to bottom show the parameter fbest taking the 100%, 50%, and 25% of the best solutions from optimizers, from which the initial walkers are drawn.

Standard image High-resolution image

The choice of the best strategy and parameter fbest can vary according to the problem. Ideally, a visual inspection of the optimizer solutions (as in Figure 4) could help to identify modes in the parameter space that allows us to decide how to initialize walkers. In practice, a statistical indicator (e.g., standard deviation) or a clustering method could be helpful to choose the initialization: Gaussian for high dispersed data, picked for highly multimodal parameter spaces, and ladder for a multimodal space with a main mode (as in the example of Figure 4).

Note however that the usage of the optimization module and the proposed initialization strategies is not a mandatory step in Nauyaca prior to the implementation of the MCMC. Any initial walker population can be provided by the user as long as these proposals are inside the physical boundaries described in Section 2.1.1. Nonetheless, we empirically find that following this heuristic procedure notably enhances the MCMC performance at a low computational cost.

5. Results and Discussion

We applied Nauyaca and the same fitting procedure to the mock catalog with the aim of inverting the process, going from the synthetic transit times to the planetary parameters and then comparing with the original values in the mock catalog, which we will refer to as the true values. For each run, we kept fixed the inclinations and the ascending node of the first planet, as described in Section 4.1. All of the solutions are determined at the synthetic reference epoch t0 = 0 days, to match with the reference time of the catalog construction.

The procedure consists of the following steps: (1) Providing the true stellar mass and radius and transit ephemeris per planet (midtransit times), and initializing the parameter space; (2) running the optimizers and choosing the best ∼5%–10% of the solutions to initialize walkers using the ladder strategy; and (3) taking the data from step 2 as an initial walkers population for running the MCMC over a fixed number of steps and using a Gelman−Rubin statistic (<1.01; Gelman & Rubin 1992) and a Geweke test (Z-score < 1; Geweke 1992) to assess the convergence and stationarity of the chains.

We performed the same procedure with the parameters summarized in Table 2 according to the number of planets in the system (Npla). Since the dimensionality of the parameter space scales with the number of planets, we increased the number of optimizer realizations (Nopt) and MCMC steps, accordingly. Along the MCMC runs we did a thinning by saving the current state of the chains at a predefined number of steps (shown in Table 2), which also allowed us to diminish the memory requirements. At the end of the runs we measured the mean autocorrelation time of the averaged chains and we determined typical values between 30 and 90, with a mean of 70 steps. With these values we determined the effective sample size, getting typical values between 2400 and 6800 with a mean of 3800 independent samples. For a pair of systems with 3 (ID = pl3_id4) and 4 (ID = pl4_id3) planets, we repeated the MCMC process with the same parameters in Table 2 but changing the initialization strategy from ladder to Gaussian. By doing this, we initialized the MCMC with less informative points and we found consistent results within 1σ of the initial results.

Table 2. Used Parameters for the Recovery Test, According to Planet Multiplicity

Npla Nopt NTemps fbest (%)WalkersSteps × 105 Thinning
2320106.5802.5100
3416106.0803.5100
4512105.01004.5200
5640127.51206.5200

Note. Walkers refer to number of walkers drawn per temperature.

Download table as:  ASCIITypeset image

Using 16 cores per job (i.e., per planetary system), Nauyaca was able to fit two-planet systems in ∼15 hr, on average. The time increased with increasing complexity of the planetary system, reaching up to ∼5.6 days for five-planet systems. Most of the time is spent on running the MCMC, since in comparison the optimizers are quite fast, running Nopt solutions (specified in Table 2) between 10 minutes and 2.5 hr depending on number of optimizer runs. Note however that the wall-clock time depends on many factors, for example the number of planets, the time span of observations, and the number of transits to fit, which in our case exceeded ∼150 transits for two-planet systems (see the number of transits per planet in the Appendix). Thus, lower computational requirements would be enough for simpler systems than those considered in this work. All of the simulations were performed with the supercomputer Miztli at the Universidad Nacional Autónoma de México.

5.1. Optimizer Results

We ran the optimizers to explore the parameter space with the aim of minimizing the differences between the observed and modeled transit times (Equation (2)). Since these runs are independent of each other, increasing the number of realizations enhances the chance of finding more minima that could correspond to local minima or the global minimum. We took a fraction (namely fbest) of the whole set of solutions with the best χ2 to build a subset of solutions Θopt, which were used to initialize walkers.

Setting an optimum composition of Θopt is an interplay among Nopt, fbest, and the number of temperatures. Thus, there is not a unique way of defining these parameters. Although, we noticed that in most cases fbest < 10% is an appropriate fraction using Nopt ∼ 300–500. We used fbest shown in Table 2, according to planet multiplicity. That selection translates to 21, 25, 26, and 48 solutions that comprise Θopt for systems with two, three, four, and five planets, respectively. Taking higher values of fbest means taking more solutions from optimizers with increasing χ2. Usually solutions with high χ2 are located far from the meaningful modes, resembling random proposals (as can be noticed from Figure 4). Thus, selecting almost all of the optimizer solutions (fbest ∼ 1) could reduce the contribution of the optimizers to locate optimal initial regions to draw walkers.

In Figure 5, we show the results of the optimizers by measuring the differences between the solutions found and the true values. The solutions considered in this figure are part of the subset denoted previously as Θopt. Therefore, most of the solutions from the optimizers with higher χ2 values were discarded. The color map indicates the percentage of solutions falling inside each bin. The darkest bins near to zero depict planets for which ≳50% of the solutions better approximate the true solutions. Subpanels with almost uniform yellow/white colors are those parameters less constrained by the optimizers.

Figure 5.

Figure 5. Optimizer results shown as the differences between the solutions found and the true values, grouped by the number of planets (columns) and for each planetary parameter (rows). Individual subpanels are binned along the horizontal axis by the true planetary parameters. Note that these labels are arranged in increasing order but they are not equally spaced. The color code shows the percentage of the solutions found by optimizers falling inside each bin. The total number of solutions for two, three, four, and five-planet systems are 21, 25, 26, and 48, respectively, which correspond to Nopt times fbest, reported in Table 2.

Standard image High-resolution image

For the case of the planet masses, 90% of the solutions in Θopt are typically determined within a factor ∼2–6 with respect to the true masses. Even though the dispersion tends to increase with increasing mass and number of planets, the solutions approach the correct mass with a relative low dispersion. In the case of the periods, the solutions tend to span over all of the allowed space. We found this behavior is due to the narrowness between our selected lower and upper boundaries, which scales according to the period (see Section 4.1). We noticed that when the boundaries are enlarged (for example, with a width of ±0.01 days) the solutions agglomerate around the true periods. Eccentricities are in general well restricted only for two-planet systems, where the dispersion reaches up to ∼0.06 around the true values. For higher multiplicities the solutions scatter out, with a typical dispersion of ∼0.08. We found that optimizers tend to overestimate eccentricities as the number of planets increases. For the argument of periastron (ω) and mean anomaly (M), a similar behavior is found. They are better restricted only for two-planet systems, since for higher multiplicities the dispersion increases, tending to be evenly distributed over the parameter space. Ascending nodes are the less restricted angles given our established boundaries for this parameter. Note however that for the recovery test, we just considered limited prograde solutions (as described in Section 4.1).

Although the initial search region explored by the MCMC would be narrowed down as a result of the optimization process, the chains were initialized in these high density regions and explored all of the allowed parameter space since the adopted uniform priors were not modified.

5.2. MCMC Results

In the work carried out by Nesvorný & Beaugé (2010) the authors test their tool TTVIM (which combines a minimization algorithm with an analytic approximation for inverting TTVs) using synthetic transit observations with the aim of recovering planet masses and orbital elements of a nontransiting planet. Nesvorný & Beaugé (2010) used upper limits in relative and absolute errors in planet parameters to decide whether the solutions were correctly recovered or not. Unlike Nesvorný & Beaugé (2010), we considered as recovered those solutions from the posterior distributions consistent with the true values within the 68% (1σ) and 95% (2σ) credible intervals of the total posteriors, assuming that they follow a normal distribution. These solutions are marked in Figure 6 as circles and squares. The comparison is made for planet parameters: mass, period (P), eccentricity (ecc), periastron longitude (ϖ; defined as the sum of ascending node and argument of periastron, ϖ = Ω + ω), and the true longitude (; defined as = ν + ϖ, where ν is the true anomaly, which is obtained from a Fourier expansion with the terms of the mean anomaly and eccentricity). Some of our posteriors exhibit more than one peak, manifesting the nature of the selected sampler. Median values and errors are usually centered on the prominent peaks so the data in Figure 6 are representative of our results. In practice, a dynamical stability test could help to distinguish the physically possible solutions. This study is underway, but it is out of the scope of the present work.

Figure 6.

Figure 6. Solutions found by the parallel-tempering MCMC compared with the true planetary parameters. The comparison is made, from top to bottom, over the following planetary parameters: mass, period (P), eccentricity (ecc), periastron longitude (ϖ), and true longitude (). The recovery test was done over the transit times originated by these planets. Columns show the results according to the number of planets in each system. Horizontal axes denote the input data (true planetary parameters) and vertical axes denote the differences between the posterior medians recovered by the MCMC and the true parameters. Dark color circles show data consistent within 1σ and light color squares those consistent within 2σ. Vertical error bars are colored with dark or light colors for 1σ and 2σ, respectively. Gray crosses denote the planets whose parameters were inconsistent with the true input data. The percentages of planets that are consistent with the input data are summarized in Table 3.

Standard image High-resolution image

From Figure 6, we take the discrete values of the differences between recovered and true data (excluding the unrecovered data marked as gray crosses), assuming that discrete medians are representative of the MCMC results. This was done for each planet multiplicity and planet parameter. From the resulting distributions, we measured the standard deviation that represents, on average, the typical precision achieved in the recovery test for different parameters and planet multiplicities. For masses, we found the minimum dispersion of 0.7 M for the two-planet systems, and a maximum of 14 M for five-planet systems. For P, a minimum dispersion of 9 s for two-planet systems, and a maximum of 110 s for five-planet systems. For ecc, a minimum dispersion of 0.007 for two-planet systems, and a maximum of 0.03 for three-planet systems. For ϖ, a minimum of 35° for five-planet systems, and a maximum of 50° for three-planet systems. For , a minimum dispersion of 3° for two-planet systems, and a maximum of 7fdg5 for four-planet systems. Intermediate dispersions were found for multiplicities not mentioned in these ranges.

We also calculated the Kolmogorov–Smirnov (KS) statistic over the whole distribution of true values and the posterior medians for all of the parameters for planets in the full catalog. We found, for masses, a KS statistic = 0.07 (p-value = 0.96); for P, KS statistic = 0.01 (p-value = 1.0); for ecc, KS statistic = 0.25 (p-value = 3×10−3); for ϖ KS statistic =0.17 (p-value = 0.11), and for , KS statistic = 0.03 (p-value = 0.99). Thus, we cannot reject the hypothesis that the distribution of masses, periods (P), and true longitudes () are statistically drawn from the same input distributions.

Figure 7 shows the percentage of planets consistent with the true parameters within 1σ and 2σ, when grouping planets with the same multiplicity. In Table 3 we summarize these results also considering the global percentages for the full catalog (independent of multiplicity). For masses, periods, and true longitudes, the global recovery percentages are consistent with the expected statistical values, i.e., around ∼68% of the planet parameters are recovered within 1σ and ∼95% within 2σ. These parameters also exhibit consistency with the input catalog according to the KS test. However, we note that ecc and ϖ have similar recovery percentages but are far below these statistically expected values. For these two parameters we found, respectively, 52% and 55% within 1σ and 80% and 76% within 2σ (see Table 3).

Figure 7.

Figure 7. Percentage of planets whose results from the MCMC are consistent with the input parameters in the mock catalog. Different kind of lines represent the consistency within 1σ (solid lines and filled symbols) and 2σ (dotted lines and empty symbols). Results are grouped by planet multiplicities where the total number of planets is labeled at bottom.

Standard image High-resolution image

Table 3. Percentage of Planets Consistent With the True Parameters Within 68% (1σ) and 95% (2σ) Credible Interval

Planet multiplicityMassPeriodEccentricityPeriastron longitude (ϖ)True longitude ()
 1σ 2σ 1σ 2σ 1σ 2σ 1σ 2σ 1σ 2σ
 (%)(%)(%)(%)(%)(%)(%)(%)(%)(%)
2 (24 planets)668758956610054876691
3 (24 planets)58916280549145667095
4 (32 planets)759678100477862787596
5 (20 planets)85956585404555706575
Full catalog (100 planets)71936791528055767091

Note. Percentages are given for specific planet multiplicities and for the full catalog.

Download table as:  ASCIITypeset image

We investigate the possible causes of both the similarity and low recovery rates by inspecting the dependence between both parameters. We found that ∼45% of planets with ecc ≲ 0.05 recover both ecc and ϖ at the same time. The remaining planets do not recover at least one of these parameters. By contrast, planets with ecc ≳ 0.05 recover both parameters ∼90% of the times, showing that the low recovery rates occur mainly for low eccentricity orbits. In our catalog, most of the planets belonging to systems with three or more planets have eccentricities below ∼0.05, as shown in Figure 3. Even more, from Figure 7, it is seen that eccentricity is the only parameter that diminishes its recovery rate as the number of planets increases (teal lines with stars).

We find that the cause of the similarity and the global low recovery rate for ecc and ϖ is a result of the difficulty of finding a well-defined argument of periastron as the orbits tend to be more circular, which in our case occurs mainly for planets belonging to high multiplicity systems. We will show in Section 5.4 that the trend of the eccentricity shown in Figure 7 and the low recovery rate of ecc (and consequently ϖ) can be partially explained by the diminishing of the number of transits available for external planets belonging to high multiplicity systems.

5.3. TTVs

In Figure 8 we illustrate the TTVs of two systems with simulated data of different qualities and their orbits. The top figure corresponds to the system with ID = pl2_id52 in Table 4 for which we estimate an S/N of 8.4 and 5.0 for planets 1 and 2, respectively. The bottom figure corresponds to the system with ID = pl3_id1 in Table 5 for which we estimate an S/N of 1.6, 2.08, and 2.06 for planets 1, 2, and 3, respectively. Left panels show 100 orbital configurations (thin colored orbits) reconstructed from the planetary parameters taken randomly from our MCMC posteriors. For comparison, the true orbits are plotted as solid black lines. Transits are detected each time the planets cross the +Z-axis, which corresponds to the line of sight. Right panels show the 100 TTV signals (colored solid lines) using the same planet parameters of the orbits. Synthetic observations are shown by empty circles with error bars. For these two systems, the TTV models are consistent with the synthetic data, but with a better planetary determination for the top system, which has a better S/N. Hence, the quality of the transit times directly affects the determination of the orbital configurations. For example, we identify that in the case of the three-planet system, the argument of periastron shows a wide range of possible solutions with respect to the true position (dashed black lines).

Figure 8.

Figure 8. Examples of TTVs fitting for a two-planet system (ID = pl2_id52; Table 4) and three-planet system (ID = pl3_id1; Table 5). In left panels, dark solid lines correspond to the true orbits and the lighter color orbits correspond to 100 orbital solutions randomly selected from the MCMC posteriors. Black dotted lines indicate the position of the argument of periastron. In this reference frame, the observer is located at +Z. These orbit plots have been made with an adapted version of the OrbitPlot in REBOUND (Rein & Liu 2012). Right panels show the simulated TTVs (circles with error bars), while colored lines show the 100 random fits with the same planet parameters of the orbits. In both systems, the internal planets are labeled as Planet 1 and the order increases outwards.

Standard image High-resolution image

We statistically measured the fitted residuals from the whole catalog by taking 100 random samples per planet from the MCMC posteriors and then calculating their corresponding TTVs. We took the differences between data and the fitted transit times, grouping these residuals according to their multiplicity. The resulting distributions are shown in Figure 9. We found a typical standard deviation for these residuals of 2.5 minutes for two-planet, 2.3 minutes for three-planet, 3 minutes for four-planet, and 5.3 minutes for five-planet systems.

Figure 9.

Figure 9. Typical residuals from the TTVs fitting grouped by planet multiplicity. Individual histograms were built by summing residuals of 100 random solutions per planet from the whole catalog.

Standard image High-resolution image

5.4. The Impact of the Data Quality and Quantity

The TTVs inversion problem challenges not only the methods or algorithms, but also the observational data requirements. In this subsection we address this issue by considering the quality (measured by the S/N) and quantity (Ntran) of the data required in order to correctly determine planetary parameters. We focus on the determination of planetary mass and eccentricity.

Previous works have proposed different answers regarding the required number of transits and data quality to invert TTVs. Saad-Olivera et al. (2019) suggested that a large number of TTVs combined with a high S/N can be enough to robustly estimate planetary parameters without the necessity of radial velocity measurements. A detailed analysis regarding the characterization of nontransiting planets was conducted by Veras et al. (2011), finding that at least 50 transits are needed to invert the problem. Nesvorný & Morbidelli (2008) developed a method based on perturbation theory to characterize two-planet systems, and found that an S/N ∼ 15–30 and about Ntran ≳20 are typically required to uniquely characterize these systems.

We carried out an analysis of the parameters involved in the fitting procedure, looking for possible correlations between the data quality and quantity, the intrinsic planetary parameters (including the number of planets, Npla), and the results we found using Nauyaca. A first inspection suggested a trend with the number of transits and thus it is convenient to define the total signal-to-noise ratio (S/N)T as the product of ${\rm{S}}/{\rm{N}}\times \sqrt{{N}_{\mathrm{tran}}}$. Figure 10 shows the fractional error in the determination of the planet mass and eccentricity, and the dependency with the (S/N)T , Ntran, and Npla. Here, 2σmass and 2σecc correspond to our derived uncertainties taken from the posteriors encompassing the 95% credible interval. These uncertainties are divided by the true mass and eccentricity, and thus, Figure 10 can work as a guide to put some constraints on the determination of these parameters given the quality and quantity of the data.

Figure 10.

Figure 10. Influence of the data quality and quantity over the proper determination of the planet mass (left panels) and eccentricity (right panels). Horizontal axes denote the total signal-to-noise ratio ${({\rm{S}}/{\rm{N}})}_{T}={\rm{S}}/{\rm{N}}\times \sqrt{{N}_{\mathrm{tran}}}$ and vertical axes are the fractional errors derived from our results. The 2σ corresponds to the 95% credible interval from the posteriors and MassTrue and eccTrue are the true values in the catalog. The recovered planets are marked with circles and the unrecovered planets are marked with crosses. In the top panels, colors correspond to the numbers of transits per planet (Ntran) binned by 20 transits. In the bottom panels, planets are colored according to the number of planets in their systems.

Standard image High-resolution image

In general, it is seen that the fractional error (related with the adequate determination of the parameters) has a correlation with the quality of the data and in a second term with the quantity. Moreover, the deficiency of one of these quantities is in some cases compensated for by the other one. We observed that planets with individual high (S/N)T can constrain the planetary mass and eccentricity even with a relative low Ntran. Complementary, a large Ntran can compensate for a low (S/N)T . Planets with a low (S/N)T exhibit a larger dispersion of their fractional errors for their masses and eccentricities, although, those with a large Ntran usually have a better determination (i.e., a lower fractional error). Comparing the top and bottom panels, it can be discerned that the majority of the unrecovered planets are those with a number of transits ≲60–80 (white/yellow crosses), corresponding to systems with three, four, and five planets. However, planets of these multiplicities with low (S/N)T can still find the correct solution but with a poor constraint (white/yellow circles in the upper left part of the top diagrams). Also notice from the bottom panels that the trend between the fractional errors and the total (S/N)T seems to hold when grouped by planet multiplicities.

From these results it is seen that (S/N)T and Ntran have a significant impact on the determination of these parameters. The low recovery rate for ecc discussed in Section 5.2 is then explained by the low (S/N)T of our synthetic data and the low number of transits per planet (≲20–40) for systems with many planets. Low (S/N)T can be due to noisy measurements of the transit timing or due to low amplitudes of the TTVs signals. In the first case, the limitation is in part caused by the instruments and observational strategy. In the second case, low amplitudes correspond to planets in less perturbed (almost circular) orbits. In this situation, inverting TTV signals results in less constrained orbits.

6. Considerations and Caveats

Here, we highlight many of the considerations made in this work that could be relevant for the users of Nauyaca. We also point out many of the limitations and caveats of using this tool.

  • 1.  
    In this work we assume the continuous monitoring of the planet transits and thus there are no data gaps in our ephemeris. In practice this could be unrealistic, especially for ground-based observations where telescope time is limited. Although, as discussed by Nesvorný & Morbidelli (2008), the number of transits is determinant in the planetary characterization rather than the time distribution.
  • 2.  
    We remind the reader that there is no unique way of choosing the fine-tuning parameters for running the algorithms in Nauyaca, since it will depend on the problem to be solved. Although, the parameters chosen here for synthetic systems can work as a guide for real planets.
  • 3.  
    Regarding the usage of Nauyaca with real systems, the user must be aware that stellar mass and radius are well restricted to consider them as constants. In cases where one or both of these parameters exhibits large uncertainties, we suggest running many simulations with a grid of stellar parameters including the values with the lower and upper uncertainties.
  • 4.  
    We remind the reader that TTVFast and therefore Nauyaca is adapted to deal with planets around single parent stars, and hence planets in circumbinary systems or other configurations are not allowed.
  • 5.  
    We also must take into account that the fitting is performed over the midtransit times instead of the TTVs. Since TTVs are the transit time signals in an OC diagram, the calculated data depend on the number of epochs and the method used to make the linear regression over the transit times. Fitting to TTVs instead of transit times alone can result in an imprecise period determination of about 40 minutes for a ∼70 days period planet, as shown by Carpintero & Melita (2018). For that reason, we performed the fitting methods over the raw midtransit times.
  • 6.  
    The simulations are performed over a pre-fixed time span with a pre-selected time step that by default is set to P1/30, where P1 is the period of the internal planet.
  • 7.  
    When computing the midtransit times, the planet radius is not taken into account to assess whether a transit occurs. Only the coordinates of the planet center are considered to determine if the planet transits. Thus, grazing transits (as for example in WASP-67b; Mancini et al. 2014) are currently not considered.

7. Summary and Conclusions

In this work we present Nauyaca, 6 a Python package that encompasses minimization routines (Optimization module) and an MCMC method (MCMC module) exclusively adapted to find planet parameters (masses and orbital elements) that best reproduce the transit times based on numerical simulations. Even though the numerical method is more computationally expensive (compared to analytical approximations), it is more suitable to address more general situations, such as considering many planets with varied orbital configurations. Nauyaca requires transit ephemeris per planet and the stellar mass and radius. Additionally, any previous knowledge about validity ranges can be supplied in order to better constrain the parameter space.

Previous studies of transit timing analysis have used synthetic data to test new techniques or to quantify the relation between properties of TTVs and planetary parameters (e.g., Nesvorný & Morbidelli 2008; Meschiari & Laughlin 2010; Veras et al. 2011). However, in most cases these studies have been limited to the study of two-planet systems where one of the planets transits and the other acts as the perturber. Here, we analyze a large sample of synthetic transiting planets with planetary parameters based on the current planet data from the Exoplanet Archive (see Section 3.1). For these planets we calculate the midtransit times to use them as input to Nauyaca (Section 5). This allows us to characterize the performance of Nauyaca by measuring the consistency rate between the catalog entries and the parameters determined by the tool.

For all of the systems, we run optimization algorithms to test the performance for many planet multiplicities (Section 5.1). Optimizers take advantage of the low computation time in contrast to that of a full MCMC run and provides an overall outlook of the regions in the parameter space with a higher probability (the planet boundaries were defined in Section 4.1). We find that the best performance is achieved for the two-planet systems for any dimension, and for higher multiplicities the results are varied. The optimizers would define high density regions for the masses of the planets within a factor ∼2–6, allowing us to initialize the chains around 20% of the real masses. Eccentricities are in general well restricted only for two-planet systems. Nevertheless, the optimizers for higher multiplicities are not well suited to define high density regions where the MCMC chains would be initialized. Orbital angles remain in general loosely constrained, except for two-planet systems.

We draw the initial walker population for the parallel-tempering MCMC using the best <10% of the solutions from the optimizer runs. We find a good agreement between the input parameters in the catalog and those found in the recovery test with the MCMC, at 2σ (Section 5.2). The global recovery percentages for masses, periods, and true longitudes are in concordance with the statistically expected values of ∼68% within 1σ and ∼95% within 2σ. These parameters also exhibit consistency with the input catalog according to a Kolmogorov–Smirnov test. By contrast, eccentricities and periastron longitudes have similar low recovery rates. Even more, the recovery rate for eccentricity diminishes as the number of planets increases. We find that the cause of the similarity and the global low recovery rate between ecc and ϖ is a combination of the difficulty to determine the argument of periastron as the orbits tend to be circular and the reduction of the number of transits for external planets, mainly for high multiplicity systems. In these scenarios, the S/N of the data plays a determinant role to correctly determine these parameters.

Depending on the planet multiplicities, a typical mass precision accomplished in the recovery test ranges from ∼1–14 M. Periods achieve a precision of between 10 s and 2 minutes. Eccentricities reach a precision of between ∼0.01–0.03. Periastron longitudes and true longitudes have typical precisions of ∼40° and 6°, respectively.

We investigate the effect of data quality and quantity (Section 5.4) on the proper determination of the planetary mass and eccentricity. We find that, in general, quality is more important than quantity, although in many cases one parameter can compensate for the deficiency of the other one. However, we warn that the results in this part of our study should be taken as a guide, since the simulated planet sample in our catalog is limited.

Finally, we make suggestions about the fine-tuning parameters involved in the procedure. Depending on the computation facilities, parameters in Table 2 could be a reasonable starting point to make the optimization and MCMC runs. Note that in our mock catalog, the "observed" time span and the number of transits in the systems are varied. Therefore, the fine-tuning parameters can be scaled to be adapted to specific problems. We suggest performing, at least, between 100–150 optimizer runs (Nopt) times the number of planets in the system, and taking <10%–30% of the best solutions to initialize walkers. The choice of the initialization strategy presented in this work (Section 4.3) depends on the parameter space itself, which is unknown by nature. We suggest using the ladder or picked strategy if nothing is known about the parameter space (as in the majority of the situations) and the Gaussian strategy if the parameter space is somehow constrained, as for example when considering fixed angles, fixed periods, circular orbits, etc. Of course, the suggested methodology used in this work is not mandatory when using Nauyaca, since, for example, optimization routines and the proposed initialization strategies are optional. The user has the freedom to select the tools that are best suited to the TTVs inversion problem. However, we note an improvement in the MCMC performance and the results when following the proposed method shown in this work.

In a forthcoming work, beyond the scope of this paper, we will show the application of the tool to more specific situations, such as systems with nontransiting planets, those with highly mutual inclinations, and those with missed transit data. We will also show the application to real systems in order to revisit planet parameters of previously characterized planets and also with new planets.

We provide the data used throughout this work in electronic format at 10.5281/zenodo.5218498. Nauyaca first release can be found at 10.5281/zenodo.5230451.

The authors gratefully acknowledge the computing time granted by DGTIC-UNAM for access to the supercomputer Miztli in the HTC group, under the project with code LANCAD-UNAM-DGTIC-361. We thank Gabriel Perren and Luis M. Pavón for useful discussions. We also thank the anonymous referees for their useful comments and suggestions made to improve the quality of the manuscript. E.F.C. also acknowledges the PhD grant awarded by the CONACYT Graduate Fellowship. This work was supported by UNAM-PAPIIT IN-107518 and BG-101321. H.V. was supported by the project UNAM-PAPIIT IN-101918. This research has made use of the NASA Exoplanet Archive, which is operated by the California Institute of Technology, under contract with the National Aeronautics and Space Administration under the Exoplanet Exploration Program.

Facilities: Supercomputer Miztli - , Exoplanet Archive. -

Software: Numpy (Oliphant 2015), Scipy (Virtanen et al. 2020), H5py (Collette 2013), Matplotlib (Hunter 2007), Seaborn (Waskom et al. 2017).

Appendix: Catalog Tables

Here, we collect the data tables of the mock catalog that show the planet properties of the synthetic planetary systems. The Tables are grouped by planet multiplicity: Table 4 for two-planet, Table 5 for three-planet, Table 6 for four-planet, and Table 7 for five-planet systems. System IDs follow the syntax pl{Npla}_id{ID}, where Npla is the number of planets in the system and ID is an internal identifier number. Orbital elements correspond to the osculating elements at the simulated reference time of t0 = 0 days. These tables can be found in electronic format at: 10.5281/zenodo.5218498.

Table 4. Input Catalog of Two-planet Systems

System ID M* R* Planetmass P eccinc ω M Ω Ntran
 (M)(R) (M)(day) (deg)(deg)(deg)(deg) 
pl2_id00.910.8615.7215.11180.092191.264154.81147.7789.75130
   227.96911.7620.101989.75957.27313.4989.9457
pl2_id11.091.3112.5999.552160.1289.816188.41193.3189.72130
   215.19921.057990.093990.21341.55201.2590.8359
pl2_id71.091.6110.76915.092220.101290.244280.58160.9890.04131
   20.89922.799870.089589.978338.53186.6591.9886
pl2_id91.281.98137.18622.953110.074788.69210.99286.9989.03130
   279.45742.86980.131390.332223.66303.8888.969
pl2_id231.07041.2077111.59210.685210.134689.983150.27358.0390.51130
   212.23320.887510.109890.03941.13358.6789.8167
pl2_id280.91270.862715.6745.319630.235988.165110.47136.6388.59130
   227.03411.815860.071390.249184.33289.688.6458
pl2_id351.08431.3948110.25710.783810.094590.56767.59349.1490.66130
   213.11519.674670.2189.87444.17128.0689.7671
pl2_id381.04081.0681110.0712.405010.137290.94941.0391.0388.58130
   27.74920.617140.094189.3436.34185.8291.9378
pl2_id390.65130.623615.7444.013420.168591.171101.6257.7491.32130
   218.8378.298710.120689.69268.2215.2688.963
pl2_id450.93310.9199124.216.705860.089889.33140.92221.7288.17130
   2119.35930.748720.093190.339108.98130.3990.9370
pl2_id520.91330.863315.6655.354680.123790.717296.2216.3988.36130
   226.85611.833190.072790.052187.2686.6288.9559
pl2_id621.08011.387719.77517.955490.088491.163174.3874.3489.82130
   28.06232.451190.039989.168124.3528.6389.0272

Note. Column names are: system ID, stellar mass, stellar radius, planet number, planetary mass, period, eccentricity, inclination, argument of periastron, mean anomaly, ascending node, and number of simulated transits.

Download table as:  ASCIITypeset image

Table 5. Input Catalog of Three-planet Systems

System ID M* R* Planetmass P eccinc ω M Ω Ntran
 (M)(R) (M)(day) (deg)(deg)(deg)(deg) 
pl3_id11.081.4917.316.887010.021889.978309.12121.8790.51130
   27.04912.817160.029789.83510.79150.0691.5770
   33.035.327240.009690.83168.07213.0790.9225
pl3_id30.971.1116.9923.504480.026989.909267.55132.8989.7130
   217.1637.64330.027890.043318.49168.7489.5759
   316.52714.856070.008489.97170.05279.0789.7631
pl3_id41.081.017.3134.549580.038490.17133.3186.1888.79130
   24.13266.078130.007790.58111.0496.0290.4668
   3133.488125.843330.037390.045159.98192.789.0636
pl3_id71.040.9412.22545.154470.015589.5587.22251.2390.94130
   24.13285.302950.032290.45949.212.5791.3169
   37.628130.217460.02889.74423.4772.6290.5245
pl3_id90.6950.720415.4021.751090.046591.43638.82345.7489.37130
   24.6944.590280.023688.986161.25348.3390.2949
   36.9888.24690.025289.6158.84310.589.1628
pl3_id100.52240.442310.07710.47070.040689.809292.62246.1488.3130
   21.9814.105090.024489.909239.76191.3188.4697
   30.6723.574670.030690.33682.64149.2488.5158
pl3_id111.06581.463117.2386.714940.035990.8214.54160.5690.27130
   26.86212.537760.007490.446204.83162.0990.4370
   33.03234.446410.006288.187291.7388.8491.0326
pl3_id160.72750.780414.7132.827970.034890.74265.0564.6790.01130
   21.6625.089580.039789.017.8526.5788.5173
   34.0737.758080.016489.538350.873.5988.9248

Note. Column names are: system ID, stellar mass, stellar radius, planet number, planetary mass, period, eccentricity, inclination, argument of periastron, mean anomaly, ascending node, and number of simulated transits.

Download table as:  ASCIITypeset image

Table 6. Input Catalog of Four-planet Systems

System ID M* R* Planetmass P eccinc ω M Ω Ntran
 (M)(R) (M)(day) (deg)(deg)(deg)(deg) 
pl4_id00.690.7111.2670.658530.04290.167194.71163.8990.11130
   20.2897.814550.05190.613159.71101.8188.5911
   38.89914.708540.032790.045181.8176.0589.86
   414.29919.469140.008289.852117.8782.9490.064
pl4_id21.281.52110.4883.743070.028790.513314.46190.5188.26130
   215.57410.427730.040389.628281.5301.388.8147
   3106.15522.347940.014790.044310.83231.2488.6322
   434.96154.406590.022990.077235.7131.8388.839
pl4_id31.01.0415.3016.164840.017790.55647.02108.0490.16130
   210.48813.567470.018989.597194.95171.790.3259
   38.10123.976650.016590.391207.63258.7390.8933
   411.12443.860760.00989.72873.95273.6790.4318
pl4_id71.0281.087815.0925.816430.033488.33949.8970.5887.76130
   210.3612.558720.018989.747175.26215.8189.2461
   37.622.107180.022890.32867.44114.1590.1634
   410.8340.449150.040390.276216.75198.0390.819
pl4_id81.06241.393516.3926.799610.057389.9337.25301.9889.45130
   27.68811.633170.011688.195158.65100.690.5276
   38.04919.203830.027291.12126.99166.4591.8646
   47.83631.301150.029389.897125.59297.7890.0329
pl4_id91.18941.3388110.82911.767650.048690.31210.63299.091.07130
   27.71924.420510.007990.433120.7369.7589.3562
   323.68346.861630.016989.907300.94231.4989.6232
   49.56576.282880.066689.866162.63222.6590.0320
pl4_id111.06061.383216.366.780810.035290.006198.33243.9488.32130
   27.7711.686470.023789.284163.6931.7287.6475
   38.0519.350730.063589.77249.4731.6489.3246
   47.93231.664940.015890.597250.34179.0890.1528
pl4_id131.14161.281914.2444.403570.043588.88776.63318.0490.25130
   29.848.456550.013288.6563.9797.388.7667
   35.56514.527250.041190.086172.9794.3790.0439
   49.63526.678560.005191.074299.01232.9289.5721

Note. Column names are: system ID, stellar mass, stellar radius, planet number, planetary mass, period, eccentricity, inclination, argument of periastron, mean anomaly, ascending node, and number of simulated transits.

Download table as:  ASCIITypeset image

Table 7. Input Catalog of Five-planet Systems

System ID M* R* Planetmass P eccinc ω M Ω Ntran
 (M)(R) (M)(day) (deg)(deg)(deg)(deg) 
pl5_id00.810.7614.35.284420.028687.353247.95146.1389.42130
   23.07.07870.011889.263284.41138.0789.4398
   33.81410.310110.050491.0750.12206.0290.1967
   48.89916.14450.019491.287113.7739.5589.842
   55.227.45490.037389.23338.07141.8190.3325
pl5_id10.690.6419.5355.714680.050290.721290.05321.989.96130
   24.13212.444890.027589.508253.5549.0989.2260
   313.98418.152370.029389.873283.18204.0189.3941
   435.915122.393060.034189.78318.1213.3990.146
   534.961267.231510.031389.853155.8285.6790.093
pl5_id20.80530.755314.5045.314760.020288.981342.17193.5489.05130
   23.0447.241490.078790.10574.54256.8890.495
   34.2110.631790.033990.111290.26238.2889.6165
   49.95120.277650.003189.968175.85193.5989.9234
   56.35836.780330.029990.235264.02264.3388.0318
pl5_id30.69660.646619.2455.691280.037190.184288.33278.1289.56130
   24.06912.146910.036290.147301.5328.8789.4161
   313.42117.725880.018990.025355.59331.889.3142
   434.419116.485340.030189.696161.07110.6990.036
   533.314253.776120.038290.213221.67152.9190.33

Note. Column names are: system ID, stellar mass, stellar radius, planet number, planetary mass, period, eccentricity, inclination, argument of periastron, mean anomaly, ascending node, and number of simulated transits.

Download table as:  ASCIITypeset image

Footnotes

Please wait… references are loading.
10.3847/1538-3881/ac2744