Kinematic Evidence of an Embedded Protoplanet in HD 142666 Identified by Machine Learning

J. P. Terry; C. Hall; S. Abreau; S. Gleyzer

doi:10.3847/1538-4357/acc737

1. Introduction

Protoplanetary accretion disks are the sites of planet formation. The newest generation of telescopes, such as the Atacama Large Millimeter/submillimeter Array (ALMA), has unprecedented capabilities for observing protoplanetary disks. For the first time, we can not only resolve disks themselves, but also quantify the motion of the dust and gas within them. Disks display a striking variety of structures such as rings (ALMA Partnership et al. 2015; Dipierro et al. 2018), likely caused by dust trapping due to forming planets (Pinilla et al. 2012; Dipierro et al. 2015), and spirals (Pérez et al. 2016), which may be caused by forming planets (e.g., Dong et al. 2015b) or another mechanism such as gravitational instability (e.g., Dong et al. 2015a; Meru et al. 2017; Hall et al. 2018). This new information has greatly advanced our understanding of the processes underlying the formation and evolution of planetary systems.

Planets and physical processes, such as gravitational instability, influence the motion within the disk. This causes the material to deviate from simple Keplerian motion. Comparing the observed motion against purely Keplerian motion provides information on the bodies and processes present in the disk (Hall et al. 2020; Longarini et al. 2021; Paneque-Carreño et al. 2021; Bae et al. 2022; Pinte et al. 2022; Terry et al. 2022b). Non-Keplerian motion has been used to uncover a variety of structures, including localized perturbations associated with gaps and planets (Pinte et al. 2018, 2019, 2020; Teague et al. 2018) as predicted by Perez et al. (2015).

Kinematic analysis is limited by our ability to accurately identify non-Keplerian motion. The deviations can be small and frequently occur in noisy images. Identifying them is therefore a difficult and slow process, and there is a strong possibility of overlooking them entirely. Any signature that is overlooked is a missed opportunity to detect either a forming planet or some other process, such as the GI Wiggle indicative of gravitational instability (Hall et al. 2020) or the vertical shear instability (Barraza-Alfaro et al. 2021).

Machine learning (ML) provides a useful tool for this task. ML has quickly become ubiquitous in both society and the sciences, with applications ranging from self-driving cars (Bojarski et al. 2016) to medicine (Parmar et al. 2015). Recent efforts in astronomy have made it clear that ML is a powerful method even with simulated training data (Jo & Kim 2019; Alexander et al. 2020; Möller & de Boissière 2020). ML, and in particular computer vision, excels at the analysis of images (Voulodimos et al. 2018). In some cases, it has even been shown to outperform humans (Zhou et al. 2021). It is therefore naturally suited for application to the noisy data sets in observational astronomy.

Using ML models developed in a previous work (Terry et al. 2022a), we identify a strong and localized deviation from Keplerian motion in HD 142666. Following the widely accepted standard method in the field (Pinte et al. 2018, 2019; Teague et al. 2018), we perform smoothed particle hydrodynamics (SPH) simulations to recreate the kinematic structure of the disk. The agreement is strong when a 5 M_J planet is included at 75 au. We conclude that HD 142666 hosts a planet.

The paper is arranged as follows: Section 2 describes the models and simulations used. Section 3 shows the results of applying the models and simulating the system. Section 4 gives our conclusions.

2. Methods

2.1. Machine Learning

We use the ML models described in Terry et al. (2022a) and describe them here for completeness. We use two different architectures: EfficientNetV2 (Tan & Le 2021) and RegNet (Xu et al. 2022). All models were made using PyTorch (Paszke et al. 2019), albeit with significant modifications to the default models and hyperparameters. We denote these models as EN47, EN61, EN75, RN47, RN61, and RN75. Table 1 gives performance metrics for the models: model accuracy at 50% and 95% decision thresholds and the area under the receiver operating characteristic curve (AUC).

Table 1. Model Performance Metrics from Terry et al. (2022a)

Value	EN47	EN61	EN75	RN47	RN61	RN75
Accuracy at 50% cutoff (%)	97 ± 0.5	97 ± 0.5	93 ± 0.7	78 ± 1.1	98 ± 0.4	95 ± 0.6
Accuracy at 95% cutoff (%)	96 ± 0.5	94 ± 0.5	88 ± 0.9	65 ± 1.3	96 ± 0.6	92 ± 0.7
AUC	0.99 ± 0.002	0.99 ± 0.003	0.98 ± 0.003	0.86 ± 0.010	>0.99 ± 0.001	0.98 ± 0.032


The models were trained using synthetic observations from the MCFOST (Pinte et al. 2006, 2009) radiative transfer code. MCFOST inputs were drawn from 1000 PHANTOM (Price et al. 2018) SPH simulations of systems with and without planets (Terry et al. 2022a). Each MCFOST calculation outputs a position–position–velocity cube from ¹³CO transition lines (J = 2 → 1 and J = 3 → 2). The cubes were convolved spatially and spectrally and noise was added in order to replicate current observational capabilities.

The model inputs (i.e., radiative transfer outputs) are images of dimension C × H × W, where C is the number of input channels, H is the height of the image, and W is the width of the image (here, H = W = 600 pixels). A typical grayscale or RGB image will have C = 1 or 3, respectively. We instead input an entire position–position–velocity cube.

Observations vary significantly in the number of channels that cover the disk, but the typical range is between ≈40 and 100. To address this, we train three different implementations of each model, which gives us a total of six models. The difference between each implementation is the number of input velocity channels (C = 47, 61, or 75).
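As an illustration of how an observed cube might be matched to these fixed input sizes, the sketch below lists which of the three channel counts fit a cube with a given number of channels and picks a centred run of channels for each. The centred-window rule and the function names are assumptions made for this example; the text does not state how the channel subset was chosen.

```python
MODEL_CHANNELS = (47, 61, 75)  # input sizes of the trained models (from the text)

def central_window(n_available, n_wanted):
    """Return (start, stop) indices of a centred run of n_wanted channels.

    Centring the window is an assumption of this sketch.
    """
    if n_wanted > n_available:
        raise ValueError("cube has fewer channels than the model expects")
    start = (n_available - n_wanted) // 2
    return start, start + n_wanted

def usable_windows(n_available):
    """Map each model input size that fits to its centred channel window."""
    return {c: central_window(n_available, c)
            for c in MODEL_CHANNELS if c <= n_available}

# A hypothetical cube with 70 channels covering the disk.
windows = usable_windows(70)
```

With 70 available channels, only the 47- and 61-channel models can be fed a full window; the 75-channel models would need padding or a wider channel selection.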

Each model outputs a two-component vector such that the sum of the components is 1, i.e., it has undergone softmax activation (Goodfellow et al. 2016). This can be interpreted as the probability that the given input belongs to a certain class, i.e., planet versus no-planet class. The models also output images of their internal activation structure, which we consider to be the more important output in this context. While the models were not trained to pinpoint the locations of planets—a job more suited for semantic image segmentation (Minaee et al. 2022)—the activation structure can inform us which regions the model finds important when making its classification decision. Terry et al. (2022a) found that the activations were able to highlight velocity channels with non-Keplerian motion in systems that host planet(s).
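The softmax step can be sketched in a few lines. This is a minimal illustration rather than the code used in the models: the raw scores and the ordering of the two classes are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for (no-planet, planet); not values from the paper.
example_logits = [0.3, 2.1]
p_no_planet, p_planet = softmax(example_logits)
```

The two outputs are non-negative and sum to 1, which is what allows them to be read as class probabilities.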

To this end, we apply our previously trained models to ALMA data of the HD 142666 system. We inspect the softmax values and activation structures to gain insight into whether a planet might be in the system and, if so, where its signature is the strongest.

2.2. Observational Data

The HD 142666 data were taken from the DSHARP catalog⁶ (Andrews et al. 2018; Huang et al. 2018). The data include ¹²CO line emission (J = 2 → 1) and 1.25 mm continuum images. The system was imaged with a beam FWHM of 77 × 61 mas (≈11 × 9 au) and an rms noise of 1.3 mJy beam⁻¹; channels have a 0.35 km s⁻¹ resolution (Andrews et al. 2018). Figure 1 shows selected velocity channels overlaid on the continuum.
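The conversion of the beam size from angular to physical units follows from the small-angle relation (1 arcsec at 1 pc subtends 1 au). The short sketch below reproduces it, taking the 148 pc distance quoted in Section 2.3; the function name is illustrative.

```python
def mas_to_au(theta_mas, distance_pc):
    """Small-angle conversion: 1 arcsec at 1 pc subtends 1 au."""
    return (theta_mas / 1000.0) * distance_pc

DISTANCE_PC = 148.0  # distance quoted in Section 2.3

beam_major_au = mas_to_au(77.0, DISTANCE_PC)  # about 11.4 au
beam_minor_au = mas_to_au(61.0, DISTANCE_PC)  # about 9.0 au
```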

**Figure 1.** Line emission overlaid on continuum. Left: Δv = − 1.4 km s⁻¹ channel. Middle: Δv = − 1.75 km s⁻¹ channel. Right: Δv = − 2.1 km s⁻¹ channel. The continuum beam is in magenta, and the line emission beam is in cyan.

The image was cropped to focus on the disk, and a subset of velocity channels was used. The channels were reshaped to 600 × 600 pixels and normalized such that all pixel values were between 0 and 1.
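A minimal sketch of the resizing and normalization step is given below. It assumes nearest-neighbour resampling and a single min-max scaling over the whole cube; the text does not specify the interpolation scheme or whether scaling was applied per channel, so both choices are assumptions, as is the function name.

```python
import numpy as np

def preprocess_cube(cube, size=600):
    """Resample each channel to size x size and scale the cube to [0, 1].

    cube: array of shape (C, H, W). Nearest-neighbour resampling and a
    global min-max scaling are assumptions of this sketch.
    """
    cube = np.asarray(cube, dtype=np.float64)
    _, h, w = cube.shape
    # Source row/column index for every output pixel (nearest neighbour).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = cube[:, rows[:, None], cols[None, :]]
    lo, hi = resized.min(), resized.max()
    if hi == lo:  # constant cube: avoid division by zero
        return np.zeros_like(resized)
    return (resized - lo) / (hi - lo)

# Toy cube with 3 channels of 40 x 50 pixels (random values, not real data).
toy = np.random.default_rng(0).normal(size=(3, 40, 50))
prepared = preprocess_cube(toy)
```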

2.3. Hydrodynamical Simulations

We run a suite of SPH simulations using PHANTOM, varying the mass of the embedded planet between 1 and 5 M_J. For each simulation, we create channel maps using MCFOST in the same way that the original training data were made. The kink is approximately 75 au from the center of the disk, so we place a planet at this distance.

System parameters are taken from Rubinstein et al. (2018), Andrews et al. (2018), and Huang et al. (2018). The stellar mass, temperature, and radius are 2.0 M_⊙, 7500 K, and 2.2 R_⊙, respectively. The disk has a mass of 0.0533 M_⊙, an inner radius of 1.3 au, and an outer radius of 150 au. The system is inclined at 62° with a position angle of 162° and an azimuth of 72°. It is located 148 pc from Earth.
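For orientation, the unperturbed Keplerian speed at the planet's orbital radius can be estimated from these parameters using v_K = sqrt(GM/r). The sketch below is an order-of-magnitude check, not part of the analysis: it assumes a thin disk, neglects the disk's own mass and pressure support, and projects the speed with sin(i) to obtain the largest line-of-sight value, which occurs on the disk major axis.

```python
import math

GM_SUN = 1.32712440018e20  # solar gravitational parameter, m^3 s^-2
AU_M = 1.495978707e11      # astronomical unit in metres

def keplerian_speed_kms(mass_msun, radius_au):
    """Circular Keplerian speed sqrt(GM/r), returned in km/s."""
    return math.sqrt(GM_SUN * mass_msun / (radius_au * AU_M)) / 1000.0

# Stellar mass, planet radius, and inclination as quoted in the text.
v_kep_kms = keplerian_speed_kms(2.0, 75.0)                  # about 4.9 km/s
v_los_max_kms = v_kep_kms * math.sin(math.radians(62.0))    # about 4.3 km/s
```

Channels with |Δv| below this projected maximum sample gas at 75 au away from the major axis, so a planet at that radius can in principle perturb channels at a range of velocity offsets.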

The SPH outputs are used to create line emission maps to mimic ALMA capabilities. These calculations are done using the MCFOST radiative transfer code (Pinte et al. 2006, 2009). Each calculation uses 10⁸ photon packets and includes carbon/silicate dust (Draine & Lee 1984) with a dust-to-gas ratio of 1:100. The resulting outputs were convolved spatially and spectrally to match the observed line emission resolution.
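The spatial convolution can be illustrated with a separable Gaussian beam. This is a sketch under stated assumptions and not the procedure used to produce the synthetic maps: the beam is taken to be Gaussian and aligned with the pixel axes (the beam position angle is ignored), and the 10 mas pixel scale is hypothetical.

```python
import math
import numpy as np

def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def gaussian_kernel(sigma_pix):
    """Normalised 1D Gaussian kernel truncated at about 4 sigma."""
    half = max(1, int(math.ceil(4.0 * sigma_pix)))
    x = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (x / sigma_pix) ** 2)
    return k / k.sum()

def convolve_beam(image, fwhm_y_pix, fwhm_x_pix):
    """Smooth a 2D image with an axis-aligned elliptical Gaussian beam.

    The image must be larger than the kernels along both axes.
    """
    ky = gaussian_kernel(fwhm_to_sigma(fwhm_y_pix))
    kx = gaussian_kernel(fwhm_to_sigma(fwhm_x_pix))
    out = np.apply_along_axis(np.convolve, 0, image, ky, mode="same")
    return np.apply_along_axis(np.convolve, 1, out, kx, mode="same")

PIXEL_MAS = 10.0  # hypothetical pixel scale, not a value from the paper
image = np.zeros((101, 101))
image[50, 50] = 1.0  # unit point source at the image centre
smoothed = convolve_beam(image, 77.0 / PIXEL_MAS, 61.0 / PIXEL_MAS)
```

Because each kernel is normalised, the total flux of the point source is preserved while its peak is spread over roughly one beam.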

3. Results and Discussion

Figure 2 shows that HD 142666 has a strong, localized kink that is detected by the ML models. The kink is particularly visible in the upper middle (Δv = − 1.75 km s⁻¹) channel. The lower row shows activation structures that roughly correspond to the above channels.

The average softmax value is over 0.84, which means that the models predict the probability that the input for HD 142666 contains a planet to be over 84%. This prompts further scrutiny of the activations, which we use to determine the most probable channel that contains the kink.

The strength and localization of the newly identified kink are reminiscent of the kinks in HD 163296 and HD 97048. As with HD 163296, the kink in the gas is outside of the radial extent of the continuum disk. Both of these disks were found to host planets after SPH simulations containing a planet recreated the kinematic structure observed in CO observations (Pinte et al. 2018, 2019; Teague et al. 2018). We apply this same method to HD 142666 to demonstrate that the kink identified by our models is consistent with kinks identified by conventional means in HD 163296 and HD 97048.

We found that a simulation of a protoplanetary disk with a 5 M_J planet most accurately reproduced the observation. Figure 3 shows the results. A localized kink in the vicinity of the planet is clear in the upper left panel of Figure 3 (Δv = − 2.3 km s⁻¹). This kink is visible to a lesser extent in the Δv = − 2.0 km s⁻¹ channel in the upper right panel of Figure 3, which is also the case in Figure 1. There is strong agreement between this feature and the non-Keplerian channel identified by our models: both display a kink of approximately the same shape and size at approximately the same radial location. This can be seen in the lower left and right panels of Figure 3. Note that the simulation and observation do not display the strongest kink in the same velocity channel. This is simply an artifact of the finite temporal resolution of the simulation, which makes it extremely unlikely that a snapshot will be saved at the moment when the planet's position exactly coincides with that in the observation. The temporal resolution of the simulation was increased to mitigate this effect, but it persists to some extent.

We conclude that HD 142666 hosts a planet.

We note that our conclusion is confirmed using the same methods described by Teague et al. (2018) and Pinte et al. (2018, 2019). What is new about our approach is that the non-Keplerian motion was first identified by ML models, highlighting a protoplanet candidate that had previously been missed upon visual analysis. Verification of the evidence still relies on the established methodology, and we strongly advocate that this should always be done for any potential discovery.

3.1. Future Work and Limitations

This work shows that ML can effectively identify non-Keplerian motion even if it is missed by humans. However, our work can be improved upon. The primary limitation is the fact that localizing non-Keplerian motion was not the explicit goal of these models when they were trained. Their purpose was classification without any attempt at segmentation or object detection. Models specifically designed to pinpoint deviations would likely be more effective. Rather than requiring inspection of activation structures (of which there can be hundreds), such a model would directly output a prediction of the location. This would be a more precise and straightforward method to detect the non-Keplerian signature, but it would not remove the need to perform follow-up simulations. We intend to explore this possibility in future works.

Networks such as PGNets (Zhang et al. 2022) and DPNNet-2.0 (Auddy et al. 2021) offer a potentially fruitful route that would increase the accuracy and speed of the analysis of disks and channels highlighted by our models. These networks are designed to infer planetary mass from continuum images. One could use our models to determine if it is likely that a disk hosts a planet and, if so, feed the corresponding continuum images into the secondary networks. The predicted planet mass could then be used as a starting point for follow-up simulations rather than simply starting an uninformed parameter sweep. This would speed up the verification step of the procedure. Such a pipeline could be useful to explore in future works.

4. Conclusion

We have applied ML models created by Terry et al. (2022a) to the DSHARP data of HD 142666. All models strongly predict the presence of at least one planet. The activation structures highlight a strong, unreported, and localized kink. An SPH simulation with a 5 M_J planet at 75 au is able to recreate the newly identified kinematic structure. By the previously established benchmarks and methods for kinematic planet detection, we conclude that HD 142666 hosts a planet.

This work demonstrates the utility of applying ML to the analysis of protoplanetary disks. By highlighting non-Keplerian features in the disk, ML models are able to guide planet-detection efforts. The signatures of the planet were previously overlooked by human analysts, and the traditional analysis was only performed because of the information given by the models. We anticipate that this method can identify new non-Keplerian features in both existing and future protoplanetary observations.

This paper makes use of the following ALMA data: ADS/JAO.ALMA #2016.1.00484.L. ALMA is a partnership of ESO (representing its member states), NSF (USA) and NINS (Japan), together with NRC (Canada), MOST and ASIAA (Taiwan), and KASI (Republic of Korea), in cooperation with the Republic of Chile. The Joint ALMA Observatory is operated by ESO, AUI/NRAO and NAOJ. J.T. was a participant in the 2022 Machine Learning for Science (ML4SCI) Google Summer of Code program. S.G. was supported in part by the National Science Foundation Award No. 2108645. This study was supported in part by resources and technical expertise from the Georgia Advanced Computing Resource Center, a partnership between the University of Georgia's Office of the Vice President for Research and Office of the Vice President for Information Technology.
