
Using Bayesian Deep Learning to Infer Planet Mass from Gaps in Protoplanetary Disks


Published 2022 September 5. © 2022. The Author(s). Published by the American Astronomical Society.

Citation: Sayantan Auddy et al 2022 ApJ 936 93. DOI: 10.3847/1538-4357/ac7a3c


Abstract

Planet-induced substructures, like annular gaps, observed in dust emission from protoplanetary disks provide a unique probe for characterizing unseen young planets. While deep-learning-based models have an edge in characterizing a planet's properties over traditional methods, such as customized simulations and empirical relations, they lack the ability to quantify the uncertainties associated with their predictions. In this paper, we introduce a Bayesian deep-learning network, "DPNNet-Bayesian," which can predict planet mass from disk gaps and also provide the uncertainties associated with the prediction. A unique feature of our approach is that it is able to distinguish between the uncertainty associated with the deep-learning architecture and the uncertainty inherent in the input data due to measurement noise. The model is trained on a data set generated from disk–planet simulations using the fargo3d hydrodynamics code, with a newly implemented fixed grain size module and improved initial conditions. The Bayesian framework enables the estimation of a gauge/confidence interval over the validity of the prediction, when applied to unknown observations. As a proof of concept, we apply DPNNet-Bayesian to the dust gaps observed in HL Tau. The network predicts masses of 86.0 ± 5.5 M⊕, 43.8 ± 3.3 M⊕, and 92.2 ± 5.1 M⊕ for the gaps at 10, 30, and 80 au, respectively, which are comparable to those from other studies based on specialized simulations.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Observed substructures, like concentric rings and gaps, seen in high-resolution images of protoplanetary disks (PPDs) in dust emission (ALMA Partnership et al. 2015; Andrews et al. 2016; Huang et al. 2018a) are often interpreted as the signatures of embedded planets. However, owing to limitations in current planet search techniques (Fischer et al. 2014; Lee 2018), the direct detection of planets in disks remains a challenge.

Only recently have measurements using gas kinematics (Pinte et al. 2018; Teague et al. 2018; Pinte et al. 2019, 2020) found planetary signatures, such as "kinks" due to velocity perturbations, inside the observed dust gaps. Additionally, the direct detection, via Hα emission, of an accreting planet (Keppler et al. 2018; Wagner et al. 2018; Haffert et al. 2019) inside the cavity of the transition disk PDS 70 has provided compelling evidence in favor of planetary gaps.

Thus, characterizing the large-scale planet-induced distortions observed in PPDs offers a unique opportunity to probe unseen young planets during the epoch of their formation. This is achieved by identifying and comparing the features with theoretical models of planetary gaps generated using customized simulations (Crida et al. 2006; Paardekooper & Papaloizou 2009; Duffell & Macfadyen 2013; Fung et al. 2014; Duffell 2015; Kanagawa et al. 2015a; Zhang et al. 2018; Ilee et al. 2020) or empirical relations (Kanagawa et al. 2016; Lodato et al. 2019). However, owing to the increasing sample size of the observed disks and the complexity of disk–planet simulations, one needs to run hundreds of customized simulations to analyze each observed system. This is too expensive and inefficient to be of practical use as a detection tool. The alternative is to use deep-learning models, trained with simulation data (generated only once), to detect/characterize exoplanets from the observed data in a more efficient and accurate way.

In Auddy & Lin (2020) and Auddy et al. (2021), hereafter Papers I and II, respectively, we successfully designed deep-learning models using a multilayer perceptron (MLP) and/or a Convolutional Neural Network (CNN) to predict planet mass from the planetary gaps in PPDs. Both models (DPNNet-1.0 and DPNNet-2.0) are end-to-end pipelines that are capable of predicting the mass of an exoplanet from an observed PPD harboring a gap-inducing planet. Recently, Zhang et al. (2022) have also developed a CNN-based model to directly infer the planet mass from radio dust continuum images. However, one of the main challenges with traditional deep-learning models is quantifying the uncertainties associated with their predictions. This has direct implications for how a deep-learning model performs when trained on limited and/or noisy data. Furthermore, the inability of the traditional deep-learning model to answer "I do not know" or "I am not confident" is concerning, particularly when treating out-of-training distribution points (i.e., input data points that are outside the parameter space explored in the training data set). Thus, it is crucial to understand the limitations of a deep-learning model by quantifying the errors associated with its predictions and identifying the source of the uncertainty.

Recently, some of these shortcomings have been addressed by introducing probabilistic methods, such as Bayesian techniques, to deep-learning architectures (Kendall & Gal 2017; Jospin et al. 2020; Wilson & Izmailov 2020). A Bayesian Neural Network (BNN) improves on traditional deep-learning models by providing uncertainty estimates associated with the predictions of these models. BNNs are unique in their ability to distinguish between the uncertainty that is caused by the limitations of the model and the uncertainty that is inherent to the input data. Moreover, these architectures can respond to pathological out-of-training cases with a suitably large error bar.

A classical neural network outputs deterministic point estimates from a single optimized network (e.g., see Paper I). A BNN, by contrast, is a type of stochastic neural network in which the weights are described by a distribution. It aggregates the predictions of multiple independent predictor networks (also known as marginalization) and performs a model average to obtain the output as a distribution. This makes it possible to quantify the uncertainty associated with the network architecture, which is particularly powerful in constraining the error when performing a regression task to estimate parameters (such as planet mass, in our case) using deep-learning models. Thus, any prediction comes with a confidence interval, meeting the requirement of a scientific experiment, and one that can be compared with other methods of parameter estimation.

In this paper, we develop and introduce "DPNNet-Bayesian," a standard MLP-based model designed using Bayesian formalism to predict planet mass from disk gaps as well as to quantify the uncertainty associated with the prediction. Our network is trained with synthetic data from explicit planet–disk simulations using the fargo3d hydrodynamics code, with a newly implemented fixed grain size module and improved initial conditions.

The paper is organized as follows. In Section 2, we give an overview of Bayesian deep learning, its implementation, and its advantages over traditional deep-learning approaches. In Section 3, we describe the disk–planet hydrosimulations as well as the parameter space considered for the current study. We introduce the BNN architecture implemented in DPNNet-Bayesian in Section 4, and discuss in detail the steps used to preprocess the data. In Section 5, we give the predictions of, along with the error estimates from, our trained DPNNet-Bayesian, based on simulated data. In Section 6, as a proof of concept, we deploy DPNNet-Bayesian to observed PPDs around HL Tau and AS 209 to infer the planet masses and the associated errors. Finally, in Section 7, we discuss the results and draw conclusions.

2. Bayesian Deep Learning: A General Overview

A neural network architecture consists of thousands (or even millions) of densely interconnected nodes/neurons. For this architecture, let

  • 1.  
    θ represent all the internal model parameters, and
  • 2.  
    D represent the training data, which consists of a specified set of observed features Dx (e.g., gap size) and their corresponding output labels Dy (e.g., planet mass).

Deep learning is a technique for optimizing the model parameters θ based on a training data set D (Goodfellow et al. 2016; Shrestha & Mahmood 2019). This is done by minimizing a "cost" function via backpropagation. Usually, the cost function is defined in terms of the log likelihood of the training data set, and thus the training process amounts to maximum-likelihood estimation. This approach can suffer from overconfidence, as it is deterministic in nature. An alternative is to introduce stochastic components into the neural network. These can account for the uncertainties in the determination of the network parameters θ, which ultimately show up as an estimated error in the predictions made by a deep-learning architecture (Jospin et al. 2020). For these types of neural nets, the stochastic elements can be introduced in terms of stochastic model weights or activations. Thus, instead of the single deterministic output of a traditional deep-learning model, a stochastic network produces a marginalized output, based on some statistical principle (Wilson & Izmailov 2020).

In order to design a predictive model based on BNN, one starts with a suitable choice of deep-learning architecture. The next step is to choose a systematic prescription, in order to introduce the stochastic elements into the network architecture. This can be modeled in terms of some possible prior distribution of the model parameters, p(θ).

From Bayes' theorem, we get

$$p(\theta\mid D)=\frac{p(D_y\mid D_x,\theta)\,p(\theta)}{\int p(D_y\mid D_x,\theta')\,p(\theta')\,d\theta'} \qquad (1)$$

where Dy denotes the training labels and Dx the training features, as mentioned earlier. The Bayesian posterior distribution, p(θ|D), represents the model parameters as a probability distribution, rather than as deterministic values.

The Bayesian approach allows us to compute the model averages by essentially integrating over the models, parameterized by the model parameters θ. The distribution of the possible values of y (e.g., planet mass) given some input x (e.g., dust gap width) is given by

$$p(y\mid x, D)=\int p(y\mid x,\theta')\,p(\theta'\mid D)\,d\theta' \qquad (2)$$

In Equation (2), the variance of p(y|x, θ′) captures the uncertainty due to the measurement noise of the input data, while the variance of p(θ′|D) corresponds to the uncertainty associated with the neural network architecture.

The classical training would correspond to p(θ|D) being replaced by a constant value of θ, whereas in the Bayesian case this is typically a probability distribution with some finite variance (Wilson & Izmailov 2020; Abdar et al. 2021). Computing p(θ|D) from the training data is one of the most challenging aspects of BNN architectures, due to the large size of the parameter space. This is further discussed in the following sections.

2.1. The Variational Inference Approach to Bayesian Deep Learning

Due to the large dimensionality of the sampling space, it can be quite challenging to merge the concepts of a neural network with the Bayesian approach. As the posterior distribution is extremely complex, directly estimating p(θ|D) to perform the integral in Equation (2) is not always possible. Different methods, involving various approximations, have been implemented to do this. One of the most popular and direct methods for sampling the posterior distribution, p(θ|D), is the Markov Chain Monte Carlo (MCMC) method (Bardenet et al. 2017). However, MCMC often has issues with scalability, owing to large model sizes, and it can be computationally expensive as well.

Instead of sampling from the posterior directly, the method of variational inference (Blei et al. 2017; Mohan et al. 2022) can be used as an alternative. A distribution (referred to as the variational distribution), parameterized by an arbitrary set of parameters ϕ, is used to approximate the posterior. The parameter ϕ is learned in a way that the distribution qϕ (θ) becomes close to the exact posterior p(θD). The closeness between the posterior and qϕ (θ) is quantified in terms of the Kullback–Leibler (KL) divergence (Zhang et al. 2019; Jospin et al. 2020). More precisely, the formalism of the evidence lower bound (ELBO) is used for this purpose:

$$\mathrm{ELBO}=\int q_\phi(\theta)\,\log\!\left(\frac{p(D_y\mid D_x,\theta)\,p(\theta)}{q_\phi(\theta)}\right)d\theta \qquad (3)$$

Maximizing the ELBO is equivalent to minimizing the KL divergence, as ELBO = log p(D) − DKL(qϕ || p), where log p(D) depends on the prior and DKL is the KL divergence. Such a variational approach has been used in a variety of other astronomical applications, such as in Roberts et al. (2013), Perreault Levasseur et al. (2017), Leung & Bovy (2019), and Spindler et al. (2021).
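To make the variational objective concrete, the following minimal sketch (our illustration, not the authors' code) fits a mean-field Gaussian qϕ(θ) to the posterior of a one-parameter toy model by stochastic gradient ascent on a Monte Carlo estimate of the ELBO, written as E_q[log p(D|θ) + log p(θ) − log qϕ(θ)]:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Variational parameters phi = (loc, raw_scale) of q_phi(theta)
loc = tf.Variable(0.0)
raw_scale = tf.Variable(0.0)
prior = tfd.Normal(loc=0.0, scale=1.0)           # p(theta)

def neg_elbo(x, n_samples=64):
    """Monte Carlo estimate of -ELBO for the toy model x ~ Normal(theta, 1)."""
    q = tfd.Normal(loc=loc, scale=tf.nn.softplus(raw_scale))
    theta = q.sample(n_samples)                  # theta^j ~ q_phi(theta)
    log_lik = tf.reduce_sum(
        tfd.Normal(loc=theta[:, None], scale=1.0).log_prob(x), axis=-1)
    return -tf.reduce_mean(log_lik + prior.log_prob(theta) - q.log_prob(theta))

x = tf.constant([1.2, 0.8, 1.5, 0.9])            # toy "observations"
opt = tf.keras.optimizers.Adam(learning_rate=0.05)
for _ in range(500):                             # maximizing the ELBO ...
    with tf.GradientTape() as tape:
        loss = neg_elbo(x)
    opt.apply_gradients(zip(tape.gradient(loss, [loc, raw_scale]),
                            [loc, raw_scale]))
# ... drives q_phi toward the exact posterior, here N(sum(x)/5, 1/5)
```

Because TensorFlow Probability's Normal distribution is reparameterized, gradients flow through the samples to the variational parameters ϕ, which is what makes this Monte Carlo estimator trainable.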

2.2. Features and Advantages of BNN

Once the ELBO function is optimized, and the variational distribution is thus learned/trained, we can use it to predict the labels of new (unseen) observations. The predictive posterior distribution p(y*|x*, D) is given by integrating out the model/variational parameters (trained using the data set D):

$$p(y^*\mid x^*, D)=\int p(y^*\mid x^*,\theta)\,q_\phi(\theta)\,d\theta \qquad (4)$$

where (x*, y*) represent the features and labels, respectively, and the superscript "∗" denotes unseen data. In order to obtain this predictive distribution, we use Monte Carlo sampling. First, we sample from the distribution of model parameters N times:

$$\theta^j\sim q_\phi(\theta),\qquad j=1,\ldots,N \qquad (5)$$

Then xi* is sampled n times from the distribution of the input features, whose width is determined by the errors associated with the features themselves, to estimate the predictive distribution p(yi*|xi*, θj). For N × n iterations, we get

$$p(y^*\mid x^*, D)\approx\frac{1}{N\,n}\sum_{j=1}^{N}\sum_{i=1}^{n}p(y_i^*\mid x_i^*,\theta^j) \qquad (6)$$

The error/uncertainty in the predictions of the BNN can be broadly classified into two categories: epistemic and aleatoric uncertainty (Abdar et al. 2021; Hüllermeier & Waegeman 2021). "Aleatoric" uncertainty (AL) represents the uncertainty inherent in the data that is given as the input for the model (e.g., the measurement noise). In contrast, "Epistemic" uncertainty (EP) represents the limitations of the machine-learning (ML)–based model that usually stem from a lack of information about the system. Taking a larger data set for the training of the model (spanning a larger parameter space) or improving the ML architecture can typically reduce the EP. On the other hand, as the AL is intrinsic to the measurement noise of the input data, it cannot be reduced by solely improving the ML model. Both these uncertainties propagate to the output to give the net predictive uncertainty (Abdar et al. 2021), as shown in Figure 1.

Figure 1. This schematic diagram shows the workflow of a typical BNN, along with how the uncertainty/errors propagate.

We can measure the EP, σEP, from the variance of the posterior distribution of the model parameters, p(θj|D), given as:

$$\sigma_{\mathrm{EP}}^{2}=\frac{1}{N}\sum_{j=1}^{N}\left(y^{*j}-\bar{y}^{*}\right)^{2} \qquad (7)$$

where y*j is the prediction obtained with parameter draw θj and ȳ* is the mean prediction.

σEP quantifies the intrinsic uncertainty associated with the model. It enables us to understand the correlation between the model's performance, the quantity/quality of the data, and the parameters of the ML architecture. Thus, monitoring σEP can give insights into improving the overall ML model.

The AL, σAL, is obtained from the variance of p(yi*|xi*, θj), given as:

$$\sigma_{\mathrm{AL}}^{2}=\frac{1}{n}\sum_{i=1}^{n}\left(y_i^{*}-\bar{y}^{*}\right)^{2} \qquad (8)$$

where yi* is the prediction for the ith draw of the input features at a fixed parameter sample θj.

σAL quantifies the statistical uncertainty/error associated with the various measurements that go as input features into the neural net at the training stage, as well as when the model is deployed for real-world application. Knowing σAL, one can monitor the errors in the predictions of the model as the measurement noise of the input features changes. The advantage of knowing both σEP and σAL is that when testing the model, one can trace the source of the uncertainty and improve the model/data set accordingly.
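In code, the sampling scheme of Equations (5)–(8) reduces to a few lines. The sketch below assumes a Keras-style stochastic network (such as the TensorFlow Probability model in Section 4.1) that redraws its weights from the variational posterior on every call; the helper's name and the way measurement noise is passed in are our own conventions, not the authors':

```python
import numpy as np

def predictive_uncertainty(model, x, x_err, N=200, n=1000):
    """Monte Carlo decomposition of the predictive uncertainty.

    model : stochastic network; each call draws fresh weights theta^j ~ q_phi
    x     : 1D array of (already scaled) input features
    x_err : 1D array of measurement noise on each feature
    """
    x = x.astype("float32")

    # Epistemic (Eqs. (5) and (7)): fix the input, draw N weight samples
    y_ep = np.array([float(model(x[None, :])) for _ in range(N)])

    # Aleatoric (Eq. (8)): one batched call samples the weights only once,
    # so n noisy input draws in a single call corresponds to "j fixed"
    x_draws = np.random.normal(x, x_err, size=(n, x.size)).astype("float32")
    y_al = model(x_draws).numpy().ravel()

    # Total (Eq. (6)): vary both weights and inputs, N x n predictions
    y_tot = np.concatenate([model(x_draws).numpy().ravel() for _ in range(N)])

    return y_ep.std(), y_al.std(), y_tot.mean(), y_tot.std()
```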

3. Disk–Planet Simulations

We model the structural variations of a dusty PPD due to an embedded gap-opening planet using the fargo3d hydrodynamic simulation code (Benítez-Llambay & Masset 2016). We update the public version of the code by parameterizing the dust–gas interaction with a constant particle size, instead of a fixed Stokes number. This includes a generalization of the steady-state profiles derived by Benítez-Llambay et al. (2019), which is presented in the Appendix. We use the updated simulations to generate the data set for training the BNN model. The simulation setup and the parameter space are discussed below.

3.1. PPD Setup

We model a simple 2D razor-thin disk with an embedded planet of mass MP located at R = R0. The planet is on a fixed (non-migrating) Keplerian orbit around a central star of mass M*. The unit of time is the planet's orbital period P0 = 2π/ΩK0, where ΩK0 is the Keplerian frequency, and the subscript "0" denotes evaluation at the planet's location R0.

The disk's initial gas surface density profile is given as

$$\Sigma_g=\Sigma_{g0}\left(\frac{R}{R_0}\right)^{-\sigma_0} \qquad (9)$$

where σ0 is the exponent of the surface density profile and Σg0 = 10⁻⁴ is a constant in code units, which is arbitrary for non-self-gravitating simulations without orbital migration. For reference, however, this surface density scale corresponds to Toomre parameters Q ≥ 80 for all of our disk models, sufficient for the disks to be gravitationally stable (Toomre 1964). The disk is flared with a constant flaring index F, such that the aspect ratio varies as

$$\frac{H}{R}=h_0\left(\frac{R}{R_0}\right)^{F} \qquad (10)$$

where H is the disk scale height and h0 is the aspect ratio at R = R0. The disk is locally isothermal, with a sound-speed profile cs(R) = h R ΩK, where h = H/R is the local aspect ratio. To mimic the disk's turbulence, we include viscous forces in the gas disk using the standard α-prescription (Shakura & Sunyaev 1973), where the kinematic viscosity is ν = α cs²/ΩK and α is set by Equation (A21).

We consider a single population of dust particles, modeled as a pressureless fluid (Jacquet et al. 2011). The dust particle size is parameterized by the Stokes number St = ts ΩK, where ts is the stopping time characterizing the strength of the dust–gas coupling (Weidenschilling 1977). The initial dust surface density is Σd = ε Σg, where ε is the dust-to-gas ratio set by Equation (A19). The dust's backreaction onto the gas is included in the simulations.

3.2. Simulation Setup and Parameter Space

We run the fargo3d planet–disk simulations on graphics processing units. Our computational domain is R ∈ [0.4, 3.0] R0 and ϕ ∈ [0, 2π], discretized with 512 × 512 logarithmically spaced cells. The radial and azimuthal boundaries are set to their initial equilibrium solutions, given in Equations (A10)–(A13). The planet mass is allowed to grow slowly over the first 100P0. We implement the wave-killing module (de Val-Borro et al. 2006) to damp the planet-generated waves to zero near the boundaries, in order to minimize reflections and their interference with the disk morphology.

We target planets from super-Earth to Saturn size, with masses MP ∈ [8, 120] M⊕ around a 1 M☉ central star. We model the disk for a range of values of the disk aspect ratio h0 ∈ [0.025, 0.1], viscosity parameter α0 ∈ [10⁻⁴, 10⁻²], flaring index F ∈ [0, 0.25], and power-law slope of the surface density profile σ0 ∈ [0.05, 1.2]. In each disk, we consider a single dust species of fixed size, characterized by the Stokes number St ∈ [10⁻³, 10⁻¹] and abundance ε0 ∈ [0.01, 0.1]. This range of St corresponds to particle radii from ∼1 mm to ∼10 cm, for a typical gas surface density of 100 g cm⁻² and an internal grain density of 1 g cm⁻³. More details of the sampling of the parameter space used to generate the data set are given in Section 4.2.

4. DPNNet-Bayesian

In this section, we introduce the Bayesian implementation of the DPNNet model.

4.1. Network Architecture

The DPNNet model, as presented in Paper I, was based on artificial neural networks, with a linear layer at the end for performing regression. In the Bayesian adaptation of DPNNet, the front end of the model is an MLP layer with 256 neurons and a ReLU activation function (this part of the network is the same as DPNNet-1.0). This is followed by a fully connected variational dense layer with 128 neurons, in which the BNN algorithms are implemented to probe the model uncertainty. We use the method of variational inference, as described in Section 2.1, to account for the stochasticity of the model parameters. Each model parameter is treated as a distribution with some variance, and stochastic sampling from this variational distribution gives an estimate of the model uncertainty. Furthermore, we use a trainable prior distribution for the parameter space, p(θ), where θ is drawn from a normal distribution. This is used to determine the approximate posterior distribution of the model weights using variational inference.
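A minimal TensorFlow Probability sketch of this architecture is given below. The 256-unit MLP front end and the 128-unit variational layer follow the text; the mean-field posterior, the trainable Gaussian prior, the KL weighting, and the final linear output layer follow the standard TensorFlow Probability recipe and are our assumptions about details not spelled out in the paper:

```python
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Mean-field Gaussian posterior over the layer's weights and biases
def posterior_mean_field(kernel_size, bias_size=0, dtype=None):
    n = kernel_size + bias_size
    c = np.log(np.expm1(1.0))
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(2 * n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t[..., :n],
                       scale=1e-5 + tf.nn.softplus(c + t[..., n:])),
            reinterpreted_batch_ndims=1)),
    ])

# Trainable normal prior p(theta), as described in the text
def prior_trainable(kernel_size, bias_size=0, dtype=None):
    n = kernel_size + bias_size
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t, scale=1.0), reinterpreted_batch_ndims=1)),
    ])

n_train = 878  # approx. 80% of the 1098 filtered simulations (our assumption)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),                     # 8 disk/gap features
    tf.keras.layers.Dense(256, activation="relu"),  # deterministic MLP front end
    tfp.layers.DenseVariational(128,
                                make_posterior_fn=posterior_mean_field,
                                make_prior_fn=prior_trainable,
                                kl_weight=1.0 / n_train,  # scales the KL term
                                activation="relu"),
    tf.keras.layers.Dense(1),                       # predicted planet mass
])
```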

4.2. Data Acquisition and Preprocessing

For the training and testing of the model, we produce the new data set following the same procedure as used in Paper I. We generate a uniform random distribution of the parameters, each centered within the sampling interval, using the Latin hypercube sampling method (McKay et al. 1979; Iman et al. 1981). We run a total of 1800 simulations with a range of input parameters, as described in Section 3.2. From each simulation, we obtain the dust and gas surface density maps of the PPD with the embedded planet. Figure 2 shows sample images of the dust surface density distributions and the corresponding radial profiles of the azimuthally averaged surface densities from some of the simulations. These improved disk–planet simulations (with fixed grain sizes and initial equilibrium disk setups) result in a much more physical model for dust–gas interaction compared to Paper I and Paper II.
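For illustration, such a design matrix can be drawn with SciPy's Latin hypercube sampler. The ranges follow Section 3.2; whether parameters such as α0 and St were sampled linearly or logarithmically is not stated, so the linear scaling below is an assumption:

```python
import numpy as np
from scipy.stats import qmc

# Parameter ranges from Section 3.2: (lower, upper)
bounds = {
    "Mp":     (8.0, 120.0),   # planet mass [Earth masses]
    "h0":     (0.025, 0.1),   # disk aspect ratio
    "alpha0": (1e-4, 1e-2),   # viscosity parameter
    "F":      (0.0, 0.25),    # flaring index
    "sigma0": (0.05, 1.2),    # surface density slope
    "St":     (1e-3, 1e-1),   # Stokes number
    "eps0":   (0.01, 0.1),    # dust-to-gas ratio
}

sampler = qmc.LatinHypercube(d=len(bounds), seed=1)
unit_samples = sampler.random(n=1800)              # one row per simulation
lower = np.array([lo for lo, _ in bounds.values()])
upper = np.array([hi for _, hi in bounds.values()])
design = qmc.scale(unit_samples, lower, upper)     # shape (1800, 7)
```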

Figure 2. The upper panel shows the normalized dust surface density distributions of disks with different initial conditions and for different planet masses at 3000P0. The disk parameters are indicated at the top right of each plot, together with the planet mass at the bottom right. The lower panel shows the radial profiles of the azimuthally averaged surface densities at the beginning of the simulation (Σd0; the dashed black lines) and at the end of the simulation after the gap is formed (Σd; the solid red lines). The horizontal arrow marks the dust gap width. The gap width is the distance between the inner and outer edge of the gap (the vertical dotted lines) where Σd(R) reaches 50% of the initial surface density Σd0. The crosses indicate the minimum surface density Σdmin.

Following Paper I and Kanagawa et al. (2016), we define the dust gap width as

$$w_d=\frac{R_{d,\mathrm{out}}-R_{d,\mathrm{in}}}{R_0} \qquad (11)$$

where Rd,in and Rd,out are the inner and outer edges of the gap, where the dust surface density Σd equals a predefined threshold fraction of Σd0. For consistency with Paper I, we adopt a threshold fraction of half the initial surface density. We acknowledge that this definition of wd has its limitations, as it fails for shallow gaps (from low-mass planets) whose depth never reaches the threshold fraction. However, this does not limit the model's application, as DPNNet-Bayesian can be retrained for any threshold fraction or alternative definition of the gap width.
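In practice, this measurement reduces to finding where the azimuthally averaged profile crosses the threshold. A sketch, assuming a single contiguous gap and skipping sub-cell interpolation:

```python
import numpy as np

def dust_gap_width(R, sigma_d, sigma_d0, R0=1.0, threshold=0.5):
    """Dust gap width following Equation (11).

    R        : radial grid
    sigma_d  : azimuthally averaged dust surface density after gap opening
    sigma_d0 : initial dust surface density profile
    Returns (R_out - R_in)/R0, or 0.0 if no cell drops below the threshold.
    """
    depleted = sigma_d / sigma_d0 < threshold
    if not depleted.any():
        return 0.0                       # gap too shallow for this threshold
    idx = np.where(depleted)[0]
    R_in, R_out = R[idx[0]], R[idx[-1]]  # inner and outer gap edges
    return (R_out - R_in) / R0
```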

In the current paper, we only consider simulations with either one dust gap or two dust gaps. We remove models with >2 dust gaps, as the number of samples is not adequate to effectively train the model. To build our feature space, we simply define two dust gap widths for each simulation, wd1 and wd2, but set wd2 = 0 if only a single dust gap is formed.

The initial data set consists of 1800 simulations. We preprocess the data by eliminating runs that do not open detectable axisymmetric gaps or that have more than two gaps. After this initial filtering, we have data from 1098 simulations. Each simulation is characterized by its input parameters (MP, α0, h0, σ0, F, St, and ε0) and the associated gap width(s) in dust (wd1, wd2) and gas (wg). In this paper, we do not use the gas gap width as an input parameter for training the model, since it is not directly measured from observations.

Once the gap widths have been measured, we assign each simulation a set of feature variables (wd1, wd2, ε0, α0, St, h0, σ0, and F) that serve as the inputs to our model, and we label the planet mass, MP, as the target variable. The data for each feature variable are then normalized using standard (Z-score) scaling, i.e., by subtracting the mean and dividing by the standard deviation.

4.3. Training DPNNet-Bayesian

DPNNet-Bayesian is implemented using Google TensorFlow Probability (Dillon et al. 2017), an open source platform that integrates probabilistic methods with deep neural networks. We randomly split the data into a training set consisting of 80% of the simulation data and a testing set consisting of the remaining 20%. Furthermore, for model validation, we use 20% of the training data set. We use the Adam optimizer with a learning rate of 0.001. The model is trained for ∼2200 epochs, and an early-stopping callback is used to prevent the model from overfitting.
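The preprocessing and training steps of Sections 4.2 and 4.3 can be sketched as follows, reusing the model from Section 4.1 (X and y are the feature matrix and planet masses; the mean-squared-error loss and the early-stopping patience are our illustrative choices, the KL term being added automatically by the variational layer):

```python
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 80/20 train/test split of the 1098 simulations
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

scaler = StandardScaler().fit(X_train)     # Z-score scaling (Section 4.2)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)        # same scaling at test time

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="mse")

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                              patience=50,
                                              restore_best_weights=True)
history = model.fit(X_train_s, y_train,
                    validation_split=0.2,  # 20% of training data for validation
                    epochs=2200,
                    callbacks=[early_stop])
```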

5. Results

Once the model has been trained, it is ready to be deployed to predict the planet mass from observed disk features. We test the network's performance on the test data set, which the model has not been exposed to. This enables us to quantify the network's accuracy in predicting the planet mass from unseen data. The test data set comprises all the feature variables along with the target variable (the true value of the planet mass). The feature variables (wd1, wd2, ε0, α0, St, h0, σ0, and F) are normalized with the same scaling that the network was trained on. The trained DPNNet-Bayesian network takes the feature variables as inputs and predicts the corresponding planet mass. Unlike the standard MLP model, for the same set of input feature variables, DPNNet-Bayesian gives the predicted output as a distribution.

Figure 3 shows the correlation between the simulated and predicted planet masses, in Earth-mass (M⊕) units, for all the test samples. For each set of input variables, we conduct the experiment (i.e., deploy the trained network) 200 times, to obtain the marginalization of the predicted output. Thus, we obtain a distribution for the predicted planet mass (as demonstrated in Figure 4) capturing the uncertainties associated with the parameters of the trained model. The error bars give the standard deviation associated with each predicted value. The predicted values lie along the black line, indicating a strong correlation with the true planet mass. This is reflected by the r² score of 0.82, computed over the average predictions.

Figure 3. Correlation between the simulated and predicted mass in units of M⊕. The vertical error bars give the standard deviation associated with each predicted value, obtained using the BNN. The r² score indicates the goodness of the fit.
Figure 4. The distributions of the predicted planet masses for sample disk features selected randomly from the test data set. Left panel: the distribution captures the EP associated with DPNNet due to lack of data and/or variations of the model parameters. Middle panel: the distribution represents the AL that arises due to errors/variations inherent in the data. Right panel: the distribution captures the combined uncertainty in the predicted planet mass due to both EP and AL. The red dashed line represents the mean planet mass in each of the plots.

5.1. Error Analysis Using BNN

As mentioned before, the Bayesian approach offers a way of quantifying and understanding the uncertainties associated with the predictions from deep neural networks. As an example, in Figure 4, we demonstrate the different uncertainty estimates associated with predictions from DPNNet-Bayesian for sample disk features selected randomly from the test data set. We pick a disk–planet simulation with parameters/features ε0 = 0.04, α0 = 0.008, St = 0.08, h0 = 0.03, σ0 = 1.11, and F = 0.21, and measured dust gap widths wd1 = 0.74 and wd2 = 0. Since these are simulated data, the true value of the planet mass, MP = 46 M⊕, is known. To capture the EP, we deploy the trained model N times for this fixed set of disk features. We set N = 200, with the index i fixed, in Equation (6), and for each iteration the model predicts a unique and independent value. This results in the distribution of the predicted planet mass displayed in the left panel of Figure 4. The mean (μEP = 47.6 M⊕), shown with the red dashed line, is taken as the final predicted value, while the standard deviation (σEP = 5.8 M⊕) represents the uncertainty associated with the trained model.

The measured disk features are also prone to statistical errors, due to the measurement techniques and the limitations of the observed data. While the parameters of simulated models are well constrained, that is not the case for observed disk data. In order to simulate the uncertainties associated with the measured disk parameters, we use a Monte Carlo analysis. Instead of using fixed true values for the disk parameters, we randomly sample them n times from normal distributions, with the mean set to the true value and the standard deviation equal to a 1% spread for each of the parameters. We set n = 1000, with the index j fixed, in Equation (6), such that for each set of sampled input parameters the model predicts a planet mass. The center panel of Figure 4 shows the resulting distribution, with σAL = 8.3 M⊕ capturing the uncertainty in the predicted planet mass due to the measurement noise of the input disk parameters.

The rightmost panel in Figure 4 captures the total uncertainty in the predicted planet mass due to both EP and AL. We randomly sample the disk parameters n = 1000 times from their normal distributions, and for each draw we deploy the model N = 200 times. This results in a distribution of the predicted planet mass with mean μ = 49.8 M⊕ and standard deviation σ = 8.1 M⊕.
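This combined experiment corresponds to the following sketch, reusing the model and scaler from above (the feature ordering is assumed):

```python
import numpy as np

# Section 5.1 example, in the assumed order (wd1, wd2, eps0, alpha0, St, h0, sigma0, F)
x_true = np.array([0.74, 0.0, 0.04, 0.008, 0.08, 0.03, 1.11, 0.21])

# n = 1000 noisy input draws with a 1% spread on each parameter
draws = np.random.normal(x_true, 0.01 * np.abs(x_true), size=(1000, x_true.size))
draws = scaler.transform(draws).astype("float32")  # same Z-score scaling as training

# N = 200 weight draws per input batch: each call resamples the weights
preds = np.concatenate([model(draws).numpy().ravel() for _ in range(200)])
print(preds.mean(), preds.std())  # cf. mu = 49.8, sigma = 8.1 Earth masses in the text
```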

6. Application to Observations

As a first test of our newly improved DPNNet-Bayesian model, we deploy it to estimate the planet masses from the observed dust gaps in the PPDs around HL Tau and AS 209. The results are summarized in Table 1 and discussed below.

Table 1. Inferred Planet Masses from the Gaps in HL Tau and AS 209

| Name   | M* (M☉) | Rgap (au) | wd1  | h0   | α0   | St    | ε0   | σ0  | Mp0 (MJ)    | Mp1 (MJ) | Mp2 (MJ) | Mp3 (MJ) | Mp4 (MJ) | Mp5 (MJ) | Mp6 (MJ) |
|--------|---------|-----------|------|------|------|-------|------|-----|-------------|----------|----------|----------|----------|----------|----------|
| HL Tau | 1.00    | 10        | 0.81 | 0.05 | 10⁻³ | 0.005 | 0.01 | 1.0 | 0.27 ± 0.02 | 1.40     | 0.20     | 0.35     | 0.20     |          |          |
| HL Tau | 1.00    | 30        | 0.23 | 0.07 | 10⁻³ | 0.005 | 0.01 | 1.0 | 0.14 ± 0.01 | 0.20     | 0.20     | 0.17     | 0.27     |          |          |
| HL Tau | 1.00    | 80        | 0.29 | 0.10 | 10⁻³ | 0.005 | 0.01 | 1.0 | 0.30 ± 0.02 | 0.50     | 0.20     | 0.26     | 0.55     |          |          |
| AS 209 | 0.83    | 9         | 0.42 | 0.04 | 10⁻⁴ | 0.016 | 0.01 | 1.0 | 0.06 ± 0.01 |          |          |          |          | 0.37     |          |
| AS 209 | 0.83    | 99        | 0.31 | 0.08 | 10⁻⁴ | 0.016 | 0.02 | 1.0 | 0.12 ± 0.01 |          |          |          |          | 0.18     | 0.21     |

Note. MP0 is the mass (in the Jupiter-mass unit MJ) predicted by the DPNNet-Bayesian model using the input features wd1, h0, α0, St, ε0, and σ0, as given in columns 4–9, and setting wd2 = 0. MP1 is the mass predicted with the empirical relation of Kanagawa et al. (2016). MP2, MP3, and MP4 are the masses inferred using customized simulations of HL Tau by Dong et al. (2015), Jin et al. (2016), and Dipierro et al. (2015), respectively. MP5 is the planet mass predicted for AS 209 by Zhang et al. (2018). MP6 is the planet mass predicted for AS 209 by Fedele et al. (2018), using specialized simulations.

6.1. HL Tau

HL Tau is a well-studied system, which shows distinct axisymmetric (possibly planet-induced) gaps in the dust emission (ALMA Partnership et al. 2015). Several studies involving hydrodynamic simulations (Dipierro et al. 2015; Dong et al. 2015; Jin et al. 2016) have suggested that each of the gaps is induced by a single planet. With the assumption that the observed gaps are all planet-induced, we use the properties of the observed gaps as input features for our BNN model to predict the planet masses. DPNNet-Bayesian predicts a distribution of the planet mass for each of the gaps. For the three identified gaps at R = 10 au, 30 au, and 80 au, we obtain the gap widths (wd1 = 0.81, 0.23, and 0.29) and adopt disk aspect ratios of h0 = 0.05, 0.07, and 0.10, respectively, from Kanagawa et al. (2015b, 2016). HL Tau is mostly laminar, with viscosity α0 = 10⁻³ (Dipierro et al. 2015) and a central star mass of M* = 1 M☉. We set a canonical value of the dust-to-gas ratio ε0 = 10⁻² (ALMA Partnership et al. 2015; Dong et al. 2015) and adopt a mean Stokes number St = 0.005 (Dipierro et al. 2015) for all three gaps. Since the uncertainty associated with the input parameters is not well constrained across the literature, we take a conservative approach and consider a 1%–5% variation in the gap widths only. The top panel of Figure 5 shows the distributions of the predicted masses in the three identified gaps for the given disk parameters and gap widths. The mean values along with the standard deviations of the predicted planet masses at 10 au, 30 au, and 80 au are 86.0 ± 5.5 M⊕, 43.8 ± 3.3 M⊕, and 92.2 ± 5.1 M⊕, respectively. The standard deviation captures the total uncertainty, which includes the uncertainty associated with the BNN architecture (σEP is estimated as 4.0 M⊕, 3.3 M⊕, and 5.0 M⊕, respectively, for the three gaps) along with the AL. For the assumed variation of 1%–5% in the gap widths, σAL is comparable to σEP. However, σAL can vary, depending on the noise in the input data. These estimates are mostly consistent with the planet masses inferred from direct numerical simulations (Dong et al. 2015; Jin et al. 2016; Dipierro et al. 2015) and from DPNNet-1.0 in Paper I.

Figure 5. Top panels: the distributions of the predicted planet masses at the three identified gaps in HL Tau, obtained using DPNNet-Bayesian. The mean values along with the standard deviations of the predicted planet masses at 10 au, 30 au, and 80 au are 86.0 ± 5.5 M⊕, 43.8 ± 3.3 M⊕, and 92.2 ± 5.1 M⊕, respectively. Bottom panel: the distributions of the predicted masses for the two identified gaps in AS 209 for the given disk parameters and gap widths. The mean predicted planet masses along with the standard deviations are 20.6 ± 2.2 M⊕ and 36.3 ± 2.5 M⊕ at 9 au and 99 au, respectively.

6.2. AS 209

AS 209 is another well-studied system, hosting multiple gaps at 9, 24, 35, 61, 90, 105, and 137 au (Huang et al. 2018b). For simplicity, we only consider the two most prominent gaps, at R = 9 and 99 au. We adopt the disk parameters and dust gap widths from Zhang et al. (2018) and use values for the dust abundance, particle size, and disk viscosity that are within our training parameter space. For example, we set α0 = 10⁻⁴, St = 156.9 × 10⁻⁴, and the surface density slope σ0 = 1 (Fedele et al. 2018) for both gaps. Next, for the innermost gap at R = 9 au, we assign values to the other parameters: wd1 = 0.42, h0 = 0.04, and ε0 = 0.012. For the outer gap at R = 99 au, we similarly select wd1 = 0.31, h0 = 0.08, and ε0 = 0.017. Once again, we consider a 1%–5% variation in the gap widths as a conservative approximation for capturing the uncertainty associated with their measurement. The bottom panel of Figure 5 shows the distributions of the predicted masses for the two identified gaps in AS 209, for the given disk parameters and gap widths. The mean predicted planet masses along with the standard deviations are 20.6 ± 2.2 M⊕ and 36.3 ± 2.5 M⊕ at 9 au and 99 au, respectively. The standard deviation captures the total uncertainty, which includes the EP (σEP is 2.0 M⊕ and 2.1 M⊕ for the two gaps, respectively) along with the AL. σAL is comparable to σEP for the assumed variation of 1%–5% in the gap widths. However, σAL can vary, depending on the noise in the input data. These estimates, particularly at R = 9 au, are lower than those of Zhang et al. (2018). The discrepancy is partially due to differences in the ways that the gap widths are measured, the uncertainty in the adopted disk parameters, and the simulation setups.

7. Discussion and Conclusion

In this paper, we have introduced DPNNet-Bayesian, a BNN-based model that characterizes the masses of unseen exoplanets from observed PPDs. Compared to its predecessor, DPNNet-1.0, the Bayesian approach has the following advantages. First, it quantifies the errors associated with the predictions of the network and traces the source of the uncertainty. Second, it gives a framework to understand various regularization techniques that are used in traditional deep-learning approaches (Jospin et al. 2020). Finally, compared to traditional deep-learning models, where the weights are initialized by some implicit prior, the Bayesian framework accounts for this prior distribution explicitly.

For a BNN, one trains a distribution/ensemble of deep-learning models and then, based on the Bayesian formalism, performs model averaging/marginalization of the output. Note that the BNN approach is fundamentally different from "deep ensembles," where different initializations are used to train a set of networks whose outputs are then combined/optimized. In the BNN approach, a weighted average is computed using the posteriors of the network parameters as weights. This is different from deep ensembles, where the averaging is performed directly on the outputs of the networks.

Furthermore, the error/uncertainty associated with the output of a BNN model is different from standard metrics, like the rms error (RMSE) and/or the mean absolute error (MAE), that are used in traditional deep-learning models. Both the MAE and the RMSE are measures of the error (|MP,simulation − MP,predicted|) when the model is deployed on a validation/testing data set for which the "true" target values are known. However, they do not necessarily quantify the errors accurately (or provide a confidence interval) when applied to an unknown data set for which the target values are not known. This can be problematic, as there is no way of knowing how accurate a prediction is, so one has to trust the validation/testing error as an indication of the accuracy of the prediction. Since a BNN model works on the principle of marginalization, rather than optimization, the output is obtained by sampling from a posterior distribution, which provides the necessary confidence interval.

The performance of any ML-based model depends on the quality and/or quantity of the training data set. The Bayesian approach helps us to understand the correlation between the quality/quantity of the data set and the model performance by monitoring the change in model uncertainty. The knowledge of the EP and the AL enables the model to learn from a smaller data set, by regulating the overfitting or underfitting. Thus, the use of BNN architecture plays a pivotal role in improving the training process, as well as determining the optimal deep-learning architecture.

We conclude by highlighting the key results:

  • 1.  
    DPNNet-Bayesian estimates the planet mass from the gap widths observed in the dust emission. It takes as inputs the dust properties (abundance, Stokes number) and gas disk properties (aspect ratio, surface density profile, viscosity), along with the dust gap widths from observations, and predicts the mass of the embedded planet.
  • 2.  
    For the training of DPNNet-Bayesian, we generated a data set from disk–planet simulations using the fargo3d hydrodynamics code, with a newly implemented fixed grain size module and improved initial conditions. The trained model was tested on the simulated data set in order to illustrate its applicability. The predicted masses correlate closely with the "true" values of the simulated planet masses.
  • 3.  
    Compared to DPNNet-1.0 (Paper I), the DPNNet-Bayesian architecture additionally estimates the error associated with the prediction of the model. It further distinguishes between the uncertainty pertaining to the deep-learning architecture (EP) and the errors associated with the input variables due to measurement noise (AL).
  • 4.  
    We deploy DPNNet-Bayesian on the dust gaps observed in the PPDs around HL Tau and AS 209. Our network predicts masses of 86.0 ± 5.5 M⊕, 43.8 ± 3.3 M⊕, and 92.2 ± 5.1 M⊕ for the gaps at 10 au, 30 au, and 80 au, respectively, in HL Tau. Similarly, for the AS 209 disk, we find planet masses of 20.6 ± 2.2 M⊕ and 36.3 ± 2.5 M⊕ at the 9 au and 99 au gaps, respectively. These estimates are in close agreement with the results from other studies based on specialized simulations (Dong et al. 2015; Zhang et al. 2018).

We thank the anonymous referee for the constructive comments. S.A. and J.B.S. acknowledge support from NASA under Emerging Worlds through grant 80NSSC20K0702. D.C. acknowledges support from NASA under Emerging Worlds through grant 80NSSC21K0037. M.K.L. is supported by the Ministry of Science and Technology of Taiwan (grants 107-2112-M-001-043-MY3, 110-2112-M-001-034-, and 110-2124-M-002-012-) and an Academia Sinica Career Development Award (AS-CDA-110-M06). Numerical simulations were performed on the TIARA cluster at ASIAA, as well as the TWCC cluster at the National Center for High-performance Computing in Taiwan.

Appendix: Steady-state Drift Solution for Fixed Particle Size

The current public version of the fargo3d code has a multifluid setup, with pressureless dust species interacting with the gas by means of a drag force parameterized by a constant Stokes number. We update this version of the code to incorporate a steady background solution for the radial drift of dust of fixed particle size, instead of a fixed Stokes number. Following the generalized steady-state solutions of Benítez-Llambay et al. (2019) for a vertically integrated disk with an isothermal equation of state, with pressure P = cs² Σ, we rewrite the exact background solutions for the gas and dust (i.e., in the limit of zero coupling) as

Equation (A1)

Equation (A2)

respectively, where vK = R ΩK(R) φ̂ is the Keplerian velocity, φ̂ is the unit vector in the azimuthal direction, and

Equation (A3)

When the dust and gas are coupled, we write their velocities as vd,g → vd,g + δvd,g. Considering the steady-state axisymmetric Navier–Stokes equations for the perturbed velocities (see Equations (63)–(66) in Benítez-Llambay et al. 2019), the gas and dust velocity perturbations in the radial direction are given as

Equation (A4)

Equation (A5)

and the azimuthal counterparts are

Equation (A6)

Equation (A7)

where

Equation (A8)

Equation (A9)

Expanding Equations (A4) to (A7) using Equations (A8) and (A9), we get

Equation (A10)

Equation (A11)

Equation (A12)

Equation (A13)

The above expressions, Equations (A10)–(A13), are valid steady-state drift solutions only if they satisfy the following continuity equations:

$$\partial_R\!\left(R\,\Sigma_g\,\delta v_{gR}\right)=\partial_R\!\left(R\,\Sigma_d\,\delta v_{dR}\right)=0 \qquad (\mathrm{A}14)$$

The particle size is parameterized by the Stokes number, St = ts ΩK, where the stopping time characterizing the strength of the dust–gas coupling is given as

$$t_s=\frac{\pi}{2}\,\frac{\rho_p\,s}{\Sigma_g\,\Omega_K} \qquad (\mathrm{A}15)$$

Here, ρp is the density of the dust particles, s is the radius of the dust particles, and Σg is the gas surface density. For a fixed particle size, s = constant, while the Stokes number St ∝ Σg^{−1} varies with the disk radius, since Σg ∝ R^{−σ0}, unlike the default setup, where St is constant.

To maintain a constant mass flux, i.e., ∂R(R Σg δvgR) = ∂R(R Σd δvdR) = 0, we set

Equation (A16)

Here, we only consider the gas component, but the same result can be derived for the dust as well. Furthermore, since R Σg ∝ (R/R0)^{1−σ0}, δvgR must scale as

$$\delta v_{gR}=\delta v_{gR0}\left(\frac{R}{R_0}\right)^{\sigma_0-1} \qquad (\mathrm{A}17)$$

such that the net mass flux is constant. Thus, from Equations (A10) and (A17) we can establish

$$\frac{a\,\epsilon}{\epsilon^{2}+b\,\epsilon+c}=\delta v_{gR0}\left(\frac{R}{R_0}\right)^{\sigma_0-1} \qquad (\mathrm{A}18)$$

where a = −2β(β − 1) vK St ε, b = β(2γ + 1), c = 2β²(γ + 1) + 2β² St²(γ + 1), and δvgR0 = a0 ε0/(ε0² + b0 ε0 + c0), with the subscript "0" referring to the values at R = R0. Rearranging Equation (A18), we solve for the dust-to-gas ratio ε and arrive at

$$\epsilon=\frac{1}{2}\left[\left(\frac{a}{d_0}-b\right)-\sqrt{\left(\frac{a}{d_0}-b\right)^{2}-4c}\,\right] \qquad (\mathrm{A}19)$$

where d0 = [a0 ε0/(ε0² + b0 ε0 + c0)] (R/R0)^{σ0−1} and the negative root gives the physical solution. Thus, in order to satisfy the condition of constant mass flux for a fixed particle size, we adjust the dust-to-gas ratio ε accordingly.

Furthermore, the viscous velocity of the gas, v^{2D}_{vis}, in the absence of dust grains is given as (Lynden-Bell & Pringle 1974)

$$v^{\mathrm{2D}}_{\mathrm{vis}}=-\frac{3}{\Sigma_g\sqrt{R}}\,\partial_R\!\left(\nu\,\Sigma_g\sqrt{R}\right) \qquad (\mathrm{A}20)$$

where ν = α cs H is the kinematic viscosity. For the steady-state background solution, i.e., v^{2D}_{vis} = 0, we set ν Σg R^{1/2} = constant in Equation (A20), which is equivalent to

$$\alpha=\alpha_0\left(\frac{R}{R_0}\right)^{\sigma_0-2F-1} \qquad (\mathrm{A}21)$$

where the subscript "0" refers to the values at R = R0.
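As a quick numerical check (with illustrative parameter values), this α profile indeed keeps the viscous mass flux ν Σg R^{1/2} constant, given the disk profiles of Equations (9) and (10):

```python
import numpy as np

R0, alpha0, h0, sigma0, F = 1.0, 1e-3, 0.05, 1.0, 0.25   # illustrative values
R = np.linspace(0.4, 3.0, 512) * R0

omega = R**-1.5                            # Keplerian frequency (G = M* = 1)
h = h0 * (R / R0)**F                       # flared aspect ratio, Eq. (10)
cs = h * R * omega                         # locally isothermal sound speed
sigma_g = (R / R0)**-sigma0                # Eq. (9), arbitrary normalization
alpha = alpha0 * (R / R0)**(sigma0 - 2 * F - 1)   # Eq. (A21)

nu = alpha * cs**2 / omega                 # Shakura-Sunyaev viscosity
flux = nu * sigma_g * np.sqrt(R)
assert np.allclose(flux, flux[0])          # constant, so zero viscous drift
```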

We initialize the disk–planet fargo3d simulation with the updated dust-to-gas ratio in Equation (A19) and fix the particle size to a constant value using the fixed particle size module. Figure 6 shows the azimuthally averaged initial dust and gas radial surface density profiles and the final profiles after the system has evolved for 1000P0. As demonstrated, the steady-state profile is maintained with little evolution, in the absence of an embedded planet.

Figure 6. The radial profiles of the azimuthally averaged gas and dust surface densities at the beginning of the simulation and after 1000P0. The steady-state profile is maintained with little evolution, in the absence of an embedded planet.