Predicting Exoplanet Masses and Radii: A Nonparametric Approach

Bo Ning; Angie Wolfgang; Sujit Ghosh

doi:10.3847/1538-4357/aaeb31

1. Introduction

An exoplanet's mass and radius can constrain its bulk composition when models of planetary internal structures are fit to these observed properties (e.g., Fortney et al. 2007a; Seager et al. 2007; Valencia et al. 2007; Rogers et al. 2011; Lopez & Fortney 2014). Measuring many planets' masses and radii can therefore illuminate (1) the range of exoplanet compositions produced by planet formation and evolution processes; and (2) how these compositions trend with planet and stellar properties of interest, such as planet mass, orbital period, host star mass, and host star metallicity. The existence and nature of these trends, along with the amount of astrophysical scatter around them, provides observational constraints on planet formation theory (e.g., Ida & Lin 2010; Dawson et al. 2015; Lee & Chiang 2015; Lopez & Rice 2016; Owen & Wu 2017).

The population distributions of these mass and radius measurements therefore contain valuable information about the range of planetary compositions. To obtain the population distributions is particularly necessary for the thousands of super-Earth- and sub-Neptune-sized planets discovered by the Kepler mission (Coughlin et al. 2016), as this population is not represented in our own Solar System.

Initially population studies of small, low-mass planets focused on the marginal mass (Howard et al. 2010; Mayor et al. 2011) or marginal radius (Youdin 2011; Howard et al. 2012; Dressing & Charbonneau 2013; Fressin et al. 2013; Petigura et al. 2013) distributions; this was driven in part by the difficulty of obtaining a sample of planets that both transited their host stars and were suitable for precision radial velocity (RV) follow-up. This changed once the Kepler follow-up program published mass estimates for their chosen subset of Kepler candidates (Marcy et al. 2014). Recent analyses of Kepler planets' transit timing variations have also produced robust mass determinations for dozens of multiple-planet systems (Jontof-Hutter et al. 2016; MacDonald et al. 2016; Mills et al. 2016; Hadden & Lithwick 2017). However, mass measurements for the majority of individual Kepler planet candidates are still unavailable. As a result, the mass–radius (M–R) relations estimated based on the existing mass and radius measurements provides a useful tool for astronomers to predict masses based on observed radii for those planets that only have radius measurements.

1.1. Previous Work on Mass–Radius Relations

There have been several M–R relations proposed in the exoplanet literature; all of them assume that exoplanet masses and radii follow a power law:

1.
Lissauer et al. (2011) fit a relation to Earth and Saturn and obtained M = R^2.06, where M and R are in Earth units.
2.
Wu & Lithwick (2013) calculated the masses of 22 planet pairs based on the amplitudes of their sinusoidal transit timing variations (TTVs) and found that M = 3R.
3.
Weiss & Marcy (2014) used the 42 planets chosen by the Kepler team to be followed up with radial velocity measurements (Marcy et al. 2014) and found the power law is M = 2.69R^0.93 for planets with 1.5 < R < 4 R_⊕.
4.
Hadden & Lithwick (2014) revisited the M–R relation for 56 low-eccentricity TTV planets and fit M = 14.9 (R/3R_⊕)^0.65 to them.
5.
Wolfgang et al. (2016; hereafter denoted as WRF16) selected planets with radii <8 R_⊕ from the NASA Exoplanet Archive (up to the date 2015 January 30) and found two different power-law relations: M = 2.7R^1.3 for <4 R_⊕ planets and M = 1.6R^1.8 for <8 R_⊕ planets. Unlike the earlier results, which used basic least squares regression, WRF16 built a hierarchical Bayesian model that incorporated observational measurement uncertainty, the intrinsic, astrophysical scatter of planet masses, and non-detections and upper limits.
6.
Mills & Mazeh (2017) fit power laws to various subsets of exoplanet masses and radii, testing for differences between RV and TTV M–R relations in two different period bins.
7.
Chen & Kipping (2017) built on the hierarchical Bayesian model from WRF16 to fit a continuous broken power-law model across a much larger range of masses and radii. They found that a four-segment M–R relation best described their data, with Terran, Neptunian, Jovian, and Stellar regions. They fit a different power law and intrinsic scatter to each segment.

1.2. Moving to a Nonparametric Approach

The power-law model is simple to fit to exoplanet masses and radii, and its parameters are easy to interpret. However, there are several reasons to move beyond it. First, we already know that a single power-law relation will not be able to describe the M–R relation for the entire population, as degeneracy pressure causes the radius–mass relation to flatten out around a Jupiter mass. Chen & Kipping (2017) addressed this by assuming a broken power law with multiple segments. As soon as one expands the space of models beyond those with a handful of parameters, it is useful to consider more flexible nonparametric models as well.⁶ Indeed, there is no particular reason to believe that exoplanet masses and radii should follow power laws as a population, because the processes that dominate planet formation for small rocky planets are different from those that dominate the formation of gas giants. Furthermore, we do not expect the distribution of masses at a given radius to be Gaussian, as was assumed by WRF16 and Chen & Kipping (2017), or even symmetric. There is already evidence from the hierarchical modeling checking performed by WRF16 that a Gaussian distribution does not completely reproduce the observed exoplanet masses and radii.

Moving forward, we adopt a nonparametric approach that allows us to relax these assumptions. There are many benefits of using nonparametric models, including

1.
They can take on variety of shapes to fit the data, which can be advantageous for making predictions that are less model-dependent. Therefore, there is no need to impose abrupt breaks for modeling M–R relations across a wide range of sizes.
2.
They can model the joint distribution of mass and radius, and thus provide a self-consistent way to predict both mass from radius and radius from mass over the entire exoplanet mass and radius range.
3.
Eventually we want to understand masses and radii as a function of many other star and planet properties like stellar mass, metallicity, orbital period, planet multiplicity, and even eccentricity. We do not have clear guidance from theory about the functional form of the dependence in these additional dimensions, and so we should not expect the mass–radius-X distribution to follow a power law.

2. The Model

There are at least two ways to model the exoplanet mass–radius relation. The first is to approach it as a regression problem. When one performs regression, such as using ordinary least squares (OLS) to fit a line to data, the goal is to illuminate the relationship between an independent variable and another quantity that depends on it—the dependent variable. For exoplanets, we are indeed concerned with how mass and radius are related, as it illuminates how planet compositions change as a function of the planet's size or mass. However, it is not so clear which is the independent variable and which is the dependent variable. From a theoretical point of view, mass is the more fundamental property and so should be the independent variable.⁷ On the other hand, the small fraction of Kepler planets that only have mass measurements creates a practical need to convert Kepler radii to masses (or in other words, to predict masses from radii), which requires the radius to be the independent variable.

Moreover, there are non-negligible measurement uncertainties associated with both mass and radius. OLS regression, which ignores the uncertainties on the independent variable, will consistently underestimate the slope of the regression line when there are uncertainties in both variables (see Isobe et al. 1990). This occurs because OLS only minimizes the distance between the points and the line in the vertical direction, which over-corrects for the additional vertical distance introduced by uncertainties in the dependent variable (for lines with slope $\ne 0$ , the horizontal scatter produced by uncertainties in the dependent variable will also produce some additional vertical scatter around the line). As a result, the OLS line which fits mass to radius will not be the inverse of the line which fits radius to mass. This is a problem if our goal is to estimate the underlying fundamental relationship between radius and mass.

As our goal in this paper is to provide mass predictions from measured radii as well as radius predictions from mass, we decide to approach the problem differently. WRF16 already showed that a simple regression line (i.e., a one-to-one function that deterministically maps radius to mass) is an insufficient description of the existing data: there exists an intrinsic, astrophysical scatter in the mass–radius relation. Therefore, we also want this approach to allow for a distribution of masses at any one radius. We can express this distribution as $g(m| r)$ , the conditional probability distribution of mass given radius (i.e., a function describing the probability of a planet having a certain mass when its radius takes a specific value).⁸ Through the definition of conditional probability, every conditional probability distribution can be expressed as a ratio of the joint probability distribution g(m, r) to the marginal probability distribution of the variable on which the conditioning occurs (g(r)):

$\begin{eqnarray}&&g(m| r)=\displaystyle \frac{g(m,r)}{g(r)}=\displaystyle \frac{g(m,r)}{\int g(m,r){dm}}.\end{eqnarray} \tag{ 1 }$

Likewise,

$\begin{eqnarray}&&g(r| m)=\displaystyle \frac{g(m,r)}{g(m)}=\displaystyle \frac{g(m,r)}{\int g(m,r){dr}}.\end{eqnarray} \tag{ 2 }$

Therefore, to get both conditional distributions we need only to model the joint distribution: the probability of a planet having a certain mass and a certain radius. It is this joint distribution that we define in Section 2.1 and fit to our data in Section 2.2. For the rest of the paper, we will use g(m, r) to discuss the general properties of joint mass, radius distributions. We will use f(m, r) to refer to our specific formulation of the joint mass, radius distribution as given by Equation (7).

2.1. The Bernstein Polynomials Model

Our nonparametric model for the joint probability distribution of mass and radius is a bivariate distribution that consists of a tensor product of sequence of location-scaled and transformed beta densities that can also be viewed as Bernstein polynomials. Each marginal distribution can be expressed as a linear combination of Bernstein polynomials (BPs), leading to a mixture of beta densities. When normalized, BPs have the same functional form as beta distributions (see Equation (3)). For a visual representation of Bernstein polynomials, see Figure 9, where we plot BPs as a function of their degree (denoted by k or l in Equation (7)).

The properties of BPs and a discussion of their advantages over other choices for basis functions are described in Appendix A. In addition, we note that we do not use Gaussian process regression, which is a popular nonparametric prediction method in exoplanet science, for a number of reasons. First, a joint density estimation approach is more appropriate for our purposes than a regression approach, as argued previously. Second, the marginal distribution of the M–R is not in general normally distributed, as we demonstrate in Section 5. Third, fitting a mixture of Gaussians to describe the joint density would involve three free parameters per component (mixture weight, mean, and variance), rather than the one free parameter per component that we use here (the mixture weight). Fourth, a mixture of Gaussians produces an identifiability problem, wherein different components can be interchangeable, that is not present in the Bernstein polynomial model due to the ordered nature of the different terms (see Equation (7) and the figures in Appendix A for how the Bernstein polynomial terms; i.e., the different β_k for a given d are distinguishable from each other).

Before introducing our model, we first define some mathematical notations that will be used throughout the paper. Let ${M}_{i}^{\mathrm{obs}}$ and ${R}_{i}^{\mathrm{obs}}$ be the reported mass and radius measurements for ith planet in our data set, and ${\sigma }_{{M}_{i}}^{\mathrm{obs}}$ and ${\sigma }_{{R}_{i}}^{\mathrm{obs}}$ be the reported standard deviations for their measurement errors. We denote M_i and R_i as the true mass and radius for ith planet that would have been observed if there were no measurement errors. We denote their joint distribution as f(m, r), and the M–R relation as the conditional distribution $f(m| r)$ . We let ${{\boldsymbol{M}}}^{\mathrm{obs}}$ stand for a vector containing the observations $({M}_{1}^{\mathrm{obs}},\,...,\,{M}_{n}^{\mathrm{obs}});$ similarly, ${{\boldsymbol{R}}}^{\mathrm{obs}}$ , ${{\boldsymbol{\sigma }}}_{M}^{\mathrm{obs}}$ , and ${{\boldsymbol{\sigma }}}_{R}^{\mathrm{obs}}$ stand for the set of their respective observations. We let ${ \mathcal N }(\mu ,\sigma )$ stand for a normal distribution with mean μ and standard deviation σ, and denote the probability density of a Beta distribution with shape parameters i and $d-i+1$ by

$\begin{eqnarray}&&{\beta }_{\mathrm{id}}(a)=d\displaystyle \left(\genfrac{}{}{0em}{}{d-1}{i-1}\right){a}^{i-1}{(1-a)}^{d-i},\end{eqnarray} \tag{ 3 }$

where 0 ≤ a ≤ 1, and d > 0 is an integer that relates to the degree of a Bernstein polynomial.

Our hierarchical model can be written as follows:

$\begin{eqnarray}&&{M}_{i}^{\mathrm{obs}}\mathop{\sim }\limits^{\mathrm{ind}}{ \mathcal N }({M}_{i},{\sigma }_{{M}_{i}}^{\mathrm{obs}}),\end{eqnarray} \tag{ 4 }$

$\begin{eqnarray}&&{R}_{i}^{\mathrm{obs}}\mathop{\sim }\limits^{\mathrm{ind}}{ \mathcal N }({R}_{i},{\sigma }_{{R}_{i}}^{\mathrm{obs}}),\end{eqnarray} \tag{ 5 }$

$\begin{eqnarray}&&({M}_{i},{R}_{i})\mathop{\sim }\limits^{\mathrm{iid}}f(m,r| {\boldsymbol{w}},d,d^{\prime} ),\end{eqnarray} \tag{ 6 }$

where the symbol $\mathop{\sim }\limits^{\mathrm{ind}}$ stands for "independently distributed as," and the symbol $\mathop{\sim }\limits^{\mathrm{iid}}$ stands for "identically and independently distributed as." Let ${\boldsymbol{w}}=({w}_{11},\,...,\,{w}_{{dd}^{\prime} }$ ) be a set of weights that describes how much each corresponding term in the following series contributes to the overall joint distribution f(m, r):

$\begin{eqnarray}&&f(m,r| {\boldsymbol{w}},d,d^{\prime} )=\displaystyle \sum _{k=1}^{d}\displaystyle \sum _{l=1}^{d^{\prime} }{w}_{\mathrm{kl}}\displaystyle \frac{{\beta }_{{kd}}\left(\tfrac{m-\underline{M}}{\overline{M}-\underline{M}}\right)}{\overline{M}-\underline{M}}\displaystyle \frac{{\beta }_{{ld}^{\prime} }\left(\tfrac{r-\underline{R}}{\overline{R}-\underline{R}}\right)}{\overline{R}-\underline{R}}.\end{eqnarray} \tag{ 7 }$

To ensure that Equation (7) remains a probability distribution (i.e., that it integrates to 1), we impose the constraint ${\sum }_{k=1}^{d}{\sum }_{l=1}^{d^{\prime} }{w}_{{kl}}=1$ and that w_kl ≥ 0 for all values of k and l. The notations $\underline{M}$ , $\overline{M}$ , $\underline{R}$ , and $\overline{R}$ are used to denote the lower and upper bounds, respectively, for mass and radius. The values of the upper and lower bounds can be determined by a set of observed values of masses and radii; for example, one could choose the lower mass bound to be the minimum ( ${M}_{i}^{\mathrm{obs}}-{\sigma }_{{M}_{i}}^{\mathrm{obs}}/n$ ) in the data set and the upper mass bound to be the maximum ( ${M}_{i}^{\mathrm{obs}}+{\sigma }_{{M}_{i}}^{\mathrm{obs}}/n$ ). The bounds could also be set as the minimum and maximum mass and radius theoretically expected for an exoplanet. For regions with no observations, the BP model reverts to the overall mean of the whole function; this happens when the values of $\underline{M}$ , $\overline{M}$ , $\underline{R}$ , and $\overline{R}$ are chosen to be far away from the nearest observations (see the discussion in Appendix B.1).

Note that Equations (4)–(6) form a multi-level model, in that the measurement process is modeled as a separate random process from the underlying true distribution of exoplanet masses and radii. It therefore has a similar hierarchical structure as the hierarchical Bayesian models defined in WRF16 and Chen & Kipping (2017).

2.2. Model Inference

Although the model may look complex at first glance, the inference is relatively straightforward. There are only two types of unknown parameters in the model: the weights ${\boldsymbol{w}}$ and the degrees d and d'. Once d and d' are determined using the method that we will introduce in the later part of this section. The two parameters i and $d-i+1$ in each beta distribution (see Equation (3)) are also determined. This is advantageous compared to other mixture models; for example, a mixture of Gaussians would require that each mean and standard deviation of component distribution be estimated by the data. Moreover, it avoids the common identifiability issues created by label switching within the mixture of Gaussians. Another advantage of our mixture of beta distributions is that parameter space (which is a simplex) is a bounded convex set, which helps to guarantee the existence of the optimally estimated parameters. Here we take a maximum likelihood approach to estimate the parameters by explicitly deriving the likelihood using one-dimensional numerical integration.

The likelihood function of the model (4)–(6) can be written as

$\begin{eqnarray}&&\begin{array}{l}L({\boldsymbol{w}},d,d^{\prime} | {{\boldsymbol{M}}}^{\mathrm{obs}},{{\boldsymbol{R}}}^{\mathrm{obs}},{{\boldsymbol{\sigma }}}_{M}^{\mathrm{obs}},{{\boldsymbol{\sigma }}}_{R}^{\mathrm{obs}})\\ =\ {\int }_{\underline{M}}^{\overline{M}}{\int }_{\underline{R}}^{\overline{R}}f({{\boldsymbol{M}}}^{\mathrm{obs}},{{\boldsymbol{R}}}^{\mathrm{obs}},m,r| {\boldsymbol{w}},d,d^{\prime} ,{{\boldsymbol{\sigma }}}_{M}^{\mathrm{obs}},{{\boldsymbol{\sigma }}}_{R}^{\mathrm{obs}})\ {drdm}\\ =\ \displaystyle \prod _{i=1}^{n}{\int }_{\underline{M}}^{\overline{M}}{\int }_{\underline{R}}^{\overline{R}}f({M}_{i}^{\mathrm{obs}}| m,{\sigma }_{{M}_{i}}^{\mathrm{obs}})f({R}_{i}^{\mathrm{obs}}| r,{\sigma }_{{R}_{i}}^{\mathrm{obs}})\\ \quad \times \ f(m,r| {\boldsymbol{w}},d,d^{\prime} )\ {drdm}\\ =\ \displaystyle \prod _{i=1}^{n}\displaystyle \sum _{k=1}^{d}\displaystyle \sum _{l=1}^{d^{\prime} }{w}_{{kl}}{\int }_{\underline{M}}^{\overline{M}}\displaystyle \frac{1}{{\sigma }_{{M}_{i}}^{\mathrm{obs}}}{\varphi }\left(\displaystyle \frac{{M}_{i}^{\mathrm{obs}}-m}{{\sigma }_{{M}_{i}}^{\mathrm{obs}}}\right)\displaystyle \frac{{\beta }_{{kd}}\left(\tfrac{m-\underline{M}}{\overline{M}-\underline{M}}\right)}{\overline{M}-\underline{M}}\ {dm}\\ \quad \times \ {\int }_{\underline{R}}^{\overline{R}}\displaystyle \frac{1}{{\sigma }_{{R}_{i}}^{\mathrm{obs}}}{\varphi }\left(\displaystyle \frac{{R}_{i}^{\mathrm{obs}}-r}{{\sigma }_{{R}_{i}}^{\mathrm{obs}}}\right)\displaystyle \frac{{\beta }_{{ld}^{\prime} }(\tfrac{r-\underline{R}}{\overline{R}-\underline{R}})}{\overline{R}-\underline{R}}\ {dr},\end{array}\end{eqnarray} \tag{ 8 }$

where ${\varphi }$ (·) is the standard normal distribution, the two integral terms are essentially two constants and can be calculated numerically by using integrate function in ${\mathsf{R}}$ or by any other Gaussian quadrature method available for one-dimensional numerical integration. Therefore, we denote

$\begin{eqnarray*}\begin{array}{rcl}{c}_{{kl},i} & \equiv & {\int }_{\underline{M}}^{\overline{M}}\displaystyle \frac{1}{{\sigma }_{{M}_{i}}^{\mathrm{obs}}}{\varphi }\left(\displaystyle \frac{{M}_{i}^{\mathrm{obs}}-m}{{\sigma }_{{M}_{i}}^{\mathrm{obs}}}\right)\displaystyle \frac{{\beta }_{{kd}}\left(\tfrac{m-\underline{M}}{\overline{M}-\underline{M}}\right)}{\overline{M}-\underline{M}}\ {dm}\\ & & \times \ {\int }_{\underline{R}}^{\overline{R}}\displaystyle \frac{1}{{\sigma }_{{R}_{i}}^{\mathrm{obs}}}{\varphi }\left(\displaystyle \frac{{R}_{i}^{\mathrm{obs}}-r}{{\sigma }_{{R}_{i}}^{\mathrm{obs}}}\right)\displaystyle \frac{{\beta }_{{ld}^{\prime} }\left(\tfrac{r-\underline{R}}{\overline{R}-\underline{R}}\right)}{\overline{R}-\underline{R}}\ {dr},\end{array}\end{eqnarray*}$

and we use the symbol ${{\boldsymbol{c}}}_{i}=({c}_{11,i},\,...,{c}_{{dd}^{\prime} ,i})$ to simplify the expression.

Then, for given values d and d', we find an estimator of ${\boldsymbol{w}}$ that maximizes the log-likelihood,

$\begin{eqnarray}&&\begin{array}{l}\mathrm{maximize}:\quad \mathrm{log}L=\displaystyle \sum _{i=1}^{n}\mathrm{log}({{\boldsymbol{c}}}_{i}^{T}{\boldsymbol{w}}),\\ \mathrm{subject}\ \mathrm{to}:\quad \displaystyle \sum _{k=1}^{d}\displaystyle \sum _{l=1}^{d^{\prime} }{w}_{{kl}}=1,{w}_{{kl}}\geqslant 0,\\ \qquad \qquad \mathrm{for}\ \mathrm{all}\ k=1,\,...,d,l=1,\,...,d^{\prime} .\end{array}\end{eqnarray} \tag{ 9 }$

This is a convex optimization problem that can be readily solved using any standard numerical optimization methods. For our applications, we have used the ${\mathsf{R}}$ package Rsolnp to solve this optimization problem.

Now we discuss how we choose d and d', which act as tuning parameters relative to the smoothness of the underlying density function. We first choose a set of candidate values for d and d', such as 2, 3, ..., $n/\mathrm{log}(n)$ , where n is the sample size (i.e., number of exoplanets). We use 10-fold cross validation to choose the optimal value of degrees d and d' from those candidate values (see more detailed discussions on choosing the number of folds in the cross-validation method in Chapter 7.10 of Hastie et al. 2001). To conduct the 10-fold cross validation, we separate the data set randomly into 10 disjoint subsets with equal sizes. Then we leave out the sth subset, denoted by D_s ( $s=1,\,...,\,10$ ), and use the remaining nine subsets of data to estimate the parameters. Doing this for each D_s in turn results in 10 estimated sets of weights ${\hat{{\boldsymbol{w}}}}^{(s)}$ based on observations $i\notin {D}_{s}$ . We plug in each ${\hat{{\boldsymbol{w}}}}^{(s)}$ along with the corresponding data that was used to estimate each set of weights to obtain an estimated value for the log-likelihood. Mathematically, we are calculating the quantity

$\begin{eqnarray*}&&\mathrm{log}{L}^{\mathrm{pred}}=\displaystyle \sum _{s=1}^{10}\displaystyle \sum _{i\notin {D}_{s}}\mathrm{log}({{\boldsymbol{c}}}_{i}^{T}{\hat{{\boldsymbol{w}}}}^{(s)})\end{eqnarray*}$

for different possible values of d and d'. The optimum degrees for the BP model are then the d and d' which give the largest value of $-\mathrm{log}{L}^{\mathrm{pred}}$ .

Besides using the cross-validation method, other popular methods such as AIC (Akaike information criterion) and BIC (Bayesian information criterion) are also possible when data samples are abundant. When the sample size is relatively small, both methods will under-select the degrees, as in the example provided in Turnbull & Ghosh (2014). This is the reason why we choose to use the 10-fold cross-validation method.

The model may require d and d' to be large, and thus the number of weights that need to be estimated is also large. In such a situation, one may be concerned whether we have enough data to estimate the weights. Fortunately, most of the estimated weights turn out to be numerically zero, due to the fact that parameter space is simplex and therefore convex. This drastically reduces the number of effective parameters in our model. For example, mass predictions computed using the largest 25 weights of the fit performed in Section 5 are indistinguishable from predictions produced using the full set of weights, to the 3rd or 4th decimal in log(M).

Once we obtained $\hat{d}$ , $\hat{d}^{\prime}$ , and $\hat{{\boldsymbol{w}}}$ , the estimated values of d, d', and ${\boldsymbol{w}}$ , the M–R relation can be derived following Equation 1:

$\begin{eqnarray}\begin{array}{rcl}f(m| r,\hat{{\boldsymbol{w}}},\hat{d},\hat{d}^{\prime} ) & = & \displaystyle \frac{f(m,r| \hat{{\boldsymbol{w}}},\hat{d},\hat{d}^{\prime} )}{{\int }_{\underline{M}}^{\overline{M}}f(m,r| \hat{{\boldsymbol{w}}},\hat{d},\hat{d}^{\prime} ){dm}}\\ & = & \displaystyle \frac{f(m,r| \hat{{\boldsymbol{w}}},\hat{d},\hat{d}^{\prime} )}{f(r| \hat{{\boldsymbol{w}}},\hat{d},\hat{d}^{\prime} )},\end{array}\end{eqnarray} \tag{ 10 }$

where

$\begin{eqnarray*}&&f(r| \hat{{\boldsymbol{w}}},\hat{d},\hat{d}^{\prime} )=\displaystyle \sum _{k=1}^{\hat{d}}\displaystyle \sum _{l=1}^{\hat{d}^{\prime} }{\hat{w}}_{{kl}}\displaystyle \frac{{\beta }_{l\hat{d}^{\prime} }\left(\tfrac{r-\underline{R}}{\overline{R}-\underline{R}}\right)}{(\overline{R}-\underline{R})}.\end{eqnarray*}$

From Equation (10), mean, variance, and prediction intervals can be obtained analytically.

As noted in Section 2.1, we could also estimate our nonparametric model from a Bayesian framework by assigning priors to the parameters ${\boldsymbol{w}}$ , d, and d'. For example, we can assign a Dirichlet prior to the weights as

$\begin{eqnarray*}&&\pi ({w}_{11},{w}_{12}...,\,{w}_{{dd}^{\prime} })\sim \mathrm{Dir}({\alpha }_{11},{\alpha }_{12},\,\ldots ,\,{\alpha }_{{dd}^{\prime} }),\end{eqnarray*}$

where ${\sum }_{k=1}^{d}{\sum }_{l=1}^{d^{\prime} }{\alpha }_{{kl}}=1$ , and assign a Poisson or uniform prior for d and d', respectively. The inference can be carried out using a Markov chain Monte Carlo algorithm (MCMC; i.e., Petrone 1999a, 1999b; Petrone & Wasserman 2002). However, initial investigations into this approach indicated that the MCMC algorithm would take a much longer time to run. As a result, we employ a maximum likelihood method rather than a Bayesian method in this paper.

3. Data

We apply the Bernstein polynomial model to obtain the M–R relations from two data sets. The first data set is taken from WRF16 to enable a direct comparison between their parametric mass–radius relation and our nonparametric one (this comparison is illustrated in Figure 1). Specifically, we use their extended radial velocity data set, denoted in WRF16 as "RV only, <8 R_⊕," and that consists of all planets in their Table 2 except those labeled "c." In this study, we also exclude the planets whose mass measurements are only upper limits, as the R package we use to find the maximum likelihood estimates of the weights ${\boldsymbol{w}}$ does not allow censored data. The results from this benchmark data set (N = 60) are presented in Section 4.

**Figure 1.** Two M–R relations are plotted. The blue line is the mean power-law M–R relation estimated by WRF16; the shaded region is the central 68% of the predicted masses, showing the distribution of the mass predictions around the mean relation. The dark blue line along with its shaded region is our nonparametric mean M–R relation and the 68% prediction regions estimated using the Bernstein polynomial model. Comparing the parametric M–R with the nonparametric M–R, the means of the M–R relations noticeably differ in certain radius regimes. The parametric model is optimistic with the assumption that mass and radius follow an increasing power-law relation, even in the region where observations are few and poorly constrain the relation. The nonparametric M–R method is more adaptive to the observed data. The red lines denote physical restrictions for planet masses. Data points are displayed in the background with their measurement errors.
Download figure:
Standard image High-resolution image

The second data set consists of all planets with an assigned Kepler name, whose mass measurements originated from either radial velocities (RV) or from N-body dynamical fits to transit timing variations (TTVs). We note that TTV planets could have astrophysically different densities; however, we see that the inclusion of high-quality TTV masses in the current manuscript as a reasonable decision, both because prior work has made the same decision (Weiss & Marcy 2014 and WRF16) and because a data set that contains both TTV and RV masses better represents the full range of densities that an M–R relation would need to reproduce. We obtained this information from the NASA Exoplanet Archive (Akeson et al. 2013) on 2017 June 5; for this data set, we also excluded planets with only upper limits reported on their masses. With these restrictions, our Kepler-only data set consists of 127 planets with robust mass measurements; the results from this science data set are presented in Section 5.

3.1. Discussion of Selection Effects

We choose to restrict our science data set (results presented in Section 5) to Kepler planets in an effort to minimize the influence of survey selection effects. Selection effects manifest in two ways for the mass–radius relation:

1.
The probability of detecting a planet is a non-constant function of either their mass (for RV detection) or radius (for transit detection), with smaller or less massive planets being more difficult to detect.
2.
The probability that a known planet has its radius (for RV detections) or its mass (for transit detections) measured by follow-up observations is a complicated, often unpublished function of the planet's discovery mass/radius, the predicted radius/mass, and the host star's brightness, spectral type, activity indicators, sky position, and so on.

These two selection effects impact the inferred mass–radius relation in different ways. The first affects the relative amount of data at large versus small radii (or at high versus low mass). Correcting for this effect becomes important when one's scientific goal is a joint mass, radius probability distribution g(m, r) that reflects the underlying, true distribution of exoplanet masses and radii. For example, if one tried to use the data from the Exoplanet Archive as is with no correction for detection completeness, they would incorrectly conclude that ∼1.1R_Jupiter and ∼1M_Jupiter is the most probable mass and radius for an exoplanet, as more Jupiter-sized planets have had their masses measured than smaller planets. We already know that this potential conclusion to be incorrect: the numerous occurrence rate studies which use Kepler data to correct for variable detection efficiency across the full range of planet radii (e.g., most recently Foreman-Mackey et al. 2014; Fulton et al. 2017; Hsu et al. 2018) have shown that smaller planets are much more prevalent than Jupiter-sized ones. Therefore, if one wishes to understand the probability of an exoplanet existing with a specific mass and radius (i.e., characterize g(m, r)), they must account for the different surveys' detection completeness.

Fortunately, this is not the goal of this work. We are scientifically interested in $g(m| r)$ , the conditional distribution which gives us the M–R relation. In going from g(m, r) to $g(m| r)$ , we marginalize over r, which divides out the completeness correction in that dimension. With this correction disappearing from the equation for $g(m| r)$ , we can safely ignore the effect of the selection bias due to detectability of different radii planet. (see Equation (1)). This leaves only the second concern for our mass predictions: how transiting planets are selected for RV follow-up.

To address this concern, one needs to know how planets were chosen for follow-up. To date, no RV follow-up group has published their selection function in a quantitative way that would allow us to incorporate it into an analysis like this. Furthermore, given that the target list often evolves as RV data are acquired, this selection function is probably intractable to calculate for the current data set. Progress is being made toward this end by designing RV follow-up campaigns that have definable selection functions (i.e., Burt et al. 2018 and Montet 2018), but for the current data set, this remains practically out of reach. No other papers on the M–R relation have incorporated corrections for completeness or selection effects into their analysis. We follow this precedent, acknowledging that is a clear area for future work, and choose to focus instead on the novel development of a nonparametric analysis.

4. Benchmarking to WRF16

As we look to apply our nonparametric method to a wider range of exoplanet masses and radii than considered by WRF16, it is nonetheless prudent to benchmark our results to their results.

The parametric power-law M–R relation obtained by WRF16 is plotted in Figure 1. Quantitatively, it is $m| r\sim { \mathcal N }(1.6{r}^{1.8},2.9)$ for <8 R_⊕, which assumes that the M–R relation follows a normal distribution.

In Figure 1, we also plot the mean and 68% prediction intervals of M–R relation estimated using the proposed nonparametric method (the gray region). Based on the 10-fold cross-validation method, the values for $\hat{d}$ and $\hat{d}^{\prime}$ are chosen to be 50 and 20, respectively. Physically motivated boundaries for any M–R relations are denoted by the red lines. Those boundaries are calculated based on the constraint that $0\lt {M}_{i}\leqslant {M}_{i,\mathrm{pureFe}}$ , $i=1,\,...,n$ , where ${M}_{i,\mathrm{pureFe}}$ is calculated using the rock-iron internal structure models of Fortney et al. (2007a, 2007b),

$\begin{eqnarray*}&&\mathrm{log}({M}_{i,\mathrm{pureFe}})=\displaystyle \frac{-b\pm \sqrt{{b}^{2}-4a(c-{R}_{i})}}{2a},\end{eqnarray*}$

with a = 0.0975, b = 0.4938, and c = 0.7932.

Here we digress from the discussion to introduce three different uncertainty regions that we will mention in the article: Bayesian credible intervals, prediction intervals, and bootstrap confidence intervals. The Bayesian credible intervals are obtained using a Bayesian method. WRF16 obtains their Bayesian credible intervals from sample draws of the posterior distributions. For the nonparametric model used in this article, the uncertainty regions are called the prediction intervals because they are obtained from the prediction distribution of either radius given mass or mass given radius. To be more explicit, the prediction intervals are obtained from the distribution $f(m| r,\hat{{\boldsymbol{w}}},\hat{d},\hat{d}^{\prime} )$ , where $\hat{{\boldsymbol{w}}}$ is the maximum likelihood estimator (MLE) of ${\boldsymbol{w}}$ . The prediction intervals do not account for variations on parameters $\hat{{\boldsymbol{w}}}$ , $\hat{d}$ , and $\hat{d^{\prime} }$ themselves. To account for the variation of the parameters in the model, we use the bootstrap method. The bootstrap method is conducted by resampling (with replacement) the observations N times (here, we take N = 100). For each resampling, we obtain $\hat{d}$ , $\hat{d^{\prime} }$ , and $\hat{{\boldsymbol{w}}}$ . Then the bootstrap confidence intervals are obtained by plugging in N sets of these estimated values into the M–R relation function one by one, and identifying the 16% and 84% quantiles of the mean relation over those N realizations. Note that the Bayesian credible intervals, the prediction intervals, and the bootstrap confidence intervals not only have different interpretations, but also are derived from different models: the Bayesian credible intervals are obtained from WRF16's Bayesian hierarchical power-law model, and the other two intervals are obtained from our nonparametric model. Due to these differences, we will not compare these intervals. Instead, we focus on comparing the mean M–R relations from each method.

Now, let's go back to the discussion on Figure 1. At first glimpse, a distinct feature of this plot from the nonparametric M–R relation is that the uncertainty region is quite large for R > 5 R_⊕ compared to the region obtained from the parametric M–R relation. This is because there are only a few observations in that region, and the measurement error for each observation is relatively large. One may worry that the estimated mean M–R relation may be misleading in this context. However, we view this as a strength of using the nonparametric model for at least two reasons: First, there is no concrete theory to support that the population-level relationship between mass and radius follows a power law, and so the parametric M–R relation may be too optimistic with its precision in regions with little data. Second, from Figure 2, the constant variance assumed by the power-law model, even in the >5 R_⊕ regime, may not reasonable. The variance estimated using our nonparametric model behaves more as expected: the variance is smaller in the <5 R_⊕ region where the observations are abundant, and it becomes larger in the >5 R_⊕ region.

**Figure 2.** Intrinsic scatter is the estimated standard deviation (s.d.) for $f(m| r)$ . The parametric model (cyan) is under-fitting the M–R relation, and thus has a smaller estimation for the s.d. compared to the nonparametric model (dark blue). The nonparametric model suggests that the intrinsic scatter is not a constant along with radii. In this figure, the dark blue line plots the mean s.d. across all bootstrapped realizations estimated by the nonparametric method, and the corresponding shaded region denotes the bootstrap 16% and 84% confidence intervals. The cyan line denotes the posterior median of the intrinsic scatter term in the parametric model; the associated shaded area denotes the 16% and 84% quantiles of the posterior samples for that parameter.
Download figure:
Standard image High-resolution image

In Figure 2, we also plot the uncertainty regions for the estimated intrinsic scatter from the two models—that is, the 16% and 84% Bayesian credible intervals using the parametric model, and 16% and 84% bootstrap confidence intervals using the nonparametric model. Again, we would like to point out that the intrinsic scatter estimated by the nonparametric model is clearly not a constant. In fact, WRF16 also believed that the intrinsic scatter may change as planets increase the size. However, they only modeled the intrinsic scatter as a linear function of radius, and this model was not flexible enough to capture the true behavior: they found that the posterior distribution of the slope of that linear function included zero, and thus concluded that the intrinsic scatter was a constant with radius.

The normal assumption in the parametric model is also disfavored by observations. From Figure 1, the prediction intervals obtained from the nonparametric model suggests that the conditional distribution of mass given radius is not always normally distributed, as there are regions where those intervals are not symmetric around the mean. When WRF16 checked the normal assumption after fitting their model, they also found evidence that this assumption does not hold.

5. The Joint Exoplanet Mass–Radius Distribution

In this section, we repeat our nonparametric M–R relation analysis using all robustly measured Kepler observations provided in the NASA Exoplanet Archive, as described in Section 3. Since we consider a larger range of planet masses and radii, we will display the M–R relation in log scale and perform our analysis on $\mathrm{log}m$ and $\mathrm{log}r$ , rather than m and r. Note that the M–R relation in the original scale of masses and radii can be obtained by applying the Jacobian method to the joint distribution of $\mathrm{log}m$ and $\mathrm{log}r$ .

Similar to Equations (4)–(7), the joint distribution for $\mathrm{log}m$ and $\mathrm{log}r$ is

$\begin{eqnarray}&&\begin{array}{l}f(\mathrm{log}m,\mathrm{log}r| {\boldsymbol{w}},d,d^{\prime} )\\ =\ \displaystyle \sum _{k=1}^{d}\displaystyle \sum _{l=1}^{d^{\prime} }{w}_{{kl}}\displaystyle \frac{{\beta }_{{kd}}\left(\tfrac{\mathrm{log}m-\mathrm{log}\underline{M}}{\mathrm{log}\overline{M}-\mathrm{log}\underline{M}}\right){\beta }_{{ld}^{\prime} }\left(\tfrac{\mathrm{log}r-\mathrm{log}\underline{R}}{\mathrm{log}\overline{R}-\mathrm{log}\underline{R}}\right)}{(\mathrm{log}\overline{M}-\mathrm{log}\underline{M})(\mathrm{log}\overline{R}-\mathrm{log}\underline{R})}.\end{array}\end{eqnarray} \tag{ 11 }$

Inferring the parameters w, d, d' for this model is similar to the inference performed for the previous model.

5.1. M–R relation of the Full Kepler Data Set

The M–R relation is the conditional distribution of log-scaled mass given log-scaled radius, derived from Equation (1):

$\begin{eqnarray}&&f(\mathrm{log}m| \mathrm{log}r,{\boldsymbol{w}},d,d^{\prime} )=\displaystyle \frac{f(\mathrm{log}m,\mathrm{log}r| {\boldsymbol{w}},d,d^{\prime} )}{f(\mathrm{log}r| {\boldsymbol{w}},d,d^{\prime} )},\end{eqnarray} \tag{ 12 }$

with $f(\mathrm{log}m,\mathrm{log}r| {\boldsymbol{w}},d,d^{\prime} )$ as in (11) and

$\begin{eqnarray*}&&f(\mathrm{log}r| {\boldsymbol{w}},d,d^{\prime} )=\displaystyle \sum _{k=1}^{d}\displaystyle \sum _{l=1}^{d^{\prime} }{w}_{{kl}}\displaystyle \frac{{\beta }_{{ld}}\left(\tfrac{\mathrm{log}r-\mathrm{log}\underline{R}}{\mathrm{log}\overline{R}-\mathrm{log}\underline{R}}\right)}{\mathrm{log}\overline{R}-\mathrm{log}\underline{R}}.\end{eqnarray*}$

We use 10-fold cross-validation (see Section 2.2) to select d (corresponding to mass) and d' (corresponding to radius), and find that $\hat{d}=55$ and $\hat{d}^{\prime} =50$ . After we obtain $\hat{d}$ and $\hat{d}^{\prime}$ , we then plug in their values to estimate ${\boldsymbol{w}}$ using the MLE method. Because the values of $\hat{d}$ and $\hat{d}^{\prime}$ may change under different realizations of the random number generator, we repeat the 10-fold cross-validation method five times to assess its stability, each time using a different random seed. We found that four out of five times, the cross-validation gives the same choices for d and d', with the dissenting realization suggesting a slightly larger number for $\hat{d}$ . Thus we decide to use $\hat{d}=55$ and $\hat{d}^{\prime} =50$ to estimate the M–R relation. After we obtain $\hat{d}$ , $\hat{d}^{\prime}$ , and $\hat{{\boldsymbol{w}}}$ , we derive the prediction distribution of the M–R relation and employ the bootstrap method described in Section 4 to obtain the bootstrap confidence intervals for the mean.

We plot the mean along with the prediction intervals and the bootstrap confidence intervals in Figure 3. In the plot, the gray region represents the 16% and 84% prediction intervals and the blue region represents the 16% and 84% bootstrap confidence intervals for the mean. The black line represents the estimated M–R relation. From the plot, we see that the M–R relation below 0.8 R_⊕ is unexpectedly similar to a step function. This is because there is only one isolated data point in that region, which the model tries to connect with the nearby points via smooth polynomials. Fortunately, we see that the bootstrap confidence intervals in this region are very wide, indicating that the M–R relation is not well understood in this region, and that more data are needed.

The M–R relations can be roughly split into three parts: 0.8 R_⊕ ≤ r ≲ 5 R_⊕, 5 R_⊕ ≲ r ≲ 11 R_⊕, and ≳11 R_⊕. In the region between 0.8 R_⊕ and 5 R_⊕, the bootstrap confidence intervals narrow as data points become abundant for larger radii, which suggests the M–R relation is more accurate. The almost linear relation suggests that the power-law relation may not be a misleading assumption in this radius regime. Transitioning from the 5 R_⊕ region to the 11 R_⊕ region, the number of observations decreases, and we observe both the prediction intervals and bootstrap confidence intervals becoming slightly wider. In the region >11 R_⊕, the M–R relation is more flat or even decreases as the radius becomes larger. These findings are consistent with well-known features of the M–R relation (see the discussion in Section 6.1).

Figure 4 shows how the intrinsic scatter changes as a function of radius; because the intrinsic scatter is defined as the standard deviation of $f(\mathrm{log}m| \mathrm{log}r)$ (see Equation (12)), the intrinsic scatter is technically in units of $\mathrm{log}m$ . We plot this on the left and the original scale on the right for a more intuitive comparison. Starting at the smallest radii, we see that the bootstrap confidence intervals are quite wide. This reflects the fact that there is only one data point below ∼0.8 R_⊕, and so the intrinsic scatter is quite uncertain in this regime. In the region between 0.8 R_⊕ and ≲5 R_⊕, the intrinsic scatter is an increasing function with radius. The increasing pattern becomes more obvious in the right plot. This finding is consistent with the result shown in Figure 2, even though the two results are based on different data sets. With more data points to constrain this region, we see that the size of the intrinsic scatter in Figure 4 is smaller than in Figure 2. In the >5 R_⊕ region, the intrinsic scatter behaves nonlinearly. It decreases first and then increases at radii dominated by inflated Jupiters. As discussed in Section 6.1, the M–R relation is difficult to interpret in the Jupiter regime because degeneracy pressure creates a vertical line in this space that is not well captured by the conditional. See the R–M relation in Figure 6 for a clearer picture of the model's behavior in the Jupiter regime.

**Figure 4.** Intrinsic scatter of the M–R relation, in log scale (left) and linear mass scale (right). Mathematically, the intrinsic scatter is the estimated standard deviation of $f(\mathrm{log}m| \mathrm{log}r)$ . We note the behavior of the intrinsic scatter in the three regions visually identified in Figure 3: for 0.8 R_⊕ ≤ r ≲ 5 R_⊕, the intrinsic scatter is increasing; for 5 R_⊕ ≲ r ≲ 11 R_⊕, it starts to decline; and for ≳11 R_⊕, it is nonlinear. In the figure, the dark blue curve is the estimated standard deviation of the M–R relation and the shaded area is the uncertainty region between the 16% and 84% bootstrap confidence intervals.
Download figure:
Standard image High-resolution image

5.2. Mass Predictions for Given Radii

In Figure 5, we plot five conditional densities for mass given radius at different radius values: 1 R_⊕, 3 R_⊕, 5 R_⊕, 10 R_⊕, and 15 R_⊕. These curves represent the mass predictions for planets at those radii (i.e., the probability of a planet of that size having a certain mass). The shaded regions represent the uncertainty in that distribution and correspond to the 68% bootstrap confidence regions. While these distributions look roughly Gaussian in log scale, when transformed to a linear scale, all the densities are skewed to the right. The two densities within the region >5 R_⊕ are more skewed than the rest. Thus no matter the the size of the planets, the intrinsic scatter is definitely not Gaussian in linear scale, as was assumed by WRF16. The lognormal intrinsic scatter assumed by Chen and Kipping (2017) is more appropriate for mass given radius, although it fails to capture the low-mass tail corresponding to super-puffy planets at 7–10 R_⊕ and <30 M_⊕.

5.3. The Radius–Mass (R–M) Relation

As we are modeling the joint distribution of mass and radius (in log scale), the radius–mass relation (R–M relation) can also be derived easily. From Equation (2), the conditional distribution for $\mathrm{log}r$ given $\mathrm{log}m$ is

$\begin{eqnarray}&&f(\mathrm{log}r| \mathrm{log}m,{\boldsymbol{w}},d,d^{\prime} )=\displaystyle \frac{f(\mathrm{log}m,\mathrm{log}r| {\boldsymbol{w}},d,d^{\prime} )}{f(\mathrm{log}m| {\boldsymbol{w}},d,d^{\prime} )},\end{eqnarray} \tag{ 13 }$

with $f(\mathrm{log}m,\mathrm{log}r| {\boldsymbol{w}},d,d^{\prime} )$ in (11) and

$\begin{eqnarray*}&&f(\mathrm{log}m| {\boldsymbol{w}},d,d^{\prime} )=\displaystyle \sum _{k=1}^{d}\displaystyle \sum _{l=1}^{d^{\prime} }{w}_{{kl}}\displaystyle \frac{{\beta }_{{ld}}\left(\tfrac{\mathrm{log}m-\mathrm{log}\underline{M}}{\mathrm{log}\overline{M}-\mathrm{log}\underline{M}}\right)}{\mathrm{log}\overline{M}-\mathrm{log}\underline{M}}.\end{eqnarray*}$

After we plug in $\hat{d}$ , $\hat{d}^{\prime}$ , and $\hat{{\boldsymbol{w}}}$ , we plot this relation along with the prediction intervals and bootstrap confidence intervals in Figure 6. Note that the R–M relation is not a simple flip of the M–R relation around the 1:1 line; this is due to the fact that the M–R relation is actually a distribution that is asymmetric around a mean relation that is not a straight line.

**Figure 6.** Nonparametric estimation of the R–M relation; note that this is not simply the inverse of the M–R relation. The dark blue line is the mean R–M relation, and the gray area is the 16% and 84% prediction intervals. The light blue area is the uncertainty region between the 16% and 84% bootstrap confidence intervals.
Download figure:
Standard image High-resolution image

The R–M relation is highly uncertain for <0.8 M_⊕ due to the single isolated data point in that region; this is reflected in the wide bootstrap confidence intervals. The bootstrap confidence intervals narrow as mass increases, reflecting higher certainty in the mean relation, given the data. This mean R–M relation is an increasing function up to 100 M_⊕; above that, the R–M relation becomes flatter due to degeneracy pressure.

We also plot the 68% bootstrap confidence intervals for each of the conditional distributions for 1, 10, 50, 100, and 500 M_⊕. Almost all the densities are skewed to the right, especially when the displayed log scale is transformed to linear scale. In the 0.8 M_⊕ ≤ m ≤ 10² M_⊕ region, the three densities look quite different, indicating rapid change in the radius distribution from 1 to 50 M_⊕.

5.4. Predicting K2 Planets

In this section, we apply the estimated M–R relation to predict masses of K2 planets, given their radius. The K2 planets data are obtained from the NASA Exoplanet Archive on 2017 June 5. In Figure 8, we first plot the prediction intervals from the M–R relation for all radii, then plot the K2 planets on top of the band. We see that for many of the planets with R < 10 R_⊕, the prediction intervals do not cover their masses. However, for planets with R > 10 R_⊕, the prediction intervals do span the masses for all the planets. This suggests that there is a significant bias for K2 planets. One explanation for this is that the most massive planets reach a high mass detection significance threshold first and thus are published first. Burt et al. (2018) discusses this bias and its implication for population mass–radius analyses in detail.

6. Implications for Exoplanet Compositions

The nonparametric M–R relation that we have fit to the full Kepler data set yields several astrophysical results. In the following sections, we discuss them in detail.

6.1. Assessing "Well-understood" M–R Features

A crucial test of any method is how well it reproduces features that are well established by multiple studies using different techniques. To this end, we emphasize that our relation is not just the average line, but also the predictive distribution around the line. We also note that our goal was not to reproduce prior mass–radius relations, as one of our motivations for applying a nonparametric method was to assess how well the features identified in more parametric models persist when the strong assumptions in those models are relaxed.

To start with, our nonparametric M–R relation reproduces the well-established result that more massive planets ≲10² M_⊕ are on average larger (see Figure 3). This result is not a surprise: more massive planets are expected to form earlier in the protoplanetary disk lifetime and thus should able to accrete more gas-rich material, which quickly increases the planets' radii. Indeed, all M–R relations in the literature (see Section 1.1) show that mass (or radius) is an increasing function of radius (or mass) in the sub-Jupiter regime. However, above about half the mass of Jupiter, degeneracy pressure begins to noticeably affect the radii of hydrogen-dominated bodies (Zapolsky & Salpeter 1969), and the R–M relation should flatten out. Our nonparametric approach readily reproduces this result (see Figure 6).

The electron degeneracy pressure would manifest as a vertical line in the M–R relation of Figure 3; when taking the marginal of the joint mass–radius distribution to produce the $f(m| r)$ (mass given radius) conditional distribution, vertical features like this are collapsed, with their extent represented as the probability distribution at that radius. Figure 7 displays these probability distributions, and shows that there is some probability of more massive planets at 10 and 15 R_⊕. In fact, the difference between $f(m| r)$ and $f(r| m)$ in the Jupiter regime is an argument for why a two-way M–R relation is necessary: there are some features that are easier to capture one way as opposed to the other.

**Figure 7.** Conditional distributions for radii given masses at 1, 10, 50, 100, 500 M_⊕. As in Figure 5, the shaded region corresponds to the uncertainty in the predictive distributions (i.e., the 16% and 84% bootstrap confidence intervals). These predictive distributions are more skewed than those for mass given radius.
Download figure:
Standard image High-resolution image

6.2. Location of Transition Points

Identifying transitions in the exoplanet mass–radius relation is important for our physical understanding of the types of planets that exist, based on their compositions.

No sharp transitions are visible in Figure 3. Nevertheless, there appears to be at least three segments to the M–R relation: ≲5 R_⊕, ∼5 to ∼11 R_⊕, and ≳11 R_⊕ radii, which roughly corresponds to sub-Neptunes, sub-Saturns, and Jupiters. A comparison of Figure 3, the M–R relation, with Figure 6, the R–M relation, shows that there is uncertainty in the sub-Neptune to sub-Saturn transition point. This is because we relax the assumption that the distribution around the average relation is symmetric and constant within a segment, as is assumed by WRF16 and Chen & Kipping (2017) via both studies' use of a Gaussian distribution to characterize the intrinsic scatter term. With this restriction lifted, the average mass given radius is no longer the inverse of the average radius given mass, as the conditionals shown in these two plots are produced by marginalizing over a joint mass and radius distribution that is asymmetric. When the joint distribution is much wider in one-dimension than it is in the other, as is the case around 4–7 R_⊕, then the conditionals can look quite different. A relatively large spread in radii for planets with 3–10 M_⊕ falls adjacent to a small spread in radii for planets around 30 M_⊕; this is what causes ambiguity in identifying a transition from Neptunes to Saturns.

Upon visual inspection, it is also clear that there are not currently enough mass measurements of extrasolar planets ≤2 R_⊕ to justify a super-Earth transition point. This result is at odds with some previously published results on the M–R relation—for example, Weiss & Marcy (2014), Rogers (2015), Chen & Kipping (2017), and Fulton et al. (2017). In the following discussion, we will address each of these works individually.

Weiss & Marcy (2014) fixed the transition point at 1.5 R_⊕ based on a visual inspection of the density versus radius plane. When their data set is plotted with the reported error bars, it becomes apparent that there is little empirical evidence for a transition at 1.5 R_⊕, given the size of the density or mass error bars for the smallest planets. Indeed, one could draw a straight line through the mass and radius points and have it fit the data just as well as the chosen relation with the transition.

To our best knowledge, Rogers (2015) provided the strongest evidence of a transition in the M, R plane among all four of these studies. This paper parameterized the transition as the radius below which 100% of planets must be rocky (i.e., their masses are greater than the minimum mass of a rocky planet at that size). This study tested both a step function for the fraction of planets that are rocky and a more gradual transition; given the size of the error bars, a step function was sufficient to describe this data set. While useful in its physical interpretation, this parametrization does not guarantee that there is a visible kink or transition in the M–R relation. Indeed, no such kink is visible in their data set, displayed in their Figure 1. This is because the Rogers parameterization of the transition tests for the absence of planets in one region of the M, R parameter space (that of very low-mass planets in the <1.5 R_⊕ radius regime); if their absence does not sufficiently alter the average M–R relation, then it will not produce a kink in the M–R plane. Considering the very large measurement uncertainties in this very small planet regime, it is indeed the case that the absence of these planets does not significantly shift the average M–R relation to produce a kink.

Fulton et al. (2017) looked only at the marginal radius distribution. There is no guarantee that the bimodality they see in the radius distribution will produce a kink, or a transition, in the M–R plane. Indeed, the transition is likely gradual with a period dependence. Additional follow-up to measure masses for a larger sample of these small planets is needed before a claim of a Fulton-like gap or transition in the M–R plane is warranted.

The tension between the existence of a super-Earth transition in Chen & Kipping (2017) and our lack of one arises from the fact that we do not include the minor Solar System bodies in our data set. It is not unreasonable to expect that the compositions of mainly refractory bodies in other planetary systems will be similar to those in ours. However, including this data without inflating the uncertainties to match those we obtain for extrasolar systems leads to misleadingly tight constraints for ∼1 M_⊕ planets. Indeed, there is only one planet between 1 and 3 M_⊕ in their data set, so most of the information about the "Terran" transition point comes from extrapolating up from Solar System minor bodies and extrapolating down from Neptunes, rather from measurements at the transition point itself. We choose a less Solar System–constrained approach to characterize the distribution of compositions in the 1–3 R_⊕ regime, choosing instead to develop a flexible method that can best let the extrasolar data speak for itself as more super-Earths are discovered and followed up by TESS.

6.3. Intrinsic Dispersion

The intrinsic dispersion in mass given radius slightly increases from 1 to 5 R_⊕ (see Figure 4), but varies much more in radius given mass (see the width of the gray region in Figure 6). This indicates that planet compositions in the 3–10 M_⊕ range exhibit more diversity than others, and that planets with 1–5 R_⊕ fall into a relatively narrow range of masses. The large spread in radius over a narrow mass regime could have a number of physical interpretations: perhaps these super-Earths/sub-Neptunes fall at a transition point where scaled-up terrestrial planet formation co-exists with scaled-down gaseous planet formation. Perhaps planet evolution is particularly dramatic in this mass regime; photoevaporation has been shown to be important for such low-density low-mass planets (e.g., Lopez & Rice 2016; Owen & Wu 2017).

In either case, it would be useful to search for multiple populations in this region of the M–R diagram, to test if the bimodal marginal radius distribution unearthed by Fulton et al. (2017) and verified by Hsu et al. (2018) extends to the joint radius and mass space. While there is not currently evidence for a bimodal mass and radius distribution, our approach of calculating the M–R and R–M relations by first estimating the joint distribution is capable of finding and quantifying such bimodalities. Before the joint distribution can be interpreted as an astrophysical result, however, both selection effects discussed in Section 3.1 would need to be incorporated into the density estimation.

6.4. Predictive Distributions

Given the current data set, a lognormal distribution is a reasonable approximation for the conditional distribution of mass given radius in most radius regimes (see Figure 5). On the other hand, there is significant skewness in the conditional distributions for log-scaled radius, given log-scaled mass (see Figure 7). This result demonstrates that adopting the more flexible model was warranted, and that different mass regimes have different composition distributions. In particular, the super-puffy planets found by TTV analyses are visible as the skewed tail in the 10 M_⊕ conditional distribution.

6.5. Implications for TESS

As seen in Figure 5, mass predictions are the best constrained for planets at 3 R_⊕: the conditional distribution for mass at 3 R_⊕ is the most robust to the bootstrapping scheme we used to measure the uncertainty in these distributions. As a result, RV follow-up of TESS will yield the most scientific return for 1–2 R_⊕ planets and above ∼4 R_⊕ where there are intrinsically fewer planets.

While few Kepler target stars were sufficiently bright to enable follow-up, a larger proportion of K2 planet-hosting stars are not. One would hope that these brighter stars would enable more mass detections of smaller planets, yet we see in Figure 8 that there is significant bias in the K2 planets planetary masses, especially for the planets with ≤7 R_⊕ radii. This serves as a word of warning for those who aim to perform mass–radius analyses with TESS data: to build robust M–R relations, all mass measurements, even upper limits, must be published and incorporated into the analysis (see Burt et al. 2018 for an in-depth study of this effect). In addition, the second type of selection effect described in Section 3 must be quantified and published as part of major radial velocity follow-up programs.

**Figure 8.** The masses of K2 planets with radii <10 R_⊕ are biased. The shaded area is the same as that shown in Figure 3; it is the uncertainty region between 16% and 84% prediction intervals of the M–R relation estimated using *Kepler* data. K2 planets' masses and radii are plotted in red with their measurement errors.
Download figure:
Standard image High-resolution image

7. Conclusion

In this paper we provide a nonparametric approach, using the Bernstein polynomial model, to estimate the M–R relation. We applied this approach to analyze two data sets.

The first data set is from WRF16, which we use to benchmark our results. First, we found that the M–R relations estimated using the parametric model and the nonparametric model are similar for <5 R_⊕ planets. However, the two relations differ for >5 R_⊕, in that the prediction intervals of the M–R relation obtained from the nonparametric model are wider from a model that assumed a power-law M–R relation. We also found that the conditional distribution of mass given radius is not always normally distributed. Last, we found that the intrinsic scatter is not a constant function with radius, increasing weakly up to 5 R_⊕.

We applied the Bernstein polynomial model to study the joint exoplanet mass and radius distribution using all Kepler planets with measured masses. We found that there are at least three distinct M–R relations for planets with radii ≲5 R_⊕, 5 R_⊕ < r < 11 R_⊕, and >11 R_⊕. We also studied the relationship between the intrinsic scatter and the radius. We found that the intrinsic scatter is a weakly increasing function with radius up to ∼5 R_⊕ and becomes nonlinear beyond that. Furthermore, we found that the conditional distribution of mass given radius is reasonably approximated by a lognormal distribution, which skews to the right in m, but is nearly symmetric in log m. Last, we studied the R–M relation and applied the estimated M–R relation to predict K2 planets. We found that the bias for K2 planet masses, especially for those with <10 R_⊕, is large.

A major contribution of this study is that our method provides a tool to explore a wider range of fits by incorporating heteroscedastic measurement errors. Our method provides a new direction for studying the M–R relation when more mass and radius measurements are obtained with future missions like TESS and PLATO. To enable these future studies, we have placed the input data sets and the R code which we use to fit the weights, find the optimal number of degrees, and plot the figures in this paper, at the following website: https://github.com/Bo-Ning/Predicting-exoplanet-mass-and-radius-relationship.

In the future, the model itself can be extended by adding incident flux as a third variable for a better understanding of the M–R relation (see Thorngren & Fortney 2018 and Sestovic et al. 2018 for examples of flux-dependent M–R analysis in the Jupiter regime). It is likely that a mass–radius–flux (M–R–F) relation would better describe planets in all mass regimes, given the importance of photoevaporation for low-mass planets. With the rich and growing body of literature on the subject, this is an area where significant progress can be made with the right tools. As we have demonstrated in this paper, a nonparametric approach provides such a tool: it can yield numerous astrophysical insights and model the data with minimal assumptions, making it a promising option for future mass–radius analyses.

The first draft of this paper was completed when the first author was a PhD student at the Department of Statistics at North Carolina State University, Raleigh, North Carolina.

The authors thank the Statistical and Applied Mathematical Sciences Institute (SAMSI) for bringing the authors together and providing space and funding for continued collaborations. The authors also thank Tom Loredo and Eric Ford for their valuable suggestions on this work while it was in development at SAMSI, Eric Ford for providing detailed revisions on the first version of the manuscript, Shubham Kanodia and Gudmundur Stefansson for their work in translating the R code into Python, and Matthias Yang He and Eric Ford for their work in testing the number of effective nonzero weights while translating the code into Julia. Furthermore, the authors thank the two referees who provided valuable suggestions to improve the paper. This material was based on work partially supported by the National Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. A.W. acknowledges support from the National Science Foundation Astronomy & Astrophysics Postdoctoral Fellowship program under Award No. 1501440. The Center for Exoplanets and Habitable Worlds is supported by the Pennsylvania State University, the Eberly College of Science, and the Pennsylvania Space Grant Consortium. This research has made use of the NASA Exoplanet Archive, which is operated by the California Institute of Technology, under contract with the National Aeronautics and Space Administration under the Exoplanet Exploration Program. This paper includes data collected by the Kepler mission. Funding for the Kepler mission is provided by the NASA Science Mission directorate.

Facility: NASA Exoplanet Archive. -

Software: R and python.

Appendix A: Bernstein Polynomials and Their Properties

Bernstein polynomials (BPs) have a long history in mathematics and statistics since their original publication (Bernstein, 1912). Readers interested in knowing more about the use and origins of BPs in various scientific fields will find the article by Farouki (2012) very comprehensive and informative. Here we present a brief overview of the main properties and motivations of choosing BPs over other basis functions, such as splines or kernel methods. Given a continuous function f(u) defined on the unit interval [0, 1], the BP of degree d corresponding to the function f is defined as

$\begin{eqnarray*}&&{B}_{d}(u;f)=\displaystyle \sum _{k=0}^{d}f\left(\displaystyle \frac{k}{d}\right)\displaystyle \left(\genfrac{}{}{0em}{}{d}{k}\right){u}^{k}{(1-u)}^{d-k}\,\,\mathrm{for}\,d=1,2,\,\ldots ,\end{eqnarray*}$

where B_d(x, f) is a polynomial of degree at most d. It was shown by Bernstein that $\parallel {B}_{d}(\cdot ,\,f)-f(\cdot ){\parallel }_{\infty }\equiv {\sup }_{u\in [\mathrm{0,1}]}| {B}_{d}(u,f)\,-f(u)| \to 0$ as $d\to \infty$ , which constructively demonstrates that a continuous function on a closed interval can be uniformly approximated by a polynomial. Figure 9 provides plots of the basis function(s) with d = 1, 2, 3, 5, 10, 20.

**Figure 9.** Plot of Bernstein polynomial basis functions, as a function of degree (i.e., the number of terms in the polynomial series); compare to Equation (3). Note that the independent variable (i.e., the x-axis) must be rescaled to be between 0 and 1. See Appendix B.1 for a discussion of this.
Download figure:
Standard image High-resolution image

The result extends to more than one dimension by taking the tensor product of the univariate BPs. In this article we have used suitable transformations to approximate a bivariate density function defined on the unit square [0, 1]². In a statistical density estimation problem, we would not know the density function f and we thus work with ${\theta }_{k}=f\left(\tfrac{k}{d}\right)$ for k = 0, 1, ..., d and estimate the parameter vector (θ₀, θ₁, ..., θ_d) using likelihood and other standard statistical methods. The prime motivation for using BPs model for estimating densities is that it not only enjoys many of the similar asymptotic properties as some of the other popular density estimation methods (such as those based on kernels and B-splines), but it also has good asymptotic properties near boundaries. Its connection to a mixture of Beta densities is described in Tenbusch (2014). Babu & Chaubey (2006) provide much of the asymptotic properties of the BPs in multi-dimensions and show that densities can be estimated based on dependent sequence of multivariate random variables. Moreover, it has been shown that BPs can also be used to make inference when the density or regression functions are subject to shape constraints (see Wang & Ghosh 2012; Turnbull & Ghosh 2014).

Appendix B: More on the Bernstein Polynomials Model

B.1. On Choosing the Upper and Lower Bounds for Mass and Radius: $\underline{M}$ , $\overline{M}$ , $\underline{R}$ , $\overline{R}$

The Bernstein polynomials model requires specifying an upper bound and a lower bound for both mass and radius. Although in theory one could choose the upper bound and the lower bound to be any values, in practice setting the upper bounds and the lower bounds to different values will cause the results to be different from the results in Figures 1 and 3. In particular, when there are no data in a region, the Bernstein polynomials model will fit a smooth line toward the overall mean. This happens when the upper bound is chosen to be too large. To illustrate this, we offer Figure 10 as a comparison to Figure 1.

**Figure 10.** Plot of the M–R relation with its prediction intervals when $\overline{R}$ is chosen to be 8 R_⊕ instead of 6.7 R_⊕ (as in Figure 1). Compared with Figure 1, the two M–R relations are similar in the region where R ≤ 6 R_⊕, where a large amount of data is available. The current plot shows a downward trend for the M–R relation when R > 6 R_⊕, because where there is no data the model fits a line smoothly to the overall mean. The large prediction intervals show that this downward trend is highly uncertain. We note that in this plot, $\hat{d}=\hat{d}^{\prime} =35$ , which was selected based on the cross-validation method described in Section 2.2.
Download figure:
Standard image High-resolution image

In Figure 10, we set the upper bound of masses to be 8 R_⊕, which is the same value used in WFR16's paper, instead of 6.7 R_⊕. One can immediately tell that the region between 6.7 R_⊕ and 8 R_⊕ has a downward trend. This is because the overall mean of the data set's planet masses is smaller than the mass at 6.7 R_⊕, which is where the last observation lies. Because the model tends toward the overall mean when there is no data, a downward trend is produced. This trend might not provide a correct astrophysical interpretation of the M–R relation in this region, and so we strongly caution against extrapolating this nonparametric relation. That said, the prediction intervals are very wide in this region, which suggests that there is significant variability in the mean relation, which would be rectified with more observations.

B.2. On Choosing the Degrees d and d'

Choosing the degrees d and d' is somewhat computationally expensive due to the cross-validation method's searching for the optimal d and d' from a large number of potential values. To decrease the computation time, one can force d = d'. We adopt this approach to choose d and then re-estimate the parameters in the model. Figures 10 and 11 are the results produced by choosing d = d'. The cross-validation method chooses d = d' = 35 for the first data set and d = d' = 55 for the second. Comparing these two figures to Figures 1 and 3, respectively, the results do not vary too much.

**Figure 11.** Plot of the M–R relation with its prediction intervals estimated from the *Kepler* data. We chose $\hat{d}=\hat{d}^{\prime} =55$ , where the values of d and d' are selected based on the cross-validation method discussed in Section 2.2. Comparing with Figure 3, the two M–R relations are similar.
Download figure:
Standard image High-resolution image

**Figure 11.** Plot of the M–R relation with its prediction intervals estimated from the *Kepler* data. We chose $\hat{d}=\hat{d}^{\prime} =55$ , where the values of d and d' are selected based on the cross-validation method discussed in Section 2.2. Comparing with Figure 3, the two M–R relations are similar.
Download figure:
Standard image High-resolution image

Appendix C: A Simulation Study

To check how missing values and measurement errors could affect the estimation result, we conduct several simulation studies. We simulate data from a power-law relation and use the Bernstein polynomial model to estimate that function, as we are interested in comparing the M–R relations estimated under different settings.

In this simulation study, we define the "true" M–R relation as follows: $r\sim { \mathcal U }(0,8)$ , which stands for the uniform distribution between 0 R_⊕ and 8 R_⊕, and $m\sim { \mathcal N }(1.6{r}^{1.8},1)$ (i.e., the extended M–R relation found by WRF16 but with a smaller intrinsic scatter). We then perform a series of simulations where we generate 100 data sets, each of which has 100 observations. What varies between the different simulations are the measurement errors: we choose the measurement errors on radii as 0.4, 1, 3 R_⊕ and on masses as 5, 20, 35 M_⊕.

We fit each data set to the model, choosing the degrees using the same 10-fold cross-validation scheme that is described in Section 2.2. As the measurement uncertainty increases, the observed data smear out the underlying M–R relation. Information about the M–R relation is lost in this process, and so a less complex nonparametric model is required to fit the data. This is reflected in a lower degree being recommended by our cross-validation scheme: for the original simulated data set with no measurement errors, d = 90, while d = 40, 7, 5 for the three data sets with consecutively larger measurement errors.

In the following figures, we plot the estimated M–R relation (red line) along with the true M–R relation (black line) and the 16% and 84% confidence intervals. We also plot all 100 simulated data sets in light blue in the background and highlight one data set from them in dark blue.

To summarize our findings, the first subplot in Figure 12 suggests the nonparametric model did a good job estimating the M–R relation when the data are generated from the "true" model. When data have measurement errors, the quality of estimated M–R relation depends on the magnitude of measurement errors for a fixed sample size. The larger the measurement errors, the worse the estimated M–R relations and the larger the confidence intervals. We also found that the true mean M–R relation is contained in the confidence intervals regardless of how large the measurement errors are.

We also run a simulation study for a data set with missing observations in some regions (non-uniform sampling). Figure 13 presents the estimation result when the data between 4.5 R_⊕ and 5 R_⊕ are removed. Compared to the first subplot in Figure 12, the M–R relation is different in the region where the data are missing. The model tries to find the optimal polynomials to estimate the M–R relation. The polynomials may not necessarily to be linear; thus in the region with no data points, the M–R relation can be nonlinear. When there are missing values, Bernstein polynomials impute missing values by taking the mean from the nearest region where data are presented. Thus the mean M–R relation is lower than the true relation on the left side of missing data region and higher than the true relation on the right side. While the mean M–R relation may depart from the true relation in the missing data region, the confidence intervals contain the true M–R relation, which does not rule out the possibility of obtaining the true relation. This gives us confidence in applying our model to real Kepler and K2 data that have non-uniform sampling.

Predicting Exoplanet Masses and Radii: A Nonparametric Approach

Article metrics

Permissions

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Abstract

1. Introduction

1.1. Previous Work on Mass–Radius Relations

1.2. Moving to a Nonparametric Approach

2. The Model

2.1. The Bernstein Polynomials Model

2.2. Model Inference

3. Data

3.1. Discussion of Selection Effects

4. Benchmarking to WRF16

5. The Joint Exoplanet Mass–Radius Distribution

5.1. M–R relation of the Full Kepler Data Set

5.2. Mass Predictions for Given Radii

5.3. The Radius–Mass (R–M) Relation

5.4. Predicting K2 Planets

6. Implications for Exoplanet Compositions

6.1. Assessing "Well-understood" M–R Features

6.2. Location of Transition Points

6.3. Intrinsic Dispersion

6.4. Predictive Distributions

6.5. Implications for TESS

7. Conclusion

Appendix A: Bernstein Polynomials and Their Properties

Appendix B: More on the Bernstein Polynomials Model

B.1. On Choosing the Upper and Lower Bounds for Mass and Radius: $\underline{M}$ , $\overline{M}$ , $\underline{R}$ , $\overline{R}$

B.2. On Choosing the Degrees d and d'

Appendix C: A Simulation Study

Footnotes

Predicting Exoplanet Masses and Radii: A Nonparametric Approach

Article metrics

Permissions

Share this article

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Abstract

1. Introduction

1.1. Previous Work on Mass–Radius Relations

1.2. Moving to a Nonparametric Approach

2. The Model

2.1. The Bernstein Polynomials Model

2.2. Model Inference

3. Data

3.1. Discussion of Selection Effects

4. Benchmarking to WRF16

5. The Joint Exoplanet Mass–Radius Distribution

5.1. M–R relation of the Full Kepler Data Set

5.2. Mass Predictions for Given Radii

5.3. The Radius–Mass (R–M) Relation

5.4. Predicting K2 Planets

6. Implications for Exoplanet Compositions

6.1. Assessing "Well-understood" M–R Features

6.2. Location of Transition Points

6.3. Intrinsic Dispersion

6.4. Predictive Distributions

6.5. Implications for TESS

7. Conclusion

Appendix A: Bernstein Polynomials and Their Properties

Appendix B: More on the Bernstein Polynomials Model

B.1. On Choosing the Upper and Lower Bounds for Mass and Radius: \underline{M}, \overline{M}, \underline{R}, \overline{R}

B.2. On Choosing the Degrees d and d'

Appendix C: A Simulation Study

Footnotes

B.1. On Choosing the Upper and Lower Bounds for Mass and Radius: $\underline{M}$ , $\overline{M}$ , $\underline{R}$ , $\overline{R}$