This site uses cookies. By continuing to use this site you agree to our use of cookies. To find out more, see our Privacy and Cookies policy.
NOTICE: Ensuring subscriber access to content on IOPscience throughout the coronavirus outbreak - see our remote access guidelines.

The American Astronomical Society (AAS), established in 1899 and based in Washington, DC, is the major organization of professional astronomers in North America. Its membership of about 7,000 individuals also includes physicists, mathematicians, geologists, engineers, and others whose research and educational interests lie within the broad spectrum of subjects comprising contemporary astronomy. The mission of the AAS is to enhance and share humanity's scientific understanding of the universe.

https://aas.org/

The Institute of Physics (IOP) is a leading scientific society promoting physics and bringing physicists together for the benefit of all. It has a worldwide membership of around 50 000 comprising physicists from all sectors, as well as those with an interest in physics. It works to advance physics research, application and education; and engages with policy makers and the public to develop awareness and understanding of physics. Its publishing company, IOP Publishing, is a world leader in professional scientific communications.

https://www.iop.org

# MOST 1.6 EARTH-RADIUS PLANETS ARE NOT ROCKY

, ,

0004-637X/801/1/41

## Abstract

The Kepler mission, combined with ground-based radial velocity (RV) follow-up and dynamical analyses of transit timing variations, has revolutionized the observational constraints on sub-Neptune-sized planet compositions. The results of an extensive Kepler follow-up program including multiple Doppler measurements for 22 planet-hosting stars more than doubles the population of sub-Neptune-sized transiting planets that have RV mass constraints. This unprecedentedly large and homogeneous sample of planets with both mass and radius constraints opens the possibility of a statistical study of the underlying population of planet compositions. We focus on the intriguing transition between rocky exoplanets (comprised of iron and silicates) and planets with voluminous layers of volatiles (H/He and astrophysical ices). Applying a hierarchical Bayesian statistical approach to the sample of Kepler transiting sub-Neptune planets with Keck RV follow-up, we constrain the fraction of close-in planets (with orbital periods less than ~50 days) that are sufficiently dense to be rocky, as a function of planet radius. We show that the majority of planets have too low density to be comprised of Fe and silicates alone. At larger radii, the constraints on the fraction of rocky planets are even more stringent. These insights into the size demographics of rocky and volatile-rich planets offer empirical constraints to planet formation theories, and guide the range of planet radii to be considered in studies of the occurrence rate of "Earth-like" planets, .

Export citation and abstract

## 1. INTRODUCTION

After subsisting on the major bodies orbiting the Sun for centuries, astronomers are now placing the solar system into context with the discovery of a plethora of exoplanets. Exoplanets detected both in transit and by their dynamical influence are very valuable. The radius derived from the transit depth and the mass derived from radial velocity (RV) measurements or transit timing variations (TTVs), together give the planet density and some handle on the planet composition. To date, more than 200 transiting planets have measured masses (e.g., exoplanets.org; Wright et al. 2011). The accumulating census of transiting planets with measured masses contains information about the underlying composition distribution of planets and about the masses of the more than 3000 Kepler transiting planet candidates (Borucki et al. 2011; Batalha et al. 2013) that currently lack dynamical confirmation.

Planet mass–radius measurements are especially important for planets that are smaller than Neptune $\left( {{R}_{p}}\lesssim 4\;{{R}_{\oplus }} \right)$. For planets in this size range, a wide diversity of planet compositions are a priori plausible; rock, astrophysical ices (H2O, NH3, and CO), and H/He gas can all make significant contributions to both the planets' mass and volume (e.g., Rogers & Seager 2010a, 2010b). In the solar system, there are no planets with masses and radii intermediate between the Earth and the ice giants. As a statistical sample of sub-Neptune-sized exoplanets with measured masses and radii accumulates, we may begin to study the intriguing transition between planets that are predominantly rocky and those with voluminous layers of volatiles (astrophysical ices and H/He).

The Kepler Science Team targeted 22 Kepler Objects of Interest (KOIs) hosting planet candidates with ${{R}_{p}}\lt 4\;{{R}_{\oplus }}$ in an extensive Keck HIRES RV follow-up program (Marcy et al. 2014). The planet candidates were selected for RV follow-up based on their quiet host stars (Kepler magnitude ${{K}_{p}}\lt 13.5$, ${{T}_{{\rm eff}}}\lt 6100{\rm K}$, $v{\rm sin} i\lt 5\;{\rm km}\;{{{\rm s}}^{-1}}$) and detectable predicted RV-amplitudes $\left( K\gt 1\;{\rm m}\;{{{\rm s}}^{-1}} \right)$. Each target received 20–50 RV measurements between 2009 July and 2013 August. A mean velocity precision of $\sim 2\;{\rm m}\;{{{\rm s}}^{-1}}$ was achieved for the planet hosts, which have Kepler magnitudes ranging from 8.77 to 13.5. Orbital fits and MCMC analyses were employed to derive planet masses and mass upper limits from the Doppler RVs, yielding mass constraints for 49 planets (42 transiting and 7 non-transiting) in 22 planetary systems. Sixteen transiting planets have mass measurements at a confidence level of $2\sigma$ or better, while the rest have marginal RV detections or mass upper limits. The Marcy et al. (2014) survey more than doubles the number of known sub-Neptune-sized transiting planets with RV mass constraints.

The Kepler transiting planets with Keck HIRES RV follow-up is the largest and most homogeneous sample of sub-Neptune-sized planet mass–radius measurements to date. It opens an unprecedented window into the demographics of small planet bulk compositions. Weiss & Marcy (2014) fit these measured masses and radii (along with those of an additional nine planets smaller than $4\;{{R}_{\oplus }}$ with masses vetted on exoplanets.org) to power-law relations and found a nearly linear mass–radius relation: ${{M}_{p}}/{{M}_{\oplus }}=2.69{{\left( {{R}_{p}}/{{R}_{\oplus }} \right)}^{0.93}}$.

In this work, we focus on the intriguing threshold between rocky planets and exo-Neptunes with voluminous gas layers, a transition which has implications for planet formation, evolution, habitability, and the interpretation of the Kepler planet radius distribution. Instead of fitting simple functional forms to the measured properties of planets, we apply statistical tools to investigate how the HIRESRV and Kepler transit depth measurements constrain the underlying composition distribution of planets. We apply a hierarchical Bayesian analysis to constrain, as a function of planet size, the fraction of planets that are sufficiently dense to be rocky. As inputs, our approach takes samples from the mass–radius posterior distributions output from fitting the RV and transit photometry data. Our approach naturally accounts for both the non-Gaussian likelihoods, and the significant correlations between the RV-measured masses of planets in multi-planet systems (neither of which are taken into account by fitting to estimate outputs or summary statistics, as in Weiss & Marcy 2014).

We present updated mass–radius measurements for small planets and describe our sample of planets in Section 2. We outline our statistical approach to constraining the fraction of planets that are sufficiently dense to be rocky, as a function of planet size in Section 3. We present the results of our analysis in Section 4, and then discuss and conclude in Sections 5 and 6.

## 2. PROPERTIES OF INDIVIDUAL SUB-NEPTUNE-SIZE PLANETS

### 2.1. Defining a Statistical Sample of Planets

The full collection of planets having constraints on both their mass, Mp, and radius Rp (Figure 1) is a heterogeneous sample. Some planets were initially discovered from ground-based RV surveys, and later found to transit (GJ 436b, 55 Cnc e, HD 97658b). A larger sample of planets were initially discovered by transit surveys, and then confirmed with RV follow-up (e.g., GJ 1214b, CoRoT-7b, HAT-P-11b). The Kepler transit survey alone has made a tremendous contribution to populating the mass–radius diagram of small planets. The Kepler discoveries themselves are a diverse sample; many have mass constraints derived from TTVs (e.g., Kepler-11b,c,d,e,f,g, Kepler-30b,c, Kepler-36b,c, and the Wu & Lithwick 2013 planets), while the rest have RV-derived mass constraints (e.g., Kepler-4b, Kepler-10b,c, Kepler-19b, Kepler-20b,c,d, Kepler-21b, Kepler-22b, KOI-94b, and the Marcy et al. 2014 planets) or joint RV-TTV constraints (e.g., Kepler-18b,c,d).

For our statistical study of the planet composition distribution, we focus on the Kepler planets and planet candidates with RV-constrained masses. Each planet transit and RV survey has its own set of selection effects and biases. By restricting our sample to the Kepler planets with Keck HIRES RV follow-up, we work with the largest, most homogeneous sample of sub-Neptune planet mass–radius measurements collated to date. In addition to the planet masses published in Marcy et al. (2014), we also include planetary systems with previously published Keck HIRES RVs: KOI-70 (Kepler-20), KOI-72 (Kepler-10), KOI-84 (Kepler-19), KOI-87 (Kepler-22), and KOI-975 (Kepler-21). We have chosen not to include Kepler planets with TTV-constrained masses in our sample, since they are subject to separate selection effects (which may preferentially favor low-density planets, e.g., Jontof-Hutter et al. 2014; Weiss & Marcy 2014).

The sample of sub-Neptune-size Kepler planets with Keck HIRES RV follow-up includes planets on close-in orbits. All but four of the planets in our sample have orbital periods less than 50 days, while all but one of the planets with RV-measured masses more than 2 σ above zero have orbital periods below 16.2 days. Kepler-22b (KOI-87.01) is the long period outlier in the sample, with an orbital period of 290 days, and a marginal ~2 σ mass measurement of $32_{-14}^{+10}\;{{M}_{\oplus }}$ (including RV measurements up to 2013 August). The planets in our sample receive bolometric incident flux, Fp, between 1.1 and 3700 times that received by the Earth at 1 AU from the Sun, ${{F}_{\oplus }}$.

For each Kepler planet candidate selected for RV follow-up, the measured planet mass provides an unbiased sampling of close-in planet masses for its particular planet radius. The selection of KOIs for Keck HIRES follow-up (as detailed in Marcy et al. 2014) was neither random, nor fully algorithmic. As the Kepler team's priorities shifted toward characterizing smaller and smaller planets as the mission progressed, the selection criteria prioritizing KOIs for follow-up observations also evolved. The planets were selected for Keck HIRES RV follow-up based on their planet radius and stellar properties, and then pursued with Keck HIRES in a mass-blind way. We factor out dominant selection effects by focussing on the conditional distribution of masses for close-in planets as a function of planet radius. We plan to incorporate a more elaborate treatment of selection effects in our hierarchical Bayesian model in future work.

### 2.2. Measuring Rp and Mp from Transit and RV Data

For each planet host in our sample, the Kepler photometry and Keck RVs were simultaneously fit with an analytic model for transiting planets on Keplerian non-interacting circular orbits. The stellar properties are constrained either by analysis of high-resolution Keck reconnaissance spectra with SME (Spectroscopy Made Easy; Valenti & Piskunov 1996), or by asteroseismology analysis. There was no strong evidence in the RVs for eccentric planet orbits, though seven of the planet host stars did have RV trends indicative of non-transiting planets. The fitting procedure is described in detail in Marcy et al. (2014). This full photo-dynamical analysis was performed in 2013 May. To incorporate RV measurements from the 2013 observing season, the analysis of the Keck RVs was repeated in fall 2013. This time, however, only the RV amplitude of each planet was fit, with all other system parameters held fixed (H. Isaacson 2013, private communication).

In our statistical analysis we use the planet mass distributions obtained from this updated fit to the RVs. We note that some of the masses used in this work may be slightly different from those quoted in Table 2 of Marcy et al. (2014), which mixes the two different procedures to fit the light curve and RV data (from spring 2013 and fall 2013). For our statistical study, we apply the same fitting procedure consistently to all 22 planet hosts in the Marcy et al. (2014) and the 5 previously published small-planet-hosting stars with Keck HIRES RVs. The difference in the masses obtained by the two procedures is no more than $1.4\;\sigma$ for any planet.

The MCMC chains sampling the masses of the planets orbiting each star are directly incorporated into our statistical analysis of the planet composition distribution. In this way, we fully account for correlations between the RV masses of planets orbiting the same star, which may be induced both by the mutual dependence of the multi-planets' properties on the stellar properties and by correlations in the planets semi-amplitudes obtained from fitting the RVs. Accounting for correlations is especially important in the case of marginal RV detections, for which the posterior distribution of the planet mass is most strongly subject to correlations with the masses of other planets in the system. Pairs of planets with strong correlations in their measured masses (correlation coefficients $\left| R \right|\gt 0.1$) include KOIs 70.01 and 70.02 $\left( R=-0.31 \right)$, 70.01 and 70.03 $\left( R=-0.13 \right)$, 70.02 and 70.03 $\left( R=0.15 \right)$, 116.01 and 116.02 $\left( R=0.11 \right)$, 148.01 and 148.02 $\left( R=-0.35 \right)$, 148.01 and 148.03 $\left( R=0.14 \right)$, 148.02 and 148.03 $\left( R=-0.18 \right)$, 245.02 and 245.03 $\left( R=-0.16 \right)$, and 246.01 and 246.02 $\left( R=-0.12 \right)$.

We have neglected correlations among the planet radii and any correlations between a planet's radius and its mass, since the RV data were fit independently of the transit light curve data. We approximate the distribution of planet radii as Gaussian with the mean and standard deviation given in Table 2 of Marcy et al. (2014). Since the radii of the planets in our sample are far more tightly constrained than the masses, correlations affecting the planet radii are expected to be minor and to have a subdominant effect on our results. This has borne out in the results of the full photo-dynamical fit to the Kepler photometry and a subset of the RV data (Marcy et al. 2014).

### 2.3. Identifying Potentially Rocky Planets

One of the most direct insights we can glean about a sub-Neptune exoplanet's composition is whether it must have some volatiles (where volatiles refer to H/He or astrophysical ices). The mass–radius relation for a silicate composition (the solid-brown curve in Figure 1) represents an extreme lower limit on the mass of rocky planet with no volatiles at a specified radius. Planets that are less dense must have some volatiles (in the form of water or H/He); though rock may still account for most of their mass, these planets are too low in density to have their transit radius defined by a rocky surface. Planets that are more dense could potentially be comprised of iron and silicates alone. At the high-mass extreme, the mass–radius relation for a pure iron composition represents an upper limit to the mass of a rocky planet of a given size (the solid-gray curve in Figure 1). Planet mass–radius pairs that are more dense than pure iron are unphysical.

We denote the minimum and maximum masses of rocky planets of a given radius, Rp, by ${{M}_{{\rm rock},{\rm min} }}\left( {{R}_{p}} \right)$ and ${{M}_{{\rm rock},{\rm max} }}\left( {{R}_{p}} \right)$, respectively. These mass–radius relations bound the "potentially rocky" regime of planet mass–radius space. Planets with mass ${{M}_{{\rm rock},{\rm min} }}\left( {{R}_{p}} \right)\leqslant {{M}_{p}}\leqslant {{M}_{{\rm rock},{\rm max} }}\left( {{R}_{p}} \right)$ are potentially (but not necessarily) comprised of iron and silicates alone; planets in this regime may still contain substantial amounts of water and other volatiles if the low-density material is offset by higher density iron-enhanced rock.

We nominally use the mass–radius relations from Seager et al. (2007) for MgSiO3 perovskite, and phase Fe to define ${{M}_{{\rm rock},{\rm min} }}\left( {{R}_{p}} \right)$ and ${{M}_{{\rm rock},{\rm max} }}\left( {{R}_{p}} \right)$. The curvature of these iso-composition mass–radius relations takes into account the compression of materials at higher pressure. There is a maximum radius for planets in the "potentially rocky" regime, set by the maximum radius of a sphere of the limiting low-density composition before degeneracy pressure takes over and causes radius to decrease with mass. For a pure silicate composition, ${{R}_{{\rm max} ,{\rm rock}}}=3.48\;{{R}_{\oplus }}$.

Pure silicate and pure iron are both hypothetical end-member compositions; more plausible limits to the masses of rocky planets would include some mixture of Fe and silicates at both the high and low mass extremes. By adopting a generous mass range in our definition of "potentially rocky" planets at a given size, we set an upper bound on the fraction of planets that actually have a rocky composition. We investigate the effect of other choices for limiting high- and low-density rocky planet mass–radius relations in Section 5.2.

The lack of rocky planets with ${{R}_{p}}\gt 2\;{{R}_{\oplus }}$ (Figure 1) is one of the most striking (and largely model-independent) features of the transiting sub-Neptune-sized planets with mass constraints. There is an apparent high-density/high-mass threshold to the measured planet masses and radii. The paucity of high-density/high-mass planets persists despite the fact that, for transit-discovered planets of a specified size, Rp, it is easier to detect denser and more massive planets in RV follow-up. This seems to indicate a dearth of rocky planets with masses in excess of ${{M}_{p}}\sim 10\;{{M}_{\oplus }}$. Our goal is to place this observation on a strong statistical footing.

We must account for the large measurement uncertainties on planet masses and radii when assessing which planets may be rocky. Many of the Kepler planets with RV follow-up have either marginal RV detections or mass upper limits. Nonetheless, even the non-detections contain information about the planet composition. We introduce ${{p}_{{\rm rocky}}}$ the posterior probability that a planet is dense enough to be rocky based on the measured mass and radius. ${{p}_{{\rm rocky}}}$ is evaluated as the fraction of a transiting planet's joint mass–radius posterior probability density (obtained from fitting a planet model to the RV data and transit light curve as described in Section 2.2) that falls within the high-density "potentially rocky" regime (${{M}_{{\rm rock},{\rm min} }}({{R}_{p}})\leqslant {{M}_{p}}\leqslant {{M}_{{\rm rock},{\rm max} }}({{R}_{p}})$). Planets with ${{p}_{{\rm rocky}}}$ near 1 are well constrained to have masses and radii in the potentially rocky regime, while low-density planets have ${{p}_{{\rm rocky}}}$ near 0. RV marginal detections or non-detections, may have measured RV amplitudes that spill into unphysical regimes (corresponding to negative mass or masses exceeding that of a pure iron sphere). In calculating ${{p}_{{\rm rocky}}}$, we assume a flat prior on planet mass–radius pairs that are physically plausible and a prior probability of 0 on mass–radius pairs that are not (i.e., $p({{M}_{p}},{{R}_{p}})={\rm constant}$ if ${{M}_{p}}\leqslant {{M}_{p{\rm rock},{\rm max} }}({{R}_{p}})$ and ${{M}_{p}}\gt 0$, otherwise $p({{M}_{p}},{{R}_{p}})=0$).

The values of ${{p}_{{\rm rocky}}}$ for our sample of Kepler transiting planets with RV mass constraints are presented in Figure 2. It is seen that smaller planets are more likely to be dense enough to be rocky. Planets larger than $\sim 2\;{{R}_{\oplus }}$ have too low density to be comprised of iron and silicates alone; there is some sort of transition regime between 1 and $2\;{{R}_{\oplus }}$.

## 3. STATISTICAL METHODS FOR CHARACTERIZING PLANET POPULATION PROPERTIES

While the trend of planets larger than $2\;{{R}_{\oplus }}$ being volatile-rich is visible by eye in Figures 1 and 2, we aim to quantify the rocky/non-rocky transition in a more statistically rigorous way. We would like to assess (i) up to what size do the majority of planets have a rocky composition, and (ii) is the transition from rocky to non-rocky planet compositions as a function of Rp gradual or abrupt?

The calculation of ${{p}_{{\rm rocky}}}$ in the previous section (Figure 2) considers each planet separately and independently, with a flat prior on each planet's mass and radius. We now show how a joint analysis of several planet systems having constraints on the planet masses and radii can be used to both constrain the mass–radius distribution of the planet population, and to inform our priors on the masses of subsequent transiting planet discoveries.

We use a hierarchical Bayesian model to infer properties about the underlying population of planet compositions from the sample of Kepler planets with RV follow-up. The approach is hierarchical in the sense that we open up the priors assumed for each planet's mass and radius to modeling. Instead of assuming that any planet mass–radius pair is as likely or as unlikely as any other (the typical flat-prior assumption that has been applied in all planet mass–radius analyses to date), we assume the planets in our sample are drawn from a joint mass–radius distribution. We model the joint mass–radius distribution, assuming simple functional forms with a few free parameters, and derive the posterior probability density of those population-level parameters conditioned on the data.

We describe our statistical approach in detail below. First, we set up a hierarchical Bayesian model framework to measure the underlying mass–radius distribution of a population of planets from a census of planets with constraints on their masses and radii (Section 3.1). Though we hope to eventually characterize the complete joint mass–radius distribution of planets, the dominant selection effects in the current sample of Kepler transiting planets with RV follow-up lead us to instead focus on the conditional distribution of planet mass at a specific planet radius. In Section 3.2, we tailor the hierarchical Bayesian approach to constrain the fraction of planets, ${{f}_{{\rm rocky}}}\left( {{R}_{p}} \right)$ that are sufficiently dense to be rocky, as a function of planet size.

### 3.1. Hierarchical Bayesian Model for the Planet Mass–Radius Distribution

We adapt the resampling approach outlined by Hogg et al. (2010), which focussed on constraining the true distribution of planet eccentricities from the likelihood functions on the orbital parameters of RV-detected planets, to the question of constraining the underlying true joint distribution of planet masses and radii from a sample of transiting planets with RV follow-up. This method takes as input a sample from the posterior probability density for the masses and radii in the individual planetary systems.

We have N stars (indexed by n, where $1\leqslant n\leqslant N$) that are orbited by at least one transiting planet, and that have each been followed up with spectroscopic RV measurements. Each star is orbited by Jn planets detected in transit (indexed by j, where $1\leqslant j\leqslant {{J}_{n}}$). We use ${{{\boldsymbol{D}} }_{n}}$ to denote the data on the nth star, including RV measurements and the light curve photometry.

We denote by ${{{\boldsymbol{ \beta }} }_{n}}$ the properties of the planet(s) orbiting the nth star. In this work, we will focus mostly on planet masses Mnj and radii Rnj, but ${{{\boldsymbol{ \beta }} }_{n}}$ also includes the number of planets orbiting the star, the planet orbital periods, host star properties, and other model parameters from the photodynamical fit that, for this work, are nuisance parameters that we marginalize over (e.g., star center of mass velocity, transit ephemeris, properties of non-transiting planets detected by RV-trends etc).

For each star, n, we imagine that we have been provided (in our case by Marcy et al. 2014) with a Kn-element sampling from the posterior probability density function (PDF) of the planet properties obtained from fitting the RV and photometry data,

Above, ${{\mathcal{L}}_{n}}=p\left( {{{\boldsymbol{D}} }_{n}}\left| {} \right.{{{\boldsymbol{ \beta }} }_{n}} \right)$ is the likelihood of the data given a set of planet model parameters ${{{\boldsymbol{ \beta }} }_{n}}$, Zn is a normalization constant, and ${{p}_{0}}\left( {{{\boldsymbol{ \beta }} }_{n}} \right)$ is an uninformative "interim" prior PDF chosen by the RV-photometry data fitter (Marcy et al. 2014, took a flat prior on planet mass and radius). For each planetary system, n, the sampling takes the form of a chain of Kn samples (indexed by k, where $1\leqslant k\leqslant {{K}_{n}}$), each a complete set of the planet parameters ${{\beta }_{nk}}$, such that the distribution of samples is consistent with a random draw from the posterior PDF, $p\left( {{{\boldsymbol{ \beta }} }_{n}}\left| {} \right.{{{\boldsymbol{D}} }_{n}} \right)$.

Assuming that the observations of different stars are independent (i.e., neglecting any correlations between observations of different stars introduced by hardware issues or calibration), the total posterior, likelihood and prior for all the parameters of all the N planetary systems, is just the product of the individual system posteriors, likelihoods and priors, respectively

Our goal is to constrain the distribution of "true" (as opposed to measured or estimated) planet masses and radii based on the noisy observations of all N planetary systems. Here, the adjective "true" refers to what would have been measured in much higher signal-to-noise ratio observations of the same objects. To this end, we develop a model for the distribution of true planet masses and radii, that depends on some population-level parameters ${\boldsymbol{ \alpha }}$. We use that population model to define a new set of priors on the planet properties, depending on ${\boldsymbol{ \alpha }}$

The joint posterior probability for the properties of all the individual planets and the population-level parameters ${\boldsymbol{ \alpha }}$ governing the planet mass–radius distribution is then,

where $p\left( {\boldsymbol{ \alpha }} \right)$ are the priors on the population-level parameters.

The posterior presented above in Equation (6) is hierarchical: the parameters describing the distribution of planet properties are inferred, and influence the estimates of the individual planet properties. With this hierarchical framework, characterizing the true planet mass–radius distribution boils down to constraining the population-level parameters, ${\boldsymbol{ \alpha }}$. For this purpose, the true properties of the individual planetary systems are nuisance parameters. Marginalizing over (integrating out) the nuisance parameters, we obtain the posterior PDF on ${\boldsymbol{ \alpha }}$,

We may then identify the marginal likelihood function of ${\boldsymbol{ \alpha }}$, ${{\mathcal{L}}_{{\boldsymbol{ \alpha }} }}$,

Fortunately, we do not have to evaluate the multi-dimension integrals in Equation (8) directly. The marginalized likelihood for ${\boldsymbol{ \alpha }}$ can readily be evaluated by applying importance resampling to the samples from the posterior PDF for each individual planet system's properties provided to us by the RV-photometry fitter.

Equation (9) above is equivalent to Equation (9) of Hogg et al. (2010). Each element in the PDF samples for an individual planet system is re-weighted by the ratio of the new prior PDF (depending on ${\boldsymbol{ \alpha }}$, which we want to infer) and the interim prior PDF (on which the original sampling was based). We elaborate upon the choice of $p\left( {{{\boldsymbol{ \beta }} }_{nk}}\left| {} \right.{\boldsymbol{ \alpha }} \right)$ in the next section.

### 3.2. Modeling the Conditional Planet Mass Distribution, at Specified Planet Radius

In the previous section we described a general approach to constraining the true mass–radius distribution of planets from a census of transiting planets with RV follow-up. We now tailor the approach to the Marcy et al. (2014) sample of Kepler planets with RV follow-up.

As described in Section 2.1, the Marcy et al. (2014) sample of sub-Neptune-sized planets were initially detected from transits in the Kepler photometry, selected for RV follow-up based on their radius, orbital period and stellar properties, and then spectroscopically followed-up with Keck HIRES in a mass-blind way. For a specified planet radius, the measured RV-masses of the Marcy et al. (2014) planets provide an unbiased sampling of close-in planet masses of that particular size. In contrast, the radius distribution of the Marcy et al. (2014) planets is dominated by the selection effects applied in choosing Kepler planet candidates, and does not reflect the true underlying radius distribution of planets.

In our hierarchical Bayesian analysis, we factor out the dominant selection effects by focussing on the conditional distribution of planet masses at a specified planet radius. We frame our model for the conditional distribution of planet masses at a specified radius in terms of the fraction of planets of that size that are sufficiently dense to be rocky, ${{f}_{{\rm rocky}}}\left| {} \right.{{R}_{p}}$. As defined, ${{f}_{{\rm rocky}}}\left| {} \right.{{R}_{p}}$ is the fraction of planets of radius Rp that have mass greater than ${{M}_{{\rm rock},{\rm min} }}\left( {{R}_{p}} \right)$, while $1-{{f}_{{\rm rocky}}}|{{R}_{p}}$ is the fraction of planets with radius Rp that have mass less than ${{M}_{{\rm rock},{\rm min} }}\left( {{R}_{p}} \right)$ (and thus must contain astrophysical ices or H/He that contribute to the transit depth).

We assume simple functional forms for ${{f}_{{\rm rocky}}}\left| {} \right.{{R}_{p}}$, that depend on a few free parameters ${\boldsymbol{ \alpha }}$, the planet radius, and (potentially) other properties of the planet–star system (such as planet insolation or stellar mass)

In our model, the distribution of planet masses conditioned on a particular planet radius depends on ${{f}_{{\rm rocky}}}\left| {} \right.{{R}_{p}}$ (and hence on ${\boldsymbol{ \alpha }} ).$ Instead of a completely flat prior on planet mass (as typically assumed when fitting individual planet systems), we divide mass into two regimes ("potentially rocky" and "non-rocky"), and take a flat prior PDF on the planet mass within each regime

On all other parameters except for planet mass, Mp, we keep the same non-informative interim prior PDF as used by Marcy et al. (2014). Assuming separability of the prior PDF,

The interior prior, ${{p}_{0}}({{M}_{nj}})$, is flat on Mp for each planet j in the nth system. The α-dependent prior we hope to infer also treats each planet independently.

Using Equations (9) and (12) we constrain the posterior PDF of ${\boldsymbol{ \alpha }}$. Then, based on the assumed functional form for ${{f}_{\alpha }}\left( {{R}_{p}} \right)$, we may transform $p\left( {\boldsymbol{ \alpha }} \left| {} \right.\left\{ {{{\boldsymbol{D}} }_{n}} \right\}_{n=1}^{N} \right)$ to obtain a posterior PDF for ${{f}_{{\rm rocky}}}$, conditioned on planet size, Rp.

where δ is the Dirac-delta function. The posterior PDF $p\left( {{f}_{{\rm rocky}}}\left| {} \right.{{R}_{p}},\left\{ {{{\boldsymbol{D}} }_{n}} \right\}_{n=1}^{N} \right)$ quantifies and summarizes the constraints the Kepler planet candidates with RV follow-up place on the fraction ${{f}_{{\rm rocky}}}$ of planets that are dense enough to be rocky as a function of planet size. For a given Rp, values of ${{f}_{{\rm rocky}}}$ with higher values of $p\left( {{f}_{{\rm rocky}}}\left| {} \right.{{R}_{p}},\left\{ {{{\boldsymbol{D}} }_{n}} \right\}_{n=1}^{N} \right)$ are more strongly favored by the RV and photometry data. This approach to constraining the fraction of planets of a given size that are rocky takes into account both the observational uncertainties on the individual planet masses and radii, and the statistical uncertainties due to the finite number of planets in our sample.

## 4. RESULTS

We now apply the hierarchical Bayesian formalism described above to the Marcy et al. (2014) sample of transiting planets with mass constraints. We explore several different choices for the functional form of ${{f}_{\alpha }}\left( {{R}_{p}} \right)$.

### 4.1. Step-function Rocky/Non-rocky Radius Threshold

We first explore the possibility that all planets larger than a threshold radius, ${{R}_{{\rm thresh}}}$, are non-rocky, while all planets smaller than ${{R}_{{\rm thresh}}}$ are dense enough to be comprised of iron and silicates alone. In this scenario, we have a one-parameter (${{{\boldsymbol{ \alpha }} }_{1}}\equiv {{R}_{{\rm thresh}}}$) step-function model where the fraction of planets with volatiles depends only on planet radius, ${{f}_{1\alpha }}\left( {{R}_{p}},{{R}_{{\rm thresh}}} \right)$,

We adopt uniform priors on $0\lt {{R}_{{\rm thresh}}}\lt {{R}_{{\rm max} ,{\rm rock}}}$, where ${{R}_{{\rm max} ,{\rm rock}}}=3.48\;{{R}_{\oplus }}$ is a very conservative upper bound on the radius of rocky planets (the maximum radius of a silicate sphere before degeneracy pressure takes over and causes radius to decrease with mass). This simple one-parameter model serves as a baseline case against which all more sophisticated models are compared.

Figure 3 presents the resulting posterior probability on Rthresh. We find a median value ${{R}_{{\rm thresh}}}=1.48\;{{R}_{\oplus }}$, and a 95% confidence upper bound of ${{R}_{{\rm thresh}}}=1.59\;{{R}_{\oplus }}$. The finding that ${{R}_{{\rm thresh}}}$ is near $1.5\;{{R}_{\oplus }}$ with a 95% confidence upper limit of $1.6\;{{R}_{\oplus }}$ effectively quantifies the trend visible by eye in Figure 1 that most planets discovered to date that are larger than $1.6\;{{R}_{\oplus }}$ are too low in density to be rocky. For the pure silicate composition adopted as the low-density extreme for rocky planets, the median and 95% upper bound on ${{R}_{{\rm thresh}}}$ correspond to threshold masses of $3.5$ and $5.0\;{{M}_{\oplus }}$, respectively. For an Earth-like composition, the median and 95% upper bound on ${{R}_{{\rm thresh}}}$ correspond to masses of $4.5$ and $6.0\;{{M}_{\oplus }}$, respectively.

Physically, we might expect that small planets will show some composition diversity, and that there will be a range of radii at which high-mass rocky planets and low-mass non-rocky planets co-exist. We now investigate whether there is evidence in the data for a gradual transition between the populations of rocky and non-rocky planets, and how allowing the possibility of a gradual transition affects the inferred relative frequency of rocky planets at a given size.

In our second model, assume that the fraction of planets that are dense enough to be rocky, ${{f}_{{\rm rocky}}}$ decreases with planet radius in a piecewise-linear fashion

We are still assuming that the non-rocky planet fraction depends only on planet radius. However, our model now has two parameters (${{{\boldsymbol{ \alpha }} }_{2}}\equiv \left\{ {{R}_{{\rm mid}}},{{{\Delta }}_{R}} \right\}$), the mid-point of the linear transition (where ${{f}_{2\alpha }}=0.5$), ${{R}_{{\rm mid}}}$, and the width of the transition, ${{{\Delta }}_{R}}$. This linear transition model reduces to the step-function model when ${{{\Delta }}_{R}}=0$, but admits the possibility of a range of radii over which rocky and non-rocky planets coexist. For our priors, we take a flat prior distribution on $\left( {{R}_{{\rm mid}}},{{{\Delta }}_{R}} \right)$ from the rectangular 2D area, $-0.5{{R}_{{\rm max} ,{\rm rock}}}\lt {{R}_{{\rm mid}}}\lt 1.5{{R}_{{\rm max} ,{\rm rock}}}$, $0\lt {{{\Delta }}_{R}}\lt {{R}_{{\rm max} ,{\rm rock}}}$. In cases with ${{R}_{{\rm mid}}}-\frac{1}{2}{{{\Delta }}_{R}}\lt 0$ there is a finite prior probability of non-rocky planets as Rp approaches 0, and when ${{R}_{{\rm mid}}}+\frac{1}{2}{{{\Delta }}_{R}}\gt {{R}_{{\rm max} ,{\rm rock}}}$ there is a finite prior probability of rocky planets up to the maximum physically allowable size, ${{R}_{{\rm max} ,{\rm rock}}}$. The joint posterior probability density of ${{R}_{{\rm mid}}}$ and ${{{\Delta }}_{R}}$ conditioned on the observed planets is displayed in Figure 4, along with the marginal distribution of each parameter.

At any radius equal to or larger than $1.62\;{{R}_{\oplus }}$, the majority (50% or more) of planets of that size are not-rocky (at 95% statistical confidence, based on the linear transition model for ${{f}_{{\rm rocky}}}$). In the linear transition model, ${{R}_{{\rm mid}}}$ represents the planet size at which 50% of the planets are sufficiently dense to be rocky. Marginalizing over ${{{\Delta }}_{R}}$ (Figure 4), we constrain ${{R}_{{\rm mid}}}=1.29_{-0.54}^{+0.23}\;{{R}_{\oplus }}$, where we list the median as the nominal value and quote the $34.1\%$ credible interval on either side of the median as the uncertainties. The maximum probability (mode) is ${{R}_{{\rm mid}}}=1.47\;{{R}_{\oplus }}$, and $1.62\;{{R}_{\oplus }}$ is the 95th percentile. Iso-probability contours in ${{R}_{{\rm mid}}}$${\Delta }R$ space encompassing 68.27, 95.45 and 99.73% of the posterior probability include scenarios with ${{R}_{{\rm mid}}}$ up to $1.70$, $1.79$ and $1.87\;{{R}_{\oplus }}$, respectively.

The width ${{{\Delta }}_{R}}$ of the transition between rocky and non-rocky planets is consistent with an abrupt transition $\left( {{{\Delta }}_{R}}=0 \right)$. The mode of the marginalized posterior PDF is indeed found at ${{{\Delta }}_{R}}=0$. The distribution of ${{{\Delta }}_{R}}$ is very broad, with a median and $1\sigma$ confidence interval of ${{{\Delta }}_{R}}=1.28_{-1.04}^{+1.34}\;{{R}_{\oplus }}$. Notably, the maximum probability solution of the linear-transition model (${{R}_{{\rm mid}}}=1.47\;{{R}_{\oplus }}$, ${\Delta }R=0.02\;{{R}_{\oplus }}$) nearly coincides with the best fit of the step-function model.

With the added flexibility of this two-parameter linear transition model, the 68.27% $(1\;\sigma )$ and 95.45% (2 σ) credible regions include solutions with rocky planets extending to larger radii as compared to the step-function model. In the linear transition model, the maximum radius achieved by rocky planets (the smallest radius at which ${{f}_{2\alpha }}=0$) is ${{R}_{{\rm mid}}}+\frac{1}{2}{{{\Delta }}_{R}}$. Iso-probability contours in ${{R}_{{\rm mid}}}$${\Delta }R$ space encompassing 68.27, 95.45 and 99.73% of the posterior probability include values of ${{R}_{{\rm mid}}}+\frac{1}{2}{{{\Delta }}_{R}}\leqslant 2.27\;{{R}_{\oplus }}$, $2.64\;{{R}_{\oplus }}$ and $2.95\;{{R}_{\oplus }}$, respectively. We similarly set upper limits on the planet radius at which no more than 5% of planets are dense enough to be rocky of $2.15\;{{R}_{\oplus }}$ (68.27%), $2.46\;{{R}_{\oplus }}$ (95.45%), and $2.78\;{{R}_{\oplus }}$ (99.73%).

The posterior distribution of ${{f}_{{\rm rocky}}}$ conditioned on planet radius, $p\left( {{f}_{{\rm rocky}}}\left| {} \right.{{R}_{p}},\left\{ {{{\boldsymbol{D}} }_{n}} \right\}_{n=1}^{N} \right)$, quantifies the constraints that the Kepler planet candidates with Keck RV follow-up place on the fraction ${{f}_{{\rm rocky}}}$ of planets that are dense enough to be rocky as a function of planet size (Figure 5). We construct the conditional posterior distribution of ${{f}_{{\rm rocky}}}$ by sampling from the posterior distribution of ${{{\boldsymbol{ \alpha }} }_{2}}$. We carefully chose the priors on ${{{\boldsymbol{ \alpha }} }_{2}}\equiv \left\{ {{R}_{{\rm mid}}},{{{\Delta }}_{R}} \right\}$ to obtain (nearly) flat priors on the fraction of planets that are rocky at a given Rp, ${{p}_{0}}\left( {{f}_{{\rm rocky}}}\left| {} \right.{{R}_{p}} \right)$; for any ${{R}_{p}}\leqslant {{R}_{{\rm max} ,{\rm rocky}}}$, the priors on ${{f}_{{\rm rocky}}}$ are flat for $0\lt {{f}_{{\rm rocky}}}\lt 1$, but there is some "pile-up" of prior probability at the specific points ${{f}_{{\rm rocky}}}=0$ and ${{f}_{{\rm rocky}}}=1$, corresponding to situations where Rp is larger (smaller) than the maximum (minimum) achieved radius of rocky (non-rocky) planets

The conditional posterior PDF $p\left( {{f}_{{\rm rocky}}}\left| {} \right.{{R}_{p}},\left\{ {{{\boldsymbol{D}} }_{n}} \right\}_{n=1}^{N} \right)$ (Figure 5) reveals statistically robust insights into the fraction of planets that are sufficiently dense to be rocky, as a function of planet size. There is an upper limit on the fraction of large planets (${{R}_{p}}\gtrsim 1.5\;{{R}_{\oplus }}$) that are dense enough to be rocky, stemming from the lack of large dense planets in the sample of Kepler planets with HIRES RV follow-up. There is also a lower limit on the fraction of small planets (${{R}_{p}}\lesssim 1.5\;{{R}_{\oplus }}$) that are in the "potentially rocky" regime, stemming from the detection of a handful of $\sim 1.5\;{{R}_{\oplus }}$ planets that are sufficiently dense to be rocky: Kepler-10b, Kepler-100b (KOI-41.02), Kepler-99b (KOI-305.01), and Kepler-406b (KOI-321.01). The fact that the non-zero lower bound on ${{f}_{{\rm rocky}}}$ extends down to Rp = 0 is a consequence of the functional form of ${{f}_{2\alpha }}$ (Equation (15)), which imposes that ${{f}_{2\alpha }}\left( {{R}_{p}} \right)$ is monotonically decreasing with planet radius. The posterior distribution of ${{f}_{{\rm rocky}}}$ at a given planet size becomes more uniform (on ${{f}_{{\rm rocky}}}$ between the Rp-dependent lower bound and 1) for smaller planet radii as ${{R}_{p}}\to 0$, since the RV-mass constraints on small planets ${{R}_{p}}\lesssim 1\;{{R}_{\oplus }}$ are weaker and contain less information on the planet compositions than the mass constraints on larger planets. The posterior distribution of ${{f}_{{\rm rocky}}}$ at ${{R}_{p}}\sim 1.5\;{{R}_{\oplus }}$(in the neighborhood of the best-fit radius threshold in the step-function model, ${{{\Delta }}_{R}}=0$) has significant probability density across the full range of $0\leqslant {{f}_{{\rm rocky}}}\leqslant 1$.

Our results are largely unchanged when we use a logistic curve instead of a linear function to model a smoother transition between the size regime where most planets are "potentially rocky" and the size regime where most are volatile-rich

We adopted flat priors on $-0.5{{R}_{{\rm max} ,{\rm rock}}}\lt {{R}_{{\rm mid}}}\lt 1.5{{R}_{{\rm max} ,{\rm rock}}}$ and $0\lt {{{\Delta }}_{R}}\lt {{R}_{{\rm max} ,{\rm rock}}}$. The constraints on the midpoint of the rocky/non-rocky transition obtained with this logistic curve model are very similar to those obtained for the linear transition: ${{R}_{{\rm mid}}}=1.27_{-1.24}^{+0.24}\;{{R}_{\oplus }}$, with a mode of $1.48\;{{R}_{\oplus }}$ and a 95% upper bound of $1.60\;{{R}_{\oplus }}$.

Is there evidence in the data for a gradual transition in radius between planets that are sufficiently dense to be rocky, and those that are not? To assess this, we evaluate the Bayesian evidence for each model,

The simpler one-parameter step function rocky/non-rocky transition model (${{E}_{1}}=4.1\times {{10}^{-68}}$) is mildly favored over the gradual linear transition and logistic transition models (${{E}_{2}}=8.2\times {{10}^{-69}}$ and ${{E}_{3}}=6.2\times {{10}^{-69}}$, respectively). The improvement in the fit in the gradual transition models does not justify the addition of another parameter. This does not mean that there is not some radius range in which both rocky and non-rocky planets co-exist, but rather that more mass–radius measurements of small planets ${{R}_{p}}\lt 2\;{{R}_{\oplus }}$ are needed to conclusively discern any structure in the transition between rocky and volatile-rich planet populations.

### 4.3. Incident Flux-dependent Rocky/Non-rocky Radius Threshold

We turn now to exploring whether the transition between planets that are dense enough to be rocky depends on the amount of radiative energy the planet is receiving from its star, Fp. We adopt a generalized step-function model, where the radius threshold depends linearly on ${\rm log} {{F}_{p}}$,

In this parameterization, ${{R}_{{\rm thresh}0}}$ is the radius threshold at ${{F}_{p0}}=100{{F}_{\oplus }}$ (a characteristic median flux for the planets in our sample). We take flat priors on $0\lt {{R}_{{\rm thresh}0}}\lt {{R}_{{\rm max} ,{\rm rocky}}}$ and $-{{R}_{\oplus }}\lt R_{{\rm thresh}}^{\prime }\lt {{R}_{\oplus }}$.

Given the current sample of planets with measured masses and radii, there is no statistically robust evidence for an incident-flux dependence in the radius threshold between planets that are dense enough to be rocky and those that are not. We present the joint posterior PDF of ${{R}_{{\rm thresh}0}}$ and $R_{{\rm thresh}0}^{\prime }$ in Figure 6. The slope of the rocky/non-rocky threshold is consistent with zero (i.e., no dependence on incident flux) and exhibits a slight preference for positive values. Marginalizing over ${{R}_{{\rm thresh}0}}$, we find $R_{{\rm thresh}}^{\prime }=0.11_{-0.12}^{+0.35}\;{{R}_{\oplus }}$. The preference for $R_{{\rm thresh}0}^{\prime }\gt 0$ may be expected if more massive planet cores manage to retain their volatiles at higher levels of irradiation.

The Bayesian evidence, ${{E}_{4}}=8.3\times {{10}^{-69}}$, implies that the flux-dependent rocky/non-rocky transition is less favored than the simple one-parameter step-function model in radius (which does not have any dependence on incident flux). The lack of evidence for an incident flux-dependence in the fraction of planets of a given size that are dense enough to be rocky does not mean that incident flux has no effect on planet compositions. For sub-Neptune-sized planets with H/He envelopes, the planet radius is very sensitive to the planet gas mass fraction, but less sensitive to the total planet mass (Rogers et al. 2011; Lopez & Fortney 2014). Highly irradiated planets may lose their volatile envelopes to atmospheric escape over time, converting larger non-rocky planets (having large Rp with ${{f}_{{\rm rocky}}}\left( {{R}_{p}} \right)\sim 0$) into smaller rocky planets (having small Rp with ${{f}_{{\rm rocky}}}\left( {{R}_{p}} \right)\sim 1$). To leading order, mass loss would have a more pronounced effect on the radius distribution of planets (the relative number of planets at each radii, Owen & Wu 2013) than on the fraction of planets at specified size that are sufficiently dense to be rocky.

## 5. DISCUSSION

### 5.1. Sensitivity to the Chosen Planet Sample

We have used the Kepler-discovered planets with Keck HIRES RV follow-up as our sample to constrain the fraction of planets that are sufficiently dense to be rocky, as a function of planet size. The sample employed in the analysis comprises 27 planet systems. How sensitive are the results to the chosen sample, for instance, to adding or removing a planetary system? Is a single planet system dominating the constraints?

To assess how sensitive the posterior is to the chosen sample of planet systems, a bootstrapping analysis is used. In each of 1000 iterations, N = 27 planet systems are sampled with replacement from the original sample of 27 planet-hosting Kepler+HIRES targets and the hierarchical analysis with the one-parameter step-function model is repeated. Figure 3 shows the extent to which the posterior probability density of ${{R}_{{\rm thresh}}}$ varies in the bootstrapping analysis; at each value of ${{R}_{{\rm thresh}}}$, the blue shaded region in Figure 3 encompasses 68.3% of the bootstrapped values of $p\left( {{R}_{{\rm thresh}}}\left| {} \right.{\rm data} \right)$. Among the bootstrapped samples, median values of ${{R}_{{\rm thresh}}}$ span 1.43 to $1.52\;{{R}_{\oplus }}$ while the 95% percentile upper bounds on ${{R}_{{\rm thresh}}}$ span 1.54 to $1.67\;{{R}_{\oplus }}$, where the 34.1% percentiles above and below the median are quoted. The span of the bootstrapped posterior density of ${{R}_{{\rm thresh}}}$ along with the span of the median and 95% percentiles of ${{R}_{{\rm thresh}}}$ do not have a formal meaning in Bayesian statistics (e.g., they should not be interpreted as a credible intervals), but are instead presented to illustrate, in a rough sense, that our results are largely insensitive to adding/removing planet systems from the sample considered.

For many planets in the sample (especially those at ${{R}_{p}}\lesssim 1\;{{R}_{\oplus }}$), the RV semi-amplitude constraints contain very little information about the planet composition, spanning the range of physically reasonable masses. Our results are not sensitive to removing these planets. Eliminating from the analysis of the one-parameter step-function model planets that span Mp = 0 and ${{M}_{{\rm rock},{\rm max} }}\left( {{R}_{p}} \right)$ within their 1 σ error bars (namely, KOI 70.04, 70.05, 82.03, 82.04, 82.05, 116.04, 245.02, 245.03, 321.02, 1612.01), a median of ${{R}_{{\rm thresh}}}=1.48\;{{R}_{\oplus }}$ and 95% percentile of ${{R}_{{\rm thresh}}}=1.59\;{{R}_{\oplus }}$ are obtained (identical to the values obtained for the full planet sample to within the quoted precision). We emphasize that to avoid a Malmquist bias toward higher-densities, planets with RV upper limits should be included in the analysis, as we have done. Several RV non-detections in the sample contain valuable information about the planet composition, constraining the planet to be volatile-rich.

### 5.2. Threshold Mass–Radius Relations for Rocky Planets

We now turn to quantifying the effect of considering a more restrictive range of possible compositions (and hence masses) for rocky planets. The assumptions we have made to date concerning the range of plausible planet masses for rocky planets of a given size are very inclusive. The pure silicate, and pure iron compositions adopted as the low- and high-density extremes for rocky planets are extreme end-member scenarios. The photospheres of planet-hosting stars have Fe/Si abundance ratios near 1 (ranging from 0.5 to 1.3) (Grasset et al. 2009). Further, metallic iron and silicates have similar condensation temperatures in the protoplanetary disk (e.g., Petaev & Wood 2005), and are expected to concomitantly condense to form solids and to contribute together to the bulk material forming a rocky planet (Valencia et al. 2007).

Increasing the density (iron fraction) of the limiting low-density composition assumed for rocky planets tends to decrease the inferred fraction of planets of a given size that are sufficiently dense to be rocky. This leads to even stronger upper bounds on the planet radius above which most planets are not rocky. When we take an Earth-like composition (with 32% Fe core and 68% silicate mantle, by mass) as the limiting low-density composition for rocky planets in the one-parameter step-function model we find a median value of ${{R}_{{\rm thresh}}}=1.43_{-0.80}^{+0.05}\;\;{{R}_{\oplus }}$, and a 95% confidence upper bound of ${{R}_{{\rm thresh}}}=1.53_{-0.04}^{+0.08}\;{{R}_{\oplus }}$ (compared to ${{R}_{{\rm thresh}}}=1.48_{-0.05}^{+0.04}\;{{R}_{\oplus }}$ and a 95% confidence upper bound of ${{R}_{{\rm thresh}}}=1.59_{-0.05}^{+0.18}\;{{R}_{\oplus }}$ assuming a pure-silicate limiting composition). The uncertainties quoted on the percentiles of ${{R}_{{\rm thresh}}}$ span 34.1% of the values above and below the median evaluated from 1000 bootstrapping samples. The threshold mass corresponding to the limiting threshold radius increases to $4.0\;{{M}_{\oplus }}$ (median) and $5.0\;{{M}_{\oplus }}$ (95% upper bound) for an Earth-like composition.

Decreasing the density (iron fraction) of the limiting high-density composition assumed for rocky planets tends (i) to increase the threshold radius of the rocky/non-rocky transition, and (ii) to broaden the constraints on the fraction of planets of a given size that are dense enough to be rocky. Decreasing ${{M}_{{\rm rock},{\rm max} }}\left( {{R}_{p}} \right)$ extends the region of planet mass–radius space that is ruled physically implausible in the prior, $p\left( {{{\boldsymbol{ \beta }} }_{n}}\left| {} \right.{\boldsymbol{ \alpha }} \right)$. Since small-radius solutions are preferentially disfavored, this has the effect of systematically increasing the value inferred for the "true" radius of each planet relative to that inferred based on the non-informative interior prior, ${{p}_{0}}\left( {{\beta }_{n}} \right)$. The value of ${{p}_{{\rm rocky}}}$ for each planet is also systematically decreased. Marcus et al. (2010) used numerical simulations of giant impacts to compute a minimum radius for iron-rich rocky planets formed by collisional mantle stripping of differentiated planets with an initial 0.33 Fe core mass fraction. Adopting the Marcus et al. (2010) minimum radius relation for rocky planets in the one-parameter step-function model, we find a median value of ${{R}_{{\rm thresh}}}=1.53_{-0.05}^{+0.34}\;{{R}_{\oplus }}$, and a 95% confidence upper bound of ${{R}_{{\rm thresh}}}=1.96_{-0.36}^{+0.14}\;{{R}_{\oplus }}$.

When the reduced upper and increased lower limits on the density of rocky planets are adopted simultaneously, the location of the rocky/non-rocky threshold shows good agreement with that obtained from the nominal limits (which assume pure iron and pure silicate). Taking the Marcus et al. (2010) minimum radius relation and an Earth-like composition as the limiting maximum radius relation for rocky planets in the one-parameter step-function model, we find a median value of ${{R}_{{\rm thresh}}}=1.48_{-0.56}^{+0.07}{{R}_{\oplus }}$, and a 95% confidence upper bound of ${{R}_{{\rm thresh}}}=1.62_{-0.08}^{+0.67}{{R}_{\oplus }}$.

### 5.3. Prior Assumptions

As in any work relying on Bayesian inference, some priors must be chosen. In this section, we discuss our priors and the sensitivity of our results to these choices.

Within the each of the "potentially rocky" and "non-rocky" regimes, we have taken a flat prior PDF on the planet mass (Equation (11)). Throughout the analysis, only the relative weights given to the two regimes at a given Rp has been adjusted; the assumption of a flat prior on the planet mass within each regime has not been varied. The most natural alternative to the semi-flat mass prior assumption would be to treat the planet mass as a scale parameter and to adopt a flat prior on ${\rm log} {{M}_{p}}$. With the RV mass constraints in hand, however, we cannot accurately assess the effect of adopting a flat prior on ${\rm log} {{M}_{p}}$ in a quantitative way; our importance resampling approach breaks down for the $p\left( {{M}_{nj}}\left| {} \right.{{R}_{p}} \right)\propto 1/{{M}_{nj}}$ target prior due to insufficient support at low Mp from the interim prior PDF adopted by Marcy et al. (2014). Qualitatively, however, we expect that adopting a flat prior in ${\rm log} {{M}_{p}}$ at a given radius would serve to further strengthen our main conclusion that most 1.6 ${{R}_{\oplus }}$ planets are not rocky, by adding additional statistical weight at low planet masses (and hence low planet densities).

In this work, we have parameterized the mass–radius distribution of planets in terms of the fraction of planets or a given size that are rocky, ${{f}_{{\rm rocky}}}\left| {} \right.{{R}_{p}}$. We have assumed simple functional forms for ${{f}_{{\rm rocky}}}\left| {} \right.{{R}_{p}}$ that depend on a few free parameters ${\boldsymbol{ \alpha }}$ as well as on properties of the planet–star system. Exploring four different models for ${{f}_{{\rm rocky}}}\left| {} \right.{{R}_{p}}$, we have shown that our main results are robust against the particular choice of parameterization. The use of low-parameter functional forms for ${{f}_{{\rm rocky}}}$, however, imposes monotonically decreasing and smooth—except in the step function model, which is discontinuous at ${{R}_{{\rm thresh}}}$ by construction—behavior on the variations of ${{f}_{{\rm rocky}}}$ with Rp. In future work, a more generalized model for the mass–radius distribution of rocky and non-rocky planets could be employed to include more freedom, for example using a step function with $M\gt 1$ steps and a smoothness prior (Hogg et al. 2010, Equations (10) and (11) ), or a Gaussian process.

Finally, we have assumed flat priors on the population-level parameters ${\boldsymbol{ \alpha }}$ for each model explored. In each case, the posterior PDF of ${\boldsymbol{ \alpha }}$ is significantly narrower than prior probability density, demonstrating that the planet mass–radius data provide stronger constraints on the population-level parameters than do our priors.

### 5.4. Insights into Planet Formation

We have shown that, at planet radii of $1.6\;{{R}_{\oplus }}$ and above and orbital periods below ~50 days, rocky planets without a gas envelope are less common than planets of the same size with volatile envelopes, at 95% confidence. This largely empirical result is in agreement with the expectations from both core nucleated accretion and rocky planet formation theories. For an Earth-like composition, $1.6\;{{R}_{\oplus }}$ corresponds to $6\;{{M}_{\oplus }}$.

Core nucleated accretion models predict that rocky cores of $6\;{{M}_{\oplus }}$ imbedded in a gas disk will accrete gas. Calculations by Ikoma & Hori (2012) estimate that a $6\;{{M}_{\oplus }}$ core could accrete $\gtrsim 0.3\%$ H/He by mass when imbedded in a minimum-mass-solar-nebula protoplanetary disk with a nebular temperature ${{T}_{d}}=550\;{\rm K}$ and a disk lifetime of 100 kyr. The mass fraction of H/He accreted would be even higher for lower disk temperatures, longer disk dissipation timescales, and less grain opacity. Thus, $1.6\;{{R}_{\oplus }}$ protoplanet cores that assembled before the dissipation of the protoplanetary disk are expected to have acquired a H/He envelope which (if retained) would substantially increase their observed transit radii to $\gtrsim 1.9\;{{R}_{\oplus }}$. More recent simulations by Bodenheimer & Lissauer (2014) have revealed that even planets with core masses of $\sim 2.2\;{{M}_{\oplus }}$ may accrete, within 2.0 Myr, $0.037\;{{M}_{\oplus }}$ of H/He at 0.5 AU and $0.16\;{{M}_{\oplus }}$ of H/He at 2.0 AU.

Our results constrain the fraction of close-in planets of a specified size that assembled after the dissipation of the protoplanetary gas disk. In the standard rocky planet formation scenario (wherein planet embryos grow through runaway and then oligarchic accretion of planetesimals, before finally accumulating into terrestrial planets through a chaotic late-stage of embryo–embryo collisions, see e.g., Raymond et al. 2013), the assembly of rocky planets continues after the gas disk has dissipated. Planets formed through this pathway have masses and radii in the high-density "potentially rocky" regime (${{M}_{{\rm rock},{\rm min} }}({{R}_{p}})\leqslant {{M}_{p}}\leqslant {{M}_{{\rm rock},{\rm max} }}({{R}_{p}})$) since they would not accrete primary H/He envelopes, and could only accrete water-rich material that is scattered into the terrestrial planet formation region from beyond the snow-line (e.g., by the migration of giant planets). Simulations by Carter-Bond et al. (2012) predict that most rocky planets retain less than 10 Earth oceans of water delivered by migration (including both surface water and water incorporated into the mantle). Formation of massive rocky planets $\left( 6\;{{M}_{\oplus }} \right)$ would require a disk with a high surface mass density of solids (relative to the minimum mass solar nebula). The characteristic mass-scale of rocky planets (derived by assuming that each planet accretes all of the condensed material in an annulus centered on its orbit of width proportional to the planet Hill sphere) scales with the disk surface mass density in solids, σ, as ${{M}_{p}}\propto {{a}^{3}}{{\sigma }^{3/2}}M_{\star }^{-1/2}$ (Lissauer 1995). Further, N-body simulations of the giant impact stage of rocky planet formation have found that the mass of the most massive planet formed scales nearly linearly with the total mass in protoplanets (Kokubo et al. 2006). While our result that most close-in planets larger than $1.6{{R}_{\oplus }}$ are not rocky is not necessarily surprising, this is the first time that sufficient mass–radius constraints for sub-Neptune-size planets exist to extract population level composition constraints in a statistically robust way.

Current RV follow-up of close-in Kepler planets does not rule out nor definitively rule in the possibility small planets with substantial complements of low-density astrophysical ices. Improving the constraints on the density distribution of small planets on close orbits may help to resolve whether the compact close-in systems of low-density planets discovered by Kepler formed in situ (e.g, Hansen & Murray 2012; Ikoma & Hori 2012; Chiang & Laughlin 2013), or alternatively acquired their volatiles farther out in the disk and migrated in to their current locations (e.g, Rogers et al. 2011; Swift et al. 2013; Cossou et al. 2014). Low-density condensables (water and other astrophysical ices) are a tracer a planet's formation location: planets formed beyond the snow line are expected to initially contain an ice mass fraction comparable to its rock mass fraction (Lewis 1972), while planets formed on the close-in orbits (${{P}_{{\rm orb}}}\lt 50\;{\rm days}$, representative of the Kepler planets with RV follow-up) are expected to only have trace amounts of condensables (e.g., Raymond et al. 2008, Table 1). Small radius planets on close-in orbits would lose any envelopes of light gases (H and He) on short timescales ($\lesssim 1{\rm Myr}$) (Rogers et al. 2011; Lopez & Fortney 2013). Thus, a small planet with a bulk density below that of silicate ($3.3\;{\rm g}\;{\rm c}{{{\rm m}}^{-3}}$ uncompressed or $5.6\;{\rm g}\;{\rm c}{{{\rm m}}^{-3}}$ mean compressed density at $1.5\;{{R}_{\oplus }}$) would be a clear signature of astrophysical ices and an initial formation location beyond the snow-line. Stronger RV detections and upper limits on the masses of planets with radii (${{R}_{p}}\lt 1.5\;{{R}_{\oplus }}$) are needed.

Our analysis focusses on the current composition of planets observed today. We expect that the fraction of planets of a given radius that are rocky was even smaller in the past, since volatile loss processes outpace volatile sources. Close-in sub-Neptunes lose volatiles over time to photo-evaporation (e.g., Lecavelier Des Etangs 2007; Valencia et al. 2010; Rogers et al. 2011; Boué et al. 2012; Lopez et al. 2012). In contrast, mechanisms to replenish a lost envelope are not predicted to provide volatiles in quantities sufficient to substantially increase the transit radius. Rates of volcanism per unit mass on rocky Earth-like exoplanets are not expected to exceed 10 times that of the present-day Earth ($1.7\times {{10}^{-11}}\;{\rm y}{{{\rm r}}^{-1}}$; Best & Christiansen 2001) for planets older than 2 Gyr (Kite et al. 2009). Further, late delivery of volatiles by impacting comets will not contribute sufficient volatiles to produce an observable effect on the transit radius, while large impactors (with diameters larger than the atmospheric scale height, e.g., Ahrens 1993) will erode the planet atmosphere.

### 5.5. The Nature of Sub-Neptune-size Kepler Planet Candidates

Our hierarchical Bayesian analysis gives insights into the nature of the thousands of transiting Kepler planet candidates that do not have measured masses. Based on the sample of Kepler planets with RV follow-up, we found that most planets larger than $1.6\;{{R}_{\oplus }}$ are so low-density that a volatile envelope must contribute significantly to their transit radius. The Kepler mission developed a working nomenclature for planets, based solely on their radii; describing planets $\lt 1.25\;{{R}_{\oplus }}$ as Earth-size, 1.25–$2.0\;{{R}_{\oplus }}$ as Super Earth-size, and 2–$6\;{{R}_{\oplus }}$ as Neptune-size (e.g., Borucki et al. 2011). Our results (Figure 5) provide quantitative estimates of the fraction of planets in each of these ranges that are sufficiently dense to be rocky. One of the primary science goals of the Kepler mission is to calculate the occurrence rate of Earth-like planets in the habitable zones of Sun-like stars, ${{\eta }_{\oplus }}$. We suggest that the operational definition of "Earth-like" focus on planets with ${{R}_{p}}\lesssim 1.6\;{{R}_{\oplus }}$, to consider planets with a significant probability of having a rocky composition.

The limits on the fraction of planets of a given size that are dense enough to be rocky derived in this work should be regarded as upper bounds; it is likely that a smaller fraction of planets of any size are rocky. We have specifically investigated the fraction of planets that are sufficiently dense to be rocky (i.e., more dense than an iron-poor, pure silicate composition). Planets sufficiently dense to be rocky may still harbor a thick envelope of volatiles that contributes to its transit radius, if the volatiles are offset by a more iron-rich make-up for the rocky component of the planet.

Our analysis does not preclude the possibility of large rocky planets. We have assumed a smooth functional for how the fraction of planets that are dense enough to be rocky depends on planet size. With the current sample of planet mass and radius measurements, we do not capture complex structures in the planet mass–radius distribution. Massive rocky planets larger than 1.6 ${{R}_{\oplus }}$ on close orbits may still exist, but they are the exception rather than the rule.

The current sample of transiting Kepler planets with RV follow-up is limited to planets on close-in orbital periods ($P\lt 50\;{\rm days}$). We might naively anticipate that the fraction of planets that are rocky will decrease at greater separations from the host star. At longer orbital periods, planets are less irradiated by their stars and will experience less photoevaporation. Additionally, the temperature in the protoplanetary disk decreases with distance from the star, making it easier for planetary embryos farther out to accrete primordial H/He envelopes. Reality is likely more complicated, however, and this naive expectation may break down. For instance, if the close-in compact systems of volatile-rich low-mass/low-density planets discovered by Kepler formed by Type I inward migration that stalled at the inner edge of the gas disk, rocky planets formed after the dissipation of the gas disk could be a more important fraction of the planet population further out $\left( \sim 1\;{\rm AU} \right)$. Exoplanetary science is a data-driven field, and we ultimately must push to measure the masses and radii of planets at longer orbital periods to get a clearer picture of small planet composition demographics and to extrapolate ${{f}_{{\rm rocky}}}$ to the habitable zone.

Kepler-22b, the first habitable zone planet with measured radius ($2.4\;{{R}_{\oplus }}$ Borucki et al. 2012), is an illustrative example of how the sample of Kepler planets with HIRES RV follow-up affects our prior assumptions on the nature of transiting planets with unknown masses. When it was first announced in 2012, the RV mass upper limit on Kepler-22b (36, 82 and $124\;{{M}_{\oplus }}$ $1\sigma$, $2\sigma$, and $3\sigma$ upper limits) did not substantially constrain the composition, and a rocky planet scenario for Kepler-22b was considered a serious possibility (Borucki et al. 2012). Even with subsequent RV follow-up that has further constrained Kepler-22b's mass (42, 54 and $66{{M}_{\oplus }}$ $1\sigma$, $2\sigma$, and $3\sigma$ upper limits) the nature of Kepler-22b is ambiguous; based on its individual mass–radius constraints ${{p}_{{\rm rocky}}}=0.75$. Placed in the context of the other $\sim 2.4\;{{R}_{\oplus }}$ planets with mass measurements (all of which are constrained to have low density), we find that Kepler-22b is likely not rocky, instead having a volatile envelope that contributes significantly to its volume. Within the constraints of the linear-transition model, ${{f}_{{\rm rocky}}}\left( 2.4\;{{R}_{\oplus }} \right)\lt 0.02$ at higher than 95% confidence.

Kepler-10c is another notable planet in our sample of Kepler-discovered planets with HIRES RV mass constraints. Intensive HARPS-N RV-follow-up measured the planet mass to be ${{M}_{p}}=17.2\pm 1.9\;{{M}_{\oplus }}$ (Dumusque et al. 2014). Even with its high reported density $\left( {{\rho }_{p}}=7.1\pm 1.0\;{\rm g}\;{\rm c}{{{\rm m}}^{-3}} \right)$, Kepler-10c mass and radius are inconsistent with a rocky composition by more than $1\;\sigma$; it must have a substantial volatile envelope (of astrophysical ices or H/He) that contributes to its transit radius. Based on the mass and radius quoted from Dumusque et al. (2014), ${{p}_{{\rm rocky}}}\sim 0.1$; Kepler-10c is likely not rocky nor solid. This is consistent with our finding that at a radius of $2.3\;{{R}_{\oplus }}$, rocky planets are rare relative to low-density volatile-rich planets.

## 6. CONCLUSIONS

We have developed a hierarchical Bayesian approach to constrain the fraction of planets that are sufficiently dense to be rocky as a function of planet size from a sample of transiting planets with mass constraints. Applying this approach to the sample of Kepler planets with Keck HIRES RV follow-up, we have shown that at any radius equal to or larger than $1.62\;{{R}_{\oplus }}$, the majority (50% or more) of planets of that size have too low density to be comprised of Fe and silicates alone (at 95% statistical confidence). With the current sample of Kepler planets having Keck HIRES RV follow-up, we can neither distinguish between an abrupt transition and gradual transition from rocky to "volatile-shrouded" planets, nor conclusively identify a dependence on planet irradiation. More planet mass–radius measurements with smaller error bars and well quantified selection effects are needed to constrain the structure of the transition in between rocky and non-rocky planets.

Our constraints on the radii above which most planets have too low density to be composed of iron and silicate alone provide a useful empirical constraint for planet formation theories. These results give insights into the masses and compositions of the remaining sub-Neptune sized Kepler planet candidates that orbit stars which are too faint for RV follow-up, and motivate an operational definition of "Earth-like" that can be used to calculating the occurrence rate of Earth-analogs, ${{\eta }_{\oplus }}$. Our conclusions are the result of a largely model-independent statistical interpretation of empirical data. The only planet interior structure models that entered into our analysis were those defining the limiting low-density and high-density mass–radius relations for rocky planets.

With larger numbers of planets having constraints on both their mass and radius, one could eventually include extra parameters in the model to constrain the mass–radius distribution of planets as a function of planet orbital period/incident flux, stellar mass, and planet multiplicity. Each new extra dimension of characterization promises to yield additional insights into planet formation and evolution. For the current sample of planet mass–radius measurements, a more sophisticated treatment of selection effects in the analysis of the current sample of planet masses and radii may provide an avenue to move beyond exploring conditional mass/composition distributions at specific planet radius, to the joint mass–radius–incident flux distribution. Modeling the selection effects will be messy due to the ever-evolving criteria applied to select Kepler planet candidates for RV follow-up, however, and will be the subject of a future paper.

The future is bright for small planet mass–radius measurements. Continued Doppler monitoring of Kepler planets with Keck HIRES and HARPS-N will improve the planet mass constraints. Gaia will be of great help by measuring the distances to Kepler targets and thereby reducing the uncertainties on the host star and planet radii. The Transiting Exoplanet Survey Satellite (TESS), scheduled to launch in 2017, will find transiting planets around bright stars that are more amenable to RV follow-up than the typical Kepler target. Moving forward into the TESS-era, adopting an algorithmic approach with pre-determined criteria to select TESS transiting planet candidates for RV follow-up would facilitate subsequent statistical inferences about the underlying composition distribution of planets from the measured masses and radii. Such a survey strategy would help to leverage as much information as possible from RV marginal detections and RV non-detections. Improving constraints on the composition distribution of small planets would benefit from devoting a pre-determined fraction of TESS RV follow-up time to a mass–radius survey that is allocated separately and therefore buffered against the inevitable competing pressures to follow-up high-impact individual systems.

The hierarchical Bayesian model approach that we have outlined for constraining the composition distribution planets has several strengths: (i) it directly couples interior structure models to the the mass–radius posterior distributions output from the analysis of transit and RV data; (ii) it uses the information contained in marginal detections and non-detections; and (iii) it can naturally be extended to account for survey selection effects. These features of our model approach will become even more crucial as planet hunters continue to push toward smaller, less-massive planets at longer orbital periods near the sensitivity limits of the RV and transit techniques.

I would like to especially thank Geoff Marcy for extensive discussions and advice, Howard Isaacson for sharing samples from his MCMC fits to the Keck RV data, and an anonymous reviewer whose comments helped to improve this manuscript. I am also grateful for helpful discussions with Lauren Weiss, Rebekah Dawson, Tim Morton, Brice-Olivier Demory, Brian Jackson, Jack Lissauer, Tom Loredo, Eric Ford, David Hogg, and John Johnson. I acknowledge support provided by NASA through Hubble Fellowship grant #HF-51313 awarded by the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., for NASA, under contract NAS 5–26555. This work also benefited from the Summer Program on Modern Statistical and Computational Methods for Analysis of Kepler Data, held at SAMSI, Research Triangle Park, NC in 2013 June. This research has made use of the Exoplanet Orbit Database and the Exoplanet Data Explorer at exoplanets.org.