Comparison of Machine-learning and Bayesian Inferences for the Interior of Rocky Exoplanets with Large Compositional Diversity

In previous work, we demonstrated that machine-learning techniques based on mixture density networks (MDNs) are successful in inferring the interior structure of rocky exoplanets with large compositional diversity. In this study, we compare the performance of a well-trained MDN model with the conventional Bayesian inversion method based on the Markov chain Monte Carlo (MCMC) method, under the same observable constraints. Considering that MCMC inversion is generally performed with prior knowledge of the planetary mass, radius, and bulk molar ratios of Fe/Mg and Si/Mg, we regenerate a substantial data set of interior structures for rocky exoplanets and train a new MDN model with inputs of planetary mass, radius, Fe/Mg, and Si/Mg. We find that the well-trained MDN model achieves performance comparable to that of the MCMC method while requiring significantly less computation time. The MDN model thus presents a practical alternative to the traditional MCMC method, with minimal requirements for specialized knowledge, faster prediction, and greater adaptability. The developed MDN model is made publicly available on GitHub for use by the broader scientific community. With the advent of the James Webb Space Telescope, we are entering a new epoch in exoplanetary exploration. In this evolving landscape, the MDN model stands out as a valuable asset, particularly for its ability to rapidly assimilate and interpret new data, thereby substantially advancing our understanding of the interior and habitability of exoplanetary systems.


Introduction
Recently, scientists announced the discovery of six new exoplanets, bringing the total number of confirmed exoplanets to 5502. This milestone represents another significant step in our journey to understand the worlds beyond our solar system. The characterization of exoplanets has become a fascinating field of research, where researchers are particularly interested in understanding the physical and chemical properties of planetary atmospheres (e.g., Madhusudhan 2018, 2019; Lustig-Yaeger et al. 2023) and the interior structure of rocky exoplanets (e.g., Noack & Lasbleis 2020; Unterborn et al. 2023), as both provide valuable information on planetary formation and evolution processes and ultimately determine planetary habitability. Given the current capabilities of ground- and space-based telescopes, the primary measurable characteristics for exoplanets are their mass and radius. Theoretical mass-radius curves offer initial insights into the possible bulk composition and internal structure of an exoplanet based on these two types of observation (Valencia et al. 2006; Spiegel et al. 2014; Unterborn et al. 2016; Zeng et al. 2016). However, mass and radius alone do not uniquely determine an exoplanet's interior properties. Mass-radius curves are generated from models that make assumptions about the materials that compose a rocky exoplanet (e.g., iron, silicates, water, gases) and how they are structured internally (e.g., core, mantle, outer layers). Different combinations of composition and structure can produce degenerate mass-radius relationships (Rogers & Seager 2010; Dorn et al. 2015). Therefore, mass and radius alone cannot definitively constrain the interior structure of observed exoplanets and leave large degeneracies.
Bayesian inference has recently proven successful in deducing the interior structures of exoplanets by providing complete probability distributions for model parameters of interest, such as layer thicknesses and core properties, given prior knowledge of some constraints (Gelman et al. 2013; Dorn et al. 2015; Liu & Ni 2023). This allows for a more robust quantification of uncertainty in the interior structure. However, these Bayesian calculations are typically performed using the computationally intensive and time-consuming Markov chain Monte Carlo (MCMC) method. As a result, hundreds of thousands of interior structure models must be computed, leading to inference times that can range from hours to days for a single exoplanet. Additionally, integrating the physical interior models with the MCMC sampling scheme requires specialized frameworks and expertise in planetary interior modeling, which may hinder widespread adoption of Bayesian techniques for exoplanetary interiors. Given the rapidly expanding catalog of exoplanet observations, the development and implementation of more efficient and powerful inference methodologies become imperative.
In recent years, machine learning (ML) has emerged as an indispensable tool for modeling complex data sets in diverse domains. The development of algorithms such as deep neural networks, random forests, and convolutional neural networks has propelled ML to remarkable performance in areas ranging from computer vision to natural language processing. Its application in planetary science has been equally transformative, addressing challenges such as stellar classification (Kuntzer et al. 2016), exoplanet detection (Schanche et al. 2019), and atmospheric retrieval of exoplanets (Márquez-Neila et al. 2018). One particular type of neural network (NN), the mixture density network (MDN), has proven especially effective for the inverse problem of inferring the interiors of exoplanets. The MDN algorithm uniquely combines nonlinear regression with density estimation, using a mixture model to accurately represent complex output distributions (Bishop 1994). This approach is advantageous over a traditional NN because it outputs probabilistic results, thereby capturing inherent uncertainties more effectively. Baumeister et al. (2020) pioneered the application of MDNs to probabilistically infer layer thickness distributions for exoplanets of up to 25 Earth masses, achieving inference times of milliseconds rather than the hours required by traditional inversion methods. Building on this, Zhao & Ni (2021, 2022) expanded the MDN framework to simultaneously predict layer thicknesses and core properties for a wide range of planets, including Earth-like planets and gas giants. Baumeister & Tosi (2023) developed an ML model, ExoMDN, for predicting the thickness of various layers and mass distributions based on observed parameters such as mass, radius, equilibrium temperature, and tidal Love number k_2. The evolution of ML applications in planetary science highlights its potential to accelerate our understanding of the interiors of exoplanets.
In our previous research (Zhao & Ni 2021; Zhao et al. 2023), we performed a preliminary analysis comparing the MDN and simple Monte Carlo (MC) sampling methods for estimating the layer thicknesses and core properties of rocky exoplanets. Our findings indicate that the predictions made by the MDN are largely consistent with those from MC sampling combined with interior-model calculations. In this study, we conduct a detailed comparison between the MDN model and the traditionally more robust MCMC method, specifically evaluating their accuracy and efficiency across a spectrum of rocky exoplanets with a wide range of compositions. This comparison is intended to offer deeper insights into the potential of ML as a viable and efficient alternative to traditional Bayesian methods for probabilistic inference of exoplanetary interiors.
This paper is structured as follows. In Section 2, we provide details on the MDN and MCMC methods and elaborate on how we construct an interior structure model to generate the data set utilized for training the MDN model. Section 3 evaluates the well-trained MDN model by analyzing its predictions on the test data, revealing certain limitations in accurately inferring planetary interiors. We then conduct a focused comparison between the MDN and MCMC on the characterization of representative rocky exoplanets in Section 4. Discussions of the MDN and MCMC methods are given in Section 5. Finally, we conclude in Section 6.

Interior Structure Models and Training Data Sets
In this study, the same interior model is employed for the MCMC inversion calculations and the MDN training data sets. We primarily focus on the interior structure of exoplanets that are predominantly composed of rocky materials. To accomplish this, we utilize the open-source Python package ExoPlex, as outlined in Unterborn et al. (2023). It employs a series of differential equations to calculate a planet's radius based on its mass and bulk composition. As in our previous works (Zhao & Ni 2021; Zhao et al. 2023), the internal structure model consists of three compositionally distinct layers: an iron-rich core, lower and upper silicate mantles, and a water-ice outer shell (Figure 1(a)). The iron core mainly includes Fe, FeS, FeSi, and FeO, with its equation of state (EOS) approximated based on the fourth-order Birch-Murnaghan EOS of liquid iron at 1 bar and 1811 K (Anderson & Ahrens 1994). The silicate mantle of rocky exoplanets generally comprises silicates containing Fe, Mg, and Si, along with minor portions of Al and Ca species. Stable mineral assemblages in the mantle are constructed using the PerPlex Gibbs free energy minimization package (Connolly 2009). The water-ice shell consists of liquid water and six high-pressure ice polymorphs: ice Ih, II, III, V, VI, and VII. Phase and thermoelastic parameters of H2O are determined using the Seafreeze software package (Journaux et al. 2020) for liquid water and ice phases Ih, II, III, V, and VI (Unterborn et al. 2023). The surface temperature and pressure are kept constant at T_surf = 300 K and P_surf = 1 bar. For more detailed settings of the parameters of the interior structure model, refer to Zhao et al. (2023).
The MDN training data sets and MCMC inversion calculations use the same ranges of model parameters. Fe, Si, and Mg are the most abundant refractory elements in most rocky exoplanets and in stars hosting planets, and their relative molar abundances have been explored in two main ways: (i) by assuming that the planets share the same bulk elemental ratios as their host stars (Hinkel & Unterborn 2018) and (ii) by using Earth's devolatilization trend to convert stellar abundances into planet compositions (Spaargaren et al. 2023). Hinkel & Unterborn (2018) summarized the molar ratios of Fe/Mg and Si/Mg for all stars within the Hypatia Catalog, where Fe/Mg varies from 0.2 to 1.4 and Si/Mg from 0.4 to 2.0. Spaargaren et al. (2023) applied devolatilization models to constrain the compositions of rocky exoplanets from the elemental abundances in the solar neighborhood. They found that the Mg/Si ratios in most exoplanetary mantles fall between 0.8 and 2.2, while the mantle iron contents range from 3% to 7%. In our solar system, Earth has a bulk molar Fe/Mg ratio of 0.9 and Mars possesses a ratio of Fe/Mg = 0.85; both are close to the solar value of Fe/Mg = 0.80. However, Mercury stands out with a larger core, corresponding to a high Fe/Mg ratio ranging from 3.9 to 5.8 (Unterborn & Panero 2019). Some exoplanets, such as Kepler-107 c, exhibit higher densities and are likely super-Mercuries with larger bulk Fe/Mg ratios than their host stars (Liu & Ni 2023). The average Fe/Mg ratio for 5220 FGK stars has also been found to be 0.70 ± 0.18 (Unterborn & Panero 2019). Combining the available knowledge of the relative molar abundances of Fe, Si, and Mg, we allow bulk Si/Mg ratios to vary between 0.1 and 2.0, bulk Fe/Mg ratios between 0 and 6.0, and mantle FeO mass fractions between 0% and 20% for the interior-model calculations. These follow the default composition ranges in ExoPlex and cover most possible bulk compositions of rocky exoplanets. Al and Ca are minor refractory elements in the mantle
that influence the mineral phases within it, but they have little impact on the overall radius of the planet. Furthermore, their relative abundances are less explored than those of Fe, Mg, and Si. As before, we have kept their relative abundances constant, matching those of Earth (Ca/Mg = 0.06 and Al/Mg = 0.08). Following Zhao et al. (2023), the uncertainty in the amount of core light elements is neglected for simplicity by assuming the iron-rich core to be an Fe-FeS alloy with a molar fraction of 13% FeS (Sotin et al. 2007). Besides the bulk and mantle compositions, the interior model of a planet is also initialized in terms of its total mass and water mass fraction (WMF). In this work, the planet mass ranges from 0.1 to 10 Earth masses and the WMF is sampled between 0 and 0.1, covering the mass spectrum of currently observed rocky exoplanets.
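As an illustrative sketch (not the actual ExoPlex-based pipeline), drawing one training sample's input parameters uniformly from the stated composition and mass ranges might look like the following; the function name and dictionary keys are hypothetical:

```python
import random

def draw_interior_parameters(rng=random):
    """Draw one set of interior-model inputs from the stated ranges."""
    return {
        "mass_earth": rng.uniform(0.1, 10.0),  # planet mass [Earth masses]
        "fe_mg": rng.uniform(0.0, 6.0),        # bulk molar Fe/Mg ratio
        "si_mg": rng.uniform(0.1, 2.0),        # bulk molar Si/Mg ratio
        "feo_mantle": rng.uniform(0.0, 0.20),  # mantle FeO mass fraction
        "wmf": rng.uniform(0.0, 0.1),          # water mass fraction
    }

sample = draw_interior_parameters()
```

Each such parameter set would then be passed to the interior structure model, which returns the planet's radius and layer structure for the training set.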
For the MDN training data set, we generated approximately 3.53 million rocky planet samples, which took about 374 hr on a node with 24 CPU cores. Among these samples, the distributions of various variables are plotted in Figure 10 in Appendix A, including planet mass, radius, bulk molar ratios of Fe/Mg and Si/Mg, water radial fraction (WRF), mantle radial fraction (MRF), core radial fraction (CRF), WMF, core mass fraction (CMF), CMB pressure (P_CMB), CMB temperature (T_CMB), and static tidal Love number k_2.

ML Algorithms
Traditional feedforward NNs, which draw inspiration from the information processing mechanisms of biological neurons in the brain, consist of multiple layers for processing data (LeCun et al. 2015). NNs typically comprise an input layer, followed by several hidden layers of interconnected neurons, and finally an output layer (Figure 1(b)). Each neuron sums its weighted inputs and passes the result through an activation function to introduce nonlinearity:

a_j = σ(Σ_i w_ij x_i + b_j),

where a_j denotes the activation of neuron j, w_ij represents the weight, x_i signifies the input, b_j is the bias, and σ() is the activation function, such as the sigmoid or rectified linear unit.
The training of the network is achieved through a process called backpropagation, in which errors are propagated backward through the network to update weights and minimize the loss, such as mean squared error (Hecht-Nielsen 1992).Optimization algorithms such as gradient descent (Ruder 2017) are commonly employed to iteratively adjust the weights.
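The weighted-sum-plus-activation step above can be sketched in a few lines of plain Python (a toy illustration with hypothetical weights, not the Keras implementation used in this work):

```python
import math

def sigmoid(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(x, weights, biases):
    """Compute a_j = sigmoid(sum_i w_ij * x_i + b_j) for each neuron j."""
    return [sigmoid(sum(w_i * x_i for w_i, x_i in zip(w_j, x)) + b_j)
            for w_j, b_j in zip(weights, biases)]

# Two inputs feeding three hidden neurons (all values illustrative)
x = [0.5, -1.0]
W = [[0.1, 0.2], [0.3, -0.4], [-0.5, 0.6]]
b = [0.0, 0.1, -0.1]
activations = layer_forward(x, W, b)
```

During training, the gradients of the loss with respect to W and b are propagated backward through such layers to update the weights.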
Figure 1. The inputs to the MDN are observed properties of an exoplanet, including mass, radius, and the bulk molar ratios of Fe/Mg and Si/Mg. The outputs contain the radial fractions of each layer, the pressure and temperature of the core-mantle boundary (CMB), and the tidal Love number k_2. The data set used to train the MDN model is generated using a three-layer interior structure model, with the layers representing the water-ice shell, silicate mantle, and iron-rich core, respectively. Within the MDN architecture, hidden layers are tasked with deciphering the complex relationships between the observations and the interior parameters, which are represented as a Gaussian mixture distribution in the resulting outputs. In contrast, Bayesian inference employing the MCMC technique systematically compares the observed data, denoted as D, with interior models derived from prior distributions p(Θ). This process applies MCMC sampling to determine the posterior distributions p(Θ|D) of the interior parameters.

Neural networks offer several significant advantages, including
their capacity to effectively handle nonlinear relationships and learn intricate features from high-dimensional data. However, conventional NNs are unable to represent the multimodal uncertainties inherent in inverse problems, where a single set of input values can correspond to multiple output values. To address this, Bishop (1994) proposed the MDN architecture, which combines a traditional NN with a mixture density model (Figure 1(b)). The MDN incorporates a mixture model in its output layer to provide probabilistic estimates, which can be expressed as

p(y|x) = Σ_{k=1}^{K} π_k(x) p_k(y|x),

where p(y|x) represents the conditional distribution of the output vector y given the input vector x, modeled as a mixture of K components. Each π_k(x) is a mixing coefficient, and p_k(y|x) is the probability density function of the kth component. These component distributions can adopt various forms, such as Gaussian, logarithmic, and exponential, among others. The network is designed to map input variables to the parameters of the mixture model, enabling the representation of complex, multimodal distributions.
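The mixture density above can be illustrated with a one-dimensional Gaussian mixture (a minimal sketch; the actual MDN of this work outputs mixtures with 20 Gaussian components over several interior parameters):

```python
import math

def gaussian_pdf(y, mu, sigma):
    """Probability density of N(mu, sigma^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_density(y, pis, mus, sigmas):
    """p(y|x) = sum_k pi_k * N(y; mu_k, sigma_k), with the pi_k summing to 1."""
    return sum(pi * gaussian_pdf(y, mu, s) for pi, mu, s in zip(pis, mus, sigmas))

# Bimodal example: two equally weighted components, mimicking a degenerate
# inverse problem where two interior solutions fit the same observables.
density = mixture_density(0.0, [0.5, 0.5], [-1.0, 1.0], [0.3, 0.3])
```

A conventional NN regressing to a single value would collapse such a bimodal target to its mean, whereas the mixture representation preserves both modes.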
According to Zhao et al. (2023), the NN structure of the MDN model comprises three hidden layers, each containing 512 neurons. For the MDN layer, 20 Gaussian mixture components were selected. The input and output parameters exhibit significant scale variations, ranging from 10^-2 to 10^3. Such disparities can adversely impact training by leading to exploding and vanishing gradients, as well as feature weight bias. To mitigate these issues, feature scaling is imperative to standardize the raw data generated by the interior structure model. Following the method outlined in Zhao et al. (2023), min-max normalization is used, rescaling the raw training data to the range [0, 1]. This standardization not only helps address exploding and vanishing gradients but also enables the comparison of negative log-likelihood (NLL) values across models with varying scales of inputs, thereby ensuring a consistent evaluation framework for model performance. The data set is then divided into training and test subsets in a 9:1 ratio. The training set is used for tuning the hyperparameters of the MDN model, while the test set is used to assess the model's performance. We employ the average NLL as our loss function, which measures the discrepancy between the predicted distributions and the actual target values. For training specifics and hyperparameters such as batch size, learning rate, activation function, loss function, and optimizer selection, please refer to our previous work (Zhao et al. 2023).
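Min-max normalization can be sketched as follows (a simplified stand-in for the scikit-learn preprocessing used in this work; the example values are hypothetical):

```python
def minmax_fit(column):
    """Record the minimum and maximum of a raw feature column."""
    return min(column), max(column)

def minmax_transform(column, lo, hi):
    """Rescale raw values to [0, 1]; the same lo/hi must be reused at prediction time."""
    return [(v - lo) / (hi - lo) for v in column]

# Illustrative feature spanning roughly 10^-2 to 10^3, like P_CMB in the text
pressures = [0.01, 10.0, 1000.0]
lo, hi = minmax_fit(pressures)
scaled = minmax_transform(pressures, lo, hi)
```

The key point is that the fitted minimum and maximum must be stored so that new observations fed to the trained MDN are rescaled identically.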
In addition to employing early stopping as in previous studies (Zhao & Ni 2021, 2022; Zhao et al. 2023) to prevent overfitting, this study introduces L2 regularization, implemented at each hidden layer of the NN. Our previous research indicated that while dropout effectively prevents overfitting, it requires an unusually low dropout value (such as 0.05 in Zhao et al. 2023), considerably lower than the usual setting of 0.5, to minimize the loss discrepancy between the training and validation sets. To address this, L2 regularization has been implemented together with a dropout rate of 0.1. This method enhances model stability and reliability by narrowing the gap between training and validation losses. L2 regularization introduces a penalty term to the loss function that is proportional to the square of the magnitude of the model weights. Specifically, in our model, L2 regularization with a regularization parameter λ of 0.01 is applied to each of the three hidden layers. The modified loss function incorporating L2 regularization is

L = L_0 + λ Σ_i w_i²,

where L is the total loss function inclusive of the regularization term, L_0 is the original loss function, and w_i denotes the individual weights of the neurons in each hidden layer. By penalizing the square of these weights, L2 regularization encourages smaller weight values, thereby preventing the model from becoming overly complex and reducing the likelihood of overfitting. Applying L2 regularization across multiple layers ensures a more robust and generalizable learning process.
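The L2-regularized loss can be illustrated numerically (a toy sketch with made-up numbers; in practice a framework such as Keras applies this penalty automatically through per-layer weight regularizers):

```python
def l2_loss(base_loss, layer_weights, lam=0.01):
    """L = L0 + lambda * sum_i w_i^2, summed over the weights of every hidden layer."""
    penalty = sum(w * w for layer in layer_weights for w in layer)
    return base_loss + lam * penalty

# Illustrative: base NLL loss of -29.0 and two tiny weight vectors
L = l2_loss(-29.0, [[0.5, -0.5], [1.0]], lam=0.01)
```

Because the penalty grows with the squared weight magnitudes, gradient descent is pushed toward smaller weights, which is the mechanism behind the narrowed train-validation gap described above.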
The construction and training of the MDN models are conducted using Keras (Chollet 2015) and TensorFlow (Abadi et al. 2016), with the MDN layer derived from Martin & Duhaime (2019). Splitting and preprocessing of the data set are carried out using scikit-learn (Pedregosa et al. 2011). The model training is executed on a desktop computer running Windows, equipped with an NVIDIA GeForce RTX 3060 Ti graphics card. All runtime environments are configured using Anaconda (https://anaconda.com).

Bayesian Inference
The MCMC algorithm is a Bayesian inference methodology that is widely utilized for solving inversion problems in planetary science and astrophysics. Previous studies (e.g., Dorn et al. 2017b; Plotnykov & Valencia 2020; Adibekyan et al. 2021; Liu & Ni 2023) have demonstrated the viability of employing this approach to determine the composition of rocky exoplanets based on their radius and mass. In general, the MCMC algorithm belongs to the family of MC methods and aims to generate a random sample of model parameters Θ that follow a desired distribution p(Θ|D), where D refers to the observational data. In this study, the model parameters are mass, water mass fraction, bulk Fe/Mg molar ratio, bulk Si/Mg molar ratio, and mantle FeO mass fraction: Θ = [M, WMF, Fe/Mg, Si/Mg, FeO]. The observational data are mass and radius: D = [M, R]. As shown in Figure 1(c), unlike simple MC methods that generate independent samples, each sample position in an MCMC chain depends on the previous one. This process constructs a Markov chain that eventually converges to a stable distribution after running for a considerable amount of time. Carefully designed transition methods between samples, such as the Metropolis-Hastings algorithm (Hastings 1970), ensure that the stable distribution of the Markov chain corresponds to the desired sampling distribution. The Metropolis-Hastings algorithm proposes a new position Y from the previous sample position X(t) via a simple multivariate Gaussian transition distribution Q(Y; X(t)). The proposed position Y has a probability q of being accepted as X(t + 1) = Y and a probability 1 − q of being rejected, in which case X(t + 1) = X(t).
Compared to simpler MC methods, the MCMC algorithm excels at efficiently sampling from high-dimensional distributions. However, a notable challenge in the MCMC method is to determine the convergence point of the algorithm and ensure that the chain is independent of its initial distribution. To initiate MCMC calculations, it is essential to construct the target (posterior) distribution. According to Bayes' theorem, the posterior distribution p(Θ|D) is proportional to the product of the prior distribution and the likelihood function:

p(Θ|D) ∝ p(Θ) L(D|Θ).

Here, the prior distribution p(Θ) encapsulates our preobservation understanding of the sample, whereas the likelihood function L(D|Θ) quantifies the probability of observing the data D given a specific parameter set Θ.
In this work, the prior distribution p(Θ) is configured as follows. The bulk refractory elemental ratios in rocky exoplanets are assumed to mirror those found in their host stars. The elemental abundances of Fe, Mg, and Si in planet-hosting stars are generally determined from the spectra of the host stars (Adibekyan et al. 2021). For the available measurements such as mass and the bulk molar Fe/Mg and Si/Mg ratios, we adopt Gaussian distribution sampling within the error margins of the measurements, as defined by Equation (6) in Hinkel & Unterborn (2018). The WMF is randomly sampled between 0 and 0.1, and the mantle FeO mass fraction is randomly sampled between 0 and 20%. Detailed information on the prior distributions p(Θ) of the model parameters can be found in Table 1. The radius R is then computed by the interior model with the input parameters drawn from p(Θ). The logarithmic likelihood function log L(D|Θ) can be calculated from the deviation between the model outputs for Θ and the observational data D:

log L(D|Θ) ∝ −(M − M_planet)²/(2σ²_M_planet) − (R − R_planet)²/(2σ²_R_planet) − (α − α_star)²/(2σ²_α_star) − (β − β_star)²/(2σ²_β_star),

where α and β denote the Fe/Mg and Si/Mg ratios in the planet, respectively, and α_star and β_star refer to these ratios in the star. The terms σ_M_planet, σ_R_planet, σ_α_star, and σ_β_star represent the spreads of the observational data for mass, radius, Fe/Mg ratio, and Si/Mg ratio, respectively. When conducting Bayesian inference using the MCMC technique, we deploy the widely used open-source Python package emcee (Foreman-Mackey et al. 2013), a well-established tool for MCMC computations. This package provides a parallel move method to take advantage of parallel computation and accelerate convergence. For each planet, we deploy 48 walkers to sample about 4000-4500 steps, which takes 24 CPU cores running for 22 hr. We check the convergence of each sampler by computing the autocorrelation time of each chain and discard the initial two autocorrelation times as "burn-in," following the instructions in Foreman-Mackey et al. (2013).
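As a minimal illustration of the accept/reject scheme described above (a one-dimensional toy inversion with a made-up forward model, not the emcee-based setup used in this work), a Metropolis-Hastings sampler can be sketched as:

```python
import math
import random

def metropolis_hastings(log_post, x0, n_steps, step=0.1, rng=random):
    """Gaussian-proposal Metropolis-Hastings: accept Y with prob min(1, p(Y)/p(X))."""
    chain, x, lp = [], x0, log_post(x0)
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)          # propose Y ~ Q(Y; X(t))
        lp_y = log_post(y)
        # Accept if posterior improves, else with probability p(Y)/p(X)
        if lp_y - lp >= 0.0 or rng.random() < math.exp(lp_y - lp):
            x, lp = y, lp_y
        chain.append(x)
    return chain

# Toy inversion: recover theta from an observation obs = theta^2 with noise,
# under a flat prior on [0, 5] (all numbers illustrative).
obs, sigma = 4.0, 0.2

def log_post(theta):
    if not (0.0 <= theta <= 5.0):
        return -math.inf                      # outside the prior support
    return -((theta * theta - obs) ** 2) / (2.0 * sigma ** 2)

random.seed(1)
chain = metropolis_hastings(log_post, 1.0, 5000)
posterior_mean = sum(chain[1000:]) / len(chain[1000:])  # discard burn-in
```

The stationary distribution of the chain approximates the posterior, so the retained samples concentrate near theta = 2, the value whose forward prediction matches the observation.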

Evaluation of MDN Predictions on Test Data
In training ML models, overfitting is a common challenge, often arising from insufficient data. It manifests as the model excelling on training data but generalizing poorly to new, unseen data, as discussed in Section 2.2. To address this, it is crucial to plot the relationship between the size of the training set and the training loss, observing how the loss responds to varying training set sizes. As depicted in Figure 2(a), the NLL loss consistently diminishes with increasing training data size. In particular, beyond the threshold of approximately 1 million data points, the decrease in loss becomes marginal, suggesting the adequacy of our generated training set. Approximately 106 hr are required to generate 1 million samples on a node with 24 CPU cores. Additionally, employing different random seeds to generate a variety of training sets helps reduce the impact of random fluctuations, thereby enhancing the stability and reliability of our findings, as illustrated in Figure 2(a).

Note. The prior distributions of planetary mass and the bulk ratios of Fe/Mg and Si/Mg are Gaussian, incorporating observational data from the planet and its host star. In the absence of prior information, the WMF and the mantle's FeO mass fraction are assumed to be uniformly distributed. The planet's radius is then calculated by inputting these parameters into an interior structure model. This calculated radius is further constrained and refined by the observational radius data through the likelihood function.
After confirming the sufficiency of the data volume, we employ the full data set to train the MDN model. Figure 2(b) shows the learning curves of the optimal model, chosen through a random search strategy. The graph illustrates that, with an increasing number of training rounds, there is a consistent decrease in NLL loss for both the training and validation sets. After approximately 60 training epochs, the rate of decrease in loss on the training set slows down, while the loss on the validation set hovers around −29. This suggests that the model has effectively grasped the inherent patterns within the data set and that additional performance improvements on the validation set may not be feasible. Consequently, increasing the number of training iterations might not lead to improved results under these circumstances. Put simply, within the constraints of the current data and model, the model appears to have attained its optimal performance. The ensuing sections undertake an in-depth analysis of the well-trained MDN model's predictive capabilities, using the test data set for evaluation. To discern the effect of L2 regularization, we train another MDN model with inputs of mass, radius, and the Fe/(Mg + Si) ratio, implementing L2 regularization in the same way as for the model with inputs of mass, radius, Fe/Mg, and Si/Mg. All other hyperparameters are kept consistent across both models. The training curve shows that the minimum NLL loss reached approximately −27 (see Appendix B). This value is significantly lower than that (about −21) for Model A trained on the [M, R, Fe/(Mg + Si)] inputs by Zhao et al. (2023), but it is close to the loss of −29 for the MDN model of this work. This suggests that L2 regularization is the main contributor to the significant enhancement in the MDN's performance. Of course, using Fe/Mg and Si/Mg as inputs, rather than Fe/(Mg + Si), also improves the MDN performance, but the effect is not comparable to that of L2 regularization. Furthermore, even with a dropout rate of 0.1, there is no noticeable increase in the gap between the training and validation sets, indicating enhanced model stability and performance.
To evaluate the performance of the well-trained MDN model, we make predictions on the test data set and compare these predictions with the actual values. For each sample in the test data set, the MDN model predicts a probability distribution, depicted as a vertical line in Figure 3. The line's color indicates the shape of the probability distribution, with yellow representing the highest probability and green the lowest. The placement of the vertical line on the x-axis reflects the true value of the output parameter for the respective sample. In the ideal scenario, where the MDN model achieves optimal performance, the diagram in Figure 3 would display diagonal trends, as indicated by the red dashed lines, with high probability density along these lines. This pattern would signify that the predicted values exactly match the actual values, demonstrating perfect accuracy. As shown in Figure 3, the MDN model trained using the (M, R, Fe/Mg, Si/Mg) inputs can effectively predict the parameters of the interior structure. This performance is a significant improvement over that of the MDN model trained with mass, radius, and Fe/(Mg + Si), as discussed in Zhao et al. (2023). As detailed in the research conducted by Baumeister et al. (2020), Zhao & Ni (2021), and Zhao et al. (2023), the strong degeneracy in internal structures, particularly in planets with smaller cores, presents a significant challenge for the MDN in accurately forecasting the radial structure and CMB thermal properties. This challenge often results in a pronounced tendency to underestimate mantle sizes while overestimating core dimensions. Additionally, the model tends to moderately undervalue P_CMB and T_CMB in rocky exoplanets with large mantles. In contrast, the refined MDN model presented in this study exhibits a notable improvement in predicting these degenerate solutions, especially in terms of P_CMB, for planets with smaller cores.
The tidal Love number k_2, a parameter quantifying a planet's deformation in response to tidal forces, plays a critical role in understanding the internal density distribution of planets, as emphasized by Kramm et al. (2011) and Consorzi et al. (2023). This parameter is essential for elucidating a planet's interior structure, formation history, and potential geological activity. Zhao et al. (2023) showed that incorporating k_2 as an input to the MDN model significantly mitigates the density-composition degeneracy and enhances the accuracy of predictions of the interior properties of rocky exoplanets. However, detecting the k_2 value of rocky exoplanets remains challenging. In this study, we have therefore incorporated k_2 as an output of our model. Figure 3 shows that the well-trained MDN model demonstrates proficient predictive skill for smaller k_2 values; however, its predictions become widely dispersed and tend toward underestimation for larger values. This pattern aligns with the established correlation that higher k_2 values are indicative of smaller planetary cores. As identified in our previous analysis, the MDN model has a propensity to overestimate the core size in planets with smaller cores, leading to a forecasted k_2 value that is lower than the actual one.

Comparison between MDNs and MCMC for Representative Rocky Exoplanets
In this section, we infer the interior structures of Earth and some representative rocky exoplanets using the well-trained MDN model and the MCMC inversion method.It should be noted that the ranges of the interior-model parameters remain the same for both methods.
The MDN model trained on the (M, R, Fe/Mg, Si/Mg) inputs does not account for uncertainties in the input parameters, although exoplanet measurements often come with associated errors. In contrast, the MCMC method takes measurement uncertainties into account through the prior probability distributions of the measured parameters. For a direct comparison between the two methods, we incorporate the uncertainties of the inputs into the MDN predictions. When applying the well-trained MDN model to a specific planet, the input variables (M, R, Fe/Mg, Si/Mg) are each sampled from Gaussian distributions N(μ, σ²) defined by the observations and their associated errors, where μ represents the observable and σ the associated error margin of the measurement. For each planet, 1000 sets of inputs are generated to encompass the entire spectrum of errors. We then feed each set through the well-trained MDN model, yielding the predicted probability distributions of the output variables. Recently, Baumeister & Tosi (2023) suggested that resampling from the predicted distributions and then merging these samples can reduce memory usage and improve computational efficiency. In this work, we experimented with this approach and found it to be markedly less demanding on memory and processing resources. From each predicted distribution, we draw 20 samples, ensuring that this sampling process accurately reflects the predicted probability distribution of the variables. In total, this procedure generates 20,000 plausible interior solutions for each planet, representing the final MDN predictions including the uncertainties in the input variables. More importantly, we find that this strategy does not yield obvious differences in the final posterior distributions compared to the direct summation of Gaussian mixtures. We begin by considering Earth as a representative of Earth-sized
exoplanets; Earth's internal structure is the most thoroughly understood among all the planets in our solar system. The pressure and temperature at Earth's CMB are estimated to be around 136 GPa and 2700 K respectively, and the MRF of Earth is approximately 0.45 (Mundl-Petermeier 2021). Figure 4 compares the MDN predictions using the (M, R, Fe/Mg, Si/Mg) inputs with the results from the MCMC method for Earth. The bulk Fe/Mg and Si/Mg ratios of Earth are known to be 0.900 and 0.800 respectively (McDonough 2003). For explicit comparison with the current knowledge of Earth's interior, we adopt a small error of 2% for Earth's bulk parameters, including mass and radius. The inferred probability distributions for each parameter are color-coded, with the predictions from the MDN model shown in blue. The present MDN model also outperforms the model in Zhao et al. (2023), as evidenced by the significantly lower NLL loss of our well-trained MDN model, as elaborated in Section 3. For the output variables such as the MRF, CRF, and CMF, as well as P_CMB and T_CMB, the results from the MDN model are almost identical to those obtained using the MCMC method, and they are furthermore consistent with the available understanding of Earth's interior (Mundl-Petermeier 2021).
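As a concrete illustration, the error-propagation procedure described in this section can be sketched with Earth-like inputs. The `mdn_predict` function below is a toy stand-in for the trained network (its mixture parameters are invented for illustration); only the sampling-and-merging logic is the part this sketch demonstrates:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(weights, means, stds, n, rng):
    """Draw n samples from a 1D Gaussian mixture (an MDN's predicted pdf)."""
    comp = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[comp], stds[comp])

def mdn_predict(x):
    """Toy stand-in for the trained MDN: mixture parameters for one output
    variable (say, the CMF) given inputs (M, R, Fe/Mg, Si/Mg)."""
    w = np.array([0.6, 0.4])
    mu = np.array([0.30, 0.35]) + 0.01 * (x[0] - 1.0)  # weak toy mass dependence
    sigma = np.array([0.02, 0.03])
    return w, mu, sigma

# Earth-like observables (mu, sigma): mass, radius, Fe/Mg, Si/Mg with 2% errors.
obs = [(1.0, 0.02), (1.0, 0.02), (0.9, 0.018), (0.8, 0.016)]

samples = []
for _ in range(1000):                          # 1000 Gaussian-sampled input sets
    x = np.array([rng.normal(m, s) for m, s in obs])
    w, mu, sigma = mdn_predict(x)
    samples.append(sample_mixture(w, mu, sigma, 20, rng))  # 20 draws per mixture
posterior = np.concatenate(samples)            # 20,000 plausible interior solutions
print(posterior.shape)                         # (20000,)
```

Substituting the published model for `mdn_predict` recovers the actual inference procedure; the 1000 × 20 sampling scheme is unchanged.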
It seems that the MDN model underestimates the mass and radial fractions of the water-ice shell relative to the MCMC method. The available constraints, such as bulk Fe/Mg and Si/Mg, mainly constrain the mantle and core rather than the water-ice shell. In other words, the role of the solid layers is generally enhanced in training the MDN model while the effect of thin water-ice shells is reduced to some extent (Zhao et al. 2023). As a result, the WRF and WMF show larger degeneracy than the other model parameters, and the MDN model slightly underestimates them. Furthermore, the water-ice shell usually makes up only a small percentage of rocky planets, so the quantities characterizing it, the WRF and WMF, are small in magnitude. As expected, it is more difficult to predict a small quantity accurately and stably because it is more sensitive to various factors. It is therefore not surprising that there are slight differences in the features of the water-ice shell between the MDN and MCMC results. For the static tidal Love number k_2, the prediction made by the MDN model is slightly higher than that from the MCMC method, which aligns with the value reported by Lambeck (1980).

Note (to Table 2). Mass and radius are in Earth units, Fe/Mg and Si/Mg in molar ratios. The Fe/Mg and Si/Mg ratios are adopted from the stellar abundances in Adibekyan et al. (2021).
Next we turn our attention to one Earth-sized exoplanet (Kepler-78 b) and four super-Earths (Kepler-10 b, HD 3167 b, HD 219134 b, and EPIC 249893012 b). Their masses and radii are summarized in Table 2, together with the stellar elemental ratios of their host stars.
Kepler-78 b is an ultra-short-period super-Earth slightly larger than Earth. The comparison between the posteriors derived from the MCMC inversions and the priors of the model parameters is illustrated in Figure 12 in Appendix C. The posterior distributions for mass, the bulk molar ratios Fe/Mg and Si/Mg, mantle FeO content, and WMF show notable coherence with their corresponding priors, indicating a robust Bayesian updating process. For the radius in particular, the posterior aligns precisely with the observation, signifying the success of the MCMC method in refining parameter estimates. The concordance between the posteriors and the priors/observations highlights the robustness of the MCMC approach in constraining the interior structure of Kepler-78 b.
Figure 5 illustrates the posterior distributions of the interior parameters for Kepler-78 b, inferred separately by the well-trained MDN model and by the MCMC inversion method. Figure 10 in Appendix A depicts the 1D and 2D posterior distributions of the interior parameters in detail. In particular, the predicted distributions of the MDN model show broad agreement with those of the MCMC method, as evidenced by their similar shapes and closely aligned median values. The MDN predictions are also consistent with those of the previous MDN model trained on the inputs [M, R, Fe/(Mg + Si), k_2], as detailed by Zhao et al. (2023). Previous studies using a two-layer internal structure model and MCMC inversions indicated that the CMF of Kepler-78 b lies between 8% and 34% (Dai et al. 2019; Plotnykov & Valencia 2020); the present result of roughly 24% falls within this range. Just as for Earth, the divergence between the two methods becomes pronounced when the characteristics of the water-ice shell are examined: the MDN has difficulty accurately predicting the features of the water-ice shell, as discussed above. Previous research indicates that the dayside of Kepler-78 b reaches temperatures between 2300 and 3100 K (Pepe et al. 2013), sufficiently high to vaporize any potential water and rendering the presence of liquid water on its surface improbable. However, results from the MCMC method suggest approximately 5% water content, possibly indicating water mixed with a surface lava ocean. For the tidal Love number k_2, the results from the MDN and MCMC methods are consistent and closely approximate the range of 0.823-0.937 predicted by the deep NN model trained in Zhao et al. (2023).
The results for Kepler-10 b are shown in Figure 6. Weiss et al. (2016) presented revised masses and densities of the planets around Kepler-10 based on 220 radial velocities. The probability that Kepler-10 b has a nonnegligible volatile layer is estimated at only 13.6%, and a two-layer interior model of rocky planets yields a CMF of 0.17 ± 0.11. The MDN model predicts an almost negligible water-ice shell, which aligns with the fact that its surface equilibrium temperature exceeds 647 K, causing any water to transition into steam or a supercritical state (Wagner & Pruß 2002). The MDN and MCMC results show a consistent prediction for P_CMB, but the MDN model yields a slightly larger value for T_CMB than the MCMC inversion. As discussed in Zhao & Ni (2021), the CMF has a direct influence on P_CMB because it determines the CMB location and affects the gravitational acceleration involved in the hydrostatic equation. The consistent prediction for P_CMB is thus easily understood given the good agreement between the MDN and MCMC for the CMF. The CMB temperature correlates with the MRF, increasing as the MRF increases, so the slight overestimation of the CMB temperature is attributed to the larger MRF inferred by the MDN model. For the static Love number k_2, which is sensitive to the internal structure parameters, the MDN prediction follows the MCMC results within an acceptable range.
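The dependence of P_CMB on the CMF and the CMB location can be illustrated with a toy two-layer hydrostatic integration. The constant layer densities and radii below are assumed round, Earth-like numbers, not the interior model used in this work:

```python
import numpy as np

G = 6.674e-11  # gravitational constant [m^3 kg^-1 s^-2]

def p_cmb(r_planet, r_core, rho_core, rho_mantle, n=20001):
    """Pressure at the core-mantle boundary from the hydrostatic equation
    dP/dr = -rho(r) g(r), integrated from the surface (P ~ 0) inward,
    assuming constant densities in each layer."""
    r = np.linspace(r_core, r_planet, n)
    m_core = 4.0 / 3.0 * np.pi * rho_core * r_core**3
    m = m_core + 4.0 / 3.0 * np.pi * rho_mantle * (r**3 - r_core**3)  # enclosed mass
    integrand = rho_mantle * G * m / r**2                             # rho * g(r)
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))  # trapezoid rule

# Earth-like round numbers: a larger CMF moves r_core outward and changes
# both g(r) and the overlying mantle column, hence P_CMB.
p = p_cmb(6.371e6, 3.48e6, 11.0e3, 4.5e3)
print(p / 1e9)  # roughly 1.3e2 GPa with these toy densities, near Earth's ~136 GPa
```

Real interior models replace the constant densities with pressure- and temperature-dependent equations of state, but the CMF-to-P_CMB coupling works the same way.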
For the interior of HD 3167 b, Figure 7 depicts the predictions made by the MDN and MCMC methods. Overall, there is a similar level of agreement between the two results as in the case of Kepler-10 b. The MDN model tends to predict a larger MRF and consequently overestimates the CMB temperature compared to the MCMC inversion, as discussed earlier. Gandolfi et al. (2017) concluded that HD 3167 b is a rocky terrestrial planet based on its position in the mass-radius diagram, with a composition of approximately 50% silicate and 50% iron. This aligns with the predictions of both the MDN model and the MCMC method, each inferring the MRF and CRF to be approximately 50%. In addition, the present MDN predictions for HD 3167 b are consistent with our previous results inferred by the MDN model trained on the inputs [M, R, Fe/(Mg + Si), k_2] (Zhao et al. 2023).
Figures 8 and 9 show the predictions derived by the MDN model and the MCMC method for HD 219134 b and EPIC 249893012 b, respectively. As can be seen, there is an acceptable level of agreement between the MDN and MCMC results. These two planets lie in the radius gap region of 1.5-2.0 Earth radii, which represents a transition zone from super-Earths to sub-Neptunes (Fulton et al. 2017). Motalebi et al. (2015) suggested that HD 219134 b is best characterized by a two-component iron-magnesium silicate model, with an estimated CMF of approximately 22%-23%. Plotnykov & Valencia (2020) regarded HD 219134 b as a possibly rocky planet and derived a CMF of 0.14 (-0.11, +0.10) with the MCMC method. These values are in good agreement with the present predictions derived from the MDN model and the MCMC method. Neither the MDN nor the MCMC results can exclude a certain amount of water. In particular, the WMF predicted by the MDN model tends to gravitate toward 0.1, suggesting that the WMF may exceed the upper limit of 0.1. Dorn et al. (2017) explored the interior structure of HD 219134 b from stellar abundances and suggested a potential range of 0%-17% for the mass fraction of volatile layers with a 1σ uncertainty. The mass-radius diagram suggests that EPIC 249893012 b is less dense than a pure silicate composition, and the stellar Fe/Mg ratio of EPIC 249893012 is only slightly lower than the solar value. Thus, if EPIC 249893012 b has an elemental composition similar to that of its host star, volatile layers are necessary to account for its density.
To quantify the discrepancy between predictions from the MDN model and the MCMC method, we use Δ = |M_MDN − M_MCMC| / σ, where M_MDN and M_MCMC denote the median values of the best-fitting parameters obtained by the MDN and MCMC methods, respectively. The combined standard deviation is defined as σ = (σ_MDN² + σ_MCMC²)^(1/2), where σ_MDN and σ_MCMC are approximated as the upper error bounds of the two methods, providing a conservative estimate of the uncertainty. The Δ values for the above representative planets are listed in Table 3, together with the detailed MDN and MCMC results. We can see that Δ is less than 1 (i.e., the difference is within 1σ) in most cases, indicating that the predictions of the MDN model are not statistically significantly different from those of the MCMC method.
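A minimal implementation of this discrepancy metric, applied to hypothetical median values and error bounds (the numbers below are invented for illustration, not taken from Table 3):

```python
import numpy as np

def discrepancy(m_mdn, m_mcmc, s_mdn, s_mcmc):
    """Delta = |M_MDN - M_MCMC| / sqrt(s_MDN^2 + s_MCMC^2): the difference
    between the two medians in units of the combined 1-sigma uncertainty."""
    return abs(m_mdn - m_mcmc) / np.hypot(s_mdn, s_mcmc)

# Hypothetical CMF medians and upper error bounds for one planet.
delta = discrepancy(0.24, 0.22, 0.05, 0.04)
print(round(delta, 2))  # 0.31 -> well below 1, i.e. statistically consistent
```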
Notably, the MDN model yields broader predicted distributions for the interior structures than the MCMC inversion. This difference can also be seen in the contours of the two-dimensional distributions, as shown in Figure 13 in Appendix C. The most likely explanation is that the MDN model does not directly incorporate observational errors during training and accounts for them only at inference time for a specific exoplanet. In contrast, the MCMC inversion method integrates observational errors directly into its probabilistic calculations, more accurately capturing the effect of these errors on the inference process.

Discussion
As an extension of our previous works, a comparative analysis of the MDN and MCMC methods in practical applications to rocky exoplanets highlights the power and potential of well-trained MDN models in characterizing planetary interiors. First, the MDN model simplifies the process by directly predicting planetary interiors without the elaborate setup required for interior structure models. This ease of use contrasts sharply with the MCMC method, which requires prior knowledge of the physical parameters governing interior structure models and necessitates interior-model calculations during the inversion. Second, the MDN model separates the interior-model calculation from the inversion process. When making predictions, the MDN model avoids extensive repetitive calculations of interior models and is therefore very efficient in characterizing planetary interiors: it typically takes mere seconds per planet, while the MCMC method spends hours or even days on one inversion. Third, the MDN model is considerably more adaptable than the MCMC method. Once well trained, it offers a practical tool to characterize planetary interiors even for planets whose observable constraints have been significantly improved. As space-based technologies and missions develop, a large amount of observational data will be constantly updated, and the MDN is adept at efficiently providing revised predictions for planetary interiors from updated observations. It should be noted, however, that if new observations differ substantially from previous data, retraining the MDN may be necessary to maintain the model's accuracy. This contrasts with the MCMC method, where re-inversion is required for any change in a planet's observational data, a far more time-consuming process. The ability to update predictions quickly gives the MDN a strategic edge, especially as models of rocky exoplanet interiors become more complex with ongoing research in exoplanet science.
Moving forward, the enhancement of MDN models involves two key aspects. First, increasing the model's robustness is essential. Compared to our previous work (Zhao & Ni 2021, 2022; Zhao et al. 2023), this study suggests that L2 regularization significantly reduces overfitting in MDN training. In the future, more effort will be devoted to analyzing the effects of different L2 regularization parameters and choices of MDN architecture on mitigating overfitting. Furthermore, the addition of Xavier initialization and learning-rate warm-up is expected to further narrow the gap between the validation and training losses. Xavier initialization is designed to keep the signal within a reasonable range of values through many layers, which helps prevent the gradients from vanishing too quickly and thus mitigates underfitting (Glorot & Bengio 2010). Warm-up, by slowly increasing the learning rate, helps to avoid suboptimal minima, also mitigating underfitting (Lu et al. 2020). Second, as outlined in Section 4, the MDN does not account for uncertainty in a given input observation during training. We will adhere to the methodology described in Wang et al. (2020): Gaussian random noise drawn from N(0, σ²) is added to each sample of the training set, where σ represents the level of observational error. This approach is intended to simulate the uncertainty inherent in observational data and enhance the generalization ability of the ML model. To maintain the MDN model's broad applicability, the error introduced into the training data is chosen based on the highest precision currently achievable by observations; unless significant improvements are made in observations, there is no need to retrain the MDN model. If the precision of detection significantly improves, noisy inputs during training might become unimportant, and the procedure of this work (incorporating the input error only in the inference process) could suffice.
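The noise-augmentation step described here can be sketched as follows; the per-feature σ values, data ranges, and function name are illustrative assumptions rather than the settings of Wang et al. (2020):

```python
import numpy as np

rng = np.random.default_rng(42)

def add_observational_noise(X, sigmas, rng):
    """Perturb each training sample with Gaussian noise N(0, sigma^2),
    using one sigma per input feature (M, R, Fe/Mg, Si/Mg)."""
    return X + rng.normal(0.0, sigmas, size=X.shape)

# Hypothetical clean training inputs and per-feature noise levels chosen
# to reflect the best precision currently achievable by observations.
X = rng.uniform([0.5, 0.8, 0.5, 0.5], [10.0, 2.0, 1.5, 1.5], size=(100000, 4))
sigmas = np.array([0.03, 0.01, 0.05, 0.05])
X_noisy = add_observational_noise(X, sigmas, rng)
print(X_noisy.shape)  # (100000, 4)
```

The MDN is then trained on `X_noisy` while the regression targets remain those of the clean samples, so the network learns to map noisy observables to interior parameters.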
Currently, little is known about the iron-rich cores of rocky exoplanets. Earth's core exhibits a density marginally lower than that of pure iron, indicating the presence of light elements alloyed with the iron core. Potential candidates for these light elements include sulfur, silicon, oxygen, carbon, and hydrogen, either individually or in combination (McDonough 2003). The latest research suggests that magnesium may also be present in Earth's core (Badro et al. 2016; Liu et al. 2020). The amount of light elements in the core has a direct influence on the core density and the mass-radius relation of rocky exoplanets (Schlichting & Young 2022), and accounting for it is hence of certain significance for accurately characterizing planetary interiors. If future training data sets incorporate this uncertainty, the resulting MDN models would have enhanced performance in exploring the interior structure of rocky exoplanets.

Conclusions
In this work, we compare the performance of the well-trained MDN model and the MCMC inversion method in inferring the interior structure of rocky exoplanets with large compositional diversity. In addition to planet mass and radius, the bulk abundance ratios of refractory elements (Fe/Mg and Si/Mg) of a planet are used to constrain its interior, and they are approximated by the stellar abundances of its host star. The MDN model is trained on the inputs (M, R, Fe/Mg, Si/Mg) using a comprehensive data set for the interior of rocky exoplanets, and the well-trained model shows impressive performance on the test data set. The MCMC inversion calculations are carried out with prior probability distributions of the interior parameters, and the ranges of the interior parameters are the same in both cases. Because the MCMC inversion incorporates the uncertainties of the available measurements, the uncertainties of the input variables are also taken into account in the MDN inference for consistency. Both the well-trained MDN model and the MCMC inversion are applied to infer the interior structure of several representative rocky exoplanets. It is found that the MDN model provides a practical alternative to the MCMC method, distinguishing itself with its minimal requirement for specialized expertise, its rapid and efficient predictive capabilities, and its adaptability to evolving observational data. We have made the well-trained MDN model publicly available on GitHub under a CC-BY 4.0 license; version 1.0 is archived in Zenodo, facilitating its accessibility and utility for the broader scientific community, and the associated data set can be accessed directly via the MDN Model Data Set (Zhao et al. 2024). The James Webb Space Telescope (JWST)'s advanced capabilities are anticipated not only to update and refine observational data, such as mass and radius, for known exoplanets, but also to discover new exoplanets. In this context, the MDN model emerges as a particularly promising tool thanks to its capability to rapidly interpret new data from the JWST.

Appendix B
Training Curves for the MDN Model with Inputs of Mass, Radius, and Fe/(Mg + Si)

Figure 11 presents the training curves for the MDN model that incorporates the inputs (M, R, Fe/(Mg + Si)) and applies L2 regularization akin to the model that utilizes the inputs (M, R, Fe/Mg, Si/Mg). All remaining hyperparameters are maintained uniformly across the two models.

Appendix C Detailed MDN and MCMC Results for Kepler-78 b
In this appendix, we take Kepler-78 b as an example to illustrate the detailed results derived by both the MDN and MCMC methodologies. For the MCMC inversion method, the prior density functions, as listed in Table 2, are specified based on observable constraints, and the posterior density functions of the interior-model parameters are determined via Bayes' theorem. Figure 12 illustrates the marginalized 1D posterior distributions of the model parameters (mass, Fe/Mg, Si/Mg, mantle FeO, and WMF) compared with the prior distributions. As additional information, the posterior distribution of the resulting radius is shown and compared with the current observation. The posterior distributions overlap with the prior distributions or current observations; more specifically, the posterior peaks for mass, Fe/Mg, Si/Mg, and radius almost align with those of the prior distributions or current observations. This clearly demonstrates the ability of the MCMC method to infer the interior of Kepler-78 b.
Figure 13 illustrates the 1D and 2D contours of the model parameters (WRF, MRF, CRF, WMF, CMF, P_CMB, T_CMB, and tidal Love number k_2) for Kepler-78 b, where the MCMC results (blue) are compared with the MDN predictions (red). The MDN predictions closely follow the results of the MCMC method, although they exhibit slightly larger uncertainties than the MCMC results.

Figure 12. Sampled 1D marginal posterior distributions of Kepler-78 b's inferred parameters, juxtaposed with their respective priors and observations. The histograms display a strong consistency between the priors (blue) and posteriors (red) for the planet's mass, bulk molar ratios of Fe/Mg and Si/Mg, mantle FeO content, and WMF, as specified in Table 1. For the planetary radius, the posterior distribution closely aligns with the observations (green).

Figure 1 .
Figure 1. Schematic diagram showing the framework architectures for the MDN and MCMC inference methods used in predicting the interior structure of rocky exoplanets. The inputs to the MDN are observed properties of an exoplanet, including mass, radius, and bulk molar ratios of Fe/Mg and Si/Mg. The outputs contain the radial fractions of each layer, the pressure and temperature at the core-mantle boundary (CMB), and the tidal Love number k_2. The data set used to train the MDN model is generated using a three-layer interior structure model, with the layers representing the water-ice shell, silicate mantle, and iron-rich core respectively. Within the MDN architecture, hidden layers are tasked with deciphering the complex relationships between the observations and the interior parameters, which are construed as a Gaussian mixture distribution in the resultant outputs. In contrast, Bayesian inference employing the MCMC technique systematically contrasts the observed data, denoted as D, with interior models derived from prior distributions, p(Θ). This process applies MCMC sampling to determine the posterior distributions p(Θ|D) of the interior parameters.

Figure 2 .
Figure 2. Learning curves of the MDN model trained on inputs of mass, radius, and bulk molar ratios of Fe/Mg and Si/Mg. (a) Training loss as a function of training-set size; different training samples per training size are generated by modifying the random seed. (b) Negative log-likelihood loss over the epochs for both training and validation data.

Figure 3 .
Figure 3. MDN-predicted distributions of eight output variables (the radial fractions of the water-ice shell, mantle, and core; the mass fractions of the water-ice shell and core; the core properties P_CMB and T_CMB; and the tidal Love number k_2) vs. the actual values obtained from the interior structure model for the test data set. Red dashed diagonal lines indicate the best performance of the MDN. For a given sample, the maximum of the predicted probability density is mapped to yellow, while the minimum is mapped to green.

Figure 4 .
Figure 4. Comparison of Earth's interior structure distributions inferred by the well-trained MDN model and by MCMC inference. The dashed lines indicate the median values of the posterior distributions of the interior parameters, providing a benchmark for comparing the central tendencies inferred by the two methodologies.
Brugger et al. (2017) then constrained the interior of Kepler-10 b from stellar abundances and found a possible CMF range between 10% and 33%, indicating an Earth-like composition for Kepler-10 b. Both the MDN and MCMC results are in good agreement with the results from Weiss et al. (2016) and Brugger et al. (2017).

Figure 7 .
Figure 7. Same as Figure 4, but for HD 3167 b.

Figure 8 .
Figure 8. Same as Figure 4, but for HD 219134 b.

Figure 9 .
Figure 9. Same as Figure 4, but for EPIC 249893012 b.

Figure 11 .
Figure 11. Negative log-likelihood loss over the epochs for both training and validation data for the MDN model trained on inputs of mass, radius, and Fe/(Mg + Si).

Table 1
Prior Ranges of Parameters for the MCMC Method

Table 2
Summary of Observations for Representative Exoplanets

Table 3
Interior Structure Predictions from MDN and MCMC for Earth and the Five Rocky Exoplanets