Deducing neutron star equation of state from telescope spectra with machine-learning-derived likelihoods

The interiors of neutron stars reach densities and temperatures beyond the limits of terrestrial experiments, providing vital laboratories for probing nuclear physics. While the star's interior is not directly observable, its pressure and density determine the star's macroscopic structure which affects the spectra observed in telescopes. The relationship between the observations and the internal state is complex and partially intractable, presenting difficulties for inference. Previous work has focused on the regression from stellar spectra of parameters describing the internal state. We demonstrate a calculation of the full likelihood of the internal state parameters given observations, accomplished by replacing intractable elements with machine learning models trained on samples of simulated stars. Our machine-learning-derived likelihood allows us to perform maximum a posteriori estimation of the parameters of interest, as well as full scans. We demonstrate the technique by inferring stellar mass and radius from an individual stellar spectrum, as well as equation of state parameters from a set of spectra. Our results are more precise than pure regression models, reducing the width of the parameter residuals by 11.8% in the most realistic scenario. The neural networks will be released as a tool for fast simulation of neutron star properties and observed spectra.


I. INTRODUCTION
Neutron stars are valuable astrophysical laboratories for studying matter under extreme conditions.With masses generally between 1 to 2 M ⊙ and radii between 10 and 15 km, the inner regions of these neutron-rich stars can reach density regimes well beyond those accessible in terrestrial laboratories.Matter at such high densities can potentially experience transitions to stable but unusual states of matter such as exotic baryons made of hyperons and ∆ isobars [1][2][3][4][5]; deconfined up, down, and strange quarks [6,7]; color superconducting phases [8][9][10]; or Bose-Einstein condensates made of negatively charged pions or K − mesons [11][12][13][14][15].A better understanding of the internal composition of these stars would shed light on many areas of current interest, including various astrophysical phenomena such as core-collapse supernovae [16] and binary star mergers [17], nuclear laboratory physics, QCD and relativistic gravity, as well as the early Universe.A long-standing issue in experimental and theoretical astrophysics is the determination of the equation of state (EOS) of matter within the cores of neutron stars, which describes the underlying relationship between pressure P and energy density ϵ.Theoretical EOS models make various assumptions regarding the state of matter within neutron stars' interiors, resulting in widely varying relationships between pressure and density.
The interiors of neutron stars are not available for direct observation.However, observational data from these stars, such as electromagnetic emission, is growing rapidly from telescopes like the Chandra X-ray Observatory, the Neutron star Interior Composition Explorer (NICER), and the Five-hundred-meter Aperture Spherical radio Telescope (FAST).The star's emitted spectrum is influenced by macroscopic stellar properties such as mass and radius, which are determined by the star's underlying EOS.The mass and radius, paired with other stellar parameters, control the emitted radiation, which when convoluted with the telescope response, determines the observed spectra.Therefore, in principle, the EOS can be inferred from observations of stellar spectra.In practice, crucial elements of the likelihood are not tractable, preventing straightforward inference.For example, while the mass-radius relation can be calculated from the EOS using the relativistic stellar structure equations (SSEs), the equations cannot be trivially inverted [18][19][20][21][22][23][24].The task is further complicated by the small number of neutron star observations and the relatively large uncertainty of the individual measurements.Extraction of the maximal information content, with robust uncertainties, is therefore of paramount importance.
Recently, machine learning tools have demonstrated breakthroughs in data analysis in the natural sciences [25], increasing the power of valuable data [26] while naturally propagating uncertainties [27].Applications in particle physics have used larger and deeper neural networks powered by advances in hardware processing to tackle more complex and higher-dimensional tasks [28,29].In astrophysical contexts such as regression of neutron star parameters where the likelihood is not tractable and analytical inversion is not feasible, simulation-based inference techniques [30] have been explored.For example, recent work from Fujimoto et al [31,32] demonstrated regression of neutron star matter EOS from mass-radius pairs, where networks are trained on samples of simulated stars and uncertainties are drawn from an ad-hoc Gaussian model.Morawski [33] also performed regression from mass-radius pairs to EOS assuming Gaussian uncertainty and attempted to reduce EOS parameterization dependence.Soma et al [34] regressed both EOS and mass-radius information through a set of connected modes: density values are used to determine pressure values using a deep neural network, and the regressed pressure values are then used to generate mass-radius pairs using a generative deep learning model.Ferreira [35] instead regressed EOS from radius and tidal deformation information using support vector machines (SVMs) and deep neural networks (DNNs).Our recent work [36] demonstrated regression of EOS directly from simulated X-ray spectra without collapsing to intermediate states of observable properties like mass and radius, and where uncertainty due to unknown values of other stellar parameters is fully propagated.This was the first demonstration that neural networks have the capacity to directly analyze high-dimensional astrophysical data such as X-ray spectra.
Machine learning is, however, capable of more than regression.In this paper, we introduce a novel approach to the inference of EOS parameters from neutron star X-ray spectra, in which the likelihood is made tractable by replacing the intractable elements with neural networks trained on samples of simulated stars.Rather than directly learning the entire likelihood [37] in one step, we leverage our knowledge of the problem by replacing only the crucial missing pieces, which focuses the learning task and allows the interpretation of network outputs as physically meaningful quantities.The resulting machine-learning-derived likelihood of observing a set of stellar spectra given EOS parameters allows for estimation of the EOS parameters via Bayesian maximum a posteriori and use of standard error estimation techniques.The derivation of this machine-learning-derived likelihood is shown schematically in Figure 1.This paper is outlined as follows.In Section II, we describe the physics of neutron star interiors.Section III gives an introduction to the relevant machine learning techniques.We describe the generation of our samples of simulated stars in Section IV, and the strategy for machine-learning-derived likelihoods in Section V.Sections VI and VII demonstrate the likelihood calculation and extraction of parameter estimates for mass-radius information and EOS parameters, respectively.Sections VIII and IX present a discussion of the results and future directions, respectively.

II. BACKGROUND
Observation of neutron stars has long served as a way to place constraints on our knowledge of superdense matter due to the theoretical connection between features deduced from observation (such as gravitational mass and radius) and the underlying EOS.The theoretical connection comes from the relativistic stellar structure equations, derived from Einstein's field equations, where the star's internal structure is used to derive the stellar properties of a compact object.
For spherically symmetric, non-rotating stars comprised of isotropic material in hydrostatic equilibrium, the stellar structure is compactly summarized in the Tolman-Oppenheimer-Volkoff (TOV) equation (in geometerized units G = c = 1): where m is the gravitational mass enclosed within a sphere of radius r, P is the pressure, and ϵ is the energy density.This equation creates a one-to-one map from the EOS to a mass-radius relation [23], where the inverse mapping would provide constraints on the EOS from mass and radius values.While inverting the relativistic SSEs has been demonstrated to be numerically intractable, mass and radius values determined by observation provide valuable constraints on EOS; observations of high-mass pulsars (such as PSR J0952-0607, where M = 2.35 ± 0.17 M ⊙ [38]) have ruled out previously popular EOS models which failed to predict masses above 2M ⊙ .Observation of neutron star emission has long served as a way to constrain mass and radius for neutron stars (eg.[39,40]), which in turn places constraints on the nuclear EOS.Some of the most precise constraints thus far come from observing thermal emission from quiescent low-mass X-ray binaries (qLMXBs).These binary systems are identified in well-studied globular clusters where distances, ages, and reddening are well-constrained [41].Additionally, their low magnetic fields result in minimal effects on the radiation transport or temperature distribution on the star's surface [42][43][44], making them particularly desirable for placing strong constraints on neutron star structure.
Thermal emission from qLMXBs is observed with powerful telescopes such as NASA's Chandra X-ray Observatory, where high-resolution imaging and spectroscopy have provided powerful insight into neutron star properties such as cooling [45,46], mass and radius limits [42], and binary mergers of exotic stars [47].Chandra's telescope contains a system of four pairs of mirrors that focus incoming X-ray photons to the Advanced CCD Imaging Spectrometer (ACIS), which measures the energy of each incoming X-ray.The observed spectrum then determines neutron star bulk properties like mass and radius through spectral fitting, where both the spectrum and corresponding instrument response are fit to an appropriate atmosphere model, where the surface composition is known or can be determined by the X-ray spectrum.An extensive list of such models exists in xspec [48], an X-ray spectral fitting package distributed and maintained by the aegis of the GSFC High Energy Astrophysics Science Archival Research Center (HEASARC).xspecis a state-of-the-art tool to analyze X-ray data from Chandra and other spectrometers like NICER, Nustar, and XMM-Newton -making it a valuable resource for inference of neutron star properties.

III. MACHINE LEARNING
Machine learning is a branch of computer science, and the main component of today's artificial intelligence, where computers learn to compute useful functions from data.In deep learning, a modern re-branding of neural networks and today's major branch of machine learning, networks of simple neurons connected by adjustable synaptic weights are used to implement flexible classes of functions.In a typical case, the parameters of such models, i.e. the synaptic weights, are adjusted incrementally from the training data via a stochastic gradient descent process in order to minimize some error function.For instance, in the case of a regression problem, neural networks can be trained to minimize the standard least-squared error over the training data.This can be viewed as a generalization of standard linear regression, where the linear model is replaced by a more complex, non-linear, neural network model.The least-squares error can be viewed as the negative log-likelihood of the data under a Gaussian noise model.The neural network learns to predict the parameters of the negative log-likelihood function, i.e. the mean of the Gaussian distribution in the standard regression framework.
Machine learning, and in particular deep learning, have many applications in science [25] and have greatly benefited from the increase in computing power over the last few decades, especially in the form of GPUs (Graphical Processing Units) which are well suited for the matrix-vector multiplications -the computational workhorse of neural networks.Deep learning methods can flexibly handle data sets, models, and dimensions that range from very small to very large.For instance, deep learning models with millions, or even billions, of adjustable parameters, are not uncommon.These methods can handle and leverage raw data, without the need for pre-processing using heuristic or manual simplifications, which often discard valuable information.In physics applications, this often leads to significant performance improvements in terms of accuracy, detection rates, and so forth (e.g.[26,49,50]).In the rest of this work, we use deep learning methods for regression to deterministically compute the parameters of complex functions needed to calculate likelihoods.

IV. TRAINING DATA
Samples of simulated stars are used both to train and evaluate network performance.The samples used in these studies were originally produced for Ref. [36]; we use the identical samples here to allow direct comparison to previous work.Each simulated star is described by two parameters of interest and three additional physical stellar properties.The parameters of interest are mass M and radius R, derived from EOS variations using the stellar structure equations.The other three quantities, deemed nuisance parameters, affect the stellar spectrum but are not related to the EOS: the effective temperature of the star's surface, T eff , the distance to the star, d, and the hydrogen column, N H which parameterizes the reddening of the spectrum by the interstellar medium.All five parameters determine the simulated X-ray spectra, generated using xspec.Details of each data generation step are outlined below.

A. Generation of Equation of State
As described in Ref. [36], the EOS models used to generate many simulated neutron stars are variations of the relativistic, non-linear mean-field model GM1L [51].The parameterization of GM1L used in these studies accounts only for protons and neutrons; the corresponding saturation properties of this parametrization are shown in Table 1 of Ref. [36], which follow current constraints from nuclear experiments and astrophysical observations [52].In order to represent the EOS in a compact fashion and limit parameters of interest, we represent GM1L in a parametric representation based on spectral fits; the process for constructing such fits is outlined in detail in Refs.[53] and [21].As in Ref. [36], the parametric representation of GM1L is given by the coefficients of order two spectral expansion: λ 1 and λ 2 .We randomly vary the two original coefficients to create 10 4 unique EOS variations in order to generate the many models needed for training the networks discussed below.Each EOS variation was then used to generate a mass-radius relation using the TOV equation (1).From each mass-radius relation, 100 (M, R) pairs whose mass falls in the physical range of neutron star masses (1 to 3 M ⊙ ) are selected for training and validation.

B. Modeling Spectra
Generation of a simulated neutron star X-ray spectrum requires the stellar mass and radius, generated from a particular EOS model as described above, as well as the three nuisance parameters.Beyond spectral fitting, the xspec program [48] can create simulated spectra via the fakeit command.The command requires a specified theoretical model and telescope response matrix.As described in greater detail in Ref. [36], the theoretical hydrogen atmosphere model NSATMOS [54] was used, paired with the Chandra telescope response specified in Ref. [54] which describes the instrument response and telescope effective area.

C. Nuisance Parameters
The NSATMOS model implemented in xspec requires five inputs for each simulated spectrum.The first two, gravitational mass M in units of M ⊙ and radius R in units of kilometers, are taken from mass-radius relations from the GM1L EOS variations.The remaining three parameters are the effective temperature of the star's surface T eff , the distance to the star, d, and the hydrogen column density, N H .These three additional parameters are referred to as nuisance parameters (NPs) in Ref. [36] and are given values drawn from reasonable physical limits.
To determine the physical limits of the distance to the star and the hydrogen column density N H , we turn to Table 1 in Ref. [40].The distances typically range between 2 and 10 kpc, and hydrogen columns lie between 0.2 and 5 × 10 21 cm −2 .From Table 3 in Ref. [55], effective temperatures at the surface typically lie between 50 and 200 eV, or from 6 × 10 5 and 2.4 × 10 6 K, with temperatures in the core increasing by a few orders of magnitude.
Examples of generated X-ray spectra are shown in Ref. [36].The three NPs, which we will refer collectively to as ν, directly impact the simulated spectra and the uncertainty in the spectral fitting process.In order to propagate the uncertainty from these parameters and gauge our networks' sensitivity to each parameter, we define three example scenarios of uncertainties: "true", "tight", and "loose".The three scenarios describe the quality of prior information on the NP values for each star.
In the "true" scenario, the NPs are set to the true value chosen from the physical range used to generate the spectra, such that the NP prior is essentially a delta function.In the "tight" scenario, the uncertainty is described as a narrow Gaussian for each NP, with distance having a width of 5%, hydrogen column having a width of 30%, and log(T eff ) having a width of 0.1.In the "loose" scenario, the uncertainties are described by a wider Gaussian, with distance having a width of 20%, hydrogen column having a width of 50%, and log(T eff ) having a width of 0.2.Table I shows the width of the prior information of the three NPs for the three uncertainty scenarios.

V. MACHINE-LEARNING DERIVED LIKELIHOOD CALCULATION
If the likelihood p(x|θ) of observing some data x for given values of the theoretical model parameters θ were known, estimating the value of θ for a given x would be a straightforward task.In a Bayesian approach, one would use maximum a posteriori (MAP) to find θ, the value of θ which maximizes the product of the likelihood and the prior p(θ).In the case of neutron star EOS estimation, the likelihood p(S|λ 1 , λ 2 ) of observing a set of stellar spectra S for a given choice of EOS parameters λ 1 , λ 2 is not tractable.Instead, simulation-based inference [30] techniques often circumvent the need for a closed-form likelihood by approximating p(S|λ 1 , λ 2 ) using density estimation with simulated samples.But accurately estimating the density in high-dimensional spaces such as those for stellar spectra, with dimensionality of ≈ O(1000), would require prohibitively large samples of simulated events.Instead, we use machine learning to estimate the likelihood.But rather than estimating the entire likelihood in one step, our machine-learning-derived likelihood calculation focuses the statistical power where it is most needed for a given set of observations by combining the available analytic components with neural networks to replace the intractable components.
For the estimation of parameters such as EOS or mass and radius, many elements of the likelihood calculation are known, such as the Poisson fluctuations within each energy bin of a stellar spectrum.However, there are two elements without closed-form expressions that require machine-learning assistance.We overview those intractable components here briefly and provide more detail below.
The first missing piece is a prediction for the expected photon rates observed in a telescope given stellar parameters M and R and nuisance parameters ν.Below, we train a neural network to estimate this quantity and demonstrate its application by doing MAP estimates of M and R, in which we integrate over the nuisance parameters ν.
The second missing piece is a prediction for the mass-radius values allowed by given EOS parameters λ 1 , λ 2 .We train a neural network to output the allowed radius for a given stellar mass when provided with the EOS parameters.This allows integration over the M − R curve defined by the EOS.
Together, these two machine-learned elements allow for a tractable calculation of the likelihood p(S|λ 1 , λ 2 ) and provide a MAP estimate of λ 1 , λ 2 for a given set of spectra S. We describe each piece in turn below.

VI. STELLAR MASS AND RADIUS INFERENCE
The mass and radius of a neutron star can be estimated from its spectrum s if one can calculate and maximize p(s|M, R)p(M, R) for a fixed s.This requires access to p(s|M, R), which depends on p(s|M, R, ν), the likelihood to produce this spectrum s given the full set of stellar parameters including the nuisance parameters ν: If the spectra consists of a set of energy bins of telescope photon counts, then the joint likelihood p(s|M, R, ν) over the bins can be written as where µ j is the expected number ⟨N γ j ⟩ of photons in bin j of n bins: determined by M, R, ν via complex physics of the stellar emission model as well as the telescope response.In principle, for this specific scenario, this function is contained within xspec, but is not exposed to the user in a convenient fashion which would allow for a rapid evaluation of many points, as needed for this application.More generally, one may want to learn a function in scenarios where no function is available, or where the training sample contains examples generated from a mixture of models where no single simple function can describe all samples.When we do not have access to a simple expression for µ(M, R, ν), it is possible to train a neural network to learn such that µ(M, R, ν) can be estimated from f (M, R, ν) scaled by the observation time ∆t, The likelihood p(s|M, R, ν) can then be estimated as where the final likelihood can be obtained by marginalizing over the nuisance parameters: This estimate allows for a scan of the M, R plane for the values which maximize p(s|M, R)p(M, R) to find the MAP estimate for (M, R) given a fixed s.Schematically, this process is shown in Fig 2 .A. Learning the Model f for Stellar Spectra The estimated likelihood requires learning a function f [M, R, ν] which produces the expected stellar spectrum in each bin.This function is modeled by a deep neural network with five inputs (M , R, and the three components of ν: effective temperature, distance, and N H ) and 250 outputs, one for each of the spectral bins.
The network architecture includes two input branches, one to process the mass and radius, and another to process the nuisance parameters.Each branch contains a series of nine layers, with 2048 nodes and leaky ReLU activation, that process its inputs in isolation.Following these initial layers, the output from the branches is combined together, forming a single volume with all information.This grouping is then passed to another successive set of nine layers which produce the generated spectra.The choice of hyperparameters has a significant impact on performance of this network.A complete list of hyperparameters tried is given in A; the architecture used for model f was the configuration with the best performance on the validation set.
Generating the spectra is formulated as a supervised learning problem, where the network learns to minimize the error between the true spectra and its predictions for a set of simulated examples generated by xspec.The weights are updated with gradient descent via backpropagation.The Huber loss function is used, which is a standard loss function for regression robust to outliers, and the Adam optimizer computes gradients and schedules the backward passes.
Figure 3 shows several examples of generated spectra.Each subplot contains generated results using various values of mass, radius, and nuisance parameters.The predictions are shown with their corresponding version from xspec.The predictions track the xspec values remarkably well across a range of parameter values.In general, we notice a slight overprediction by the network; below, we estimate the potential bias due to this overprediction and find it to   4: Scans of the likelihood for two example stellar spectra s (left, right) versus stellar mass and radius.Top demonstrates the ideal nuisance parameter (NP) conditions where the NPs are fixed to their true values.For the same simulated observed spectra, center shows a more realistic "tight" scenario , and bottom shows a "loose" scenario in which the NPs are not well constrained by priors.In the "loose" and "tight" scenarios, dependence on the nuisance parameters has been integrated out as described in the text.
be negligible compared to statistical uncertainties and other systematic uncertainties.

B. Results
The machine-learning-derived likelihood is tractable, allowing for a scan of p(s|M, R) for individual stars.Figure 4 shows examples of two individual simulated stars under the three nuisance parameter scenarios.
The likelihood can then be used to produce estimated values of neutron star mass and radius for a given spectrum, after marginalizing over the nuisance parameters.The mass-radius plane is scanned using adaptive optimization to find values that maximize the product of the likelihood and the prior.
We assess the performance of this method to produce likelihood estimates, ML-Likelihood M,R , by calculating the residual between the true values of the mass and radius and the estimates produced by our method.As benchmarks, we compare the residuals to those generated by xspec itself, which has access to the true likelihood used to generate the simulated samples, as well as a regression-based method, MR Net, described in Ref. [36]. Figure 5 and Table II shows a comparison of the results.
It is striking that ML-Likelihood M,R outperforms MR Net dramatically, despite being trained on the same dataset.ML-Likelihood M,R benefits from the physics knowledge encoded in the approximate likelihood, which requires ML solutions only for a specific sub-task, rather than having to blindly learn the entire problem, as MR Net must.ML-Likelihood M,R performs essentially as well as xspec, even having smaller residuals when nuisance parameters have the most uncertainty, despite not having access to the explicit likelihood used to generate the data, as xspec does.Note that the performance comparison here highlights a practical difference between our method and use of xspec, rather than a principled difference in the methods.That is, our method of replacing the difficult calculational step with a learned ML model allows for convenient and rapid evaluation of the likelihood over many values of the nuisance parameters, enabling us to marginalize over them, which explains the improved relative performance.While in principle one could perform the same operation with xspec, its lack of a convenient programmable interface makes this impractical.The practical distinction is crucial, however, as it allows for greater general flexibility, such as interpolation across several models, or in application to the broader EOS inference problem described below, where likelihoods are not just hidden behind inconvenient interfaces, but completely unavailable.
TABLE II: Performance of our estimation of neutron star mass (left) and radius (right) using an approximate likelihood which incorporates neural networks, ML-Likelihood M,R , in comparison to the performance of a pure regression network, MR Net [36] and the xspec tool.Shown are the mean (µ) and standard deviation (σ) of the residual distributions under three scenarios of nuisance parameter uncertainties.In the "true" case, the NPs are fixed to their true values; in the "tight" and "loose" cases, they are drawn from narrow or wide priors, respectively; see text for details.

VII. EQUATION OF STATE INFERENCE
The ultimate goal is to estimate the EOS parameters (λ 1 , λ 2 ) given a set of spectra S = (s 1 , s 2 ..., s nstars ).In principle, this would be straightforward if one could evaluate p(S|λ 1 , λ 2 )p(λ 1 , λ 2 ), which would allow for maximization to find an estimate for λ 1 , λ 2 for a fixed S.
We begin with the assumption that the EOS parameters have a uniform prior within their physical boundaries of [36]).The remaining step is evaluating p(S|λ 1 , λ 2 ).
First we express the probability over the entire set S as the joint probability for each star s i : The obstacle is that we do not know how to evaluate p(s|λ 1 , λ 2 ), only p(s|M, R), which depends on stellar parameters M and R. Linking these expressions is not trivial, as the EOS parameters λ 1 , λ 2 do not uniquely determine stellar parameters M and R, instead they only determine the M -R relation.That is, each point in (λ 1 , λ 2 ) space specifies a curve in M -R space.The solution is to integrate over the M -R curve allowed by the EOS parameters λ 1 , λ 2 .This is most directly accomplished by expressing the integral over the mass-radius plane, constrained by a delta function which traces out the M -R curve determined by the EOS parameters λ 1 , λ 2 : FIG.5: Performance of our estimation of neutron star mass (left) and radius (right) using an approximate likelihood which incorporates neural networks, ML-Likelihood M,R , in comparison to the performance of a pure regression network, MR Net [36] and the xspec tool.Shown is the residual, the difference between the true and predicted values, for three scenarios of nuisance parameter uncertainties.In the "true" case, the NPs are fixed to their true values; in the "tight" and "loose" cases, they are drawn from narrow or wide priors, respectively; see text for details.
where p(M, R|λ 1 , λ 2 ) describes the allowed M -R relation given the EOS specified by λ 1 , λ 2 , as where h λ [M ] → R is a function that gives the allowed value of R for a value of M , determined by the EOS parameters λ 1 , λ 2 .The function h λ [M ] encodes all of the physics which translates the EOS into stellar mass and radius, and is not available analytically or tractable numerically.It is possible, however, to train a neural network to learn this function, as we do below.Assuming h λ [M ] is available, we choose to integrate over mass, as each mass is mapped to a unique R; the same is not true for scanning in R, see Figure 6.
The delta function reduces the double integral in M and R to a single integral over mass: where the range of the mass integral is limited to the physical region, from 1.2M ⊙ to 1.6 − 3.25M ⊙ , depending on the radius, see Figure 6.This allows us to write an expression for the joint probability over the set of stars: We have now expressed the likelihood p(S|λ 1 , λ 2 ) in terms of the likelihood p(s|M, R), which we previously learned to calculate.The equation for p(s|M, R) now allows for the expression of a joint likelihood over the stars and the bins: where we can replace each of the µ ij as we did above with This expression can be evaluated, assuming one can learn a function h λ [M ] → R. The determination of the likelihood L S (λ 1 , λ 2 ) is shown schematically in Fig. 1.

A. Learning the Model h λ [M ] for Stellar Radius
The approximate likelihood above requires learning a function h λ [M ] which estimates the stellar radius for a given stellar mass as determined by the EOS parameters λ 1 , λ 2 .Note that one could equivalently estimate the mass from the radius, but this has the additional complication of degenerate outputs for some radii, see Fig. 6.It is important here to note that model h λ [M ] is trained on M-R relations created by equations of state that do not feature a phase transition; parameterizing the M-R relation as a function of R = h λ [M ] may miss details stemming from more exotic features, like those caused by a strong, first-order phase transition, in the EOS.
We model h λ [M ] with a deep neural network comprising 10 hidden layers with 32 nodes each and ReLU activation.The output layer is a single node with linear activation, which is standard for regression.The number of hidden layers, their widths, and activation functions were optimized for the functionality of h λ [M ]; the relatively small width of 32 nodes and ReLU activation were found to perform well.The network was trained with Mean Squared Error (MSE) loss and the Adam optimizer.Each color represents a single choice of EOS parameters, which determine a curve in the mass-radius plane.Individual calculations as described in the text are shown (crosses), as compared with the output of a neural network function h λ [M ] (solid line), which estimates the radius corresponding to an input value of M as determined by the EOS parameters.

B. Results
The two networks that model the missing functions f and h allow for an approximate evaluation of the likelihood, Eq. 3 as a function of the EOS parameters.Figure 7 shows examples of two individual sets of simulated stars under the three nuisance parameter scenarios.
To estimate the EOS parameters from a fixed set of stellar spectra, the likelihood is maximized via a course scan over EOS parameter space followed by the use of an optimization algorithm for a more refined location of the optimal EOS parameters.Each evaluation of the likelihood involves nested loops over the stars, an integral over possible masses, and a loop over the spectral bins.Performance of ML-Likelihood EOS and comparison with benchmarks are shown in Fig. 8 and Table III.We note that the data used in evaluations are generated via xspec, not from the models f and h, allowing for a test of the fidelity of the machine-learned models.Experiments in which simulated spectra are generated using the models f and h show equivalent performance, indicating that any bias due to mis-estimation by f or h is negligible in this context.

VIII. DISCUSSION
The results above demonstrate that machine-learning-derived likelihoods are useful statistical tools, allowing for traditional inference such as parameter estimation for quantities of interest (eg stellar M and R) as well as profiling over nuisance parameters (eg stellar distances and temperatures).
In the case of M ,R-estimation for an individual star, the performance of the ML-Likelihood M,R method matches the performance of xspec when the nuisance parameters are known.This is an important validation of the technique, as the simulated samples are generated by xspec and so its internal likelihood estimation represents something of an upper bound on possible performance.Though xspec can provide point estimates and other analysis, ML-Likelihood M,R in this case is valuable as a building block for further analysis, as xspec does not provide an efficient interface to its internal calculations.For example, in the cases where nuisance parameters weaken the inference, ML-Likelihood M,R is able to improve on xspec's performance by marginalizing over the stellar nuisance parameters.Given access to the full likelihood, one could also choose to profile over the nuisance parameters.In addition, while xspec's inference is linked to a particular theoretical model, ML-Likelihood M,R can be trained on a variety or mixture of models, providing a smooth interpolation between otherwise distinct conceptual approaches [56]. .Top demonstrates the ideal nuisance parameter (NP) conditions where the NPs are fixed to their true values.For the same simulated observed spectra, the center shows a more realistic "tight" scenario, and the bottom shows a "loose" scenario in which the NPs are not well constrained by priors.In the "loose" and "tight" scenarios, dependence on the nuisance parameters has been integrated out as described in the text.
The M, R-likelihood estimation is a building block toward the estimation of EOS parameters for sets of stars.In this case, as well, the likelihood provides for reliable inference of the EOS parameters, as demonstrated by the performance of ML-Likelihood EOS .The residuals in this case again are narrower than the pure regression approach, nearly matching the performance of xspec in the true case, and exceeding its unmarginalized estimates in the realistic case where nuisance parameter uncertainty is important.
Our method uses machine learning to enable what is typically termed a forward process, in that it aids the calculation of the likelihood of experimental data from the parameters, rather than backward inference of the parameters from the data.In this sense, it can be considered a fast and flexible simulation tool.The neural networks developed for this work effectively enable end-to-end, fast, and convenient simulation of neutron star spectra for a range of EOS parameters and nuisance parameters, including the intermediate step of generating plausible neutron star properties (mass and radius) for a given set of EOS parameters.In the case where the true likelihood exists but is not made conveniently accessible, our approach provides a powerful and flexible new interface, even without speed enhancements.In the more general case, such as for EOS inference, the approach additionally allows for more rapid generation of simulated stellar masses and radii for specific EOS parameters without solving the complex sets of equations underlying the physical model.FIG.8: Performance of our estimation of neutron star EOS parameters λ 1 (left) and λ 2 (right) using an approximate likelihood that incorporates a neural network, ML-Likelihood EOS , in comparison to the performance of a spectra-to-EOS regression network and network which regresses EOS parameters from M, R values estimated by xspec, both from Ref. [36].Shown are the residual distributions, the difference between the true and predicted values, under three scenarios of nuisance parameter uncertainties.See Table III for quantitative analysis.In the "true" case, the NPs are fixed to their true values; in the "tight" and "loose" cases, they are drawn from narrow or wide priors, respectively; see text for details Once released, this framework will serve as a convenient simulation tool that can be used by the larger machine learning research community to generate neutron star samples without the expert knowledge required to run xspec or solve the TOV equations, and spur further innovation, such as in simulator-based inference techniques that can handle TABLE III: Performance of our estimation of neutron star EOS parameters λ 1 (left) and λ 2 (right) using an approximate likelihood that incorporates a neural network, ML-Likelihood EOS , in comparison to the performance of a spectra-to-EOS regression network and network which regresses EOS parameters from M, R values estimated by xspec, both from Ref. [36].Shown are the mean (µ) and standard deviation (σ) of the residual distributions under three scenarios of nuisance parameter uncertainties.See Fig 8 for distributions.In the "true" case, the NPs are fixed to their true values; in the "tight" and "loose" cases, they are drawn from narrow or wide priors, respectively; see text for details a large number of nuisance parameters, or to setup ML challenges where participants are given access to the simulator.

IX. CONCLUSION
In this work, we demonstrate an alternative approach to deducing neutron star mass, radius, and EOS from simulated stellar X-ray spectra using machine learning.The alternative approach employs a technique novel to neutron star astrophysics, machine-learning-derived likelihoods, in which intractable elements of the calculation are replaced with machine-learned functions.This allows for an approximate likelihood calculation with increased interpretability.Our forward neural network model allows us to predict the X-ray spectra given a value of nuisance parameters, the equation of state parameters, and the neutron star mass.Forward modeling of this kind demonstrates how these networks are interpretable, testable, and often with the assumptions being explicit rather than implicit when compared to inverse models [40,57].The ML-Likelihood EOS model outperforms our previous best-performing regression model, demonstrating the power of such a technique for point estimation.Note that it additionally provides access to the full likelihood, which can inform uncertainty estimation.
While the studies shown in this work demonstrate the power of the method to provide a tractable likelihood for stellar spectra generated from a specific model, the natural ability of neural networks to interpolate allows for the treatment of inherent model uncertainties, by training on samples with mixed or varying models [27].
The machine learning-driven calculations lend themselves to various future directions.In terms of training data, EOS models with different parametrization can be tested as well as using a different response matrix (e.g. from NuStar, NICER, or XMM-Newton as described in [58]) in xspec to generate simulated X-ray spectra.In addition, this technique can be rapidly applied to the analysis of X-ray spectra from recent observations.The networks developed for this work will become a convenient tool for fast simulation of neutron star spectra which can be run without the expert knowledge required to run xspec or to solve the TOV equations.

Figure 6
demonstrates how the network h λ [M i ] → R performs for a few example values of the EOS parameters λ.Generation of the training data as described above required approximately 24 hours of CPU time for 10 6 stars; in comparison, evaluation of the network h λ [M ] generates 10 6 stellar radii in 400 ms, a relative speed enhancement of 10 5 .

FIG. 6 :
FIG.6: Relationship between neutron star mass and radius, as determined by equation of state parameters λ 1 , λ 2 .Each color represents a single choice of EOS parameters, which determine a curve in the mass-radius plane.Individual calculations as described in the text are shown (crosses), as compared with the output of a neural network function h λ [M ] (solid line), which estimates the radius corresponding to an input value of M as determined by the EOS parameters.

TABLE I :
Description of "true", "tight", and "loose" nuisance parameter (NP) scenarios.Shown are the width of each Gaussian distribution representing the prior knowledge of each NP.For distance and N H , width is relative; for log(T eff ), it is absolute.See text for details and references.
Scans of the likelihood for two example sets of stellar spectra s (left, right) versus EOS parameters λ 1 and λ 2