Physics-enhanced neural networks for equation-of-state calculations

Rapid access to accurate equation-of-state (EOS) data is crucial in the warm-dense matter (WDM) regime, as it is employed in various applications, such as providing input for hydrodynamic codes to model inertial confinement fusion processes. In this study, we develop neural network models for predicting the EOS based on first-principles data. The first model utilises basic physical properties, while the second model incorporates more sophisticated physical information, using output from average-atom (AA) calculations as features. AA models are often noted for providing a reasonable balance of accuracy and speed; however, our comparison of AA models and higher-fidelity calculations shows that more accurate models are required in the WDM regime. Both the neural network models we propose, particularly the physics-enhanced one, demonstrate significant potential as accurate and efficient methods for computing EOS data in WDM.

The warm-dense matter (WDM) regime has emerged as a topic of great interest in recent years [1][2][3][4][5][6] for several reasons. Firstly, a plethora of interesting scientific phenomena occur under WDM conditions: of particular importance, given the present requirement for clean and abundant energy, is inertial confinement fusion (ICF) [7][8][9][10]. WDM conditions also present themselves in a variety of astrophysical systems, such as stars in various stages of their life cycle [11][12][13], planetary cores [14,15], and more besides [16]. Secondly, it has become possible to probe matter exposed to WDM conditions in the laboratory [4,[17][18][19][20][21], due to the increasing capabilities of large experimental machines such as the National Ignition Facility [22], the Linac Coherent Light Source [23], SACLA [24], and the European XFEL [25]. Thirdly, the WDM regime is notoriously difficult to model, in part due to the unique position it occupies between the condensed-matter physics and plasma physics domains [5,6,26].
Equation-of-state (EOS) data, the pressure and energy of a material as a function of its temperature and mass density, is of particular importance in WDM. For example, hydrodynamics codes that are used to guide ICF experiments rely on accurate EOS data, typically given in tabular form and interpolated to the conditions of interest, to close the conservation equations [27]. Many theoretical techniques exist to compute EOS data. These include "first-principles" methods, namely density-functional theory molecular dynamics (DFT-MD) [28,29] and path-integral Monte-Carlo (PIMC) [28,30], as well as reduced models such as average-atom (AA) models [31][32][33], and extensions thereof [34,35]. A review of the application of these methods to EOS calculations can be found in Ref. [27].
Whilst these models differ in both accuracy and computational expense, they share the common principle of having only some fundamental physical properties, i.e. the temperature, density, and material composition, as inputs to compute the EOS. This is in contrast to alternative approaches such as the widely-used SESAME database [36], which extrapolates EOS data from cold curves with a variety of different models.
Assessing the accuracy of EOS models is not straightforward. One of the principal difficulties in benchmarking models to experimental data is that temperature cannot be directly measured under WDM conditions [17,37,38]. Historically, the temperature has been inferred from other quantities which can be measured, such as the internal energy [17], and is thus dependent on the model used for the inference; recent developments in this area hold future promise to resolve this issue [38][39][40]. Therefore, whilst experimental EOS data exists, it is in short supply and not always suited for comparison with theoretical models [17].
Consequently, results from the first-principles methods, DFT-MD and PIMC, are typically taken as the highest-quality benchmarks for EOS data in the WDM regime. These methods are denoted "first-principles" because, in theory, they exactly describe the many-body electronic structure problem. DFT-MD actually incorporates two different theories: Kohn-Sham DFT-MD (KS-DFT-MD) [41], which is often synonymous with DFT-MD due to its more widespread use, and orbital-free DFT-MD (OF-DFT-MD) [42]. Whilst both flavours are in principle exact, in practice they rely on approximations. In KS-DFT, we have to approximate the exchange-correlation (xc) energy functional, for which there exists a wide spectrum of choices [43][44][45], with efforts ongoing to assess the impact of this choice in the WDM regime [26,[46][47][48][49][50][51][52][53]. Meanwhile, OF-DFT relies on approximations for the kinetic energy density-functional in addition to the xc energy [54][55][56][57][58]. Likewise, PIMC calculations of real materials have historically relied on the fixed-node approximation [59] to circumvent the fermion sign problem [60] (although PIMC with no such approximation has recently been applied to study Hydrogen under WDM conditions [61]). Of course, both methods also rely on other fundamental assumptions (such as the Born-Oppenheimer approximation), besides various numerical approximations.
In spite of these approximations, DFT-MD and PIMC are trusted methods for the computation of EOS data, and thus comparisons with these methods are typically made when assessing the accuracy of simpler models, such as AA models. However, these comparisons generally involve a relatively small set of benchmark data points (on the order of tens of points), and the analysis is often a visual comparison of the pressures on a log-log plot. This is most likely due to the lack of large datasets with which to make comparisons. In recent years, at least two substantial EOS databases constructed from first-principles data have been published [28,29]. These databases allow for a more comprehensive analysis of the performance of AA models.
A second consequence of the existence of larger databases is that they offer greater possibilities for the application of machine learning techniques such as neural networks. The use of neural networks in scientific fields such as materials science [62,63], quantum physics and chemistry [64][65][66][67][68], high-energy-density physics [69], and even in warm dense matter [70][71][72], is becoming increasingly popular. There are many reasons behind this, but relevant to this paper (besides the aforementioned growth in the size and number of databases) is the fact that they are excellent function approximators, with the ability to learn complex non-linear relationships between benchmark data and input features [73,74].
Neural network models can be trained on first-principles data and then applied to arbitrary densities and temperatures within the range of the training data. Of course, interpolating EOS data is a well-established practice. However, machine learning models offer an intriguing path to interpolate EOS data with only minimal physical constraints from the user, and, for example, modern practices in machine learning can be used for more robust error estimation. Applying machine learning techniques to EOS data is still a relatively fresh topic: most of the applications to date have focussed on uncertainty quantification [75][76][77], although in a recent work [78] the authors built a surrogate EOS model using neural networks.
In this paper, our primary objective is to develop a neural network model capable of accurately interpolating the EOS by leveraging existing first-principles datasets [28,29].
Furthermore, we explore the potential of physics-enhanced neural networks for improving model performance, by incorporating additional physical information from an AA model, which has a low computational overhead for data generation.We also show that these neural network models yield a significant improvement in performance relative to the base AA model.
The paper is structured as follows. Firstly, in Section II, we give a brief overview of the AA model which we use. Next, in Section III, we present various methods for computing the pressure in AA models, which will all be compared in the results. After that, in Section IV, we explain the method that we use to train and evaluate our neural networks. Finally, in Section V, we compare results from the AA and neural network models against the two first-principles datasets [28,29]. We note that this section is split into separate parts, because the dataset of Ref. [29] was not used in the training or evaluation of the neural networks; it therefore tests the robustness of the networks with respect to different sources of data, and also tests their ability to extrapolate to unseen species.

II. AVERAGE-ATOM MODELS: THEORETICAL BACKGROUND
Average-atom (AA) models have a long history in plasma physics [26,31,[79][80][81][82], especially for calculating EOS data [32,33,35,83,84]. There exists a broad range of AA models which use different assumptions and approximations. However, AA models are united by a common concept, namely the reduction of a many-body, fully-interacting system of electrons and nuclei to an effective single-atom model. The effect of inter-atomic interactions is coarsely accounted for via the boundary conditions used to solve the Schrödinger equation for the single atom. A first-principles derivation of an AA model similar to the one used in this work can be found in Ref. [26].
We use the AA model proposed by Massacrier et al. [85]. In this model, we solve the spherically symmetric Kohn-Sham (KS) equations,

\[ \left[ -\frac{1}{2}\frac{\mathrm{d}^2}{\mathrm{d}r^2} + \frac{l(l+1)}{2r^2} + v_s[n](r) \right] X_{nl}(r) = \epsilon_{nl} X_{nl}(r), \quad (1) \]

where v_s[n](r) is the KS potential,

\[ v_s[n](r) = -\frac{Z}{r} + 4\pi \int_0^{R_{\mathrm{VS}}} \mathrm{d}r'\, \frac{r'^2\, n(r')}{\max(r, r')} + v_{\mathrm{xc}}[n](r). \]

The three terms in the potential are respectively the electron-nuclear attraction, the classical Hartree repulsion, and the exchange-correlation (xc) potential; for the latter we use the local density approximation (LDA) [41], with the parameterization of Perdew and Wang [86]. As ever, due to the dependence of the KS potential on the density n(r), the KS equations must be solved iteratively until self-consistency is reached.
The KS equations (1) are first solved under two different boundary conditions, which yields a set of eigenfunctions X^+_{nl}(r) and X^-_{nl}(r) with associated eigenenergies ε^+_{nl} and ε^-_{nl}. These energies define the upper and lower limits of an energy band, and the KS equations (1) are then solved for energies inside the band limits to yield X_{εnl}(r), ε^-_{nl} ≤ ε ≤ ε^+_{nl}. After discretizing the energy bands, the density n(r) is constructed as a weighted sum over the orbitals and the discretized band energies, Eqs. (5) and (6), where N_k is the number of points used in the discretization of each energy band. The derivation of Eqs. (5) and (6) can be found in Ref. [87].
The occupation numbers f_{knl} are given in the usual way according to the Fermi-Dirac distribution,

\[ f_{knl} = \frac{1}{1 + e^{(\epsilon_{knl} - \mu)/\tau}}, \]

where τ is the temperature. The chemical potential µ is determined by fixing the electron number, N_e = 4π ∫_0^{R_VS} dr r² n(r), to be equal to a pre-determined value (in this paper, N_e = Z in all cases).
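To make this step concrete, the chemical-potential search can be sketched as a bisection on the electron-number constraint, exploiting the fact that the total occupation increases monotonically with µ. The following is a minimal illustration with toy energy levels; the function names and level structure are our own and do not correspond to atoMEC's internals:

```python
import numpy as np

def fermi_dirac(eps, mu, tau):
    """Fermi-Dirac occupation f = 1 / (1 + exp((eps - mu)/tau))."""
    # clip the exponent to avoid overflow warnings at extreme arguments
    x = np.clip((np.asarray(eps) - mu) / tau, -700.0, 700.0)
    return 1.0 / (1.0 + np.exp(x))

def find_mu(eigenvalues, degeneracies, n_elec, tau,
            bracket=(-100.0, 100.0), tol=1e-10):
    """Bisection for the chemical potential mu such that the occupations
    sum to the target electron number n_elec."""
    lo, hi = bracket

    def excess(mu):
        return np.sum(degeneracies * fermi_dirac(eigenvalues, mu, tau)) - n_elec

    # the electron count is monotonically increasing in mu, so bisection is safe
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# toy example: three levels with degeneracy 2, filled with 2 electrons
eps = np.array([-0.5, -0.125, 0.2])
g = np.array([2.0, 2.0, 2.0])
mu = find_mu(eps, g, n_elec=2.0, tau=0.1)
```

In a real AA code this search sits inside the SCF loop, since the eigenvalues themselves change at each iteration.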
All AA calculations were performed using the open-source code atoMEC [88,89]. Ref. [89] gives a broad overview of the code, along with a more detailed description of how to set up and run SCF calculations. Since that publication, the stress-tensor and virial methods for the pressure (see next section) have been added to atoMEC, and can be found in v1.2.0 onwards. We note that the following libraries are used extensively by atoMEC: NumPy [90], SciPy [91], LIBXC [92], mendeleev [93], and joblib [94].

III. PRESSURE IN AVERAGE-ATOM MODELS
The total pressure in AA models is equal to the sum of the ionic and electronic components. Typically, the ionic pressure is just given by the ideal-gas expression,

\[ P_{\mathrm{ion}} = \frac{nRT}{V}, \quad (8) \]

where n is the number of moles, R the ideal gas constant, and V the volume of the gas.
Clearly, this assumption is more accurate for lower material densities and higher temperatures.
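As a minimal illustration (not taken from any EOS code), the ideal-gas ionic pressure of Eq. (8) is a one-line function; here we work in SI units:

```python
def ion_pressure_ideal(n_moles, temperature_K, volume_m3):
    """Ideal-gas ionic pressure P = nRT/V, in pascals (SI units)."""
    R = 8.314462618  # molar gas constant, J mol^-1 K^-1
    return n_moles * R * temperature_K / volume_m3

# sanity check: one mole at 273.15 K in 22.4 litres is roughly 1 atm
p_atm = ion_pressure_ideal(1.0, 273.15, 0.0224)
```

In practice EOS tables are usually expressed in atomic or GPa units, so a unit conversion layer would wrap a function like this.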
There are a number of methods to compute the electronic pressure in AA models. In this paper, we consider four different methods, presented below. We note from the start that a number of papers have explored and derived links between these different formulae [33,[95][96][97], and that the stress-tensor method and the virial method should, in principle, be equivalent to the functional-derivative method, which is the established thermodynamic expression for the pressure [98]. However, AA models rely on a range of different approximations, and thus an established relationship for one model is unlikely to hold for a different one. Moreover, the aim of this paper is to take a practical approach to EOS calculations, i.e. one that is informed by comparisons to first-principles datasets. Therefore, in this work we present the methods as we use them, but we do not attempt to draw theoretical links between them.

A. Functional derivative of the free energy
The first method involves taking the functional derivative of the free energy with respect to the volume, which is a well-established way to compute the pressure in thermodynamics. However, this method is used less frequently in AA calculations, because it is only thermodynamically consistent if no distinction is made between bound and unbound electrons, i.e. the same equations are solved for all orbitals, regardless of their energy. This is true in the AA model we use, but it is not the case for AA models in general. For example, in Ref. [26], in which the bound and unbound orbitals are treated differently, unusual results were observed for the pressure with this method.
In DFT, the free energy is given by

\[ F[n] = E[n] - \tau S[n], \]

where τ is the temperature, S[n] is the entropy, and E[n] is the internal energy functional. In KS-DFT (and thus our AA model), these terms are decomposed into the KS kinetic energy T_s[n], the electron-nuclear attraction energy E_en[n], the Hartree energy E_Ha[n], and the xc free energy F_xc[n]; explicit expressions for these terms can be found in Ref. [99]. Besides the issue with thermodynamic consistency, a more minor issue with this method is that the functional derivative in Eq. (9) is evaluated using finite differences, which requires two separate SCF calculations at different volumes.
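The finite-difference evaluation just described can be sketched as follows. In a real AA code, each call to the free-energy function would trigger a full SCF calculation at that volume; here we substitute a toy analytic free energy (our own choice, not the AA functional) so that the numerical derivative can be checked against the exact result:

```python
import math

def pressure_finite_difference(free_energy, volume, rel_step=1e-4):
    """P = -dF/dV, evaluated by central differences.
    In an AA code, free_energy(V) would require a full SCF cycle at volume V."""
    dv = rel_step * volume
    return -(free_energy(volume + dv) - free_energy(volume - dv)) / (2.0 * dv)

# toy check: an ideal-gas-like free energy F = -N*tau*ln(V) gives P = N*tau/V
N_ELEC, TAU = 2.0, 0.5

def free_energy_toy(volume):
    return -N_ELEC * TAU * math.log(volume)

P = pressure_finite_difference(free_energy_toy, volume=10.0)
```

The relative step size matters in practice: too large a step introduces truncation error, while too small a step amplifies the SCF convergence noise in the free energies.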

B. Stress-tensor
We next consider the stress-tensor method for calculating the electronic pressure. This method was applied earlier using only the radial component of the stress-tensor [96,100,101], but more recently the full expression for the stress-tensor was used in an AA model [33], where the AA results were seen to be in good agreement with DFT-MD calculations and experimental data. In a follow-up paper [102], excellent agreement was found between an AA model and DFT-MD simulations using this method, for compressed Carbon at a temperature of 100 eV.
The pressure is obtained from the elements of the electronic stress-tensor T_ij, which are constructed from derivatives of the KS orbitals φ_n with respect to the cartesian co-ordinates x_i.
Transforming into spherical co-ordinates yields the diagonal components T_rr, T_θθ, and T_φφ of the stress-tensor matrix. In Refs. [96] and [97], the radial component T_rr is used to define the total electronic pressure, i.e. P^st_rr = T_rr. On the other hand, in Ref. [33], the average of the trace is taken, which leads to the expression (23). Eq. (23) is essentially the same expression as that presented in Eqs. (17) and (18) of Ref. [33], but there are a few small differences. Firstly, we use the band-structure model, which introduces the weightings w_k, and we make no distinction between bound and free electrons, hence there is just a single term for the pressure. Additionally, we do not enforce a boundary condition on the potential, i.e. v_s(R_VS) = 0, in our model; this introduces an extra potential term in Eqs. (19) and (23). Finally, the spatial wave-functions are also defined slightly differently: to convert between P_nl(r) in Ref. [33] and X_nl(r) in our work, a simple rescaling of the wave-functions is required. However, although the formula (23) is consistent with Ref. [33], the individual components P_rr and P_θθ are, in fact, different. These differences cancel out when the trace is taken. Of course, this does not affect the results in Ref. [33], because the individual components are not used in that paper, but we mention it here to avoid confusion for the reader. When we present results in Sec. V, we consider both the radial component, P^st_rr (19), and the trace, P^st_tr (23).

C. Virial theorem
Usually the virial theorem relates the pressure to the kinetic energy T and potential energy U as follows [103][104][105]:

\[ 3PV = 2T + U, \]

where the potential energy U denotes the sum of all the interaction energies in the system.
In DFT, the above expression becomes a sum of the KS kinetic energy, the interaction energies, and an additional xc correction term W_xc [106]. The form of W_xc depends on the type of xc-functional used; for the LDA xc-functional used in this paper, its explicit form is given in Ref. [106]. However, it is noted, e.g. in Refs. [96] and [97], that the expressions (25) and (26) are only valid in the case of an infinite system (R_VS → ∞) or if all the wave-functions obey one of the boundary conditions (28) or (29). Neither of these assumptions is true for the AA model we use. Instead, Refs. [96] and [97] propose a modified form for the virial pressure, Eq. (30). In Eq. (30), the first term K_1 is equal to the kinetic energy defined in Eq. (13), K_1 = T_s.
The second term, K_2, is the gradient form of the kinetic energy, constructed from |∇φ_n|² rather than the Laplacian. It is straightforward to see that K_1 = K_2 = T_s for the case when either of the conditions (28) or (29) holds. In fact, K_2 is a well-known alternative expression for the kinetic energy in KS-DFT [107,108], and is used, for example, to calculate the electron localization function [109].
In Refs. [97] and [96], it was shown that the expression (30) for the virial pressure and the radial component of the stress-tensor (19) are in principle equivalent. However, as discussed, AA models are based on various different assumptions, which may mean that an established relationship in one model does not hold for another. For example, in the proof presented in Ref. [96], there is no explicit exchange-correlation term included in the virial formula.
In this work, we calculate the pressure with both forms of the virial expression, P^vir_T (25) and P^vir_K12 (30), and benchmark the results against first-principles simulations.

D. Ideal approximation
The final method we use to compute the pressure in the AA model is based on the assumption that the electron density at the sphere boundary is completely free [110]. We therefore call this approach the ideal approximation; the electron pressure is then given by the free-electron expression evaluated with the density at the sphere boundary [26]. In Ref. [33], it is shown that this expression for the electron pressure is consistent with both the trace of the stress-tensor (23) and the radial component (19), under the assumption of a free electron density. Of course, it is unclear how well the assumption of free electron density at the sphere boundary holds, particularly considering the known difficulties of defining whether electrons are free or bound in the WDM regime [87]. However, since the AA model is already built on numerous approximations, the accuracy of this expression will be assessed in the results section.

IV. NEURAL NETWORK METHODOLOGY
There are various flavours of neural network, such as convolutional neural networks [111][112][113], recurrent neural networks [114], and generative adversarial networks [115]. In our work, we use a standard multi-layer perceptron (MLP) feed-forward neural network [116]. As mentioned in the introduction, we have trained two completely separate neural networks. The first network is trained using only fundamental physical quantities (i.e., the temperature, material density, and atomic number Z), together with the ideal gas pressure, Eq. (8), as input features. The second network is trained using various outputs of an AA calculation, chosen based on physical intuition, in addition to the aforementioned physical quantities, as input features. Aside from this difference, the networks were trained in an identical manner.
We follow a nested cross-validation workflow to train and evaluate our networks [117].
In this procedure, there is an inner cross-validation loop, during which feature selection and hyper-parameter optimization are performed, and an outer loop, in which the generalization error (the performance of the network on unseen data) is evaluated. Nested cross-validation is a computationally expensive procedure; however, it is recommended for relatively small datasets (such as the one used in this paper), because it avoids overfitting to the training sets (bias) and yields a robust estimate of the generalization error.
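To illustrate the nested structure (and not the actual networks or data of this paper), the following sketch uses a toy polynomial model whose degree plays the role of the hyper-parameters: the inner loop selects the degree, while the outer loop estimates the generalization error on data never seen during that selection.

```python
import numpy as np

rng = np.random.default_rng(0)

def kfold_indices(n, k):
    """Yield (train, test) index arrays for k-fold cross-validation."""
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

def fit_predict(x_tr, y_tr, x_te, degree):
    """Toy 'model': a least-squares polynomial of the given degree."""
    return np.polyval(np.polyfit(x_tr, y_tr, degree), x_te)

# toy dataset standing in for the (density, temperature) -> pressure mapping
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

outer_errors = []
for tr, te in kfold_indices(x.size, 5):          # outer loop: generalization error
    best_degree, best_err = None, np.inf
    for degree in (1, 3, 5, 7):                  # hyper-parameter search, inner loop
        inner_errs = []
        for itr, ite in kfold_indices(tr.size, 5):
            pred = fit_predict(x[tr][itr], y[tr][itr], x[tr][ite], degree)
            inner_errs.append(np.mean(np.abs(y[tr][ite] - pred)))
        if np.mean(inner_errs) < best_err:
            best_degree, best_err = degree, float(np.mean(inner_errs))
    # refit with the winning hyper-parameter on the full outer training set
    pred = fit_predict(x[tr], y[tr], x[te], best_degree)
    outer_errors.append(np.mean(np.abs(y[te] - pred)))

generalization_error = float(np.mean(outer_errors))
```

The key property is that the outer test folds never influence the degree selection, which is exactly the contamination the nested scheme is designed to prevent.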
In this paper, we work with several error evaluation metrics.These are used not only in the neural network training and evaluation, but also in evaluation of the base AA results.
As shall become apparent in Sec. V, it is difficult to compare the performance between different models using just a single error metric, which is why we consider several in this work; they are defined in Table I.

iii. Train a network for each training set S^tr_ij, with the chosen hyper-parameters and features. Let us denote these networks g_ijkm[S], where S is the data (or dataset) on which they are evaluated.

iv. Evaluate the error metric M_ijkm on the corresponding test set S^te_ij and save the result. Then compute the adMALE ε̄_{3,ikm} over j = 1..5 for the given hyper-parameters and features, where N^tr_i is the total number of samples in the training set S^tr_i, N^tr_i = Σ_{j=1}^{5} N^te_ij, and n denotes a sample within that set. We note that all the reference pressures Y_0 are positive, and because we use the rectified linear activation function in the neural networks, all the predictions are also strictly positive. Therefore there are no problems taking the logarithms.

v. Pick a new set of {h_k}. Repeat steps ii-vi 10 times, and finally save the minimum error metric and the associated hyper-parameters.

5. Repeat steps 3-4 20 times, saving the error metric and the features used for each repetition. At the end, select the three lowest error metrics and save their associated features, where (min+1) and (min+2) denote the second and third lowest errors respectively.
We shall comment on this step in more detail below, because it is not a standard part of a nested CV procedure.
6. Following steps 3-5, we end up with three optimal sets of features {f^l_m} and hyper-parameters {h^l_k}, l = 1, 2, 3, for the i-th iteration of the outer CV loop. Using these optimal sets, train three networks with the full training set S^tr_i. We denote these networks g_il[S].
7. With these networks, make predictions and evaluate various error metrics ε̄_{j,i} on the corresponding test dataset S^te_i. Finally, take the average over all the test datasets to compute the final error metrics. For example, the MAPE (33), after averaging over all test sets, is normalized by N_tot = Σ_{i=1}^{5} N^te_i, the total number of samples in the full dataset.
8. The above steps tell us the generalization error as opposed to yielding a specific model.
To train the final model, repeat steps 4-6, but instead of looping over separate training sets S tr i , use the full dataset.
We emphasize that steps 1 and 2 are carried out once and only once, before any network training or feature selection begins.This ensures there is no contamination of training and test data.
As promised above, we now elaborate on step 5, in which the best three combinations of features and hyper-parameters are chosen. Typically, only a single set of features and hyper-parameters (that with the lowest error) would be carried forward at this stage.
However, in order to reduce the bias (the likelihood of over-fitting to a specific model), we take forward the three best networks and then later use them to make an average prediction.
This idea is borrowed from ensemble models in machine learning [118], where multiple models are trained and their outputs are used to build a new model; our case is a very simple example, because we just do a basic averaging in the final step, without any optimization on the inner validation sets.
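A minimal sketch of this final averaging step, assuming (as in our scaling scheme) that each network predicts the logarithm of the pressure: averaging the log-predictions and exponentiating is equivalent to taking the geometric mean of the individual pressure predictions. The function name and toy outputs below are our own:

```python
import numpy as np

def ensemble_pressure(log_predictions):
    """Average the log-pressure predictions of several networks and
    transform back; equivalent to a geometric mean of the pressures."""
    stacked = np.stack(log_predictions, axis=0)
    return np.exp(np.mean(stacked, axis=0))

# three hypothetical model outputs (log-pressures) for four samples
p1 = np.log(np.array([1.0, 10.0, 100.0, 1000.0]))
p2 = np.log(np.array([1.2, 9.0, 110.0, 900.0]))
p3 = np.log(np.array([0.9, 11.0, 95.0, 1050.0]))
p_avg = ensemble_pressure([p1, p2, p3])
```

The geometric mean is a natural choice for a strictly positive target spanning many orders of magnitude, since it weights relative (rather than absolute) deviations equally.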

A. Feature selection and scaling
Initially, a set of possible features was chosen for the two networks. For the AA-free network, we considered only some fundamental physical quantities: namely, the material density and temperature, the volume of the Voronoi sphere, the atomic number, and the ideal gas pressure for the ions (8). For the AA network, various outputs from the AA model were considered as features: for example, the electronic pressure computed using the four different methods described in Section III, the value of the density at the Voronoi sphere boundary, the mean ionization state, and so on. These features were selected from a physically intuitive viewpoint, because there is reason to expect a correlation between each feature and the target pressure.
An initial thinning of the features was performed by hand. Using the first training set following the outer CV splitting, S^tr_1, we first computed the correlation of each feature with the target pressure, using the Kendall-Tau correlation measure [119]. The results (in descending order of correlation) are shown in Table II. Following this, the three features with the lowest correlations, namely the atomic number Z, the chemical potential µ, and the free energy F[n], were dropped (there were additional reasons for dropping F[n] and µ, which will be discussed shortly).
Next, the correlations between the different features were checked. The potential v_s(R_VS) and electron density n(R_VS) at the cell boundary were found to be fully anti-correlated (C = −1); in fact, this is expected due to the Hohenberg-Kohn theorem in DFT, which establishes a one-to-one mapping between density and potential [120]. As a result, v_s(R_VS) was excluded from the list of potential features. Although the correlations between the different AA electron pressures were high (C ≥ 0.9), we kept all of them as potential features, as their correlation with the target pressure was also high.
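For readers wishing to reproduce this kind of screening, the Kendall-Tau ranking can be sketched as below. We implement the tie-free (tau-a) variant directly for transparency; `scipy.stats.kendalltau` provides an optimized equivalent. The toy features and pressure-like target here are illustrative and are not the dataset used in this work:

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall tau-a correlation for continuous (tie-free) data:
    the normalized excess of concordant over discordant pairs."""
    n = len(x)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return 2.0 * s / (n * (n - 1))

def rank_features(feature_matrix, target, names):
    """Rank candidate features by |tau| with the target, descending."""
    scores = [(name, kendall_tau(col, target))
              for name, col in zip(names, feature_matrix.T)]
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)

rng = np.random.default_rng(1)
n = 200
density = rng.uniform(0.1, 10.0, n)
temperature = rng.uniform(0.1, 100.0, n)
noise = rng.standard_normal(n)          # an uninformative feature
pressure = density * temperature        # toy pressure-like target
X = np.column_stack([density, temperature, noise])
ranking = rank_features(X, pressure, ["density", "temperature", "noise"])
```

Kendall-Tau is a rank correlation, so it is insensitive to monotonic rescalings of the features; this makes it a reasonable screening tool for quantities spanning many orders of magnitude.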
In Table II, we show which features remained and which were discarded following this manual thinning of the feature space, for the AA and AA-free networks. For the remaining features, we plotted the relationship between the target pressure and each feature, to get better insight into their relationship. We observed that, for every feature considered, a much better correlation could be observed between the feature and the target pressure when logarithms of both were taken. A couple of examples of features plotted against the pressure, with and without taking logarithms, can be found in Fig. 1.
We note that the AA pressures sometimes have a negative sign; in these limited instances, the absolute value of the pressure was taken instead. All the other features considered had a consistent sign; this further motivated the dropping of the chemical potential and free energy as features, since they spanned a wide spectrum of positive and negative values for the dataset used, which complicated the use of logarithms. Furthermore, some preliminary networks were trained using the training set S^tr_1. These networks were not trained using the full workflow described in the previous sub-section.
Rather, a few simple networks, consisting of one or two hidden layers and between 10 and 30 neurons per layer, were trained (also varying a few other hyper-parameters), just to get an initial idea of whether the logarithmic scaling improved the network performance. The outcome of this investigation clearly indicated that taking the logarithm of the target pressure and of all features yields significantly lower errors for all the different network architectures.
Consequently, this logarithmic scaling was adopted for all subsequent training of networks.
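The logarithmic scaling, combined with the standard feature scaling also used in this work, can be sketched as a small helper; the training-set statistics must be stored and reused when scaling test data. The function name is our own:

```python
import numpy as np

def log_standardize(values, mean=None, std=None):
    """Take the logarithm, then standardize: x -> (log x - u) / s.
    The statistics (u, s) are computed on the training set and must be
    reused, unchanged, when transforming test data."""
    # |.| guards against the rare negative AA pressures mentioned in the text
    logged = np.log(np.abs(values))
    if mean is None:
        mean, std = logged.mean(axis=0), logged.std(axis=0)
    return (logged - mean) / std, mean, std

# fit the scaling on a "training" array spanning several orders of magnitude
train = np.array([1e-2, 1.0, 1e2, 1e4])
scaled, u, s = log_standardize(train)

# apply the same (frozen) statistics to a "test" value
test_scaled, _, _ = log_standardize(np.array([1e3]), mean=u, std=s)
```

The inverse transform (multiply by s, add u, exponentiate) is applied to the network outputs to recover physical pressures.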
As described in the previous sub-section, in step 3 of the workflow a random subset of features is chosen from the full feature space. In total, there were 14 available features for the AA network (after the manual reduction performed above) and just 5 for the AA-free network. For the AA network, we randomly selected between 5 and 13 features in step 3. As discussed, following hyper-parameter optimization for this set of features, the process was repeated; in total, it was repeated 20 times for the AA network. For the AA-free network, since the number of features is limited, we randomly chose between 3 and 5 features, and again repeated the process 20 times. Of course, there is only one subset of 5 features in this case, but repeating the hyper-parameter optimization with this same set of features is still beneficial, because the hyper-parameter search will change every time.
In Fig. 2 we plot the error metrics for the feature-sampling procedure described above, for the AA network. Note that each of the dark blue points in this graph denotes the error ε̄⁰_{3,im} (39), i.e. the error for that particular feature subset, following the inner CV search for the best hyper-parameters. This plot is an amalgamation of all the outer-loop training sets S^tr_i. The dark orange line tracks the average (mean) error over the number of features selected for the sample. This information is also summarized in Table III. We observe that, in general, more features leads to better performance; however, if a good subset of features is chosen, the errors are very similar regardless of how many features are used.

Following the logarithmic scaling described above, we scaled both the features and the target pressure for each of the training sets S^tr_i. This was done via the standard feature scaling formula,

\[ \tilde{f}_{nm} = \frac{f_{nm} - u_m}{s_m}, \]

where f_{nm} and f̃_{nm} are the original and scaled features for the n-th data point, and u_m and s_m are respectively the mean and standard deviation of the m-th feature. The same approach was used for the target pressure. Feature scaling is a standard procedure in machine learning; on the other hand, the target is not usually scaled. However, the preliminary networks we trained performed much better when the target was also scaled, and therefore we adopted this approach.

Following this initial training, we decided to use the 'Adam' stochastic optimization method [121] and the rectified linear (ReLU) activation function, ReLU(x) = max(x, 0). Below, we summarize the hyper-parameters that we allowed to vary during the automated search:

• The number of hidden layers in the network, N_hidden
• The number of neurons in the first hidden layer, N^(1)_neuron
• The fraction α_r by which the number of neurons in successive layers is reduced; for example, the second hidden layer has N^(2)_neuron = α_r N^(1)_neuron neurons

All remaining hyper-parameters were fixed to their default arguments in the TensorFlow library [122].
With the hyper-parameters listed above, the Optuna library [123] was used to search for the best hyper-parameters. For each hyper-parameter, a search window was specified (based on observations from the preliminary networks that were trained); the search windows used are shown in Table IV. As mentioned, the optimal hyper-parameters were chosen as those which minimize the adMALE (38) between the predicted and reference pressures for a given set of features. This error metric is related to the mean-absolute log error (MALE); however, we scale the errors with the logarithm of the target pressure. This scaling is done because the raw AA models perform best at high pressures (resulting from high temperatures and densities), and it would be undesirable to train a neural network that performs badly under these conditions. The scaling gives a slightly bigger weighting to higher-pressure errors.
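Since Eq. (38) itself is not reproduced in the text above, the sketch below shows the plain MALE together with one plausible reading of the pressure-weighted variant (weights proportional to the logarithm of the reference pressure); the exact weighting in the adMALE may differ:

```python
import numpy as np

def male(y_pred, y_ref):
    """Mean absolute log error between predicted and reference pressures."""
    return float(np.mean(np.abs(np.log(y_pred) - np.log(y_ref))))

def weighted_male(y_pred, y_ref):
    """MALE with each error weighted by |log(y_ref)|: an illustrative
    reading of the adMALE described in the text (the exact Eq. (38)
    may differ). Larger reference pressures receive a larger weight."""
    w = np.abs(np.log(y_ref))
    errs = np.abs(np.log(y_pred) - np.log(y_ref))
    return float(np.sum(w * errs) / np.sum(w))

# toy predictions: the largest relative error sits at the highest pressure
y_ref = np.array([1e1, 1e3, 1e5])
y_pred = np.array([1.1e1, 0.9e3, 1.2e5])
male_err = male(y_pred, y_ref)
admale_err = weighted_male(y_pred, y_ref)
```

Because the largest relative error in this toy example occurs at the highest pressure, the weighted metric exceeds the plain MALE, which is exactly the behaviour the weighting is designed to produce.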
On the other hand, the loss function used in the network training was the standard mean-absolute log error (MALE) between the reference and predicted pressures. This was chosen (as opposed to the weighted MALE described above) because the MALE is simply the mean absolute error (MAE) applied to the log-scaled pressures, and the MAE is a regularly-used loss function in neural networks, and therefore is implemented in an optimal way in neural network libraries. However, there is an added bonus: by using the MALE as the loss function and the weighted MALE as the error metric for hyper-parameter optimization and feature selection, we actually minimize (to some extent) two important errors.
We finish this section by briefly summarizing the training and evaluation procedures for our neural network models, and their architectures. Our neural network models are based on a standard feed-forward (MLP) network architecture [116]. They are shallow networks, with a maximum of two hidden layers used during the hyper-parameter optimization. We train two kinds of networks: one whose input features are just basic physical properties, and another which uses additional features from the output of an AA calculation. We use a nested cross-validation (CV) approach [117]: this means hyper-parameter optimization and feature selection are performed on an inner CV loop, while errors are evaluated on an outer CV loop, which ensures no contamination of training and test data. Several error metrics, shown in Table I, are calculated. Borrowing from the idea of ensembles in machine learning [118], final predictions are constructed by taking the average predictions of the three best models.
V. RESULTS

A. FPEOS database of Militzer et al.: Average-atom results
We start by presenting the results of the AA model, using the four different approximations for the pressure described in Sec. III, compared to the FPEOS dataset of Militzer et al. [28]. From this dataset, we use only single-element data, with atomic numbers 1 ≤ Z ≤ 14.
We do not use the mixtures from this dataset, because at present our AA method is not suited for mixture calculations. Furthermore, for a small subset of the conditions (high temperatures and low densities), we were unable to perform AA calculations. This is a limitation of the atoMEC code, which uses sparse matrix diagonalization to solve the KS equations [89]; under these conditions, the number of eigenvalues (KS orbitals) required is so large that the sparse matrix diagonalization breaks down. In general, however, these conditions are accessible without issues for AA codes. Nevertheless, this still leaves us with 1920 total data points, out of a possible 2371 had we been able to perform all calculations.
In Fig. 3, we plot the absolute percentage error (APE) (33) between the AA and FP pressures, ε_1(P_ref, P_AA), as a function of temperature, with the density also highlighted using a colour scale. Note that the y-axis is linear up to 20%, and logarithmic above that.
The two dashed lines (and associated shaded areas) denote errors of 5% and 20% respectively.
From this figure, it is clear that the AA model has large errors for low temperatures (≲ 10 eV), independent of the approximation used for the pressure. The errors appear to be particularly severe when the density is also low, at least for the finite-difference and virial methods. This could partially be attributed to the asymmetry of the MAPE, where over-estimates can lead to especially large errors. However, under these conditions (low temperature and density), one would expect the electronic pressure to approach zero, with the only contribution to the pressure coming from the ions; as mass density increases for fixed temperature, we expect the electron (and ion) pressures to also increase. However, this is not (in general) the case: consider, for example, Fig. 4. In this figure, we plot the total pressure as a function of mass density for Helium, at fixed temperature T = 0.043 eV (the lowest temperature in the FPEOS database). We see a strange behaviour in the finite-difference pressure and both virial pressures: the pressure is negative and does not monotonically increase, as should be expected for a fixed temperature and increasing density. On the other hand, both stress-tensor methods and the ideal pressure show the correct qualitative behaviour, with the stress-tensor results also showing good quantitative accuracy in this example. Of course, the ideal gas expression for the ions is also less accurate at low temperatures, particularly as the density increases; it is difficult to say how much the error in the total pressure is affected by errors in the ion and electron pressures, but it is likely both are incorrect to some extent.
Nevertheless, for all the pressure approximations with the exception of P^st_tr, the AA model improves steeply with increasing temperature. This is a positive finding, because we would expect the AA model to be most accurate at higher temperatures, due to the decreasing importance of quantum effects. The trend appears to be strongest for the finite-difference and P^vir_T virial methods. A somewhat puzzling observation is the systematic errors of ~10-20% in the high-temperature region for the P^st_tr method. This is in contrast to the results of Refs. [33] and [102], in which this method agreed very well with DFT-MD simulations under similar conditions. This could be related to differences between the AA model we used and the model of Refs. [33] and [102]. Returning to Fig. 4, we also observe that, for both the virial methods and the finite-difference method, the pressure actually decreases as density increases for the first few data points; this is both counter-intuitive and in disagreement with the benchmark.
Next, in Fig. 5, we plot the AA pressures against the reference FPEOS pressures, on a logarithmic scale. In the lower panels of this plot, we show the absolute log errors (ALE) (34) between the AA and reference pressures, ε_2(P_ref, P_AA). In this figure, it is clear that, for high pressures (high material density and/or temperature), the AA results are in close agreement with the FPEOS reference for almost all the methods, in particular the virial and finite-difference methods. As was observed in Fig. 3 for the APE, on close inspection the stress-tensor results appear to systematically underestimate the reference at high pressures. However, this behaviour is more noticeable when the APE is plotted (compared to the ALE); this indicates the importance of considering different error metrics when comparing pressures across a large range of temperatures and densities.
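The point-wise errors underlying these comparisons can be stated explicitly; the following sketch (with our own function names) also illustrates the asymmetry of the percentage error noted earlier:

```python
import numpy as np

def ape(p_ref, p_pred):
    """Absolute percentage error, Eq. (33): asymmetric, since an
    overestimate can exceed 100% without bound while an underestimate
    is capped below 100%."""
    return 100.0 * np.abs(p_pred - p_ref) / np.abs(p_ref)

def ale(p_ref, p_pred):
    """Absolute log error, Eq. (34): treats over- and under-estimation
    by the same factor symmetrically."""
    return np.abs(np.log10(p_pred / p_ref))

p_ref = 10.0
# Overestimating by 10x vs underestimating by 10x:
assert ape(p_ref, 100.0) == 900.0                       # unbounded APE
assert ape(p_ref, 1.0) == 90.0                          # capped APE
assert np.isclose(ale(p_ref, 100.0), ale(p_ref, 1.0))   # symmetric ALE
```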
In Table V, we collect error metrics for all the different AA methods. From this table, we notice that the P^fd, P^st_rr, P^vir_T, and P^id methods yield quite similar results for the majority of the error metrics. The P^st_tr and P^vir_K12 methods show greater differences; in particular, they both yield a far smaller fraction of results within a 5% error (by an order of magnitude in the case of P^st_tr) and also have larger adMALE values, demonstrating bigger errors at higher pressures. This table further demonstrates the importance of considering a range of error metrics; for example, P^st_tr has a lower MAPE than P^fd and P^vir_T, but performs worse (and typically far worse) in all the other error metrics.
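The f_20 and f_5 fractions used in this table are simple aggregate scores; a minimal implementation (function name ours) might look like:

```python
import numpy as np

def fraction_within(p_ref, p_pred, tol_percent):
    """Fraction of points whose absolute percentage error is below
    tol_percent: f_20 and f_5 in the text's notation."""
    ape = 100.0 * np.abs(p_pred - p_ref) / np.abs(p_ref)
    return np.mean(ape < tol_percent)

p_ref  = np.array([1.0, 2.0, 4.0, 8.0])
p_pred = np.array([1.02, 2.5, 4.4, 8.3])  # 2%, 25%, 10%, 3.75% errors

assert fraction_within(p_ref, p_pred, 20.0) == 0.75
assert fraction_within(p_ref, p_pred, 5.0) == 0.5
```

Because each point contributes at most 1/N to the score, a single catastrophic outlier barely moves f_20 or f_5, whereas it can dominate the MAPE.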
To finish our analysis of the pressure results with the AA model, consider Table VI. This table is analogous to Table V, except that we now only consider data with temperatures above 10 eV. This quantifies the big improvements at higher temperatures seen in Fig. 3, and implicitly in Fig. 5: in particular, the P^fd, P^st_rr, P^vir_T, and P^id methods all have MAPEs of ~4%, which may be considered within the bounds of experimental uncertainties. Above 10 eV, all these methods display very similar results, and significant improvements compared to when all temperatures were considered. The P^vir_K12 method also shows improvements across all metrics, but performs somewhat worse overall than the aforementioned methods. However, as expected from Fig. 3, the P^st_tr method does not show any improvement (in fact the results are arguably worse) when temperatures below 10 eV are eliminated.

B. FPEOS database of Militzer et al.: Neural network results
Now, we consider results for the neural networks trained by the procedure described in Sec. IV. As mentioned, we trained two networks: one with features from the output of AA calculations, and one without. The results for each neural network are the aggregate of the results over the five outer test sets of the nested CV procedure.
First, consider Fig. 6, in which we plot the APE of the AA neural network (left panel) and the AA-free neural network (right panel). What is immediately clear from this figure, when compared to Fig. 3, is that both neural networks show a dramatic improvement relative to the raw AA results, at least for low-to-moderate temperatures (T < 100 eV). In particular, there are now only a handful of results with errors of over 20%, and indeed most seem to lie within 5%, even at low temperatures. Fig. 7 paints a similar picture; on a logarithmic scale, it is challenging to discern any real differences between the neural-network pressures and the reference pressures. In Fig. 8, we directly compare the APE (left) as a function of temperature, and the ALE (right) as a function of the reference pressure, between the two neural network models and the raw P^fd AA result. We observe clear improvements in the neural network models relative to the raw AA results.

Our approach is distinguished in several respects. Firstly, by the space of elements considered. Secondly, it uses modern best practices in machine learning, such as nested cross-validation, which enables robust error estimation. Finally, to the best of our knowledge, output from AA calculations has not previously been used to supplement interpolations of first-principles (or alternative sources of high-fidelity) data.
Our work is similar in spirit to Ref. [78], but it is distinguished by the inclusion of the AA outputs, besides various other differences such as the reference data, the neural network architecture, and the training and evaluation framework.

Next, we consider the first-principles EOS database from Ref. [29], which we henceforth denote the FP-Be dataset. In this case, all the data comes from DFT-MD calculations, with KS-DFT-MD used for temperatures up to 250,000 K (21.5 eV), and OF-DFT-MD used for the calculations with temperatures above that. The authors observed that, at the temperature transition point, the results of the two methods were consistent to within 1%. Beryllium is the only element in this dataset. We note that, in the results we present, we use only the raw data and not any extrapolated or interpolated values; and, as for the FPEOS data, there were a small number of high-temperature, low-density conditions for which we are currently unable to perform calculations with atoMEC.
We emphasize that this dataset is completely separate from the FPEOS data used to train the neural network models, and furthermore, that Beryllium does not feature in the FPEOS dataset. Therefore, this tests the ability of the neural network models to extrapolate to elements on which they were not trained, rather than the more straightforward interpolation considered in the previous section. Of course, Beryllium has an atomic number of 4, which lies inside the range 1 ≤ Z ≤ 14 spanned by the FPEOS dataset. We also note that the FP-Be data has a similar temperature and density distribution to the FPEOS dataset, as shown in Fig. 9. In this figure, we see the temperature profiles are almost identical, and while the FP-Be data is slightly more skewed towards higher densities (and hence higher pressures), the difference is quite small.
In Fig. 10, we compare the APEs across all the different AA methods. As in Fig. 3, we see that all methods, with the exception of P^st_tr, tend to approach the reference pressure as temperature increases; and we again observe very large errors for T ≲ 10 eV. This observation is especially stark for the P^st_rr method, for which the errors drop sharply from ~100% to < 5% almost immediately as the temperature increases beyond 10 eV.
Next, consider Fig. 11, in which we plot the APE for the two neural network models.
Overall, we see a clear reduction in the number of large errors when the neural network models are used, relative to the raw AA results. In particular, the number of results with errors > 20% is very low. On the other hand, at the high temperatures at which the AA models perform strongly, there is no evidence of improvement for the neural networks. However, the AA results are themselves highly accurate at these temperatures, so it would be difficult for the neural networks to demonstrate much improvement.
The above observations are further borne out by Figs. 12 and 13. By direct comparison of the errors between the neural networks and the AA model (chosen as the virial method P^vir_T in this case), we can see that the AA result has larger errors at lower temperatures and pressures. However, it rapidly converges to the reference. Both neural networks perform relatively strongly across the full range of conditions spanned, albeit with evidence of slightly larger errors at the highest temperatures and pressures.

Now consider Table VIII, in which we compare error metrics for the whole set of AA pressures and also the neural network predictions. Of the AA methods, P^fd and P^vir_T seem to perform best overall, and also perform better than they did on the FPEOS data. One cause could be the slightly different mass density distributions between the two datasets; the MAPE (and to a lesser extent the MALE and adMALE) can easily be influenced by large errors. The f_5 and f_20 scores, which are not unduly influenced by outliers, demonstrate the most consistency between the two datasets. The predictions from the neural networks show improvement in almost all error metrics, especially in terms of the f_20 scores; the network trained with AA outputs again seems to be stronger than the one trained without AA features, although not by a huge margin. However, these improvements are weaker relative to the AA results than the equivalent improvements shown for the FPEOS dataset. This indicates that the neural networks can extrapolate to elements on which they were not trained, but that the performance drops slightly as a result.
As a final analysis, consider Table IX, in which we compare error metrics for the AA methods against the neural networks, this time including only temperatures above 10 eV. We see that the AA results are very comparable to those in Table VI; here, the P^fd and P^vir_T methods show the best performance when all metrics are considered, with P^id and P^st_rr slightly behind. Again, the P^st_tr method is a clear outlier. Interestingly, the neural network results do not show improvement relative to the best AA methods in Table IX. This suggests that, when considering elements on which training has not been performed, and temperatures above 10 eV, the base AA model (with the P^vir_T or P^fd method) may be more suitable than either of the neural network models. Of course, this conclusion does not apply when considering the full temperature range, or for elements present in the training data.

We also anticipated that the trained networks could be applied to elements outside of the original training set. To test this, we evaluated our pre-trained networks on the FP-Be dataset [29] (Beryllium is not in the FPEOS dataset).
Both networks demonstrated significant improvement relative to the raw AA results over the full temperature range, although when temperatures below 10 eV were removed from consideration, there was no improvement relative to the AA model (for the best-performing AA pressure methods).Overall, the network trained with outputs from AA calculations performed slightly better than the other network for both the FPEOS and FP-Be datasets.
Overall, the results in this paper indicate that, with a good choice of method for calculating the pressure, AA models generally show accurate agreement with DFT-MD and PIMC benchmarks at moderate-to-high temperatures (≳ 10 eV). In fact, it is at these temperatures that AA models are most useful, because DFT-MD calculations are more computationally feasible at low temperatures. Furthermore, we have developed neural network models which, especially if trained with output data from AA calculations, are highly effective interpolation tools for first-principles EOS data. There is even evidence that these models, in particular the model trained with AA outputs, can be applied successfully to elements on which they were not trained: for example, the networks demonstrate significant improvements relative to raw AA results at low temperatures (< 10 eV). This offers a promising solution for a global EOS method that is both accurate and efficient throughout the WDM regime.

DATA AVAILABILITY
Upon publication of this pre-print, the code required to run the AA calculations, and train and test the neural networks, will be made publicly available. The AA data and neural network predictions will also be made publicly available. The FPEOS database of Militzer et al. can be downloaded from the Supplementary Material of Ref. [28]. The FPEOS Be data from Ding and Hu can be requested from the authors of Ref. [29], as stated in that paper.


7. Using the networks trained in the above step, make predictions on the corresponding test dataset S^te_i. The final prediction, which we denote ḡ_i[S], is given by a simple average of the three individual predictions.
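The averaging can be written explicitly; the superscript (k), indexing the three best models, is our notation, since the original equation is not reproduced here:

```latex
\bar{g}_i[S] = \frac{1}{3} \sum_{k=1}^{3} g_i^{(k)}[S]
```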

FIG. 1. Features (output from AA calculation) vs. target pressure. Top row: linear scale. Bottom row: logarithmic scale. This plot indicates the potential benefits of using a logarithmic scale for both the features and the target pressure.

FIG. 2. Mean absolute percentage error (MAPE) (top) and mean absolute log error (bottom) for the AA neural networks trained with different numbers of features. The average for each number of features is shown with the orange line.

• The number of epochs over which training is performed, N_epoch
• The size of the mini-batch used for training, N_batch
• The learning rate l_r

FIG. 3. Absolute percentage errors (APE) in pressure between different AA methods and first-principles data [28], as a function of temperature. A colour scale is included to indicate the effect of mass density. The y-axis scale is linear up to 20% (with the dashed lines representing errors of 5% and 20% respectively), and logarithmic above 20%.

FIG. 4. Pressure as a function of mass density, for Helium at temperature T = 0.043 eV.

FIG. 6. Absolute percentage errors (APE) for the neural network models evaluated on the FPEOS dataset [28] as a function of temperature, with a colour bar to indicate the dependence on mass density. Left: model trained with AA output as features. Right: model trained without any AA outputs.

FIG. 10. Absolute percentage errors (APE) in pressure between different AA methods and Be first-principles data [29], as a function of temperature, with the colour scale indicating the dependence on mass density.

FIG. 11. Absolute percentage errors (APE) for the neural network models evaluated on the FP-Be dataset [29] as a function of temperature, with a colour bar to indicate the dependence on mass density. Left: model trained with AA output as features. Right: model trained without any AA outputs.
FIG. 12. Pressures computed from neural network models and the P^vir_T AA method, against the benchmark pressures from Ref. [29]. The lower panels show the logarithmic errors (34). Left: network trained with AA output as features. Right: network trained without any AA outputs.

FIG. 13. Comparison of errors between an AA model (P^vir_T), and neural networks trained with and without AA output, evaluated on the FP-Be dataset [29]. Left: absolute percentage errors (33). Right: absolute log errors (34).

TABLE I. The error metrics used in this work. The first three metrics, ε_i(y_1, y_2), i = 1..3, denote an error measurement between two data points y_1 and y_2; if the mean (M) over a set of data points is computed, this gives an average error. The last two error metrics, f_20 and f_5, which represent the fraction of points with under 20% and 5% errors respectively, are always aggregate measures.

ε_1: (Mean) absolute percentage error (33)
ε_2: (Mean) absolute log error (34)
ε_3: Adjusted (mean) absolute log error (38)

Finally, the f_20 (36) and f_5 (37) scores are useful metrics because they are not sensitive to the presence of large outliers.
Below, we outline the workflow we use to train and test our neural networks. As discussed, the same procedure is used for the networks trained with and without AA features, with the only difference being the set of features that can be used in the training. In Secs. IV A and IV B, we discuss in more detail the feature selection and hyper-parameter optimization procedures. Note that, below and throughout this paper, we use the notation P_ref and Y_0 to denote the reference (ground-truth) pressure values.

1. Assign class labels C_i, i = 1..10, based on the magnitude of the reference pressure P_ref, i.e. the lowest 10% of P_ref values are labelled C_1, and so on.

2. Using a five-fold stratified cross-validation (SCV) approach, split the dataset randomly (preserving equal ratios of the class labels in each split) into training and test sets, S^tr_i and S^te_i, i = 1..5. Put the test sets aside.

3. For each training set S^tr_i, randomly select a set of features {f_m}.

4. For each set of features {f_m}: (i) pick a set of hyper-parameters {h_k}; (ii) evaluate them on the inner cross-validation splits S^tr_ij, j = 1..5.
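Steps 1 and 2 of this workflow (decile class labels followed by stratified five-fold splitting) can be sketched with scikit-learn; the synthetic pressure values and the digitize-based labelling are our illustrative stand-ins:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(42)
# Synthetic pressures spanning 8 decades, as a stand-in for P_ref.
p_ref = 10.0 ** rng.uniform(-2, 6, size=200)

# Step 1: decile class labels C_1..C_10 from the magnitude of P_ref.
edges = np.quantile(p_ref, np.linspace(0.1, 0.9, 9))
labels = np.digitize(p_ref, edges)  # integers 0..9, ten classes

# Step 2: five-fold stratified splits preserving the label ratios,
# so every fold spans the full pressure range.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(p_ref.reshape(-1, 1), labels):
    # Each test fold contains all ten pressure classes.
    assert len(np.unique(labels[test_idx])) == 10
```

Stratifying on pressure deciles rather than splitting purely at random guards against a fold that happens to contain only high- or only low-pressure points, which would distort the error estimates.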

TABLE II. Initial features considered for the AA and AA-free neural networks, and the Kendall-Tau correlation coefficients between the features and the target pressure. We also show which features are used for both networks during the training.
* Summed together with the electron pressures in the network with AA features

TABLE IV. Search windows for the hyper-parameters.

TABLE V. Aggregate error metrics for the AA methods, compared to the FPEOS dataset [28]. See Table I for the definitions of the errors. Note that here, and in all subsequent tables, the MALE and adMALE results are multiplied by a factor of 100.

TABLE VI. Aggregate error metrics for the AA methods, compared to the FPEOS dataset [28]. The difference between this table and Table V is that here we only consider data with temperatures above 10 eV, whereas in Table V the whole dataset was considered.

TABLE VII. Aggregate error metrics for the AA and AA-free neural networks, with AA results (P^fd) also shown for comparison. The numbers in brackets for the MAPE, MALE, and adMALE are the standard deviations of these quantities across the five outer cross-validation folds.

TABLE VIII. Aggregate error metrics for the AA methods (P^fd, P^st_rr, P^st_tr, P^vir_T, P^vir_K12, P^id) and the neural network predictions (P^AA_nn, P^no-AA_nn), compared to the FP-Be dataset [29]. See Table I for definitions of the error metrics.

TABLE IX. Aggregate error metrics for the AA methods, compared to the FP-Be dataset [29]. The difference between this table and Table VIII is that here we only consider data with temperatures above 10 eV, whereas in Table VIII the whole FP-Be dataset was considered. See Table I for definitions of the error metrics.