Accuracy evaluation of different machine learning force field features

Predicting energies and forces with machine learning force fields (MLFFs) depends on accurate descriptions (features) of the chemical environment. Despite the numerous features proposed, there has been no controlled comparison of their universality and accuracy. In this work, we compared several commonly used feature types for their ability to describe physical systems. These feature types include cosine features, Gaussian features, moment tensor potential (MTP) features, spectral neighbor analysis potential features, simplified smooth deep potential features with Chebyshev polynomials and with Gaussian polynomials, and atomic cluster expansion features. We evaluated the training root mean square error (RMSE) of the atomic group energy, total energy, and force of a linear regression model with respect to density functional theory results. We applied these MLFF models to an amorphous sulfur system and to carbon systems; the fitting results show that the MTP features yield the smallest RMSE among the feature types for both the sulfur and the carbon systems in disordered atomic configurations. Moreover, as an extension to other systems, the MTP features combined with a linear regression model can also reproduce similar quantities along the ab initio molecular dynamics trajectory, as demonstrated for Cu systems. Our results are helpful in selecting proper features for MLFF development.


Introduction
The potential energy surface in the high-dimensional space of atomic configurations {R} is a basic physical quantity that governs many phenomena of a system, including the ground state atomic structure, molecular dynamics, and chemical reactions [1][2][3][4][5]. Predicting the potential energy and atomic forces requires both accuracy and efficiency in order to reliably describe the evolution of complex systems over long timescales. Classical force fields, which are empirical expressions for the potential energy surface based on functions of the atomic coordinates, can scale to large systems and long timescales [6]. However, the accuracy of such force fields is often problematic [7,8], which directly affects the simulation results. Ab initio density functional theory (DFT) can be used to determine the ground state potential energy surface without relying on empirical parameters, though its computational cost (O(N^3) scaling with the system size N) limits its use to relatively small systems (e.g. fewer than 1000 atoms) and short molecular dynamics simulations (e.g. less than 100 ps) [9]. In recent years there has been a surge in the development of machine learning force fields (MLFFs) based on DFT data, and ample work has been published in this area [8,[10][11][12][13][14][15]. These studies clearly demonstrate that MLFFs are a promising approach to fitting force fields; an MLFF aims to approximate a set of high-fidelity energy and force labels. Apart from early attempts for small molecules [12,13], a common approach for large systems is to assume that the DFT total energy is the sum of the atomic energies of the individual atoms, as first proposed by Behler and Parrinello in 2007 [16]. The MLFF model is thereby reduced to finding a mapping from the local atomic configuration surrounding a given atom (i.e., within a cutoff distance R_C) to the atomic energy.
There are different ways to find this mapping [1-3, 9, 14, 16-27]. One approach is to use a Bayesian model based on a distance-dependent kernel between any two local atomic configurations [3,17]. One example is the smooth overlap of atomic positions (SOAP) model [19], which uses the rotational overlap of the densities of two atomic configurations to define the distance and the related kernel in the Bayesian model. Another, more common approach is to use a machine learning regression model based on local atomic descriptors (features) [1,2,[16][17][18][19][20][21][22][23][24][25]. The regression model can be linear regression [28][29][30] or a neural network [15,16,31]. The features can be pre-defined, trained in an embedded network (e.g., the DeepMD model) [32], or learned through a graph neural network [33]. There are also methods that train the features not through a network but through analytical forms of fixed format with adjustable parameters (e.g., the ACE method) [24,34,35]. New methods and networks have been proposed rapidly in the literature, such as Behler-Parrinello neural networks [16], GAP [18], the spectral neighbor analysis potential (SNAP) [22], DeepMD [32], moment tensor potentials (MTPs) [21], and ACE [24,34,35]. The field is thus crowded with different methods and features, making it difficult to choose one for a given physical problem. Since there is no universal way to test the accuracy and reliability of different methods, and since different systems were used in the literature to test newly developed methods, it is difficult to compare the efficiency of different methods from the literature alone. It is therefore imperative to compare different methods systematically, using the same system, the same data set, the same number of features, etc.
To test the efficiency of these features, we used the approach of pre-defined features combined with a data regression model. One can use either a linear regression or a neural network model, and all the features we chose can be used with either. However, since neural network training might end in a local minimum, which would complicate the comparison and add uncertainty to the conclusions, we adopted the linear model, which is more stable. Note that, in practice, many MLFF models use linear fitting [8,28,30]. Our MLFF workflow can be described as follows: (1) first carry out an ab initio DFT simulation on a small system for a short time, while ensuring that the DFT simulation covers all the local atomic configurations essential to the physical phenomena to be studied; (2) follow this with feature generation and training, which yields the energies and forces; (3) finally, verify the ability of the MLFF to reproduce the DFT data accurately within the desired configurational region.
In this work, we compared different methods on the same physical problem. We compared the accuracy of nine different feature types by applying them to two different systems, sulfur (S) and carbon (C), and then produced a series of results on a standard benchmark: the training root mean square error (RMSE) for the different systems. Since the results obviously depend on the number of features, we systematically increased the number of features following the instructions given in the literature on how to increase it optimally. We used nine different feature types, including the methods most commonly used in the literature: the 2-body and 3-body features developed in our previous work [20], the 2-body and 3-body features used by Behler and Parrinello [16], the MTP features [19,21], the SNAP features [18,22], the DeepMD-like predefined features with Chebyshev polynomial radial functions [32], the DeepMD-like predefined features with Gaussian radial functions, and the ACE features [34]. Except for ACE, whose radial functions are optimized over their parameters, all the features are predefined with best-practice default parameters [35]. The results revealed that MTP has obvious advantages: its RMSE is the smallest, and the accuracy of the MLFF fitted with MTP features matches that of DFT well.

Energy decomposition
In this work, to expand the data sets for training, the DFT total energy is partitioned into atomic energies as outlined in Kang and Wang's work [36,37]. The individual atomic energies sum to the total energy of the system:

E_total^DFT = Σ_i E_i^DFT,

where E_total^DFT is the total energy of a given system and E_i^DFT is the atomic energy of the ith atom. There is no unique way to decompose the DFT total energy into atomic energies. E_i is a local property that depends only on the atomic configuration near atom i, in other words, on all the atoms within a sphere of radius R_C centered on atom i; such a local property can be represented by the MLFF model. The atomic energies thus provide direct data for the model, more useful than the force, which is a derivative of the total energy. To further reduce the issue of non-uniqueness, we defined a group energy E_i^group, a Gaussian-weighted local sum of the atomic energies of the atoms surrounding the center atom i:

E_i^group = Σ_{j, d_ij < R_C} exp(−(d_ij / d_width)^2) E_j,

where the atoms j are all the atoms in the sphere of radius R_C centered on atom i, d_ij is the distance between atoms i and j, d_width is the decay distance factor, and E_j is the DFT-decomposed atomic energy of atom j. The group energy E_i^group reduces the non-uniqueness in the assignment of local energy density to nearby atoms.
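The group-energy construction can be sketched as follows. The Gaussian weight exp(−(d_ij/d_width)^2), the hard truncation at R_C, and the parameter values are assumptions for illustration; the paper does not spell out the exact normalization.

```python
import numpy as np

def group_energy(pos, E_atomic, R_C=5.0, d_width=2.0):
    """Gaussian-weighted local sum of decomposed atomic energies.

    Assumes the weight exp(-(d_ij / d_width)**2), truncated at R_C;
    the center atom itself enters with weight 1 (d_ii = 0).
    """
    pos = np.asarray(pos, dtype=float)
    E_atomic = np.asarray(E_atomic, dtype=float)
    E_group = np.empty(len(pos))
    for i in range(len(pos)):
        d = np.linalg.norm(pos - pos[i], axis=1)  # distances d_ij (d_ii = 0)
        w = np.exp(-(d / d_width) ** 2)
        w[d > R_C] = 0.0                          # only atoms inside the cutoff sphere
        E_group[i] = w @ E_atomic
    return E_group
```

An isolated atom (no neighbors within R_C) recovers exactly its own decomposed atomic energy, while atoms in dense regions average over their surroundings.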

The definition of features
To fit the atomic or total energies, the chemical environment surrounding each atom can be mapped to a set of descriptors (features). Features G_{i,α}({R_ij}) are differentiable functions of the local atomic environment of an atom, described by the atomic coordinates {R_j} of the atoms within a cutoff distance R_C of the center atom i. As the total energy remains constant under permutation, translation, and rotation operations, the features we used were constructed from permutation-, translation-, and rotation-invariant functions. Translational invariance is automatically satisfied if all the features depend on the distance vector R_j − R_i instead of on R_j itself. To preserve rotational symmetry, one usually contracts the spatial vector or tensor indices at various stages. The contraction can be done over the Cartesian indices x, y, z (or higher-order tensor indices) as in MTP, or over the spherical harmonic azimuthal index m as in SNAP and ACE. To obtain permutation invariance, one usually performs a pooling, i.e., a summation over the neighbor index j. Smooth functions are used to guarantee the smoothness of the features near the radial cutoff R_C: the values and derivatives of these functions gradually approach zero at R_C. In this work, we considered nine feature types that satisfy these conditions: 2-body and 3-body features using piecewise cosine functions [21], 2-body and 3-body features using Gaussian symmetry functions [16], the moment tensor potential (MTP) [19,21], SNAP [18,22], the simplified smooth deep potential with Chebyshev polynomials (DP-Chebyshev) and with Gaussian polynomials (DP-Gaussian) [32], and the atomic cluster expansion (ACE) [34]. Details of the nine feature sets are given in the appendix.
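The three invariances can be demonstrated with a minimal toy descriptor: a 2-body feature built only from neighbor distances (translation/rotation invariant) and pooled over neighbors (permutation invariant), with a cosine cutoff whose value and derivative vanish at R_C. The cutoff form and parameters here are illustrative, not the paper's exact implementation.

```python
import numpy as np

def f_cut(r, R_C=5.0):
    # Smooth cutoff: value and derivative vanish at R_C.
    return np.where(r < R_C, 0.5 * (np.cos(np.pi * r / R_C) + 1.0), 0.0)

def two_body_feature(pos, i, R_C=5.0):
    # Depends only on distances |R_j - R_i| (translation/rotation invariant)
    # and sums over neighbors j (permutation invariant).
    d = np.linalg.norm(np.delete(pos, i, axis=0) - pos[i], axis=1)
    return f_cut(d, R_C).sum()
```

Rotating, translating, or reordering the neighbor list leaves the feature value unchanged to machine precision.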
Features are differentiable functions of the spatial coordinates, so the force can be calculated as

F_i = −∂E_tot/∂R_i = −Σ_j Σ_α (∂E_j/∂G_{j,α}) (∂G_{j,α}/∂R_i),

where j is the index of the neighboring atoms within the cutoff sphere and α is the feature index.
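As a sanity check of this chain-rule relation, the sketch below compares the analytic gradient of a simple 2-body cosine-cutoff feature (an illustrative stand-in, not the paper's exact feature) with a central finite-difference estimate.

```python
import numpy as np

R_C = 5.0

def f_cut(r):
    return np.where(r < R_C, 0.5 * (np.cos(np.pi * r / R_C) + 1.0), 0.0)

def df_cut(r):
    return np.where(r < R_C, -0.5 * np.pi / R_C * np.sin(np.pi * r / R_C), 0.0)

def feature(pos, i):
    d = np.linalg.norm(np.delete(pos, i, 0) - pos[i], axis=1)
    return f_cut(d).sum()

def dfeature_dRi(pos, i):
    # Chain rule: dG_i/dR_i = sum_j f_C'(d_ij) * (R_i - R_j) / d_ij.
    rel = np.delete(pos, i, 0) - pos[i]
    d = np.linalg.norm(rel, axis=1)
    return (-(df_cut(d) / d)[:, None] * rel).sum(axis=0)
```

In a linear model the force then follows by weighting such feature gradients with the fitted coefficients C_α.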

Linear regression
Features are usually used as the input of various regression models (linear model, neural network, etc), which output atomic energies and forces. Linear regression is one of the simplest and most common machine learning algorithms and is the basis of many powerful nonlinear models. In a linear fit, the atomic energy E_i and total energy E_tot take the form

E_i = Σ_α C_α G_{i,α},    E_tot = Σ_i E_i = Σ_i Σ_α C_α G_{i,α},

where G_{i,α} is the feature, C_α is the fitting coefficient, i is the atom index, and α is the feature index. The force on atom i follows as

F_i = −∂E_tot/∂R_i = −Σ_α C_α Σ_j ∂G_{j,α}/∂R_i,

where the atoms j are the neighbors of atom i and α is the index of the descriptor G. Let k index the training configurations, with known DFT total energies E_total^DFT(k), group energies E_group,i^DFT(k), and forces F_i^DFT(k). Our machine learning (fitting) procedure consists of minimizing the quadratic loss function

L(C_α) = ω_total Σ_k |E_total^ML(k) − E_total^DFT(k)|^2 + ω_group Σ_k Σ_i |E_group,i^ML(k) − E_group,i^DFT(k)|^2 + ω_force Σ_k Σ_i |F_i^ML(k) − F_i^DFT(k)|^2.

Figure 1. Schematics of the MLFF model design. The leftmost box is the simulation system, the second column represents the atomic environment including interatomic distances and three-body angles, the third column represents the different features, the fourth column represents the linear regression for the model, and the fifth column represents the energy and force term(s) to be trained.
L(C_α) is minimized when C_α satisfies ∂L/∂C_α = 0. A small regularization term δ Σ_α |C_α|^2 can be added to L(C_α) to avoid singularity. The weight factors are chosen so that the three terms contribute to L(C_α) at the same order of magnitude: ω_total, ω_group and ω_force are the weights of the total energy, group energy and force, in the proportion ω_total : ω_group : ω_force = 0.894 : 0.447 : 0.004. After obtaining C_α, we have the linear model and can calculate its MSEs:

MSE_total = (1/N_C) Σ_k |E_total^ML(k) − E_total^DFT(k)|^2,
MSE_group = (1/N_atom) Σ_k Σ_i |E_group,i^ML(k) − E_group,i^DFT(k)|^2,
MSE_force = (1/(3 N_atom)) Σ_k Σ_i |F_i^ML(k) − F_i^DFT(k)|^2,

where N_C is the number of cases (configurations) and N_atom is the total number of atoms over all these cases. The RMSE is the square root of the MSE, and the loss function error reported below is the correspondingly weighted combination of these terms.
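Setting ∂L/∂C_α = 0 with the regularization term amounts to solving regularized (ridge) normal equations. A minimal sketch on synthetic data standing in for the DFT labels (the feature matrix, noise level, and weights here are illustrative, fitting total energies only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for DFT data: 200 configurations, 8 features each.
n_cfg, n_feat = 200, 8
C_true = rng.normal(size=n_feat)
G = rng.normal(size=(n_cfg, n_feat))              # per-configuration summed features
E_dft = G @ C_true + 0.01 * rng.normal(size=n_cfg)

# Minimize L(C) = w * ||G C - E||^2 + delta * ||C||^2.
# Setting dL/dC = 0 gives the regularized normal equations below.
w, delta = 1.0, 1e-8
A = w * (G.T @ G) + delta * np.eye(n_feat)
b = w * (G.T @ E_dft)
C_fit = np.linalg.solve(A, b)

rmse = np.sqrt(np.mean((G @ C_fit - E_dft) ** 2))
```

In the full scheme the group-energy and force residuals enter the same normal equations with their respective weights; the small δ keeps A well conditioned when features are nearly linearly dependent.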

Training procedure
Our MLFF model is illustrated in figure 1. We first performed a series of ab initio molecular dynamics (AIMD) calculations for the carbon and sulfur systems to generate the input data sets. For the DFT calculations, the plane wave pseudopotential method was applied with the PWmat package [38,39] using norm-conserving pseudopotentials. The Perdew-Burke-Ernzerhof exchange-correlation functional [40] was employed, with plane wave energy cutoffs of 60 and 80 Ry for the sulfur and carbon systems, respectively. Only the Γ point was used in the simulations due to the large box. PWmat can also extract atomic energies from DFT calculations based on the energy decomposition scheme [20]; the sum of these atomic energies equals the total energy of the system. Using atomic energies can significantly enlarge the energy data set and thus has the potential to reduce the number of DFT calculations. However, to reduce the possible over-constraint of the MLFF model, we used a Gaussian function of the local atomic distance to average the atomic energies surrounding each atom and call the result the group energy, which represents the energy within the atom-centered region. The group energy is fitted together with the total energy and the atomic forces. Secondly, the AIMD data were used to generate different numbers of features of the nine different feature types; feature generation and the linear fitting procedures were carried out with the PWmat-MLFF package [41]. The above model is implemented in our open source code, and the training and testing data used in this work can be found on the package website. Finally, we compared DFT with the MLFFs trained with the different feature types.

DFT data generation
We performed NVT AIMD simulations for the amorphous sulfur (α-S) system of 128 atoms at 300 K and 1500 K (see figures 2(a)-(c)). The time step was set to 1 fs. We ran a 2 ps molecular dynamics simulation at 300 K, during which the S_8 rings were preserved, as shown in figure 2(b). One image was obtained per AIMD step, giving 2000 images in total at 300 K. We also heated the system from 300 K to 1500 K over a 3 ps molecular dynamics run and then performed a 2 ps AIMD simulation at 1500 K, choosing the latter trajectory as the 1500 K training data set. The S_8 rings were broken during these simulations; the structure of the 1500 K simulation is shown in figure 2(c). The bond-broken system represents a challenging system far from equilibrium, which is difficult to describe with a force field. From the AIMD simulations we thus obtained three training situations: (a) the sulfur-300 K molecular dynamics trajectory, (b) the sulfur-1500 K molecular dynamics trajectory, and (c) the combination of the sulfur-300 K and sulfur-1500 K trajectories. For the carbon system, we selected four different phases of carbon structures: (a) diamond, (b) graphenylene, (c) graphene, and (d) M-C. We performed an NVT AIMD simulation for each of these four structures from 300 K to 3500 K; the structures at the different temperatures are shown in figure 3. To enable machine-learning molecular dynamics simulations at a given temperature T, AIMD data at the higher temperature of 3500 K, which is close to the melting point of carbon, were added to the training data set to cover a larger configuration space. A total of 1000 steps were performed for each carbon phase, giving 4000 images as our training data sets. The details of the sulfur and carbon systems and their AIMD steps are presented in table 1.

Comparison between DFT and MLFF
To facilitate the comparison of the accuracy of the different feature types against the DFT data sets, we evaluated the fitting accuracy of each feature type by its RMSE. Specifically, we compared the errors of the different features in fitting the total energy, group energy, force and loss function at the same total number of features; the best feature type produces the smallest errors in all of these quantities. We then systematically tested the nine different features under the linear regression model using the two data sets: the sulfur system and the carbon system. It is worth mentioning that the 2-body and 3-body features were used together in a single training for the cosine features and for the Gaussian features, respectively; thus, essentially, we have seven different feature types. To compare the feature types fairly, unified parameters were adopted for the different systems when generating the features. We systematically increased the number of features and compared the quality of the different feature types at the same total number of features each time. The details of feature generation are described in tables S1 to S6 of the supplementary data. After the above training steps, for the sulfur system, by decomposing the total energy into individual atomic energies, the training set contains 256 000 atomic energies and 768 000 atomic forces in each of the sulfur-300 K and sulfur-1500 K data sets, and 512 000 atomic energies and 1 536 000 forces in the combined sulfur-300 K and sulfur-1500 K data set. The RMSEs of the group energies, total energies and forces are shown in figures 4 and 5. For both systems, the RMSE decreases as the number of features increases; the following discussion is based on the maximum number of features (about 180).
For the sulfur-300 K system (solid lines in figure 4), the MTP feature gave the best results in fitting the total energy, with an RMSE of about 0.060 eV; the cosine feature (2-body + 3-body) gave the best results in fitting the group energy, with an RMSE of only about 0.004 eV; and the Gaussian feature (2-body + 3-body) gave the best results in fitting the force, with an RMSE of about 0.09 eV Å−1. Overall, the MTP feature gives the best linear model, with a loss function error of 0.002. For the sulfur-1500 K system (dashed lines in figure 4), all the RMSEs are larger than those for the sulfur-300 K system: the RMSE minima of the group energy, total energy and force are around 0.017 eV, 0.263 eV, and 0.419 eV Å−1, respectively, and the minimal loss function error is 0.036. Except for the force RMSE minimum, which was achieved by the ACE feature, the minima of the group energy RMSE, total energy RMSE and loss function error were all achieved by the MTP feature.
Moreover, the combined sulfur-300 K and sulfur-1500 K data set contains more comprehensive information than either the sulfur-300 K or the sulfur-1500 K data set alone. Figure 5 shows that the errors of the group energy, total energy and force, as well as the loss function error, are larger than the fitting results for sulfur-300 K. This is because the addition of the sulfur-1500 K data makes the simulated system more complex, posing a greater challenge to the force field fit. Nevertheless, the same trend is observed: the MTP linear model gives the lowest errors for the total energy (RMSE 0.269 eV), group energy (RMSE 0.013 eV) and loss function (0.027), while the ACE feature is the best feature type for the force, with an RMSE of 0.270 eV Å−1.
For the carbon systems, the inclusion of four different phases of carbon structures makes the simulated system complicated; by decomposing the total energy into individual atomic energies, the training set contains 256 000 atomic energies and 768 000 atomic forces. One point to emphasize is that, for the carbon system, the ACE procedure gave large errors and failed with the default settings of the pacemaker program. The following results are therefore discussed for the remaining six feature types. As shown in figure 6, the MTP linear model has an advantage in fitting the total energy and group energy, with RMSEs of 0.101 eV and 0.012 eV, respectively; the 2bgauss and 3bcos features are best in fitting the forces, with an RMSE of 0.110 eV Å−1; and the MTP feature type and the 2bgauss with 3bcos feature type have similar loss function errors (∼0.016).

Testing the MLFF models
Before applying MLFF models in practice, it is necessary to test them on new data sets: an accurate MLFF model must be able to reproduce similar quantities along an AIMD trajectory. Since the MTP feature gave the best results in the preceding training, we applied the MTP linear model in the testing procedure. We then examined the MTP linear model comprehensively with the number of features near 180, comparing the energies and forces of MLFF and DFT at each step of the trajectory.
For this test, we performed NVT AIMD simulations on another sulfur structure (not in the training set) for 6 ps at 300 K and at 1500 K, and chose the last 1 ps of each trajectory as the testing set. This gave two testing sets for the sulfur systems: sulfur-300 K-new and sulfur-1500 K-new. We then applied three different MLFF models to these new testing sets: the sulfur-300 K model trained on the sulfur-300 K data set, the sulfur-1500 K model trained on the sulfur-1500 K data set, and the combination model trained on both the sulfur-300 K and sulfur-1500 K data sets.
In our tests, the sulfur-1500 K-new set was challenging, as its configurations are far from equilibrium, and its error was larger than that of the sulfur-300 K-new data set. Figures 7(a) and (b) show that the sulfur-300 K model and the sulfur-1500 K model each match the DFT total energies well on their own phase. However, large offsets appear in figures 7(b) and (c), indicating that it is not reasonable to apply a model to data sets at other temperatures. This can be attributed to the training and testing sets being different phases, with the S_8 rings preserved in one and broken in the other, so the models lack configuration and energy information of the other phase. Figures 7(e) and (f) show that the combination model gives good results on both the 300 K-new and 1500 K-new data sets, because its training set contains both the S_8-ring and broken-S_8-ring information. A good force field must therefore cover as many configurations as possible for high accuracy. Beyond the energy, the force is also an important factor in evaluating the accuracy of an MLFF. When sulfur-300 K-new was used as the test set, the error of the MLFF model trained on the sulfur-300 K set was the smallest (0.10 eV Å−1), see table S7 in the supplementary data, while the error of the model trained on the sulfur-1500 K set was the largest (0.49 eV Å−1). Since the forces at each trajectory step include all components of all atoms, we compared the DFT forces and the MTP forces in scatter plots (figures 8(a)-(f)). The MTP linear model agrees well with the DFT results for the sulfur-300 K model and the combination model, see figures 8(a) and (c). According to figure 8(b), the regression is poor for the sulfur-1500 K model; this can be understood from the fact that the training set (sulfur-1500 K) and testing set (sulfur-300 K-new) are different phases at different temperatures.
When sulfur-1500 K-new was used as the test set, we see the same trend: the error of the MLFF model trained on the sulfur-1500 K set is the smallest (0.26 eV Å−1), while the error of the model trained on the sulfur-300 K set is the largest (1.02 eV Å−1). Figure 8(d) also shows a poor regression, again because the training set (sulfur-300 K) and testing set (sulfur-1500 K-new) are different phases at different temperatures. These results are consistent with the comparison of energies. For the carbon systems, we performed an NVT AIMD simulation on another carbon structure (not in the training set) for 6 ps at 300 K and chose the last 1 ps of the trajectory as the testing set. As shown in figures 9(a) and (b), the MTP-LR model also agrees well with DFT in both total energy and force along the AIMD trajectories.
Given that the MTP-LR model outperforms the other examined feature types for both the sulfur and carbon systems, it is natural to ask whether it is applicable to systems beyond sulfur and carbon. We therefore considered a Cu system: first, we performed a 2 ps NVT AIMD simulation for a bulk Cu structure from 300 K to 1000 K (see figure S1 in the supplementary data); then we performed an NVT AIMD simulation on another Cu structure (not in the training set) for 1 ps at 1000 K and used that trajectory as the testing set. As shown in figures 9(c) and (d), both the total energy and the force of the MLFF model for the Cu system match the DFT results well along the AIMD trajectory. It is worth noting that the MLFF models for both the carbon and Cu systems almost recover the DFT forces along the trajectory (figures 9(b) and (d)): the RMSEs of the total energies are 0.22 and 0.09 eV, and those of the forces are 0.12 and 0.05 eV Å−1, for the carbon and Cu systems, respectively (table S7 in the supplementary data).

Conclusions
In this study, by comprehensively comparing the RMSEs of the group energies, total energies and forces as functions of the total number of features for the sulfur and carbon systems, we conclude that the group energies, total energies and forces converge as the number of features increases. For the simple case of the sulfur system (the sulfur-300 K data set), the fitting error is minimal at a given number of features. For the complicated case combining the sulfur-300 K and sulfur-1500 K AIMD configurations, which is challenging because the configurations are far from equilibrium, the fitting error is larger than for the sulfur-300 K data set alone. Although the fitting accuracy decreases as the complexity of the system increases, the MTP feature still provides the best linear model. For the complex carbon system composed of different phases, comparison of the different feature types at a given total number of features shows that the MTP feature again gives the best linear model.
When examining the MTP linear model with the number of features near 180 on the same trajectories, for simple cases such as the sulfur-300 K, carbon, and bulk Cu systems, both the energies and the forces of the MLFF model matched the DFT results well. When the training and testing sets are different phases at different temperatures, such as sulfur-300 K and sulfur-1500 K, the MLFF models gave large offsets in the total energies. One should therefore include enough configurations of the different phases for the force field to yield accurate simulations.
All in all, although MTP comes out as the best overall feature type, other features come quite close, such as the cosine (2-body + 3-body) and Gaussian (2-body + 3-body) features, whose total loss functions are only slightly larger than that of MTP; they can even be better in individual terms among the group energy, total energy, and force. We therefore believe that, for a given system, it is worth first testing different features with a linear model; the chosen feature set can then be used in neural network training. We hope our cross-method comparison provides guidelines for choosing among the different methods.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary data).
2-body and 3-body Gaussian features

The 2-body Gaussian feature of atom i is defined as:

G_i^2 = Σ_j exp(−η (R_ij − R_s)^2) f_C(R_ij),

where η and R_s are parameters defined by the user and f_C is the smooth cutoff function. The 3-body Gaussian feature of atom i is defined as:

G_i^3 = 2^(1−ζ) Σ_{j,k≠i} (1 + λ cos θ_ijk)^ζ exp(−η (R_ij^2 + R_ik^2 + R_jk^2)) f_C(R_ij) f_C(R_ik) f_C(R_jk),

where η, ζ and λ = ±1 are parameters defined by the user and θ_ijk is the angle centered on atom i.
In practice, these two feature types are usually used together.
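The two symmetry functions can be sketched as follows, assuming the standard Behler-Parrinello forms with a cosine cutoff f_C; all parameter values are illustrative defaults, not the paper's settings.

```python
import numpy as np

def f_cut(r, R_C=6.0):
    return np.where(r < R_C, 0.5 * (np.cos(np.pi * r / R_C) + 1.0), 0.0)

def g2(pos, i, eta=0.5, R_s=2.0, R_C=6.0):
    # 2-body Gaussian symmetry function.
    d = np.linalg.norm(np.delete(pos, i, 0) - pos[i], axis=1)
    return np.sum(np.exp(-eta * (d - R_s) ** 2) * f_cut(d, R_C))

def g3(pos, i, eta=0.1, zeta=2.0, lam=1.0, R_C=6.0):
    # 3-body Gaussian symmetry function over all angles centered on atom i.
    rel = np.delete(pos, i, 0) - pos[i]
    d = np.linalg.norm(rel, axis=1)
    total = 0.0
    for j in range(len(rel)):
        for k in range(j + 1, len(rel)):
            d_jk = np.linalg.norm(rel[j] - rel[k])
            cos_t = rel[j] @ rel[k] / (d[j] * d[k])
            total += ((1.0 + lam * cos_t) ** zeta
                      * np.exp(-eta * (d[j] ** 2 + d[k] ** 2 + d_jk ** 2))
                      * f_cut(d[j], R_C) * f_cut(d[k], R_C) * f_cut(d_jk, R_C))
    return 2.0 ** (1.0 - zeta) * total
```

Both functions are built from distances and angles only, so they are invariant under rotation of the whole neighborhood.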

Moment tensor potential (MTP)
Moment tensors represent the charge, dipole, quadrupole, etc., and their amplitudes (or other contractions) are used as features. This is somewhat like a Taylor expansion with contraction (so it is rotationally invariant). The MTP was proposed by Shapeev et al [19,21]. In MTP, the local environment of the center atom i is characterized by the neighborhood

n_i = {(z_i, z_j, R_ij)},

where z_i is the atom type of the center atom, z_j is the atom type of neighbor j, and R_ij are the relative coordinates of the neighbors. The energy contribution of each atom is expanded as

E_i = Σ_α c_α B_α(n_i),

where B_α are the basis functions of choice and c_α are the parameters to be fitted. The basis functions are defined through the moment tensors

M_{μ,ν}(n_i) = Σ_j f_μ(|R_ij|, z_i, z_j) R_ij ⊗ ··· ⊗ R_ij (ν factors),

which contain both a radial and an angular part. The radial part can be expanded as

f_μ(|R_ij|, z_i, z_j) = Σ_β c_{μ,z_i,z_j}^(β) Q^(β)(|R_ij|),

where the Q^(β)(R_ij) are the radial basis functions. Specifically,

Q^(β)(r) = φ^(β)(r) (R_C − r)^2,

where the φ^(β) are polynomials (e.g., Chebyshev polynomials) defined on the interval [R_min, R_C]. The angular part R_ij ⊗ ··· ⊗ R_ij, the tensor product of R_ij taken ν times, contains the angular information of the neighborhood n_i; ν determines the rank of the moment tensor. With ν = 0 one gets a constant scalar, with ν = 1 a vector (rank-1 tensor), with ν = 2 a matrix (rank-2 tensor), etc.
The level of a moment is further defined as lev M_{μ,ν} = 2 + 4μ + ν, and basis functions B_α are formed from contracted products of moments whose total level stays below a chosen threshold. This is, however, an empirical formula. The number of generated features can be controlled by adjusting the set of moments M_{μ,ν} included.
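The moment construction and its contractions can be sketched as follows, for a single illustrative radial function f(r) = (R_C − r)^2 and tensor ranks ν ≤ 2 (the specific radial function and the chosen contractions are assumptions for illustration):

```python
import numpy as np

def moments(pos, i, R_C=5.0):
    # Moment tensors M_{mu,nu} for one illustrative radial function
    # f(r) = (R_C - r)^2 and tensor ranks nu = 0, 1, 2.
    rel = np.delete(pos, i, 0) - pos[i]
    d = np.linalg.norm(rel, axis=1)
    w = np.where(d < R_C, (R_C - d) ** 2, 0.0)
    M0 = w.sum()                                   # scalar (nu = 0)
    M1 = (w[:, None] * rel).sum(axis=0)            # vector (nu = 1)
    M2 = np.einsum("j,ja,jb->ab", w, rel, rel)     # rank-2 tensor (nu = 2)
    return M0, M1, M2

def mtp_invariants(pos, i, R_C=5.0):
    # Full contractions over Cartesian indices give scalar basis functions.
    M0, M1, M2 = moments(pos, i, R_C)
    return np.array([M0, M1 @ M1, np.trace(M2), np.einsum("ab,ab->", M2, M2)])
```

Because every Cartesian index is contracted away, these scalars are unchanged by any rotation of the neighborhood.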

Spectral neighbor analysis potential (SNAP)
SNAP is closely related to the SOAP method, but it does not use Gaussian process regression, so it does not compute distances and kernels between two density maps. It first defines a neighbor density, then expands it (on a 4D sphere, via the rotation matrices) [18,22]. It then uses the bispectrum (summing over the angular momentum indices m), which makes it rotationally invariant. In this way it resembles MTP, but it uses a special contraction of the orientational indices to achieve rotational invariance. It is often used with linear regression.
Consider the density of neighboring atoms around the central atom i at position r as a sum of δ functions in three-dimensional (3D) space:

ρ_i(r) = δ(r) + Σ_k f_C(r_ki) ω_k δ(r − r_ki),

where r_ki represents the position of neighbor atom k relative to the center atom i. The ω_k are dimensionless weights, so different atom types can contribute with different weights. The radial function f_C(r_ki) ensures that the contribution of each neighbor varies smoothly to zero near R_C:

f_C(r) = (1/2) [cos(π r / R_C) + 1] for r ≤ R_C, and f_C(r) = 0 for r > R_C.

The angular part of this density function could be expanded in the familiar basis of spherical harmonic functions defined for l = 0, 1, 2, . . . and m = −l, . . . , 0, . . . , l, with the radial distribution represented by a set of radial basis functions. Here, however, the radial information r is mapped onto another angle in a 4D hyperspherical function U_{mm′}^j(θ_0, θ, φ): all the points (neighboring atoms) fall onto a 3D spherical surface (in the 4D space), and the orientational (angular) information is given by the three angles

θ_0 = θ_0^max r / R_C, (A16)

together with the usual polar angles θ and φ. As a result, the above density can be expanded in these 4D hyperspherical harmonics:

u_{mm′}^j = U_{mm′}^j(0, 0, 0) + Σ_k f_C(r_ki) ω_k U_{mm′}^j(θ_0^(k), θ^(k), φ^(k)),

where k is the neighboring-atom index and θ_0^(k), θ^(k), φ^(k) are the angles of equation (A16) evaluated at r_ki. Note that u_{mm′}^j is orientation dependent through its indices m, m′. These indices can be contracted out (using three u_{mm′}^j) with the bispectrum formula

B_{j1,j2,j} = Σ_{m,m′} Σ_{m1,m1′} Σ_{m2,m2′} (u_{mm′}^j)* C_{j1 m1 j2 m2}^{jm} C_{j1 m1′ j2 m2′}^{jm′} u_{m1 m1′}^{j1} u_{m2 m2′}^{j2},

where the C_{j1 m1 j2 m2}^{jm} are Clebsch-Gordan coefficients; the final scalar features are F(j_1, j_2, j). Different features are produced by choosing different (j_1, j_2, j). Note that these features carry no radial-function index; instead there are three angular momentum indices, because the radial distance information has been converted into the third angle on the 3D sphere.

Simplified smooth deep potential
Deep potential-Chebyshev (DP-Chebyshev)

For the deep potential features [32], a rank-1 tensor (a vector T_M) is constructed and then contracted. Interestingly, the construction includes one additional component derived from the distance r, making a four-dimensional (4D) vector, somewhat as in SNAP. In DeepMD the M radial functions g_M(s) are trained with an embedded network; here they are replaced by predefined Chebyshev polynomials (DP-Chebyshev) or Gaussian functions (DP-Gaussian) of the smoothed inverse distance s(r).
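The construction can be sketched as follows, assuming a cosine-smoothed 1/r switching function and fixed Chebyshev radial functions in place of the trained embedding; the switching form, the mapping of s onto [−1, 1], and the parameter M are illustrative choices, not the paper's exact definitions.

```python
import numpy as np

R_C = 5.0

def s_of_r(r):
    # Smoothed inverse distance: ~1/r at short range, zero at R_C
    # (the exact switching form is an assumption for illustration).
    return np.where(r < R_C, 0.5 * (np.cos(np.pi * r / R_C) + 1.0) / r, 0.0)

def dp_features(pos, i, M=4):
    rel = np.delete(pos, i, 0) - pos[i]
    r = np.linalg.norm(rel, axis=1)
    s = s_of_r(r)
    # 4D generalized coordinates (s, s x/r, s y/r, s z/r), as in smooth DeepMD.
    q = np.column_stack([s, (s / r)[:, None] * rel])
    # Fixed Chebyshev radial functions g_M(s) replace the trained embedding net.
    x = 2.0 * s / s.max() - 1.0   # map s onto [-1, 1]; assumes a neighbor inside R_C
    g = np.stack([np.polynomial.chebyshev.Chebyshev.basis(m)(x) for m in range(M)])
    T = g @ q                     # rank-1 tensors T_M, shape (M, 4)
    return (T @ T.T).ravel()      # contractions T_M . T_M' are rotation invariant
```

The first component of each 4D coordinate is a rotational scalar and the last three rotate as a vector, so the dot products T_M · T_M' are invariant.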

Atomic cluster expansion (ACE) method
The ACE method is rather like the SNAP method. However, instead of incorporating the distance r as a fourth component of a 4D spherical harmonic, ACE uses conventional radial functions to carry the radial information and ordinary 3D spherical harmonics to represent the orientational information. The ACE method starts from an atomic density expression for the neighboring atoms, as in equation (A30), and expands the atomic density ρ(r) in radial functions and spherical harmonics [34] to get the expansion coefficients

A_{i,nlm} = Σ_j R_nl(r_ij) Y_l^m(r̂_ij),

where R_nl is a radial function (with radial index n for each angular momentum l) and Y_l^m are the spherical harmonics. The A_{i,nlm} basis can then be contracted to guarantee rotational symmetry, giving the invariant features B_{i,nl}^(K); for example, for the two-fold product,

B_{i,nl}^(2) = Σ_m (−1)^m A_{i,nlm} A_{i,nl,−m}.
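A sketch of the A_{nlm} expansion and one invariant contraction, hand-coded for l ≤ 1 with a single illustrative radial function (the power-spectrum contraction Σ_m |A_{nlm}|^2 used here is the simplest rotationally invariant product; the radial function and parameters are assumptions):

```python
import numpy as np

def f_cut(r, R_C=5.0):
    return np.where(r < R_C, 0.5 * (np.cos(np.pi * r / R_C) + 1.0), 0.0)

def ace_A(pos, i, R_C=5.0):
    # Expansion coefficients A_{nlm} = sum_j R_nl(r_ij) Y_l^m(r_ij_hat),
    # with complex Y_l^m hand-coded for l = 0, 1 and one radial function R = f_cut.
    rel = np.delete(pos, i, 0) - pos[i]
    x, y, z = rel[:, 0], rel[:, 1], rel[:, 2]
    r = np.linalg.norm(rel, axis=1)
    Rn = f_cut(r, R_C)
    A00 = np.sum(Rn) * 0.5 / np.sqrt(np.pi)          # l = 0, m = 0
    A1 = np.array([                                  # l = 1, m = -1, 0, +1
        np.sum(Rn * np.sqrt(3 / (8 * np.pi)) * (x - 1j * y) / r),
        np.sum(Rn * np.sqrt(3 / (4 * np.pi)) * z / r),
        np.sum(Rn * -np.sqrt(3 / (8 * np.pi)) * (x + 1j * y) / r),
    ])
    return A00, A1

def ace_invariants(pos, i, R_C=5.0):
    # Power-spectrum contraction sum_m |A_{nlm}|^2 is rotationally invariant.
    A00, A1 = ace_A(pos, i, R_C)
    return np.array([abs(A00) ** 2, np.sum(np.abs(A1) ** 2)])
```

Under a rotation the A_{nlm} mix through the Wigner D matrices, which are unitary in m, so the summed squared magnitudes are unchanged.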