Improved Description of Atomic Environments using Low-cost Polynomial Functions with Compact Support

The prediction of chemical properties using Machine Learning (ML) techniques calls for a set of appropriate descriptors that accurately describe atomic and, on a larger scale, molecular environments. A mapping of conformational information onto a space spanned by atom-centred symmetry functions (SF) has become a standard technique for energy and force predictions using high-dimensional neural network potentials (HDNNP). Established atom-centred SFs, however, are limited in their flexibility, since their functional form restricts the angular domain that can be sampled. Here, we introduce a class of atom-centred symmetry functions based on polynomials with compact support called polynomial symmetry functions (PSF), which enable a free choice of both the angular and the radial domain covered. We demonstrate that the accuracy of PSFs is either on par with or considerably better than that of conventional, atom-centred SFs, with reductions in force prediction errors over a test set approaching 50% for certain organic molecules. Contrary to established atom-centred SFs, the computation of PSFs does not involve any exponentials, and their intrinsic compact support supersedes the use of separate cutoff functions. Most importantly, the number of floating point operations required to compute the polynomial SFs introduced here is considerably lower than that of other SFs, enabling their efficient implementation without the need for highly optimised code structures or caching, with speedups with respect to other state-of-the-art SFs reaching a factor of 4.5 to 5. This low-effort performance benefit substantially simplifies their use in new programs and on emerging platforms such as graphical processing units (GPU). Overall, polynomial SFs with compact support improve the accuracy of both energy and force predictions with HDNNPs while enabling significant speedups with respect to their well-established counterparts.


I. INTRODUCTION
In the past decade, computational chemistry has seen a tremendous increase in the use of machine learning (ML) techniques to overcome the length- and timescale problem burdening first principles methods [1][2][3][4]. As such, ML techniques have seen particularly beneficial use in Molecular Dynamics (MD) simulations in the condensed phase. By providing highly elaborate fitting mechanisms, ML allows for accurate interpolation between training points on the potential energy surface (PES) generated from a computationally more expensive reference method. This enables, e.g., MD simulations of water to be performed at the nano- rather than the picosecond scale with forces and energies of first principles accuracy 5,6 . While still more expensive than classical force fields, ML techniques can incorporate information from the underlying PES well outside of local minima, making the simulation of chemical reactions possible. In contrast to reactive force fields, no assumptions on the functional form of the interactions and, therefore, the underlying PES have to be made 7,8 . Instead, the fit is carried out mostly in a black-box manner, with one of the most prominent adjustable input parameters being a set of functions that uniquely describe the environment of every atom in a manner that is invariant to permutation, rotation and translation 4,9,10 .
a) Electronic mail: christoph.dellago@univie.ac.at
Machine learning techniques commonly used to interpolate between first principles data include kernel-based methods [11][12][13][14] and neural network potentials (NNP) 15,16 . A prominent framework of the latter is due to Behler and Parrinello 17,18 , who advocated the use of high-dimensional neural network potentials (HDNNP) in combination with atom-centred input functions. In the context of HDNNPs, these functions are commonly referred to as symmetry functions (SF) 19 . Ideally, these functions provide a unique description of the atomic environment of every structure in the training set. Several functional forms have been proposed 20,21 , which commonly include a radial and an angular part multiplied by a cutoff function that ensures that the former vanish outside of predefined bounds. SFs are therefore commonly a product of up to three different types of functions, and parametrising them properly for a given chemical system can become non-trivial. An intuitive construction of SFs can be based on structural features of the system at hand. For instance, sufficient spatial resolution could be obtained by taking into account all relevant peaks in the radial distribution function of the system and by covering all chemically relevant angles spanned by an atom and two of its neighbours. The width and centre of a symmetry function should then reflect fluctuations of the quantity it describes in the training data. Such a strategy has successfully been used, e.g., for the construction of a neural network potential of water and ice, including different ice phases 5,22,23 .
However, whereas such a non-automated approach can yield symmetry function sets much smaller than those generated using automated procedures 21,24 , there are some pitfalls associated with an intuitive approach: While the position of the maxima of radial functions can generally be freely chosen within the cutoff radius, this is not necessarily the case for angular terms, which are commonly based on a cosine and which are therefore restricted to peak at either 0 or π. Introducing a simple phase shift in the cosine has been shown to work well for energy evaluations 25 , but it would be bound to fail for force predictions, since any shift in a cosine will inevitably introduce derivative discontinuities at 0 and π: The evaluation of angles is restricted to a domain of [0, π], imposing symmetries in the angular functions of the form f (ϑ) = f (−ϑ) and f (π + ϑ) = f (π − ϑ). Phase shifts break this symmetry and introduce non-zero derivatives at f (0) and f (π). Since derivatives of the angular functions are used to compute forces (vide infra), this results in non-unique descriptions at ϑ = 0 and ϑ = π.
In order to account for equilibrium angles that are not centred on either peak and which only fluctuate over a few degrees, such as those of a double bond, one therefore uses linear combinations of angular symmetry functions which are centred on either 0 or π, increasing the overall number of SFs used. The presence of cutoff functions also obfuscates the choice of input SF parameters: If the SFs are constructed from structural properties such as the radial distribution function of a liquid, the position of the radial peaks is not easily deduced from the input parameters, since the product of radial term and cutoff function may exhibit a shifted maximum and a more rapid decay with respect to the Gaussian function alone. In particular, if the Gaussian is not centred on the origin, this may lead to a symmetry function that is not symmetric with respect to its peak. Therefore, it is usually necessary to plot every single symmetry function in order to monitor the correspondence between the input parameters and the final system setup. Hence, parametrising symmetry functions based on structural properties can be a cumbersome procedure. While universal rules to generate input SFs have been proposed 21,24 , they often increase the number of SFs needed with respect to an optimal, tailored set of parameters. This can hamper performance.
When using high-quality first principles methods to generate the training set, the first principles calculations usually represent the computational bottleneck. However, both the training of the neural network itself as well as its use in production calculations carry a certain overhead that is typically larger than that of classical force fields. The computational cost of the training is in part associated with the number of symmetry functions used, as well as their computational complexity. While it is possible to store the values of all symmetry functions during training 26 , making their calculation necessary only once at the very beginning of the training procedure, this is evidently impossible when the neural network potential is used to predict unknown structures, e.g., during an MD run. Factors that contribute to the overhead of symmetry function evaluation are two-fold: The most popular choice for the radial term are Gaussian functions. However, exponentials are comparably expensive to compute. Furthermore, the calculation of a cutoff function (often cosines or hyperbolic tangents) introduces an additional overhead with respect to bare radial functions. To this end, optimised cutoff functions based on polynomials have already been proposed. Still, the disadvantages of the evaluation of exponentials for the radial terms, their possibly becoming asymmetric due to multiplication with a cutoff function and the impracticability of introducing phase shifts in the angular components remain. While strategies to reduce the large number of floating point operations during symmetry function evaluation have been developed, e.g. by grouping and caching 26,27 , this involves intermediate storage of quantities, loop breaks and many if-statements, rendering their implementation cumbersome, reducing code readability and making performance benefits both problem- and architecture-dependent.
It would therefore be desirable to have a framework at hand where the number of floating point operations in SF evaluations is substantially reduced, where the SFs remain symmetric with respect to their maxima, where the input parameters directly reflect the shape of the resulting function, and where the maximum of the angular component can be adjusted to reflect equilibrium angles. This would alleviate the need to plot every symmetry function during construction, making the resulting procedure more efficient and less error-prone, greatly simplifying the choice of parameters based on structural properties of the system at hand. Most importantly, a reduction of floating point operations supersedes the need for complex code optimisation and renders the efficient implementation of symmetry functions architecture-independent.
In the following, we will demonstrate that further simplifications to existing symmetry functions are possible by replacing the product-based ansatz of radial, angular and cutoff function by a product of polynomials with compact support, making the use of cutoff functions obsolete. Such polynomial symmetry functions have the advantage of allowing for angular terms to be centred anywhere within ϑ ∈ [0, π] without introducing derivative discontinuities at ϑ = 0 and ϑ = π. The latter is of particular importance for force evaluations, as discontinuities in symmetry function derivatives will directly impact the forces computed. The evaluation of these polynomial symmetry functions does not involve exponentiation. Instead, the most demanding low-level operation becomes the calculation of an arccosine when computing angular terms. The use of one simple functional form for both angular and radial terms considerably reduces the number of floating point operations associated with the evaluation of a single SF. This not only results in significant simplifications in the underlying code, but also yields speedups of up to a factor of about 4.5 to 5, while their performance with respect to highly optimised cache- and grouping-based SF implementations still reaches a factor of almost 2 without the need for further algorithmic optimisations.
This text is organised as follows: First, we briefly introduce the concept of HDNNPs and describe the mathematical form of commonly used symmetry functions first proposed by Behler and Parrinello 4,17 . We then introduce our new polynomial symmetry functions (PSF). We will go on to demonstrate that the use of polynomial symmetry functions does not negatively impact the accuracy of the resulting neural network potentials when compared to conventional symmetry functions. To this end, both dynamic and static properties will be compared for a water and a copper sulfide model system. Using the rotation barrier of the amino group of the organic dye DMABN 28,29 , we will show that the angular flexibility of polynomial symmetry functions simplifies the choice of angular parameters compared to Behler-Parrinello type symmetry functions (BPSF). We will conclude the discussion by comparing execution times for the evaluation of the NNPs between BPSFs and PSFs, showing that the execution time per symmetry function is substantially reduced for our polynomial ansatz.

II. THEORY

A. Neural Networks in a Nutshell
First applications of neural network potentials in computational chemistry date back as far as 1995 15 . A considerable advance in the prediction of molecular forces and energies was made in 2007, when Behler and Parrinello 17 proposed to use high-dimensional neural network potentials (HDNNP) in combination with a fictitious decomposition of the potential energy V into contributions from N individual atoms:

V = Σ_{i=1}^{N} V_i . (1)

The atomic contribution V_i depends on the chemical environment of atom i, which is described by a tuple of N_j input values G_i = {G_1 , . . . , G_j }. The forces on the kth atom follow from the gradient:

F_k = −∇_k V = −Σ_{i=1}^{N} ∇_k V_i . (2)

The G_i are used as the input layer of a feed-forward neural network (Fig. 1) that yields the scalar V_i. The network itself is constituted by multiple layers of nodes, which are connected to each other by virtue of weights a_nm. Starting from a set of input values (G_i), the data is propagated to node n of hidden layer k as follows:

y_n^k = f_a ( b_n^k + Σ_{m=1}^{M_l} a_nm y_m^l ) , (3)

where l = k − 1, y_n^k represents the value of the node and f_a is a non-linear activation function. b_n^k denotes the bias associated with y_n^k and M_l is the number of neurons in layer l. One independent neural network per element is used. Together with the biases b_n^k, the weights a_nm are the fitting parameters of the HDNNP.
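The atomic decomposition and the layer-wise propagation described above can be sketched in a few lines. This is a toy illustration with random weights and tanh as one common choice of the activation f_a, not the n2p2 implementation; in a real HDNNP one network per element with trained weights is used:

```python
import numpy as np

def atomic_nn(G_i, weights, biases):
    """Feed-forward network for one atom: maps the symmetry-function
    vector G_i onto the atomic energy contribution V_i.
    weights[k] connects layer k-1 to layer k; the output layer is linear."""
    y = np.asarray(G_i, dtype=float)
    for k, (a, b) in enumerate(zip(weights, biases)):
        y = a @ y + b
        if k < len(weights) - 1:
            y = np.tanh(y)  # non-linear activation f_a on hidden layers
    return y.item()  # scalar V_i

def total_energy(descriptors, weights, biases):
    """V = sum_i V_i; one network shared by all atoms here for simplicity."""
    return sum(atomic_nn(G, weights, biases) for G in descriptors)
```

Because the total energy is a plain sum of atomic terms and atoms of the same element share one network, exchanging two such atoms leaves V unchanged, which is the permutational invariance required of the descriptors.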
The atomic environment descriptors G_i are commonly called symmetry functions (SF). In order to ensure permutational symmetry, the symmetry functions are identical for all atoms of the same element. However, in order to account for varying chemical environments (bond lengths, van der Waals radii, relevant bond angles), they can differ between different elements. An appropriate choice of symmetry functions is crucial in order to reliably discriminate between unique points on the PES.

B. Behler-Parrinello Symmetry Functions
In the case of Behler-Parrinello HDNNPs, the input layer consists of scalars obtained from a predetermined set of atom-centred symmetry functions. The choice of symmetry functions should discriminate between all relevant structural features, such that every distinct point on the potential energy surface is described by a unique tuple of symmetry function values. In practice, this is achieved by combining a set of spherically symmetric and angle-dependent functions that have to be parametrised appropriately. Some examples are given in Fig. 2. For the radial part, Behler and Parrinello suggested:

f_B^rad(r) = e^{−η (r − r_s)^2} , (4)

where r is an interatomic distance, r_s its reference point and η controls the width. An angular dependency can be introduced by a function of the form:

f_B^ang(ϑ, λ, ζ) = 2^{1−ζ} (1 + λ cos ϑ)^ζ with λ = ±1, (5)

where ϑ is the angle formed by any three atoms, bound by [0, π]. This function appropriately has ∂f_B^ang/∂ϑ|_{ϑ=0} = ∂f_B^ang/∂ϑ|_{ϑ=π} = 0. For λ = 1, maximum and minimum of the function lie at 0 and π, respectively; for λ = −1, the inverse holds. Choosing ζ > 1 contracts the function, resulting in a more rapid decay. Values of ζ < 1 are not possible, since this would result in non-zero derivatives at the boundaries.
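The two Behler-Parrinello primitives, a Gaussian in the interatomic distance and a cosine-based angular term, can be written down directly (cutoff handling is treated separately below). Note that the prefactor 2^(1−ζ) keeps the angular maximum at 2 independently of ζ:

```python
import math

def f_rad_B(r, eta, r_s):
    """Behler-Parrinello radial primitive: a Gaussian centred on r_s with
    width controlled by eta (the cutoff function is applied separately)."""
    return math.exp(-eta * (r - r_s) ** 2)

def f_ang_B(theta, lam, zeta):
    """Behler-Parrinello angular primitive for theta in [0, pi], lam = +/-1.
    The prefactor 2**(1 - zeta) keeps the maximum value at 2 for any zeta;
    the derivative vanishes at both boundaries since sin(0) = sin(pi) = 0."""
    return 2.0 ** (1.0 - zeta) * (1.0 + lam * math.cos(theta)) ** zeta
```

With lam = +1 the function peaks at theta = 0 and vanishes at pi; lam = −1 inverts this, which is exactly why angles away from 0 or π require linear combinations of several (λ, ζ) choices.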
While the form of f_B^ang(ϑ, λ, ζ) guarantees zero derivatives at the boundaries for appropriate choices of λ and ζ, the exponential in f_B^rad(r, r_s) is finite everywhere. Therefore, a cutoff function with cutoff radius r_c is introduced, which ensures that the radial term and its derivatives are zero for all r > r_c. Common choices of cutoff function are

f_c(r) = 1/2 [cos(π r / r_c) + 1] (6)

or

f_c(r) = tanh^3(1 − r/r_c) (7)

for r ≤ r_c, with f_c(r) = 0 for r > r_c. With this choice of functions, the radial symmetry function G_i centred on an atom i is given by a sum over all its neighbours j that are within the cutoff radius r_c of i:

G_i^rad = Σ_j e^{−η (r_ij − r_s)^2} f_c(r_ij) . (8)

Note that such a product may have a width and a maximum different from f_B^rad alone, see also Fig. 2. The angular symmetry function is constructed from a product of angular and radial terms, with the sum encompassing all neighbours j and k ≠ j. Depending on the radial term, two types of angular symmetry functions can be distinguished, namely the narrow form n, which also constrains the interatomic distance between atoms j and k:

G_i^ang,n = 2^{1−ζ} Σ_{j,k≠j} (1 + λ cos ϑ_ijk)^ζ e^{−η (r_ij^2 + r_ik^2 + r_jk^2)} f_c(r_ij) f_c(r_ik) f_c(r_jk) , (9)

and the wide form w:

G_i^ang,w = 2^{1−ζ} Σ_{j,k≠j} (1 + λ cos ϑ_ijk)^ζ e^{−η (r_ij^2 + r_ik^2)} f_c(r_ij) f_c(r_ik) , (10)

the evaluation of which is more computationally expedient due to the lack of a radial term for jk. These symmetry functions have successfully been applied in a wide variety of contexts. Note that the distribution of symmetry function values at λ = −1 is not symmetric with respect to λ = 1 due to the presence of a radial function 21 .
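A sketch of the cosine-type cutoff and the resulting radial symmetry function as a sum over neighbour distances (η, r_s and r_c as in the text):

```python
import math

def f_cutoff(r, r_c):
    """Cosine cutoff: decays smoothly from 1 at r = 0 to 0 at r = r_c,
    and is identically zero beyond the cutoff radius."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)

def G_rad(r_ij_list, eta, r_s, r_c):
    """Radial symmetry function of atom i: sum of Gaussian-times-cutoff
    terms over all neighbour distances r_ij."""
    return sum(math.exp(-eta * (r - r_s) ** 2) * f_cutoff(r, r_c)
               for r in r_ij_list)
```

The product structure is also the reason a shifted Gaussian (r_s > 0) times the cutoff is no longer symmetric about its peak, as discussed in the introduction.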
An alternative cutoff function based on polynomials was introduced in Ref. 27:

f_c(r) = f_poly2(r/r_c) for r < r_c, and f_c(r) = 0 for r ≥ r_c, (11)

with the polynomial:

f_poly2(x) = ((15 − 6x) x − 10) x^3 + 1 , (12)

with f_poly2(0) = 1 and f_poly2(1) = 0. At x = 0 and x = 1, Eq. 12 has continuous derivatives up to second order, which ensures a continuous, smooth transition to outside of the compact set. Similar polynomials with continuous derivatives up to even higher order can easily be constructed 27 , but their use does not result in any practical benefit 18 . The main advantage of polynomial cutoff functions as introduced in Ref. 27 lies in their expedient evaluation compared to expressions based on the hyperbolic tangent or cosine, reducing the overhead due to the cutoff function. However, the computation of costly exponentials is still needed for all radial components. Hence, a grouping strategy was proposed in Ref. 27 which avoids multiple evaluations of the same exponential term, resulting in considerable speedups.
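The boundary conditions stated above (value 1 at 0, value 0 at 1, vanishing first and second derivatives at both ends) fix the quintic uniquely; a sketch of it together with the corresponding cutoff:

```python
def f_poly2(x):
    """Unique quintic with f(0) = 1, f(1) = 0 and vanishing first and
    second derivatives at both boundaries: 1 - 10x^3 + 15x^4 - 6x^5."""
    return ((15.0 - 6.0 * x) * x - 10.0) * x ** 3 + 1.0

def f_cut_poly2(r, r_c):
    """Polynomial cutoff with compact support on [0, r_c]: only a handful
    of multiplications and additions, no transcendental functions."""
    return f_poly2(r / r_c) if r < r_c else 0.0
```

Its first derivative is −30 x² (x − 1)², whose double roots at 0 and 1 are what make the first and second derivatives vanish at the support boundaries.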

C. Polynomial Symmetry Functions with Compact Support
On the one hand, the summation over the arguments of G^ang still involves a substantial number of exponential functions that have to be explicitly evaluated even when a grouping strategy is used. On the other hand, the present choice of f^ang implies that, in order to specifically sample angles which are not centred close to either 0 or π, a combination of several values of λ and ζ is necessary, since introducing an angular shift ϑ_0 would inevitably violate the boundary condition ∂f^ang/∂ϑ = 0 at the bounds of the angular domain. In the following, we will attempt to overcome this issue by constructing a simple set of symmetry functions based on polynomials with compact support which can be used to indiscriminately describe both angular and radial dependencies.
In order to do so, we generalise Eq. 11 to a generic form, expressed in terms of the boundaries x_min, x_max of the underlying domain. In particular, for our polynomial symmetry functions f_PSF, we also allow for arguments x < 0:

f_PSF(x) = f_poly2( 2|x − x_0| / (x_max − x_min) ) with x_0 = (x_min + x_max)/2, (13)

and f_PSF(x) = 0 outside of [x_min, x_max]. The resulting function is symmetric with respect to x_0 and has continuous derivatives, up to an order determined by the construction of f_poly2, on all of ℝ. It is possible to further influence the decay of the radial function by appropriately modifying its argument such that the symmetry function decays more rapidly around its maximum while also approaching 0 more slowly, thus breaking the point symmetry around f_poly2(0.5). We therefore define the asymmetric polynomial symmetry function (PAS) as:

f_PAS(x) = f_poly2( x (2 − x) ) , (14)

which offers the same benefits as f_PSF for x ∈ [0, 1]. By construction, f_PAS resembles the shape of the product of a local, narrow Gaussian function with a cutoff function and can therefore be used to adapt existing BPSF setups to polynomial form. The functional forms of f_PAS and f_PSF are depicted in Fig. 4. The corresponding radial symmetry functions can now be constructed as:

G_i^rad = Σ_j f_P(r_ij) , (15)

with f_P either f_PSF or f_PAS. Their narrow angular counterparts read:

G_i^ang,n = Σ_{j,k≠j} f_PSF(ϑ_ijk) f_P(r_ij) f_P(r_ik) f_P(r_jk) , (16)

while the wide angular functions become:

G_i^ang,w = Σ_{j,k≠j} f_PSF(ϑ_ijk) f_P(r_ij) f_P(r_ik) , (17)

where the radial component f_P can again be either f_PAS or f_PSF. Both types of angular symmetry function can be centred anywhere within [0, π], provided that ∂f_PSF/∂ϑ|_{ϑ=0} = 0 and ∂f_PSF/∂ϑ|_{ϑ=π} = 0. This enables chemically intuitive choices of angular maxima, e.g. 120° for a double bond or 109.4° for a carbon-carbon single bond. Fig. 5 shows the dependency of PAS-type symmetry functions on the position of atomic neighbour(s).
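A sketch of both function types. The symmetric mapping onto [x_min, x_max] follows the construction described above; the argument composition u(2 − u) used here for the asymmetric variant is our reading of the construction and should be checked against the reference implementation in n2p2:

```python
def f_poly2(x):
    """Compact-support quintic: 1 at x = 0, 0 for x >= 1, with vanishing
    first and second derivatives at both boundaries."""
    if x >= 1.0:
        return 0.0
    return ((15.0 - 6.0 * x) * x - 10.0) * x ** 3 + 1.0

def f_psf(x, x_min, x_max):
    """Symmetric polynomial SF: peak 1 at the centre of [x_min, x_max],
    identically zero outside the interval, smooth everywhere."""
    x0 = 0.5 * (x_min + x_max)
    half = 0.5 * (x_max - x_min)
    return f_poly2(abs(x - x0) / half)  # f_poly2 returns 0 for arguments >= 1

def f_pas(x, x_max):
    """Asymmetric radial variant on [0, x_max]: peak 1 at x = 0, faster
    initial decay and a slower approach to zero near x_max. The argument
    composition u*(2 - u) is an assumption (see lead-in)."""
    u = x / x_max
    if u < 0.0 or u >= 1.0:
        return 0.0
    return f_poly2(u * (2.0 - u))
```

Near the origin u(2 − u) ≈ 2u, so the asymmetric variant falls off roughly twice as fast as the bare polynomial, while near x_max the argument approaches 1 only quadratically, giving the slow, cutoff-like tail.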
Eqs. 13 and 14 do not involve any exponentials, and the most expensive arithmetic operation, an inverse cosine, has a modest computational footprint.
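For the angular terms, the inverse cosine used to obtain ϑ from Cartesian positions is indeed the only transcendental operation required; a minimal sketch:

```python
import math

def bond_angle(r_i, r_j, r_k):
    """Angle at atom i formed with neighbours j and k, via the arccosine
    of the normalised dot product -- the single transcendental operation
    needed for polynomial angular terms."""
    u = [r_j[d] - r_i[d] for d in range(3)]
    v = [r_k[d] - r_i[d] for d in range(3)]
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    c = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp against rounding noise
    return math.acos(c)
```

The clamp guards against arguments that land marginally outside [−1, 1] due to floating point rounding, which would otherwise raise a domain error in `math.acos`.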

III. COMPUTATIONAL METHODS
A. Training of the HDNNP

10 training runs per system are carried out with the n2p2 package 26,27,30 , each using an independent seed for the random number generator. If the value G_i of a given symmetry function never exceeds 10^−3 over the whole test set, the function is discarded before training. The data discussed in the rest of the text always refers to the best set of weights out of those 10 runs, defined as the weights that yield the lowest force root mean-square deviation (RMSD) with respect to a test set constituted by 10% of training-set structures that were randomly removed before training. If several sets of weights yield very comparable RMSDs, the set that displays the lowest relative error over the 1% of forces largest in magnitude is used.
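The pruning step can be sketched as follows, assuming the symmetry-function values over the data set are collected in a (structures × functions) array; the function name is hypothetical:

```python
import numpy as np

def prune_symmetry_functions(G, threshold=1e-3):
    """G: array of shape (n_structures, n_sf) holding SF values over the
    data set. Drop every column whose magnitude never exceeds `threshold`
    and return the reduced array plus the indices of the kept functions."""
    keep = np.abs(G).max(axis=0) > threshold
    return G[:, keep], np.flatnonzero(keep)
```

Functions removed this way carry essentially no information about the structures in the set, so discarding them shrinks the input layer without affecting the fit.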

B. Training Sets
The systems investigated here are depicted in Fig. 6. The training sets for water and copper sulfide correspond to those published in Refs 5 and 26, respectively. Forces and energies for an isolated DMABN molecule, liquid ethyl benzene and anisole (16 molecules in a periodic box) were obtained from high-temperature Car-Parrinello Molecular Dynamics runs performed with the CPMD code 31 using the SCAN exchange-correlation functional 32 . The amino group of DMABN was rotated using a slowly growing harmonic restraint, giving rise to a set of non-equilibrium conformers. Details on the computational setup can be found in the Supporting Information.

C. Test Sets
Test sets consist of 10% randomly chosen configurations that are removed from the training set before training and which are then used to validate the predictive power of the neural network potential.

D. Molecular Dynamics using Neural Network Potentials
All MD simulations using HDNNPs were carried out using the n2p2-interface 27,30 to the LAMMPS code 33,34 . System setups are detailed in the Supporting Information.

IV. RESULTS

A. Water
We first assess the accuracy that can be attained with polynomial SFs by comparing the efficiency and accuracy of both training and production simulations using identically set up HDNNPs. Two setups of polynomial SFs will be used: One, PSF, consisting entirely of symmetric functions of the form of Eq. 13, and one mixed setup, PAS, including both symmetric and asymmetric radial components. We will compare against a reference setup using BPSFs, which has successfully been used in Ref. 26 to describe the PES of water and several ice phases. PSF and PAS were constructed by plotting the BPSFs and using their peaks and the full width at half maximum as rough indicators for the polynomial input. For the sake of simplicity, the only symmetric radial terms (PSF-type) in the PAS set were those with maximum width; all others were chosen to be of the asymmetric type.
First, we compare the learning curves of both methods to assess the efficiency of the training procedure. Here, we define the learning curve as the RMSD between trained energy and forces and their reference as a function of the number of training steps (epochs). Curves for Behler-Parrinello type and polynomial SF setups are plotted in Fig. 7 and are virtually indistinguishable, suggesting that both types of symmetry functions can be trained with equal efficiency. RMSDs at the last training step are comparable between different methods, with the largest remaining force deviation being observed for the PSF setup. Conversely, the lowest force RMSD is observed for the PAS setup. BPSFs yield a training error intermediate between PAS and PSF data.
The same conclusion holds when analysing the distribution of both force and energy errors within the test set: Force RMSDs are 43.7 meV/Å for BPSF, 48.5 meV/Å for PSF and 43.2 meV/Å for PAS, respectively. Energy RMSDs range from 1.17 meV for BPSF to 0.95 meV for PAS. In practical applications, these values could be considered identical. In Fig. 7, we histogram the relative force errors for HDNNPs trained with Behler-Parrinello type and polynomial SFs. The resulting distributions for the test set compare well between PSF, PAS and BPSF: All relative errors show a rapid decay, indicating an excellent quality of the fit. PAS and BPSF show very similar relative error distributions, whereas the distribution of PSF skews very slightly towards the right; however, we will show that this is of no practical relevance. Fig. 9 shows predicted forces against their reference values; ideally, all values should lie on a diagonal. Use of PAS and PSF improves the accuracy of the most extreme force values at either end of the graph, which lie closer to the diagonal than for BPSF. This indicates that, although force RMSDs differ only little between the different SF setups, qualitative improvements to conventional BPSFs are possible by use of a PSF or PAS set. Radial distribution functions and mean-squared displacements obtained from Molecular Dynamics suggest that the small differences in force RMSD over the test set do not entail significant changes in the quality of the predictions made by the HDNNP. Notably, the mean-squared displacements (MSD) for PSF and BPSF are virtually identical.
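For reference, the error measures quoted throughout this section can be computed as below; the component-wise convention for the relative error is an assumption for illustration:

```python
import numpy as np

def force_rmsd(f_pred, f_ref):
    """Root mean-square deviation over all force components
    (reported here in meV/Angstrom)."""
    d = np.asarray(f_pred, dtype=float) - np.asarray(f_ref, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

def relative_errors(f_pred, f_ref, eps=1e-12):
    """Per-component relative force error in percent, as histogrammed in
    the error-distribution plots; eps avoids division by zero."""
    f_pred = np.asarray(f_pred, dtype=float)
    f_ref = np.asarray(f_ref, dtype=float)
    return 100.0 * np.abs(f_pred - f_ref) / (np.abs(f_ref) + eps)
```

A right-skewed histogram of `relative_errors` corresponds to a heavier tail of poorly predicted force components, which is the feature discussed for the PSF setup above.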

B. Copper sulfide
All symmetry function setups used for water were based on the optimised, hand-selected set of Ref. 5. Such a selection by hand may not always be possible. In the following, we shall further validate the accuracy of PSF and PAS by comparing their performance to the accuracy of a semi-automatically generated set of BPSFs for copper sulfide. It has been shown in Ref. 26 that a HDNNP with BPSFs can be successfully used to observe structural phase transitions in this compound. Fig. 10 compares absolute and relative force errors obtained using the BPSFs of Ref. 26 as well as our PSF and PAS. Force RMSDs are 51.9 meV/Å for BPSF, 56.7 meV/Å for PSF and 52.3 meV/Å for PAS, which corresponds roughly to a 10% difference between PSF and BPSF. Again, the performance of BPSF and PAS is on par, whereas use of PSF leads to slightly larger errors, which is also reflected in the relative error distribution skewing slightly to the right. Overall, however, this effect is expected to remain negligible in practical applications. Deviations from the diagonal in the graph comparing prediction and reference forces (Fig. 10, right panel) do not exhibit visible differences, further indicating that the general performance of all symmetry function types investigated here remains comparable and that the differences in force RMSDs observed here are not expected to influence the quality of the predictions in a significant manner. Energy RMSDs remain similar for all methods, spanning a range from 0.99 meV for PAS to 1.06 meV for BPSF.

C. Organic molecules: Gas phase and liquids
In the following, we shall compare the performance of BPSFs generated according to Ref. 24 with that of our polynomial symmetry functions using an isolated (gas-phase) DMABN molecule and liquid ethyl benzene. We note that optimal symmetry function setups could possibly be found for all SF types investigated here; however, since this is a time-intensive procedure, the main purpose of this comparison is to assess the out-of-the-box performance of some generic sets of SFs.
To this end, we introduce a simple generation scheme for radial symmetry function parameters given a minimal and a maximal width, ∆_min and ∆_max, and a maximum cutoff radius r_c. For N_0 symmetry functions which are symmetric w.r.t. the origin, the width ∆_0^i of the i-th out of N_0 PSF or PAS is chosen to interpolate between ∆_min and ∆_max. For a given number N_s of symmetry functions which are centred at r_s^i ≠ 0, the shifts r_s^i and widths ∆_s^i are obtained analogously within the cutoff radius.
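A sketch of such a parameter generator, assuming, for illustration, linearly spaced widths and evenly spaced shifts within r_c; the specific interpolation formulas used in the scheme are an assumption:

```python
def radial_widths(n0, d_min, d_max):
    """Widths for N_0 origin-centred functions, linearly interpolated
    between d_min and d_max (linear spacing is an assumption)."""
    if n0 == 1:
        return [d_min]
    step = (d_max - d_min) / (n0 - 1)
    return [d_min + i * step for i in range(n0)]

def shifted_centres(n_s, r_c):
    """Centres r_s^i for N_s shifted functions, spread evenly over (0, r_c)
    (even spacing is an assumption)."""
    return [(i + 1) * r_c / (n_s + 1) for i in range(n_s)]
```

Because the input parameters of PSF and PAS directly bound the support of each function, such a grid translates immediately into a predictable coverage of the radial domain, without plotting every function first.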
Structures in the DMABN set cover regions of the PES thermally accessible at 1320 K, including a forced rotation of the amino group along a non-equilibrium path. This results in many forces of large magnitude which are well outside of the equilibrium regime in which simple harmonic potentials can also yield good approximations. The DMABN set therefore serves as an assessment of the transferability of our polynomial SFs to structures which are considerably off-equilibrium.
Force RMSDs on the test set amount to 63.2 meV/Å for BPSF, 56.6 meV/Å for PSF and 54.0 meV/Å for PAS. This difference is directly reflected in the right panel of Fig. 11, where the force histograms show an increasing spread towards larger values when going from PAS via PSF to BPSF. Note that this amounts to a difference exceeding 15% between the worst (BPSF) and best (PAS) set. Energy RMSDs range from 0.64 meV for PSF to 0.72 meV for BPSF, with the overall error remaining low for all types of functions. The force predictions obtained here imply that, for a generic test set containing a significant amount of off-equilibrium structures, generic PAS and PSF sets are able to outperform BPSFs.
We will now show that the same considerations also hold for an organic liquid in which van der Waals interactions dominate. For a test set of liquid ethyl benzene, using the same symmetry function setups as for DMABN, we find force prediction RMSDs of 89.5 meV/Å for BPSF, 68.9 meV/Å for PSF and 61.7 meV/Å for PAS, respectively. This corresponds to a difference in accuracy approaching 50% between BPSF and PAS. Whereas these considerable differences are not evident from the diagonal force-to-force plot in the left panel of Fig. 12, they become visible in the force histogram in the right panel: Histograms for PSF and PAS are very similar, with the only significant difference being the peak around 0%, but BPSFs skew considerably to the right, and much more so than what was observed for PSF in the case of water or Cu 2 S. Energy RMSDs are about 0.05 meV for all SFs studied here, which is of excellent accuracy. For liquid anisole, all symmetry function setups result in about equally accurate predictions: Observed force prediction RMSDs amount to 100.8 meV/Å for BPSF and a comparable 106.5 meV/Å for PSF, with the lowest deviation again being observed at 98.7 meV/Å for PAS. All SFs yield energy RMSDs that can be considered identical, within a small range from 0.08 meV for BPSF to 0.11 meV for PAS and PSF. Overall, the trends obtained for force predictions in organic molecules indicate that our generic PAS and PSF setups on average exhibit superior transferability between systems, while at the same time retaining lower force errors on the test sets compared to generic BPSFs. Importantly, whereas BPSFs performed considerably worse than polynomial functions for liquid ethyl benzene, neither PAS nor PSF showed a considerably inferior performance with respect to BPSF in any of the other systems investigated here. Energy predictions are much more robust with respect to the choice of system; they remain of comparable accuracy for all SFs investigated here.

D. Computational Footprint
The choice of symmetry functions has an influence on both the computational footprint of training and the cost of force predictions during production runs. For the former, if SFs can be calculated for all structures in the training set and then stored, it is mainly the number of SFs that influences performance: More SFs imply more weights to be optimised and as such result in higher execution times. Memory requirements grow as well as the number of symmetry functions increases. Even if two (generic) sets of symmetry functions comprise the same number of input functions, pruning of functions that never exceed a certain threshold (typically 10^−3) can result in different set sizes during training. In production MD runs, SFs have to be recalculated for every new structure and hence, performance will not only be determined by their sheer number, but also by the number and type of floating point operations involved in their computation. In the following, we will analyse both performance-related aspects. Table I lists the cost of running 100 steps of MD for a periodic box containing 360 water molecules using the HDNNPs investigated here, both for a straightforward calculation of all SFs as well as using an optimised algorithm tailored for distributed-memory parallelisation on central processing units (CPUs) as reported in Ref. 27. The former serves as a measure for the cost of the floating-point operations associated with every SF type; this corresponds to the limit of all N out of N symmetry functions being independent and differently parametrised. In comparison with the former, the latter indicates the benefit of the underlying algorithmic optimisations; this, however, requires that a subset of the N symmetry functions contain identical parameters and cutoff functions in order to benefit from a grouping and caching strategy.
The effective speedups obtained by using PSF or PAS for the water setup reach almost a factor of 5 when no grouping or caching strategy is applied. This indicates that the number of expensive floating point operations is significantly reduced when purely polynomial SFs are used. Conversely, symmetry function grouping and caching has a much lower influence on the execution times of PSF and PAS, where those optimisations reduce the computational cost by only about 15%. As would be expected from the larger number of terms involved in the evaluation of PAS, their computation is slightly more expensive than that of PSF. However, the difference is small enough to be negligible in practice.
In contrast, BPSFs, for which grouping and caching strategies were developed in the first place, benefit from speedups of up to a factor of 3 when those strategies are introduced. Note that, by construction, PAS and PSF are not intended to require such optimisation strategies, which is reflected in comparably modest speedups upon introduction of grouping and caching. Still, even in those cases, PSF and PAS are almost a factor of 2 faster than BPSF. In addition, it should be noted that PAS yield the lowest RMSD on water with the smallest total number of SFs; they therefore not only lower the computational cost, but also the memory footprint with respect to BPSFs. Table II compares the numbers of symmetry functions effectively used during training after pruning all SFs that never exceed a magnitude of 10^-3. This provides a measure for computational cost and memory requirements. Apart from ethyl benzene, for which significantly more SFs of BPSF-type were removed during pruning, numbers are comparable between BPSF, PSF and PAS. It should be noted that all systems except water were trained with automatically generated sets of SF parameters, which also accounts for the different number of functions with maximum amplitude < 10^-3 in these setups. In this context, it is interesting to note that in spite of the structural similarities between anisole and ethyl benzene, there is a significant difference in the number of pruned symmetry functions for BPSFs. For PAS and PSF, no such difference can be observed; this may also account for the larger force RMSDs observed when ethyl benzene is trained with a set of BPSFs. Within the hand-picked set of SFs for water, differences between PSF and PAS are notable, with the latter allowing for a reduction of about 15% with respect to the former; BPSFs lie in between. This suggests that, for hand-selected sets, PAS allow for an even smaller number of SFs to be used.
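The benefit of grouping and caching discussed above can be illustrated with a toy Python sketch. The function names and the cosine cutoff parametrisation are our own illustrative assumptions, not n2p2 internals; the point is only that radial SFs sharing an identical cutoff can reuse a single cutoff evaluation per pair distance:

```python
import math

def cosine_cutoff(r, r_c):
    # illustrative cosine-type cutoff, zero beyond r_c
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r < r_c else 0.0

def radial_set_naive(r, etas, r_c):
    # every SF in the group recomputes the (comparatively expensive) cutoff
    return [math.exp(-eta * r * r) * cosine_cutoff(r, r_c) for eta in etas]

def radial_set_cached(r, etas, r_c):
    fc = cosine_cutoff(r, r_c)  # computed once, shared by the whole group
    return [math.exp(-eta * r * r) * fc for eta in etas]

etas = [0.5, 1.0, 2.0, 4.0]
assert radial_set_naive(2.5, etas, 6.0) == radial_set_cached(2.5, etas, 6.0)
```

For purely polynomial SFs the cutoff is already part of the polynomial, so there is little left to cache, which is consistent with the modest 15% gain reported above.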
Overall, the performance of PSF and BPSFs was comparable for all systems studied here. Force RMSDs for water and Cu2S were slightly higher for PSF; however, this was not reflected in the dynamic and structural properties of water obtained by 2 ns of HDNNP-MD, where results using PSF and BPSFs showed excellent agreement. On the other hand, for DMABN and liquid ethyl benzene, PSF considerably outperformed BPSFs. For all systems, PAS were able to outperform either PSF or BPSF. We therefore strongly recommend PAS for future HDNNP setups, since they speed up productive MD runs by a factor of 1.8 and further improve the accuracy of the HDNNPs. In particular, since their floating point cost is lower by a factor of almost 5, we recommend PAS for implementations where caching and grouping strategies are not feasible. This can, for instance, be the case when offloading computation of SFs to GPUs, where the resolution of if-statements and loop breaks associated with optimisation schemes can considerably slow down computation with respect to purely arithmetic operations. Not least, thanks to their small computational footprint, PAS can also be used in programs that have not undergone extensive optimisation, and their simple structure facilitates implementation and calculation of derivatives up to an arbitrary order defined by the underlying polynomial.
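The last point can be illustrated briefly: derivatives of a polynomial SF are again polynomials and can be generated mechanically to any order. The bump 16x^2(1-x)^2 on [0, 1] used below is a generic example of our own choosing, not the PSF definition from this work:

```python
from numpy.polynomial import Polynomial

# 16x^2 - 32x^3 + 16x^4: a generic C^1 bump with compact support on [0, 1]
p = Polynomial([0.0, 0.0, 16.0, -32.0, 16.0])
dp = p.deriv()  # exact first derivative, again a polynomial

# value and slope both vanish at the ends of the support,
# as a separate cutoff function would otherwise have to ensure
print(p(0.0), p(1.0), dp(0.0), dp(1.0))  # 0.0 0.0 0.0 0.0
```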

V. CONCLUSION AND OUTLOOK
Here, we have introduced a new family of symmetry functions for Behler-Parrinello HDNNPs, based on polynomials with compact support for both radial and angular environments, which supersedes use of a separate set of radial cutoff functions. The centres and widths of our polynomial symmetry functions can be freely chosen. This is notably the case for angular functions, which have so far been restricted to peak at 0° or 180°. As long as symmetry of the derivatives at 0° and 180° is maintained, angular functions can be centred on any point within [0°, 180°] and their width Δϑ can be freely chosen. We have introduced two types of radial symmetry functions, PSF and PAS, with the former being point-symmetric on [r_s, r_s + Δ], whereas the latter have a long-range tail reminiscent of the product of a cutoff function and a Gaussian. Our polynomial symmetry functions considerably simplify the choice of radial symmetry function parameters with respect to common Gaussian functions, since the position of maxima and minima can be straightforwardly predicted without having to take into account shifts and asymmetries introduced by a cutoff function.
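The compact-support construction can be sketched as follows. The polynomial used here is a generic C^1 bump chosen purely for illustration, not the exact PSF definition from this work; it only demonstrates how value and first derivative vanish at both ends of a freely chosen support [r_s, r_s + Δ], making a separate cutoff function unnecessary:

```python
import numpy as np

def compact_poly(r, r_s, delta):
    """Generic compact-support bump on [r_s, r_s + delta] (illustrative)."""
    x = (r - r_s) / delta                 # map the support onto [0, 1]
    f = np.zeros_like(r, dtype=float)
    inside = (x > 0.0) & (x < 1.0)
    f[inside] = 16.0 * x[inside]**2 * (1.0 - x[inside])**2
    return f

r = np.linspace(0.0, 6.0, 601)
f = compact_poly(r, r_s=2.0, delta=2.0)
# peak of 1.0 at the centre of the support, identically zero outside it
print(f.max(), f[r <= 2.0].max(), f[r >= 4.0].max())
```

Because the maximum sits exactly at the centre of the chosen interval, the position of the peak is known by construction, in contrast to a Gaussian multiplied by a cutoff function.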
The accuracy of a PSF- and PAS-based setup with respect to conventional BPSFs was assessed for various phases of water, for solid Cu2S as well as for organic molecules in the gaseous and liquid phase. For water, Cu2S and DMABN, PSF, PAS and BPSF all showed excellent accuracy. Structural and dynamic properties obtained from 2 ns HDNNP-MD of liquid water revealed no relevant differences between the neural network potentials obtained with any of the three symmetry function types. In the case of off-equilibrium structures of a DMABN molecule in the gas phase, PAS and PSF outperformed BPSFs for force predictions by 15% and improved energy RMSDs by 50%. This difference was even more prominent for liquid ethyl benzene, where force RMSDs over the test set were more than 50% higher for BPSF than for PSF or PAS. Force prediction errors were generally higher for liquid anisole; however, they remained highly comparable between all types of SFs studied here. Generally, PSF performed on par with BPSF, except for ethyl benzene, where they performed considerably better. For all systems studied here, PAS consistently yielded the best results.
We have demonstrated that HDNNP-MD of liquid water is about 2 times faster using PSF or PAS compared to BPSF, constituting a considerable improvement in performance and hence, sampling. In terms of the floating point operations associated with the evaluation of SFs, we found that use of PSF results in speedups of a factor of about 5 with respect to BPSF, and speedups associated with the slightly more complex PAS still exceed a factor of 4.5. The number of symmetry functions used for the systems investigated here was comparable between PSF, PAS and BPSF, although results for the hand-picked set for water indicate that setups with PAS can potentially reduce the number of SFs used with respect to common BPSFs, therefore saving on memory and on the cost of training by reducing the number of fitting parameters. Given the substantial reduction in expensive floating point operations associated with the use of PSF and PAS, and given that the accuracy of generic sets of PAS is consistently better than that of BPSF, we advocate use of PAS as a method of choice for future implementations. In particular, their simple structure not only simplifies their implementation and makes their setup straightforward, but their small computational footprint can also facilitate their porting to different system architectures without the need for extensive optimisation.
Overall, we have shown that by constructing SFs with compact support, substantial performance gains of up to a factor of almost 5 can be obtained without impacting accuracy of the resulting HDNNPs. Polynomial SFs are easy to implement and set up, since they allow for a flexible choice of radial and angular domain and do not require any cutoff functions. Our polynomial symmetry functions therefore lend themselves to all applications of HDNNPs, be it for equilibrium or off-equilibrium structures of isolated molecules, liquids or solids.

A. Generation of Training Sets
All training sets first reported here were generated with a developmental version of the CPMD 1 code, using the SCAN 2 exchange-correlation functional and PBE 3 pseudopotentials of the Troullier-Martins type 4 . A plane wave cutoff energy of 80 Ry was used throughout, and calculations on isolated systems used the Tuckerman-Martyna Poisson solver 5 in order to decouple periodic images. In accordance with standard practice, hydrogen atoms in Car-Parrinello MD (CPMD) were replaced by deuterium, and the fictitious electronic mass was set to 600 a.u.; the wavefunction convergence threshold was set to 10^-7 a.u. on the maximum norm of the gradient. If not stated otherwise, Nosé-Hoover chains 6 were used to thermostat the systems.

Water and Copper Sulfide
Training sets for water were obtained from Ref. 7. Data for Cu2S corresponds to the set of Ref. 8.
HDNNP-MD of 384 water molecules was performed using the LAMMPS 9 program and the corresponding n2p2 interface. A timestep of 1 fs was used throughout. A pre-equilibrated snapshot of 384 liquid water molecules at experimental density (cubic supercell with a = b = c = 22.211 Å) was propagated in the NVT ensemble for a total of 2 ns at 300 K, followed by 2 ns of HDNNP-MD in the NVE ensemble.

DMABN
MD simulations of DMABN were performed at 990 K. 800 structures of this trajectory were selected for the training set. In order to obtain a non-equilibrium path along a rotation of the amino group, a harmonic restraint with k = 0.005 a.u. was set up between the dihedral planes of four carbons of the phenyl ring and a dihedral spanned by two carbons of the phenyl ring, the nitrogen as well as one carbon of the amino group. The value of the restraint was set to grow by 0.25° per 4 a.u. over a total of 9950 time steps. 398 regularly spaced structures were selected along this trajectory, with the training data containing a total of 1198 different structures.

Ethyl Benzene and Anisole
In order to obtain a diverse set of structures for these systems, Born-Oppenheimer MD (BOMD) simulations using a timestep of 10 a.u. were performed, with 16 molecules contained in cubic supercells at the experimental density (a = b = c = 14.8206 Å for ethyl benzene and a = b = c = 14.24 Å for anisole). BOMD of ethyl benzene was performed for 11750 steps in the NVT ensemble at 420 K, and 468 equispaced structures of the resulting trajectory were used to constitute the training set. BOMD of anisole was performed for 10640 steps in the NVE ensemble at 420 K, followed by 1725 steps at 1320 K in order to sample rotations of the methoxy group. 494 equispaced structures were selected for the training set.

B. Training of HDNNPs
All HDNNPs were trained with the n2p2 program package 10 . Symmetry functions that do not take values above 10^-3 were removed from the setups. Thresholds for force and energy updates were set to 100% of the given RMSD on the training set, with update candidates being randomly selected from structures where force and energy errors meet these criteria. The energy-to-force update ratio was 1:9. 10% of the structures in the training set were randomly discarded at the beginning and kept for validation of the resulting HDNNP (training vs. test set). Table I lists force and energy RMSDs obtained on training and test sets. Data refers to the best out of 10 training runs.

III. SYMMETRY FUNCTIONS
SF parameters for all systems studied here can be downloaded from the n2p2 10 repository: https://github.com/CompPhysVienna/n2p2.
Behler-type SF setups for water and Cu2S were first reported in Refs. 11 and 8, respectively.