Combining the D3 dispersion correction with the neuroevolution machine-learned potential

Machine-learned potentials (MLPs) have become a popular approach of modeling interatomic interactions in atomistic simulations, but to keep the computational cost under control, a relatively short cutoff must be imposed, which put serious restrictions on the capability of the MLPs for modeling relatively long-ranged dispersion interactions. In this paper, we propose to combine the neuroevolution potential (NEP) with the popular D3 correction to achieve a unified NEP-D3 model that can simultaneously model relatively short-ranged bonded interactions and relatively long-ranged dispersion interactions. We show that improved descriptions of the binding and sliding energies in bilayer graphene can be obtained by the NEP-D3 approach compared to the pure NEP approach. We implement the D3 part into the gpumd package such that it can be used out of the box for many exchange-correlation functionals. As a realistic application, we show that dispersion interactions result in approximately a 10% reduction in thermal conductivity for three typical metal-organic frameworks.


I. INTRODUCTION
Machine-learned potential (MLP) [1] is an emerging approach for modelling the interatomic interactions in materials.To achieve a linear scaling of the computational cost with respect to the system size, a MLP must be constructed based on local descriptors [2].The descriptor for an atom is usually constructed based on the positions of its neighbors within a certain cutoff R c .The average number of neighbors for an atom, hence the computational cost of a MLP, is proportional to R 3 c .Therefore, in practice, R c is usually chosen to be a few Å.This length is usually sufficient for describing the bond interactions in typical materials, but is not sufficient for describing the London dispersion interactions that can extend to one to a few nm.However, the dispersion interactions are important in describing e.g., the interlayer attractions in the so-called van-der-Waals (vdW) layered materials [3,4] and structural transformation between a narrow-pore and large-pore phase in flexible metalorganic frameworks (MOFs) [5,6].
To address this challenge, a few attempts have been made to augment MLPs with dispersion corrections.Wen and Tadmor [7] added an attractive −C 6 /r 6 ij term multiplied by some switching/damping functions to a MLP, where r ij is the distance between atoms i and j and C 6 > 0 is a fitting parameter.There are four additional fitting parameters in the switching functions [7].They have obtained quite good description for the binding and sliding energies in bilayer graphene.By adding a similar −C 6 /r 6 ij term, Deringer et al. [8] and Rowe et al. [9] constructed general-purpose MLPs for phosphorous and carbon systems.Muhli et al. [10] developed a more sophisticated method that determines the dispersion coefficient and damping function using a local descriptor.
Despite these achievements, it still requires quite a lot efforts to determine the dispersion interactions for a MLP model.The determination of the dispersion coefficient and the damping function is species and system dependent, and the process is thus not very systematical and transferable.Indeed, it is known that the dispersion coefficient is environment dependent, which has been taken into account in many popular dispersion corrections to density functional theory (DFT), such as the D3 approach [11,12].To our best knowledge, the D3 approach has not been combined with MLPs to perform large-scale atomitstic simulations.One reason is that there is so far no efficient implementation of D3 for the use with classical potentials.In this work, we make an efficient graphics processing units (GPU) implementation of D3 into the gpumd package [13] and combine it with the machinelearned neuroevolution potential (NEP) [14][15][16] to form a NEP-D3 approach.This approach inherits all the merits of D3 and applies to the 94 elements H-Pu for a large number of DFT functionals.Due to the separability of the NEP and D3 parts in our approach, it also allows for isolating the role of dispersion interactions in affecting specific physical properties.We will use two example systems, bilayer graphene and MOFs, to demonstrate the convenience, accuracy, efficiency, generality, and usefulness of the NEP-D3 approach.
NEP is a neural-network-based MLP which got its name due the use of an evolutionary algorithm for training the free parameters [14][15][16].In this method, the total energy of a system is taken as the sum of the site energies at each atom, which is taken as the output of a neuralnetwork, as first proposed by Behler and Parrinello [17].The input layer of the neural-network consists of a number of features called descriptor components.The design of descriptor components is crucial for a MLP and different MLPs presently in use differ from each other mainly arXiv:2310.05279v1[cond-mat.mtrl-sci]8 Oct 2023 by the descriptor.In the very first version of NEP [14], it has been realized to be beneficial to have two different kinds of descriptor components, the radial ones and the angular ones.The radial descriptor components depend only the interatomic distances, which are relatively cheaper to calculate, while the angular descriptor components also depend on bond angles and are relatively more expensive to calculate.Therefore, it has been suggested to use a relatively longer cutoff for the radial descriptor components r R c in combination with a relatively shorter cutoff for the angular descriptor components r A c when this is beneficial [14].For example, this strategy has been used for modelling a number of vdW structures [18,19] that have both strong covalent bonds and weak vdW interactions, where r R c was taken to be about 7 Å to 8 Å and r A c 3 Å to 4 Å.While constructing a pure NEP that incorporates dispersion interactions is feasible, the primary objective of this work is to introduce the NEP-D3 method that addresses dispersion interactions in a more elegant manner.We will compare results from the two approaches in terms of both accuracy and computational efficiency.

A. Implementation of the D3 dispersion correction into GPUMD
The D3 dispersion correction has contributions from a "two-body" (the meaning of the quotes will be made clear soon) part and a three-body part.However, it has been recommended not to include the three-body part [11].We therefore only considered the two-body part, which is also the choice of our reference DFT code Vienna Ab initio Simulation Package (vasp) [20,21].We chose to use the Becke-Johnson damping which does not introduce spurious force at short distances [12].The total D3 energy can be expressed as where r ij is the distance between atoms i and j, s 6 , s 8 , a 1 , and a 2 are parameters depending on the chosen exchange-correlation functional, C 6ij and C 8ij are the dispersion coefficients for the ij atom pair, and R 0 is taken as the geometric mean of tabulated parameters for each sepecies.The summation of j is over the neighbors of i within a cutoff R pot .
The two dispersion coefficients are related by , (3) where the summations of a and b are over the numbers of reference points for atoms i and j, respectively.Here, n ref ia is the a-th reference coordination number for atom i, n ref jb is the b-th reference coordination number for atom j, and C ref 6ijab is the (a, b) reference dispersion coefficient for the atom pair (i, j).
The coordination number for atom i is defined as where R cov i is the effective covalent radius of atom i.The summation of j is over the neighbors of i within a cutoff R cn .Because the coordination number of atom i depends on its neighbors, it is clear to see that the "two-body" part of D3 is not a truly two-body (pairwise) potential, but a many-body potential.This is the reason for using the quotes.For a many-body potential, the general formulation for force, virial, and heat current has been established before [22,23].The force acting on atom i can be written as where and r ij ≡ r j − r i .The per-atom virial can be defined as and the per-atom heat current can be expressed as where v i is the velocity of atom i.The efficient GPU implementation of D3 then follows the general algorithms for many-body potentials [13].From a practical point of view, the use of D3 only requires three inputs: the exchange-correlation functional, the cutoff for the potential R pot , and the cutoff for the coordination number R cn .
The combined NEP-D3 potential is simply a sum of D3 and NEP energies, where U NEP is the NEP energy as detailed in previous works [14][15][16].The NEP model here can also be replaced by the NEP-ZBL model [24] where the Ziegler-Biersack-Littmark (ZBL) potential is used to describe the strong repulsive forces at extremely short distances.
To confirm the correctness of our GPU implementation of D3 in gpumd and to evaluate the effects of cutoffs, we take three MOF materials (MOF-5, ZIF-8, and HKUST-1) as studied before using NEP models [25] and compare the calculated forces to those from vasp (using the IVDW=12 option).The results are shown in Fig. 1.The vasp code uses R pot = 50.2Å and R cn = 20 Å as defaults.With R pot = 12 Å and R cn = 6 Å, the forces calculated from gpumd already agree quite well with those from vasp.With increasing cutoff, the root mean square errors (RMSEs) between gpumd and vasp implementations, eventually diminish.While D3 is almost free in DFT calculations, it can take a considerable portion of time for MLPs, which is particularly true for the highly efficient NEP approach.Therefore, a trike between accuracy and efficiency must be made in selecting the D3 cutoffs in NEP-D3.We used R pot = 12 Å and R cn = 6 Å for all the subsequent calculations with gpumd.from molecular dynamics (MD) simulations (from 300 K to 1500 K) driven by the second-generation reactive empirical bond-order potential [26] in combination with the registry-dependent interlayer potential [27].For all the structures, we performed single-point DFT calculations as details in Appendix A to obtain energy, force, and virial data that are needed for NEP training.Two different kinds of reference data sets were considered: one based on the PBE functional [28] and the other based on PBE combined with D3 [12].We labeled them as PBE and PBE-D3 data sets, respectively.TABLE I.The RMSEs for energy (meV/atom), force (meV/ Å), and virial (meV/atom) of the NEP models versus the PBE-D3 reference and the computational speeds (atomstep/second) in MD simulations of bilayer graphene with 40 000 atoms using one GeForce RTX 4090 GPU card.
The performances of the four models are compared in Table I.The NEP(4.5 Å, 4.5 Å)-D3 model has the highest accuracy in terms of energy, force, and virial.However, it has the lowest computational speed.The D3 part takes about 75% of the computational cost in the NEP(4.5 Å, 4.5 Å)-D3 model.This reflects the high computational cost of D3 and also the high computational efficiency of NEP.Indeed, the NEP approach has been shown to be far more computationally efficient than other state-ofthe-art MLPs [16].Because the dispersion interactions are weak forces, the RMSEs are not the best indicators for evaluating the accuracy.To better appreciate the higher accuracy brought about by the NEP-D3 approach compared to the pure NEP approach, we examine the binding and sliding energies in detail below.
Fig. 3 shows the binding energy curves from the various calculation methods.The binding energies in bilayer graphene cannot be well captured by the PBE functional, which is essentially zero at an interlayer distance of 4.5 Å, and this is the reason why a cutoff of 4.5 Å is sufficient for NEP to fit the PBE data set (Fig. 3(a)).By adding D3 to PBE, the binding energies can be nicely captured and the results from D3 is very close to those from the many-body dispersion (MBD) [29] (Fig. 3(b)), which is one of the most accurate dispersion corrections presently available.By adding D3 to NEP(4.5 Å, 4.5 Å), the resulting model, NEP(4.5 Å, 4.5 Å)-D3 is very close to PBE-D3 (Fig. 3(c)), and most of the errors come from the NEP part, with small extra errors from the truncation of the D3 part to R pot = 12 Å and R cn = 6 Å.Without adding up D3, the pure NEP models with a relatively large radial cutoff can also partially capture the binding energies, but the curves are not as smooth as that from NEP-D3 (Fig. 3(d)-(f)).The best results were obtained with a radial cutoff of 8 Å, which means that increasing the radial cutoff in NEP is not a feasible way to improve the accuracy regarding the vdW interactions.To sum up, the NEP(4.5 Å, 4.5 Å)-D3 model outperform all the pure NEP models for describing the binding energies in bilayer graphene.It is expected that similar conclusions can be drawn for dispersion-dominated binding energies in layered materials.
In contrast to the binding energies, the sliding energies of bilayer graphene with equilibrium interlayer spacing (about 3.4 Å) are not dominated by the D3 part.Therefore, as shown in Fig. 4(a), results from NEP(4.5 Å, 4.5 Å) are already very close to those from PBE-D3 and adding D3 does not make significant changes.However, the resulting NEP(4.5 Å, 4.5 Å)-D3 model significantly outperforms the pure NEP models with larger radial cutoffs (Fig. 4(b)).One possible reason is that the long radial cutoff in a pure NEP model introduces extra features that are not needed for describing the sliding energies and thus complicates the training process.Using too large a cutoff than needed, the construction of a MLP becomes more demanding since a large configuration space has to be explored and more descriptor components are needed to distinguish the atom environments [30].Indeed, among the three pure-NEP models, NEP(6 Å, 4.5 Å) performs the best and NEP(10 Å, 4.5 Å)

C. Applications to heat transport in MOFs
After confirming the reliability of the combined NEP-D3 approach, we show its usefulness in practical MD simulations.In a previous work [25], some of the present authors have studied heat transport in three typical MOFs (MOF-5, ZIF-8, and HKUST-1) using MD simulations with NEP models.The reference DFT data used for training these NEP models have no dispersion corrections.Because of the porous structures in MOFs, vdW interactions between the organic chains are expected to have noticeable effects in the structural and dynamic properties.Here we study the effects of dispersion interactions on the thermal conductivity in the MOFs.Fig. 5 compares the thermal transport results from homogeneous non-equilibrium molecular dynamics (HNEMD) simulations [31] at 300 K using the previously constructed pure NEP models [25] and the combined NEP-D3 models obtained by adding up D3 (R pot = 12 Å, R cn = 6 Å).The MD simulation details are consistent with the previous work [25] and the relevant inputs are given in Appendix C. For all the MOFs, the introduction of dispersion interactions consistently reduces the thermal conductivity, and the average amount of reduction being about 10% (Fig. 5(a)).In the presence of dispersion interactions, the phonon mean free paths of the lowfrequency phonons (ω/2π < 1 THz) are reduced to some degree, but are still at the sub-micron scale (Fig. 5(b)).More importantly, the ∼ 10% reduction of the thermal conductivity in MOF-5 still leaves the discrepancy between the calculated (0.57±0.02W/(m K)) and measured (0.32 W/(m K)) results unresolved.

III. SUMMARY AND CONCLUSIONS
In summary, we have made a GPU implementation of the D3 dispersion correction into the gpumd package and enabled its integration with NEP to form a combined NEP-D3 model.We demonstrated the superior accuracy of the NEP-D3 approach than the pure NEP approach by using bilayer graphene systems as an example, for which the dispersion interactions between the two layers play an important role for the binding energies.Although the dispersion interactions are not responsible for the sliding energies between the two layers, the presence of D3 in NEP-D3 allows for using a relatively short cutoff in the NEP part, which indirectly leads to better description of the sliding energies by the NEP part.The D3 dispersion correction we implemented can be readily added to available NEP models that have been trained against reference data without considering dispersion correction.As an example, we showed that adding D3 dispersion correction to the previous NEP models for MOFs [25]

FIG. 3 .
FIG.3.Binding energy of AB-stacked bilayer graphene as a function of the interlayer distance computed from various approaches as indicated in each panel.The RMSE between the two curves in each panel is indicated.See text for details.

FIG. 4 .
FIG. 4. Sliding energies along the AA-AB-SP path of bilayer graphene with a interlayer spacing of 3.4 Å computed from various approaches.Atoms from the top and bottom layers have different colors.
FIG. 5. (a) Thermal conductivity at 300 K as a function of simulation time and (b) spectral phonon mean free path as a function of phonon frequency for MOF-5 (top), ZIF-8 (middle), and HKUST-1 (bottom).The converged thermal conductivity values with error estimates (from five independent runs) obtained from NEP and NEP-D3 are indicated.
leads to about 10% reduced thermal conductivity.doctoral Researchers.Z. Fan was supported by National Natural Science Foundation of China (Project No. 11404033).