A KIM-compliant potfit for fitting sloppy interatomic potentials: application to the EDIP model for silicon

Mingjian Wen; Junhao Li; Peter Brommer; Ryan S Elliott; James P Sethna; Ellad B Tadmor

doi:10.1088/0965-0393/25/1/014001

1. Introduction

An interatomic potential (IP) is a model for approximating the quantum-mechanical interaction of electrons and nuclei in a material through a parameterized functional form that depends only on the positions of the nuclei. IPs such as the Lennard–Jones [1, 2] and Morse [3] potentials were initially introduced as part of theoretical studies of material behavior in the early 20th century. Interest in IPs renewed in the 1960s following the development of Monte Carlo and molecular dynamics (MD) simulation methods. This led to the development of IPs for a variety of material systems, a trend that accelerated in the 1980s with the formulation of a large number of increasingly complex IPs. (See [4] for more on the history and scientific development of IPs and MD.) Even complex IPs are far less costly to compute than a first principles approach that involves solving the Schrödinger equation for the quantum system. This enables simulations of large systems (millions, billions and even trillions of atoms depending on the IP) for very short times or moderate-size systems for longer times (tens or hundreds of nanoseconds). Such simulations can tackle problems that are inaccessible to quantum calculations, such as plastic deformation, fracture, atomic diffusion, and phase transformations [5].

Obtaining an accurate and reliable IP constitutes a challenging problem that involves choosing an appropriate functional form and adjusting the parameters to optimally reproduce a relevant training set of experimental and/or quantum data. Traditionally IPs were fitted to reproduce a set of material properties considered important for a given application, such as the cohesive energy, equilibrium lattice constant, and elastic constants of a given crystal phase. However experience has shown that the transferability of such IPs (i.e. their ability to accurately predict behavior that they were not fitted to reproduce) can be limited due to the small number of atomic configurations sampled in the training set (although recent work [6] has shown that this approach can be effective in some cases). Further, as the complexity of IPs increases (both in terms of the functional forms and the number of parameters) it can be difficult to obtain a sufficient number of material properties for the training set. This is particularly true for multispecies systems like intermetallic alloys.

To address these difficulties, Ercolessi and Adams [7] proposed the force-matching method in which a training set containing material properties is augmented with the forces on atoms obtained by first principles calculations for a set of atomic configurations. These can be configurations associated with important structures or simply random snapshots of the crystal as the atoms oscillate at finite temperature. By fitting to this information the transferability of the potential is likely enhanced since it is exposed to a larger cross-section of configuration space. The issue of insufficient training data is also resolved since as many configurations as needed can be easily generated. This makes it possible to increase the number of parameters and in fact in the original Ercolessi–Adams potential for aluminum [7] the functional forms were taken to be cubic splines with the spline knots serving as parameters. This gives maximum freedom to the fitting process (at the expense of a clear physical motivation for the functional form). The force-matching method has been widely used since its introduction and there are a number of open source implementations available. The ForceFit [8] and ForceBalance [9] packages target organic-chemistry applications, while potfit [10–12] focuses on solid-state physics.

IP fitting programs (such as those mentioned above) define a 'cost function' quantifying the difference between the training set data and IP predictions and use global and/or local minimization algorithms to reduce the cost as much as possible. For example, the potfit program uses simulated annealing for global minimization followed by a local polish using a conjugate gradient (CG) method. A difficulty associated with this procedure is that IPs are nonlinear functions that are often 'sloppy' in the sense that their predictions are insensitive to certain combinations of their parameters [13]. These soft modes in parameter space can cause the minimization algorithms to fail to converge. Recently an understanding of sloppy models based on ideas from differential geometry has led to efficient methods for fitting such models [14]. The basic idea is that the parameters of a model (like an IP) define a manifold in the space of data predictions. Fitting the IP then corresponds to finding the point on the manifold closest to the training set. Using these ideas Transtrum et al [14] augmented the Levenberg–Marquardt (LM) algorithm with a geodesic acceleration adjustment to improve convergence. The new geodesic LM algorithm is likely to be more efficient and more likely to converge for sloppy model systems than conventional approaches. Geodesic LM has been applied to a variety of applications in physics and biology, but until now not to the IP fitting problem.

Once an IP is developed it must be integrated into an MD code (or other atomic-scale simulation package). Traditionally this was done on a code-by-code basis. This creates a barrier for the universal usability of IPs. Users are typically limited by the IPs that happen to be available in the package that they choose. The same applies to IP fitting programs, which support a limited set of functional forms that they can fit. Extending codes to support new IPs is a time consuming and error-prone process. It also leads to an inevitable uncontrolled mutation of IPs as modifications to functional forms and/or parameters are introduced (often without adequate documentation). More recently, emphases have been placed on archival storage of IPs within projects such as the NIST Interatomic Potential Repository (IPR) [15] and the Open Knowledgebase of Interatomic Models (KIM) project [16–18]. The NIST IPR stores tabulated parameters for a limited number of functional forms (mostly the embedded-atom method (EAM) [19–21]). OpenKIM hosted at https://openkim.org is a more general effort in which the IP implementation itself is stored in a format that conforms to an application programming interface (API) standard developed as part of the KIM project [22]. The KIM API enables any IP stored in KIM to work seamlessly with any KIM-compliant simulation code including ASAP [23], DL_POLY [24, 25], GULP [26, 27], IMD [28–30], LAMMPS [31, 32], libAtoms+QUIP [33] and Quasicontinuum [34]. In addition IPs stored in OpenKIM are exhaustively tested by computing their prediction for a large number of material properties. These calculations are performed by 'KIM Tests' also archived in OpenKIM. Users can extend the OpenKIM system by uploading their own tests. The predictions of the IPs can be explored using text-based searches and a user-extendible visualization system [35].

In this paper we describe a KIM-compliant implementation of the potfit program designed for fitting sloppy IPs. The work has two main objectives. First, we extend the potfit program to support the KIM API making it possible to fit IPs with arbitrary functional forms that are portable to a large number of KIM-compliant simulation codes. Second, we explore the efficiency of the geodesic LM algorithm for fitting IPs using the force-matching method within the potfit framework. We test the new implementation on a KIM-compliant implementation of the environment-dependent interatomic potential (EDIP) for silicon [36–38]. We use the EDIP model to generate a training set of force configurations and explore the ability of potfit to converge back to the original EDIP parameters from different initial guesses. We find that EDIP has the properties of a sloppy model and that the geodesic LM algorithm can converge to the correct results in situations where other minimizers in potfit fail.

The paper is organized as follows. In sections 2 and 3, we briefly describe the potfit program and the KIM project, respectively. This is followed by a discussion of the modifications that have been made to potfit in order to support the KIM API and the geodesic LM algorithm. In section 5, we present the application of the extended potfit to the EDIP model. We conclude in section 6 with a summary and a discussion of future directions.

2. The potfit program for fitting IPs

The potfit program [10–12] (http://www.potfit.net) is an open source implementation of the force-matching method. It optimizes an IP's parameters by minimizing a cost function defined as the weighted sum of squares of deviations between calculated and reference quantities for forces, energies and/or stresses. (The particular form of the cost function used in this paper is described in section 5.) potfit consists of two largely independent components: IP implementations, and least-squares minimization. This program architecture makes it comparatively easy to add either an IP model or a minimization algorithm, which facilitated both aspects of the current work.

potfit supports both analytic and tabulated IPs. An analytic IP has a fixed functional form with certain free parameters. Consider for example the simplest pair potential form where the total energy of n interacting atoms is given by

$\begin{eqnarray*}&&V=\displaystyle \frac{1}{2}\displaystyle \sum _{\alpha =1}^{n}\displaystyle \sum _{\beta =1}^{n}\phi ({r}_{\alpha \beta }),\end{eqnarray*}$

in which ${r}_{\alpha \beta }$ is the distance between atoms α and β, and $\phi ({r}_{\alpha \beta })$ is the energy in the bond connecting them. An example of an analytic function, is the Lennard–Jones potential for which

$\begin{eqnarray*}&&\phi (r)=4\epsilon \left[{\left(\displaystyle \frac{\sigma }{r}\right)}^{12}-{\left(\displaystyle \frac{\sigma }{r}\right)}^{6}\right],\end{eqnarray*}$

where $\epsilon$ and σ are the IP parameters. In contrast, for a tabulated IP, the functional form is arbitrarily defined as an interpolation between sampling points that serve as the IP parameters. For example, for a cubic spline, given a set of distances $({r}_{1},{r}_{2},\ldots ,{r}_{N})$ and a corresponding set of function values $({\phi }_{1},{\phi }_{2},\ldots ,{\phi }_{N})$ , the functional form in the interval $[{r}_{i},{r}_{i+1}]$ is defined as

$\begin{eqnarray*}&&\phi {(r)={a}_{i}{(r-{r}_{i})}^{3}+{b}_{i}(r-{r}_{i})}^{2}+{c}_{i}(r-{r}_{i})+{d}_{i},\end{eqnarray*}$

where the coefficients $({a}_{i},{b}_{i},{c}_{i},{d}_{i})$ are obtained from the conditions that $\phi ({r}_{i})={\phi }_{i}$ and $\phi ({r}_{i+1})={\phi }_{i+1}$ (with different conditions at the ends that define the type of cubic spline). In this scenario the tabulated function values $({\phi }_{1},{\phi }_{2},\ldots ,{\phi }_{N})$ are the IP parameters. The form of interpolation used (cubic, quartic, quintic, or others) can have a strong influence on the predictions of the IP [39]. This is why the interpolation is considered part of the IP by OpenKIM (see section 3) and stored along with the parameters. The potfit program supports various functional forms including a variety of pair potentials, EAM potentials [19–21], angular-dependent potentials [40], bond-order potentials (Tersoff potential [41, 42]), and induced dipole potentials (Tangney–Scandolo [43]).

Once an IP is selected, different local and global minimization algorithms are available within potfit to find the optimal set of parameters that minimize the cost function. The local minimization algorithm is Powell's method [44], which is a variant of the CG method. Standard CG requires the derivative of the cost function with respect to the IP parameters. The advantage of Powell's method is that it constructs conjugate directions without requiring derivatives. Powell's algorithm begins with an initial guess for the IP parameters and converges to a nearby minimum. For highly nonlinear IPs it is possible that different initial guesses will lead to different solutions. To address this, global minimization algorithms attempt to explore the parameter space to find the deepest minimum. potfit includes simulated annealing and differential evolution in this class of methods. See the original potfit publications [10–12] for detailed descriptions of the optimization algorithms.

3. The Open Knowledgebase of Interatomic Models project

The KIM project [16–18] is an international effort aimed at improving the reliability and accuracy of molecular simulations using IPs. A schematic of the OpenKIM cyberinfrastructure is displayed in figure 1. First and foremost OpenKIM provides a repository for IPs at https://openkim.org. Within the OpenKIM system IPs are referred to more generically as models. A KIM model consists of the computer implementation of the IP along with any parameters. A KIM IP can be a stand-alone model, or a model driver that reads in different parameter files to define different models. All content in the OpenKIM system is archived subject to strict provenance control and can be cited using permanent links. For example, later in this article we will be analyzing the EDIP model for silicon [36–38] archived in OpenKIM [16, 45, 46]. The citations contain a unique 12-digit KIM identifier and 3-digit version number. This makes it possible to access the actual IP used in this publication at any later date to reproduce the calculations—an ability lacking prior to OpenKIM archiving. All content within the OpenKIM repository is citeable in this manner and is accessible to external users through web queries.

**Figure 1.** OpenKIM framework.
Download figure:
Standard image High-resolution image

In addition to archiving IPs, OpenKIM is tasked with determining the suitability of an IP for a given application. Since IPs are approximate, their transferability to phenomena they were not fitted to reproduce is limited. To help define a rigorous method for IP selection, OpenKIM includes a user-extendible suite of KIM tests for characterizing an IP. Each KIM test computes the prediction of an IP for a well-defined material property (such as the surface free energy for a given crystal face under specified temperature and pressure). Each IP in the system is mated with all compatible KIM tests using an automated system called the Processing Pipeline [47]. Whenever a new model or test is uploaded to OpenKIM by members of the community, the pipeline launches to perform the new calculations. The results are stored in the OpenKIM repository and can be explored and compared with first principles and experimental reference data using text-based and graphical visualization tools [35]. Machine learning based tools are being developed to use the accumulated information on the accuracy of an IP's predictions to help select for applications of choice [48].

The operation of the pipeline involves the coupling of pairs of computer codes, model and test, which may be written in different computer languages and need to exchange information. Specifically, the test repeatedly calls the model with new input (typically positions and species of atoms) and requires the generated output (typically energy and forces on atoms). This is made possible because all codes in OpenKIM conform to the KIM API [22]. This is a standard developed as part of the KIM project for atomistic simulation codes to exchange information. The KIM API is a lightweight efficient interface where only pointers are exchanged between the interfacing codes, which can be written in any supported language (C, C++, Fortran 77/90/95/2003, Python). A number of major simulation codes currently support the KIM API (listed in section 1). Anyone of these codes can work seamlessly (without having to be recompiled) with any KIM model stored on the same computer as a dynamically linked library. (This is akin to a CD being placed in a player from any manufacturer and having its content played.) This greatly increases the portability of IPs. It allows researchers to select the appropriate IP for their application and use it with their code of choice, rather than being limited to the IPs that happen to come with it.

4. Extensions to potfit: geodesic LM and KIM-compliance

An extended version of the potfit code has been developed that includes a geodesic LM optimizer and support for the KIM API. The new code is available via https://www.potfit.net. Details of the extensions are given below.

4.1. Geodesic LM algorithm

As explained in the introduction, IPs often have the characteristics of sloppy models. This can cause the standard minimization algorithms in potfit to fail to converge. As an alternative we have incorporated into potfit the newly developed geodesic LM algorithm [14, 49–51]. Geodesic LM is an efficient method offering a higher likelihood of convergence for sloppy models. The performance of the geodesic LM method is discussed in section 5.4. The method is described next.

The least-squares optimization problem is to find the parameters ${\boldsymbol{\theta }}=\{{\theta }_{1},\ldots ,{\theta }_{N}\}$ that minimize the cost function $C:{{\mathbb{R}}}^{N}\mapsto {\mathbb{R}}$ :

$\begin{eqnarray}&&C({\boldsymbol{\theta }})=\displaystyle \frac{1}{2}\displaystyle \sum _{m=1}^{M}{({r}_{m}({\boldsymbol{\theta }}))}^{2}=\displaystyle \frac{1}{2}\parallel {\boldsymbol{r}}({\boldsymbol{\theta }}){\parallel }^{2}=\displaystyle \frac{1}{2}{\boldsymbol{r}}{({\boldsymbol{\theta }})}^{{\rm{T}}}{\boldsymbol{r}}({\boldsymbol{\theta }}),\end{eqnarray} \tag{ 1 }$

where the residual ${\boldsymbol{r}}\,:{{\mathbb{R}}}^{N}\mapsto {{\mathbb{R}}}^{M}$ is an M-dimensional vector function of N parameters. The gradient of C is

$\begin{eqnarray}&&\displaystyle \frac{\partial C}{\partial {\theta }_{i}}({\boldsymbol{\theta }})=\displaystyle \sum _{m=1}^{M}{r}_{m}\displaystyle \frac{\partial {r}_{m}}{\partial {\theta }_{i}}\quad \iff \quad \displaystyle \frac{\partial C}{\partial {\boldsymbol{\theta }}}({\boldsymbol{\theta }})={{\boldsymbol{J}}}^{{\rm{T}}}{\boldsymbol{r}},\end{eqnarray} \tag{ 2 }$

where ${\boldsymbol{J}}$ is the Jacobian of ${\boldsymbol{r}}$ , and the Hessian of C is,

$\begin{eqnarray}&&\displaystyle \frac{{\partial }^{2}C}{\partial {\theta }_{i}{\theta }_{j}}({\boldsymbol{\theta }})=\displaystyle \sum _{m=1}^{M}\left(\displaystyle \frac{\partial {r}_{m}}{\partial {\theta }_{i}}\displaystyle \frac{\partial {r}_{m}}{\partial {\theta }_{j}}+{r}_{m}\displaystyle \frac{{\partial }^{2}{r}_{m}}{\partial {\theta }_{i}{\theta }_{j}}\right)\quad \iff \quad \displaystyle \frac{{\partial }^{2}C}{\partial {\boldsymbol{\theta }}\partial {\boldsymbol{\theta }}}({\boldsymbol{\theta }})={{\boldsymbol{J}}}^{{\rm{T}}}{\boldsymbol{J}}+{{\boldsymbol{r}}}^{T}{\boldsymbol{K}},\end{eqnarray} \tag{ 3 }$

where ${\boldsymbol{K}}$ is the second derivative of ${\boldsymbol{r}}$ with respect to ${\boldsymbol{\theta }}$ (a third-order matrix).

Before introducing geodesic LM, let us consider the more basic Gauss–Newton method. This approach is based on a linear approximation of ${\boldsymbol{r}}$ [52],

$\begin{eqnarray}&&{\boldsymbol{r}}({\boldsymbol{\theta }}+\delta {\boldsymbol{\theta }})\approx {\boldsymbol{r}}({\boldsymbol{\theta }})+{\boldsymbol{J}}({\boldsymbol{\theta }})\delta {\boldsymbol{\theta }}.\end{eqnarray} \tag{ 4 }$

Substituting (4) into (1) we see that,

$\begin{eqnarray}C({\boldsymbol{\theta }}+\delta {\boldsymbol{\theta }}) & = & \displaystyle \frac{1}{2}{\boldsymbol{r}}{({\boldsymbol{\theta }}+\delta {\boldsymbol{\theta }})}^{{\rm{T}}}{\boldsymbol{r}}({\boldsymbol{\theta }}+\delta {\boldsymbol{\theta }})\\ & = & \displaystyle \frac{1}{2}{({\boldsymbol{r}}+{\boldsymbol{J}}\delta {\boldsymbol{\theta }})}^{{\rm{T}}}({\boldsymbol{r}}+{\boldsymbol{J}}\delta {\boldsymbol{\theta }})\\ & = & C({\boldsymbol{\theta }})+\delta {{\boldsymbol{\theta }}}^{{\rm{T}}}{{\boldsymbol{J}}}^{{\rm{T}}}{\boldsymbol{r}}+\displaystyle \frac{1}{2}\delta {{\boldsymbol{\theta }}}^{{\rm{T}}}{{\boldsymbol{J}}}^{{\rm{T}}}{\boldsymbol{J}}\delta {\boldsymbol{\theta }}.\end{eqnarray} \tag{ 5 }$

Taking the first derivative of (5) with respect to $\delta {\boldsymbol{\theta }}$ and setting it to zero, we have,

$\begin{eqnarray}&&\delta {\boldsymbol{\theta }}=-{({{\boldsymbol{J}}}^{{\rm{T}}}{\boldsymbol{J}})}^{-1}{{\boldsymbol{J}}}^{{\rm{T}}}{\boldsymbol{r}}.\end{eqnarray} \tag{ 6 }$

In (6) $\delta {\boldsymbol{\theta }}$ is a local decent direction of C, thus we can obtain the local minimum of C by solving (6) iteratively. This is the Gauss–Newton method.

Levenberg [53] suggested a 'damped version' of (6) where the parameters are iteratively updated according to

$\begin{eqnarray}&&\delta {\boldsymbol{\theta }}=-{({{\boldsymbol{J}}}^{{\rm{T}}}{\boldsymbol{J}}+\lambda {\boldsymbol{D}})}^{-1}{{\boldsymbol{J}}}^{{\rm{T}}}{\boldsymbol{r}},\quad \lambda \geqslant 0.\end{eqnarray} \tag{ 7 }$

Here λ is a (non-negative) damping parameter, and ${\boldsymbol{D}}={\boldsymbol{I}}$ is the identity. The Levenberg method is an interpolation between the Gauss–Newton algorithm and the steepest decent algorithm. For small λ, (7) reduces to the Gauss–Newton equation in (6); whereas for large λ, $\delta {\boldsymbol{\theta }}\approx -{\lambda }^{-1}{{\boldsymbol{J}}}^{{\rm{T}}}{\boldsymbol{r}}$ , which lies along the gradient (i.e. the steepest descent direction) in (2).

In order to overcome slow convergence in directions of small gradients, Marquardt [54] proposed taking ${\boldsymbol{D}}=\mathrm{diag}({{\boldsymbol{J}}}^{{\rm{T}}}{\boldsymbol{J}})$ , where $\mathrm{diag}({\boldsymbol{A}})$ returns a diagonal matrix with elements equal to the diagonal elements of the square matrix ( ${\boldsymbol{A}}$ ). The downside of this approach is that it greatly increases the susceptibility for parameter evaporation [50], i.e. the algorithm pushes the parameters to infinite values without finding a good fit around saddle points.

To improve the efficiency of the LM method, Transtrum and Sethna [14, 49–51] proposed the geodesic LM algorithm in which the minimization step size is constrained and the residual is modified to include a second-order (geodesic acceleration) correction,

$\begin{eqnarray}&&{\boldsymbol{r}}({\boldsymbol{\theta }}+\delta {\boldsymbol{\theta }})\approx {\boldsymbol{r}}({\boldsymbol{\theta }})+{\boldsymbol{J}}({\boldsymbol{\theta }})\delta {\boldsymbol{\theta }}+\displaystyle \frac{1}{2}\delta {{\boldsymbol{\theta }}}^{{\rm{T}}}{\boldsymbol{K}}({\boldsymbol{\theta }})\delta {\boldsymbol{\theta }}.\end{eqnarray} \tag{ 8 }$

The cost function at ${\boldsymbol{\theta }}+\delta {\boldsymbol{\theta }}$ is then

$\begin{eqnarray}&&C({\boldsymbol{\theta }}+\delta {\boldsymbol{\theta }})=\displaystyle \frac{1}{2}{\boldsymbol{r}}{({\boldsymbol{\theta }}+\delta {\boldsymbol{\theta }})}^{{\rm{T}}}{\boldsymbol{r}}({\boldsymbol{\theta }}+\delta {\boldsymbol{\theta }})+\displaystyle \frac{1}{2}\lambda \delta {{\boldsymbol{\theta }}}^{{\rm{T}}}{\boldsymbol{D}}\delta {\boldsymbol{\theta }},\end{eqnarray} \tag{ 9 }$

where a Lagrange multiplier is introduced to enforce the constraint on the step size $\delta {\boldsymbol{\theta }}$ . Taking the first derivative of (9) with respect to $\delta {\boldsymbol{\theta }}$ and setting it to zero, we have

$\begin{eqnarray}&&{J}_{{mi}}{r}_{m}+({J}_{{mi}}{J}_{{mj}}+{r}_{m}{K}_{{mij}}+\lambda {D}_{{ij}})\delta {\theta }_{j}+\left({J}_{{mk}}{K}_{{mij}}+\displaystyle \frac{1}{2}{J}_{{mi}}{K}_{{mkj}}\right)\delta {\theta }_{j}\delta {\theta }_{k}=0,\end{eqnarray} \tag{ 10 }$

where the indices are included explicitly to avoid ambiguity and Einstein's summation convention over repeated indices is adopted. The step size $\delta {\theta }_{i}$ is written as a sum of two parts:

$\begin{eqnarray}&&\delta {\theta }_{i}=\delta {\theta }_{i}^{(1)}+\delta {\theta }_{i}^{(2)},\end{eqnarray} \tag{ 11 }$

where $\delta {\theta }_{i}^{(1)}$ is given by (7) and $\delta {\theta }_{i}^{(2)}$ includes the remaining terms. Substituting (11) and (7) into (10), $\delta {\theta }_{i}^{(2)}$ can be solved as,

$\begin{eqnarray}\delta {\theta }_{i}^{(2)} & = & -\displaystyle \frac{1}{2}{({J}_{{mi}}{J}_{{mj}}+\lambda {D}_{{ij}})}^{-1}{J}_{{pj}}{K}_{{pkl}}\delta {\theta }_{k}^{(1)}\delta {\theta }_{l}^{(1)}\\ & = & -\displaystyle \frac{1}{2}{({J}_{{mi}}{J}_{{mj}}+\lambda {D}_{{ij}})}^{-1}{r}_{p}^{\prime\prime },\end{eqnarray} \tag{ 12 }$

where some small terms are ignored (see [50] for details) and the directional second derivative ${r}_{p}^{\prime\prime }={K}_{{pkl}}\delta {\theta }_{k}^{(1)}\delta {\theta }_{l}^{(1)}$ is used.

As seen from (12), the geodesic acceleration correction only depends on the directional second derivative oriented along the first order correction $\delta {{\boldsymbol{\theta }}}^{(1)}$ . This feature is very important because the cost to compute the directional second derivative is reasonably small and hence will not overly add to the computational burden.

Finally, in order to prevent parameter evaporation in the geodesic LM algorithm and to increase the likelihood of convergence, allowable step sizes must satisfy:

$\begin{eqnarray}&&\displaystyle \frac{2\parallel \delta {{\boldsymbol{\theta }}}^{(2)}\parallel }{\parallel \delta {{\boldsymbol{\theta }}}^{(1)}\parallel }\leqslant \alpha ,\end{eqnarray} \tag{ 13 }$

where α is some parameter (usually smaller than 1) and whose optimal value depends on the specific problem.

An open source implementation of the geodesic LM algorithm has been made available by Transtrum [55]. In addition to the geodesic acceleration correction, the geodesic LM package includes different options for updates to the damping parameter and the damping matrix, different conditions for accepting uphill steps (i.e. steps that increase C), and whether or not to use Broyden's method [56] to lower the Jacobian update frequency. In this work, the damping matrix is set to ${\boldsymbol{I}}$ , Umrigar and Nightingale's method [50] is used to update the damping parameter and to decide whether to accept uphill steps, and we do not employ Broyden's method to update the Jacobian because the computational cost in this problem is not too high.

4.2. KIM-compliance

The original version of potfit is limited to using IPs built into the program. The philosophy underlying the KIM API is different. It is based on the idea of separation between the simulation code (called a 'simulator' in KIM parlance) and the IP (called the KIM Model). Whenever the simulator (potfit in this case) requires the energy or forces for an atomic configuration, it calls a KIM model with the necessary input (e.g. coordinates and species of the atoms) and receives back the required output (e.g. the forces on the atoms). Enforcing this separation makes it possible for the simulator to work with any compatible KIM model in plug-and-play fashion⁴ . The KIM API is the standard for the exchange of information between the communicating codes. In practice this works by having all data exchanged between the simulator and IP stored in a data structure called the 'KIM API Object.' The simulator and IP can access and change this data using KIM API library routines. The only direct data transfer required between the programs is for the simulator to pass to the KIM model the pointer to the location in memory of the KIM API Object. This makes the KIM API extremely efficient and provides it with cross language compatibility (as described in section 3). This means that potfit, which is written in C, can work efficiently and seamlessly with KIM Models written in C, C++, Fortran 77/90/95/2003, and Python.

The potfit program has an unusual requirement for a simulator in that it needs to modify the parameters of the IP that it is optimizing. This is supported by the KIM API through the ability of KIM Models to 'publish their parameters.' A KIM model that does so has in its KIM descriptor file a list of its parameters identified by name with the prefix 'PARAM_FREE_'. The potfit user can select any of these parameters to optimize. This is specified in the input file to potfit along with the initial guesses for the parameters, and the KIM model identifier (see section 3).

The operation of the KIM-compliant implementation of potfit is described in figure 2. The program reads in the input defining the optimization problem and then generates one or more KIM API Objects containing the training set to be used for the optimization. At each step in the optimization whenever the IP parameters need to be changed, the following steps are performed using KIM API library calls:

(i)
The KIM model parameters are changed in the KIM API Object;
(ii)
The KIM model is called to compute the new energy and forces;
(iii)
potfit acquires the energy and forces from the KIM API Object.

When the optimization is complete, potfit outputs the final set of parameters and writes out a new KIM Model, which can be uploaded to https://openkim.org if desired.

**Figure 2.** Flowchart of the KIM-compliant *potfit* program.
Download figure:
Standard image High-resolution image

5. Application of KIM-potfit to EDIP

In this section we test the performance of KIM-potfit by applying it to the optimization of the EDIP model for silicon [36–38] archived in OpenKIM [16, 45, 46]. The training set consists of the energy and forces computed with the original EDIP model for a periodic configuration of $5\times 5\times 5$ conventional unit cells for a total of 1000 atoms perturbed off the ideal diamond structure positions. Our objective is to see whether starting from different initial guesses for the parameters, the optimization procedure is able to recover the EDIP parameters.

5.1. Cost function

As explained in section 2, IP optimization corresponds to minimizing a cost function quantifying the difference between the IP's predictions and a training set of reference data. Our training set consists of a single atomic configuration with n = 1000 atoms. The cost function used in our analysis is

$\begin{eqnarray}&&C({\boldsymbol{\theta }})=\displaystyle \frac{1}{2}{w}_{1}\displaystyle \sum _{\alpha =1}^{n}\parallel {{\boldsymbol{f}}}_{\alpha }-{{\boldsymbol{f}}}_{\alpha }^{0}{\parallel }^{2}+\displaystyle \frac{1}{2}{w}_{2}{(E-{E}^{0})}^{2},\end{eqnarray} \tag{ 14 }$

where ${{\boldsymbol{f}}}_{\alpha }={{\boldsymbol{f}}}_{\alpha }({\boldsymbol{\theta }})$ and ${{\boldsymbol{f}}}_{\alpha }^{0}$ are the predicted and reference force on atom α, $E=E({\boldsymbol{\theta }})$ and E⁰ are the predicted and reference energy, and w₁ and w₂ are weights. In our example we take ${w}_{1}=1\,{(\mathring{\rm A} /\mathrm{eV})}^{2}$ , ${w}_{2}=1\,{(1/\mathrm{eV})}^{2}$ to give a unitless cost function. The reference forces and energy are defined as ${{\boldsymbol{f}}}_{\alpha }^{0}={{\boldsymbol{f}}}_{\alpha }({{\boldsymbol{\theta }}}^{0})$ and ${E}^{0}=E({{\boldsymbol{\theta }}}^{0})$ , where ${{\boldsymbol{\theta }}}^{0}$ are the original EDIP parameters. EDIP's functional form and parameters along with the training set used are given in the supplementary information accompanying this article.

The Hessian of the cost function at the original parameters ${{\boldsymbol{\theta }}}^{0}$ follows from (3) as

$\begin{eqnarray}&&{H}_{{ij}}^{0}={\left.\displaystyle \frac{{\partial }^{2}C}{\partial {\theta }_{i}\partial {\theta }_{j}}\right|}_{{{\boldsymbol{\theta }}}^{0}}={w}_{1}\displaystyle \sum _{\alpha =1}^{n}{\left.\displaystyle \frac{\partial {{\boldsymbol{f}}}_{\alpha }}{\partial {\theta }_{i}}\right|}_{{{\boldsymbol{\theta }}}^{0}}\cdot {\left.\displaystyle \frac{\partial {{\boldsymbol{f}}}_{\alpha }}{\partial {\theta }_{j}}\right|}_{{{\boldsymbol{\theta }}}^{0}}+{w}_{2}{\left.\displaystyle \frac{\partial E}{\partial {\theta }_{i}}\right|}_{{{\boldsymbol{\theta }}}^{0}}{\left.\displaystyle \frac{\partial E}{\partial {\theta }_{j}}\right|}_{{{\boldsymbol{\theta }}}^{0}},\end{eqnarray} \tag{ 15 }$

where we have used ${({{\boldsymbol{f}}}_{\alpha }-{{\boldsymbol{f}}}_{\alpha }^{0})| }_{{{\boldsymbol{\theta }}}^{0}}=0$ and ${(E-{E}^{0})| }_{{{\boldsymbol{\theta }}}^{0}}=0$ . We will use the Hessian in the next section to evaluate the sensitivity of EDIP predictions to its parameters.

5.2. EDIP sensitivity analysis

It is a common feature of IPs that the prediction of a model (including the cost function) are weakly dependent on certain directions in parameter space, i.e. the models are sloppy [57]. This is demonstrated schematically in figure 3 showing the contour plot of a cost function with only two parameters. Clearly the function varies more slowly along the ${{\boldsymbol{\Lambda }}}_{2}$ eigendirection than along ${{\boldsymbol{\Lambda }}}_{1}$ . As a result of this structure of $C({\boldsymbol{\theta }})$ , convergence along certain directions can be incomplete. For example in the figure, the minimum is located at O but a convergence could terminate at a set of parameters offset to the optimal ones by ${\rm{\Delta }}{\boldsymbol{\theta }}$ . An indication of whether this is occurring can be obtained by computing the angle between ${\rm{\Delta }}{\boldsymbol{\theta }}$ and the eigenvectors of the Hessian ${\partial }^{2}C/\partial {\theta }_{i}\partial {\theta }_{j}$ evaluated at O,

$\begin{eqnarray}&&\cos {\alpha }_{i}=\displaystyle \frac{{\rm{\Delta }}{\boldsymbol{\theta }}\cdot {{\boldsymbol{\Lambda }}}_{i}}{\parallel {\rm{\Delta }}{\boldsymbol{\theta }}\parallel }.\end{eqnarray} \tag{ 16 }$

Point A and point B (corresponding to another fit) are equally close to the eigendirections because they form the same angles.

**Figure 3.** Contour plot of a cost function with two parameters. ${{\rm{\Lambda }}}_{1}$ and ${{\rm{\Lambda }}}_{2}$ are the eigendirections of the Hessian of the cost function corresponding to the larger and smaller eigenvalues, respectively. Points A and B represent two fits.
Download figure:
Standard image High-resolution image

**Figure 3.** Contour plot of a cost function with two parameters. ${{\rm{\Lambda }}}_{1}$ and ${{\rm{\Lambda }}}_{2}$ are the eigendirections of the Hessian of the cost function corresponding to the larger and smaller eigenvalues, respectively. Points A and B represent two fits.
Download figure:
Standard image High-resolution image

In general, the eigendirections associated with smaller eigenvalues are more 'sloppy' and we would expect larger scatter in the $\cos {\alpha }_{i}$ values computed in (16) for an ensemble of fits. Thus qualitatively we expect a behavior similar to that shown in figure 4, where the scatter in $\cos {\alpha }_{i}$ obtained from a large number of fitting attempts with different initial guesses would occupy a region similar to Region I in the figure with larger scatter along directions associated with smaller eigenvalues. This would be the signature of a sloppy model.

**Figure 4.** Expected region for normalized projected parameter errors if the fits are aligned along the sloppy direction(s). The horizontal axis corresponds to the eigenvalues of the Hessian sorted in increasing order.
Download figure:
Standard image High-resolution image

We perform this analysis for EDIP. EDIP has 13 parameters, but only 11 of them are used in this study with the two cutoffs excluded. The training set is described in section 5.1. We take an ensemble of 100 initial guesses obtained by adding to the original EDIP parameters random numbers ρ drawn from a Gaussian distribution with average zero and a standard deviation of 0.15, i.e. ${\theta }_{i}:= {\theta }_{i}^{0}\times (1+\rho ),i=1,\ldots ,11$ , and perform the fitting procedure for each one. The relative errors ${\rm{\Delta }}{\theta }_{i}/{\theta }_{i}^{0}=({\theta }_{i}^{* }-{\theta }_{i}^{0})/{\theta }_{i}^{0}$ (for $i=1,\ldots ,11$ ) between the parameters ${{\boldsymbol{\theta }}}^{* }$ obtained in the fitting procedure and the original EDIP parameters ${{\boldsymbol{\theta }}}^{0}$ for all 100 optimizations are shown in figure 5(a). No clear trend is apparent. Instead let us plot the cosine defined in (16) as done schematically in figure 4. To improve the scaling of the plot let us modify the definition to use the logarithm of the parameters, ${\hat{\theta }}_{i}=\mathrm{log}{\theta }_{i}$ . (This is important when different parameters have very different magnitudes.) The definition equivalent to (16) is then

$\begin{eqnarray}&&\cos {\hat{\alpha }}_{i}=\displaystyle \frac{{\rm{\Delta }}\hat{{\boldsymbol{\theta }}\,}\cdot \,{\hat{{\boldsymbol{\Lambda }}}}_{i}}{\parallel {\rm{\Delta }}\hat{{\boldsymbol{\theta }}\,}\parallel },\end{eqnarray} \tag{ 17 }$

where ${\rm{\Delta }}\hat{{\boldsymbol{\theta }}\,}={\hat{{\boldsymbol{\theta }}}}^{* }-{\hat{{\boldsymbol{\theta }}}}^{0}$ and ${\hat{{\boldsymbol{\Lambda }}}}_{i}$ is the ith eigendirection of the Hessian of the cost function C in the logarithm parameter space, ${\partial }^{2}C/\partial {\hat{\theta }}_{i}\partial {\hat{\theta }}_{j}={\theta }_{i}{\theta }_{j}{\partial }^{2}C/\partial {\theta }_{i}\partial {\theta }_{j}$ . The results for EDIP are plotted in figure 5(b), and are very much in line with the expected behavior in figure 4. In particular, we see that the spread is mostly accounted for by the first four eigenterms each of which corresponds to a particular combination of parameters as quantified by the eigendirections.

**Figure 5.** Errors in fitting the EDIP parameters. (a) Relative errors. (b) Normalized error projected on the eigendirections of the Hessian of the cost function in the logarithm parameter space.
Download figure:
Standard image High-resolution image

5.3. Cost along eigendirections

The results in the previous section support the conjecture that the observed scatter in results is due to sloppy directions. However, since the Hessian is a local measure of the structure of C near the optimal solution, it is possible that there are other minima further away that can trap the optimization process. To explore this possibility, we plot the cost function along the different eigendirections,

$\begin{eqnarray}&&{\widehat{C}}_{i}(s)=C({{\boldsymbol{\theta }}}^{0}+s{{\boldsymbol{\Lambda }}}_{i}),\end{eqnarray} \tag{ 18 }$

where as before ${{\boldsymbol{\theta }}}^{0}$ corresponds to the original EDIP parameters and s is a scalar factor that can be positive or negative. The results are plotted in figure 6 for all eigendirections from the smallest eigenvalue 1 to the largest eigenvalue 11. The cost $\widehat{C}$ varies more slowly along the eigendirections associated with the five smallest eigenvalues than along other directions, i.e. the cost function is more flat in these five directions. (Note that the scale of the y-axis is not the same in the plots, and the first five plots have a significantly smaller range than the others.) This is in agreement with the previous observation that the spread of the fitted parameters is larger along these directions. We also see that an additional local minimum (aside from the one associated with original parameters at s = 0) is only apparent along the positive 11 direction at a significantly higher cost function⁵ . This minimum can trap the optimization algorithm for initial guesses in its vicinity, however over a large range near the original parameters there is only a single minimum. This suggests that a local minimizer is sufficient for this IP (using different initial guesses), which can reduce the fitting time drastically compared with global minimization efforts.

**Figure 6.** Cost along eigendirections. ${\widehat{C}}_{1}$ is associated with the smallest eigenvalue, and ${\widehat{C}}_{11}$ is associated with the largest. For ${\widehat{C}}_{10}$ and ${\widehat{C}}_{11}$ the insets show a zoomed view of the flat portions of the curves.
Download figure:
Standard image High-resolution image

**Figure 6.** Cost along eigendirections. ${\widehat{C}}_{1}$ is associated with the smallest eigenvalue, and ${\widehat{C}}_{11}$ is associated with the largest. For ${\widehat{C}}_{10}$ and ${\widehat{C}}_{11}$ the insets show a zoomed view of the flat portions of the curves.
Download figure:
Standard image High-resolution image

5.4. Performance of minimization algorithms

Having established in the previous section that a local minimization is sufficient for EDIP (at least in the vicinity of the original parameters), we explore the efficiency of Powell's method, the geodesic LM algorithm added to potfit, and the standard LM algorithm described in [58] (see section 4.1 for details of LM and geodesic LM). The three minimizers are applied to fit EDIP. The initial guesses for the parameters are obtained in the same way as discussed in section 5.2. Three sets of 100 initial guesses are constructed by perturbing the original EDIP parameters by random numbers drawn from a Gaussian distribution with zero mean and standard deviations of 0.1, 0.2 and 0.3. The results are given in table 1 showing for each minimizer and perturbation magnitude the number of attempts (out of 100) that converged below the cost level indicated on the left. For example for Powell's method with a perturbation magnitude of 0.2, the algorithm converged to a cost below 10⁻³ for 46 out of the 100 initial guesses. We see that geodesic LM outperforms the other two algorithms for all perturbation amplitudes. In all cases it is able to converge from more initial guess to below the specified tolerance. The standard LM algorithm is superior to Powell in all cases except for the smallest perturbation.

Table 1. Statistics on the optimization of EDIP. The number of minimization attempts (out of a 100) that converge to a cost function below the tolerance specified on the left for the three minimizers studied and the three perturbation amplitudes (0.1, 0.2 and 0.3) used to generate the initial guesses.

	Powell			Standard LM			Geodesic LM
Cost(C)	0.1	0.2	0.3	0.1	0.2	0.3	0.1	0.2	0.3
$C\lt {10}^{-7}$	85	37	7	68	46	9	100	60	14
$C\lt {10}^{-5}$	85	39	8	76	49	9	100	60	14
$C\lt {10}^{-3}$	89	47	9	100	58	16	100	64	21
$C\lt {10}^{-1}$	93	68	23	100	79	25	100	84	33
$C\lt {10}^{0}$	98	78	36	100	85	39	100	90	51
$C\lt {10}^{1}$	100	94	65	100	96	54	100	98	76

We are also interested in the computational cost of the different minimization methods. The Powell's algorithm requires only evaluations of the cost function, whereas LM and geodesic LM require the gradient of the cost function with respect to the parameters. The gradient is computed using numerical differentiation, which only requires evaluations of the cost function. Therefore the number of cost function evaluations ( ${N}_{\mathrm{cost}}$ ) performed during the minimization is a good metric for comparing the performance of all three methods. Figure 7 shows for each perturbation amplitude, the normalized cost function $\bar{C}=C/{C}_{\mathrm{init}}$ (where ${C}_{\mathrm{init}}$ is the cost at the start of the minimization) versus ${N}_{\mathrm{cost}}$ for the three methods. The initial flat region is associated with the setup stage of the methods (e.g. building an initial estimate for the Hessian in Powell's method). Examining the plots, we see that Powell and LM are comparable in performance, whereas geodesic LM is about an order of magnitude slower. Thus geodesic LM's superior convergence properties come at the expense of a significant increase in the computational burden. For the current problem this tradeoff is acceptable since geodesic LM converges in most cases in less than 1000 cost function evaluations, which is quite reasonable for the current problem. However, when computation of the cost function is expensive (due to a larger training set or more complex IP) or when more steps are needed to converge as would be expected for IPs with more parameters, the computational cost of geodesic LM may be an issue.

**Figure 7.** Normalized cost $\bar{C}$ versus the number of cost function evaluations ${N}_{\mathrm{cost}}$ . The initial guesses for the parameters are perturbations off the original EDIP parameters drawn from a Gaussian distribution with zero mean and standard deviations of (a) 0.1, (b) 0.2, and (c) 0.3.
Download figure:
Standard image High-resolution image

**Figure 7.** Normalized cost $\bar{C}$ versus the number of cost function evaluations ${N}_{\mathrm{cost}}$ . The initial guesses for the parameters are perturbations off the original EDIP parameters drawn from a Gaussian distribution with zero mean and standard deviations of (a) 0.1, (b) 0.2, and (c) 0.3.
Download figure:
Standard image High-resolution image

6. Summary and future directions

The accuracy of an IP is critical to atomic-scale simulations. Designing a high-quality transferable IP is a very challenging task due to the complexity of the quantum mechanical behavior that the IP is attempting to mimic. The developer of an IP must select a functional form with adjustable parameters and a suitably weighted training set of first principles and experimental reference data. The IP is then optimized with respect to its parameters to reproduce the training set as closely as possible in terms of specified cost function. It is then validated against an independent test set of reference data. If the validation results are inadequate, the functional form and/or the training set must be amended and the process repeated until a satisfactory IP is obtained. This process is made more difficult by the fact that many IPs are sloppy models in the sense that their predictions are weakly dependent on certain combinations of their parameters. This can cause optimization algorithms to fail to converge.

potfit is an IP fitting program based on the force-matching method. The idea is to include in the training set the forces on atoms computed from first principles for a set of atomic configurations of interest. This allows the developer to train the IP with a large amount of reference data spanning a larger portion of configuration space. We have extended potfit in two ways:

(i)
We have incorporated a new optimization method based on the geodesic LM algorithm. Although slower than some other local minimizers it has improved convergence properties for sloppy models. This is the first application of geodesic LM to IP optimization.
(ii)
We have adapted potfit to be compatible with the KIM API. This allows potfit to optimize KIM Models stored in the KIM Repository at https://openkim.org. This greatly expands the functional forms that potfit can optimize. The resulting IPs are portable and can be used seamlessly with many major software packages that support the KIM API.

We study the effectiveness of the new code by using it to optimize the EDIP model for silicon. This IP is not available within the original potfit code, but is available as a KIM Model. We use EDIP to generate a training set of atomic configurations and test whether potfit can reproduce the EDIP parameters from different initial guesses. We make several observations:

(i)
EDIP displays the characteristics of a sloppy model. For different initial guesses we find a large scatter in the parameter values. An analysis shows that the scatter is maximal along the eigendirections of the Hessian of the cost function associated with the smallest eigenvalues. Thus the IP's behavior depends primarily on a few stiff combinations of parameters.
(ii)
At least in the vicinity of the original parameters the cost function is largely convex with only two local minima. This suggests that a local minimization method with different random initial guesses is sufficient.
(iii)
The geodesic LM algorithm was on average twice as likely to converge to the correct solution from different initial guesses than standard LM or Powell's method that tended to get trapped along sloppy directions. However, the improved performance comes at the cost of about an order of magnitude increase in computational expense for geodesic LM. For the current problem the computations were not prohibitive and geodesic LM was the preferred method.

The extended potfit code is the first step in developing an IP fitting framework based on the OpenKIM project. Such a framework would make it possible for the developer to draw upon the large selection of KIM tests (standardized codes for computing material properties) and reference data in OpenKIM. Models developed using this framework and the training set used in the fit will then be archived at https://openkim.org. The latter is an important advantage as it means that other developers will be able to draw upon previous work and adapt a model to a new application starting from the training set used in its development. This framework is currently under development for use in fitting IPs for 2D materials [59].

Acknowledgments

We thank Mark Transtrum for his open source implementation of the geodesic LM algorithm, and Colin Clement for useful discussions. The authors also thank Daniel Schopf for his assistance in merging the OpenKIM extension into the main potfit distribution. The research was partly supported through the National Science Foundation (NSF) under grants No. PHY-0941493 and DMR-1408211. The authors acknowledge the support of the Army Research Office (W911NF-14-1-0247) under the MURI program.

A KIM-compliant potfit for fitting sloppy interatomic potentials: application to the EDIP model for silicon

Article metrics

Submit

Author e-mails

Author affiliations

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. The potfit program for fitting IPs

3. The Open Knowledgebase of Interatomic Models project