Quantum mechanical force fields for condensed phase molecular simulations

Timothy J Giese; Darrin M York

doi:10.1088/1361-648X/aa7c5c

1. Introduction

Molecular simulations can provide deep insight into chemical processes in the condensed phase by unveiling atomic-level details beyond that which can be directly measured experimentally. However, the predictive value of these calculations hinges on the accuracy of the models used to determine intra- and intermolecular forces, the time scales the simulations can access, and the degree to which the spatial extent of the model system in silico accurately reflects the real system in vitro or in vivo. These considerations are problem-dependent and can vary widely—but universally there is a trade-off between accuracy and computational feasibility that must be appropriately balanced.

Pairwise additive 'molecular mechanics' (MM) force fields are among the simplest, and most efficient models and have seen widespread use in simulations of large systems over long time scales [1]. Several next-generation force fields have been developed that include explicit electronic polarization and other more realistic features in the molecular interactions [2–4]. While these force fields are quite promising, they are also less computationally efficient, and in many cases have not yet achieved the same application scope and level of parameter refinement as the more mature pairwise additive models.

Many chemical applications demand models that go beyond molecular mechanics and explicitly use a quantum mechanical framework [5, 6]. Quantum mechanical (QM) models provide more transferability between varying systems of interest and chemical environments than traditional MM force fields. Furthermore, QM models are inherently capable of modeling electronic polarization, charge transfer, chemical reactions, and spectra. The literature describing the development of modern quantum electronic structure methods is vast; however, for the system sizes and time scales required for condensed phase simulations, the scope of practical methods is quite limited. By far the most widely used quantum methods applied to molecular simulations are those based on single-determinant wave functions constructed from molecular orbitals built up from localized atomic orbital basis sets. For the purposes of the present review, we will restrict discussion to this class of quantum models, which includes both approximate and ab initio methods based on Hartree–Fock and density-functional theory [7].

The applicability of molecular orbital-based quantum mechanical methods to molecular dynamics (MD) simulation of large systems is severely hindered by:

1.
the large number of required integrals to be performed,
2.
the construction of globally orthonormal molecular orbitals, and
3.
the quality of computed results from an affordable Hamiltonian.

There are QM algorithms that forego the explicit calculation of orthonormal orbitals; relying instead on properties of the density matrix. The development of density matrix minimization [8, 9] and density matrix subsystem density functional methods [10] continue to attract attention, but have not yet found widespread adoption relative to orbital-based algorithms.

The number of electron repulsion integrals (ERIs) required in ab initio methods formally scales $O(N^4)$ , where N is the number of atoms in the system. For moderately sized systems, this scaling can be reduced to $O(N^3)$ by computing the ERIs through Cholesky decomposition or the resolution-of-identity approximation [11]. Alternatively, some Hamiltonians have been proposed that reduce the computational cost (prefactor) of density functional theory (DFT) methods by using very small atomic orbital (AO) basis sets and supplementing the Hamiltonian with empirical corrections to improve the quality of the results [12]. Semiempirical Hamiltonians reduce the formal scaling of the ERIs to $O(N^2)$ either through the neglect of diatomic differential overlap (NDDO) approximation [13], or by computing the electrostatics from small, atom-centered auxiliary basis functions, such as the Slater atomic charges used in the self-consistent charge density-function tight-binding models DFTB2 [14] and DFTB3 [15, 16] The formal scaling of ab initio and semiempirical Coulomb integral evaluation can be reduced to $O(N \log N)$ by approximating the long-range interactions in aperiodic systems with the fast multipole method [17, 18]. Linear-scaling evaluation of electrostatics in periodic systems remains an active area of research and is discussed in section 4.

The construction of globally orthonormal molecular orbitals is usually accomplished by solving the Roothaan–Hall equations:

$\begin{align} \displaystyle \label{E:Roothaan} {\bf F}^{\sigma}\cdot{\bf C}^{\sigma} = {\bf S}\cdot{\bf C}^{\sigma}\cdot{\bf E}^{\sigma} \nonumber \end{align} \tag{ 1 }$

where $F^{\sigma}_{\mu\nu} = \left.\partial E({\bf R}, {\bf P})/\partial P^{\sigma}_{\mu\nu}\right\vert_{{\bf R}}$ is the spin-resolved Fock (or Kohn–Sham) matrix, $E({\bf R}, {\bf P})$ is the total energy of the electronic structure method, $P^{\sigma}_{\mu\nu}=\sum_i n_i^{\sigma} C^{\sigma}_{\mu i} C^{\sigma}_{\nu i}$ is the AO representation of the spin-resolved density matrix, ${\bf C}^{\sigma}$ is the matrix of spin-resolved MO coefficients, $n^{\sigma}_i$ is a spin-resolved occupation number of MO i, ${\bf E}^{\sigma}$ is the diagonal matrix of spin-resolved MO eigenvalues, and $S_{\mu\nu} = \int \chi_{\mu}({\bf r})\chi_{\nu}({\bf r}) {\rm d}^3r$ is the AO overlap matrix. Equation (1) is a generalized eigenvalue equation, whose solution formally scales $O(N^3)$ . This bottleneck can be overcome by fragmenting the system into parts, such that each fragment independently solves equation (1) to obtain MOs that are locally (but not globally) orthonormal. The differences between the myriad number of fragment-based electronic structure methods occurs within their technical details that define how the fragments couple to one another. These differences are discussed in section 3.

The computational effort of 'linear-scaling quantum mechanical' (LSQM) methods [19, 20], as the name suggests, scales linearly with system size. Traditional LSQMs seek to systematically reproduce the results of a fully-coupled QM calculation without introducing empirically-fit parametric corrections to the interaction energy, instead preferring to explicitly model the inter-fragment MO coupling through a buffer region or many-body expansion. Although LSQM methods scale desirably with respect to system size, they are substantially more expensive than QM/MM calculations, and they may offer significant computational advantages relative to a standard QM method only when large systems are considered. Quantum mechanical force fields [21, 22] (QMFFs) are LSQM methods that replace elements of theoretical rigor by introducing parametrically fitted functions to the interaction energy (beyond those that may be contained in the base QM method). The parametrically fitted corrections are chosen to minimize the use of explicit inter-fragment MO coupling. Consequently, QMFFs achieve higher computational savings than traditional LSQMs, and they are often faster than fully-coupled QM methods, even for small systems. Furthermore, the parametric tuning of the interactions provides a means for improving the accuracy of QMFFs for certain properties.

Although QMFFs introduce empirical functions to improve upon inter-fragment interactions, the quality of their intra-fragment interactions continues to rely upon the approximations within the underlying QM Hamiltonian. Recent advances in semiempirical and DFT methods are briefly discussed in section 2. QMFFs are methods that attempt to leverage the benefits provided by LSQMs and combined quantum mechanical/molecular mechanical (QM/MM) [6, 23] approaches to produce a fully QM method that is able to simultaneously achieve high accuracy and efficiency. The first QMFF suited for condensed phase simulation applications was Gao's 'molecular orbital derived empirical potential for liquid simulations' (MODEL) method; [24] however, the term 'quantum mechanical force field' has been used to describe this type of method only recently [21, 22]. Within the current context, QMFFs are fundamentally built on a QM framework rather than simply using quantum mechanical results to fit an empirical molecular mechanical potential, as has been used successfully in the derivation of class II [25] and other classical [26, 27] force fields. It should be pointed out that strategies for developing quantum methods for large molecular systems differ in some ways from those for solid state systems owing to the need to address distinct challenges. Solid state applications often involve strongly correlated systems characterized by networks of bonds that require specialized embedding methods [28–34] to model accurately [35]. Molecular systems, which are the focus of this topical review, interact largely through weaker non-bonding interactions. Consequently, molecular systems typically require more phase space sampling to achieve reliable condensed phase properties.

Recent applications of QMFFs and LSQMs to MD simulation, drug docking, chemical reaction profiles, electron transfer reactions, NMR chemical shifts, and IR, Raman, and vibrational circular dichroism spectra are discussed in section 6.

2. Intra-fragment QM Hamiltonians

QMFFs and LSQMs require the choice of a QM Hamiltonian through a suitable compromise between performance and the Hamiltonian's accuracy for the intended application. Most MD applications, for example, require a relatively large number of particles and a significant amount of phase-space sampling to make a meaningful comparison with experiment. Semiempirical Hamiltonian QMFFs offer the greatest promise for this purpose, at least at the present time. There are two popular classes of semiempirical models in widespread use: modified neglect of diatomic overlap (MNDO) models [13, 36, 37] and density functional tight binding (DFTB) models [38, 39]. There are advantages and disadvantages for both types of Hamiltonian models [40, 41]. Most MNDO-based models improperly model torsion potentials and nucleobase sugar puckering due to the lack of orthogonalization effects resulting from the NDDO approximation [42]. Orthogonalization corrections (e.g. the OMx methods [43, 44]) have been introduced within the NDDO framework, and have been demonstrated to be a promising improvement [45], although reported applications to molecular dynamics simulations are scarce. DFTB-based models do not make the NDDO approximation, but—unlike MNDO models—the DFTB electrostatics are modeled solely through atomic charges, without regard to higher-order multipoles. MNDO and DFTB models both suffer from the limitations resulting from the use of minimal valence basis sets [46] and the range of their applicability caused by their limited training and inherent transferability of the model parameters. These limitations have motivated the reparametrization of models for general use or for system-specific applications [15, 16, 47–51], and the development of MNDO Hamiltonians which include orthogonalization corrections [45] and improved polarizabilities [52–54]. DFTB methods have similarly been developed to improve polarizabilities [55–57] and hydrogen bond strengths [58]. Exploratory DFTB methods have improved nonbond interactions by incorporating multipolar electrostatics [59–63], and DFTB pairwise potentials continue to be parametrized to widen the scope of applications [64–69].

The calculation of infrared (IR) and Raman spectra is challenging for semiempirical Hamiltonians due to their limited AO basis sets [70]. Therefore, recourse to higher-level ab initio DFT methods are needed for these applications. Recently, there has been growing interest in the development of so-called 'range-separated' density functionals, which have shown promise for making improved prediction of x-ray adsorption modeling [71], optoelectronic properties [72], noncovalent interactions [73], and sulfur chemistry [74], but which may offer only limited improvement of transition metal reactions in comparison to standard hybrid functionals [75].

An important consideration for both semiempirical and ab initio Hamiltonians is the treatment of dispersion [76]. For example, the use of dispersion-corrected ab initio density functionals [77] and semiempirical density functional tight binding Hamiltonians [78] has been shown to improve crystal structure prediction [79, 80]. However, if the fragments are individual molecules, then QMFFs can couple the fragments with an empirical dispersion model even though the underlying QM Hamiltonian's treatment may be inadequate [81].

3. Inter-fragment coupling schemes

Many algorithms founded upon fragmentation have been developed, classified, and previously discussed [21, 22, 82–92]. In this section, we emphasize the differences between fragment algorithms through the discussion of four types of methods:

1.
The 'divide-and-conquer' (DC) LSQM method.
2.
The fragment molecular orbital (FMO) LSQM method.
3.
Force fields founded upon underlying QM calculations, such as the effective fragment potential (EFP), the sum of interactions between fragments ab initio (SIBFA), and Gaussian electrostatic model (GEM) methods.
4.
The eXplicit polarization (X-Pol) and modified divide-and-conquer (mDC) QMFF methods.

The DC method [93–101] is a LSQM approach that has seen successful use in the simulation of liquid water [102], the prediction of protein p $K_{\rm{a}}$ values [103], and the calculation of NMR chemical shifts [104, 105]. In this scheme, the system is fragmented into distinct parts, and a fragment's neighbors are considered as a 'buffer' region when constructing the fragment MOs. A Fock matrix is constructed for each fragment (including its buffer), whose diagonalization produces a set of locally orthonormal MOs. By itself, this procedure would double-count the electron density between fragments; thus, the density matrix of each fragment+buffer region is partitioned such that:

If both AOs belong to the buffer, then those density matrix elements are removed.
If both AOs belong to the fragment, then the density matrix elements remain unaltered.
If an AO pair is split between the buffer and the fragment, then the density matrix elements are halved.

The use of a buffer region within the DC method causes the fragment MOs to directly couple to their neighbors, but the nonzero Fock matrix elements are influenced by the electrostatics of the entire system. Furthermore, additional ad hoc, empirical potentials are not introduced; that is, the non-electrostatic inter-fragment coupling occurs solely through the inclusion of a buffer. The method is made more accurate by increasing the size of the buffer.

The FMO method is considered here as a LSQM [106–108] owing to its close connection to the underlying ab initio framework. Analytic gradients for the FMO method have only recently been derived [106], enabling its application in MD simulation [109, 110]. This method fragments the system into nonoverlapping components. The MOs of each system are optimized in the electrostatic environment of the other fragments. Because the fragment MOs give rise to the electrostatic environment, the procedure is iterated until both the fragment MOs and the electrostatic environment reach self-consistency. Upon convergence, the energetic effects of inter-fragment MO coupling is modeled by a many-body expansion. Specifically, the many-body corrections are computed by self-consistently solving for the MOs of a fragment cluster in the presence of the fixed electrostatic potential of the remaining monomers. In the FMO2 method, for example, the inter-fragment coupling energy of fragment-pairs are corrected in the static field of the remaining $N-2$ fragments. Similarly, the FMO3 method corrects three-body MO couplings. The method is made more accurate by increasing the order of the many-body corrections. A Variational Many-Body expansion (VMB) method has been introduced by Gao and Wang [111] that extends the basic idea of FMO by variationally optimizing each term in the many-body expansion, as opposed to performing single-point energy corrections in the static field of the remaining fragment monomers.

The EFP [112–114], SIBFA [115–119], and GEM [115, 117, 120–123] methods are substantially more similar to classical, polarizable force fields, because the the inter-fragment interactions are computed from empirical functions; however, many—if not all—of the parameters within those empirical functions are produced from the analysis of an underlying QM calculation. For example, the inter-fragment electrostatics are computed from auxiliary Gaussian multipoles or damped, distributed point-multipoles arising from a reference ab initio charge density. Similarly, the fragments polarize to one another through linear-response dipoles, whose polarizability tensor can be constructed from coupled-perturbed Hartree–Fock/Kohn–Sham theory. The ab initio polarizability tensors can be used to generate C₆ dispersion coefficients to model London forces with Tang and Toennies [124] damped dispersion potentials. And the short-range, nonclassical, nonbonded repulsion is modeled either by inter-fragment electron density overlap or as a function of MO overlap [116, 125, 126]. By using cost-effective, empirical functional forms, these methods do not intend to formally reproduce the results of a fully-coupled MO ab initio calculation. Instead, the fragment ab initio calculations are used to nonempirically determine the parameters (or substantially reduce the amount of empirical fitting) appearing within the functional forms.

The X-Pol [24, 127–132] and mDC [21, 59–61, 81] methods are QMFFs in which the system is decomposed into nonoverlapping fragments, the fragment MOs are coupled to the other fragments only through electrostatics, and the fragment MOs of all systems are variationally optimized. Specifically, each fragment diagonalizes its own Fock/Kohn–Sham matrix ${\bf F}^{\sigma, A}$ , where A indexes the fragment. The diagonalization of ${\bf F}^{\sigma, A}$ produces a set of fragment-localized MO coefficients, ${\bf C}^{\sigma, A}$ , which define a fragment-localized density matrix $P_{\mu\nu}^{A}=\sum_{\sigma, i}n_{i}^{\sigma, A}C_{\mu i}^{\sigma, A}C_{\nu i}^{\sigma, A}$ . The electrostatics of the nuclei and electron density can be modeled from atomic point multipole expansions, ${\bf q}^{A}$ , to efficiently compute the Coulomb interactions between fragments. Unlike the FMO method, the inter-fragment MO coupling is not corrected via a many-body expansion. Instead, X-Pol and mDC model the energetic effect of the inter-fragment MO coupling with empirical functions. Typically, the empirical correction for nonbonded interactions is a standard Lennard-Jones potential; however, more advanced charge-dependent van der Waals models have also been used [81]. The X-Pol energy is:

$\begin{align} \displaystyle E_{\rm{X{\rm{-}}Pol}} = \sum_A E_{\rm{QM}}^{A}({\bf P}^{A}) + \frac{1}{2} \sum_{B > A} E_{\rm{QM/MM}}^{AB}({\bf P}^{A},{\bf q}^{B}) + E_{\rm{QM/MM}}^{BA}({\bf P}^{B},{\bf q}^{A}) \nonumber \end{align} \tag{ 2 }$

where

$\begin{align} \displaystyle E_{\rm{QM/MM}}^{AB}({\bf P}^{A},{\bf q}^{B}) = \sum_{a\in A \atop b\in B} \frac{Z_a q_b}{R_{ab}} - \sum_{\mu\nu \in A \atop b \in B} P_{\mu\nu}^{A} q_b \int \frac{\chi_{\mu}({\bf r}) \chi_{\nu}({\bf r})}{\vert {\bf r}-{\bf R}_b\vert} {\rm d}^3r + E_{\rm{vdW}}({\bf R}^A,{\bf R}^B) \nonumber \end{align} \tag{ 3 }$

is the QM/MM interaction energy between fragments A and B, in which fragment A is treated as the QM region that interacts with the atomic charges in fragment B, and $E_{\rm{vdW}}$ is the van der Waals energy. The mDC energy is similarly defined:

$\begin{align} \displaystyle E_{\rm{mDC}} = \sum_A E_{\rm{QM}}^{A}({\bf P}^{A}) + \sum_{B > A} E_{\rm{inter}}({\bf q}^{A},{\bf q}^{B}) \nonumber \end{align} \tag{ 4 }$

where the interaction energy is computed directly from the atomic multipole expansions and van der Waals potential.

$\begin{align} \displaystyle E_{\rm{inter}}({\bf q}^{A},{\bf q}^{B}) = \sum_{l\mu \in A \atop j\kappa \in B} q_{l\mu}^{A} q_{j\kappa}^{B} \frac{C_{l\mu}(\nabla_a) }{(2l-1)!! } \frac{C_{j\kappa}(\nabla_b) }{(2j-1)!! } \frac{1}{R_{ab}} + E_{\rm{vdW}}({\bf R}^A,{\bf R}^B) \nonumber \end{align} \tag{ 5 }$

$C_{l\mu}(\nabla_a)$ is a real-valued spherical tensor gradient operator (STGO) acting on the coordinates of atom a. A STGO is a solid harmonic function, whose Cartesian coordinate arguments have been replaced by Cartesian gradient operators. At the present time, the mDC method has only been used in conjunction with the DFTB3 semiempirical Hamiltonian [15, 16, 39]. A standard DFTB3 calculation computes the long-range electrostatics via an auxiliary Slater monopole representation of the density, and the multipolar character of the interactions is only treated in the short-range regime within the tight-binding matrix. Therefore, the use of DFTB3 within the X-Pol formalism would limit the inter-fragment Coulomb interactions to those between atomic charges, and the explicit multipolar interactions between fragments would be completely neglected because of the fragment-localized MOs. Consequently, it has been shown that water clusters lose their hydrogen bond angles, instead choosing to adopt TIP3P-like [133], planar hydrogen bonding motifs [21, 59, 61]. To correct for this deficiency, the mDC method concocts a new, auxiliary atomic multipole representation of the fragment density that is solely used for the inter-fragment Coulomb interactions.

When the fragments sever covalent bonds, X-Pol makes use of the Generalized Hybrid Orbital QM/MM method [134–139], and mDC makes use of the link-atom approach [140–143], although other capping strategies have been applied in similar QMFFs [144–146]. Ideas have been proposed for nonempirically incorporating short-range nonbond repulsions by introducing inter-fragment orthogonality restraints into X-Pol [147] but practical applications of those ideas have not yet been explored.

It is worth mentioning that an alternative approach to QMFF development based on more reactive force fields that employ purely empirical potentials has been valuable. Most notably is the ReaxFF [148] framework that uses classical interatomic potentials to model chemical reactions [149–153] in large, complex systems. The ReaxFF does not attempt to solve explicitly for the electronic structure, and so is different conceptually from the methods described in the present topical review, but nonetheless is sufficiently computationally efficient to be applied in very large-scale molecular simulations [154].

4. Long-ranged electrostatic interactions

The primary source of inter-fragment MO coupling within QMFFs occurs indirectly through the Coulomb interactions between the fragments, and the manner in which the interactions are performed is thus important. For example, applications of the mDC method found that the inclusion of atomic dipoles and quadrupoles were paramount for obtaining proper hydrogen bond angles in water clusters [21, 59, 61]. Furthermore, condensed phase simulations of QMFFs require a treatment for the long-range electrostatic interactions, such as Ewald's method [155]. Ewald's method has seen widespread adoption within standard MM software packages after the development of particle mesh Ewald (PME) [156–159] because electrostatic cutoff methods have been found to produce structural artifacts in biomolecules [160–162] and in the simulations of thermodynamic properties of water [163–166]. Because the PME method was originally designed for static atomic charges, the literature has witnessed a recent resurgence in interest toward the development of long-range electrostatic models for atomic multipoles [167–169], semiempirical Hamiltonians [170, 171], an ab initio calculations [172–177].

The usage of MNDO models within the X-Pol QMFF naturally affords the adaptation of the semiempirical PME method [171] originally developed for QM/MM calculations [170, 178, 179], whereas the use of high-order atomic multipoles within the mDC model motivated the development of a multipolar PME method [60, 167]. It was found that the inclusion of atomic dipoles and quadrupoles slowed the mDC calculations by only a factor of 1.5 relative to a charge-only model [167]. In contrast, the semiempirical QM/MM PME method [170, 178, 179] has been found to not be readily amenable to ab initio Hamiltonians because the use of simple charge-mapping procedures, such as Mulliken or Löwdin charge analysis, causes SCF convergence instabilities or spurious electron polarizations when non-minimal basis sets are used [173, 174]. In order to overcome this problem, some have resorted to using CHELPG [180] charge-fitting procedures specifically to avoid the development of a wholly new Ewald method [173, 174, 181–185]. Others groups have opted to simply perform real-space electrostatics within a large (22 Å) cutoff that smoothly dampens the electrostatic interactions [175, 176], or to represent the long-range electrostatics by choosing a set of charges on a discretized sphere [177], thereby eliminating (or reducing) the necessary changes to the QM software package. Another approach to avoid using Ewald's method with ab initio Hamiltonians include the reaction field method [186, 187]; however, the predominant choice continues to be the use of real-space electrostatic truncation [127, 146, 188–194].

Very recently, a new type of ab initio QM/MM PME method—the ambient-potential composite Ewald (CEw) method—has been introduced [172]. The CEw method allows the ab initio nuclei and electron density to interact with the environment without resorting to an atomic charge representation of the QM density. Furthermore, the method does not require more plane waves than what are used in traditional MM calculations. Explicit calculation of the AO-product Fourier coefficients are avoided by numerically integrating the reciprocal-space Ewald potential via the molecular quadrature grid used to evaluate the DFT exchange-correlation potential. Additionally, the CEw method models the QM region's periodic replicas as static MM point charges, so that the Ewald contribution to the Fock matrix can be performed once, before the SCF procedure begins. In this sense, the evaluation of the long-range electrostatics within the CEw method becomes analogous to a pre-SCF evaluation of a local density approximation DFT exchange-correlation potential. In that work [172], it was argued that the cost associated of the underlying ab initio method wouldn't be fully taken advantage of if the electrostatics were to be performed merely through atomic charges. Furthermore, the CEw method circumvents the SCF instabilities and spurious polarizations that occur when simple charge-mapping procedures are used. Finally, the authors found that potential of mean force (PMF) profiles and radial distribution functions were sensitive to the treatment of long-range electrostatics. Specifically, electrostatic truncation and switching can cause artificial solvation shells to appear near the electrostatic cutoff, which changes the free energy difference as a dianionic solute dissociates into two monoanionic components. The CEw method is immediately amenable to an ab initio VMB fragment method in which the monomer fragments were modeled by a MM force field.

5. Illustrative examples

To illustrate the computational performance of ab initio density-functional models within fragment-based calculations, we consider two systems: metenkephalin, and a UpA dinucleotide (U5A3), shown in figure 3, to examine the performance and resource efficiency as the solute molecule is fragmented into smaller parts.

**Figure 1.** Atomic fluctuations of a n,n-dimethylglycine crystal from 220 ps of NVT simulation at 275 K using the mDC QMFF.
Download figure:
Standard image High-resolution image

**Figure 2.** (a) Comparison of water cluster interaction energies to high-level *ab initio* results (complete basis set extrapolated RI-MP2 with CCSD(T) corrections). Reference energies taken from [219]. (b) Comparison of liquid water bulk densities. Experimental values taken from [220] and [221]. (c) Condensed phase water pair distribution functions. Experimental values taken from [222].
Download figure:
Standard image High-resolution image

**Figure 3.** The fragmentation schemes used for metenkephalin (left column) and U5A3 (right column). The top row is a single fragment. The red lines indicate the QM/MM boundaries. The metenkaphalin is fragmented into 1, 3, and 6 fragments. The U5A3 system is fragmented into 1, 2, 4, and 5 fragments.
Download figure:
Standard image High-resolution image

The metenkephalin solute is solvated in a 42³ $\mathring{\rm A}^3$ box of TIP4P/Ew waters. The U5A3 system is solvated in a 46³ $\mathring{\rm A}^3$ box of TIP4P/Ew waters and an ion environment of 7 Na $^+$ and 6 Cl⁻. These systems were prepared by performing pure-MM simulations (1 fs/step) for 1 ns at 298 K and 1 atm to equilibrate the density. This was followed by 5 ps of NVT equilibration using QM/MM (1 fragment), where the solute was treated quantum mechanically with PBE0/6-31G*. The equilibrated systems were then used to compare the fragment calculations described below.

The total potential energy of the fragment-based calculations is given by equation (6):

$\begin{align} \displaystyle \label{E:FirstOrder} E({\bf R}) = E_{\rm{MM}}({\bf R}) + \sum_{A} [E_{\rm{QM/MM}}^{A}({\bf R};{\bf P}^{A})-E_{\rm{MM}}({\bf R})] \nonumber \end{align} \tag{ 6 }$

In other words, the system's MM energy is modified by computing QM/MM corrections for each fragment. A fragment's energy correction is computed by performing a QM/MM energy calculation, in which the fragment is the QM region and the remainder of the system is MM. The correction thus replaces the fragment's MM self-energy with its QM self-energy and further replaces the fragment's MM interactions with the remainder of the system with the corresponding QM/MM interactions. Equation (6) can be viewed as being analogous to a first-order FMO method; however, the fragments polarize to the MM electrostatic field rather than the field of fixed charge densities resulting from independent gas-phase QM calculations. Equation (6) differs from X-Pol, mDC, and higher-order FMO methods in that the fragment calculations are decoupled from each other. Therefore, the series of QM/MM SCF procedures are independent of one another. We compute all QM/MM calculations in tandem on different CPU cores, and the number of cores assigned to each QM/MM calculation is empirically adjusted so that each calculation completes in approximately the same amount of time.

Table 1 provides analysis of NVT simulation timings using equation (6) with PBE0/6-31G* (and the CEw method for long-range electrostatics [172]) for the fragmentation schemes shown in figure 3. The $N_{\rm{QM}}/N_{\rm{CPU}}$ notation specifies the number of QM + link atoms and the number of CPU cores used for a fragment. The '0/1' entries are pure-MM calculations performed on a single core. The row marked 'ps ${\rm d}^{-1}$ ' is the amount of simulation achieved in one day using a 1 fs time step. Larger values of ps ${\rm d}^{-1}$ indicate greater simulation performance. The row marked 'SU ${\rm ps}^{-1}$ ' are the number of service units (core-hours) required to perform 1 ps of simulation. Smaller values of SU ${\rm ps}^{-1}$ indicate a more efficient use of the compute resources.

Table 1. PBE0/6-31G* simulation performance and resource efficiency of the metenkephalin and U5A3 systems as a function of fragmentation (see figure 3).

$N_{\rm{frag}}$	Metenkephalin			U5A3
$N_{\rm{frag}}$	1	3	6	1	2	4	5
$N_{\rm{QM}}/N_{\rm{CPU}}$	75/96	29/96	16/96	62/96	36/144	26/132	18/120
		29/96	16/96		28/95	15/60	15/96
		21/95	15/96		0/1	15/24	15/60
		0/1	15/96			12/23	12/59
			12/96			0/1	10/48
			11/95				0/1
			0/1
sec/step	20.1	3.8	0.5	18.5	6.9	1.6	0.6
ps ${\rm d}^{-1}$	4.3	23.0	168.1	4.7	12.5	54.9	154.1
SU ${\rm ps}^{-1}$	535.8	300.5	82.2	490.2	460.8	104.9	59.8

With 96 cores, each time step takes 20.1 s for the full metenkephalin molecule. This time is reduced by a factor of 40 when using the highest fragmentation scheme (6 fragments). The increase in throughput would allow up to 168 ps of simulation per day. Similar timing results are apparent for the U5A3 system. Clearly, the fragmentation scheme greatly influences advantage of the fragmentation the computational efficiency: the smaller the fragments, the less Fock matrix elements that need to be constructed, and the smaller the matrices that need to be diagonalized.

The fragmentation schemes shown in figure 3 are different models for the total energy, and they will each yield their own set of atomic forces. Table 2 compares the extent to which fragmentation alters the net forces. Specifically, the tabulated values are time-averaged relative force differences (RFDs):

$\begin{align} \displaystyle \label{E:RFD} {\rm RFD} = \frac{1}{N_{\rm{frames}}} \sum_{t=1}^{N_{\rm{frames}}} \sqrt {\frac{\vert {\bf f}_t - {\bf f}_{\rm{ref},t} \vert^2 } {\vert {\bf f}_{\rm{ref},t} \vert^2 } } \nonumber \end{align} \tag{ 7 }$

where ${\bf f}_t$ is the array of net atomic forces evaluated at frame t of a reference trajectory consisting of $N_{\rm{frames}}$ frames, and ${\bf f}_{\rm{ref}, t}$ is a corresponding array of forces evaluated at the same set of coordinates using a reference Hamiltonian. The reference trajectories consist of 1000 frames taken from a 100 ps NVT simulation performed at the highest level of fragmentation (6 for metenkephalin and 5 for U5A3). The forces were obtained by re-evaluating each Hamiltonian at every frame. The 'all-atom' RFDs include the forces from the solute and solvent atoms; that is, the length of the force arrays include all atoms in the system. The 'solute atom' RFDs are computed from a petite list of forces that excludes the MM solvent. The 'non-boundary solute atom' RFDs further excludes the solute atom pairs whose bond is severed by the fragmentation. The 'non-boundary, non-adjacent solute atom' RFDs exclude the fragment boundary atoms and those atoms covalently bonded to the boundary atoms. The columns labeled 'MM' are the forces generated from the Amber ff14SB force field [195], which have been included for comparison.

Table 2. Relative force differences (equation (7)) between the PBE0/6-31G* fragmentation schemes (see table 1) and various reference Hamiltonians. The reference force evaluations do not fragment the QM region. The force differences between the MM force field and the references are shown for comparison.

${\bf f}_{\rm{ref}}$	Metenkephalin				U5A3
${\bf f}_{\rm{ref}}$	1	3	6	MM	1	2	4	5	MM
All-atom relative force differences
${\bf f}_{\rm{PBE0/6-31G*}}$	0.00	0.05	0.05	0.05	0.00	0.01	0.03	0.03	0.06
${\bf f}_{\rm{PBE/6-31G*}}$	0.03	0.06	0.05	0.06	0.03	0.04	0.05	0.05	0.08
${\bf f}_{\rm{PBE0/6-311G**}}$	0.01	0.05	0.05	0.05	0.01	0.02	0.03	0.04	0.06

Solute atom relative force differences
${\bf f}_{\rm{PBE0/6-31G*}}$	0.00	0.26	0.28	0.49	0.00	0.11	0.25	0.29	0.56
${\bf f}_{\rm{PBE/6-31G*}}$	0.27	0.38	0.41	0.60	0.30	0.32	0.40	0.42	0.69
${\bf f}_{\rm{PBE0/6-311G**}}$	0.09	0.27	0.29	0.48	0.11	0.16	0.28	0.32	0.56

Non-boundary solute atom relative force differences
${\bf f}_{\rm{PBE0/6-31G*}}$	0.00	0.21	0.22	0.49	0.00	0.08	0.15	0.20	0.56
${\bf f}_{\rm{PBE/6-31G*}}$	0.27	0.35	0.37	0.60	0.30	0.31	0.35	0.36	0.69
${\bf f}_{\rm{PBE0/6-311G**}}$	0.09	0.23	0.24	0.48	0.11	0.13	0.19	0.23	0.56

Non-boundary, non-adjacent solute atom relative force differences
${\bf f}_{\rm{PBE0/6-31G*}}$	0.00	0.17	0.16	0.49	0.00	0.06	0.09	0.12	0.56
${\bf f}_{\rm{PBE/6-31G*}}$	0.27	0.27	0.30	0.60	0.30	0.31	0.35	0.37	0.69
${\bf f}_{\rm{PBE0/6-311G**}}$	0.09	0.21	0.22	0.48	0.11	0.13	0.16	0.19	0.56

As expected, fragmentation of the solute changes the forces, and the extent of the change increases as the solute is carved into smaller fragments. The solute atom RFDs suggest that the changes incurred by fragmenting the PBE0/6-31G* solute are similar in magnitude to the changes observed upon switching to the PBE/6-31G* Hamiltonian without fragmentation. The contribution of fragmentation to the solute atom RFDs are generally larger than having increased the size of the AO basis to PBE0/6-311G** without fragmentation. Furthermore, the fragment boundary atoms and the atoms to which they covalently bonded contribute disproportionately to the solute atom RFDs. Therefore, the solute atom RFDs could be significantly improved by specifically parametrizing the MM bond, angle, and torsion potentials that cross the fragmented bond. Table 2 evaluated these nonelectrostatic MM correction terms using the Amber ff14SB force field without specific reparametrization for this study.

To illustrate how long-range electrostatics affects the stability of fragment calculations, Figure 4 uses the metenkephalin peptide system by examining two NVE simulation trajectories that differ in how the Ewald summation is performed. Both trajectories are begun from the same initial conditions and both are evaluated using equation (6) and the 3-fragment scheme shown in figure 3 with the PBE0/6-311G** QM Hamiltonian. The only difference between the two trajectories is the use of either the Mulliken-charge QM/MM-Ewald method [170, 178, 179] or the Composite Ewald method [172] for modeling the electrostatics.

**Figure 4.** Energy profiles and selected snapshots from NVE simulations of the metenkaphalin peptide system evaluated using the Mulliken-charge QM/MM-Ewald and Composite Ewald methods for long-range electrostatics. $E_{\rm{tot}}$ , $E_{\rm{kin}}$ , $E_{\rm{pot}}$ are the changes in the total, kinetic, and potential energy relative to the first step. The colored molecules are from the Mulliken-charge QM/MM-Ewald simulation. The black-and-white molecules are from the Composite Ewald simulation.
Download figure:
Standard image High-resolution image

Figure 4 emphasizes a subtle problem that can arise in molecular dynamics simulations using QMFFs in the condensed phase with Ewald electrostatics. If charge or multipolar mapping procedures are used to model electrostatic interactions between fragments, these mapping procedures may become vulnerable to instabilities under certain circumstances, such as with large basis sets (figure 4). The result is that simulations can become unstable and fail, unless special methods—such as the CEw method—are used.

6. Survey of applications

Although LSQM and QMFF methods continue to be rapidly developed, a wide range of applications has already been reported. Whereas LSQMs have been useful for many problems, the efficiency, and in some cases the accuracy of QMFFs have the potential to considerably extend the range of applications, particularly in condensed phase molecular simulations. Fragment-based methods have seen recent use within molecular dynamics simulations [59, 60, 109, 146, 196, 197], and to explore the importance of many-body interactions [53, 198]. The FMO and mDC methods have been used in drug docking applications of cyclin-dependent kinase 2 inhibitor [59, 199], and to perform PMF profiles of chemical reactions [60, 200]. Proof-of-principle demonstrations have been performed in the development of models that afford electron transfer between fragments [201, 202]. There has been growing interest in using fragment methods—especially within the context of large biomolecules, such as the protein crambin, and crystal polymorphs of small drug-like compounds—for predicting spectra, including: IR [203], Raman [204], vibrational circular dichroism [205], and NMR chemical shifts [206–209]. Electrostatically embedded generalized molecular fractionation with conjugate caps (EE-GMFCC) method [85] has been applied to a number of macromolecular systems, including simulations of proteins [146], and calculations of RNA [210] and ion-water clusters [211]. Molecular simulations using the frozen domain formulation of the FMO method have been performed to examine S_N2 reactions in explicit solvent and protein-ligand binding free energies for the Trp cage protein [110]. ONIOM-based fragmentation algorithms have been applied within a hybrid extended-Lagrangian, post-Hartree–Fock Born–Oppenheimer ab initio molecular dynamics scheme to study protonated water clusters and polypeptide fragments [196]. FMO methods have been further applied using tight-binding models in nonadiabatic dynamics simulations to model charge transfer in the subphtalocyanine(SubPc)/ ${\rm C}_{60}$ heterojunction [212].

The majority of QMFF applications to condensed phase simulations reported to date have utilized fast, approximate QM methods for construction of the intra-fragment QM Hamiltonian. The tremendous performance advantages afforded by these methods enable simulations to be performed on realistic-sized model systems of sufficient size and over time scales on time scales that yield meaningful statistical properties to be measured.

6.1. Water clusters and liquid water

Bulk water is the most common benchmark system for study in the liquid phase. It is well known that conventional force field models have difficulty in simultaneously describing water in environments that are outside the scope of their parmaterization, which often are restricted to bulk water properties. For example, the TIP4PEw model for water [164] performs exceptionally well for bulk properties of water, such as the oxygen–oxygen pair distribution function, especially considering the extremely simplistic form of the MM potential. However, this model does not perform well for intermolecular interactions in small water clusters [21, 60] (figure 2). The mDC model, however, not only performs equally well as TIP4PEw for the pair distribution function, it also moderately improves the temperature-dependent density of liquid water, and most strikingly, maintains close agreement with the geometries and binding energies of small water clusters. This illustrates the improved robustness of the QMFF potential to adapt to different environments due mainly to its explicit inclusion of quantum many-body response. Car–Parrinello simulations of liquid water have been performed with various DFT functionals in a plane-wave AO basis [213–215], and all of these studies inherently include explicit quantum many-body response; however, their O–O radial distribution functions are substantially worse than TIP4PEw. Specifically, Car–Parrinello DFT simulations produce radial distribution functions that are too structured [213–215]. This should not be surprising because TIP4PEw has been specifically parametrized to reproduce bulk properties of liquid water, whereas the DFT methods were not. QMFFs achieve high accuracy by parametrically tuning the intermolecular interactions; therefore, a QMFF may become more accurate than its underlying QM method for those properties that it is parametrized to reproduce.

6.2. Small molecule crystals

As an example of the solid state, QMFFs have been demonstrated in simulations of small-molecule crystals. Small molecule crystals are of pharmaceutical interest, as different crystalline polymorphs can result in different pharmicokinetic properties [21], and further are useful to study as fundamental benchmark systems that have experimentally well-characterized atomic structure and fluctuations. Figure 1 illustrates the average structure and atomic fluctuations for a crystal supercell of n,n-dimethylglycine containing 18 independent replicas of the fundamental unit cell (16 DMG molecules per unit cell, 288 DMG molecules in the supercell). Simulations were performed in the NVT ensemble with a 1 fs integration time step carried out to 220 ps at 275 K and using the mDC QMFF model with DFTB2 intra-fragment Hamiltonian, as described in detail elsewhere [60]. The average asymmetric unit structure from the crystal simulation had a root-mean-square (rms) displacement of 0.096 Å with the crystal structure, and the calculated fluctuations were highly correlated with the experimental Debye–Waller factors (B-factors). The mDC method with multipolar Ewald achieved a simulation rate of 160 ps per day using 16 cores (Intel Xeon E5-2670 @ 2.60 GHz), which is impressive for a large 4608-particle fully quantum mechanical simulation.

7. Challenges for the future

QMFFs offer a host of potential advantages—and disadvantages—relative to potential competing methods for condensed phase simulations. In comparison with LSQMs, for example, QMFFs are considerably faster using the same base level of theory, have no significant break-even point, and offer parametric freedom that can be tuned to obtain potentially more accurate condensed phase properties. However, the fragmentation approach that enables QMFFs to have greater computational speed-up also poses challenges in modeling reactive chemical events that involve bond formation and cleavage to occur between fragments. Thus, a direction of future work will involve creation of adaptive fragmentation approaches that allow general reactivity between fragments to occur smoothly, while maintaining a high level of computational efficiency.

In comparison with empirical force fields, including polarizable force fields, QMFFs have fewer parameters, particularly for bonded intrafragment interactions, and thus robust models can potentially be developed with greater ease. This advantage is useful for unconventional molecular systems (such as encountered in drug design) where empirical force field parameters may not be well tested, or in some cases may not even be available. Nonetheless, parameterization against condensed phase properties will still require considerable effort, particularly because the software infrastructure for performing simulations with QMFFs is still in its infancy. Along these same lines, extension of QMFFs to polymers that are covalently linked will require additional parameterization effort to ensure correct conformational distributions around rotatable bonds connecting fragments. Hence, a future area for research will involve developing robust automated methods for parameterizing and testing QMFFs for non-bonded interactions of arbitrary molecules, and general methods for determination of empirical terms required to connect fragments in polymer systems, and particularly for biopolymers.

While QMFFs offer tremendous speed advantages relative to fully QM methods, including LSQMs, they remain computationally inefficient relative to traditional force fields. Hence, an important area of future work will be to improve the performance of QMFFs through new methodology and software development, including optimization for acceleration on specialized hardware such as graphical processing units. Much effort has been made to develop GPU-accelerated software for molecular dynamics and quantum chemistry calculations. The latter currently poses special challenges due to the typically required limited precision of the GPUs, however the field remains dynamic, and this issue may become less of a restriction in the future with improvement of double precision performance.

Finally, the application scope of QMFFs within a multiscale modeling paradigm is yet to be ascertained. Ultimately, QMFFs likely will see their greatest application within a multiscale modeling framework, as opposed to as a substitution for conventional quantum methods and force fields. Potential directions for future work involve exploring combined QM, QMFF and MM models to optimize accuracy and performance for specific applications. As a further example, QMFFs could prove advantageous in refinement of MM free energy perturbation calculations when used as MM→QMFF endpoint corrections, as has been the recent focus of QM/MM free energy calculations [216–218]. Potential applications that likely will benefit from this approach include: (1) simulations of molecular crystals, (2) calculation of IR, Raman and NMR chemical shifts, and (3) docking of non-covalent and covalent ligands and drugs, (4) free energy calculations of metal-ion coupled pK_a shifts and redox potentials, and (5) combined QM/QMFF simulations of biocatalysis.

Acknowledgments

The authors are grateful for financial support provided by the National Institutes of Health (No. GM107485). Computational resources were provided by the National Institutes of Health under Grant No. S10OD012346 and by the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation Grant No. OCI-1053575 (Project No. TG-MCB110101).

Quantum mechanical force fields for condensed phase molecular simulations

Article metrics

Submit

Permissions

Author e-mails

Author affiliations

Author notes

Dates

Peer review information

Abstract

1. Introduction

2. Intra-fragment QM Hamiltonians

3. Inter-fragment coupling schemes

4. Long-ranged electrostatic interactions

5. Illustrative examples

6. Survey of applications

6.1. Water clusters and liquid water

6.2. Small molecule crystals

7. Challenges for the future

Acknowledgments

Quantum mechanical force fields for condensed phase molecular simulations

Article metrics

Submit

Permissions

Share this article

Author e-mails

Author affiliations

Author notes

Dates

Peer review information

Abstract

1. Introduction

2. Intra-fragment QM Hamiltonians

3. Inter-fragment coupling schemes

4. Long-ranged electrostatic interactions

5. Illustrative examples

6. Survey of applications

6.1. Water clusters and liquid water

6.2. Small molecule crystals

7. Challenges for the future

Acknowledgments