A First Look at Lattice Effects in Coarse-Grained Protein Models via Wang-Landau Simulations

In order to study the effects of lattice constraints on coarse-grained protein models, we apply Wang-Landau sampling to the continuum analogue of the hydrophobic-polar (HP) lattice protein model. The continuum version is inspired by the AB polymer model but incorporates potentials chosen specifically to mimic those of the lattice case. Because of their relative simplicity, both the lattice and continuum models offer significant computational advantage over all-atom simulations, but the impact of the additional lattice constraint on generic folding behavior is unknown. In this preliminary study, we compare and contrast thermodynamics during the folding process of the continuum model to the original HP lattice protein model for sequences mapped from Crambin, a 46 amino acid plant protein. We find that the folding process for both of these coarse-grained models is quite similar, with major structural transitions occurring at almost exactly the same temperatures.


Introduction
Protein folding has been studied for over 50 years, but we still cannot explain how such complex biomolecules fold so efficiently [1]. Coarse-grained models analyzed using statistical mechanics have greatly increased our understanding of protein folding [2], but it is not clear how much detail to include [3,4]. Thus, one persistent issue is deciding where the ideal balance between model simplicity and physical realism lies. Among coarse-grained models, one substantial difference is that of a lattice vs. continuum representation of the spatial degrees of freedom. The simplification of lattice models allows for efficient computation and more precise interactions, but clearly cannot represent all spatial configurations. Continuum models have the perceived advantage of full spatial resolution, but suffer from the complications associated with empirical potentials and higher computational cost.
The simplest protein models incorporate a reduction of the 20 amino acids into a limited number of types, such as in the hydrophobic-polar (HP) lattice model [5,6], and the continuum AB model [7]. Both of these models use monomers to represent only two distinct amino acids types (H,P or A,B), allowing for a reduction in the different kinds of interactions, but a seemingly substantial difference remains with regard to spatial representation (lattice vs. continuum). It has been difficult to quantify the effects associated with lattice constraints on such models because the potentials used in the HP and AB models are often different enough  Figure 1. Sample HP lattice model configuration. Hydrophobic and polar monomers are colored in gray and orange, respectively. The interaction between non-bonded hydrophobic monomers, such as monomers 4 and 9, contributes energy HH . In total, there are 3 HH contacts (n HH ). that discrepancies in the results can be attributed to the potentials themselves.
In order to efficiently sample the complex free-energy landscape associated with coarsegrained protein models, here we use Wang-Landau sampling [8,9] to offer a first look into the effects of lattice constraints on coarse-grained models by comparing and contrasting the original HP lattice model to an off-lattice analogue, inspired by the AB model, but for which the potentials are chosen to mimic the lattice case. We compare these models for sequences mapped from Crambin -a small (46 amino acid) plant protein [10,11]. One note regarding the HP model is that it is an NP-complete optimization problem and has been used as an algorithmic test-bed over a wide range of disciplines; see [12] for an overview of these studies.

HP Lattice Model
The HP model simplifies the complexity of protein folding by classifying each of the twenty amino acids as either hydrophobic (H) or polar (P), based on their desire to interact with water [5,6]. The protein is then modeled as a polymer performing a self-avoiding random walk with H and P monomers placed at sites of a simple cubic lattice, automatically accounting for excluded volume effects found in real proteins. To model the hydrophobic "driving force", each non-bonded, nearest neighbor HH contact contributes energy HH (= 1 in this study), resulting in the following Hamiltonian: where n HH is the number of non-bonded HH contacts. Thus, each such contact lowers the internal energy of the system, and leads to the formation of a hydrophobic core in the native state, as in real proteins. The major advantages of this model are that it allows for efficient sampling of conformation space and avoids the uncertainty associated with empirical potentials. Fig. 1 shows a sample HP model configuration with energy E = −3 based on three HH interactions.

HP Model Continuum Analogue
The AB continuum protein model shares many similarities with the HP model, such as the same reduced amino acids and simple interactions. However, the potentials in the AB model are not constructed to mimic the HP model case, and thus side-by-side comparisons are never made. In this study we introduce a continuum analogue to the HP model, where non-bonded hydrophobic-hydrophobic interactions are modeled via a Lennard-Jones potential [13], bonds are connected via a finitely-extensible-nonlinear-elastic (FENE) potential [14], and all other monomer interactions are purely repulsive -using the repulsive component of the  Figure 2. Sample continuum HP model configuration. Hydrophobic and polar monomers are colored in gray and orange, respectively. The interaction between non-bonded hydrophobic monomers, such as monomers 4 and 9, contributes energy HH V LJ . The interaction between non-bonded, non-hydrophobic monomers, such as monomers 1 and 6, contributes energy V EV . The interaction between bonded monomers, such as monomers 8 and 9, contributes energy V F EN E .
Jones potential, to mimic the excluded volume effects inherent to the lattice HP model. Thus, our continuum Hamiltonian is given as follows: where and where the values HH and r 0 set the energy and length scales of our system, respectively, and in this study are both set to 1 using dimensionless units, r ij is the distance between nonbonded monomers i and j, r bond is the distance between bonded monomers, and r other is the distance between non-bonded non-hydrophobic pairs. We choose for the stiffness k = 120 r 2 0 (to achieve similar stiffness as in [15]), and for the the maximum displacement from equilibrium r max = 1.2r 0 . Moreover, we use a truncated, shifted Lennard-Jones potential such that V LJ (r ≥ r c ) = 0, where the cutoff distance r c = 3.0r 0 . One important note is that the reference energy value, HH , is set to 1 in both the lattice and continuum case. Fig. 2 shows a sample continuum HP model configuration and Fig. 3 shows a plot of the continuum potentials used in this study. In contrast to a study using a similar model investigating homopolymer systems, where bonds were connected via a combination of FENE and Lennard-Jones potentials [15], we have found the quantitative differences to be quite small when using only the FENE potential between bonds and thus follow the latter approach here for simplicity.
In this study we investigate Crambin, a slightly hydrophobic protein found in cabbage [10,16]. Due to its relatively small size (N = 46 amino acids), Crambin is one of the more computationally tractable proteins to model. Here we use the same HP model mapping as in [17].

Wang-Landau Sampling
Traditional Monte Carlo methods like Metropolis sampling perform poorly at low temperatures and fail for systems with complex free energy landscapes. Wang-Landau sampling [ 8,9] is an iterative Monte Carlo method that obtains a system's density of states (DOS), g(E), by performing a random walk in energy space. During the simulation, a trial move to change the system from state A to B is proposed, and accepted with probability where g(E A ) and g (E B ) are the current estimates of the DOS for states A and B, respectively. After a trial move where state n is accepted, where f is a modification factor. At this point, a histogram H(E) of visited system states updated via H(E n ) → H(E n ) + 1 after state n is accepted. The histogram is checked at regular intervals, and reset to zero when all system states have been visited such that all histogram entries are at least 80% of the average histogram value. The modification factor is then updated via f → √ f , and this process is repeated until f < f min , some predefined minimum value. Since the Wang-Landau acceptance probability is independent of temperature, this sampling scheme is much more efficient at sampling systems with complex free energy landscapes.
In this study, f was initially set to e 1 , and f min = exp(10 −6 ). In the lattice model, our trial moves were chosen randomly such that the percentage of each move type were as follows: 75% pull moves [18], 23% bond-rebridging moves [19], and 2% pivot moves [20]. This combination of Monte Carlo moves has proven very efficient in studying lattice proteins together with Wang-Landau sampling [12,21]. In the continuum model, we use N diffusion moves for one of each of the following: reptation, bond-rebridging, crankshaft, and random pivot moves.

Thermal Properties and Protein Structures
With g(E) obtained from Wang-Landau sampling, the partition function (Z) can be computed at any temperature (T ) simply as where E is the system energy, and k B is the Boltzmann constant (= 1 in this study). We then calculate the ensemble average of any quantity Q as follows: The heat capacity (C V ) can be calculated from fluctuations in the energy via  Figure 4. Specific heat curves for the lattice and continuum HP models for Crambin, with characteristic structures shown relative to peak locations. Error bars smaller than the points are not shown.
The specific heat (C V /N ) offers insight into the folding process, with peaks indicating structural transitions. In smaller proteins, the collapse process occurs in two steps [22]: Firstly a coil-globule transition, in which the protein folds from a random coil to a globule. Secondly, the "folding transition" consists of the protein becoming even more compact, and ultimately relaxing to the native state. Fig. 4 shows the specific heat curve for the both the lattice and continuum HP models of Crambin, along with characteristic structures relative to the peak locations. Both models exhibit these two structural transitions (consistent with all-atom simulations of Crambin [23]): the lattice HP model displays a peak and broad shoulder, but in the continuum case they both appear as peaks. While the magnitude of these curves differ between models, the temperature scale is common, and thus the "transition" temperatures (and overall results) are strikingly similar. Note that although the specific heat curve for the continuum model appears to go to zero at low temperature, this is a non-physical result of cutting the sampled energy range off at a certain minimum. The true specific heat will level off to a value given by the equipartition theorem at zero temperature. In the lattice case, the drop-off to zero is because there is a gap between the ground-state and first-excited state energy.
The main difference between the two models is the emergence of a faint transition signal in the form of a small shoulder between the two peaks in the continuum case. We have not found substantial numerical evidence in any order parameter to explain this shoulder (no signal in the radius of gyration, radial distribution function, end-to-end distance, or core density), and thus this shoulder still remains somewhat mysterious. Its origin is currently under investigation and will be included in a further study.

Conclusion
We have used Wang-Landau sampling to offer a first look into lattice effects on coarse-grained protein models by examining the original HP model and its continuum analogue for a sequence mapped from Crambin. Analysis of the specific heat curves reveals that the folding process shares many similarities between the two models. The coil-globule transition occurs at almost exactly the same temperature and is marked by a peak in both cases. While the specific heat displays quantitative differences at the folding transition (peak vs. shoulder), the qualitative folding behavior is the same and the temperature at which this transition occurs is similar for both models. This suggests that the lattice constraint does not have a substantial impact on the generic folding behavior in this case. A more detailed study is in progress in which we aim to uncover the quantitative meaning of the small shoulder that appears in the continuum case, as well as investigate lattice effects on semi-flexible protein models.