Replica Exchange Wang–Landau Simulation of Lattice Protein Folding Funnels

. The resolution of Levinthal’s paradox concerning the ability of proteins to fold rapidly postulates the existence of a rough “folding funnel” in free energy space that guides the protein to its lowest free energy, native state. To study the folding of the protein ribonuclease A we have mapped it onto a 124 monomer, coarse-grained HP lattice model and onto an H0P model that also includes “neutral” 0-mers in addition to the hydrophobic H-mers and polar P-mers. Using Replica Exchange Wang-Landau sampling, we determined the density of states, g(E), which we then used to calculate the free energy of the protein vs end-to-end distance as a function of temperature. At low temperature the HP model shows a rather shallow and ﬂat free energy minimum, while the H0P model maintains a rough free energy funnel. Unlike the common, schematic ﬁgures, we ﬁnd an asymmetric folding funnel that also changes shape substantially as the temperature decreases. Even the location of the free energy minimum shifts as the temperature decreases.


Introduction
Understanding protein folding remains a Grand Challenge problem of modern science. The protein folding funnel concept [1,2], which is believed to be the explanation of the protein folding problem, has always been portrayed schematically as a function of some unknown reaction coordinate (as seen in Fig. 1) and has never actually been observed. However, due to the complexity of the problem, investigation is only possible through the combinations of coarse-grained models and advanced computational methods. The hydrophobic-polar (HP) lattice protein model [3,4] is arguably the simplest protein model, yet it has contributed to the understanding of many problems of biological interest [5]- [6] An extension of the HP protein model, the semi-flexible H0P lattice protein model [7], introduced some simple modifications which render the model more realistic with significantly reduced ground-state degeneracy [8].
Even though the HP model is highly simplified, the problem of finding the ground-state structure of a given HP sequence is extremely difficult [9] and ground states are often highly degenerate. Consequently, the HP model represents a significant challenge at the interface between statistical physics and biophysics. The HP model has thus become a testing ground for many advanced algorithms: sequential importance sampling [10], fragment regrowth Monte Carlo [11], chain-growth methods [12,13], e.g. the pruned-enriched Rosenbluth method (PERM) and its variants [14,15,16,17,18], multidomain sampler [19], genetic algorithm [20,21], evolutionary Monte Carlo [22], ant colony models [23] and Wang-Landau sampling [24,25]  Exchange Wang-Landau (REWL) sampling to both the HP model and the semi-flexible H0P model to uncover protein folding funnels.

HP and semi-flexible H0P lattice protein models
Based on the properties of their side chains, amino acids are classified into hydrophobic (H) and polar (P) in the HP lattice protein model [3,4]. In this model, a chain of connected monomers represents the amino acid sequence residing on a simple cubic lattice. An effective monomer-monomer coupling HH between non-bonded, nearest-neighbor H monomers is used for characterizing the driving force of protein folding: the hydrophobic interaction: where n HH is the number of non-bonded HH contacts. An improved version of the HP model, i.e. a semi-flexible H0P lattice protein model, introduces a "neutral" type monomer ("0") representing amino acids that are neither hydrophobic nor polar. This model also considers the stiffness of angles formed by bonds connecting monomers, and the general Hamiltonian of the semi-flexible H0P model can be written as where n HH is the number of non-bonded HH contacts (n H0 has corresponding meaning), and n θ represents the number of angles existing in the protein structure (see Fig.2). We used HH = 1, H0 = 0.5 and θ = −0.25 for this work, but other interaction values or interactions involving "0" monomers (or more distant neighbors) could be easily added. Our intent, however, was to add only as many features as are needed to render the original HP model more realistic while still keeping the model as simple as possible.

Wang-Landau sampling
Wang-Landau (WL) sampling [26,27] is an iterative Monte Carlo (MC) simulation method, which performs a random walk over the energy space for estimating the density of states of the system under investigation. This method, using pull moves and bond-rebridging moves to A schematic example of a semi-flexible H0P model. The interaction between monomers 1 and 6 is HH , and that between monomers 3 and 12 is H0 . The angle constituted by monomers 5, 6 and 7 contributes θ energy. In this particular 2-dimensional structure, n HH = 1, n H0 = 2 and n θ = 5. Hydrophobic and "neutral" monomers are colored dark gray and white, respectively, while polar monomers are colored orange. An HP model results if no 0-mers are present and the angle energy is zero. Both models are actually simulated in three dimensions.
generate new, trial states, has been shown to be powerful and efficient in both finding ground state structures and determining density of states (g(E)) [24,25]. During the simulation, a MC trial move, changing the system from energy E A to energy E B , will be accepted with probability where g (E) is an instantaneous estimator for the density of states. This estimator is updated through g (E B ) → f × g (E B ) at each step, where f is a modification factor. The simulation maintains a histogram of visited energies, which is increased by H(E B ) → H(E B ) + 1. The modification factor will be reduced if the histogram is sufficiently "flat" and at the same time, all the histogram entries are reset to zero. The simulation continues until the modification factor is less than some predefined threshold value f final . In this work, the initial modification factor is set to f init = e 1 and decreased via f → √ f , until reaching f final = 1 × 10 −8 . Initially, g (E) is set to 1 and we use the "80%" flatness criterion for the histogram, i.e., every entry in the histogram is no less than the 80% of the mean histogram height at each iteration.

Replica Exchange Wang-Landau sampling
For determining g(E) of systems with the size of interest in this work, it was necessary to employ Replica Exchange Wang-Landau sampling (REWL) [28,29], a generic, parallel Wang-Landau sampling scheme. As shown in Fig. 3, the energy range of the system is split into multiple, overlapping windows, each of which is simulated by independent processes (random walkers, running serial WL sampling). The replica exchanges between overlapping windows are attempted during the simulation at a fixed time interval, such that each replica can travel through the entire energy space of interest. The simulation terminates only when all random walkers have satisfied the termination criteria independently.

Results and discussion
REWL sampling was performed for both the HP sequence (denoted as HP124) [30] Figure 3. Illustration of the general framework of Replica Exchange Wang-Landau sampling. The system energy range is split into multiple, overlapping energy windows. Each window employs multiple independent processes (or walkers) running serial Wang-Landau sampling. At fixed time intervals, the replica exchange procedure is performed between two neighboring energy windows.
about the mapping.) The density of states was determined to high precision for both models and used to estimate the partition function and both thermodynamic and structural properties. If Q represents a structural quantity which can be used as a reaction coordinate, we can calculate the free energy with respect to temperature and Q based on the joint density of states g(E, Q) as: where Z(T, Q) is the partition function based on temperature and Q: After initial exploration, we found that the end-to-end distance: was apparently a good candidate for representing the folding funnel, where N is the number of monomers in the chain; r i represents the position of the ith monomer.
In Fig. 4 we present the normalized free energy vs end-to-end distance at two different temperatures for the HP124 lattice protein. The quality of the results was quite high and the statistical errors in the figure are smaller than the size of the symbols. The state, and associated end-to-end distance, with lowest free energy is indicated by a black arrow, while the mean r ee is marked by an orange arrow. At high temperature (as seen in Fig. 4(a)), the free energy shows a shallow, "symmetric" landscape which contains many local maxima and minima. The state with lowest free energy has a different end-to-end distance than the mean end-to-end distance, and there are many minima and maxima between the two states. Upon lowering the temperature we found the free energy curve changes shape and becomes skewed toward the region with low r ee values (as seen in Fig. 4(b)). Moreover, unlike the static schematic portrayals of the protein folding funnel which always has a fixed minimum, the lowest free energy position shifts with the change in temperature, showing the dynamic nature of the folding funnel. Finally, at quite low temperatures as seen in Fig. 4(b), the free energy landscape of HP124 becomes relatively flat near the minimum, and the system can easily move between states with different end-to-end distances. At low temperature, the state with the lowest free energy is then also the state with the mean end-to-end distance. For greater end-to-end distances, the free energy curve assumes a rather odd shape with a flat plateau region.  The normalized free energy vs end-to-end distance at two different temperatures for the H0P124 lattice protein shows significant differences from its HP counterpart. Results are shown in Fig. 5. The state with lowest free energy is indicated by a black arrow, while the mean r ee is marked by an orange arrow. At high temperatures (as seen in Fig. 5 (a)) we observed a rugged free energy landscape which is similar to that for HP124, but as the temperature is lowered, substantial differences become apparent. The free energy landscape remains rough but skews towards the region with low r ee values, and the position of the lowest free energy also shifts with temperature. However, at low temperatures, unlike the free energy landscape of HP124, that of H0P124 remains rough even near the bottom (as seen in Fig. 5 (b)) and does not show the curious, plateau region. The inset shows clearly that significant free energy barriers must be overcome to move between free energy minima and the protein can be easily trapped in a metastable state if traditional simulation methods are used. Black arrows indicate the lowest free energy at each temperature (T), orange arrows point to the mean end-to-end distance at that temperature. Error bars are smaller than the data points.

Conclusion
In summary, we investigated the free energy landscapes for two lattice protein models that are mapped from the protein ribonuclease A. Using the end-to-end distance as the reaction coordinate, we mapped out the folding funnels for these two model proteins. a relatively shallow free energy minimum at low temperature, while the H0P model develops a clear, rough free energy funnel, even near the minimum, at all temperatures. In both cases, we find an asymmetric folding funnel that changes shape substantially as the temperature changes. Unlike the schematic figures in the literature, we even observe that the location of the free energy minimum shifts with the change of temperature. Even though both model proteins are highly simplified descriptions of a real one, neither the mapping nor the interactions were tuned to produce a special free energy structure. Moreover, with only a few features added to the HP model, the H0P model already captures some of the essential thermodynamic and folding behavior of real proteins (e.g. rough free energy funnel). Therefore, we believe that the general characteristics of the folding funnels discovered in this work might well persist in a more realistic description of protein folding.