Modelling peptide self-assembly within a partially disordered tau filament

Peptide self-assemblies are a natural template for designing bio-inspired functional materials given the extensive characterisation of neurodegenerative and non-disease biological amyloid protein assemblies and advances in rational, modelling-led materials design. These bioinspired materials employ design rules obtained from known aggregation-prone peptides or de novo screening for sequences most amenable to self-assemble functional nanostructures. Here, we exploit the hybrid nature of a complex peptide with both ordered crystalline and intrinsically disordered regions, namely, the microtubule-binding domain (MBD) of tau protein, to probe the physical driving forces for self-assembly at the molecular level. We model the peptide in its native and mutated states to identify the supramolecular packing driving stabilisation at the prefibrillar level. We use extensive atomic-resolution molecular dynamics computer simulations, contact maps, hydrogen-bond networks and free energy calculations to model the tau MBD and its two known familial mutants, the P301L and K280Δ, along with a control double mutant, P301L + K280Δ as a first step towards understanding their effects on oligomer stability in fibrillar fold. Our results indicate that the mutations destabilise supramolecular packing in the pro-fibrillar hexamer by breaking contacts in the ordered domain of tau MBD, which helps explain mutation-induced toxicity levels as the more stable wild-type peptide assemblies may be less prone to crumbling, producing fewer toxic small oligomeric seeds. Our most important finding is that tau familial mutations causing frontotemporal dementia may show distinct morphologies delineating different stages of self-assembly. The models show that the P301L mutant is more pro-nucleating with low tendency for assembly polymerisation, whereas K280Δ is more pro-elongating with potential for protofibrillar growth. Our data provides a predictive mechanistic model for distinct peptide self-assembly features depending on the location and nature of single missense mutations on the partially disordered pathogenic MBD, which may explain the prevalence of polymorphic filamentous tau strains observed experimentally.


Introduction
Amyloids were first identified in the 17th century as large white stones in the human spleen. Later in the early 19th century, they were named as amyloid, meaning starch-like from the Latin word 'amylum'. In 1859, it was discovered that the material is neither fat nor carbohydrate, but 'albumoid'. Even though this new term could explain the structure of amyloids to some extent, the name 'amyloid' was kept [1] in a linguistic holdover mirroring the evolutionary persistence of these deposits in molecular biology and perhaps forecasting their still acute resistance to medical treatments.
In the past few decades, owing to advances in x-ray diffraction (XRD) crystallography, nuclear magnetic resonance (NMR) and most recently electron microscopy (EM), the molecular properties of amyloids have been in the brain. These are denoted as 2N4R, 1N4R, 0N4R, 2N3R, 1N3R and 0N3R, with the nomenclature describing the number of N-terminal domain inserts followed by number of C-terminal domain repeat regions [40]. The C-terminal domain repeat motifs and their nearby regions promote tau protein binding to microtubules and are hence known as the microtubule binding domain or MBD [41], and the N-terminal domain has also been suggested to be involved in microtubule stabilisation and acts as the projection doimain [42]. The largest isoform of MAPT, 2N4R is composed of 441 residues (see domain organisation in figure S1(A)) [43]. Here, we model the MBD of tau which contains the four repeats R1, R2, R3 and R4 that also house the two homologous hexapeptide motifs, PHF * ( 275 VQIINK 280 ) and PHF6 ( 306 VQIVYK 311 ) in R2 and R3 repeat regions, respectively. PHF6 * and PHF6 are known to become more solvent exposed to facilitate solvent-mediated aggregation of tau monomers [44]. We chose the MBD as the representative minimal fragment which reproduces the self-assembly behaviour of the full protein [45]. In addition, MBD consists of a well-defined ordered peptide-packing crystalline motif generating a core fibrillar structure, flanked by long peripheral intrinsically disordered regions known to house key familial mutants implicated in disease and so presumed to alter the aggregation propensity.
Thorugh extensive atomistic molecular dynamics (MD) simulations (∼4 μs I total), complemented by estimation of thermodynamic stabilities, hydrogen bond networks, and contact and interaction mapping of supramolecular packing, our physical models help delineate the role of familial mutations, namely, the P301L substitution and the deletion mutation K280Δ. Both mutations are known to cause frontotemporal dementia [46,47]) and we identify their pro-aggreation effects from analysis of core packing, overall stability of hexameric oligomers in their fibrillar fold and pro-fibrillar elongation. Previous reports on the aggregation kinetics of these two mutants have shown that K280Δ facilitates faster nucleation and elongation rates, whereas P301L has slower fibrillar elongation rates which shifts the assembly population towards oligomers [48]. However, the molecular level details accounting for such differences in aggregation propensities have not been reported to date. Here, we report models that delineate the atomic-scale effects of distinct familial missense mutations on tau disordered domains. Our data shows that the mutations significantly impact the ordered crystaline domain, modulating its ability to facilitate self-assembly of the hexames into either a predominantly compact oligomeric nanostructure or an extended fibrillar construct.

Model building
The structure of a full-length tau (hTau40; 441 residues) monomer remains to be solved experimentally, due mainly to its polymorphic nature. Much of the available knowledge is based on Nuclear Magnetic Resonance (NMR) spectra of its individual fragments, and complete spectral assignment has been achieved, denoting a predominantly random coil conformation of the isolated protein, with secondary structural propensities in different regions [49]. The two hexapeptide motifs within tau, designated PHF6 * and PHF6, have tendency to form β-sheet structures [50,51]. Thus, atomistic modelling through Molecular Dynamics (MD) simulations have mostly been carried out on truncated proteins or have added missing residues in random coil conformations to approximate full-length extended structures [52,53]. The same is also reported for fibrillar assemblies, as they are also partially solved for tau filaments in Alzheimer's disease with cryo-EM [54,55] and the MD simulations were performed on these partially solved structures [56].
We built our models from a starting crystal structure (PDB 5O3L [54]) having the paired helical filament (PHF) organisation of tau in Alzheimer's disease, consisting of amino acid residues in the sequence 306-372. At the larger end of the spectrum of intrinsically-disordered peptides/proteins [9], 62 amino acids were added to the N-terminal as random coil (spanning residues 244-305) on the selected tau crystal, which contains only a fragment of tau protein to include the regions in the protein most susceptible to familial mutations. The PDB structure of the crystal tau filament has five monomer chains in two layers. We considered a filament from one layer from the structure and added a sixth monomer chain to model a hexamer in fibrillar fold. Then we added the N-terminal residues as random coils to construct the full microtubule binding domain (MBD; residues 244-372) of tau (see figure 1(A) for structural organisation of MBD of tau). P301L mutation is located within the R2 of the MBD, while K280Δ mutation removes the sixth residue in PHF6 * of R2. Both mutations can be pronounced at both RNA and protein levels [58], which has been further characterised for P301L [59][60][61][62] and also modelled at the single monomer level [63]. From this starting structure, four tau MBD models were designed to represent the wild type (WT) protein and mutant proteins P301L, K280Δ, and double mutant with P301L + K280Δ. Extensive Molecular Dynamics (MD) simulations were performed with the models with a sampling time of 1 μs each (4 μs in total) during fully equilibrated room temperature dynamics in bulk water as described below.

Atomistic molecular dynamics simulations
MD simulations were performed using the GROMACS 2018.4 code [64]. The velocity verlet integrator was used with 2 fs timestep and hydrogen atom bond lengths were constrained using the LINCS algorithm. Coordinates were saved every 20 ps. Background Na + or Cl − counterions were added to neutralise excess formal charges in the peptide, and 0.15 M NaCl was added to mimic physiological salt concentration. Long-range electrostatics were treated by the Particle Mesh Ewald (PME) method with the peptide encased in a large water box with periodic boundary conditions to mimic bulk solvation. Systems were minimised for 100000 steps, then heated to 310 K by an external heat bath and equilibrated for 1 ns at constant volume. One microsecond production dynamics was performed for each system at constant pressure. Berendsen thermostat and barostats were used during equilibration with coupling time constant of 1 ps for the thermostat and 1 bar reference pressure and 5 ps time constant for the barostat. Production runs used the velocity-rescale thermostat and Parrinello-Rahman barostat with the same constants as for equilibration. The CHARMM36m [65] forcefield was used with modified TIP3P [66] explicit water model. [67] software was used to visualise trajectories. All analyses were performed with Gromacs tools. Standard cut-offs of 0.35 nm and 30°were used to count hydrogen bonds. The fraction of native contacts Q (figure S1B) was calculated to estimate the convergence of each simulation, and thus the last 250 ns was used for post-processing and analyses of data for each system, using the definition from Best, Hummer and Eaton [68] implemented in the MDTraj [69] python library using equation (1): where N is the set of all pairs of heavy atoms (i, j), and heavy atoms i and j are in contact if the distance between them is less than 5 Å and they are separated by at least 3 residues. ( ) r t ij is the distance between i and j in the structure sampled at time t and r ij 0 is the distance in the starting structure at time 0. β is a smoothening parameter taken to be 5 Å −1 and λ is a factor that describes fluctuations when the contact is formed, taken to be 1.8 for the all-atom model. For more details on the method and choice of parameter values, please see ref. [68].

Model visualisation and analysis
Tertiary structure analyses was performed by computing the contact probabilities between all residues. Pairs of residues are in contact if the minimum average distance between the heavy atoms of the two residues is within 5 Å. Interaction maps for contacts with probabilities >50% were computed to identify contributions from H-bond, salt bridge, and hydrophobic networks associated with fibrillar fold retention. We performed these analyses by using the CONAN [70] contact map analysis method, which utilizes the gmx mdmat tool in GROMACS [71].
The conformational energy was estimated using the GBMV implicit solvent model (Generalized Born using Molecular Volume) implemented in the CHARMM (v40b2) program [72]. The energy was calculated after 200step minimization of each MD snapshot using the GBMV II algorithm [73][74][75]. Other energy terms including bonded energy, van der Waals (vdW) energy, electrostatic energy, and solvation energy were also calculated with the GB implicit solvent model. The block average method was used to estimate the mean values and standard deviations during the last 250 ns of dynamics.

Overall impact of familial mutation on assembly morphologies
We observe that over the course of the simulations, the crystalline region of WT tau MBD starts to twist leading to some loss of β-sheet towards its N-terminal bridge on PHF6. The C-terminal domain in R4 also loses order during the first 100 ns dynamics and remains in the less ordered conformation ( figure 1(B)). These effects reflect the less contrained aqueous environment of MBD compared to the full tau protein structure, and are also visible in the mutants to varying degrees (figures 1(C)-(E)), suggesting that just single or double mutation at key structural sites impacts the overall fibrillar order. For instance, in P301L mutant (figure 1(C)) , the R3 and R4 domains forming the crystal core underwent the least twisting and the fibril axis was aligned better than in other variants, but with significant twisting of the PHF6 domain. The K280Δ and its double mutant with P301L (figures 1(D), (E)) showed loss of order in both R3-R4 and PHF6 domains. Moreover, in P301L and the double mutant, the flexible R2 region partially collapses on R3. Another observation with regards to the P301L mutant is the emergence of transient β-strands (see section 3.2) in the disordered fuzzy coat, especially in the hexapeptide PHF6 * region.

Structural stabilities of assemblies and their variants
We probed the secondary structure features in our model tau hexamers in their pro-fibrillar fold, in particular the β-sheet content. We mapped the secondary structures of the full peptide (figures S1(C)-(F)), the ordered 'crystal' region (figures 2(A)-(D)), the disordered 'fuzzy coat' (figures 2(E)-(H)), and compared the regions of ordered PHF6 (figures S1(G)-(J)) and disordered PHF6 * (figures 2(I)-(L)). We observe that when considering the full peptide, the WT protein (figure S1(C)) shows relatively stable β-sheet content (% residues) compared to the other three mutant constructs, P301L, K280Δ, and P301L + K280Δ, while the P301L mutant shows a gradual increase from ∼100 ns to the end of the simulation (figure S1(D)) with decrease in turn and bend regions. The double mutant sampled the most fluctuating β-sheet content with corresponding increased random coil (figure S1(F)). All models displayed near negligible α-helix. Comparing the ordered 'crystal' and disordered 'fuzzy coat' regions, we note that all models lost some β-sheet content and gained random coil in the crystal region (figures 2(A)-(D)). Significant decrease of β-sheets was observed in K280Δ tau mutant (figure 2(C)) and the smallest decrease observed in case of the double mutant, P301L + K280Δ (figure 2(D)) but with high fluctuations. When considering the secondary structure contents of the fuzzy coat only, all variants show slight increase in their β-sheet content at different time points (figures 2(E)-(H)), with P301L showing the largest increase towards the end of 1 μs dynamics ( figure 2(F)). To further investigate the contributions of the two hexpapetide domains in the disordered and ordered domains of tau towards the aggregation-prone βstructure, we computed secondary structure features in both PHF6 * (figures 2(I)-(L)) and PHF6 (figures S1(G)-(J)). In the PHF6 * segment, significant β-sheet formation (from random coil) was only observed for P301L mutant (especially between 400-450 ns dynamics, see figure 2(J)), with a small increase also seen in the double mutant, but at the beginning of dynamics ( figure 2(L)). There is high probability of formation of α-helical structures from the fuzzy coat close to the N-terminus of MBD of tau in the K280Δ and P301L + K280Δ constructs ( figure S2). However, β-strand could be seen replacing coil in P301L mutant model ( figure S2(B)). Overall, we identify that the formation of new β-sheets in the predominantly disordered PHF6 * domain compensates for the loss of β-sheets to random coil in the crystal part, accompanied by twisting of our models, most prominent for P301L mutant while retaining the in-register β-sheets in nucleating PHF6 segment [76]. This finding agrees well with the recent paramagnetic relaxation interference (PRI) data showing that short-and long-range correlated motions among residues in P301L mutated tau expose the hexapeptide and promote tau self-assembly [77], a feature that could be characterised as pro-nucleating for formation of oligomers.

Prediction of pro-fibrillar growth patterns from hydrogen bond networks
To probe the propensity of pro-fibrillar growth via chain elongation along the fibril axis, we mapped the H-bonding networks in the different tau MBD hexamers. We first computed the overall H-bond count (#Hbonds) which shows an increase during the first ∼500 ns before plateauing for all four models ( figure 3(A)). The timelines show most stable overall H-bonds for the double mutant P301L + K280Δ, followed by the WT protein. The H-bond counts for P301L and K280Δ are almost at par with each other. We also performed a detailed inspection of H-bond strengths with regards to three dimers in the hexamer-dimer at one end of fibril between chains A and B (denoted as E1), dimer in the center or middle of fibril between chains C and D (denoted as M) and dimer at the other end of fibril between chains E and F (denoted as E2). This allows us test the hypothesis that monomer addition for chain elongation will occur at one end of the protofibril via H-bonds (open end) while the other end will be closed to monomer addition. The H-bond timeline shows the difference between the strength of three adjacent dimers in the hexamer models (figure S3) with the K280Δ construct showing significant difference in population of H-bonds between two ends of fibril after ∼700 ns dynamics ( figure S3(C)). P301L + K280Δ double mutant sampling H-bond differences at two ends since the very beginning of dynamics (figure S3(D)), with P301L mutant showing least difference ( figure S3(B)) amongst the mutants. We further accounted for the average H-bond count strengths between two chains in dimers at E1, M and E2 ( figure 3(B)). We note that there is almost no difference between the H-bond strengths of E1 and E2 for WT tau indicating that the pro-fibrillar hexamer may not have the propensity to elongate chain length in one direction. Amongst the mutants, the maximum difference in H-bond strength of E1 and E2 is observed for K280Δ (∼17 H-bonds) followed by P301L + K280Δ (∼11 H-bonds) with P301L showing the least difference (∼8 H-bonds). The H-bond data predicts the unidirectional growth propensity of mutant tau MBD constructs ranked as K280Δ > P301L + K280Δ> P301L. This finding corroborates previous experimental measurements of tau aggregation kinetics, where the K280Δ mutation leads to an enhanced overall aggregation kinetics with increased nucleation and elongation rates, whereas the P301L mutation although leads to a faster nucleation rate with expedited formation of oligomers as evident from a high population of β-sheets in our models (see figure 2), but a stunted fibril growth rate compared to K280Δ [48].

Tertiary contacts in assembly folds
To map tertiary protofibrillar contacts in the tau MBD hexamers, we computed the intrafibrillar inter-residue contact probabilities and identified the type of interactions contributing to these contact probabilities. From the interaction lifetime maps (figure S4), we observe that P301L mutant has more anti-diagonal contacts with higher percent lifetime compared to the WT protein and other mutants. Specifically, R3-R4 contacts within a single chain of hexamer and across chains of hexamer are quite persistent for P301L mutant ( figure S4(B)). We further probe into the nature of the contacts via interaction type maps, which shows predominantly hydrogen bonds with occasional salt bridges and hydrophobic patches ( figure 4). We note however that the R3-R4 anti-diagonal contact frequencies for P301L correspond mainly to hydrophobic interactions with sparsely populated H-bonding network. Moreover, these interactions are mostly observed anti-diagonal to the main diagonal for WT, K280Δ and the double mutant of tau identifying intra-chain contacts (black broken circles in figures 4(A), (C), (D)). By contrast, a high population of anti-diagonal interactions off-diagonal to the main diagonal (blue broken circles) is observed for the P301L mutant reflecting inter-chain R3-R4 contacts ( figure 4(B)). In addition to the inter-peptide H-bonding network, the abundance of non-polar intrafibrillar interactions in P301L mutant creates a more compact shape in its crystal core, compared to K280Δ mutant (see radius of gyration, R g data in figure S5 and crystal shape in figure 5), and may potentially explain their slow growth rate to form mature fibrils compared to K280Δ mutant [48].

Thermodynamic stabilities of hexamers in pro-fibrillar fold
Conformational energies were calculated to estimate the thermodynamic stability of the tau hexamer structures (figure 6). We first computed the conformational energies for the full-length tau MBD peptide assembly constructs which show that P301L is conformationally most stable, while K280Δ is least energetically favourable ( figure S6). However, the model thermodynamics are not quantitatively comparable due to differences in sequence compositions because of missense mutations and deletions in the fuzzy coat. In order to make a direct comparison, conformational energies of the conserved crystalline domain devoid of the mutation points were compared ( figure 6). The data reveals WT crystal as the most conformationally favourable tau construct. This is expected as the cryo-EM structure of tau filaments [54] was solved for the WT protein. There are no high resolution experimental filament structures available to date for P301L and K280Δ tau mutants. These disease associated familial mutants used in this study may be polymorpic in nature and their structures depend on the experimental conditions used [78]. In the crystalline core region, we observe that P301L is most energetically favourable, followed by double mutant P301L + K280Δ, and K280Δ is least favourable (figure 6), with energies barriers of −11 kcal mol −1 and −35 kcal mol −1 to transit from P301L + K280Δ and K280Δ, respectively to P301L mutant. Taking into account the high propensity for β-sheet formation coupled with less propensity for fibrillar tip growth as evident from #H-bonds in P301L mutant, we propose that the higher thermodynamic stability of P301L compared to the other mutants originates from the formation of a stable hexameric nucleus [79]. The destabilised protofibrils in K280Δ, on the other hand, may undergo conformational re-arrangement to facilitate growth of protofibrils.

Conclusions
Amyloid protein self-assembly is thought to proceed via a nucleation dependent polymerisation mechanism and the microtubule-associated protein tau (MAPT) is no exception. The assembly process involves an initial lag phase which allows formation of a critical nucleus, followed by a rapid elongation phase, as peptide chains are  added to the growing tau fibril tip [80]. Several functional consequences of tau mutations have been extensively characterised [81]. Yet the physical driving forces at the molecular level are unclear, which has limited the understanding of how aggregation patterns in mutant tau self-assembly may relate to polymorphic 'strains' [37] in tauopathies. Here, we obtain atomic-scale mechanistic insights of tau pathogenesis through computational molecular simulations of partially disordered microtubule binding domain (MBD) housing the four repeat domains R1-R4 and two hexapeptides PHF6 * and PHF6. We compare wild-type (WT) MBD structures with familial missense mutants, P301L and K280Δ (known to cause frontotemporal dementia), in hexamer assemblies in their pro-fibrillar WT folds. These mutations are housed in the disordered fuzzy coat of tau MBD, and our models predict that the thermodynamic stabilities and core packing of tau crystalline domain is significantly altered by the missense mutations. It is known that the P301L tau mutant does not intrinsically promote growth/elongation to longer protofibrillar chain length unless seeded with preformed tau fibrils [82] or induced by adding heparin [83]. We observe that the mutation of residue position 301 from Pro to Leu in the fuzzy coat leads to formation of new β-sheets particularly in the otherwise intrinsically disordered PHF6 * with β-structural retention in the ordered PHF6. P301L is also less prone to loss of β-sheets from the crystalline domain compared to the other MBD constructs used in this study. The models reveal that P301L assemblies show low tendency for unidirectional growth of fibrils, unlike the K280Δ tau mutant (modelled by deletion of Lys at position 280 in the fuzzy coat) which shows high propensity for pro-fibrillar chain elongation. Moreover, a high population of R3-R4 inter-chain non-polar interactions prevailed in the P301L hexamer, which predominantly keeps the crystalline domain in a folded, compact conformation not amenable to monomer addition, unlike the stickier K280Δ mutant tau. Taken together with the calculation of lower stability of K280Δ compared with the P301L mutant, our models predict a more pro-nucleating (oligomer-like) core hexameric packing for P301L and a more pro-elongating (fibril-like) extended construct for K280Δ. Hence we identify two different assembly patterns for two point familial mutations in MBD tau separated only by 20 residues. Since our mutant models are designed based on a paired helical filament of WT tau, we note from previous reports that the P301L pro-fibrillar growth from a preformed nuclei may have a completely different morphology than the WT tau critical nuclei and may not advance pro-fibrillar elongation from a P301L nuclei [84], further strengthening our model predictions. In addition, our models strongly support previous experimental observations that oligomers are formed rapidly due to P301L mutation in tau due mainly to a faster nucleation rate but following up with a very slow elongation rate compared to K280Δ which undergoes an expedited protofibrillar growth kinetics [48]. Future combined simulation/experimental studies could further sample these and other mutants to comprehensively map the potential energy surface of the familial mutants in these very large and heterogeneous tau structures, scaling systematically from monomer to low molecular weight oligomers to fibrils. The models can guide experiments, for example, atomic force microscopy investigations of selfassembled structures formed on atomically clean substrates [30,85]. Thus, our predictive models provide atomic-scale design rules to re-redirect tau self-assembly to non-pathogenic constructs, and will inform also development of peptides and proteins for technology applications. For example, development of tailored and bioinspired supramolecular peptide assemblies and nanostructures ranging from crystals and thin films to hydrogels [86].
More generally, modelling and data-driven design is important to advance peptide self-assembly because there is no clear sequence or structural homology between all of the known amyloid-forming proteins, even though the general molecular properties of the amyloids are well established. A wide variety of different amino acid sequences are able to form amyloid plaques, and so any attempt to use designer (molecular engineered) amyloid-like assemblies as functional materials requires careful study of the properties of peptides with these sequences [87][88][89][90][91][92][93][94][95][96][97]. By doing so, we can understand what contributes to the stability of those peptides and design a sequence that is tailored to a specific function with high efficacy and high stability. In such cases as drug delivery applications, we can potentially encode controlled dissolution of the carrier at its site of action, e.g., programmed disassembly at tumour sites via pKa shift of key residues [98]. Ideally, we would like to identify the minimal peptide sequence required for a specific function, so we can truncate or re-engineer a naturally occurring fibril-forming protein. This may allow us form β-sheets with just that particular sequence, shedding the excess weight and thus creating a more effective material, as demonstrated by the competition between core and full-fibril packing predicted from the tau models presented here.