Computational modelling of bionano interface

The interface between inorganic and biological materials plays a crucial role in vital technological applications ranging from food processing and cosmetics to medicine but presents enormous technical challenges for computational modellers. These challenges stem from both conceptual and technical roots: the lengthscale and timescale gaps between the essential interactions and the properties of interest and the differences between the models of inorganic and biological materials. Research efforts of the last decade have led to significant advances in computational modelling of the bionano interface and allowed the construction of quantitative predictive models for both the structure of this interface and material functionalities based on descriptors obtained from the interface. In this work, we discuss advances in the field of bionano interface modelling and outline the directions of its further development.

Introduction. -Recent progress in biomedical and food technologies has prompted the interest to studying interactions between organic and inorganic materials at the nanoscale. Engineered materials may possess unexpected biological activities when brought into contact with biological fluids and tissues. The main concerns are related, but not limited, to the emerging risks for human health. Fully biocompatible materials are those which coexist with living tissues without eliciting any undesirable local or systemic effects such as immune response in the recipient [1]. The questions of biocompatibility of engineered materials arise naturally in respect to medical appliances such as stents, implants, or prosthetic joints as these can cause immediate hazards upon introduction into the human body.
While macroscopic surfaces are relatively easy to scan in a lab after exposure to biological fluids to identify possible interactions, the use of nanoparticles (NP) as implantable materials and components of medical devices poses more sophisticated safety concerns [2]. Some modern technologies face similar challenges even where the application is not immediately related to health risks. Lab-on-achip diagnostic devices are extremely sensitive to interactions between the parts of the chip and the biomolecules. Interactions with biomolecules play a crucial role in the (a) E-mail: vladimir.lobaskin@ucd.ie (corresponding author) environmental impact of nanomaterials [3]. In food industry, complex fluids like milk and other protein and sugar mixtures and emulsions experience multiple contacts with engineered materials on the way to the final product and their state can be affected by interface-induced transformations, complexations, or fouling of the surfaces [4][5][6].
The central concept in the description of the bionano interface is the corona of biomolecules surrounding the NP (or generally coating the surface of a foreign material) [7]. Initially, this corona was described primarily in terms of the adsorbed proteins, but more recently consideration has additionally been given to the small-molecule and peptide constituents which are also present in most relevant situations. The corona formed in any biological liquid possesses an enormous diversity and complexity, which limits the feasibility of first-principle computations. Yet, quantitative insights may be obtained using the system symmetries and patterns. Biomolecules consist of a relatively small number of repeat unit types, with only 20 amino acids (AAs) present in hundreds of thousands of proteins and only 6 elements -carbon, hydrogen, nitrogen, oxygen, phosphorus and sulphur-comprising most of the biomolecules. The secondary and tertiary structures of proteins can be predicted from their AA sequences using either homology models [8] or neural networks [9]. Given this highly structured and repetitive nature, it can 57001-p1 be hoped that the bionano interactions are controlled by only a few independent material parameters. Therefore, these inherent patterns can be employed to build predictive models. While there are many possible combinations of atoms or molecular fragments that may be biologically relevant, some of these combinations are far more frequent and may play the dominant role in the entire chain of molecular events at the bionano interface.
This suggests a predictive approach combining physicsbased modelling (up to the limits imposed by the system size) with data-driven schemes to relate the physical molecular events to the biological events of interest. Such physical molecular events (e.g., biomolecular adsorption on the NPs, change of molecular conformation, production of reactive oxygen species, biomembrane contact, penetration, or disruption) can be considered as initiating events of biological pathways, and thus be predictive of the complex biological outcomes [10]. Although direct molecular simulation can reach the relevant system sizes that include an NP and proteins or lipid membranes, in practice obtaining results on meaningful timescales using atomistic models would take an infeasible amount of computational time, with even state-of-the-art techniques enabling only a few microseconds of the system's evolution. To be able to discern between variations of NPs by their interaction with various biomolecules we therefore require a multiscale modelling scheme that can tackle the bionano interface and relate the respective interactions to the details of the NP structure derived from first principles.
In the following sections, we present an overview of the challenges, advances, and opportunities of the computational modelling of bionano interface.
Ab initio and atomistic level description of bionano interface. -Non-covalent interactions occurring at the bionano interface can be described at various resolutions using quantum, atomistic, and coarse-grained (CG) methods. The highest resolution is provided by quantummechanical (QM) methods that model the distribution of electronic density for interactions between atoms of nanomaterials and small molecules. This information helps to understand the forces governing such interactions in terms of polarisation, charge-charge interactions, dipole-dipole interaction, HOMO-LUMO interactions, and can model covalent bonding if necessary. Density functional theory (DFT) and Hartree-Fock-based semi-empirical methods provide a good match with experimental data, but are resource-consuming and applicable only to static systems. The ongoing increase in computational power of modern processors combined with various scaling-up techniques [11,12] allows the application of these methods to model large-scale systems, e.g., interfacial interactions between inorganic nanomaterials and small molecules (physisorption or chemisorption, solvation, (electro)catalysis, adhesion, etc.) [13,14] or larger bulk materials using a database of material properties available in the Materials Project repository [15].
Most often, calculated adsorption energies are reported for the gas phase. However, the inclusion of solvent explicitly or implicitly (e.g., via a polarisable continuum model [16]) is necessary for an adequate picture of the adsorption as water usually plays a crucial role in these processes. Some materials can induce restructuring of water near the NP surface due to the imagecharge interactions [17,18]. The existence of the solvent adlayer can impact adsorption of (bio)molecules fundamentally by either enhancing (solvent mediated adsorption) or reducing the interaction between adsorbates and nanomaterials [19][20][21] or by competing with the adsorbates for the contacts with the surface. The ion concentrations can also change the strength of physisorption of (bio)molecules [22]: the adsorption of Ti-binding peptides at the rutile TiO 2 (110)-water interface was shown to be improved in the presence of Ca 2+ ions which act as a bridge between negatively charged NP surface and the negatively charged aspartic acid residues [23]. These examples suggest that the explicit MD models for describing adsorption processes at interfaces might be more beneficial than static implicit QM approaches.
Modelling of the adsorption of bio(eco)molecules in aqueous environments is most often done using atomistic molecular-dynamics (MD) simulations in canonical (NVT or NPT) ensembles or in the grand canonical (µVT) ensemble. The atom-atom interaction potentials in MD simulation can be written either through empirical functions (e.g., Lennard-Jones potentials in classical all-atom MD) or through density functional theory (ab initio MD (AIMD) [24]). The latter methods typically provide a better picture of electronic density distribution across the interface because they do not rely on empirical parameterisations and have proven to be useful for understanding the adsorption of small biomolecules at atomically smooth crystalline materials [25][26][27], but the computational modelling of entire protein adsorption or adsorption to polymeric nanomaterials or those with irregular surfaces is not generally feasible with the AIMD approach. For more complex interfacial adsorption systems, classical Monte Carlo or MD simulations can be more useful, provided that all interactions are properly modelled and the conformational space is adequately explored (see review [28] and references therein). Despite the progress on both of these factors, there is not yet a consistent empirical force field (FF) which is sufficiently accurate for a wide range of materials. Developing empirical FFs requires fitting the FF parameters to reproduce experimental observables (heats of evaporation, free energy of solvation, etc.). The lack of systematic experimental data on properties of nanosized materials causes challenges in obtaining good FF parameters. Nonetheless, the existing experimental data for bulk properties of (bio)organic and inorganic materials provide a strong basis for development of compatible FFs (including reactive FFs) for bio-and nano-materials: CHARMM [29,30], GROMOS [31], COMPASS [32], AMBER [33], PCFF [34], 57001-p2 CVFF [35], OPLS-AA [36], INTERFACE FF [37,38], CLAY FF [39], GoIP [40], AMOEBA [41], ReaxFF [42,43], etc. These FFs were successfully applied to modelling solvated surfaces and protein behaviour in aqueous environments or within the cell membrane. However, as most of them do not contain parameters describing the inorganic and organic fragments in the same set, the majority of these FFs are not readily applicable to model interactions between organic and inorganic segments in the presence of solvent, and different FFs are not universally compatible [44]. Thus, interfacial FFs (IFFs) which can accurately describe bionano interfaces should be systematically fitted based on experimental properties [28,45] or computed ab initio based on DFT methods. The most advanced example of the former is the INTERFACE FF which covers a wide range of nanomaterials (metals, metal oxides, silica, CNTs, clays) and is compatible with either CHARMM/AMBER/CVFF/OPLS-AA parameters (12-6 Lennard-Jones functions) or COMPASS/PCFF parameters (9-6 Lennard-Jones functions), covering the majority of existing biomolecules. Examples of the latter, where FF parameters such as atom partial charges, Lennard-Jones potential parameters, and bond geometry are extracted from DFT simulations, are given by [39,46,47].
Another important issue for studying bionano interactions is the representation of polar inorganic surfaces in contact with water. According to experimental studies, at neutral pH oxide surfaces are covered with hydroxyl groups and are negatively charged, with ab initio MD simulations demonstrating a mechanism for hydroxylation of Ti [48]. The amount of bound hydroxyl groups depends on pH and type of the surface and dramatically changes the interaction of oxide (including clay) surfaces with water and biomolecules [39,46,49]. The interfacial atoms differ from the bulk ones due to different coordination and exposure to water, e.g., have lower partial charges than their bulk counterparts, requiring a careful parameterisation of atom types based on their local environment and coordination number [46,47] or the use of reactive FFs such as ReaxFF [42,43]. Even in the absence of hydroxylation, the bionano interactions can be surface-specific due to different densities of atoms at difference crystal faces. The effect of the crystal orientation, symmetry or type of the exposed crystallographic plane affects the interaction with water and biomolecules [46,50]. These factors are particularly relevant for nanomaterials, which are not necessarily in the equilibrium structure or phase for that material and may exhibit regions with high surface energy (e.g., high Miller indices) due to the curvature of the surface.
Polarisation effects arising from induced charges in response to dipoles on the biomolecules also impact on biomolecule-surface interactions [45]. A number of polarisable FFs were developed for modeling polarisation in various (bio)molecules -e.g., reactive AMOEBA FF and Drude CHARMM models. There are three common classical MD approaches [51,52] for explicit modelling of electronic polarisation in the molecule: fluctuating charge model (CHARMM [53]), Drude oscillator model (CHARMM [54]), induced point dipole (AMOEBA [55]). See reviews of these and other methods in [51,56].
As for inorganic materials a limited number of polarizable FF was proposed. In one example of modelling polarisability in FFs for noble-metal surfaces, the imagecharge approximation was invoked in GoIP FF via the fixed dipole rod model for modelling adsorption of peptides, AAs, and other adsorbates on Au, Ag, and graphene surfaces, and GRAPPA FF for AAs on graphitic interfaces [57][58][59][60][61]. As an extension to the non-polarisable IN-TERFACE FF, polarisable Lennard-Jones potentials with dummy electrons similar to the Drude oscillator model were proposed for predicting adsorption onto metallic surfaces with MD simulations [62]. This add-on correction to the FF, however, did not lead to substantial changes (approximately 10%) in the calculated water-metal adsorption energies and almost no change in water density profiles near Au(111) and Au(100) surfaces. However for the adsorption of highly polar peptides, the impact of polarisation was greater [63]. The multipole-multipole treatment of polarisation effects at the bionano interfaces was used in polarisable AMOEBA FF to study interfaces at CNT, graphene, graphite, and metal halide perovskites [64][65][66][67][68]. These FFs involve an increase in the computational cost and so in certain cases the polarisation can be more efficiently modelled by varying atomic partial charges based on their environment as obtained from DFT methods [47][48][49]69]. In this case, the improvement of the accuracy of non-polarisable IFF with fixed atomic charges can be achieved by introducing several sets of charges for the same atom type for atoms located on the surface and in the bulk of material. This implicit scheme was successfully applied for modelling proteins and lipid adsorption at the TiO 2 surfaces [47][48][49]69]. However, the most appropriate methodology and the overall effects of polarisation remain open questions.
Various MD techniques from docking and dynamic measurements to enhanced sampling methods to evaluate adsorption affinities for organic molecules have been thus far proposed [70], with similar approaches used for estimating the adsorption of small molecules onto solids. However, the size of a typical bionano system, e.g., a protein in contact with a NP, far exceeds what is presently achievable. Adsorption of proteins onto nanomaterial surfaces is a complex phenomenon which proceeds through several diffusive and conformation-altering stages with high energy barriers, which are currently beyond the capacity of even biased MD [71][72][73][74]. However, the current state of developments in improving enhanced sampling protocols with artificial intelligence and machine learning (ML) [75,76] raises the hope that this limitation can be overcome at some point in the future.

Coarse-grained models of bionano interface. -
Even where FFs for the bionano interface are available, this does not immediately allow for a computational 57001-p3 prediction of bionano interactions due to the immense number of particles which would need to be simulated at the all-atom resolution of a NP and relevant molecules. So far, full atomistic simulations of protein on surfaces have proven useful to advance the understanding of molecular interactions that determine the binding of proteins to inorganic NPs [77][78][79][80], but are limited to systems composed of one or few proteins on much shorter timescales. Thus, alternative methods for corona prediction are required. Two main routes for this can be identified. The first, CG MD, employs simplified representations of the NP and proteins to reduce the total computational time required and thus reach larger system sizes and timespans [81][82][83][84][85][86][87]. These typically use quite simplistic representations of the proteins and lack molecular detail which may affect the adsorption kinetics. Another approach draws on mathematical modelling of random sequential adsorption (RSA) [88] to express the number of each type of protein bound to the NP as a set of rate eqs. (mathematically similar to the Langmuir model but accounting for, e.g., excluded volume) and numerically integrates these equations or simulates them using kinetic Monte Carlo methods to obtain the corona composition as a function of time [89][90][91]. Compared to the CG-MD approach, this requires much less computational time and can be extended to very large systems of proteins, but has the drawback of typically neglecting protein-protein interactions and requiring several abstractions to be made to produce a tractable model. In both approaches, the main route is the same: reduce the large number of degrees of freedom present to a much more manageable number while preserving as much detail as possible. This reduction can be performed systematically in a step-wise procedure through the united atom methodology (e.g., [82]), which uses small fragments (AAs) to model large biomolecules (referred to as proteins but including lipids and carbohydrates) as shown in fig. 1.
First, atomistic FFs are used to produce potentials of mean force (PMF) describing the interaction between individual AAs (or other small biomolecules) and the NP of interest in the presence of water ( fig. 1). Evaluation of the PMF is the first stage in the coarse-graining procedure and is typically obtained via metadynamics simulations; the resulting potential implicitly represents the complex system of NP surface, biomolecule, solvent and any present solute ions in terms of a single NP-biomolecule distance. Under the assumption that the AA residues in a protein interact with the NP independently of their position in the protein and with a potential dependent only on the NP-AA distance, the protein may then be represented as a set of AA beads, each of which experiences forces described by the particular NP-AA PMF. This provides a dramatic simplification of the problem: the NP, which may consist of tens of millions of atoms, is reduced to a single entity and the protein is reduced from potentially tens of thousands of atoms to a few hundred AA beads. A further assumption is made that the protein remains rigid, such that it is not necessary to perform a full dynamics simulation but rather a simple energy calculation can be performed. The validity of this assumption depends on the protein in question. Minor perturbations of the structure of the protein, up to a root-mean-square deviation of ±1Å do not lead to significant changes in the observed binding energy [92], but for intrinsically disordered proteins which lack a well-defined structure to begin with, the binding energy obtained is much more dependent on the exact structure used as input and an ensemble should be employed rather than a single structure. While the PMFs are usually calculated between AAs and flat slabs of materials, they can be corrected for the NP shape (spherical, cylindrical, planar), long-range van der Waals attraction, and surface charge of the NP, and used to generate a NPprotein potential by summation over all AA beads as a function of their distance from the NP, taking into account the orientation of the protein relative to the NP and the surface curvature [46,93]. More recently, this procedure has been extended to allow for the single-bead NP to be replaced by a complex of multiple beads, enabling a wider range of NP geometries, e.g., core-shell and core-brush to be modelled [94].
From the NP-protein potential, the adsorption free energy can be extracted. This in itself provides useful information regarding the affinity of the protein to the NP, and further can be used either to parameterise a second stage of CG NP-protein potential for use in the first type of simulation, or to generate adsorption and desorption rate constants for the RSA model, taking into account the protein serum concentration, sizes of the proteins and NP, and the change in the translational entropy of the proteins [91,93,95]. This approach was found to produce a ranking of proteins adsorbing to silica NPs in 57001-p4 broad agreement with that found experimentally, with the disagreement attributed to the neglect of the soft corona in the numerical modelling. Importantly, it was observed that ranking the proteins simply by binding energy did not produce an accurate corona prediction, highlighting the importance of considering the effects of protein serum concentration and the area occupied on the surface of the NP by the protein. The corona formation is known to be highly sensitive to the size of the proteins, especially when the rigidity of the molecules is taken into account, which can be expected from considerations of the entropic effects involved in the corona formation [91,95].
Further improvements in what regards the adsorption of individual molecules can be made using flexible CG protein models coupled to flows via dissipative particle dynamics [96] or Brownian dynamics techniques [97] as a single chain or combined with elastic network model [98], or using Monte Carlo simulations [99]. These models allow one to address the processes where the protein conformation and kinetics are supposed to play a major role.

Data-driven models of bionano interface. -
The multiple patterns and rich internal structure of biomolecules enable the calculation of a wide variety of numerical descriptors to correlate to biomolecule-NP adsorption properties. This allows in principle the construction of data-driven schemes for the prediction of the interactions at different scales. ML strategies for the development of atomistic FFs have been demonstrated for a range of systems [100,101]. The direct prediction of PMFs via a ML model was recently done, enabling the CG simulations of a much wider range of NPs and biomolecules than is feasible using metadynamics simulations alone and in principle works with arbitrary small organic molecules and NP surfaces [102]. A coverage of a sufficient number of materials by direct simulations would allow one to use the material descriptors to predict bionano interactions for novel, even not-yet-existing, surfaces and NPs using ML approaches. By now, the list of materials covered by direct AIMD and MD simulations includes multiple metals (Au, Ag, Fe, Al, Cu), oxides in different crystalline and amorphous forms (TiO 2 , SiO 2 , ZnO), carbonaceous materials (graphene, graphene oxide, carbon black, carbon nanotubes), and other solids (ZnS, polymers like polystyrene). For most of the crystalline materials, several crystal planes have been modelled and in some cases the bionano interactions are presented as a weighted average of properties over surfaces with different Miller indices [103].
ML-based prediction of protein binding affinities to NPs using protein and material descriptors has been recently attempted [92], finding reasonable agreement for the set of proteins considered there, but requires the protein structure and is limited to titania and gold NPs. Recent advances such as AlphaFold have dramatically increased the availability of predicted protein structures and allow for a significantly wider range of proteins to be considered compared to relying solely on those for which experimental structures are known. The extremely large set of predicted protein structures is now being leveraged to construct a neural network model for the prediction of NP-protein binding energies based directly on the protein sequence for a much wider range of NPs. The adsorption of smaller biomolecules to NPs has been investigated using a biological surface adsorption index model, which represents NPs as a set of five descriptors and predicts adsorption rates as a linear function of these suitable for physics-based modelling of bionano interactions [104]. It has been found that the adsorption affinities for proteins highly correlate with the toxicity of the nanomaterials and thus the adsorption energy can be used as an interfacial descriptor for statistical models [105] or for artificial-intelligence-based protocols for "safe-by-design" development of novel nanomaterials [106] thus illustrating the power of nanoinformatics approach for the prediction of nanomaterials functionality (see fig. 2).
For a complex medium such as whole blood serum, which contains thousands of proteins, the variability of the corona content is immense and each corona may contain hundreds of unique proteins. Consequently, the corona composition may contain too much information to be directly useful or interpretable and, given the repetitiveness of the structure of the biomolecules, it is likely to be highly redundant. Thus, the more useful descriptors are not necessarily the list of adsorbed biomolecules but rather the (weighted) average properties of these adsorbates. These can then be used to obtain further insight into the bionano interface by identifying that, e.g., the corona contains predominantly proteins with a negative charge, or with high abundances of certain residues. In particular, some protein properties tend to increase the propensity of to molecules to adsorb on specific materials. This observation led to the idea of the construction of a fingerprint obtained from the NP corona, demonstrated to be useful for prediction of the NP-cell association using only a limited number of descriptors, e.g., the cell association of gold NPs correlated well with the sequence descriptors related to the protein charge (such as basic, acidic, and aspartic AAs percentage) as well as with molecular weight, and propensity of the protein to aggregation [107][108][109]. FFs for novel materials requires significant effort and thus represents a key limiting factor on how quickly materials may be parameterised for use in the models discussed here. This suggests that it may be of benefit to develop further data-driven models to handle this step by generating approximate FF parameters for a new structure to bypass this time-consuming step. Even when atomistic FFs exist, one of the major remaining issues is the adjustment of the surface hydroxylation, which is rarely systematically performed at arbitrary pH. Generally, most computational studies of bionano interface are done on pristine and idealised materials of low Miller index (where defined) without surface defects, impurities or ligands, since details of the surface structure are rarely available from experiments and computationally require additional ensemble methods [93,94]. Polarisation, image charge effects, and specific binding (e.g., chemisorption of cystein on gold) also require an efficient implementation.
Current models of the NP protein corona sometimes ignore protein unfolding induced by the interface. This strong approximation can be justified in some practical cases. Where the protein-NP interaction is weak (i.e., in the range of few k B T typically for small NPs of less than 5 nm in size or light material with a low Hamaker constant), the surface is usually unable to unfold the molecule. Conversely, where the adhesion is strong enough to change the protein conformation significantly (e.g., for large NPs or metallic surfaces), the binding is likely irreversible and thus the conformational changes in the protein are largely irrelevant. Nonetheless, it may be beneficial to relax the adsorbing proteins in solution at the specific conditions (e.g., pH, ionic strength, and temperature) and compute binding energies for an ensemble of several of the commonest conformations, since the experimental crystal structure as deposited in the PDB may be too compact and not representative of the actual structure of the protein at different conditions.
Another outstanding goal is the prediction of the content of the soft corona, which is more challenging than the hard corona since this is dominated primarily by more specific protein-protein interactions, which depend more strongly on the flexibility of the proteins and surface residues while the direct NP-protein interaction is typically less than thermal energy per molecule. The role of the NP in defining the soft corona therefore more likely arises indirectly in terms of selecting the proteins in the hard corona to which further proteins may bind, suggesting the use of further CG simulations or protein docking calculations to obtain the soft corona after the hard corona is found.
Though most of the recent research is focused on proteins and peptides as the key components of the bionano interface, in real systems other biomolecules may be equally involved. These include lipids, cholesterol, sugars, RNA and DNA, as well as metabolites and byproducts like humic and tannic acid. They may compete with proteins for the space on the adsorbing surface, form complexes with proteins or other molecules. A possible way forward is to perform atomistic modelling of these in contact with the surfaces or, once sufficient information is accumulated, use ML techniques to predict their interactions with the specific NPs and CG techniques to predict the corona content [102].
Finally, we note that the prediction of biological activity of NPs still remains challenging. We expect that careful characterisation of the main contributions to the bionano interactions (e.g., NP surface charge, polarisability, solvation energy) or of the NP ability to bind specific molecules (e.g., lung membrane lipids) can provide a key to the understanding why some NPs are toxic or why some perform better in applications. In a recent extensive meta-analysis study, it was found that the toxicity correlates with the NP dispersion energy, dipole moment, zeta potential, Hamaker constant, density of surface oxygen atoms [110], the properties that determine the bionano interactions. Based on this, we anticipate further advances in this direction in the nearest future. * * * The work has been funded by EU Horizon2020 under grant agreement No. 814572 (NanoSolveIT project), Horizon Europe under grant agreement No. 101092741 (nanoPASS project), and by Science Foundation Ireland through grant 16/IA/4506. The authors are grateful to Vio Buchete for insightful discussions.

Data availability statement:
No new data were created or analysed in this study.